Conf42 Machine Learning 2021 - Online

Serverless Deep Learning


Would you like to run inference in the cloud using automatic scaling, built-in high availability, and a pay-for-value billing model? In this talk you are going to see how to bundle your ML model to run serverless inference in response to events and where it’s suitable to do so.


  • AWS Senior Solutions Architect Nicola Pietroluongo explains how to deploy a machine learning model as a serverless API. Using the AWS Serverless Application Model (AWS SAM), he creates a machine learning container that runs on AWS Lambda and is exposed through Amazon API Gateway.
  • AWS Lambda lets you run code with zero administration. With the pay-as-you-go model, you don't pay for unused server time. Before getting into serverless machine learning solutions, you need to carefully validate your use case and define clear KPIs.


This transcript was autogenerated. To make changes, submit a PR.
Hi, I'm Nicola Pietroluongo, AWS Senior Solutions Architect. One of my responsibilities is to help customers innovate and unlock new possibilities with machine learning. Today I'm going to talk about how to deploy your machine learning model as a serverless API. Let's start by saying that only a small fraction of a real-world machine learning system is composed of machine learning code. There is a vast and sometimes complex infrastructure surrounding a machine learning workload, with its own challenges. What if we could minimize or remove some of those concerns? I'm talking about model deployment and serving: the part that comes after you have done all the work of creating a machine learning model, when you face questions such as where to host your model to serve predictions at scale and cost-effectively, and how to do that in the easiest way.

The answer to where to host is AWS Lambda, a serverless compute service that lets you run code without provisioning or managing servers. AWS Lambda has many benefits, such as automatic scaling, high availability, and a pay-as-you-go model. More importantly for this session, it supports functions deployed as container images. In other words, you can bundle a machine learning model and its code in a container image and deploy it as a serverless API.

Now that we know the where, let's see how to deploy a container on Lambda. The answer to how is the AWS Serverless Application Model, or AWS SAM. It is an open source framework that you can use to build serverless applications on AWS. It provides single-deployment configuration, built-in best practices, local debugging and testing, and more. AWS SAM allows you to define your serverless application using a template. SAM templates are an extension of AWS CloudFormation templates, and AWS CloudFormation is a service that allows you to provision infrastructure as code. Here you can see a sample template composed of three blocks.
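For reference, here is a minimal sketch of a template with those three blocks, written as a Python dictionary and printed as JSON, which CloudFormation (and therefore SAM) accepts as well as YAML. It is an illustration under assumptions, not the template from the talk: the logical IDs `InferenceFunction` and `InferenceApi`, the `/classify_digit` path, the memory and timeout values, and the Docker tag are placeholders.

```python
import json

# Hypothetical names, chosen for illustration only.
FUNCTION_ID = "InferenceFunction"
API_PATH = "/classify_digit"

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    # Block 1: ask CloudFormation to apply the SAM (serverless) transform.
    "Transform": "AWS::Serverless-2016-10-31",
    "Resources": {
        FUNCTION_ID: {
            # Block 2: a Lambda function packaged as a container image, with an
            # implicit API Gateway REST API that forwards POST requests to it.
            "Type": "AWS::Serverless::Function",
            "Properties": {
                "PackageType": "Image",
                "MemorySize": 3008,  # assumed value, in MB
                "Timeout": 60,       # assumed value, in seconds
                "Events": {
                    "Inference": {
                        "Type": "Api",
                        "Properties": {"Path": API_PATH, "Method": "post"},
                    }
                },
            },
            # Block 3: tells SAM how to build and tag the container image.
            "Metadata": {
                "Dockerfile": "Dockerfile",
                "DockerContext": "./app",
                "DockerTag": "python3.8-v1",
            },
        }
    },
    "Outputs": {
        # ServerlessRestApi is the logical ID SAM gives the implicit API;
        # Prod is that API's default stage name.
        "InferenceApi": {
            "Description": "Endpoint for POST inference requests",
            "Value": {
                "Fn::Sub": "https://${ServerlessRestApi}.execute-api.${AWS::Region}"
                ".amazonaws.com/Prod" + API_PATH + "/"
            },
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(template, indent=2))
```

Saved as `template.json`, output like this should be usable with `sam build -t template.json`; the generated project in the talk uses an equivalent YAML file instead.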
The first block instructs CloudFormation to perform the serverless transformation. The central block creates a Lambda function connected to Amazon API Gateway, with its code and all the necessary permissions. API Gateway is a fully managed service that handles all the tasks involved in accepting and processing API calls. The last block helps AWS SAM manage the container images. To summarize, with a few lines of code this template creates everything we need to run a serverless machine learning container. It exposes a public endpoint that we can call with a POST request to run inference. We don't need to worry about all the configuration, roles, and permissions those resources need to run; SAM helps us with that. This is effectively what we are going to create in this talk, but without writing a single line of code, because AWS SAM provides a command line tool, the SAM CLI, which makes it easy to create, manage, and test serverless applications. It's available for Linux, Windows, and macOS. You can use the CLI to build, validate, and test serverless applications, and to integrate with other AWS resources, such as databases. Let's see what we can run with the SAM CLI. We can use sam init to generate a preconfigured template, sam package to create a deployment package, sam build and sam deploy, as you can imagine, to build and deploy an application, and finally sam local to test the application locally.

Now it's time to see these tools in action and deploy a machine learning model as a serverless API. I'm going to show how to use the SAM CLI to create a machine learning container running on AWS Lambda, exposed via Amazon API Gateway. In this way, a client application can make requests to API Gateway, which invokes the Lambda function and returns the inference output. We are going to generate the template with sam init, build the solution with sam build, test locally with sam local, and create a container registry, which is optional if you already have one.
Then we deploy with sam deploy and finally test the deployment. The first step is to run sam init to initialize the project. First, we need to choose between an AWS quick start template and a custom one; I'm going to choose a quick start template. The next question is about the package type we would like to use: a zip file or a container image. The choice will be an image. AWS already provides base images for common tasks, but you can also create your own. For this demo, I'm going to use the Python 3.8 base image. Now it's time to select a name for the project; I'm going to accept the default, sam-app. At this point, SAM is fetching the files required to create the app. The final step is to select which type of application we would like to run. As you can see, there are some preconfigured scenarios to get started quickly: Hello World, PyTorch, scikit-learn, TensorFlow, and XGBoost. I'm going to choose the PyTorch machine learning inference API. At this stage, SAM generates a directory with all the files required to run the application.

Let's move inside the application directory. I'm going to use the tree command to show the directory structure. Here you can see all the files generated by SAM. The quick start I chose contains a sample model to identify handwritten digits. There is the SAM template.yaml, which generates the infrastructure we saw before, a sample training file to train the model, an event file to test the API, and more. Two files are particularly important: the Dockerfile, which builds the container, and the app file, which contains the code. Let's inspect the Dockerfile. You can see the base image we selected in the FROM statement, and other statements to bundle and run the application. At the very bottom, you can see the CMD instruction, which specifies what needs to be executed when the container starts. In this case, it executes the function called lambda_handler inside the app file.
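A CMD of the form `app.lambda_handler` names a module (`app.py`) and a function inside it. The following is a dependency-free sketch of the shape such a module can take: decode the base64 request body, preprocess, predict, and return a JSON response. It is not the generated quick start code; `load_model` is a hypothetical placeholder for loading the PyTorch model (the real app also decodes and resizes the image), and the `predicted_label` key mirrors the label shown in the demo but should be checked against your own `app.py`.

```python
import base64
import json


def load_model():
    """Hypothetical loader standing in for the real PyTorch model.

    A real app would load trained weights here; this placeholder
    "model" ignores its input and always predicts 0.
    """
    def model(image_bytes: bytes) -> int:
        return 0
    return model


# Loaded at import time, so warm invocations in the same execution
# environment reuse it instead of reloading it on every request.
MODEL = load_model()


def lambda_handler(event, context):
    """Decode the base64 image in the request body, predict, return JSON."""
    try:
        # validate=True rejects characters outside the base64 alphabet.
        image_bytes = base64.b64decode(event["body"], validate=True)
    except (KeyError, TypeError, ValueError) as exc:
        return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
    # Real preprocessing for an MNIST-style model (decode the image, convert
    # to grayscale, resize to 28x28, normalize) would happen here.
    label = MODEL(image_bytes)
    return {"statusCode": 200, "body": json.dumps({"predicted_label": label})}
```

Loading the model outside the handler is a common Lambda pattern: the cost is paid once per cold start rather than once per request.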
So let's have a look at the app file. As you can see, it is a Python file with all the statements required to run inference: preprocessing steps, model loading, and here the lambda_handler function, which runs the inference and returns a JSON output with the prediction. Back in the main folder, it's now time to build the application with sam build. In short, this operation builds and tags a Docker container locally. Before, we saw a file called event.json, which can be used to test the application. Let's zoom out a bit. The file contains a JSON request whose body is the base64 representation of an image. I can decode the body and show you the image with this statement. As you can see, it is the representation of a number three. Let's test the application locally with sam local invoke, using the event.json file. As you can see, the response is exactly what we expected: the predicted label is the number three. To recap, we used the quick start to generate the code and related assets, built the Docker image, and tested the container locally.

Now we are entering the deployment phase. To deploy the solution, we need to make our local container available to cloud resources. One way is to create an Amazon Elastic Container Registry (ECR) repository and push the Docker image there. ECR is a fully managed service to store, manage, and deploy container images. We need to authenticate before creating the repository in ECR, and this statement allows us to retrieve temporary credentials. This series of commands is run with the AWS CLI, not the SAM CLI. You might notice that we need to substitute the region and account ID in this statement. I've redacted some parts, but this is what the request might look like. The authentication is successful. Now we can create a repository called ML demo with this command.
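The commands from this part of the demo can be collected in one place. The sketch below only assembles them as strings, substituting the region and account ID, and prints them unless `execute=True` is passed. The function logical ID `InferenceFunction`, the repository name `ml-demo`, and the `events/event.json` path are placeholders; use the names from your own generated project.

```python
from __future__ import annotations

import subprocess


def registry_host(account_id: str, region: str) -> str:
    """Hostname of the private Amazon ECR registry for an account and region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def repository_uri(account_id: str, region: str, repo: str = "ml-demo") -> str:
    """Repository URI to supply when sam deploy --guided asks for the image repository."""
    return f"{registry_host(account_id, region)}/{repo}"


def demo_commands(account_id: str, region: str,
                  repo: str = "ml-demo",
                  function_id: str = "InferenceFunction",
                  event_file: str = "events/event.json") -> list[tuple[str, str]]:
    """Return (step, command) pairs; all default names are hypothetical."""
    host = registry_host(account_id, region)
    return [
        # SAM CLI: build and tag the image, then invoke the function locally.
        ("build", "sam build"),
        ("local-test", f"sam local invoke {function_id} --event {event_file}"),
        # AWS CLI: temporary credentials for Docker, then a new repository.
        ("ecr-login",
         f"aws ecr get-login-password --region {region} | "
         f"docker login --username AWS --password-stdin {host}"),
        ("ecr-create",
         f"aws ecr create-repository --repository-name {repo} --region {region}"),
        # Interactive: prompts for stack name, region and image repository.
        ("deploy", "sam deploy --guided"),
    ]


def run(steps: list[tuple[str, str]], execute: bool = False) -> None:
    for name, command in steps:
        print(f"# {name}\n{command}")
        if execute:
            # shell=True is required for the pipe in the ecr-login step.
            subprocess.run(command, shell=True, check=True)


if __name__ == "__main__":
    # Dry run with a placeholder account ID and region.
    run(demo_commands("123456789012", "us-east-1"))
```

Keeping the default as a dry run makes it easy to review the substituted values before anything touches your AWS account.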
The command is aws ecr create-repository. This is the output, and we need to copy the repository URI, which will be used during the deployment phase. The final step is to run sam deploy --guided and follow the instructions. The first prompt asks for a name for the stack. The second is about choosing the region in which the stack will be deployed. In the next step, we need to specify the Docker image repository, and we're going to use the URI we saw before. Then there is a confirmation step to apply the changes. SAM will need permissions to set up the resources. This step tells us that our API doesn't have any authorization method. That is okay for this demo, but it's good practice to secure access. All these choices are saved in a configuration file, which will make the next deployment faster. Let's accept the default name for the config file and keep the default environment for this configuration. At this stage, SAM is pushing the Docker image we built locally into the ECR repository. Finally, everything is ready to be deployed. As you can see, the stack has been created successfully. That's wonderful. Job done. SAM created all the resources for us: the API Gateway and the Lambda function.

We saw earlier that the API Gateway is publicly exposed and can be used by an application to run inference. So, if we want to test our cloud stack, we need to grab the API Gateway endpoint, which is part of this output. Scrolling up a bit, we can see the API endpoint. Let's clean up the terminal a bit, and as a very final step we can test our serverless machine learning model by making a request against the API endpoint. We can use curl to create a POST request and send the base64 representation of the number three. Let's send the request and celebrate. The output of the inference is what we expected. To recap, we used the SAM CLI to generate and deploy a serverless machine learning model.
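The curl call can also be reproduced with the Python standard library. This is a sketch under assumptions: the endpoint URL is a placeholder for the one in your stack outputs, the request body is the raw base64 string of the image (as in the demo), and the `text/plain` content type is an assumed choice.

```python
import base64
import json
import urllib.request


def build_request(endpoint: str, image_bytes: bytes) -> urllib.request.Request:
    """POST request whose body is the base64-encoded image, as in the curl demo."""
    return urllib.request.Request(
        endpoint,
        data=base64.b64encode(image_bytes),
        method="POST",
        headers={"Content-Type": "text/plain"},  # assumed content type
    )


def classify(endpoint: str, image_path: str, timeout: float = 30.0) -> dict:
    """Send the image at image_path to the endpoint and parse the JSON reply."""
    with open(image_path, "rb") as f:
        request = build_request(endpoint, f.read())
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (placeholder endpoint and file name):
# classify("https://<api-id>.execute-api.<region>.amazonaws.com/Prod/classify_digit/", "three.png")
```

Separating `build_request` from the network call keeps the payload logic easy to check without a deployed stack.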
The model and the application code have been bundled in a container and deployed in a Lambda function, which resides in a private network, while an API Gateway has been deployed to handle public API requests. As a final conclusion, before getting into serverless machine learning solutions, you need to carefully validate your use case and define clear KPIs. With the serverless approach, you run code with zero administration, and with the pay-as-you-go model you don't pay for unused server time. Moreover, you benefit from continuous scaling. To date, serverless machine learning solutions are better suited when performance is not a major concern, and for batch processing, since everything runs independently and in parallel. It's also important to be aware of the service quotas and check whether they affect your use case. For instance, AWS Lambda supports container images of up to 10 GB in size. This leads us to the final conclusion, which is to continuously test and validate your assumptions, and AWS SAM surely gives an advantage in terms of fast prototyping and experimentation. That's great if you want to innovate faster. Thank you.

Nicola Pietroluongo

Senior Solutions Architect @ AWS
