Transcript
            
            
              This transcript was autogenerated. To make changes, submit a PR.
            
            
            
            
              Good day, everyone. The topic for today is machine
            
            
            
              learning and machine learning engineering in the cloud with
            
            
            
              Amazon SageMaker.
            
            
            
              I am Joshua Arvin Lat. People call me Arbs.
            
            
            
              I am the chief technology officer of Nuworks Interactive Labs.
            
            
            
              I'm also an AWS Machine Learning Hero and I'm the
            
            
            
              author of the book Machine Learning with Amazon SageMaker Cookbook.
            
            
            
              So feel free to check this out. So here we
            
            
            
              have about 80 recipes to help data scientists and
            
            
            
              developers and machine learning practitioners perform ML
            
            
            
              experiments and deployments. So you will see that with just
            
            
            
              a couple of lines of code, you will be able to perform a lot of
            
            
            
              things with Amazon SageMaker. So let's start with machine
            
            
            
              learning. No machine learning talk is complete without introducing
            
            
            
              this quickly. So what is machine learning? So machine learning
            
            
            
              is about creating something
            
            
            
              which helps you perform an intelligent
            
            
            
              decision without having to be explicitly
            
            
            
              programmed to do it. So one example of this would be,
            
            
            
              let's say we have a picture of a cat. So with your
            
            
            
              machine learning model, your machine learning model would then decide
            
            
            
              if it's a cat or not a cat. So even
            
            
            
              without human intervention, the machine learning model should be able
            
            
            
              to know if it's a cat or not a cat. And it
            
            
            
              can make use of a lot of training data
            
            
            
              to help it prepare and generalize a model
            
            
            
              which can be used to identify new
            
            
            
              images and process new images if they're cats or not
            
            
            
              cats. So this is a very simplified example,
            
            
            
              but you would definitely get a better understanding
            
            
            
              once you have more examples on what machine learning
            
            
            
              can do for us.
            
            
            
              Next, when doing machine learning, we will
            
            
            
              definitely start with very simple examples in our local machine.
            
            
            
              But once we start to work with teams, once we
            
            
            
              start to work with more complex requirements,
            
            
            
              it becomes essential that we start using
            
            
            
              machine learning frameworks and platforms to make our lives
            
            
            
              easier. So why is this important? So let's say that
            
            
            
              we were to build everything from scratch.
            
            
            
              There's a chance that the other person in your team would
            
            
            
              have no idea what you just built, unless of course,
            
            
            
              you document it properly.
            
            
            
              You share these ways of
            
            
            
              working with your code through documents
            
            
            
              and sample source code. But the problem there is
            
            
            
              that you will be building everything from scratch, and that will
            
            
            
              take time. And the advantage of using machine learning frameworks
            
            
            
              would be that these machine learning frameworks and platforms
            
            
            
              are already complete in a sense that
            
            
            
              they already have a lot of features and capabilities
            
            
            
              built in, because a lot of people are using them. So of
            
            
            
              course, as these people around
            
            
            
              the world are using these tools, the tools are being updated,
            
            
            
              even if you yourself haven't
            
            
            
              encountered this yet. So once you encounter
            
            
            
              these specific requirements, then you would probably just
            
            
            
              need to use that machine learning framework's or platform's existing
            
            
            
              capabilities, which would save you time. Of course there will
            
            
            
              be cases where you will build something from scratch, but try to
            
            
            
              make sure that it's practical and it makes sense.
            
            
            
              So this is one good example of practical
            
            
            
              applications of machine learning, and also
            
            
            
              possible pragmatic and practical solutions
            
            
            
              using existing tools or services or capabilities
            
            
            
              of existing platforms. So if we look at the left
            
            
            
              side, we can see here that, yeah, there's anomaly detection,
            
            
            
              product recommendation, forecasting, image and video
            
            
            
              analysis, document classification and language translation.
            
            
            
              Just a few of what we can do with machine learning.
            
            
            
              On the right side, we have the possible solutions.
            
            
            
              So how can we solve an anomaly detection
            
            
            
              requirement with just a few lines of code? Yeah,
            
            
            
              we can make use of the SageMaker Random Cut Forest algorithm,
            
            
            
              which is already optimized for the cloud. So it
            
            
            
              makes use of the existing Random
            
            
            
              Cut Forest algorithm, and then the AWS team
            
            
            
              optimized it to make it work with SageMaker and
            
            
            
              the cloud resources. For product recommendation,
            
            
            
              we can make use of Amazon Personalize, another service in
            
            
            
              AWS, which is built to solve this type of problem.
            
            
            
              For forecasting requirements, we can make use of the SageMaker
            
            
            
              DeepAR algorithm. So it's similar to Random Cut Forest,
            
            
            
              where we just make use of an existing container
            
            
            
              image that the AWS team has provided for
            
            
            
              us, so that all we need to do is make use
            
            
            
              of that container and perform training
            
            
            
              and deployment to solve forecasting requirements.
            
            
            
              And the same goes for the other items in this list.
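As a rough sketch of what "make use of an existing container image" can look like in code, the snippet below looks up the AWS-provided image for a built-in algorithm. It assumes the SageMaker Python SDK (v2) is installed; the dictionary, the function name, and the region in the usage note are illustrative and do not come from the talk.

```python
# Identifiers the SageMaker Python SDK (v2) uses for two built-in algorithm
# containers mentioned in the talk. The dictionary itself is illustrative.
BUILT_IN_ALGORITHMS = {
    "anomaly detection": "randomcutforest",
    "forecasting": "forecasting-deepar",
}


def container_image_for(use_case: str, region: str) -> str:
    """Return the URI of the AWS-provided container image for a use case."""
    # Imported lazily so this module loads even where the SDK is not installed.
    from sagemaker import image_uris

    return image_uris.retrieve(
        framework=BUILT_IN_ALGORITHMS[use_case], region=region
    )
```

With the SDK installed, a call such as `container_image_for("forecasting", "us-east-1")` should return the image URI that an estimator can then use for training and deployment.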
            
            
            
              So of course, you won't need one to two
            
            
            
              weeks of learning the nitty-gritty details of
            
            
            
              how these things work. The advantage here is that even
            
            
            
              if you are a newbie,
            
            
            
              you will be able to get something to work within four to eight
            
            
            
              hours. And that's pretty cool.
            
            
            
              So instead of spending six months to one year just
            
            
            
              trying to get everything to work, because you built something from scratch,
            
            
            
              you can have something which is already working.
            
            
            
              You can present a proof-of-concept work to
            
            
            
              your boss, or to your clients. And then once you
            
            
            
              have an approved budget, then that's the time you can deep dive
            
            
            
              and let's say configure the hyperparameters, prepare a complete
            
            
            
              machine learning engineering system and workflows and so on.
            
            
            
              So the advantage here is that you can build something fast and also you
            
            
            
              can configure this into something that's production ready.
            
            
            
              So what can SageMaker do for us? And what is SageMaker anyway?
            
            
            
              SageMaker is the machine learning platform
            
            
            
              of AWS, which helps you work with
            
            
            
              more complex and custom requirements.
            
            
            
              AWS has a lot of machine learning services, but what
            
            
            
              makes SageMaker amazing is that it has a lot of capabilities that
            
            
            
              help you migrate your machine learning
            
            
            
              requirements and workflows and code to
            
            
            
              the cloud with very minimal changes
            
            
            
              in your existing scripts. And what it
            
            
            
              offers and provides would be a certain level of abstraction
            
            
            
              when dealing with cloud resources. If you were to
            
            
            
              prepare and run simple experiments
            
            
            
              in your local machine, you may not need very
            
            
            
              large and very powerful instances
            
            
            
              or computers or servers. However, once you
            
            
            
              need to deal with production requirements and once you are
            
            
            
              going to work with really large files and really large models,
            
            
            
              you will start to realize how hard it is to get this
            
            
            
              working in the cloud because of course your local machine wouldn't
            
            
            
              be enough to get these requirements running.
            
            
            
              So here is what SageMaker can do for us, which is just one
            
            
            
              of the cool things with SageMaker:
            
            
            
              with just a single line of code change, you will
            
            
            
              be able to configure the infrastructure strength
            
            
            
              needed to run a certain part of the ML workflow.
            
            
            
              So for example, if you look at the screen, in data
            
            
            
              preparation and cleaning, if I need two
            
            
            
              instances of a certain instance type,
            
            
            
              all I need to do is change one line of code and
            
            
            
              then that's going to work right away. And the advantage
            
            
            
              here also is that the instances
            
            
            
              automatically get deleted after the data preparation
            
            
            
              and cleaning step has completed, meaning you'll
            
            
            
              save money because it's not running at all, and you won't pay for
            
            
            
              anything which is not running in AWS.
            
            
            
              Let's say in model training and hyperparameter tuning,
            
            
            
              you can see here that, okay, that training and hyperparameter
            
            
            
              tuning step will take time. So there,
            
            
            
              all I need to do is specify six instances
            
            
            
              of a certain type. And if I need to have a
            
            
            
              really strong instance type there, then yeah, I can just configure it there.
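To make the "one line of code" idea concrete, here is a minimal sketch in plain Python. Only the instance counts (two for data preparation, six for training) come from the talk; the `StepResources` class and the instance type names are hypothetical. The two fields mirror the `instance_count` and `instance_type` keyword arguments that processors and estimators in the SageMaker Python SDK (v2) accept.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class StepResources:
    """Infrastructure requested for one step of the ML workflow."""

    instance_count: int
    instance_type: str

    def as_sdk_kwargs(self) -> dict:
        # Keyword arguments in the shape SDK processors and estimators expect.
        return {
            "instance_count": self.instance_count,
            "instance_type": self.instance_type,
        }


# Hypothetical instance types; only the counts (2 and 6) come from the talk.
data_preparation = StepResources(instance_count=2, instance_type="ml.m5.xlarge")
training = StepResources(instance_count=6, instance_type="ml.m5.xlarge")

# The "one line of code" change: ask for a stronger instance type for training.
training = replace(training, instance_type="ml.c5.4xlarge")
```

The resulting dictionary could then be passed along as, for example, `Estimator(..., **training.as_sdk_kwargs())`.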
            
            
            
              And when I need to deploy something, and I'm aware
            
            
            
              that I'm going to pay for every hour that
            
            
            
              that instance is running, of course I would choose a small instance type
            
            
            
              because of course the instance needed for
            
            
            
              deployment may not necessarily be the same
            
            
            
              instance type needed for training, and we'll need fewer resources
            
            
            
              during deployment. So there we can specify one, and with
            
            
            
              just a single line of code change, we'll be able to get
            
            
            
              this working right away, which is pretty cool. So again,
            
            
            
              the infrastructure abstraction component of SageMaker
            
            
            
              already solves a lot of problems for us, because that
            
            
            
              directly maps to the cost of owning this entire
            
            
            
              thing. So enough of the concepts; let's
            
            
            
              take a look at a bit of code and how this works.
            
            
            
              So you can see the source code in the repository here.
            
            
            
              So on GitHub you have Amazon SageMaker Cookbook. So feel
            
            
            
              free to check that out so that you can see all the other code
            
            
            
              snippets. So you will be surprised
            
            
            
              that all it takes is a couple of lines of code to
            
            
            
              get something working with SageMaker. Of course, you will need to prepare your
            
            
            
              data, you will need to perform model evaluation.
            
            
            
              But if we were to perform training, it would be very
            
            
            
              similar to the fit function of some existing libraries.
            
            
            
              So what happens here? So first we
            
            
            
              initialize the estimator over here, and then we
            
            
            
              set the hyperparameters. We can see here that we're
            
            
            
              dealing with a machine learning algorithm that
            
            
            
              deals with time series analysis requirements.
            
            
            
              So we have here context
            
            
            
              length, prediction length, and so on,
            
            
            
              because we're trying to make use of the DeepAR forecasting
            
            
            
              algorithm of SageMaker. We specify
            
            
            
              the data channels on the right hand side. As you can
            
            
            
              see here, data channels is a dictionary with train and test entries.
            
            
            
              And then with one line of code we can perform the training step
            
            
            
              with the fit function, and we pass
            
            
            
              the data channels as the argument.
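The steps just described (initialize the estimator, set the hyperparameters, define the data channels, call fit) might look roughly like the sketch below. It assumes the SageMaker Python SDK (v2); the hyperparameter values, instance type, S3 layout, and helper names are placeholders and not the code shown on the slide.

```python
def deepar_hyperparameters() -> dict:
    # Hypothetical values for hourly data; the keys are DeepAR hyperparameters.
    return {
        "time_freq": "H",
        "context_length": "72",
        "prediction_length": "48",
        "epochs": "20",
    }


def data_channels(bucket: str, prefix: str) -> dict:
    # Dictionary with a "train" and a "test" channel, each pointing to S3.
    return {
        "train": f"s3://{bucket}/{prefix}/train/",
        "test": f"s3://{bucket}/{prefix}/test/",
    }


def train_deepar(role: str, bucket: str, prefix: str):
    # Lazy imports: needs the SageMaker Python SDK (v2) and AWS credentials.
    import sagemaker
    from sagemaker import image_uris
    from sagemaker.estimator import Estimator

    session = sagemaker.Session()
    estimator = Estimator(
        image_uri=image_uris.retrieve(
            "forecasting-deepar", session.boto_region_name
        ),
        role=role,
        instance_count=1,
        instance_type="ml.c5.2xlarge",  # assumption, not from the talk
        output_path=f"s3://{bucket}/{prefix}/output",
        sagemaker_session=session,
    )
    estimator.set_hyperparameters(**deepar_hyperparameters())
    # The single-line training step: pass the data channels to fit.
    estimator.fit(inputs=data_channels(bucket, prefix))
    return estimator
```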
            
            
            
              Next, if we need to deploy it,
            
            
            
              all we need to do is a single line of code which is deploy.
            
            
            
              And you can see here that it's magic.
            
            
            
              So here we run the deploy function.
            
            
            
              We just specify the instance type and the instance count,
            
            
            
              and there you go. All we need to do is wait for probably three
            
            
            
              to five minutes and then that production
            
            
            
              level endpoint is already working. So we won't have
            
            
            
              to worry about the DevOps side of things.
            
            
            
              We won't have to worry about the engineering side of things because that's
            
            
            
              already handled by SageMaker. So we don't have
            
            
            
              to worry about that. And if we need to delete that endpoint,
            
            
            
              it takes one line of code as well.
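A minimal sketch of the deploy and delete calls, assuming `estimator` is an already trained estimator from the SageMaker Python SDK (v2). The function name and the default instance type are illustrative choices, not taken from the talk.

```python
def deploy_and_clean_up(estimator, instance_type: str = "ml.m5.large"):
    """Deploy a trained estimator to a real-time endpoint, then delete it."""
    # One line to deploy: specify the instance count and the instance type.
    predictor = estimator.deploy(
        initial_instance_count=1, instance_type=instance_type
    )
    # ... send requests with predictor.predict(...) here ...
    # One line to delete, so the endpoint stops accruing charges.
    predictor.delete_endpoint()
    return predictor
```

In practice the delete call would run only once the endpoint is no longer needed; the two calls sit side by side here just to show that each is a single line.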
            
            
            
              So what's the best practice when dealing with this
            
            
            
              type of approach? So we can optimize cost by
            
            
            
              using transient ML instances for training models. And this
            
            
            
              is automatically being done by SageMaker.
            
            
            
              So during training and even processing,
            
            
            
              we can select the type of instance
            
            
            
              or server that's going to run
            
            
            
              the processing script or scripts.
            
            
            
              So in the first example at the top, we can see here
            
            
            
              that we have a large instance.
            
            
            
              At the bottom we have a 2xlarge instance.
            
            
            
              So of course the 2xlarge instance is
            
            
            
              more expensive than the large instance, but you probably won't
            
            
            
              feel that cost much, especially if that instance
            
            
            
              runs for only two minutes because of course,
            
            
            
              if you have been using AWS for quite some time,
            
            
            
              you may notice that, okay, if an instance is running for
            
            
            
              24 hours per day, times seven days, times four weeks,
            
            
            
              then of course the cost will add up and you will significantly
            
            
            
              feel that cost when you check the bill. But if you
            
            
            
              are running the training instance for just two minutes,
            
            
            
              then it's not that pricey. And increasing
            
            
            
              the size of the instance is preferred here because it will
            
            
            
              significantly decrease the amount of time used for
            
            
            
              training. And given that we're
            
            
            
              dealing with transient ML instances, you won't need to
            
            
            
              have a separate program or code just to delete
            
            
            
              the instances. The instances will
            
            
            
              be created and then will automatically be deleted
            
            
            
              after the processing or training jobs have completed, which is
            
            
            
              pretty cool. Before, you would have to program that.
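The cost argument about transient instances can be checked with some quick arithmetic. The hourly prices below are made up for illustration (real SageMaker prices vary by region and instance type); only the durations, two minutes versus 24 hours times seven days times four weeks, come from the talk.

```python
# Hypothetical hourly prices in USD, for illustration only.
HOURLY_PRICE = {"ml.m5.large": 0.10, "ml.m5.2xlarge": 0.40}


def instance_cost(instance_type: str, hours: float, instance_count: int = 1) -> float:
    """Cost of running `instance_count` instances for `hours` hours."""
    return HOURLY_PRICE[instance_type] * hours * instance_count


# A smaller instance left running all month: 24 hours x 7 days x 4 weeks.
always_on = instance_cost("ml.m5.large", hours=24 * 7 * 4)

# A larger, transient training instance that exists for only two minutes.
transient = instance_cost("ml.m5.2xlarge", hours=2 / 60)

print(f"always-on large: ${always_on:.2f}")
print(f"two-minute 2xlarge: ${transient:.4f}")
```

With these made-up prices, the always-on instance comes to about $67, while the two-minute job on the larger instance costs roughly one cent.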
            
            
            
              Now all you need to do is run the fit function, and then
            
            
            
              after the fit function has completed, then the instance would get
            
            
            
              deleted automatically. So your next question would
            
            
            
              be: do I need to create everything
            
            
            
              from scratch again, now that I found out about this new platform?
            
            
            
              The answer would be no. SageMaker has been designed
            
            
            
              to help existing machine learning practitioners
            
            
            
              migrate their existing
            
            
            
              code and set of scripts and work to
            
            
            
              SageMaker with very minimal modifications.
            
            
            
              And there are a lot of options and layers here.
            
            
            
              Of course, if you're just getting started, you can make use of the
            
            
            
              built-in algorithms, as you can see on the left side. In the
            
            
            
              middle, you can even bring your own container or container image.
            
            
            
              The advantage here is that you
            
            
            
              can compile and prepare
            
            
            
              and build your own container image with
            
            
            
              all the prerequisites there. And if you have something,
            
            
            
              let's say an R package,
            
            
            
              an R script, where your model
            
            
            
              is going to be built using those existing
            
            
            
              custom scripts, then yes, you can also port that to SageMaker by
            
            
            
              bringing your own container. And on the right side, you can
            
            
            
              even bring your own algorithm and make use of
            
            
            
              the smooth integration with existing machine
            
            
            
              learning frameworks like TensorFlow and PyTorch.
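As one possible illustration of bringing your own algorithm through a framework integration, the sketch below runs an existing PyTorch training script using the SageMaker Python SDK (v2). The script name, framework and Python versions, instance type, and the `estimator_cls` parameter (there only so the function can be exercised without the SDK) are all assumptions rather than details from the talk.

```python
def train_with_existing_script(role, train_s3_uri, estimator_cls=None):
    """Run an existing training script on SageMaker with minimal changes."""
    if estimator_cls is None:
        # Lazy import: needs the SageMaker Python SDK (v2) and AWS credentials.
        from sagemaker.pytorch import PyTorch as estimator_cls

    estimator = estimator_cls(
        entry_point="train.py",        # hypothetical name of your existing script
        role=role,
        framework_version="1.13.1",    # assumption: a version the SDK supports
        py_version="py39",
        instance_count=1,
        instance_type="ml.m5.xlarge",  # assumption, not from the talk
    )
    # Same fit call as with the built-in algorithms.
    estimator.fit({"training": train_s3_uri})
    return estimator
```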
            
            
            
              You can even make use of Hugging Face transformer
            
            
            
              models there. So the advantage there is that in the
            
            
            
              different things that you have worked on, there's a counterpart for
            
            
            
              it in SageMaker. And you'll realize that,
            
            
            
              oh, I didn't expect it to be that smooth and that
            
            
            
              flexible. So what's the best practice?
            
            
            
              The best practice here would be to choose
            
            
            
              what's best for you. You will be given a lot of options,
            
            
            
              and given that SageMaker is flexible, all you need to
            
            
            
              do is to be aware of the features.
            
            
            
              And what would be a good metric for that?
            
            
            
              The metric for that would definitely be time,
            
            
            
              because the less time it would take you to
            
            
            
              build something or to prepare something, then that's
            
            
            
              probably the right way to go. Of course,
            
            
            
              you will have other things to worry about, let's say the evaluation
            
            
            
              metrics, the cost and so on. But one of
            
            
            
              the factors you need to take note of is time. If you can build
            
            
            
              something in three hours, I would prefer that
            
            
            
              over something which can be built in three months.
            
            
            
              Because after three months the requirements may have changed,
            
            
            
              your clients may have changed their mind, or maybe that
            
            
            
              would be too expensive already. Because if you were to think about cost,
            
            
            
              it would involve the cost of the infrastructure,
            
            
            
              resources, the other overhead cost,
            
            
            
              the cost of paying the employees, and so on.
            
            
            
              So with less time, you'll definitely save a lot.
            
            
            
              So make sure that you take that into account because time
            
            
            
              will always be a multiplier.
            
            
            
              That said, how can we save time? You can save
            
            
            
              time by making use of existing features,
            
            
            
              and being aware of these features is the first step.
            
            
            
              So let's take a step back and see why
            
            
            
              we have so many features here. The reason why we have so many features
            
            
            
              here is that there are a lot of different requirements other
            
            
            
              than training and deploying your model. Of course, when you're starting
            
            
            
              to learn about machine learning, you'll start off with training
            
            
            
              your model, deploying your model, and then evaluating your
            
            
            
              model. But in reality, there are a lot more things you
            
            
            
              need to worry about once you need to work with teams, once you need to
            
            
            
              work with different requirements, once you need to work with legal
            
            
            
              and other concerns.
            
            
            
              So first, let's look at
            
            
            
              the upper left side, SageMaker Processing.
            
            
            
              So SageMaker Processing is there to help you process
            
            
            
              your data with a custom script.
            
            
            
              The advantage of using SageMaker Processing is that if your
            
            
            
              local machine or the machine that you're using
            
            
            
              is not able to process a large amount of data,
            
            
            
              you can make use of SageMaker Processing using the same infrastructure
            
            
            
              abstraction capabilities that you're using with training your model.
            
            
            
              So if you have something like big data,
            
            
            
              then you can use SageMaker Processing and just use a large instance
            
            
            
              to get the task completed within two to three minutes or something.
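A rough sketch of a processing job that runs a custom script on a larger instance, assuming the SageMaker Python SDK (v2) and its scikit-learn processor. The script name, framework version, instance type, and helper names are placeholders that do not come from the talk.

```python
# Paths inside the processing container where SageMaker mounts inputs/outputs.
INPUT_DIR = "/opt/ml/processing/input"
OUTPUT_DIR = "/opt/ml/processing/output"


def processing_job_settings(instance_type="ml.m5.4xlarge", instance_count=1):
    # Hypothetical defaults; pick a larger instance type for larger data.
    return {
        "framework_version": "0.23-1",  # assumption: a supported sklearn image
        "instance_type": instance_type,
        "instance_count": instance_count,
    }


def run_processing_job(role: str, input_s3_uri: str, script: str = "preprocess.py"):
    # Lazy imports: needs the SageMaker Python SDK (v2) and AWS credentials.
    from sagemaker.processing import ProcessingInput, ProcessingOutput
    from sagemaker.sklearn.processing import SKLearnProcessor

    processor = SKLearnProcessor(role=role, **processing_job_settings())
    # The instance is created for this job and removed once it completes.
    processor.run(
        code=script,  # hypothetical custom processing script
        inputs=[ProcessingInput(source=input_s3_uri, destination=INPUT_DIR)],
        outputs=[ProcessingOutput(source=OUTPUT_DIR)],
    )
    return processor
```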
            
            
            
              With SageMaker Experiments, so the one just beside
            
            
            
              SageMaker Processing here at the upper middle
            
            
            
              corner, we can make use of
            
            
            
              that to manage multiple experiments. Of course,
            
            
            
              you will not be running just a single experiment, but with Sagemaker
            
            
            
              experiments you can run a lot of experiments and
            
            
            
              not worry about the details on how to connect the
            
            
            
              different artifacts. It will be much easier for you to audit
            
            
            
              experiments which have been performed in the past. So you can check
            
            
            
              it out, especially when you need to get things
            
            
            
              working in production and at work in general.
            
            
            
              With automatic model tuning on the upper right hand corner, with
            
            
            
              just a couple of lines of code, which we will show later,
            
            
            
              you can see here that we can get the
            
            
            
              best version of a model using
            
            
            
              automatic model tuning. So what happens here is that
            
            
            
              we'll be able to test a lot of different hyperparameter
            
            
            
              configurations and prepare and build different
            
            
            
              models, and then we just compare the models and get the best one.
            
            
            
              With automatic model tuning, all you need is probably two or
            
            
            
              three additional lines of code in addition to what you
            
            
            
              saw earlier. And then you'll see that, oh,
            
            
            
              that's magic. Again, with very minimal code changes,
            
            
            
              you'll be able to have something which automatically
            
            
            
              gets and prepares the best model for you. So we'll discuss
            
            
            
              that later with a couple of examples. With built-in algorithms,
            
            
            
              we can see here that we have about
            
            
            
              17, I think 17, built-in algorithms
            
            
            
              which can be used to solve different machine learning requirements. So some of
            
            
            
              these algorithms can be used to deal with
            
            
            
              numerical data, some can also deal with text data,
            
            
            
              and you can also deal with images and even time series
            
            
            
              analysis stuff.
            
            
            
              So you can already get started with built in algorithms so that
            
            
            
              you won't have to use your custom containers and
            
            
            
              algorithms, especially if you're still getting started. And most
            
            
            
              of the time, these algorithms are not
            
            
            
              just on par with what you probably will build,
            
            
            
              but they're probably already optimized for
            
            
            
              solving most of the use cases. There's also machine
            
            
            
              learning and deep learning framework support. So the
            
            
            
              great thing here is that if you're already using TensorFlow
            
            
            
              or PyTorch or MXNet in your projects,
            
            
            
              then with very minimal adjustments, you can
            
            
            
              already port that and use it with Sagemaker. With Sagemaker,
            
            
            
              Clarify, the sixth one, you can use that
            
            
            
              to detect pre-training
            
            
            
              and post-training bias. It can also be
            
            
            
              used to enable ML explainability.
            
            
            
              And we'll discuss that later in detail, and you'll
            
            
            
              see that it can be used to help you manage
            
            
            
              the other production requirements which you may encounter later
            
            
            
              on when you have to deploy your model, especially the legal and ethical concerns
            
            
            
              surrounding the type of problem that you're trying
            
            
            
              to solve. As for Sagemaker debugger, we'll actually
            
            
            
              discuss this in detail in the next set of slides. But sagemaker
            
            
            
              debugger can be used to debug your experiments
            
            
            
              in near real time in cloud environments.
            
            
            
              So later you'll realize that debugging experiments
            
            
            
              locally and debugging experiments in the cloud are
            
            
            
              quite different because of course when you're using and working with
            
            
            
              different instances and servers during training and
            
            
            
              there's an error somewhere, how do you debug that, especially if you're dealing
            
            
            
              with a distributed setup?
            
            
            
              Next is Sagemaker feature store. Sagemaker feature store
            
            
            
              is used for feature
            
            
            
              store requirements, as the name suggests. So you will have
            
            
            
              the offline feature store, and you will have the online feature store.
            
            
            
              And the offline feature store can be used to
            
            
            
              deal with data which can be used for training, and then
            
            
            
              the online feature store can be used to get data which can be used
            
            
            
              for the prediction parts.
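
To make the offline/online split concrete, here is a toy in-memory sketch. It is not the Feature Store API: the class `TinyFeatureStore` and its methods are hypothetical, and it assumes the common arrangement where the offline side keeps the full history of records for building training sets, while the online side keeps only the latest record per ID for fast lookups at prediction time.

```python
class TinyFeatureStore:
    """Toy illustration of an offline (history) and online (latest) store."""

    def __init__(self):
        self.offline = []  # append-only history, used to build training sets
        self.online = {}   # record_id -> latest record, used at prediction time

    def put_record(self, record_id, event_time, features):
        record = {"id": record_id, "event_time": event_time, **features}
        self.offline.append(record)
        current = self.online.get(record_id)
        # Only replace the online copy if this record is at least as recent.
        if current is None or event_time >= current["event_time"]:
            self.online[record_id] = record

    def training_rows(self):
        """Everything ever written, suitable for training."""
        return list(self.offline)

    def get_record(self, record_id):
        """Latest known features for one ID, suitable for inference."""
        return self.online.get(record_id)
```

Writing once and reading from two views is the point of the design: training and prediction see features computed the same way.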
            
            
            
              Sagemaker Autopilot is there to help you with your AutoML
            
            
            
              requirements. So with very minimal human
            
            
            
              intervention, probably just the initial configuration part,
            
            
            
              you can just pass in your training data and
            
            
            
              then run, and then after a few minutes you
            
            
            
              will have a trained model. So that's pretty cool because you
            
            
            
              can make use of AutoML and Sagemaker has proper
            
            
            
              support for it. Next is Sagemaker Studio.
            
            
            
              So Sagemaker Studio is there to help
            
            
            
              us have an interface and basically
            
            
            
              a studio which has a lot of features and capabilities
            
            
            
              integrated already so that things
            
            
            
              would be pretty smooth when you're dealing with experiments
            
            
            
              and deployments when using Sagemaker. So they're continuously
            
            
            
              upgrading this studio to make it easy for
            
            
            
              you to run your code and then there's an interface
            
            
            
              for it so that it's very practical for you to work on
            
            
            
              real life experiments. Sagemaker Ground Truth
            
            
            
              is there to help you prepare your data. Sagemaker model
            
            
            
              monitor, from the name itself, is there to help you monitor deployed
            
            
            
              models. Next is managed spot training, if
            
            
            
              you're aware of what spot instances are. Those are used to
            
            
            
              further reduce the cost when performing training.
            
            
            
              So with managed spot training, you won't have to worry about the
            
            
            
              nitty gritty details when you're using spot instances,
            
            
            
              because all you need to do is update a couple of parameters
            
            
            
              and then you'll be able to save on costs, especially when you're dealing with
            
            
            
              large instances during training.
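
As a sketch of what "update a couple of parameters" can look like: the SageMaker Python SDK estimator accepts `use_spot_instances`, `max_run` and `max_wait` keyword arguments. The helper functions below are hypothetical and only build and sanity-check those values; they do not call AWS. The savings formula assumes the job reports total training seconds and billable seconds.

```python
def spot_training_kwargs(max_run_seconds: int, extra_wait_seconds: int) -> dict:
    """Build the extra estimator keyword arguments for managed spot training."""
    if max_run_seconds <= 0 or extra_wait_seconds < 0:
        raise ValueError("max_run must be positive and extra wait non-negative")
    return {
        "use_spot_instances": True,
        "max_run": max_run_seconds,
        # Total time including waiting for spot capacity; must be >= max_run.
        "max_wait": max_run_seconds + extra_wait_seconds,
    }


def spot_savings_percent(training_seconds: float, billable_seconds: float) -> float:
    """Share of training time that was not billed, as a percentage."""
    return (1 - billable_seconds / training_seconds) * 100
```

The returned dictionary would be unpacked into an existing estimator call, for example `Estimator(..., **spot_training_kwargs(3600, 1800))`, leaving the rest of the training code unchanged.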
            
            
            
              With Sagemaker pipelines, second to the last, we'll be
            
            
            
              able to create complex machine
            
            
            
              learning workflows with just a couple of lines of code.
            
            
            
              And then finally Sagemaker Data Wrangler is
            
            
            
              used to help you prepare your data using
            
            
            
              an interface. So these are just a few of
            
            
            
              the capabilities and features of Sagemaker. You might
            
            
            
              be overwhelmed right now, but do not worry because we will choose
            
            
            
              about four or five of them, and we will discuss this in
            
            
            
              more detail over the next couple of minutes.
            
            
            
              What's important here is that you should have
            
            
            
              that mindset or way of thinking that
            
            
            
              maybe the problem that you want to solve has
            
            
            
              already been solved by an existing tool or framework.
            
            
            
              And if you were to use Sagemaker,
            
            
            
              probably one of the customers of AWS has already requested
            
            
            
              that, and there's a solution already prepared
            
            
            
              for it. So before trying to build something on your own,
            
            
            
              check if all you need to do is add one to two lines of code
            
            
            
              in order to solve your problem. It's not about creating
            
            
            
              the coolest solution out there, it's about solving your problem
            
            
            
              in the shortest time possible with the smallest
            
            
            
              amount of expense. Because if you will get the same
            
            
            
              output, or even better, why not use something which is already built
            
            
            
              for you? So let's start first with sagemaker debugger.
            
            
            
              So here you will start to see more code, and this will help you
            
            
            
              understand how easy it is to use Sagemaker in general.
            
            
            
              And actually some parts of the code here are
            
            
            
              just snippets which are already used
            
            
            
              in other snippets, as you see in the previous slide.
            
            
            
              So here at the bottom, this is the same estimator initialization
            
            
            
              code. And what's happening at the top here is that
            
            
            
              we're just initializing the debugger objects
            
            
            
              and properties there before passing them to the estimator object.
            
            
            
              So there, all it takes is probably three
            
            
            
              additional lines of code, and sagemaker debugger is already
            
            
            
              enabled. So what's happening here
            
            
            
              is that every two steps we will save some
            
            
            
              sort of snapshot data, and then
            
            
            
              it will save that in Amazon S3,
            
            
            
              and then we'll be able to debug that
            
            
            
              and have more visibility on what's happening inside.
            
            
            
              And we can specify here
            
            
            
              that we need to have a rule that flags when the loss is not
            
            
            
              decreasing, so the value
            
            
            
              there should keep going down. So if that rule
            
            
            
              is violated, then we'll be able to detect that during
            
            
            
              the execution phase of the training step.
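
The idea of saving a snapshot every two steps and checking a loss rule against those snapshots can be sketched in plain Python. This is only an illustration of the concept, not the Debugger implementation: the function name, the `patience` and `min_delta` parameters, and the exact firing condition are assumptions. In the SageMaker Python SDK the real setup goes through classes such as `DebuggerHookConfig` and `Rule` from `sagemaker.debugger`.

```python
def loss_not_decreasing(losses, save_interval=2, patience=3, min_delta=0.0):
    """Return the first inspected step at which the rule fires, or None.

    Only every `save_interval`-th step is inspected, mimicking periodic
    snapshots. The rule fires once `patience` consecutive snapshots fail to
    improve on the best loss seen so far by more than `min_delta`.
    """
    best = float("inf")
    stale = 0
    for step in range(0, len(losses), save_interval):
        loss = losses[step]
        if loss < best - min_delta:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                return step
    return None
```

A healthy run returns `None`; a run whose loss plateaus returns the step at which the plateau was confirmed, which is the kind of signal that lets a stuck training job be stopped early instead of paying for the full run.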
            
            
            
              So you just specify the configuration with sagemaker debugger,
            
            
            
              initialize the estimator object with a debugger configuration
            
            
            
              specified and enabled, and then you just run the experiment
            
            
            
              normally so you won't have to worry about going
            
            
            
              deep into the actual execution of the container inside
            
            
            
              Sagemaker, and debugger will do its magic
            
            
            
              for you. Pretty cool, right? Let's look at our
            
            
            
              automatic model tuning with Sagemaker.
            
            
            
              With model training and tuning, we can see here that all
            
            
            
              we need is a bunch of hyperparameter
            
            
            
              configuration ranges, and we will have
            
            
            
              multiple training instances
            
            
            
              running at the same time. The advantage
            
            
            
              here is that without much
            
            
            
              change in your code, you'll be able to
            
            
            
              improve your existing experiments and
            
            
            
              make it run ten times or 100 times more
            
            
            
              without having to worry about the details.
            
            
            
              So if you were to look at this slide,
            
            
            
              you'll see that the estimator initialization step is
            
            
            
              just the same. The same goes for the set hyperparameters
            
            
            
              function call. So if you look
            
            
            
              at the lower left section during the initialization
            
            
            
              of the hyperparameter ranges section, we specify
            
            
            
              the continuous and integer parameter ranges
            
            
            
              for minimum child weight, max depth and eta,
            
            
            
              and then we initialize the hyperparameter object
            
            
            
              with that configuration, and then we just call the fit function.
            
            
            
              So the cool thing here is that we just added three to four lines
            
            
            
              of code, and then we call the fit function. And then there you go.
            
            
            
              It's going to run for probably 15 to 20 minutes,
            
            
            
              and then after 15 to 20 minutes, depending on your
            
            
            
              configuration, then you'll get the best model based
            
            
            
              on the objective metric target.
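
Conceptually, the tuner samples configurations from the declared ranges, trains one model per configuration, and keeps the one with the best objective metric. The sketch below shows that loop with plain random search. It is a stand-in, not the SageMaker tuner (which offers other search strategies and runs trials in parallel on separate instances). The range names mirror XGBoost-style hyperparameters and `toy_score` is a made-up substitute for a validation metric.

```python
import random

# Hypothetical ranges: name -> (kind, low, high)
RANGES = {
    "eta": ("continuous", 0.0, 1.0),
    "min_child_weight": ("continuous", 1.0, 10.0),
    "max_depth": ("integer", 1, 10),
}


def sample(ranges, rng):
    """Draw one configuration from the declared ranges."""
    config = {}
    for name, (kind, low, high) in ranges.items():
        if kind == "integer":
            config[name] = rng.randint(low, high)  # inclusive on both ends
        else:
            config[name] = rng.uniform(low, high)
    return config


def toy_score(config):
    """Made-up objective that peaks near eta=0.3 and max_depth=6."""
    return 1.0 - abs(config["eta"] - 0.3) - 0.05 * abs(config["max_depth"] - 6)


def tune(train_and_score, ranges, max_jobs=20, seed=0):
    """Run max_jobs trials and return (best_score, best_config), maximizing."""
    rng = random.Random(seed)
    trials = []
    for _ in range(max_jobs):
        config = sample(ranges, rng)
        trials.append((train_and_score(config), config))
    return max(trials, key=lambda trial: trial[0])
```

Calling `tune(toy_score, RANGES)` returns the best of 20 sampled configurations; swapping `toy_score` for a function that actually trains and evaluates a model gives the same structure the talk describes, just without the managed infrastructure.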
            
            
            
              So if the target is validation area under
            
            
            
              the curve, then it will select the model
            
            
            
              with the best value for it. The next one would
            
            
            
              be ML explainability.
            
            
            
              So of course there's a
            
            
            
              way for us to know which features are
            
            
            
              important without having to understand the
            
            
            
              actual algorithm. There's a difference between interpretability
            
            
            
              and explainability, but with explainability it
            
            
            
              will allow us to know which
            
            
            
              features actually contributed the most to an existing
            
            
            
              output. So if
            
            
            
              you look at the screen here, we have feature one and feature
            
            
            
              zero, the first two features contributing the most
            
            
            
              to the actual output. And feature two and
            
            
            
              feature three did not really contribute much to the
            
            
            
              output, meaning that if we have new data,
            
            
            
              there's no point changing the values for feature
            
            
            
              two and feature three because they don't really contribute to the final outcome.
            
            
            
              So if there are predictor columns and then there's a target column,
            
            
            
              we're pretty sure that feature one and feature zero contribute
            
            
            
              the most to the final outcome. So how do we prepare
            
            
            
              something like this? We prepare something like
            
            
            
              this and get this type of output using SHAP
            
            
            
              values. So SHAP values help us
            
            
            
              understand the output and the model better.
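
To show what a per-feature contribution means, here is a brute-force computation of exact Shapley values for a tiny model. This is a sketch for intuition only: it enumerates every feature subset, which is feasible for a handful of features, and it is not how SageMaker Clarify computes its values. The toy model and its weights are invented so that feature 1 and feature 0 dominate, matching the chart described in the talk.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x; absent features take baseline values."""
    n = len(x)

    def value(subset):
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                members = set(subset)
                phi[i] += weight * (value(members | {i}) - value(members))
    return phi


def toy_model(v):
    """Hypothetical model: features 1 and 0 matter, 2 barely, 3 not at all."""
    return 2.0 * v[0] + 3.0 * v[1] + 0.1 * v[2] + 0.0 * v[3]


contributions = shapley_values(toy_model, [1, 1, 1, 1], [0, 0, 0, 0])
```

The contributions always add up to the difference between the prediction for `x` and the prediction for the baseline, which is why they can be read as a breakdown of a single output.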
            
            
            
              So how do you do that? With Sagemaker, we do that
            
            
            
              by just configuring the
            
            
            
              ML explainability job. So you
            
            
            
              initialize the Sagemaker Clarify processor,
            
            
            
              you configure the
            
            
            
              data config and the SHAP config objects, and
            
            
            
              then after that you use the run explainability
            
            
            
              function and wait for probably three to seven minutes
            
            
            
              to get that completed, depending on the size of your data and
            
            
            
              the type of instances that you're using. So after three
            
            
            
              to seven minutes, you'll get something like this,
            
            
            
              and then you'll be surprised. Okay, I didn't have
            
            
            
              to learn much about SHAP values, but by just using a couple of
            
            
            
              lines of code, I got what I needed. And you
            
            
            
              can use that to further improve your analysis of
            
            
            
              your experiments. So next, let's now talk
            
            
            
              about deployments.
            
            
            
              The advantage of using Sagemaker
            
            
            
              would be that it has great integration with
            
            
            
              the other services and features of AWS.
            
            
            
              Of course, you may have your own tech stack
            
            
            
              for it, but you'll be surprised that Sagemaker probably has
            
            
            
              some sort of integration, let's say with Kubernetes,
            
            
            
              or even with Lambda and so on, or if
            
            
            
              you're dealing with a new service,
            
            
            
              let's say App Runner or something. You'll be surprised
            
            
            
              that you can deploy sagemaker models there, and even
            
            
            
              in EC2 instances. But let's start first with a couple of examples
            
            
            
              and patterns which may be applicable to you already.
            
            
            
              The first one would be deploying the model inside a Lambda function,
            
            
            
              so you will save a lot of cost there. But of course there are trade
            
            
            
              offs and you won't be able to use the other sagemaker features with the lambda
            
            
            
              function. But it's really good for simplified
            
            
            
              model deployments. We can also create
            
            
            
              a lambda function that triggers an existing sagemaker endpoint, so that you can
            
            
            
              prepare and process your data first inside the lambda function,
            
            
            
              and then trigger the sagemaker endpoint, and then process the data
            
            
            
              again before returning it back to the user. So you can combine
            
            
            
              lambda and API Gateway to help abstract the
            
            
            
              request and response calls before passing it to the sagemaker
            
            
            
              endpoint. The third one in the list is the API gateway
            
            
            
              mapping templates, where you won't need a lambda
            
            
            
              function at all to trigger a sagemaker endpoint.
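
The Lambda-in-front-of-an-endpoint pattern described above (preprocess, invoke the endpoint, postprocess) can be sketched as follows. The endpoint name, the request shape (`{"features": [...]}`), the CSV content type and the single-number response are all assumptions for illustration; `invoke_endpoint` on the `sagemaker-runtime` client is the real boto3 call. The optional `runtime` argument is a hypothetical convenience so the handler can be exercised without AWS access.

```python
import json


def preprocess(payload: dict) -> str:
    """Turn the incoming JSON into the CSV row the (assumed) model expects."""
    return ",".join(str(value) for value in payload["features"])


def postprocess(raw: bytes) -> dict:
    """Wrap the raw model output in a friendlier response body."""
    return {"prediction": float(raw.decode("utf-8").strip())}


def handler(event, context, runtime=None, endpoint_name="my-endpoint"):
    if runtime is None:
        import boto3  # imported lazily so the sketch can be tested offline

        runtime = boto3.client("sagemaker-runtime")
    body = preprocess(json.loads(event["body"]))
    response = runtime.invoke_endpoint(
        EndpointName=endpoint_name,  # hypothetical endpoint name
        ContentType="text/csv",
        Body=body,
    )
    result = postprocess(response["Body"].read())
    return {"statusCode": 200, "body": json.dumps(result)}
```

Keeping the pre- and post-processing in small pure functions makes them easy to test on their own, and the same handler works behind API Gateway since it reads the request from `event["body"]`.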
            
            
            
              The fourth one involves deploying the model
            
            
            
              in Fargate, and you'll be able to use containers
            
            
            
              there in Fargate. Here's the cool thing.
            
            
            
              If you were to make the most out of Sagemaker,
            
            
            
              there are a lot of features and capabilities there
            
            
            
              which just require probably three to four lines of code,
            
            
            
              and you'll be able to get something like this. So the first one would be
            
            
            
              Sagemaker multi-model endpoint. Of course, it would be weird
            
            
            
              to have a setup where you have one
            
            
            
              endpoint for each model. You'll realize
            
            
            
              that you can actually optimize this and have, let's say, three models
            
            
            
              deployed in a single endpoint. And not only will it
            
            
            
              help you reduce cost, it will also enable you to perform
            
            
            
              other cool things. Let's say A/B testing where you're
            
            
            
              deploying let's say two models at the same time, and then
            
            
            
              you're trying to check which model is performing better. And you can
            
            
            
              also deploy a model inside a Lambda function with Lambda's
            
            
            
              container support. So there are a lot of variations here.
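
For the A/B testing idea, the core mechanism is weighted routing of requests between two models behind one endpoint. The simulation below is only a local sketch with hypothetical variant names and weights. On a real endpoint the service does the routing itself (for example through weighted production variants), so none of this code would be needed there.

```python
import random
from collections import Counter


def pick_variant(weights: dict, rng: random.Random) -> str:
    """Choose one variant name with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[name] for name in names], k=1)[0]


def simulate(weights: dict, requests: int = 10_000, seed: int = 0) -> Counter:
    """Count how many simulated requests each variant would receive."""
    rng = random.Random(seed)
    return Counter(pick_variant(weights, rng) for _ in range(requests))
```

For example, `simulate({"model-a": 0.9, "model-b": 0.1})` sends roughly nine out of ten requests to `model-a`, which is the usual way to expose a new model to a small share of traffic before comparing its metrics against the current one.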
            
            
            
              And being aware of these variations is the first step.
            
            
            
              And having the developer skills to customize the solution
            
            
            
              is the second step, especially once you need to customize
            
            
            
              things a bit based on your use case. Now let's
            
            
            
              talk about workflows. So automated workflows
            
            
            
              are very important because you
            
            
            
              don't want to run your experiments manually every single
            
            
            
              time. Of course, at the start you
            
            
            
              will be running these steps manually because
            
            
            
              you'll be experimenting to see if it will work or not. But once you need to,
            
            
            
              let's say, retrain your model, it would be really tedious
            
            
            
              to do that every month or every two weeks and
            
            
            
              run the experiment again and again. What if there's some sort of
            
            
            
              automated script or automated pipeline which
            
            
            
              helps you perform these steps without you
            
            
            
              having to do it manually? So for example,
            
            
            
              after one month there's data uploaded in an S3
            
            
            
              bucket or storage. You want your
            
            
            
              automated workflow to run. And if the new model
            
            
            
              is, let's say, better than your existing model,
            
            
            
              then you replace it. And yeah,
            
            
            
              you can do it automatically with the different options available with
            
            
            
              Sagemaker. So this is the first one. So this is a very
            
            
            
              simplified example. Of course, we won't discuss the more complex examples
            
            
            
              here, but these are the building blocks to help you prepare
            
            
            
              those more complex examples. So here
            
            
            
              this is used to help you prepare a linear workflow
            
            
            
              where you have the training step,
            
            
            
              the build step, and then the deploy step. With just a couple of lines of
            
            
            
              code and using the sagemaker SDK
            
            
            
              and the Step Functions Data Science SDK,
            
            
            
              we'll be able to make use of two services, the first one
            
            
            
              being sagemaker and then the second one being Step Functions via the data science
            
            
            
              SDK. And with very
            
            
            
              minimal changes in your existing sagemaker code,
            
            
            
              you'll be able to create a pipeline like this
            
            
            
              one. And you can make use of the
            
            
            
              features of step functions to help you debug
            
            
            
              and keep track of the different steps being executed
            
            
            
              during the execution phase.
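
Under the hood, a Step Functions workflow is a state machine definition written in Amazon States Language. The sketch below builds a linear train, build, deploy definition as a plain dictionary to show the shape of such a workflow. It is not the Data Science SDK, the helper name is hypothetical, and the `Parameters` each task would need (job names, image URIs, roles and so on) are deliberately left out, so the result is illustrative rather than deployable.

```python
import json


def linear_state_machine(steps):
    """Chain (name, resource) pairs into a linear state machine definition."""
    states = {}
    for index, (name, resource) in enumerate(steps):
        state = {"Type": "Task", "Resource": resource}
        if index + 1 < len(steps):
            state["Next"] = steps[index + 1][0]  # hand over to the following step
        else:
            state["End"] = True  # last step terminates the workflow
        states[name] = state
    return {"StartAt": steps[0][0], "States": states}


definition = linear_state_machine([
    ("Train", "arn:aws:states:::sagemaker:createTrainingJob.sync"),
    ("Build", "arn:aws:states:::sagemaker:createModel"),
    ("Deploy", "arn:aws:states:::sagemaker:createEndpoint"),
])
print(json.dumps(definition, indent=2))
```

Because each step is its own state, the Step Functions console can show which step is running or failed, which is the debugging and tracking benefit mentioned above.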
            
            
            
              The second option would be to use sagemaker
            
            
            
              pipelines. So with Sagemaker pipelines,
            
            
            
              you can do the same set of things as what you
            
            
            
              can do with the sagemaker and data
            
            
            
              science SDK combo. But here you
            
            
            
              can make use of the dedicated sagemaker pipelines to help you prepare
            
            
            
              your model. So this one came in much later,
            
            
            
              after more people had requested it. And you
            
            
            
              can see here that, wow, you can have an
            
            
            
              interface, a UI chart
            
            
            
              or graph like this, and you will know what's happening
            
            
            
              in each step. And let's say that you want to know
            
            
            
              the details after each step has executed,
            
            
            
              let's say the metrics during the train step. Then you
            
            
            
              can just click on the train step box
            
            
            
              and then you'll see the metrics and the other details
            
            
            
              there. So this is the source code for
            
            
            
              it. You'll see here that with just a couple of lines of code
            
            
            
              added to your existing initial Sagemaker
            
            
            
              SDK code, you'll be able to create the different steps.
            
            
            
              So most likely probably two lines of code, two to three lines
            
            
            
              of code for each block. So let's say that you have
            
            
            
              the processing step and then you have the train step. Then probably
            
            
            
              you'll need four additional
            
            
            
              lines of code because of course, in addition to the original code
            
            
            
              that you have, where you have configured the different steps, let's say
            
            
            
              the estimator initialization and the SKLearnProcessor
            
            
            
              initialization step, you will make use of the
            
            
            
              SageMaker Pipelines counterpart objects,
            
            
            
              and you will link those and then just chain
            
            
            
              those, as you can see here, create a chain and
            
            
            
              prepare the pipeline by combining all the other steps.
            
            
            
              And in order to run this, all you need to do is run the execute
            
            
            
              function. So there, that's pretty cool.
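The idea of wrapping existing steps in pipeline objects, linking them, and then running everything with one call can be sketched in plain Python. This is an illustration of the pattern only. The `Step` and `Pipeline` classes below are hypothetical stand-ins, not the SageMaker Pipelines API, and the "training" is a toy least-squares fit so that the example runs without an AWS account.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    """Hypothetical pipeline step: a name, a function, and its dependencies."""
    name: str
    run: Callable[[Dict[str, dict]], dict]
    depends_on: List[str] = field(default_factory=list)


class Pipeline:
    """Hypothetical pipeline that runs steps once their dependencies are done."""

    def __init__(self, name: str, steps: List[Step]):
        self.name = name
        self.steps = steps
        self.results: Dict[str, dict] = {}

    def execute(self) -> Dict[str, dict]:
        pending = list(self.steps)
        while pending:
            ready = [s for s in pending
                     if all(d in self.results for d in s.depends_on)]
            if not ready:
                raise ValueError("unresolvable or circular step dependencies")
            for step in ready:
                # Each step sees the outputs of the steps that ran before it.
                self.results[step.name] = step.run(self.results)
                pending.remove(step)
        return self.results


def process(_results):
    # Stand-in for a processing job: produce a tiny (x, y) dataset.
    return {"rows": [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]}


def train(results):
    # Stand-in for a training job: fit y = slope * x and report a metric.
    rows = results["Process"]["rows"]
    slope = sum(x * y for x, y in rows) / sum(x * x for x, _ in rows)
    mse = sum((y - slope * x) ** 2 for x, y in rows) / len(rows)
    return {"slope": slope, "mse": mse}


pipeline = Pipeline(
    name="demo-pipeline",
    steps=[
        Step("Train", train, depends_on=["Process"]),
        Step("Process", process),
    ],
)
results = pipeline.execute()
print(results["Train"])
```

Looking up `results["Train"]` here plays the role of clicking the train step box in the graph: each step keeps its own outputs and metrics, so they can be inspected after the run.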
            
            
            
              And you'll see that the more you
            
            
            
              use a certain platform like
            
            
            
              Sagemaker, you'll realize that, hey, there's a pattern.
            
            
            
              If I need something like this, I won't have to worry about
            
            
            
              changing the other parts of the code because it's probably just
            
            
            
              a configuration code change away. So again,
            
            
            
              I'm going to share this slide again so that you can have
            
            
            
              a quick look at the different features and capabilities of Sagemaker.
            
            
            
              But what I will tell you is that Sagemaker continues to
            
            
            
              evolve, even today,
            
            
            
              probably in one month there's a new release or
            
            
            
              new capability or new upgrade to existing features,
            
            
            
              and it's better for us to stay tuned there by,
            
            
            
              let's say, checking the AWS blog. And yeah,
            
            
            
              so again, it's not just powerful, it's also
            
            
            
              evolving. And the great thing here is that the more
            
            
            
              features and capabilities that you're aware of,
            
            
            
              the more you can make use of Sagemaker and
            
            
            
              further reduce cost, because the importance of
            
            
            
              a good professional would
            
            
            
              be his or her own ability
            
            
            
              to optimize and solve things using the
            
            
            
              knowledge and expertise using specific tools.
            
            
            
              So again, thank you very much and hope you learned a lot
            
            
            
              of things during my talk, and feel free to check out my book
            
            
            
              Machine Learning with Amazon SageMaker Cookbook because that will help
            
            
            
              you understand Sagemaker better with the 80
            
            
            
              recipes there, which are super simplified
            
            
            
              to help you understand something, even if it's your first time using sagemaker.
            
            
            
              So there. Thank you again and have a great day ahead.