Conf42 Cloud Native 2024 - Online

Develop and Productionize the AI and ML algorithm in Cloud Environment

Abstract

Discover the secrets of creating and deploying AI/ML algorithms in the cloud! Join me in examining the complexities of producing cutting-edge models, leveraging my 15 years of experience. Let’s look at the future of AI in the cloud, optimizing development workflows for maximum efficiency and impact.

Summary

  • Deepak is an associate director for data science and machine learning at Novartis. In this talk he covers developing a machine learning model and productionizing the ML algorithm in a cloud environment.
  • Before starting any machine learning algorithm development, we define the problem statement. The problem solved here is a natural language processing task. For model training, the data set is divided into training, testing and validation sets. Then how do we deploy and operationalize, or industrialize, the model in a cloud environment?
  • When it comes to natural language processing, the machine has to understand the language. In the same way that humans interpret language, we build a machine learning or AI system to perform a specific job. The model then has to decide what response to produce in a human-readable format.
  • Hugging Face is a framework or library for solving most NLP problems. It has around 4000 models which can be deployed in the cloud. Ideally we use transformer-based architectures to develop our models. How do we fine-tune the model?
  • MLflow is a platform, or API library, which can be plugged into your model development process to track models and experiments. This really helps when running multiple iterations of experiments and keeping track of the models.
  • MLflow Tracking is used to track all the experiment results. Everything gets stored in a database, ideally in a cloud environment. MLflow comes with a default UI where all the model experiments can be visualized.
  • The idea is to define a framework, train the model and productionize it, that is, deploy it in the cloud. Once training has been performed with the training data, we can use the evaluation or validation data set to evaluate the model.
  • How do we take the model to production? That is another interesting problem, which can be solved by using PyTorch serving (TorchServe). Its architecture lets multiple models be served from a single serving instance. For more questions please feel free to reach out to me after the event.
  • Once a model has been built, it is stored in the model store. From there the model can be picked up via the inference API. These are the overall steps involved in developing a machine learning algorithm and deploying the model in a cloud environment.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello all. Thank you for joining the presentation. I'm Deepak. I work as an associate director for data science and machine learning here at Novartis. I'm also responsible for generative AI product deliverables, for building novel machine learning algorithms and for deploying them into production. Today I'm going to talk about developing a machine learning model and productionizing the ML algorithm in a cloud environment. Until recently, data scientists spent most of their time creating or developing machine learning models. It could be linear regression, logistic regression, naive Bayes, random forests or decision trees, take any algorithm. Now the era has moved from traditional or classical machine learning algorithms to large language models. So deploying or productionizing large language models in the cloud is the biggest challenge we have. And not only productionizing: how we scale to a high inference load is another important aspect to consider. All right, let's move to the next slide.
Before getting into the development of algorithms, let's first understand the development lifecycle of any machine learning algorithm. To start, we have to clearly define the problem statement we are trying to solve. It could be image classification, text classification, language translation, question answering, predicting the next sentence, document summarization or text summarization; there are many possible tasks. So before starting any machine learning algorithm development, we define the problem statement. Once we have defined the problem statement, we start collecting data, which is called data collection or data acquisition and gathering.
Typically, when we work in an organization in a specific domain such as healthcare, infrastructure, or financial and investment banking, there are many scenarios where we could get some real-time data for model training. When you are doing research, on the other hand, you go for an open source data set. Either way, we have to collect the data set and understand the data. As part of understanding the data, we may have to apply some data preprocessing techniques, which we'll see later. After that we have to decide which performance metrics we are going to use to measure the objective we have defined. Let's take a text classification problem here. The classification result can be measured with the performance metrics recall, precision and accuracy. These are all derived from the confusion matrix. So we have to define all the performance metrics before performing any model training activity.
Next, we have to define the evaluation procedure. In the model training process, we divide the data set into training, testing and validation sets. So we have a split of around 80/20, or 60/20/20, or 70/15/15, something of this kind; the numbers are percentages. We split the data into train, test and validation sets, and we check whether the model performs well on the validation and test data. Again, the metrics we use are based on the confusion matrix.
Next, data preprocessing and cleaning. Okay, we have the data. In the case of text, what steps do we follow? In natural language processing, we remove non-ASCII and special characters, then we remove stop words, followed by stemming and lemmatization. If required, we may go for part-of-speech tagging or some kind of embeddings, which may be needed before building a model.
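The clean-up and splitting steps described above can be sketched in plain Python. This is a minimal illustration, not code from the talk: the stop-word list, the function names `preprocess` and `split_dataset`, and the 70/15/15 default are assumptions, and stemming and lemmatization are left out because they normally come from an NLP library.

```python
import random
import re

# Tiny illustrative stop-word list (an assumption); real projects use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "this", "that", "of", "to", "and"}


def preprocess(text):
    """Lowercase, drop non-ASCII and special characters, remove stop words."""
    # Encoding to ASCII with errors="ignore" silently drops non-ASCII characters.
    text = text.encode("ascii", errors="ignore").decode("ascii")
    # Replace everything except letters, digits and whitespace with a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return [token for token in text.split() if token not in STOP_WORDS]


def split_dataset(rows, train=0.70, validation=0.15, seed=42):
    """Shuffle rows and split them into train / validation / test lists.

    Whatever remains after the train and validation shares is the test set.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed keeps the split reproducible
    n_train = round(len(rows) * train)
    n_val = round(len(rows) * validation)
    return (
        rows[:n_train],
        rows[n_train:n_train + n_val],
        rows[n_train + n_val:],
    )
```

Calling `split_dataset(examples)` on a list of labeled examples returns three lists; changing `train` and `validation` gives the other splits mentioned in the talk.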
Now we come to the construction of a baseline model. When we are using models like random forest, Bayesian networks, generative adversarial networks, AdaBoost or XGBoost, we don't need to think about a baseline model, because they are already base models: we use a data set to train them and then do the prediction or classification with that algorithm. But in the approach we are going to talk about, the baseline model is a bit different. Once I walk you through the slides, you will understand what I mean by a typical base model. Then we have to fine-tune the model. Fine-tuning comes with hyperparameter tuning and similar techniques, which I'll talk about later, but ideally we take a pretrained model and fine-tune it to perform a specific task. Then how do we deploy and operationalize, or industrialize, the model in a cloud environment? All right, so we have discussed the lifecycle of an ML algorithm. Let's move to the next slide.
Before getting into the model training activity, I want to clearly define the problem. What we are going to solve here is a natural language processing task. As I said, natural language processing could be language translation, entity recognition, spam detection, part-of-speech tagging, text generation, document summarization or question answering. There are many natural language processing tasks, but for this use case, for this demo, I'm going to walk you through text classification. Right, let's move on.
Now let's come to natural language processing as a concept. A human understands English, or any other language they have known from birth. But when it comes to natural language processing, the machine has to understand the language, right?
So when I say machine: in the same way that humans interpret language and respond, we are building a machine learning or AI platform, an AI machine, to perform a specific job. That is the natural language understanding that has been given to the machine. Humans gain understanding by reading the text. Similarly, we build a model to perform like a human and gain a natural language understanding, and based on that it determines the answer. The machine understands: okay, this is the natural language understanding I got. Now what is the response I have to produce in a human-readable format? The output can be natural language generation, which could be text abstraction or text summarization, or we can do a natural language classification job. That is what I'm going to walk you through. I have put a BERT model here, which you can see in the center of the picture, and I'll talk about it in detail over the next couple of slides. Ideally we pass an input and ask the model to classify it. Here the result could be spam or ham, and the model classifies accordingly. Let's move on to the next slide.
Okay, Hugging Face. You will have heard of it; it is getting very popular now. Hugging Face is a framework or library for solving most NLP problems. They have built around 40,000 models as of today, which are available as pretrained models, and some instruct-based or fine-tuned models are also available. We are going to use the Hugging Face platform to perform our model training. As I said, Hugging Face is the most popular framework in use right now. Sorry, it has around 4000 models which can be deployed in the cloud, based on PyTorch or TensorFlow. Even the Keras library is supported in Hugging Face. Ideally we use transformer-based architectures to develop our models.
Now, when we talk about Hugging Face, as I said, there are 4000 pretrained models, and they have separate models for each task. For text classification they have BERT, RoBERTa, DistilBERT and XLM-RoBERTa. Similarly, for language translation they have MarianMT, BART and T5. For [unclear] and chatbots they have GPT and GPT-2, and now we have GPT-3 and GPT-4 as well. When it comes to named entity recognition, again we can use the BERT model. I'm going to talk more about the BERT model. BERT stands for Bidirectional Encoder Representations from Transformers. It is based on the transformer architecture from "Attention Is All You Need". That is how BERT was built. When it came out in 2018 or 2019, it shook the industry and took the whole of machine learning development to the next level.
Okay, so we are going to use the BERT model and fine-tune it. BERT comes with its own strengths: it is based on masked language modeling and next sentence prediction. If you want to know more about the BERT model, I have a separate video; please go and have a look at it. Now, BERT can perform multiple tasks as a general model, but you can fine-tune it on a downstream job to make it specific to a domain or to a task that has to be performed. So yes, BERT can perform text classification, text generation, next sentence prediction or question answering; it can also act like a chatbot. But how do we fine-tune the model? We have a pretrained model, we fine-tune it on our data set, and then we deploy the fine-tuned model to production in a cloud environment. All right, as I said, we are going to take the text classification example.
Our objective is to understand the sentiment. Take the sentence "This is an amazing model": we are going to say whether it is positive or negative. That's what the classification job does. I'm taking a binary classification here, with the classes positive and negative. So this is the example I'm going to use. Now I'm going to walk you through how we can perform model training. But before getting into model training, I want to tell you about MLflow.
What is MLflow? MLflow is a platform, or an API library, which can be plugged into your model development process to handle all the model tracking and model experiments. What I mean by that is that we build many models and many iterations of models, because we have to fine-tune the model and change its parameters. Every time we change the parameters, the model produces different outputs. What could an output be here? It could be precision, recall, accuracy, F1 score or F2 score. There are many elements we consider as part of the model development activity. In the text classification case, where we have positive and negative, it depends how much I care about the positive class. If I never want to miss a positive, I cannot afford any false negatives, meaning cases where the algorithm says negative but it is actually positive. If I should not miss any false negatives, I'll focus more on recall. Similarly, when the algorithm says it is a positive but it is actually a negative, that is a false positive, and if those are costly I'll focus more on precision. Either way, the model misses something crucial. To handle these kinds of scenarios, we need a tracking platform, which is MLflow; it is used to record and track all the experiments along with their results.
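The relationship between false negatives, false positives and the metrics named above can be made concrete with a short sketch. It is a plain-Python illustration under stated assumptions (binary labels with 1 as the positive class, hypothetical function names), not code shown in the talk. The F-beta formula covers both the F1 score (`beta=1`) and the F2 score (`beta=2`), where F2 weights recall more heavily.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive (a missed positive)
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_fbeta(tp, fp, fn, beta=1.0):
    """Return (precision, recall, F-beta); beta > 1 favours recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0
    b2 = beta ** 2
    fbeta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, fbeta
```

When missing a positive is the costly mistake, recall and F2 are the numbers to watch across experiments; when false alarms are costly, precision matters more.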
I can also show you a quick sample of how the code looks with MLflow and without MLflow. Before that, this is how the model experiments look. When I talk about model experiments, let's say I'm going to train an algorithm, and I may train it n times. I want to know with which seed and which parameters my model really performed well. For that, all the historical experiments I performed are tracked in MLflow. For each one you can see the timestamp and the model I ran, and if you go deeper, the features I configured and the accuracy, precision and recall values, whatever metrics I have from the confusion matrix. This really helps when running multiple iterations of experiments and keeping track of the models.
All right, now coming to the code. Typically, to train a model, we load the input data and extract some features; I'm using n-grams to extract the features. Then I train the model and compute the accuracy. Now, which version of my code did this result come from? No idea. To answer that, we need MLflow Tracking, which is used to track all the experiment results. Now let's see how the code looks with MLflow. In a Pythonic way, we import the mlflow package and mlflow.tensorflow, then we say mlflow.start_run() as run. Then we start to log the metrics, and we keep training our model while fine-tuning the parameters. In this way everything gets stored in a database, ideally in a cloud environment. We configure it with an S3 bucket from AWS, Amazon Web Services. Once the setup is done, we add the implementation to start the MLflow run, iterate on the model multiple times and keep the experiment results stored.
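The slide code is not part of the transcript, so the following is only a sketch of the pattern described above: extract n-gram features, train, compute accuracy, and record the parameter and metric inside an MLflow run. The toy "model" in `train_and_evaluate`, the function names and the parameter name `ngram_size` are assumptions made for illustration; `mlflow.start_run`, `mlflow.log_param` and `mlflow.log_metric` are the MLflow tracking calls.

```python
from collections import Counter


def ngrams(text, n=2):
    """Return the word n-grams of a lowercased text."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def train_and_evaluate(train, test, n):
    """Toy stand-in for a real model (an assumption, not the talk's model).

    It counts which n-grams appear in positive (1) vs negative (0) training
    texts, then labels each test text by the sign of the summed counts.
    """
    pos, neg = Counter(), Counter()
    for text, label in train:
        (pos if label == 1 else neg).update(ngrams(text, n))
    correct = 0
    for text, label in test:
        score = sum(pos[g] - neg[g] for g in ngrams(text, n))
        correct += int((1 if score > 0 else 0) == label)
    return correct / len(test)


def tracked_run(train, test, n):
    """Same experiment, but with the parameter and result recorded in MLflow."""
    import mlflow  # imported here so the helpers above work without MLflow installed

    with mlflow.start_run() as run:
        mlflow.log_param("ngram_size", n)
        accuracy = train_and_evaluate(train, test, n)
        mlflow.log_metric("accuracy", accuracy)
        return run.info.run_id, accuracy
```

Calling `tracked_run` once per configuration (for example `n=1`, `n=2`, `n=3`) leaves one run per experiment in the tracking store, which is what makes it possible to answer the question of which parameters produced a given result.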
MLflow comes with a default UI where all the model experiments can be visualized, which I showed you on the earlier slide. Now coming to the model training. The reason I explained experiments and tracking first is that when you start building the model training framework, all the experiments need to be tracked somehow. I've taken a small example of code to show how we perform the model training activity. We are going to use a pretrained BERT model, which you can see here. I'm using AutoModelForSequenceClassification, the reason being that instead of the BERT model I can plug in RoBERTa, XLM-RoBERTa, DistilBERT, BioBERT, GPT-2 or GPT-3; any of the pretrained models can go here. So when I build the framework, I call the transformers library, use AutoTokenizer and AutoModelForSequenceClassification, and load the pretrained model. I'll show more of the code in the next lines. But before that: we are using a GPU machine to run all this model training, because this is a large language model. Fine-tuning also requires a GPU machine with the CUDA library. With model.to(device) and torch.device, using CUDA, we specify the GPU. Let's say I'm using eight GPUs: the processing gets split across the GPUs and the model training starts. The reason we use the auto model classes is that this is a framework we can build, and by passing the model name on the command line we can train further based on that. Now coming to the model trainer, which as you can see is an abstraction for performing the model training, provided by the transformers library. I can start the training and the model gets trained. Then I can keep changing my training hyperparameters: learning rate, number of epochs and batch size.
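A rough sketch of such a framework, assuming the Hugging Face `transformers` and `torch` packages, might look like the following. The command-line flags, the default checkpoint `bert-base-uncased`, the output directory and the function names are illustrative assumptions rather than the speaker's code, and the datasets passed to `fine_tune` are assumed to be already tokenized with labels attached.

```python
import argparse


def build_parser():
    """Command-line options so the same script can fine-tune different checkpoints."""
    parser = argparse.ArgumentParser(description="Fine-tune a sequence classifier")
    parser.add_argument("--model-name", default="bert-base-uncased")
    parser.add_argument("--learning-rate", type=float, default=2e-5)
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--batch-size", type=int, default=16)
    return parser


def fine_tune(args, train_dataset, eval_dataset):
    """Load a pretrained checkpoint and fine-tune it for binary classification."""
    # Heavy dependencies are imported here so the parser works without them.
    import torch
    from transformers import (
        AutoModelForSequenceClassification,
        AutoTokenizer,
        Trainer,
        TrainingArguments,
    )

    # Use a GPU through CUDA when one is available, otherwise fall back to CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    tokenizer = AutoTokenizer.from_pretrained(args.model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        args.model_name, num_labels=2  # positive / negative
    )
    model.to(device)

    training_args = TrainingArguments(
        output_dir="finetuned-model",  # hypothetical output location
        learning_rate=args.learning_rate,
        num_train_epochs=args.epochs,
        per_device_train_batch_size=args.batch_size,
    )
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
    )
    trainer.train()
    return trainer, tokenizer
```

Because the checkpoint name is just a string, running the script with a different `--model-name` value swaps the base model without touching the training code, which is the point of using the auto classes.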
There are additional parameters we can use, but mainly we use the learning rate and the number of epochs for fine-tuning the model. Once we are ready to train the model, we use a data loader, then we use model fit to train the model along with hyperparameter tuning. The fine-tuning job is nothing but taking a pretrained model and fine-tuning it for a specific task. With BERT we are going to perform text classification: give it an input data set and keep training the model until the accuracy, or recall and precision, reaches a certain benchmark, 85% or 95%, however much you require based on the problem statement you defined.
Now we are done with the model training, so we have to evaluate the model. As I said at the beginning, we define the model evaluation metrics and split the data into train, test and validation sets. Once the training has been performed with the training data, we can use the evaluation or validation data set to evaluate the model. Further, we can use prediction logic, which could be based on logits or a softmax classifier, with a neural network behind it. I don't want to go deeper into that. Our idea is to define a framework, train the model and productionize it, that is, deploy it in the cloud. That's where our focus is. If you have any questions, please feel free to reach out to me after the presentation or after this live event, and we can take the conversation further.
So we have talked about model training and model evaluation. We can use the model predictions, and based on them we can compute the confusion matrix scores, which can be used to decide on taking the model to production. Now how do we take the model to production? That is another interesting problem, which can be solved by using PyTorch serving, right? So what do we mean by PyTorch serving of ML models?
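The prediction logic mentioned above, going from logits through softmax to a class label and then checking accuracy against a benchmark, can be sketched without any framework. The label names, function names and the 0.85 default threshold are assumptions chosen to match the binary sentiment example and the 85% figure in the talk.

```python
import math


def softmax(logits):
    """Turn raw model scores (logits) into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, labels=("negative", "positive")):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def meets_benchmark(y_true, y_pred, threshold=0.85):
    """True when validation accuracy reaches the agreed benchmark."""
    return accuracy(y_true, y_pred) >= threshold
```

In a real run the logits come from the fine-tuned model's output on the validation set; the check in `meets_benchmark` is the kind of gate that decides whether to keep training or move the model on.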
We have built a model in PyTorch, and the question is how we serve that model in production with low latency and scalability, right? As you can see, once you have trained and evaluated the model, and you feel it is good enough to take to higher environments, you have to convert the model into a MAR file. This is TorchServe, which we have been using. We have a model store, we convert the model into a MAR file, and the model is deployed into the PyTorch serving inference server. There is a procedure we have to follow. We have to build the Docker image once we have a model, then deploy it to ECS, which we will cover on the next slide. Overall, TorchServe helps us deploy the MAR files inside a model store. It then exposes an inference API, so that by invoking the API we can get the model prediction results. Internally it has an architecture where multiple models can be served from a single PyTorch serving instance. This can be used for API invocations to call any of the models deployed inside the serving instance. Again, for more questions please feel free to reach out to me after the event.
Now that we have PyTorch serving, this is the most interesting piece: we have trained the model and now we productionize it in the cloud. Once the MAR file is generated, we push it to an S3 bucket. All the models can be stored in S3 because they are huge files and S3 is blob storage, or file storage. So the S3 bucket from Amazon is used to store the models. Then we can use ECR, the Elastic Container Registry, to push the image, or we push the image to an ECS or EC2 instance. As you can see, we build every model as a Docker image.
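The packaging and serving flow described above can be sketched as a few small Python helpers. This is an illustration under assumptions, not the speaker's deployment scripts: the model name, file paths, bucket name and the local host address are hypothetical, and the response format of the prediction call depends on the handler packaged into the MAR file. The commands use the `torch-model-archiver` and `torchserve` command-line tools, and the upload uses the `boto3` S3 client.

```python
import urllib.request


def archive_command(model_name, serialized_file, handler,
                    export_path="model_store", version="1.0"):
    """Arguments for torch-model-archiver, which packages a model as a .mar file."""
    return [
        "torch-model-archiver",
        "--model-name", model_name,
        "--version", version,
        "--serialized-file", serialized_file,
        "--handler", handler,
        "--export-path", export_path,
    ]


def serve_command(model_name, model_store="model_store"):
    """Arguments for starting TorchServe with one model from the model store."""
    return [
        "torchserve", "--start",
        "--model-store", model_store,
        "--models", f"{model_name}={model_name}.mar",
    ]


def upload_to_s3(mar_path, bucket, key):
    """Push the generated .mar file to an S3 bucket (bucket and key are assumptions)."""
    import boto3  # imported here so the other helpers work without boto3

    boto3.client("s3").upload_file(mar_path, bucket, key)


def prediction_url(model_name, host="http://localhost:8080"):
    """URL of the inference API endpoint for a given model."""
    return f"{host}/predictions/{model_name}"


def predict(model_name, text, host="http://localhost:8080"):
    """Send raw text to the inference API and return the response body."""
    request = urllib.request.Request(
        prediction_url(model_name, host),
        data=text.encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

The command lists can be passed to `subprocess.run(cmd, check=True)`, for example inside the script that builds the Docker image. Because every model gets its own `/predictions/<name>` path, several models in the same model store can be called through one serving instance.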
Once we have a Docker image, we have an Elastic Load Balancing server, and EBS storage is used in the backend to connect to ECS, and the model can be stored there in the model store. From there, via the inference API, we can pick up the model by writing a Python function or Python code that is deployed as a Docker image or Docker container. It can then pull the model and perform the inference logic, that is, do the prediction. So now you can see how the whole model gets developed, from the time we start the model development to productionizing the algorithm.
Now let's talk more about the AWS cloud environment. We already have SageMaker, but let's not use SageMaker or a SageMaker endpoint. We are literally saving cost by using ECS, ECR and an S3 bucket and performing the model training on a GPU machine. Once the model has been trained, we can write some scripts to push the trained model file into the S3 bucket. Once that is done, we can use a Jenkins CI/CD pipeline to push the Docker image to an ECS container, which underneath uses Fargate or EC2. In my case I prefer EC2, right? As I said, once a model has been built, all the models get stored in the model store. Once the models are in the model store, there is a management API and an inference API, and by using the TorchServe command we can provide the inference API for applications to consume. I think these are the overall steps involved in developing the machine learning algorithm and deploying the model in a cloud environment, right? Any further questions? As I said, you can always reach out to me after the event. All right. Thank you for watching my video. Have a good day.

Deepak Karunanidhi

Associate Director - Data Science | Machine Learning @ Novartis

Deepak Karunanidhi's LinkedIn account


