Conf42 Cloud Native 2024 - Online

Mastering Generative AI: Harnessing AWS GenAI for Your Solutions

Abstract

Unlock the power of Generative AI with AWS GenAI! Dive into Amazon Bedrock, Trainium, Sagemaker, and more. Discover how to build, scale, and optimize your AI applications efficiently, cost-effectively, and at scale. Don’t just keep up with the AI revolution, lead it!

Summary

  • Samuel Baruffi will present a session called Mastering Generative AI, harnessing AWS Genai for your solutions. He will talk about the solutions within the AWS platform that allows you to run, create and operate vector databases. And he always like to end the session with a demo.
  • In order for this generative AI explosion and evolution, what is actually powering them behind the scenes is what we call foundational model. The good thing about those models is they are generic and general on their own. And we are going to talk about how AWS will help you choose different models for your use case.
  • Another thing that a lot of companies are doing now is customizing foundational models for their own data. There are different strategies. You can use retrieval, argument generation, or you can use agents to actually retrieve the data. On top of having foundational models, you can also customize those to even be more expert on the data that you have control of.
  • Companies are building use cases to enhance customer experience, boost employee productivity, or optimize business processes. In order to run those models, those models require a lot of computational, specifically parallel computational. AWS is also in the front of innovating our own chipsets.
  • Amazon Sagemaker is the AWS machine learning AI platform. Once you run those models, you can actually deploy those machine learning models at scale. Sagemaker Jumpstart also has the capability for fine tuning. It's kind of pay as you go, but for the whole instance that it requires.
  • Most customers are nodding to the exercise of training large language models or fine tuning. They really have use cases that they want to boost customer experience or increase employee productivity. How can they keep the data that they're going to be running through those models? Secure and private, right.
  • AWS have introduced last year a service called Amazon Bedrock. It is the easiest way to build and scale generative AI applications with foundational models. There are different pricing modes on bedrock, but you start with which is the most.
  • We currently have seven model providers available for you on Bedrock. The very important thing to keep in mind with bedrock is we are democratizing the ability for people to consume different models for a specific use case. And that is really important because you can decide how you want to build your applications.
  • Entropic is one of the top leading research AI companies in the world. They have shocked the industry with very performant and set of models of three different models called cloud tree. Claw three models are already available on bedrock as we speak right now.
  • Bedrock has a feature called knowledge base that makes all this process of running retrieval augmented generation very simple. Another functionality is the ability to enable generative AI applications to execute steps outside your model.
  • On bedrock we have a functionality that is currently in preview, but it's called guardrails. It allows you to create consistently safeguards, including on your models. You can create filters for harmful content both on the input that you're sending to bedrock and also the output that bedrock will tell you.
  • Bedrock's batch mode allows you to efficiently run inference on large volumes of data. One last nice feature about Bedrock is model evaluation. Thousands of customers are using bedrock to build generative AI on top of pretty much every single industry.
  • AWS has a wide variety of different databases that support vectors. Depending on the use case, you might choose one versus the other. Very soon Aurora, Amazon, Aurora and MongoDB are going to be made available as well.
  • AWS has a service called code Whisper which is AI powered code suggestion. You receive real time code suggestions for a variety of programming languages. Some features are only for enterprise, but most features are available for free.
  • You can actually get started and play around and test some of the models by just going to the playground. On the chat AWS, well, we can compare models. If you want to generate images, you can actually generate images quickly. Few more things I want to show you.
  • How can I call those models on bedrock using a programmatic way. Each model have a specific body and format that the model providers have configured. And then I wait for the response. I parse the response into JSON and then I print a response. Now there might be applications that you're trying to build that require streaming.
  • AWS bedrock is amazing because with very simple API calls, I can call different models with different configurations with different I parameters. All what I've done here is probably less than a penny because it's all on demand. Last thing is I'll recommend if you want to look for some of the code that I've used.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. Welcome to my session. My name is Samuel Baruffi and today I'm here to present a session called Mastering Generative AI, harnessing AWS Genai for your solutions. Let's look at a quick agenda. What I'm going to be covering on my session, I'm going to start a presentation talking at a very high level about generative AI and the big impact in the world and also into applications being built today. Then I'm going to talk about at the very infrastructure level, what AWS is doing with our own chipsets called inferential and trainium, and also talk about a wide variety of EC two instances with Nvidia graphic cards. After that I'm going to start talking about more on the application services and the platforms that you can build models and use models. So I'm going to talk about Amazon Sagemaker, I'm going to talk about Amazon Sagemaker Jumpstart and then I'm going to spend the majority of the time talking about Bedrock. Bedrock is our foundational models as a service is the ability for users and companies and organizations to call a single API, choose different models from large language models to text generations from embedding models, and easily receive the response and actually build solutions on top of that. Bedrock is a very exciting service that has a lot of features baked in and being shipped as we speak. And I'm going to cover some of those features as well. Then I'm going to jump into a single slide that talks about vector databases. I'm going to talk what is a vector database, why it's important for generative AI solutions. And of course I'm going to be talking about the solutions within the AWS platform that allows you to run, create and operate vector databases. Then to finalize the presentation piece of my session, I'm going to quickly talk about code Whisper, which is a very exciting tool for developers that can actually have a companion helping with generation of code on many different types of programming languages and also infrastructure as a code. And then I've done quite a few talks with comfort e two in the past. I always like to end the session with a demo. So I'm going to do a demo potentially using bedrock and showing you how easy it is to use bedrock, some of the functionality of bedrock, and hopefully showing you with a simple piece of python code how easily you can call bedrock using AWS SDK and receive a response from a specific large language model that you're going to choose. So let's just get started. So what is generative AI? Right and how are those generative AI powered? The most important thing to be familiar with is, in order for this generative AI explosion and evolution, what is actually powering them behind the scenes is what we call foundational model. Foundational model. You can think about an AI model that has been actually pretrained on vast amounts of instruction data. Those state of art models that are going to be talking later on in my presentation are known to be trained across pretty much the whole Internet, right? Like, if you think about it, like, all the amount of data that the Internet has, that is actually all the data that those models are being trained, then it contains a large amount of parameters that makes them capable. By learning all this data, which are the parameters, they make their own word interpretations, and they actually can learn very complex concepts by that. And there is a technology that was invented, I think, 2017, by a group of researchers from the University of Toronto that is called transformers, and that is the architecture using neural networks that allows generative AI and large language models to actually generate text. The good thing about those models, those large language models, is they are generic and general on their own. So it's not only trained for a specific use case, it can actually solve a lot of different use cases. And I'm going to talk in a moment some of the use cases that companies are starting to build. And these number of use cases are pretty much infinite because you can think about those large language models as a model that can answer any type of question. And of course, we need to evaluate performance and how big those models are and how smart they are. Not every model is exactly the same as others. So there is that. And we are going to talk about how AWS will help you choose different models for your use case. Another thing that a lot of companies are doing now is customizing foundational models for their own data. So let's say you are a financial company and you really want to have a specific model super well trainium, to know every single piece of information that your company has. There are different strategies. You can use retrieval, argument generation, or you can use agents to actually retrieve the data, or you can actually customize foundational models. Customizing foundational models are also known as fine tuning. That is also a capability that you retrain the model to recalculate the weights of those models, to add your data, that potentially your data was private, it was not available on the Internet. And now we want to actually have those models also expert on your own data. We are not going to talk in depth about those strategies, but that's something that you just need to be aware that on top of having foundational models, you can actually customize those to even be more expert on the data that you as a person or potentially organization have control of. So some of the use cases that we've seen the industry actually building. So there are kind of three main categories, and this is just a very small amount of use cases that companies are building. But the first category is you want to enhance customer experience. So using chat bots, virtual assistants, conversational analytics, you can personalize a specific user interaction, because those models are really good at behaving and answering like humans, even though they are not humans, but they answer like a human. It makes really good to actually enhance customer experience, right. Especially with chatbots. The other section is you can boost employee productivity and creativity. So you can think about conversational search. If an employee or a customer wants to really retrieve an information from a wide variety of data set, instead of just doing a quick, simple search, you can actually have conversational with those documents. You can do summarization, you can do content creation, you can do code generation like we are going to be talking about code Whisper. And the other category is you can also optimize business processes. So let's say you want to do a lot of document processing, you want to do data documentation. Let's say you have a specific form that needs to be filled. Given a specific data, you can join those two sets of data and ask the model to potentially feel in a specific way, right? So there is a lot of use cases that companies are building or have built already to enhance customer experience, boost employee productivity, or optimize business processes. So that is just something you need to keep in mind. Now, in order to run those models, those models require a lot of computational, specifically parallel computational. One good thing about GPUs, graphic process units are that they are inherent good at calculations that can be run in parallel. So the reason why I'm saying that it's really important for us to understand how AWs have a wide variety of selections of different instances. You can choose to run those models. So AWS have, since 2010, a very good partnership with Nvidia. Nvidia have been the leader of having very powerful GPUs. And those GPUs are really good for large language models. You can see here, there are widely variety of instances that offer specific instances, specific GPUs from Nvidia. The most powerful one is the P five EC two instance that comes with the Nvidia age 100 pencil car. Those are really big GPUs that can run very big, large language models, and they have been used for many companies, specifically on the generative AI solution. But apart from the good partnership that AWS have with Nvidia, AWS is also in the front of innovating our own chipsets. So we have graviton, which is a general CPU, but we also have accelerators called trainium and infra. So trainium is the GPU accelerator that is focused on giving you better price performance, up to 40% better price performance than other comparable GPUs for training your models. So it's a very specific car that has been built by AWS to allow you to have a better performance and price when you're training machine learning models. And then once you have those models, you need to run inference on top of that. And AWS have also created inferential chipsets. We have two different chipsets, inference one and inferient two, that also give you a better price performance to actually run inference on top of those models. I'll highly encourage you to just quickly search about those. There is a lot of good documentation. Feel free to reach out to me on LinkedIn as well if you have any questions. Now, let's move from the infrastructure level to actually platform and services that AWS offers customers to actually run those models. Right? But in a day, organizations are looking, how can they run those models and use those models for all the use cases we've just discussed? So, the first platform I want to talk is Sagemaker. So, Sagemaker is the AWS machine learning AI platform that encompass a wide variety of features, from building new data sets, cleaning new data sets, enriching new data sets, training different models, different machine learning models, neural networks, you name it. You can actually run on Sagemaker once you run those models, you can actually deploy those machine learning models at scale. You actually manage the computational for you. You have monitoring, you can actually have evaluation. You can do a roll of the automatically fine tuning and distributed training for big models on top of that. And you can actually, after you've trained, operating all those models on Sagemaker. This is a very, in the last six years since Sagemaker was launched, we have introduced many, many new innovations like automatically model tuning. So you can deploy and train the model. And as you train the model, we will find the right parameters to actually fine tune and tune the model for the better performance of what you're trying to achieve. And when I mean model, I'm not only talking about large language models, I'm talking about any type of machine learning model that you want to actually build, train, and deploy. Now, when it comes to generative AI, Amazon Sagemaker has a very specific feature that really helps developers to get quickly testing different larger language models. The name of the feature is called Amazon Sagemaker Jumpstart. Sagemaker Jumpstart is a machine learning hub with foundational models made available that you can literally just click and deploy. It'll give you the recommended instance types that those models should run, depending on the size of those models. So right now there are more than hundreds of different models that have built in algorithms that have been pre trained with foundational models. So a lot of those models are available on hugging face. Some of the Amazon Alexa models are also available for you to deploy. The good thing is this is all UI and API based with a single click of buttons or simple API calls. You can actually have a machine, an EC two machine running on Sagemaker actually hosting that model, and all the things that you need to do manually by running to actually run that model and do inference on that model will be taken care of for you. We have a lot of notebooks, Jupyter notebooks that have examples on how you can actually do that. And the good thing about Sagemaker Jumpstart, some models that are available on Sagemaker Jumpstart also have the capability for fine tuning. So if you want to customize a model, let's say the Falcon model, 180,000,000,000 parameters you want to customize with your own data, you can go on sage maker Jumpstart and you have actually an ease walkthrough way of fine tuning those models. One thing to note here, that is going to be very important to differentiate it from other services like bedrock that I'm going to talk in a moment. Sagemaker Jumpstart helps you run those models, but where you're running those models is actually on a EC two instance that you're paying every single minute or second that you're running those, even though if you might not be using that, you are actually paying for that instance. Those models are actually running on your account, on your ECQ, on your sage maker that behind the scenes runs ECQs, but you're paying for those. So it's kind of pay as you go, but for the whole instance that it requires, right? So you're not paying per tokens, you're paying for the whole instance. That is something that you just want to watch out because depending on the models, it can be very expensive. But let's continue our journey regarding generative AI. So when we look about what customers are asking for generative AI is which model should I use for a specific use case? How can I move quickly? Most customers are nodding to the exercise of training large language models or fine tuning. They really have use cases that they want to boost customer experience or increase employee productivity. For example. They just want to reinterate run POCs very quickly. And most important, how can they keep the data that they're going to be running through those models? Secure and private, right. That is a very important thing. You have your data. Your data shouldn't be used to train new models if you don't want to. And they should be kept secure encrypt by default. So with all those three questions being asked by customer, AWS have introduced last year a service called Amazon Bedrock, which is the easiest way to build and scale generative AI applications with foundational models. And we talked about foundational models. Those are the models that, large language models that are very big and they can do a lot of general tasks. What does Amazon bedrock offers you? First, it offers you a choice, a democratization of leading foundational models with a single API. And this is one of the most amazing things about bedrock. You can use the same service and the same API, just choosing a parameter of your API by choosing what model you want. And the model list has been growing every single month. And you see in a moment what are the models and model providers that bedrock currently offers. But you can expect the model list and those capabilities to grow as we speak. You can also run retrieval augmented generation on top of that. And I'm going to keep that on hold for now because I'm going to be talking about a feature on bedrock that helps you do that. You can also have agents that execute multiple steps tasks by running lambda and calling your own APIs or outside APIs automatically. And most important, bedrock is security, private and safe. Every data that you put to bedrock is not going to be used to train your models. It's encrypted by default and nobody else has access. This is really important to keep in mind. You can also have vpc endpoints from bedrock so the data never leaves your VPC. It goes through your VPC to a vpc endpoint to bedrock where it hosts the service. One important thing to note about bedrock, different than Sagemaker Jumpstart, you pay AWS, you go. There are different pricing modes on bedrock, but you start with which is the most. I guess the way we start with bedrock, it's called on demand. So depending on the large language model, the foundational model that you pick, you're going to have a price per input token and output token. When you're talking about text you have a different pricing mechanisms for image generation. But for now let's just keep it simple. You're going to pay for that, right? So it's just the traditional AWS cloud approach of pay as you go. And that becomes very promising because instead of paying for big instances to run those large language models for you, you can experiment, iterate and create new products very easily by still keeping your application in your solutions very cost conscious. So now let's just quickly talk about some of the what are the model providers that are available on bedrock? So the way bedrock is architected is you have model providers. So those are companies that have trained foundational models. And each model provider, we have a different foundational models available on that. Right. So here you can see a list of seven different model providers that you can pick from on Amazon Bedrock AWS, the date of this presentation. So today is March eigth 2024. As I'm recording this session, we currently have seven model providers available for you on Bedrock. So you have AI 21 that has the Jurassic two models available for you. Then you have entropic. And I'm going to talk about entropic in a moment. But they are state of the art models with a very big performance. Then you have cohere. With cohere you have both text large language models and embedding models as well. So if you want to create embeddings for your vector database, Cohere also offers you with very performance embedding models. And it was just introduced I think a couple of weeks ago, Mistral AI. So you have two different models with Mistral AI. The Mistro seven D and the mixture mix of exports which is eight models that are put together into a single API. Very good performance. Then you also of course have meta with Leomachu which is an open science model. Then you have stability AI. So stability AI is one of the leaders research labs for image generation. So the stable diffusion XL 1.0 is a model that allows you to generate images. So instead of just generating text, it actually generates image. You input a text, a cat walking in the park. It will actually generate an image for you with a cat walking in the park. Then you also have our own models from Amazon. Those are called Titan. So Titan models offers you a text to text model, traditional large language models. It also offers embedding models and it also offers image generation models. Right? So it's a set of models available for you. The very important thing to keep in mind with bedrock is we are democratizing the ability for people to consume different models for a specific use case. And you can go right now on Bedrock webpage and click on the pricing and you see the different pricings for each model. Depending on the performance, the size of those models, you might be paying a specific price. And that is really important because you can now decide how you want to build your applications. Remember, it's the single API call and you do not need to manage any infrastructure. That is something I want to highlight as well. All the infrastructure and GPUs to actually run those models, which is very complex and it takes a lot of capacity, is taken care by AWS for you, and that is something very beautiful that you should be using and taking benefit of. Now, I really want to talk about the partnership that we have with entropic. So entropic is a longtime AWS customer, and entropic is one of the top leading research AI companies in the world. And I'm going to talk about some of the models that they have published in the last years. But AWS, Amazon have invested heavily. I think we've announced a $4 billion investment last year. So entropic has a very good partnership with Amazon. And the way we showed the results of that partnership is, well, first of all, let's talk about the story about entropic very quickly, right? So if you look here at the timeline, we are in a very fast paced environment. In 2019, GPT-2 was launched from OpenAI. Then some researchers have published some papers about the performance of transformers. And GPT-3 was launched sometime in 2020 with Codex as well. Right? Most of the people that have founded entropic were employees from OpenAI. So they left on OpenAI in 2021 and they found Entropic. You can see how quickly they went from founding a new research lab and actually publishing very good and making available very good models. So they published some papers in 2021. Then in 2022 they finished training clot, right? And they have something called Constitution AI. I would highly recommend you to search is their whole way how they take very important care on safety and alignment for those models. And then in 2023, they have released the first cloud one model. Then they have released one of the first companies to release 100,000. Context window. Context window just means how much text you can put per request. Then after that, in 2023, they have released cloud two, which was a big improvement from cloud one. Then a couple of months after they've released cloud 2.1 with more improvements and performance. Then this year, actually last week or this week? To be honest, Monday this week. They have, I would say, shocked the industry with very performant and set of models of three different models called cloud tree. And I'm going to talk about those. So cloud tree comes with three different models. The first one is cloudtree haiku. And you can see these on this graph is cloudtree haiku. You can see the intelligence is a very performance model, but most important is a very low cost and very fast inference model. Then they have also released cloud tree Sonnet, which is their mid tier model, which is very, very intelligent. It beats all the previous quad models in terms of benchmarks, and it's in the middle when it comes to cost. As a matter of fact, claw three sonnet is much more performance than any previous quad models, but is actually cheaper than cloud two and cloud 2.1 models. And then of course, client tree, opposite is the most intelligent model and has actually beat state of the art models on most benchmarks. And you can see this data just search cloud tree report paper, you see all those benchmarks that are available. So now how this entropic incredible performance and innovation we call three model impacts bedrock? Well, because the relationship that Amazon has with entropic claw three models are already available on bedrock as we speak right now. Claw three sonnet, which is their mid tier model, very big performance with very good price. All these models are multimodal. Let me just say that multimodal means you can input text, but also you can input images. Previous quad models could only receive text as inputs. Those models can actually receive image as input. So you can put an image and you can ask questions about that image. You can actually put multiple images per input. And all those three models actually have that capability. And all those models have now an even bigger context window. Not only they have bigger context window, but the claim is that with this bigger context window, doesn't matter where you put the text on those context window, the performance remains very similar and very good, which was not actually true in previous models and actually not in the industry as well. Claw three oppos and cloud three haiku are going to be made available on bedrock very soon. They are currently not available, but they're going to be made very soon. Now that I talked about it, let's just talk about some of the functionality that bedrock allows you to use. So first, you can actually use those foundational models as it is. But if you really want to fine tune and customize those models with your data, because you really believe you've tried rag, you've tried prompt engineering and you're not achieving the performance your use case require. In my opinion this should be the last resort. But if you need to privately customize models, you can actually use right now, bedrock supports to automatically customize those models. You put your data on s three in a private s three bucket. You connect that s three bucket to bedrock and bedrock will automatically fine tune and customize those models for you. Currently you can customize models with Titan, Cohere and Lemachu. Very soon we are going to open the ability to also customize cloud models and potentially other models from other model providers. So the good thing about this functionality from bedrock, if you have done some fine tuning customization from auto in the past, it actually requires a lot of science and it can be very complex. Bedrock completely removes that. You just put your data on s three, you go on bedrock and you point the data from bedrock on s three and you choose the model. And behind the scenes bedrock, you just customize new models and you notify when your specific model has been trained. No one else will be able to use this model. None of the data that you have actually provided from bedrock on s three will be used by anyone else or to train other models. It's just your model. And you can then consume that model and run inference on that model by making an API call. The same way you call API from Bedrock, you can call Bedrock API to use your own customizable model. Now another very good thing, if you don't need to customize your model is actually running retrieval augmented generation. Retrieval augmented generation, for folks that are not familiar is just the idea that, let's say you have a big data set of documents and those documents talk about the way your company operated and you want to have a chat bot that actually answer questions about those documentations, right? Well, those documents are likely private. So the foundational model, that model providers made it available on Bedrock, they don't know about your company operational procedures. But when you're creating a chatbot, you actually want to make that available for the model itself to actually consume. So what you can do, you can use what we call vector databases. And I'm going to talk in a moment on what vector databases are made available on AWS. But Bedrock has a feature called knowledge base that makes all this process of running retrieval augmented generation very simple. The way it works is you go on bedrock, you first create an S three bucket and you put all your documents on this s three bucket. It's your s three bucket. Nobody has access. Then you go on bedrock you choose which model you actually want to run embeddings. So you can choose between Titan for now, Titan and cohere embeddings are just going through those documents, converting those texts into vector numerical, vector vector representations. And then finally you choose a vector database. And right now you have a variety of databases that you can select from. I think there are four options right now that you can select and those numbers are going to be increasing in the future. But you can select, for example, the open search serverless vector database. Then automatically bedrock will run the embeddings on the data that is on s three will store the vectors on your open search vector database. And finally, which is, let's say you want to run this chatbot. When you ask a question, let's say you ask a question, what is the HR policy for vacation in New York as an example, right? What bedrock can do, it can then retrieve your vector database by running what we call semantic search. It can find the specific chunks of text that are very likely to respond my question. You copy those chunks of text into bedrock and then you run your question, plus the combination of chunks of text that has been retrieved from the database, from the vector database, and you send that to your foundational model. Then the foundational model, let's say cloud three, will see all the chunks of text that talks about vacation policy, New York. And you see your question. And then based on the information that you have provided, because now the model has access to the chunks of data that has the answer, will be able to provide an answer for you. That is what is called retrieval augmented generation. And you can actually run very simple with knowledge bases. So that is one capability that you can run. It's all managed for you and you can choose different models to actually run. Another functionality is the ability to enable generative AI applications to execute steps outside your model. So let's say you have an API where if someone on your chat bot asks the question about what is the current price of this stock, right? The model is not going to be able to answer that question. Or probably if he answered that question, it's going to hallucinate, meaning it's not going to be accurate, right? Because the training data from that model was probably months ago or years ago. What you can do on bedrock, you can use agents for bedrock. What allows you to do is to provide. So you select a model, let's say cloud model, you provide the basic of set instructions, then you choose different data sources, maybe different APIs, and then you specify the actions that it can take. So the example I provided, right, you can say if someone asks you about the pricing of a stock, you need to call this API. Here is the open API spec of my API and this is how you can call the API. So what agents for bedrock do you ask? A question for your model. Your model realizes it needs to actually make a action, take an action on that request. Behind the scenes, what bedrock will do, we will actually call Lambda, which is a serverless compute platform with lambda. The model will actually trigger a lambda. The lambda code will already be prebuilt. Behind the scenes you call the API that you have told bedrock to do and then that API will come with a response, let's say the value of your stock. And then you return to the larger language model to provide you with the response. This is just one example, but what you can do with bedrock, you can break down and orchestrate tasks. You can invoke whatever API on your behalf so you can do a lot of automation. And the capabilities here are really infinite. It's just you configuring those agents properly so you can do a lot of chain of thought as well on top of that. So moving on, on the ability that, what are the ability that we have for making the responses very secure and safe? On bedrock we have a functionality that is currently in preview, but it's called guardrails. What guardrails allows you to do is to create consistently safeguards, including on your models. Doesn't matter if they are fine tuned or agents what it does, you can create filters for harmful content both on the input that you're sending to bedrock and also the output that bedrock will tell you, right? So I'll give you an example, right, let's say the example you see here on the screen. Let's say someone asks you about investment and device on your chat bot and you don't want to have that input and actually output to be sent to the customer. Right? So what you can do, you can create those filters and you can say these topics deny and then this is the response you should be giving back if someone is trying to ask questions about investment advice. So you don't get into legal complaints or problems that you might get into the future, right? So this is one of the capabilities that is available on bedrock and it's called guardrails. Another functionality that I'm going to talk about it is batch. So everything I've talked about so far is you just run an API call and you receive a response. Pretty much synchronous, right? API call goes in, API call comes back. There are some use cases that don't require live interaction, but you want to run a lot of inference for a lot of documents in a batch mode. So what Bedrock can do, its batch mode allows you to efficiently run inference on the large volumes of data. So you can put the data on s three, you can put different json files with the prompt and the data you want to run behind the scenes. Bedrock, you grab those files, you run the inference, you save the results of those inference in another s three as the result, and it's completely managed for you. And once the batch is completed, you can get notified and you can do a lot of different automation. So you don't need to write any code for handling failures or restarts. Bedrock would take care of that for you. And you can run that with base foundational models or your custom trainium models as well. One last nice feature about Bedrock is model evaluation. It's still in preview, but model evaluation is a really good feature of Bedrock. As you saw, Bedrock offers you a wide variety of model providers and models available. From those models providers, it can be really complex to evaluate those foundational models and select the best one. So what model evaluation on bedrock allows you to do is to choose different tests. And those evaluation tests can be either automatic benchmarks that the industry use and bedrock makes available, but you can also create your own human evaluation. You can have actually humans evaluating the response from different models and rating those models without actually knowing which model it is. So there is no bias into the place. And you can have your own data sets of questions and you can create your own custom metrics or use the metrics that comes with it. So some of the metrics that are there are the accuracy, are the toxicity and the robustness of the response. And you can see here a screenshot of a human evaluation report across different models being tested and automatic evaluation report. So I've talked a lot about the different features that bedrock makes available for you. But one of the things that is important to highlight is right now, thousands and thousands of customers are using bedrock because the capability, the democratization, the flexibility and the feature set that bedrock allows them to build generative AI on top of pretty much every single industry, right? So you can see big names like Adidas, you can see names like the BMW group, Salesforce and many, many others so highly encourage you to test bedrock because it's a really cool feature. Two more things before we go to the demo is we talked about the retrieval, augmented generation and the need for vector databases. And I just want to quickly tell you the story about vector databases on AWS. AWS has a wide variety of different databases that support vectors. As you can see here, we have six databases that are now supporting vector databases and depending on the use case, you might choose one versus the other. The important thing here is to understand that AWS is giving the flexibility to pick and choose from the database that makes the most sense for you. So a very popular database on AWS for vector is opensearch. So OpenSearch has a functionality for vectors and you can actually even run OpenSearch serverless for Vector database that have a very good performance and price. But you can see here documentDB now has support for vector. MemoryDB for redis has also support for vector RDS for postgres. So if you're running a SQL database and you have a relational use case and you also want to run specific vectors, you can run pgvector which is a plugin library for postgres that can run also on top of both RDs and Aurora postgres. If you're doing graph databases, you can actually run on top of Netun. And as I talked about it right now, the direct integration for knowledge databases on bedrock supports open search, redis, enterprise, cloud and Pinecone. But very soon Aurora, Amazon, Aurora and MongoDB are going to be made available as well. So that is about vector database. The last thing I want to talk about it is the capability for code generation and code assistant for developers. So AWS has a service called code Whisper which is AI powered code suggestion, as you see here on the small video that has actually let me play it again, the small video that is demonstrating here, in this case it's a JavaScript code. You can provide a single comment, in this case, parse the CSV string and return a list of songs with positional or position original chart date, artist title and ignore lines. We're starting with hashtag, right? Then you just click tab and it automatically returns the code generation. This is pretty cool. And the way it works is you just have your ID and there is support for a variety of vs code, jetbrains, cloud, nine, lambda, Jupyter notebooks. There are supports for pretty much all the popular IDs out there. Install the plugin from AWS that has code whisper support, then you can receive code suggestions, and code whisper can actually do more than that. You receive real time code suggestions for a variety of programming languages like Java, JavaScript, go. Net, and many, many others, but not also programming languages. If you're building infrastructure, AWS, a code terraform or cloud formation, you can also have suggestions for those. On top of being an assistant for developers and improving productivity quite significantly, you can also have a security scam. So the code that is being suggested for you can actually give you security suggestions to make sure you're writing actually secure code. And you can also have reference tracker for different licenses on open source on the data that has been trained. So if whatever suggestion is being given to you has been trained from an open source repository that has a potentially prohibitive license, you can actually have that warning telling you. And if you are an enterprise version of code whisper, you could say developers should never receive recommendation for code that has been generated on this specific license that is prohibitive for my business use case one of the great things about code Whisper is code whisper for individuals are free. We are only one of the only companies that have an enterprise grade product that if you're using for an individual user like not a company, you can install codebisper created an AWS builder account. You don't even need to have an AWS account, you just need to have a login with build Id we call builder ID. You can use it for free. Some features are only for enterprise, but most features and most important feature which is code suggestions are actually available for free. So I highly encourage everyone to take a look on this and then hopefully this was a good overview of the offerings on AWS for generative AI, most important for bedrock. So I'll pause here and I'll come back sharing my screen to actually do a presentation and a demo on how can you utilize some of those functionalities in the real world? Actually clicking buttons and making API calls and writing some code. Awesome. So let's quickly jump into the demo. Very simple. I have logged in into my AWS account. I can search here the bedrock service. I'll go and I'll jump inside the bedrock service. Right now bedrock is available in a few AWS regions. In this example we are using North Virginia US east one region. If I click here on my left side you can see the menu, you can see some examples how to get start. You can just open those on playground here. If you click on the provider you can see the providers that I just actually showed to you on the presentation. You can see some of the base models. So each provider, for example entropic here have the cloud models. You can see all the different cloud models that are currently available. So I have for example cloud three sonnet, which is the median model that have just got released this week. Have cloud 2.1, cloud two and cloud 1.2 instant. In this case, I don't have any custom model, but if I had trainium before a custom model, I would see the list of training models here. If I wanted to customize a new model, could just go here, create a new fine tune job, or continue fine tune job. But the thing I want to show you, you can actually get started and play around and test some of the models by just going to the playground. So if you look here, the playground, you have the chat option. And what I really like about the chat option, I'll just give you first example how you can actually talk to a model. Let's say you want to talk to Claude and you want to talk to the new Claude tree model, which is one of the most performance in the industry. So let's just say, write me a poem about AWS and its ecosystem. Just a simple poem here. So I can put the entry here. Because this is a multi modality model, I can also put an image, I'll do a demo of an image in a moment. You can see all the configurations of the hyperparameters, like temperature, top P, top K. The length of the output can be controlled here. In this case I'm just keeping for 2000 tokens maximum. You can see this on demand. If I click run, it's actually calling the model and then it's actually generating, in this case generating a poem for AWS and its ecosystem. Right. You can see that it's pretty cool. One thing that I really like about the playground are the following, as it's finishing generating here, if we click on the three dots on the top menu, you can export it as JSON and you can see streaming preference because you are streaming. But the other thing that I like, you can go down and you can see some model metrics. So you can see this actually took 15,000 milliseconds. You tell me how many input tokens, how many output tokens, and this is the cost, it's 0.0. Because it's less than 0.0, there is like be a zero point something that this will cost. Right. What I really like about it on the chat AWS, well, we can compare models. So let's say I want to compare claw three versus claw 2.1, right? And I'm going to talk about it here. Let's see. Talk about the word economy in the 99 year, right? And I can go and I can run. So it's going to run both models at the same time and I will be able to compare the performance of both models, this is just like by reading them. So let's just wait a little bit. So you can see here he has outputted. So cloud 2.1 has outputted. I can see here, compare the response, but I can see down below here how many tokens each one of them had and so forth. Now you also have a text playground instead of a chat. It's just like you send one request and you get the response. What I like about this, you go here and you select the model. Let me show you what I like about it. And let's say write a small poem about New York City, right? Let's just run this. What I really like about this. So it's streaming back. But the best thing about it, if I click on the three dots, I can actually see the API request. And this is actually how I would actually call this model through API. Right? In this case it's using AWS CLI, but you can see the message here and all the formats get properly configured for me. And in a moment I'll show you some python code on how you can actually do that. Few more things I want to quickly show. If you want to generate images, you can actually generate images with stable diffusion stability, AI and Amazon Titan. So I can say create an image of a cat in the moon. Let's just ask this for the model. Let's see what actually outputs for me. And then you could do whatever you want with that image, right? So you can see it's cat with the moon. There's very simple image. We can say create picture of a cat that is super realistic. Let's see if it does more like instead of you saw, there was more like a paint with the moon in the background. Let's see if this does. And this is what I'm doing here is just prompt engineering. I'm not using anything specific. There you go. You can see an image here that is a more realistic cat image, right? Remember I talked about some of the other features like guardrails. You can see the guardrails here, you can create the guardrails, you can create the knowledge base, you can create all the agents. Those are the things you can do. One of the things I want to highlight, if you were to start using Bedrock for the first time, the first thing I would recommend you doing is actually going model access and enabling those models to have access on your account. You don't pay anything for just enabling them, but if you don't enable them, you can't use, and it's as simple as just going on manage model access, selecting the models you want. In my account you can see I have access to all those models, right? So it's pretty straightforward. But now let's jump in into some code, right? Likely most people that are here watching my session are probably developers or people that do code. So how can I actually call those models on bedrock using a programmatic way. So the first example here I'll show you is just calling cloud, right? So you can see I import bodo tree which is the AWS ssdk for python. And then I instantiated the bedrock runtime from the SDK. You can see the bedrock runtime in the region. Here is the payload. So I'm providing the model version. So this is quadrisoned then the body. Each model have a specific body and format that the model providers have configured. And you can see that in the documentation. I can show you the link in a moment. But once you have actually this, you create a model. In this case you create a prompt. In this case just saying write a text about going to the moon and its technical challenges. Right? Then I create that payload into JSON and then finally call the single API that I've talked to you about, bedrock. So we always call bedrock invoke model. And this is not using streaming. I'll show any streaming version in a moment. And then I wait for the response. I parse the response into JSON and then I print a response where exactly the text from the response is. So if I go on and run pythoncloud py, I'm going to call behind the scenes that's actually calling bedrock, sending the payload that I request to claw tree. Running the prompt. Remember my prompt is talked about the technical challenges about going to the moon. I write a text about that. So once he actually returns the text from bedrock then I will be able to just see the text. And there you go. You see it wasn't streaming. So it's a pretty long text saying embarking on a journey to the moon. Present multitude of technical challenges. Not going to evaluate this but you get the gist, right. So this is example one. The second example is calling the same bedrock API but for a different model. So you can see here I'm also invocating a bedrock. And then I'm just saying can you write me a poem about apples? Right? So let's just call this python three titan Ui. So now this is calling the Amazon Titan text model. And you can see it was very simple poem about apple. Now there might be applications that you're trying to build that require streaming. Like a chatbot, you don't want to make the user waiting for, I don't know, like a minute to get a response back. Sometimes those models take a while to finalize the whole text completion so you can do streaming. So in this case, very similar to what I've done before. This is a demonstration of using clotry sonnet, but with a streaming right. So I'm not using multimodality yet. This is just text. So I have an input text and you see this just creating the input payload. Then you can see here the API that I call is just a little bit different. It is the invoke model with response streaming. So what bedrock does, as soon as you start receiving some chunks of text from the model, we actually output back for you. And then here it's just like, as you get the response, just display that for me, just do a print on the console for me. And then on my main here, I'm finally providing some model IDs and providing the prompt. In this case, what can you tell me about the, what can you tell me about brazilian economy? And then I'm just starting the border tree with bedrock and then calling the function that I created above. So if you go here, clean the screen and you go cloud streaming, and we try to run, oh, sorry, python three, apologies for that. Python three cloud streaming. So it's invoking my row and you can see now it's streaming the response back and it's actually getting the response about the brazilian economy. And you can see here it finalized. So I even predicting, like, okay, why it stopped because it was end of the turn. It finalized the response and also how many output tokens you got. So that is just one example. The other example is I want to use clotrisonet because it's a multimodal large language model that also accepts images as inputs. So what I want to do, I have this image of a very cute cat. I want to provide this image to the model. And you see here what I'm doing. Very similar again, but now what I'm actually doing, I'm receiving the cat image. I'm encoding that on base 64. Then I'm providing on my messages for clot tree as a content. You can see I'm providing now an image and a text. So the first I'm providing the image AWS base 64 mastering. And then I'm saying as the prompt, write me a detailed description of this photo and then upon talking about it. So that's the request. And if you see down below here, I'm invoking the model. So this is not using streaming. Remember the example before was using streaming. In this case it's not using streaming. So it's going to send everything, it's going to process the response. Once the response is finalized it's actually going to show me and it's going to just print the result. Let's just quickly do this prod multimodality again. I keep using Python two instead of Python three. Apologies for that. Now I'm going to run Python three. Hopefully this is going to start printing the whole description. And you can see here the image show a close up portrait of a cat with striking green eyes and a sweet brownish gray fur coat. The cat says face a slightly stern, yet alert and yet alert and attentive expression. So it talks about the cat in the image. It's very accurate. And then like I said, he writes a poem, emerald depth gaze. So king blah blah blah blah blah, he talks about it. So you can see bedrock is amazing because with very simple API calls, I can call different models with different configurations with different I parameters. And this is pay as you go. All what I've done here is probably less than a penny because it's all on demand. I'm not paying for any provisioned capacity because I don't need in this example. Last thing is I'll recommend if you want to look for some of the code that I've used. I based myself on this GitHub public repository called Amazon bedrock samples. You can go here introduction to bedrock and you can see some examples. For example cloud tree. You can see the example for cloud know with the image. This is the one that I've used so highly recommend you get in there. And last, if you really want to look in more in detail into each model and the hyperparameters on how you call, you can go on the AWS bedrock documentation. Within the foundational model submenu we have the model inference parameters and when you click on specific models, for example cloud, you can see the different cloud completion and cloud messages API. On the messages you can see here, you can see some code examples. So you have a very descriptive documentation for you as a developer to actually take a look and deep dive. So that is all I had to show for today. Hopefully it was very useful. Feel free to connect with me via LinkedIn on Twitter if you have any questions. And happy coding, happy genai applications and I hope you find this useful. Thank you so much.
...

Samuel Baruffi

Senior Global Solutions Architect @ AWS

Samuel Baruffi's LinkedIn account Samuel Baruffi's twitter account



Awesome tech events for

Priority access to all content

Video hallway track

Community chat

Exclusive promotions and giveaways