Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi everybody.
My name is Jason Z. I'm so happy to be here.
Thank you for joining today.
In this session today we're gonna talk about measuring the performance of serverless applications with generative AI models and Amazon Bedrock.
My name is Hazel Science and I am from Guatemala.
I am an AWS serverless hero.
I am really passionate about serverless, talking serverless all the time, so you can find me at AWS Community Days, Summits, and all tech-related events in Latin America.
And I'm also the creator of Q, a virtual assistant that helps and tries to revolutionize the way we are learning about AWS. I also write technical articles, so if you want to see my content, or even just connect with me to talk about serverless, please follow me on my social media. I'll leave the QR code here so you can follow me, and I'll be glad to talk to you.
So let's get started.
What are we going to learn today? There was a really fancy title in the talk, but what do we really want to learn here? I don't know about you, but when I started to learn about GenAI, a couple of years back, I started to test all the different models that came to the market, right? Like everybody else, I started testing, and when I started evaluating them, I saw the publicity saying something like this image here. In all the ads, it's like: I want an image that can say this about an eye, for example, and it creates something like this.
And it's so awesome. So I said: oh, this is great, I want to test this. But when I started to test it, I realized that my image wasn't exactly the same as in the ad. I don't know if that happened to you, but it happened to me a lot, and I was really frustrated, getting something like this: a character with three mouths instead of one, or sometimes even more hands or more legs than a character should have, or something like that.
And I wasn't just thinking about building applications with GenAI for myself; I actually wanted to use it in my business. It's supposed to be easy and straightforward, and it should never fail. Sounds easy, right? But for some reason it wasn't. So why is it so hard?
When I actually started to build and incorporate these different tools, I learned that GenAI applications are different from what we were used to working with before. Why? Because my previous systems, related to banking, e-commerce, and all the different kinds of applications, were all deterministic programs. And what is that? It means that when I send an input, I always get the same output, because the logic inside the program was like a bunch of ifs. Let's say: if you give me a one, I will return the word yes; if you give me a two, I will return the word no. So every time I say one, I'm gonna get the same answer, and every time I say two, I'm gonna get exactly the same answer. Everything is the same, over and over again.
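In code, that determinism is just a fixed mapping; a minimal sketch:

```python
def classify(value: int) -> str:
    """A deterministic program: the same input always yields the same output."""
    if value == 1:
        return "yes"
    if value == 2:
        return "no"
    return "unknown"

# No matter how many times we call it, one always maps to "yes".
print(classify(1))  # yes
print(classify(2))  # no
```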
But now it's different. With GenAI applications, we have a program that is based on probabilities. It is probabilistic. What does this concept mean? It means I can have the same input over and over again, but my output may change. Why? Because the answer the model is actually going to give you is based on the probability that it could be the right answer. The models have so much information that they are not always going to return the same thing. A hundred users can ask the same question, and the model knows that 98% of the users who ask that kind of question want a certain answer, so that's the answer it's going to provide. But it's probably not the one you want, because you could be in the 2% that wasn't looking for that answer, right? That's why it's more difficult to create an application that provides users the output you want based on probabilities, and that opened a new door for me when I started to build this kind of application.
I said: okay, now I know the problem, I know why it's so hard, so let's evaluate it in a bit more detail. Now I know that my previous applications were deterministic: fixed results that depend on fixed rules, because it was a bunch of ifs, always the same. So I know what worked in my previous projects. An example of this could be a binary search, and all the kinds of applications that are already out there, right? But with probabilistic programs, I don't have fixed results, and I don't have the same rules all the time. It depends not only on the outcome I want, but also on the model I use, because each model is trained with different data, so the outputs they can provide are different. Now I have a new variable I need to consider: which model do I actually need in my business to get the output I need? And it may vary each time I ask the same question. If I have a hundred users and they all send the same question, I'm not always going to get the same answer.
What kind of examples do we have here? We have all the models like Amazon Nova, Claude, ChatGPT. Every model is different. Every model processes information differently and is trained on different kinds of data. That's why their outputs change.
So I still want to build applications with GenAI, because I do need it in my business to grow and bring all the capabilities that GenAI offers into my business. What should my goal be? Knowing that I have this kind of challenge, how can I face it? The first thing we need to understand is what the AI objectives are, because it's important to be aware of them in a business decision. The first one: I want quality. I don't want an image with three mouths. I want my outputs to always have the high quality that my users need, right? So that's an important factor.
The second one is: I want the best performance. I don't know if this happened to you, but I've used some of these tools where one model gives me the answer really fast, and when I test another one, it's slower than the previous one, and maybe the slowest one's quality isn't even that good. So which one should I use? The faster one, right, with the best quality. But it comes down to a balance, because you don't only want your new application to be a hundred percent accurate all the time; it also needs to be fast, because I don't want my users going to another company to get the information I want to provide them, right? And for that, I need to be fast. So performance is really important here.
And the other one, which is maybe not that impactful to the users but is impactful to my business, is that it needs to be cost-effective. I want this application to have high quality, I want it to have the best performance in the market, but I also need to be careful with my costs, because GenAI is expensive, and model usage is expensive. I need to be wise when I select which model to use, and I need to be aware that maybe I don't need such an intelligent model for some tasks, but for others I do. So I need to learn how to balance that. And if I hit these three key objectives, I'm going to create a really good GenAI application.
Okay.
So to get there, we need to understand the building blocks that GenAI applications consist of. Because it is a little different: if you have experience creating other kinds of applications, like an e-commerce application or a banking application, with GenAI it's a little different, and this is why. These are the blocks we have to start with. First, we have the foundational model hub. This part of the application handles the communication with the models themselves. Which models are you going to use in your application? In some cases we want to build our own model; in other cases we're going to use one that already exists in the market, and that is fine. This is the phase where you decide that.
Okay.
The second part is the data foundation. Do you remember I said that all the models have different outputs because their data is different? In your business it's the same. I can have a business that provides the same service as ten other companies, but mine is going to be different because my data is different; it provides something new to the users. That is why this layer is really important: depending on the quality of your data, the quality of your application will be different.
The third layer is about model adaptation and prompting, and this is something really important. You can have the data you need, you can select the best model you can, but if you don't know the right prompt to use with that model, you probably won't get the best results. This is the point where prompt engineering comes into the game and starts changing things. Okay?
The next part is GenAIOps. You already know the three basic layers of a GenAI application, right? But to actually have a successful application running through different deployment cycles, you need to put your software development lifecycle in place, and that is this part, GenAIOps. How are you going to deploy that application? Are you going to use infrastructure as code for it, all that kind of stuff? What continuous delivery are you going to have across the different versions? And something important that we cannot forget: you probably already have applications that your clients know and are familiar with. How are those applications going to be connected to your new application that is powered with GenAI? That is the other block we need to take into consideration.
And at this point, we know the components that are part of a good GenAI application. Okay? But we need to add something else here. Keep in mind that if you're going to have this kind of application, you need to have good governance in place, and you need the control plane. What is that? You need to be aware of your users, what kind of access they can have, and what security your application is going to have so it doesn't get attacked, all that kind of stuff. And the control plane is what's going to help you improve your application in the future. Okay. Why? Because it opens up this world called observability, which allows you to actually monitor what is going on in all the different blocks I just described.
Okay.
And the last block we have here, it's not that it's not important, because it is: it's all the infrastructure running below this application that makes it actually work. In this session, we're going to focus on only three of these blocks. We're going to focus on the foundational model hub and how we can actually collect information from the models, on the data foundation, and we're going to talk about observability. Okay. And why observability?
Because if we want to hit the three objectives of these applications, the quality, the cost, and the performance, then before we can decide what to use, we need to be aware of what we need and how our application is working behind the scenes. Because with each interaction, I can learn about it and I can improve it. For a GenAI application, this is the key factor. If you don't observe what is going on in your application, you can't improve it, so you're probably going to be left behind, because the other companies are watching their own applications and improving them. In some cases, they're going to make decisions like: I'm going to change the model; I'm going to improve my prompting; I'm going to change my data, because I'm not providing the data I need. That kind of decision needs to be based on numbers, and those numbers are going to be provided to you through observability.
Okay.
This is an example of what a foundational model hub looks like at a high level, right? You have a GenAI application, and you're going to have a GenAI gateway that is going to be the point of access for all the GenAI components that are part of the application. That could be models in Amazon Bedrock, for example, or your own models. Remember I said you can have those? Yeah, you can have both. You can choose to have your own model created from scratch, or you can start using the tools that are already in the market, like Amazon Bedrock, which gives you the capability to use the other models out there: Anthropic Claude, Llama, Nova. You can use all those models just through the Amazon Bedrock API, for example, and in a way you get the benefit of all the different models using just one interface to do it, right?
The other part is the enhancement with your data, the data foundation. One example of that is a RAG application. When you talk about GenAI applications, you need to feed your data into the application so it can answer with it, or use it to provide the outputs you need, customized to your clients. That's why how you are going to provide that information to the application is so important, and that's where RAG comes in. You can have ETL processes; you can have sources like Confluence, Jira, GitHub; you can have different inputs that provide that information for your customers. You can even have documents, PDF files, CSV files, whatever you need: that is your data. You can even have APIs and databases that provide that information to your application so it can be used, right?
And here a concept comes in: the Amazon Bedrock knowledge base. It allows you to take all the different files I mentioned before, easily get all that information together, put it in a vector database, and the models can then actually use that database to get the right information.
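As a rough sketch of what querying a knowledge base looks like in Python with boto3 (the knowledge base ID and model ARN in any real call are yours, not values from the talk):

```python
def build_rag_request(question: str, kb_id: str, model_arn: str) -> dict:
    """Build the request body for Bedrock's RetrieveAndGenerate API: retrieve
    relevant chunks from the knowledge base's vector store, then generate."""
    return {
        "input": {"text": question},
        "retrieveAndGenerateConfiguration": {
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,  # placeholder: your knowledge base ID
                "modelArn": model_arn,     # placeholder: the generating model's ARN
            },
        },
    }


def ask_knowledge_base(question: str, kb_id: str, model_arn: str) -> str:
    """Send the question through the knowledge base and return the answer text."""
    import boto3  # imported lazily so the request builder works without AWS deps

    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve_and_generate(**build_rag_request(question, kb_id, model_arn))
    return response["output"]["text"]
```

Running `ask_knowledge_base` requires AWS credentials and an existing knowledge base; the request-builder part is just plain data and shows the shape of the call.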
You can have actions, you can have tools that you incorporate here, but the main goal is to get your data right. And you can also have Amazon Bedrock Guardrails, which helps you not only serve the information, but protect yourself and protect your data from being read with malicious intent by some users. For example, sensitive data from your customers: it's something that ends up in your data, but you don't want the application to expose it, and you don't want another application to get that information, because it's protected.
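A minimal sketch of screening user input with a guardrail, assuming boto3 and a guardrail you have already created (the identifiers are placeholders):

```python
def check_with_guardrail(text: str, guardrail_id: str, guardrail_version: str) -> bool:
    """Ask Bedrock Guardrails whether a piece of user input should be blocked.
    Returns True when the guardrail intervened (e.g. sensitive data detected)."""
    import boto3  # imported lazily so the pure helper below works without AWS deps

    client = boto3.client("bedrock-runtime")
    response = client.apply_guardrail(
        guardrailIdentifier=guardrail_id,    # placeholder: your guardrail ID
        guardrailVersion=guardrail_version,  # e.g. "DRAFT" or "1"
        source="INPUT",  # screen the user's prompt before the model sees it
        content=[{"text": {"text": text}}],
    )
    return response["action"] == "GUARDRAIL_INTERVENED"


def guardrail_reply(intervened: bool, answer: str) -> str:
    """What the application returns to the user after the guardrail check."""
    return "Sorry, I can't help with that request." if intervened else answer
```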
You can do that with Guardrails. Okay, so what are the key steps we need to follow to actually implement observability?
to actually implement observability?
We need to have these four layers in place.
One, the company level.
You need to have metrics in your infrastructure.
You need to have invocation errors, latency utilization, because to order
to identify, to have a performance issue, you should know how your.
Resources is performing.
So you need these metrics.
Yeah.
Layer two: you're going to get logs and traces. When you work with models, the traces become really important, because you need to know what actually happens when the user asks for something, across all the different components involved. That's why this becomes an important part. Layer three: the metrics and analysis related to guardrails.
How many users tried to get sensitive customer data? How many users tried to get information that is important to protect? That is something you need to keep an eye on, because you might need to change the way your data is being provided to your models.
And the fourth and last one is user feedback. User feedback is important because when users are interacting with, let's say, a chatbot, and the answer provided to a user wasn't good enough, the user can give you that feedback and say: yes, it was good; I don't like it; I hate it; all that kind of stuff. You can collect that and improve, because remember, each time you get an answer it's going to be a different answer, and it could be a good one that satisfies what the user needs, or another one that doesn't at all.
Okay.
So that's why the fourth layer is important. If you already have an application, you can do this step by step. You can start with layer one, then plan to have layer two, then layer three, and then layer four. It is in this order because it matters where you start, and when you design your application, you need to be aware that this is the road you must follow to actually get the most observability and the most data to base your improvement decisions on.
For this example, I'm going to use Amazon CloudWatch. That is an AWS service used to collect not only the logs but also the traces, and to collect all the information from the services involved in this kind of application.
Okay.
So with that, we can do a demo. If you want to do it yourself, you can just scan this; I will try to share the link. I created a small application on GitHub, and you can just download it, clone it, deploy it in your own AWS account, and get started. Just deploy it, use it, and start looking at the dashboard and how you are collecting the data.
Okay.
Okay, so let's take a look at the demo. Something important here: if you're going to use Amazon Bedrock, you need to collect your metrics, and model invocation logging is not turned on by default. You need to go and enable it yourself. I'm doing it here really fast, just to show how simple it is. You can send the logs to an S3 bucket, or you can send them directly to CloudWatch, so that you capture all the different inputs the models are actually working with.
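Enabling it can also be done programmatically; a sketch with boto3 (the log group name and role ARN are placeholders you create beforehand):

```python
def build_logging_config(log_group: str, role_arn: str) -> dict:
    """Logging configuration: deliver full request/response text to CloudWatch.
    An "s3Config" entry could be added to deliver to an S3 bucket instead."""
    return {
        "cloudWatchConfig": {
            "logGroupName": log_group,  # placeholder: an existing log group
            "roleArn": role_arn,        # role Bedrock assumes to write the logs
        },
        "textDataDeliveryEnabled": True,
        "imageDataDeliveryEnabled": False,
        "embeddingDataDeliveryEnabled": False,
    }


def enable_bedrock_invocation_logging(log_group: str, role_arn: str) -> None:
    """Turn on Bedrock model invocation logging, which is off by default."""
    import boto3  # imported lazily so the config builder works without AWS deps

    bedrock = boto3.client("bedrock")
    bedrock.put_model_invocation_logging_configuration(
        loggingConfig=build_logging_config(log_group, role_arn)
    )
```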
In this example, you're going to have an API that is a music expert, so you can ask it anything about music. Each time a user sends a question, a model is being invoked at the backend, and you will get the answer to that question. And if you look now that I've turned on the logs, you can see the information about that request. You will know what model was involved, what question the user asked, what answer the model provided, and, really important, the tokens that were used. That's going to help you see exactly the cost you're incurring by using this. For example, if you want to compare two different models, you can send the same invocation to both of them, count the tokens, and you will know which one was more expensive, or which one takes longer, the latency and all that. And all of this just by enabling that one thing.
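A sketch of that backend call with the Converse API, plus a cost estimate from the token counts it returns (the model ID and the per-token prices here are illustrative; check the current Bedrock price list):

```python
# Hypothetical per-1K-token prices in USD; check the current Bedrock price list.
PRICE_PER_1K = {
    "amazon.nova-lite-v1:0": {"input": 0.00006, "output": 0.00024},
}


def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Rough cost estimate from the token counts that Bedrock returns."""
    price = PRICE_PER_1K[model_id]
    return (input_tokens / 1000) * price["input"] + (output_tokens / 1000) * price["output"]


def ask_music_expert(question: str, model_id: str):
    """Invoke a model through the Converse API; return the answer and token usage."""
    import boto3  # imported lazily so estimate_cost works without AWS deps

    client = boto3.client("bedrock-runtime")
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": question}]}],
    )
    # usage looks like {"inputTokens": ..., "outputTokens": ..., "totalTokens": ...}
    return response["output"]["message"]["content"][0]["text"], response["usage"]
```

Sending the same question to two model IDs and comparing `estimate_cost` on the returned usage is the comparison described above.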
Okay.
The other kind of metrics is what you get with X-Ray, with CloudWatch, and also with CloudTrail. In this example, in the Lambda, I use a specific trace so I can see the time consumed by all the different steps in the function. In the repo you can see the example, and you will get information not only about the invocation and the time the Lambda takes to run, but you will also know how long the Lambda takes to execute the model, and you will know the timing of each step, for example.
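A minimal sketch of that kind of instrumentation, assuming the aws_xray_sdk package is bundled with the Lambda; `call_model` is a hypothetical helper wrapping the Bedrock invocation:

```python
import json
import time


def make_response(answer: str) -> dict:
    """Shape the API Gateway-style response returned to the caller."""
    return {"statusCode": 200, "body": json.dumps({"answer": answer})}


def lambda_handler(event, context):
    """Wrap the model call in a custom X-Ray subsegment so the trace shows
    how long the Bedrock invocation itself takes, separate from the rest."""
    from aws_xray_sdk.core import xray_recorder  # bundled with the Lambda package

    question = json.loads(event["body"])["question"]

    subsegment = xray_recorder.begin_subsegment("bedrock-invocation")
    try:
        start = time.perf_counter()
        answer, usage = call_model(question)  # hypothetical helper wrapping Converse
        subsegment.put_annotation("input_tokens", usage["inputTokens"])
        subsegment.put_metadata("latency_seconds", time.perf_counter() - start)
    finally:
        xray_recorder.end_subsegment()

    return make_response(answer)
```

Calling `patch_all()` from `aws_xray_sdk.core` additionally instruments boto3, so the Bedrock call shows up as its own node in the X-Ray service map.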
So here, if we go to CloudWatch, you see that now you are capturing all the different logs, and we can go to X-Ray, because the function is instrumented with X-Ray. You can see that you're not only going to get the logs inside the Lambda, but you also get this kind of diagram that shows you all the different services involved in getting that answer. Okay. And this is the cool part: everything is separated, you have traces at the level of code that you need, and you can go as deep as you like and as your business needs. Okay?
So which traces do I recommend for the Lambda? I always have these: the one around the model calls themselves, which is really important, and the logs you saw at the beginning with the answers from the models. And with that, you can also do something really cool in CloudWatch related to dashboards. Out of the box, you can go and create this dashboard. You only select that you want a Bedrock dashboard, and you will have the invocations of the models, the input tokens, and it gives you a visual where you can compare the different models you're using. If you have, for example, ten models, you will actually be able to see them side by side.
Okay.
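If you'd rather pull those same numbers programmatically, Bedrock publishes metrics in the AWS/Bedrock CloudWatch namespace; a sketch (the model ID is a placeholder):

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(model_id: str, metric_name: str) -> dict:
    """One GetMetricData query for a Bedrock metric of a single model."""
    return {
        "Id": metric_name.lower(),  # query IDs must start with a lowercase letter
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/Bedrock",
                "MetricName": metric_name,  # e.g. Invocations, InputTokenCount
                "Dimensions": [{"Name": "ModelId", "Value": model_id}],
            },
            "Period": 3600,  # hourly buckets
            "Stat": "Sum",
        },
    }


def daily_token_totals(model_id: str) -> list:
    """Fetch the last 24 hours of input/output token counts for one model."""
    import boto3  # imported lazily so the query builder works without AWS deps

    cloudwatch = boto3.client("cloudwatch")
    now = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_data(
        MetricDataQueries=[
            build_metric_query(model_id, "InputTokenCount"),
            build_metric_query(model_id, "OutputTokenCount"),
        ],
        StartTime=now - timedelta(days=1),
        EndTime=now,
    )
    return response["MetricDataResults"]
```

Running the same two queries for each model you use gives you the side-by-side comparison from the dashboard, but as raw numbers you can feed into your own reports.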
So this was only the implementation of layer one and layer two. Layer three is the part with the guardrails, and layer four is the end-user feedback. I didn't have them in the demo, but they are the next steps you can take on top of this basic observability.
Okay.
So with that, that's all I have. Thank you so much for being here. I'm really glad to be part of this. Connect with me, and have a nice day. Thanks for watching. Bye-bye.