Build and orchestrate serverless generative AI applications

Abstract

In the journey to build versatile serverless generative AI applications, discover the roadmap from concept to realization, all within an efficient, model-driven design setup. From principles to prototyping with the tools, learn how to bring your ideas to your users in a scalable, efficient manner.

Summary

Today I'm going to be sharing how you can get started on AWS with building and orchestrating serverless workflows for generative AI. Generative AI is a type of AI that can create new content and ideas. There have been some amazing breakthroughs from using foundational models in different industries.
AWS says gen AI has three macro layers. The bottom layer is the infrastructure. The middle layer is access to large language models and other FMs. At the top layer are applications that take advantage of gen AI quickly. The ability to adapt is the most valuable capability that you can have.
AWS Step Functions is a service that allows you to create workflows. These workflows allow you to move the output of one step to the input of the next step. Step Functions integrates with other services in two ways: SDK integrations and optimized integrations.
Step Functions can be used to orchestrate interactions with foundational models. Amazon Transcribe makes it easy for developers to add speech-to-text capability to their applications. There are two new optimized integrations for Amazon Bedrock. A demo shows how an application uses all of this together.
Another powerful way of showing what Bedrock is capable of through Step Functions is chaining. It emulates a conversation that you can have with an LLM. As the execution progresses, you'll see the states changing colors based on how the foundational model is responding.
Step Functions has the ability to call virtually any SaaS application from a workflow through its integration with HTTPS endpoints. It provides a way to connect AWS services with services that are outside AWS, without writing code, through the public HTTPS API integration.
Step Functions is a great way to leverage chaining. It simplifies the way you invoke your foundational models. Once you have the human feedback, you can then generate the avatar for the video. All of this is serverless.

Transcript

This transcript was autogenerated. To make changes, submit a PR.

Hi there, thanks for joining the session. Today I'm going to be sharing how you can get started on AWS with building and orchestrating serverless workflows for generative AI. Generative AI has taken the world by storm. We are seeing a massive shift in the way applications are being built. A lot of this is through consumer-facing services that have come out, like ChatGPT by OpenAI and Claude by Anthropic, and we are able to see and experience how powerful the latest machine learning models have become. Generative AI is a type of AI that can create new content and ideas, including conversations, stories, images, videos and music. Like all AI, generative AI is powered by machine learning models. But generative AI is powered by very large models that are pretrained on vast amounts of data and commonly referred to as foundational models. Now, throughout the session, and also in conversations that you'll have out there, you'll see foundational models being used interchangeably with LLMs, large language models. Just to be clear, LLMs are a subset of foundational models that focus specifically on text. There have been some amazing breakthroughs from using foundational models in different industries. A couple of these are where we see impact in life sciences, with drug discovery being powered by gen AI. This has enabled researchers to understand things like protein synthesis. In financial services, we see gen AI being used to help create highly tailored investment strategies that are aligned to individuals, their risk appetite, and the financial goals that they want to achieve. In healthcare, we have seen how physicians and clinicians can use this to enhance medical images and also to aid in better diagnosis. Think of it like a medical assistant. And in the retail space, we see teams generating high-quality product descriptions and listings based on product data that they already have.
Now, you'll notice that a lot of the use cases for generative AI are about enhancing existing processes or experiences. A question that usually comes up is: we already have services and applications out there, so how do we take generative AI and add it to enhance the experience, versus rewriting everything from scratch? To answer this, you also need to understand how to view generative AI. From AWS's perspective, gen AI has three macro layers. These three are equally important to us, and we are investing in all of them. The bottom layer is the infrastructure. This is used to train foundational models and then run these models in production. Then you have the middle layer, which provides access to the large language models and other FMs that you need, along with the tools to build and scale generative AI applications that use the LLMs under the hood. Then at the top layer you have applications that are built on foundational models, so you can take advantage of gen AI quickly without needing any specialized knowledge. Now, when you map this against the services that we provide from AWS, you see that the three layers are neatly separated. At the lowest layer of the stack is the infrastructure. This is where you get to build foundational models cost-effectively: you train them and then you deploy them at scale. This gives you access to our hardware accelerators and GPUs. You also get access to services like Amazon SageMaker, which enables ML practitioners and your teams to build, train and deploy LLMs and foundational models. Then at the middle layer we have Amazon Bedrock. This provides access to the LLMs and other foundational models that you need to build and scale generative AI applications, without you managing the whole infrastructure behind it, right? Without you actually managing the scale side of things.
Think serverless, but for machine learning models, for FMs basically. Then at the top layer are applications that help you take advantage of gen AI quickly as part of your day-to-day operations. This includes services like Amazon Q, our new generative AI-powered assistant that is tailored to your business. So think of personas such as business users, data users, or even developers. You could use Q within AWS, as a plugin that's already available for certain services, and then use it to get enhanced operational capability. Each of these layers builds on the other, and you may need some or all of these capabilities at different points in your generative AI journey. What you often see is that an organization has a mix of personas that use all three layers, picking specific services from each to enhance productivity. Now, Amazon Bedrock is the easiest way to build and scale generative AI applications with foundational models. This is a fully managed service, so you can get started quickly and find the right model based on the use case that you have. You can then also customize a model with your own data, and you can do this privately. Nothing feeds your data back to the base models, which other customers also have access to. That doesn't happen. You also have the tools that you need to combine the power of foundational models with your organization's data and execute complex tasks. All of this comes with the security, privacy and responsible AI safety that you need to put generative AI into production for your users. Now, there are a lot of models out there, and these are a couple of the models that we provide through Amazon Bedrock. One of the reasons we went with this approach is that everything's moving fast. Experimenting and learning is the key right now, and generative AI as a technology is also evolving quickly with new developments.
Now, when things are moving so fast, the ability to adapt is the most valuable capability that you can have. There is not going to be one model to rule them all, and certainly not one company providing the models that everyone uses. So you don't want a cloud provider who is beholden primarily to one model provider. You need to be able to try out different models. You should be able to switch between them rapidly based on the use case, or even combine multiple models within a certain use case. You need a real choice of model providers as you decide who has the best technology. This is what we have seen from building services: we want to provide that choice to customers, which is you. This is why, through Bedrock, we provide access to a wide range of foundational models from leaders like AI21 Labs, Anthropic, Cohere and Stability AI, and also access to our own foundational models like Amazon Titan. And the idea is that we provide an API as part of this. So there is an API layer that provides you access to the large language models, or the foundational models, under the hood. All you do as a user, or probably as a developer, is create the prompts in a certain format based on what the foundational model expects. You take that prompt, or text embeddings if you want to tune that model a bit more, send that to the API layer, get your responses back, and then use them as part of your applications. Now, there are a couple of ways you can use Bedrock. One of the ways customers usually start is by writing code. The way you integrate with Amazon Bedrock is through the SDK, right? So you use the APIs to access the foundational models. You load the library that has the Bedrock API, and you can also access data in other places, like an S3 bucket. If you have data that's bigger than normal, you can access it in an S3 bucket for input and even for output.
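As a rough sketch of the SDK route described above, the Python below builds a JSON request body, sends it through the boto3 `bedrock-runtime` client's `invoke_model` call, and decodes the JSON response. The model ID, the body fields (which follow the Llama 2 text format) and the helper names are assumptions for illustration; each model family expects its own body shape, and the call itself needs AWS credentials and model access.

```python
import json

# Assumption: a Llama 2 chat model ID; substitute any model you have access to.
MODEL_ID = "meta.llama2-13b-chat-v1"


def build_request_body(prompt: str, max_gen_len: int = 256) -> str:
    """Serialize the prompt in the JSON shape the chosen model expects."""
    return json.dumps(
        {"prompt": prompt, "max_gen_len": max_gen_len, "temperature": 0.5}
    )


def decode_response_body(raw: bytes) -> dict:
    """Decode the JSON bytes returned in the response body."""
    return json.loads(raw.decode("utf-8"))


def invoke(prompt: str, region: str = "us-east-1") -> dict:
    """Call Bedrock through the SDK (requires credentials and model access)."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.invoke_model(
        modelId=MODEL_ID,
        body=build_request_body(prompt),
        contentType="application/json",
        accept="application/json",
    )
    # The response body is a stream of JSON bytes.
    return decode_response_body(response["body"].read())
```

Retries, error branches and S3 reads or writes would all be additional code around `invoke`, which is the part the rest of the talk moves into a workflow.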
You can then prepare the input, handle the JSON conversion, and then decode the responses. If the returned data is an image of sorts, you can store that in an S3 bucket. If you need retries, you'll have to write the retry logic inside, and if you have any errors, you may have to add conditions for them, and so on and so forth. You get an idea of what happens with code in general. Now, this is what the code would look like, but how do we provide a simpler integration without writing a lot of code? For this, you also need to understand the whole idea of sequencing. How do you coordinate between multiple services? A lot of organizations don't just have one specific app; they probably have a plethora of apps that power their business. And you want to understand how these services are going to talk to each other in a reliable and understandable way, because business processes usually exhibit different patterns based on the inputs that are coming in and what needs to be accomplished. Sometimes things need to be done sequentially. So in this case, let's say you have a number of Lambda functions. We'll use Lambda as a proxy for different services. So you have a Lambda one, and then you have a Lambda two. Now, this is easy enough because you can have these in sequence, so Lambda one invokes Lambda two. But what if you have more than two Lambda functions? What if, instead of calling Lambda two, you need Lambda one to also call Lambda seven before calling another service, or before calling a foundational model in this case? Now, if one of these services or functions fails, there's no easy recovery mechanism, and reprocessing previously executed steps becomes difficult. So we add some persistence. That's the next step. You have persistence because you have all these executions happening behind the scenes.
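To make the hand-written retry logic mentioned above concrete, here is a minimal sketch of a retry helper with exponential backoff. The function name, defaults and the injectable `sleep` parameter are illustrative choices, not something shown in the talk.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    task: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    backoff_rate: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run task, retrying with exponential backoff; re-raise after the last attempt."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            sleep(delay)
            delay *= backoff_rate  # wait longer before the next attempt
```

Every service call in a chain of functions needs a wrapper like this, plus some record of which step already succeeded, which is the bookkeeping the following paragraphs describe.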
And this way we can deal with state, right? We try to manage some kind of coordination and understand which service is being executed at this point in time for the whole execution flow. Because of this, you also have to coordinate all these functions and manage this persistence mechanism. And there's no elegant way of coordinating flow or error handling between these services. Also, not every process is sequential. For example, you could have certain processes that need to run in parallel, or that follow different paths based on the input or on what happens in an earlier step. That's a lot harder to do, and it gets even harder the more successful you are, because more people want to use the flows you've built out. You need to be able to handle errors as they occur. This could be things like retrying calls, or it could be something as simple as following a different path in your workflow. All that said, these are all things that you can still do in code. This is something that has been done in code for quite some time. But what if your flow also needs a human as part of the process? For example, you need a human to review the output of a previous task to see if it's accurate, like a spot check. Or you've built out an application processing flow where the customer has requested a credit limit that exceeds the specified auto-approval threshold. Then you need somebody to come in, review that request, and say yes or no, depending on other data that they have. So that application needs to be routed to a human for this to work, and this continues. As long as you have business processes that need to emulate what happens in the real world, you're going to have this amount of complexity that you need to build as part of your applications. So one approach to managing this complexity is to not write a lot of code for coordination and communication.
Instead, try to visualize your sequences as part of a workflow. And this is where AWS Step Functions comes in. Step Functions is a service that allows you to create workflows. These workflows allow you to move the output of one step to the input of the next step. You can arrange these in a workflow with conditional logic, branches, parallel states, a map state, or even wait states, for example if you're running a job and need to wait for a certain period. Over here you can see a bit of an animation that shows how you can choose a service. You drag it from the left and put it in the design view. Then the logic gets added, and each step or action in the workflow is configured. This also helps you visualize how you can provide error handling and specify a retry and backoff strategy. Step Functions is serverless, so you only pay for what you use. It scales automatically, which also means that it can scale to zero: you're not paying when it's not being invoked. It is fully managed and provides a visual building experience through a drag-and-drop interface called Workflow Studio. The visualization experience extends beyond building, because when you run your workflow you can also visualize its progress, with each step changing colors as it moves forward. Under the hood, this is using code written in Amazon States Language, or ASL. ASL is a domain-specific language and it's JSON based, so you can create your workflows declaratively. We'll show some examples later. You can take that ASL and add it to your deployment pipelines, so you can commit it to your repositories. You can also make pull requests on it so that other team members can collaborate.
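A small sketch of what a declarative ASL definition can look like, written here as a Python dictionary and serialized to JSON. The workflow itself (state names, the `status` field, the 30 second wait) is hypothetical; only the ASL keywords such as `StartAt`, `States`, `Choice`, `Wait`, `Pass` and `Succeed` come from the language.

```python
import json

# Hypothetical workflow: branch on a status field, wait, then loop back.
definition = {
    "Comment": "Illustrative only: check a job status, wait, then finish",
    "StartAt": "CheckStatus",
    "States": {
        "CheckStatus": {
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.status", "StringEquals": "DONE", "Next": "Finish"}
            ],
            "Default": "WaitForJob",
        },
        "WaitForJob": {"Type": "Wait", "Seconds": 30, "Next": "MarkDone"},
        # The output of this state becomes the input of CheckStatus.
        "MarkDone": {"Type": "Pass", "Result": {"status": "DONE"}, "Next": "CheckStatus"},
        "Finish": {"Type": "Succeed"},
    },
}


def dangling_targets(defn: dict) -> list:
    """Return transition targets that do not name a defined state."""
    states = defn["States"]
    targets = {defn["StartAt"]}
    for state in states.values():
        for key in ("Next", "Default"):
            if key in state:
                targets.add(state[key])
        for choice in state.get("Choices", []):
            targets.add(choice["Next"])
    return sorted(targets - set(states))


# The JSON text is what would be committed to a repository and reviewed in a pull request.
asl_json = json.dumps(definition, indent=2)
```

Because the definition is plain JSON, a simple check like `dangling_targets` can run in a pipeline before deployment.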
Now, one of the things customers have told us about Step Functions, because Step Functions has been there for a few years, is that it integrates natively with 220 services, so you can choose a service that you need as part of your workflow and take advantage of the benefits. Step Functions integrates with these services in two ways. The first is SDK integrations and the second is optimized integrations. SDK integrations, as the name implies, are provided by Step Functions integrating directly with the AWS SDK. That's over 10,000 API actions that you can use directly from your workflow without the need to write any custom integration code. Think of the glue code that a lot of folks tend to write when they build serverless applications with Lambda. You can remove a lot of that just by using Step Functions. The other one is optimized integrations. The way they differ from SDK integrations is that each action has been customized to provide additional functionality for your workflow. So beyond just the API call, you also get things like an API output being converted from escaped JSON to a JSON object. Depending on the kind of integration that's being provided, those optimized integrations add that value so that you don't have to write extra code for those manipulations. Now, with any workflow and orchestration, you need certain patterns to be provided, and these integration patterns are something that API actions can be configured with. When you specify your workflow, by default it is asynchronous, so the workflow doesn't wait or block for the action to complete. This is what you call the standard request response call pattern. You start the task or the work to be done, and the workflow moves on to the next step once the service has accepted the request, without waiting for the work itself to complete. This is great because it's efficient.
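The two integration types described above are visible in the task's `Resource` ARN, and the escaped-JSON conversion can be illustrated in a few lines. This is a sketch: the DynamoDB action is just an example, and the `Payload` field name is made up.

```python
import json

# AWS SDK integration: service and API action follow the "aws-sdk:" prefix.
SDK_RESOURCE = "arn:aws:states:::aws-sdk:dynamodb:getItem"
# Optimized integration for the same action: no "aws-sdk:" prefix.
OPTIMIZED_RESOURCE = "arn:aws:states:::dynamodb:getItem"


def is_sdk_integration(resource: str) -> bool:
    """True when the task resource uses the generic AWS SDK integration."""
    return resource.startswith("arn:aws:states:::aws-sdk:")


# Escaped JSON: a JSON document carried inside a string field.
# "Payload" is a hypothetical field name used only for this illustration.
escaped = json.dumps({"Payload": json.dumps({"status": "ok"})})
as_string = json.loads(escaped)["Payload"]  # still a str: '{"status": "ok"}'
as_object = json.loads(as_string)           # parsed into a dict
```

The second `json.loads` is the kind of manipulation that, per the talk, an optimized integration can take care of for you.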
You can continue moving quickly, but sometimes there are cases where you need to wait until the request is complete before you progress. For that there is an integration pattern called run a job, also called sync because of the .sync suffix that's added to the end of the API action. Then you also have a callback pattern. This is what helps us introduce a human into our flow, and we're going to see a bit of that in the architecture later. With these integrations available, you have an idea of how you can take a business process and integrate it across services. But to understand why this is important, let's take an example of a standard serverless application and show why direct integration actually makes more sense. Here's a classic example: you're querying a database. We have a Lambda function that needs to get an item from a DynamoDB table. So from a code perspective, what do I need to get started? I need to import the AWS SDK to interact with the table. Then I need to set up my parameters to tell DynamoDB which table I need to interact with, so the table name, the partition key and the sort key. Then I set up my query inside a try-catch block and return any errors. On top of that, I also need to add the Lambda handler export with my event object and my context object, and then add another try-catch block to catch other errors. I may also need to convert data structures, for example an object to a string. You can see there are a lot of lines of code just to get one item from a DynamoDB table. And each of these lines is a place where something can go wrong. One thing you have to understand is that code is also a liability, right? When you write code, you are responsible for the way it functions. You have to make sure that you're writing it securely, using the right set of dependencies, ensuring that there are no memory leaks, and so on and so forth.
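The Lambda function described above might look roughly like the following Python sketch (the talk's slide appears to use JavaScript; this is not that code). The table name, key attribute names and the optional `table` parameter, which exists only so the handler can be exercised without AWS, are assumptions.

```python
import json

TABLE_NAME = "Orders"  # hypothetical table name


def get_item(table, order_id: str, created_at: str):
    """Fetch one item by partition key and sort key (hypothetical attribute names)."""
    response = table.get_item(Key={"orderId": order_id, "createdAt": created_at})
    return response.get("Item")


def handler(event, context, table=None):
    """Lambda entry point: event object in, HTTP-style result out."""
    try:
        if table is None:
            import boto3  # AWS SDK, imported lazily

            table = boto3.resource("dynamodb").Table(TABLE_NAME)
        item = get_item(table, event["orderId"], event["createdAt"])
    except Exception as err:  # bad input, throttling, permissions, and so on
        return {"statusCode": 500, "body": json.dumps({"error": str(err)})}
    if item is None:
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
    # Convert the item to a string; default=str covers DynamoDB Decimal values.
    return {"statusCode": 200, "body": json.dumps(item, default=str)}
```

Even in this trimmed form there is SDK setup, key construction, error handling and serialization, all for a single read.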
Now, when you look at it from a Step Functions perspective, you have a single step that makes that get item call to a DynamoDB table, and it's just as scalable, right? I can still configure things like retries, and I can still catch any errors and send them to a dead-letter queue if I need to, so that I can retry later. And if you notice, this diagram isn't just a visual representation. It actually shows how a certain action is carried out from start to finish. You can show this to other folks in your engineering team. You can also show it to business stakeholders so that they can understand what a flow looks like. So there's added value, along with the whole idea of errors and retries, and the way it looks when you add the nodes with those integrations is like this, right? You have DynamoDB GetItem, you have SQS SendMessage, and so on and so forth. One other thing: during development, or even when you deploy a step function to production, you need to understand what's happening in the workflow and when things go wrong. The way you do that is through the execution view, where you can see the different executions. You can go into a specific execution and see the different states, what's happening within each state, what the input and output are, and how much time it takes to execute a certain state. This is really critical when there are issues. So it's a great way to get all of that together and see it in a single pane. Now let's dive into an actual use case, right? We have a demo towards the end. I'll also show a couple of demos in the middle about Bedrock and the integration, and then one that looks at an application that uses all of this together. So let's say you have an application that has videos being uploaded, and these videos need to be transcribed, right?
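The single-step version with retries and a dead-letter queue could be expressed in ASL along these lines. This is a sketch under assumptions: the table name, key attribute, queue URL and account ID are placeholders, and the retry settings are arbitrary.

```python
import json

# Placeholder queue URL and account ID, for illustration only.
DLQ_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders-dlq"

definition = {
    "StartAt": "GetItem",
    "States": {
        "GetItem": {
            "Type": "Task",
            "Resource": "arn:aws:states:::dynamodb:getItem",
            "Parameters": {
                "TableName": "Orders",  # hypothetical table
                "Key": {"orderId": {"S.$": "$.orderId"}},
            },
            # Retry with backoff instead of hand-written retry loops.
            "Retry": [
                {
                    "ErrorEquals": ["States.TaskFailed"],
                    "IntervalSeconds": 2,
                    "MaxAttempts": 3,
                    "BackoffRate": 2.0,
                }
            ],
            # Anything still failing is routed to the queue for later reprocessing.
            "Catch": [
                {
                    "ErrorEquals": ["States.ALL"],
                    "ResultPath": "$.error",
                    "Next": "SendToDeadLetterQueue",
                }
            ],
            "End": True,
        },
        "SendToDeadLetterQueue": {
            "Type": "Task",
            "Resource": "arn:aws:states:::sqs:sendMessage",
            "Parameters": {"QueueUrl": DLQ_URL, "MessageBody.$": "$"},
            "End": True,
        },
    },
}

asl_json = json.dumps(definition, indent=2)
```

The `Retry` and `Catch` blocks stand in for the try-catch code of the Lambda version, and they show up as edges in the workflow diagram.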
We already have a service available for that, called Amazon Transcribe. In Step Functions, all I need to do is drag in a start transcription job node and then say, okay, trigger that step function for any video that comes in and do a transcription of that video. So automatic speech recognition happens, and this makes it easy for developers to add speech-to-text capability to their applications. This integration is super powerful. It allows you to have this without any code. Now, let's say I want to do something beyond this, right? I want to take that transcription and add some additional things, and this is where generative AI can help us. I want to create multiple titles and descriptions for a video. I want to ask a human to provide feedback on which of the titles they want, and then also create an avatar for the video. So you have text generation, and you also have image generation happening. The way you do this with Step Functions is through optimized integrations for Amazon Bedrock. There are two new optimized integrations that we have provided, and more have been added since. The first one is invoke model. This InvokeModel API integration allows you to orchestrate interactions with foundational models. You call the API directly through Step Functions. You give it the parameters that are needed, you provide the prompt, and that gets sent to the foundation model. You get the response back and then you can continue using that. The second one is create model customization job. This supports the run a job, or .sync, call pattern that we saw earlier, which means that it waits for the asynchronous job to complete before progressing to the next step in your workflow.
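A sketch of the transcription step mentioned above, using the AWS SDK integration for Amazon Transcribe. The input field `videoUri`, the language code and the choice of the execution name as the job name are assumptions; the parameter names mirror the `StartTranscriptionJob` API.

```python
import json


def transcription_state(next_state: str) -> dict:
    """Build a Task state that starts a transcription job for an uploaded video."""
    return {
        "Type": "Task",
        "Resource": "arn:aws:states:::aws-sdk:transcribe:startTranscriptionJob",
        "Parameters": {
            # Assumption: reuse the execution name as a unique job name.
            "TranscriptionJobName.$": "$$.Execution.Name",
            "LanguageCode": "en-US",
            # Assumption: the workflow input carries the S3 URI of the video.
            "Media": {"MediaFileUri.$": "$.videoUri"},
        },
        "Next": next_state,
    }


# "GenerateTitles" is a hypothetical name for the next state in the workflow.
state_json = json.dumps(transcription_state("GenerateTitles"), indent=2)
```

Starting a transcription job returns before the transcript is ready, so a real workflow would need a later step that waits for or polls the job status before the text is passed to a model.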
Say, for example, you're trying to create a certain customization on top of the foundational model. It'll wait for that, then go to the next step and continue with the process. This is especially useful in data processing pipelines where you are doing some kind of fine-tuning of the model. I'll quickly jump into a demo so that you can see what happens with a standard implementation with Bedrock. Just quickly, if you're getting started with Bedrock, you need to make sure that you have access to the models. Right now you have access to foundational models in two regions, North Virginia and Oregon. When you go to the Bedrock screen, you'll see there's a section called model access. This gives you a list of all the models that are available right now in those two regions. If you're doing this for the first time, you will have to go to manage model access and request access. You'll get it immediately unless it's a brand new model, which can take a bit of time and where you may have to submit certain use cases. In my case, right now I have Claude 3 in the pipeline. I'm waiting for the details to get approved so that I can get access. Claude 3 support in Bedrock just got announced a few days ago. So I have that immediately ready. Now let me jump directly into a workflow. When you go to Step Functions and create a new step function, you're greeted with a blank canvas. You have a state box that's empty over here. In my case I've already dragged in the Bedrock API, and if you want to see the list of Bedrock APIs that are currently available, there are many more now, where you can also manage operations on foundational models if you need to, things like custom models and listings, especially for processing pipelines, MLOps, and so on and so forth. In our case I just want to do an invoke model.
So I'm just going to show you what the configuration looks like. I have foundation models already selected, and this is the list of foundation models that are available, as you saw in the previous screen. In this case I have selected Llama, so Llama 2 is already selected, and now you can configure the parameters that need to be sent. What I'm doing over here is just hard-coding the prompt. In another demo right after this, I'm going to show how you can customize the prompts based on input that you may get from other applications or maybe from the user. In my case, all I'm saying is, okay, there's a transcript from a video in a paragraph. This is the same video you're going to see in the last demo. It is an interview between Amazon's CTO Werner Vogels and former Amazon CEO Jeff Bezos. It's from 2012, so eleven years old, and I'm asking the model to provide a summary of this transcript. So what I'll do quickly is start an execution, and we're going to see what it looks like. I'm not passing any input. It's optional because I've already hard-coded the prompt. Once I run this, within the execution history you can see the actual path and the different steps that are being executed. With the Bedrock invoke model step done, you can see that the input was just the optional input that got sent, and here is the summary that's come back from Llama 2. This is a summary of the transcript. It gives an example of what Jeff Bezos mentioned and what the whole organization was working towards. This was eleven years ago. You also get other values, like the prompt token count and the generation token count. All of this without provisioning any large language models or managing the scaling side yourself. So pretty cool.
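The demo configuration described above corresponds roughly to a Task state like the one sketched below, with a hard-coded prompt sent to Llama 2 through the optimized Bedrock integration. The model ID, the inference values and the sample output are assumptions for illustration; the sample values are made up and are not the demo's actual results.

```python
# Task state with a hard-coded prompt (the placeholder stands in for the transcript).
invoke_llama_state = {
    "Type": "Task",
    "Resource": "arn:aws:states:::bedrock:invokeModel",
    "Parameters": {
        "ModelId": "meta.llama2-13b-chat-v1",  # assumption: one of the Llama 2 models
        "Body": {
            "prompt": "Summarize the following transcript: <transcript text>",
            "max_gen_len": 512,
            "temperature": 0.5,
            "top_p": 0.9,
        },
    },
    "End": True,
}


def summarize_output(task_output: dict) -> dict:
    """Pick the summary text and the token counts out of the task output."""
    body = task_output["Body"]
    return {
        "summary": body["generation"].strip(),
        "prompt_tokens": body["prompt_token_count"],
        "generation_tokens": body["generation_token_count"],
    }


# Made-up output in the shape a Llama 2 response takes; not real results.
sample_output = {
    "Body": {
        "generation": " The interview covers ...",
        "prompt_token_count": 1200,
        "generation_token_count": 85,
        "stop_reason": "stop",
    },
    "ContentType": "application/json",
}
```

Calling `summarize_output(sample_output)` returns the trimmed text together with the two token counts that the execution view shows next to the summary.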
The other thing you'll realize is that with Step Functions you're also able to view the different states and how much time they took to execute. That's really useful, especially if you want to debug certain things. If there are any failures, you can see those errors over here as well. Now, another powerful way of showing what Bedrock is capable of through Step Functions is chaining. This is another demo application. What it does is emulate a conversation that you can have with an LLM, with anything that's doing text, right? For example, you have a chat interface, and with any large language model you always have to provide the context, especially the history of the conversation so far, so that the next response can be based on what came before. So in our case, we're creating a chain, and I'm leveraging another foundational model, Command Text from Cohere. What this does is read a prompt from the input. When you invoke the step function, you can look at the different parameters that are in the object, in the JSON body, and pick them out. In our case, I'm just saying, okay, dollar prompt one, send this as the prompt, and this is the maximum number of tokens. You'll see this is a different syntax for this model versus what was there for Llama 2. And all I'm doing is adding the result of this conversation back to the initial prompts that are coming in, so that we have context throughout the conversation. Now if I go in and execute this, I'll just copy the input from a previous execution, because I want to pass a similar input. I'll start an execution over here.
In my case, I'm passing three prompts. If you noticed, in the states I also had three of them. All I'm saying is: name a random city from Southeast Asia, provide some description for it, and then provide some more description for it. So let's start the execution. As the execution progresses, you're going to see all these states changing colors based on how the foundational model is responding. The first result is already in. It says, okay, here is a random city from Southeast Asia, and it picks Ho Chi Minh City in Vietnam. It packages that in as result one, and then sends it along to the second step with the conversation history. You'll see conversation result two: here are two interesting aspects of the city, and it mentions certain parts of it. And then invoke model three, and the output over here is that it takes in only a certain part of the context. Now, with large language models being nondeterministic, a lot of times you have to be careful with how you send your prompts and ensure that the context is retained. In a previous execution of the same workflow, I was able to get the third prompt to continue with the city, which was previously Ho Chi Minh City. So what I would probably want to do is write my third prompt in such a way that I emphasize clearly that this is the city you're supposed to use. And probably the way I would do that is to have certain fields in my input, like the city, and enforce those as part of the different prompts. But in a nutshell, you can see how you can do chaining in this case, and you can also bring that into a bigger application that uses it. We're going to talk about the architecture of this for the rest of the session.
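The chaining idea above can be sketched outside Step Functions in a few lines of Python: each prompt is sent together with the history of earlier prompts and answers, so later steps keep the context. The `model` callable is a stand-in for a foundation model call, and the way the history is joined with newlines is an assumption, not the demo's exact format.

```python
from typing import Callable, List


def run_chain(prompts: List[str], model: Callable[[str], str]) -> List[str]:
    """Send each prompt with the accumulated conversation history prepended."""
    history = ""
    results = []
    for prompt in prompts:
        full_prompt = f"{history}{prompt}"
        answer = model(full_prompt)
        results.append(answer)
        # Carry the prompt and its answer forward as context for the next step.
        history = f"{full_prompt}\n{answer}\n"
    return results


def echo_model(prompt: str) -> str:
    """Stand-in for a real model: reports how many lines of context it received."""
    return f"(answer based on {len(prompt.splitlines())} lines of context)"


answers = run_chain(
    [
        "Name a random city from Southeast Asia.",
        "Provide some description for it.",
        "Provide some more description for it.",
    ],
    echo_model,
)
```

In the workflow version, each loop iteration is its own invoke model state, and the carried history lives in the state input and output rather than in a local variable.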
So let's continue with that use case of generating titles and descriptions for the videos. Like you saw earlier in the demo, you can select the large language model. In this case, Titan is selected. Under the hood, the ASL for Amazon Bedrock looks something like this, right? There's an invoke model action, and there is a model being selected. It could be Llama, it could be anything else. Then there is a dynamic input coming in, dollar prompt, which means something else is invoking the step function and providing this prompt. You also have inference parameters that allow you to tweak the response that comes back from an LLM, for things like probability. When you invoke the model, you can also provide an input and an output location. For example, a step function can only take 256 KB of content, usually text, so if your input is larger than 256 KB, you can point to an S3 bucket for input and for output. It's a good way to ensure that you're able to scale this application without hitting the constraints of Step Functions. This input and output is then used, and you can change it and continue using it in different states within the step function. One thing you'll realize is that the first requirement mentioned creating multiple titles. Now, we can continue using just the foundational models within AWS. But what if we want to access something that's outside? Let's say, for example, Hugging Face: you want to access a foundational model outside AWS. We want to get the data, send that across, get the response back, and then continue our execution. Now, when you look at accessing a public API in general, it might look simple.
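A sketch of the two parameter shapes discussed above for the Bedrock invoke model state: an inline body with a dynamic prompt and inference parameters, and an S3-based variant for payloads beyond the 256 KB limit. The Titan model ID, the inference values and the bucket names are assumptions, and the size helper only approximates the limit by measuring the JSON-encoded payload.

```python
import json

MAX_STATE_PAYLOAD_BYTES = 256 * 1024  # the 256 KB limit mentioned in the talk

# Inline variant: the prompt comes from the workflow input ($.prompt).
INLINE_PARAMETERS = {
    "ModelId": "amazon.titan-text-express-v1",  # assumption: a Titan text model
    "Body": {
        "inputText.$": "$.prompt",
        "textGenerationConfig": {
            "maxTokenCount": 512,
            "temperature": 0.7,
            "topP": 0.9,
        },
    },
}

# S3 variant: request and response live in S3 (placeholder bucket and keys).
S3_PARAMETERS = {
    "ModelId": "amazon.titan-text-express-v1",
    "Input": {"S3Uri": "s3://example-bucket/input/request.json"},
    "Output": {"S3Uri": "s3://example-bucket/output/response.json"},
}


def fits_in_state_payload(payload: dict) -> bool:
    """Rough check: does the JSON-encoded payload stay within 256 KB?"""
    return len(json.dumps(payload).encode("utf-8")) <= MAX_STATE_PAYLOAD_BYTES
```

A caller could use `fits_in_state_payload` before starting an execution to decide whether to pass the prompt inline or upload it to S3 first.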
Then the first question that comes up is what kind of authentication you need. Is it basic authentication? API keys? OAuth? Is there anything else, token management for example? You also want to make sure you're storing the secrets safely, because maybe there's an access key for the API and you need to keep it somewhere. Then you have input and output handling. You also need graceful retries if something goes wrong, then rate control, and so many other things. The way you would do that with AWS Lambda, or maybe a container or a virtual machine on EC2, is that you would have your code running, and it would call different services to fetch the credentials, manage the token, retrieve the request data, invoke the API, get the data back, and maybe store it somewhere else if needed. This is what a resilient application would look like. Another way to do this, without writing code, is the public HTTPS API integration in Step Functions. Step Functions can call virtually any SaaS application from a workflow through its integration with HTTPS endpoints. So without using a Lambda function, you can invoke an API on Hugging Face, for example, or other APIs like Stripe, Salesforce, GitHub, or Adobe. With this low-code approach, Step Functions gives you a way to connect AWS services with services that are outside AWS. And you can take advantage of Workflow Studio, because now you're dragging and dropping all of these things and putting them together as part of the workflow without changing or managing any code. With such requests, you put in your JSON object, and in the request body you describe the data you are sending and what you are trying to retrieve as part of that transformation.
When you integrate with HTTP APIs this way, you can also manage errors through Step Functions; like we saw, you have that ability to do error handling. You can manage authorization as part of the integration. You can also specify data transformations, because Step Functions already provides that for optimized integrations, so you can use it for things like URL encoding of the request body. There's also a test state feature that lets you execute one specific state without deploying the whole state machine. So you can run just that state as a test and make sure you're getting the kind of response you need. The Task state gives you that single unit of work. You can do an HTTP invoke, and you can see the existing Resource field with the new HTTP invoke option. You can also provide things like the method being invoked, the authentication field, and the parameters block. Under the hood, this uses another service, Amazon EventBridge. EventBridge is used for API destinations, because it has the ability to send requests to an API destination, so the same connection resource is used here. A lot of these parameters are optional. When you're invoking an API and just getting a response back, you don't need to pass any query parameters. In this case, we can add request headers and anything else that's needed as part of the request. Now let's go back to our requirement. Since we want to generate multiple titles, we want to get one title from our own model on Bedrock and one from Hugging Face. So we have a Parallel state.
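For reference, an HTTP Task state of the kind described above can be sketched as follows. The Python helper builds the state definition as a dictionary; the helper name `http_task_state`, the endpoint URL, the connection ARN, and the request body are placeholders I made up for illustration, not values from the talk.

```python
import json
from typing import Any, Dict, Optional


def http_task_state(
    api_endpoint: str,
    connection_arn: str,
    method: str = "POST",
    request_body: Optional[Dict[str, Any]] = None,
    headers: Optional[Dict[str, str]] = None,
    query_parameters: Optional[Dict[str, str]] = None,
    url_encode_body: bool = False,
    next_state: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a Task state that calls a third-party HTTPS API without Lambda."""
    params: Dict[str, Any] = {
        "ApiEndpoint": api_endpoint,
        "Method": method,
        # Credentials live in an EventBridge connection, not in the workflow.
        "Authentication": {"ConnectionArn": connection_arn},
    }
    merged_headers = dict(headers or {})
    if url_encode_body:
        # Ask Step Functions to URL-encode the request body.
        params["Transform"] = {"RequestBodyEncoding": "URL_ENCODED"}
        merged_headers["content-type"] = "application/x-www-form-urlencoded"
    # Everything below is optional and only added when supplied.
    if merged_headers:
        params["Headers"] = merged_headers
    if query_parameters:
        params["QueryParameters"] = query_parameters
    if request_body is not None:
        params["RequestBody"] = request_body

    state: Dict[str, Any] = {
        "Type": "Task",
        "Resource": "arn:aws:states:::http:invoke",
        "Parameters": params,
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


if __name__ == "__main__":
    state = http_task_state(
        # Placeholder endpoint and ARN; replace with real values.
        api_endpoint="https://example.com/models/title-generator",
        connection_arn="arn:aws:events:us-east-1:123456789012:connection/example/abc",
        request_body={"inputs.$": "$.prompt"},
    )
    print(json.dumps({"GenerateTitleExternally": state}, indent=2))
```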
Through Step Functions, this lets us use both foundation models, and you simply configure the endpoints on the right-hand side. The Parallel state will then execute and invoke each task in parallel. It requires that every branch completes successfully for the Parallel state to be considered successful. Now, what happens if one of these branches doesn't complete successfully? What if something goes wrong? Maybe there's an issue in the call to one of our two FMs; errors happen for various reasons. If it's a transient issue such as a network interruption, you want to be able to retry, and maybe retry a couple of times. You also configure something called a backoff rate to make sure you don't overload the third-party system. For these momentary blips, you just need to make sure you have a retry mechanism of some sort. But what if the underlying error requires a longer investigation or a longer resolution time? Maybe it's not under your control, maybe it's independent of your team, and somebody else is managing it, maybe even a third party. What may happen is that you exhaust your retry strategy and eventually that workflow step fails. So you want to be able to run this entire workflow, but if you can't, you want to move it to an error state, or move it somewhere else so that you can retry it later. If you want to visualize this, this is basically what it looks like. You have a step that kicks off a parallel workflow. This parallel workflow has two branches, so you have Bedrock on the left and Hugging Face on the right. Let's say we invoke the foundation model, and we have some transformations we want to do using another AWS service, and that is an extra step.
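The retry and fallback behaviour described here maps onto the `Retry` and `Catch` fields of a state definition. The sketch below, in Python, wraps two placeholder tasks in a Parallel state, gives each task a retry policy with a backoff rate, and routes any remaining failure to an error-handling state. The helper names, state names, and retry numbers are assumptions chosen for illustration.

```python
import json
from typing import Any, Dict, List


def with_retry(task: Dict[str, Any], interval: int = 2, attempts: int = 3,
               backoff: float = 2.0) -> Dict[str, Any]:
    """Return a copy of a Task state with a retry policy for task failures."""
    retried = dict(task)
    retried["Retry"] = [{
        "ErrorEquals": ["States.TaskFailed"],
        "IntervalSeconds": interval,   # wait before the first retry
        "MaxAttempts": attempts,       # give up after this many retries
        "BackoffRate": backoff,        # multiply the wait after each retry
    }]
    return retried


def retry_delays(interval: int, attempts: int, backoff: float) -> List[float]:
    """Seconds waited before each retry under the policy above."""
    return [interval * backoff ** i for i in range(attempts)]


def parallel_state(branch_tasks: Dict[str, Dict[str, Any]], next_state: str,
                   on_error: str) -> Dict[str, Any]:
    """Run every task in its own branch; any branch failure goes to `on_error`."""
    branches = [
        {"StartAt": name, "States": {name: {**with_retry(task), "End": True}}}
        for name, task in branch_tasks.items()
    ]
    return {
        "Type": "Parallel",
        "Branches": branches,
        # Reached only after a branch has exhausted its retries.
        "Catch": [{"ErrorEquals": ["States.ALL"], "ResultPath": "$.error",
                   "Next": on_error}],
        "Next": next_state,
    }


if __name__ == "__main__":
    # Placeholder tasks; real ones would carry their own Parameters.
    tasks = {
        "BedrockTitle": {"Type": "Task", "Resource": "arn:aws:states:::bedrock:invokeModel"},
        "ExternalTitle": {"Type": "Task", "Resource": "arn:aws:states:::http:invoke"},
    }
    print(json.dumps(parallel_state(tasks, "HumanReview", "HandleFailure"), indent=2))
    print(retry_delays(2, 3, 2.0))
```

Putting the retry policy on each task, rather than on the Parallel state itself, means a failing branch is retried on its own instead of re-running both model calls.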
But let's say there is a failure in the transformation, because we invoked something on Hugging Face and, when we get the result back, something's not working. And this job needs to continue. There's some transformation that needs to happen, and we have stopped here before moving to the next step, which is a human review. This is where you have the option of redrive. Redrive lets you easily restart workflows, maybe because you have figured out there's a problem, it has been resolved, and you want to run that workflow again. You can recover from failure faster, and you only pay for what you need, so you don't have to keep retrying unless it's really necessary. The way this works is that you have these two branches, Bedrock on the left and Hugging Face on the right. Let's say that when we invoke, we do the transformation, but it fails in the transformation step. It gets fixed, and then you come back and redrive the execution, and this time the remaining work kicks in because the steps that already succeeded are not run again. Then it goes into the human review step if needed. One of the other things you want as part of workflows is observability. Execution event history is very important here, because you have different states coming in and events being fired. You want to be able to filter and drill down to what's actually happening within your workflow. This is where you can see the ExecutionRedriven event, and it also shows a redrive count of how many times you have retried that execution through redrive. I think it's a great way to understand how you can manage events, especially errors in this case, and make sure your workflows are able to continue properly.
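Redrive can also be triggered programmatically. The following Python sketch checks an execution's status and redrives it only when it ended unsuccessfully. It takes the client as a parameter so it can be exercised without AWS access; with boto3 you would pass `boto3.client("stepfunctions")`. The helper name `redrive_if_failed` and the example ARN are hypothetical.

```python
from typing import Any

# Execution statuses that indicate an unsuccessful end state.
REDRIVABLE_STATUSES = {"FAILED", "ABORTED", "TIMED_OUT"}


def redrive_if_failed(sfn_client: Any, execution_arn: str) -> bool:
    """Redrive an execution from its point of failure, if it failed.

    Returns True when a redrive was requested, False otherwise.
    """
    info = sfn_client.describe_execution(executionArn=execution_arn)
    if info["status"] not in REDRIVABLE_STATUSES:
        return False
    # How many times this execution has already been redriven.
    print(f"Previous redrives: {info.get('redriveCount', 0)}")
    sfn_client.redrive_execution(executionArn=execution_arn)
    return True


if __name__ == "__main__":
    # Minimal stand-in client so the sketch runs without AWS credentials.
    class FakeStepFunctions:
        def describe_execution(self, executionArn: str) -> dict:
            return {"status": "FAILED", "redriveCount": 1}

        def redrive_execution(self, executionArn: str) -> dict:
            return {}

    arn = "arn:aws:states:us-east-1:123456789012:execution:example:run-1"
    print(redrive_if_failed(FakeStepFunctions(), arn))
```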
Now, with multiple titles out of the way, let's talk about asking a human to provide feedback. Having a human approval step in an automated business process is super common. You have it in any approval flow, probably in the banking space or the financial space. You probably also have a human in the loop for a foundation model that you have custom built or fine-tuned, where you want to check the responses that are coming in. Maybe you have an A/B flow happening, and for a few of the requests coming in, you want a human review to happen. So the requirement is super simple, but the possibilities are endless. Step Functions integrates with services in multiple ways. One of them is for long-running jobs, where you want to wait for the job to complete, and we'll use this integration pattern to achieve the requirement. What you do is make a call to a service integration. This passes a unique token, and the token gets embedded in an email. It could also go to an on-premises legacy server, or to a long-running task in a container, for example. Once the response has been reviewed and the reviewer clicks to go ahead or not, it returns through the Step Functions API, SendTaskSuccess, and the workflow continues from there. So as part of the send-and-wait workflow, a token is sent out, like I mentioned earlier, along with the email notification. For this use case, the reviewer is given options: choose the title generated by Amazon Bedrock, choose the title generated by Hugging Face, or regenerate.
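On the receiving side, whatever handles the reviewer's click has to hand the task token back to Step Functions. Here is a minimal Python sketch of that step, assuming the waiting state embedded the token (available in the state definition as `$$.Task.Token`) in the review link. The helper name `complete_review`, the `decision` and `chosenTitle` output fields, and the stand-in client are illustrative assumptions; with boto3 you would pass `boto3.client("stepfunctions")`.

```python
import json
from typing import Any, Optional


def complete_review(sfn_client: Any, task_token: str,
                    chosen_title: Optional[str] = None) -> dict:
    """Resume the paused workflow with the reviewer's decision.

    If no title was chosen, the workflow is told to regenerate; a Choice
    state after the waiting task can branch on the "decision" field.
    """
    if chosen_title:
        output = {"decision": "approve", "chosenTitle": chosen_title}
    else:
        output = {"decision": "regenerate"}
    # The output must be a JSON string; it becomes the task's result.
    sfn_client.send_task_success(taskToken=task_token, output=json.dumps(output))
    return output


if __name__ == "__main__":
    # Stand-in client so the sketch runs without AWS credentials.
    class FakeStepFunctions:
        def send_task_success(self, taskToken: str, output: str) -> dict:
            print(f"token={taskToken} output={output}")
            return {}

    complete_review(FakeStepFunctions(), "example-token", "A Conversation on Innovation")
    complete_review(FakeStepFunctions(), "example-token")
```

If the reviewer's action should instead be treated as an error, the same client also offers `send_task_failure`, which makes the waiting state fail so that `Catch` or redrive can take over.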
Now the last part of this requirement is creating an avatar for the video, which is basically an image in this case. With machine learning models, especially in the foundation model space, you have built-in algorithms, and there are also pre-built ML solutions already available. You could invoke a third-party API again for this, and there are multiple ways to do it. As part of Bedrock, you also have access to Stability AI's diffusion models, so you can use those as part of this step. In the end, once you have the feedback, you can generate the avatar for the video, store it in an S3 bucket, and share the link later. Now, one thing you'll realize when you build such a complex workflow, especially with foundation models, is that you want this idea of creating chains of prompts, which we saw in the demo. This is a technique of writing multiple prompts and responses as a sequence of steps. A Step Functions workflow is a great way to implement chaining. Step Functions simplifies the way you invoke your foundation models, and you have state and dependency management already in place, so you can create chains easily. As we saw earlier, you can pass the response of one state to the next state, or only specific parts of the prompt if you need to. And all of this is, again, serverless. So think of use cases like writing blogs and articles, response validation, and the conversational LLMs that we see a lot these days. Now, if you take all of what we have seen and put it into an architecture, this is what it looks like.
For example, you have an API Gateway endpoint that a user invokes through an application. That puts an event into a queue, and the event in the queue gets picked up by a Lambda function, which triggers the Step Functions workflow. In this case, a lot of these steps are already in place as part of the workflow. It sends the title and description back to the user, and afterwards you can send the chosen title and description through the human workflow, if needed, for the review part. Then, in the final part where the avatar is generated, you get an S3 presigned URL, because the avatar image is generated and then put in an S3 bucket. So here's a demonstration of this final architecture. There's a short video of an interview between Jeff Bezos and Werner Vogels, and we want to generate a title, a description, and an avatar for this video. There's the simple UI that you saw earlier. It uses WebSocket communication to talk to the AWS IoT Core service. Once you select the button, it sends the video's details, and the workflow gets executed from the Lambda function. Then you see the steps start kicking in. This gives a nice view of the execution, with color coding of the states. With Transcribe being used initially, you get the text back for the speech in the video. This transcription job is asynchronous, so there is a wait loop to make sure we wait for it to complete. That's the wait loop at the beginning. Once the wait loop is done, we get the final response from the transcription job using the GetTranscriptionJob API, and then we read the transcript.
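The wait loop around the asynchronous transcription job can be expressed with a Wait state, an SDK-integration Task, and a Choice state. The Python sketch below builds those states as a dictionary; the state names, the ten-second interval, and the assumption that the state input carries the job under `$.TranscriptionJob` are mine, not taken from the demo.

```python
import json
from typing import Any, Dict


def transcribe_wait_loop(next_state: str, wait_seconds: int = 10) -> Dict[str, Any]:
    """States that poll Amazon Transcribe until the job completes or fails."""
    status_path = "$.TranscriptionJob.TranscriptionJobStatus"
    return {
        "WaitForTranscription": {
            "Type": "Wait",
            "Seconds": wait_seconds,
            "Next": "GetTranscriptionJob",
        },
        "GetTranscriptionJob": {
            "Type": "Task",
            # AWS SDK integration: calls the Transcribe API directly.
            "Resource": "arn:aws:states:::aws-sdk:transcribe:getTranscriptionJob",
            "Parameters": {
                "TranscriptionJobName.$": "$.TranscriptionJob.TranscriptionJobName"
            },
            "Next": "IsTranscriptionDone",
        },
        "IsTranscriptionDone": {
            "Type": "Choice",
            "Choices": [
                {"Variable": status_path, "StringEquals": "COMPLETED",
                 "Next": next_state},
                {"Variable": status_path, "StringEquals": "FAILED",
                 "Next": "TranscriptionFailed"},
            ],
            # Still queued or in progress: go around the loop again.
            "Default": "WaitForTranscription",
        },
        "TranscriptionFailed": {
            "Type": "Fail",
            "Error": "TranscriptionFailed",
            "Cause": "The transcription job reported a FAILED status.",
        },
    }


if __name__ == "__main__":
    print(json.dumps(transcribe_wait_loop("ReadTranscript"), indent=2))
```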
That transcript is already available in an S3 bucket, because the transcription job puts it there. Once it is read, it is passed down to the parallel execution. As part of the parallel execution, we have two calls being made: one to one of the foundation models on Bedrock, and one to Hugging Face. In this case, we're keeping it simple; we just want to make sure we're able to execute this, and then we want to get user feedback. Now, quickly, just to show you what the outputs look like: these are the inputs coming in. This is the transcript and the prompts. You'll notice that the models being invoked are also there. Here are the parameters, here's the conversation from the video, the S3 bucket URL for the video, and other things. And there's a task token already there. This is part of the review flow being invoked. The task token is sent to the page, and this page is where someone can go in and say whether they want to select the first title or the second one. So we select one of the titles, and then the workflow moves on and creates an avatar based on that title. It sends the title as part of a prompt to one of the foundation models, Stability in this case, and once you have the image, it gets displayed here. That's the avatar. The team that uses it can copy this image and put it in place, because it's already in the S3 bucket, or it gets picked up by another flow as part of their content publishing pipeline. Now, to learn more about how you can build applications like this, there is a sample already available that has different use cases. It also covers things like error retries.
It covers prompt chaining and all the other parts of creating a complex workflow with Step Functions for generative AI. So have a look at this resource. It's a great way to get started, along with the blog posts that are linked from it. With that, I would like to thank you for attending the session. Have a good day and enjoy the rest of Conf42. Thank you so much.


Conf42 Cloud Native 2024 - Online

March 21 2024


Mohammed Fazalullah

Senior Developer Advocate @ AWS
