Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hi everyone, my name is Sandeep, and welcome to my session at Conf42.
Today we're going to be talking about the different ways in which you can
integrate AI into your application.
So let me share my screen and we can get started.
A big shout out to Conf42 for selecting my talk.
It helps me spread the knowledge with the community.
So — LLMs, right from data to constructive insights.
We're going to be looking at a sample application, and we will talk through
its journey — how we have evolved the usage of AI in that application.
My name is Sandeep.
I work as a principal solution architect at AntStack.
I'm also an AWS Community Builder, and I've been certified as an AWS
Solutions Architect Professional. I love building serverless applications.
I've been building serverless applications for the last six years,
and once you start building serverless applications, there's no turning back.
So let's get started.
Like I mentioned before, AntStack is a company that builds serverless
applications for our clients, and we have a large engineering team as well.
Now with this comes the next set of problems: we have multiple project
managers who run different sprint schedules. None of them are in common,
and each of them has their own way of collecting feedback from the
developers at the end of each sprint. So this data was pretty much spread
out, and it was not usable because it's not centralized and nobody knows
what's happening.
And the other problem we had is our folks did not like to give negative
feedback. We don't know why, but every time we asked them to give feedback,
it was always positive. Everything is kick-ass. And if we tried giving them
a scale — rate your peers between one and ten — everybody got a ten.
Every single person.
So we came up with a solution: build an internal feedback application.
What it does is provide a simple interface which allows you to create
projects and manage them, and it also sends a notification to the user
when the feedback cycle has started: now you need to provide feedback to
your peers. And all the data is, again, stored in one single place.
So we have access to the data at any given point in time and everything is
now centralized, but we still had the other problem, which was everybody
getting 10/10 ratings. That's when we started experimenting with Bedrock
and AI to see how it could help us out.
So we started with the playground. What we did is we started testing out a
simple prompt, which basically says: I'm going to give you a feedback; now
you need to categorize it into three different categories — positive
energy, productivity, and reliability — and rate them between one and
five, one being the lowest and five being the highest; and here's the
feedback that the user has received.
So to do this, we started using Bedrock. Bedrock has a nice chat interface
which allows you to test out different prompts. In the Amazon Bedrock chat
playground, you can select different AI models, provide the same prompt,
and see what kind of responses each model is providing. Seeing them side by
side helps you understand the kind of responses each model is giving, and
you can select the right model for you.
What it also provides is the metrics: the latency, how many input tokens
it's consuming, how many output tokens it's producing. This also helps you
map out what each model is going to cost you, so you know which one you
should be using for your use case.
With this being done, we got the prompt working.
The problem was consistency.
As you all know, writing a prompt is just the first step
in building your application.
The prompt has to be consistent irrespective of what kind of
inputs you're going to give to it.
The prompt must always deliver what it's supposed to deliver.
So that is when we started stepping into prompt engineering.
Now, these are the basic rules that we follow for prompt engineering.
First thing you gotta do is set a persona or a role for your AI. In our
case, the role of the AI is to be an evaluator, which basically takes the
input — the feedback that the user has provided — extracts the elements of
it across reliability, productivity, and positive energy, and quantifies
each of them.
Next, you provide an action. So you basically tell the AI how it needs to
do the task, and you provide the positive and negative cases. In our case,
we had three categories, and if a comment was not applicable to one of
these categories, it would rate it minus one. So these are some of the
negative cases we had inputted, so the AI can let us know what exactly the
feedback is referring to.
Next, we provide the variables. In our case, the variable is just the
feedback that is input from the user. And then we also set the output
format. We wanted a JSON output format which has these three items —
productivity, positive energy, and reliability — and each should have a
value between one and five; negative cases are going to have minus one.
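To make that concrete, here's a minimal sketch of what such an evaluator prompt and call can look like in Python, assuming boto3's Converse API and a Claude model ID — the wording and helper are simplified illustrations, not our exact production prompt:

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

# Persona + action + negative case + output format, with {feedback} as the variable.
EVALUATOR_PROMPT = (
    "You are an evaluator of peer feedback. Categorize the feedback below "
    "into positive_energy, productivity, and reliability, and rate each "
    "between 1 (lowest) and 5 (highest). If the feedback does not fit a "
    "category, rate that category -1. Respond with JSON only, like "
    '{{"positive_energy": 4, "productivity": 3, "reliability": 5}}.\n\n'
    "Feedback: {feedback}"
)

def evaluate_feedback(feedback: str) -> dict:
    response = bedrock.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # assumed model choice
        messages=[{
            "role": "user",
            "content": [{"text": EVALUATOR_PROMPT.format(feedback=feedback)}],
        }],
    )
    # Assumes the model obeys the JSON-only instruction.
    return json.loads(response["output"]["message"]["content"][0]["text"])

print(evaluate_feedback("Always unblocks the team and ships on time."))
```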
Now, to do this in Amazon Bedrock, you have a prompt builder playground,
which allows you to create a prompt and test out all the different
variations of it as well. So you can set a prompt, you can set the
variables that are there in the prompt — in our case, that's the feedback —
and you can test it out.
It also allows you to create different variants of your prompt so you can see
which prompt is working best for you.
You can test it out against the same model, or you can test it
out against different models based on your requirement.
Now, if you look at this example: on the left side of the screen, we are
asking the prompt to be lenient with the ratings, and on the right side,
we are asking it to be strict. It is just this one small line of difference
which affects the output a lot. You can see the responses that are coming
from the AI — it's the same model, but there's only one line of difference.
This is how you experiment with different variants of your prompts and see
which is the best prompt that works for you.
Now, with all this being done, we created a chatbot interface where the
user can provide the input — provide the feedback — and three months go by.
We have a lot of data: about 500 feedbacks across the organization. Now we
start to think about what we can do with this data — can we make it more
insightful to the people so they can start improving themselves? And that
is when we started looking into vector databases and RAG.
Before we jump into what we did with the application, let's go through some basics.
What is a vector? A vector is basically a mathematical representation of
the data that you have provided. To oversimplify, it's basically an array
of numbers.
A dimension is a property of the data that you're providing. So let's say
we are providing a fruit, which is an apple or an orange, and it has
different properties: What is the color? What is the sweetness? What is the
sourness? Each of these becomes one of the dimensions, which is basically a
characteristic of the data that you're providing.
And what are indexes? Indexes are the entry points into your database for
the search process. So let's say you have an index for fruits; when you
provide the input, it's going to search on these indexes and find out what
you're looking for.
Okay, now what is embedding? Embedding is basically the process where your
raw data — text or any other kind of data — is taken, converted into
vectors, and stored in the database under the right indexes. This entire
process is called embedding.
And what is RAG? Basically, when you ask a question like "list some red
colored fruits," the AI understands the intent of your question and creates
a vector out of it. Now that vector is used against the indexes that got
created, and it does a nearest neighbor search — there are multiple search
patterns, but nearest neighbor search is the most popular one — and it
identifies the data that you're looking for and provides the response. In
this case, it will take that vector, find the matching red fruits, and
return them.
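As a toy illustration of those terms — vectors, dimensions, and nearest neighbor search — here's a tiny self-contained Python example with made-up numbers (no Bedrock involved):

```python
import numpy as np

# Each dimension is one property of the fruit — color (0 = green, 1 = red),
# sweetness, sourness — on a 0-1 scale. Values are invented for illustration.
fruits = {
    "apple":  np.array([0.9, 0.7, 0.3]),
    "orange": np.array([0.4, 0.6, 0.5]),
    "lime":   np.array([0.0, 0.2, 0.9]),
}

def nearest_neighbor(query: np.ndarray) -> str:
    # Cosine similarity: the fruit whose vector points in the closest direction wins.
    def score(v):
        return np.dot(query, v) / (np.linalg.norm(query) * np.linalg.norm(v))
    return max(fruits, key=lambda name: score(fruits[name]))

# "List some red colored fruits" would embed to something heavy on the color dimension.
print(nearest_neighbor(np.array([1.0, 0.5, 0.2])))  # -> apple
```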
So if you were to do this in Amazon Bedrock, there's a very easy way to get
started: you can start with something called "chat with your document."
Bedrock has this functionality called Knowledge Bases, and over there you
can create different knowledge bases — or you can start with the easy one,
chat with your document. As it sounds, you just upload a document in the
portal and you can start chatting with it. It gives you a nice chat
interface. You can provide a custom prompt if you want, but that's
basically it: you just upload a file and ask questions about that file.
Now, to create your own database, there are multiple options. Firstly, you
can create a knowledge base with a vector store, which is the most common
pattern. But with the recent re:Invent announcements, they also improved
these functionalities, so now you can create a knowledge base with
structured data stores like Amazon Redshift, which is the data warehousing
solution, or you can also create them on Kendra GenAI indexes.
So for our use case, we started with the simple one, which is the vector
store. The way you do it is you provide a data source; in our case, we used
S3. All the feedback that we got from the users, we started uploading into
an S3 bucket, and that's the source we provided for the knowledge base.
Next, you have to select the parsing strategy. Now, what is the parsing
strategy? If your data consists of text or other simple data, then the
default Amazon Bedrock parser is good enough. But if your data is
complicated — it has some kind of media content: images, videos, audio,
anything like that — then you probably need to use another model to do the
parsing. For our use case, we used the default parser.
Next comes the chunking. Chunking is basically a way of splitting your data
into smaller sizes and getting it stored in the database. Think of it like
a record in a regular database, except that one chunk is stored as one
entry in the vector database. So this is very important.
So here's what happened in our case. We started with default chunking: we
selected default chunking, gave the S3 bucket as the source, and it did the
chunking. And when we asked a question — can you provide feedback about
this user — it did provide the feedback, and it did summarize it really
well. But the problem is it summarized the feedback across all the users.
So if I'm asking for feedback for Sandeep, instead of giving just my
feedback, it was giving feedback for everybody else as well. And that's
when we understood that the default chunking we were using was not working
out.
Then we tried different options and eventually ended up with no chunking.
When you select no chunking, every single file you provide in the S3 bucket
is considered as one single chunk. So my file basically becomes one single
chunk, and everybody else's does as well. Now when I ask a question about
my feedback, it'll query only my set of feedback, summarize it, and answer
the question that I'm asking.
Now, if you want some advanced chunking strategy which none of the current
ones fit, you can also use a Lambda function to do some kind of custom
parsing and add your own logic on how you want to chunk these files.
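For instance, the preprocessing side of our "one file per user" setup could look roughly like this — a sketch assuming a hypothetical bucket name and record shape, so that the no-chunking strategy turns each user's file into exactly one chunk:

```python
import json
from collections import defaultdict
import boto3

s3 = boto3.client("s3")
BUCKET = "feedback-knowledge-base"  # hypothetical bucket name

def write_per_user_files(feedbacks: list[dict]) -> None:
    """Group feedback by recipient and write one S3 object per user."""
    by_user = defaultdict(list)
    for fb in feedbacks:
        by_user[fb["email"]].append(fb["comment"])
    for email, comments in by_user.items():
        s3.put_object(
            Bucket=BUCKET,
            Key=f"feedback/{email}.json",
            Body=json.dumps({"email": email, "feedback": comments}),
        )
```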
Next, you select the embedding model. Like I said, your data needs to be
converted into vectors, and that is done by an embedding model. In our
case, we used Titan Embeddings, but there are other offerings available as
well.
Now, when you select the embedding model, have a look at the vector
dimensions it creates. In this case, Titan Embeddings creates 1,536
dimensions. So every single piece of data that you provide is split into
1,536 dimensions, each of them having its own properties, which signify a
particular characteristic of the data that you're providing.
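A minimal sketch of generating one of those embeddings yourself, assuming the Titan Text Embeddings model via the InvokeModel API:

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

def embed(text: str) -> list[float]:
    # Titan Text Embeddings takes {"inputText": ...} and returns the
    # vector under the "embedding" key.
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]

vector = embed("Great collaborator, always reliable.")
print(len(vector))  # 1536 dimensions for this model
```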
Next, you select the vector database that you want to create. Now, AWS has
made it easy: you can just select quick create and it creates a database
for you — you don't have to configure much. But if you're an advanced user,
you can select a vector database that you have created yourself and provide
the specific connection information for it.
When you select quick create, there are a couple of options. Amazon
OpenSearch is the default and the most widely used vector store, but with
the recent announcements there is also support added for PostgreSQL and
Amazon Neptune, which is a high-end graph analytics database on which you
can run graph queries. GraphRAG is like an advanced RAG: RAG does
relationships on one level; graph does it on two levels. So that's the
simplified version of it.
For our use case, we started testing with OpenSearch Serverless. If you
want to create your own vector database instead, these are the options that
are available for you. Once you select the database on a quick create, all
you have to do is click next, and your data source is created.
Now, after the data source is created, you need to do something called a
sync. That's when Amazon Bedrock actually takes the files from S3, creates
the embeddings, and stores them in the database. Once that is done, you can
start querying your database.
Now, there are multiple search options available. Hybrid search worked for
us because it searches both on semantics and on text, but it's again left
to you based on your data.
Next comes the number of source chunks. Our use case was very specific: we
want only one particular file for one user, so the number of chunks it had
to return is only one. But based on your use case, you can increase it up
to a hundred and retrieve the data that you are looking for.
And then you select the model that you want to generate the responses with.
Next, the knowledge base also has a custom prompt that you can provide.
By default, when you're trying to query a knowledge base, there's a
built-in prompt that Amazon uses. If for some reason that doesn't work out
for you and you want to tell the knowledge base how it needs to query the
database, you can do that. You can provide a custom prompt where you
mention how the data needs to be understood and what kind of questions the
users are asking, to make more context out of it and give you the best or
most relevant answers.
So this is what the result looks like. Our queries have two things: email
and query — who's requesting it, and what the query is. Let's say in this
case Manik is asking: provide my top three points of improvement. What this
does is it takes Manik's file, which has all the feedback he has received
so far, identifies the intent of the question, summarizes all of it, and
gives the feedback.
Same with the next user: we provide the email ID and ask for, say, points
of improvement and things like that. So it is going to take the feedback
that user has received, and it's going to summarize it based on the
question that has been asked.
Now, with all this in place, we enabled this chatbot for the admins. As an
admin, I can query other people's feedback to understand how they're doing
and what improvement points I need to work on with them.
So this essentially started as a mentor-mentee relationship. What we
started doing is, as mentors, we would query our mentees' feedback and ask
questions like: How are they doing? What are the things they need to
improve on? Then we get a summarized result based on what everybody has
said about them, and we take those points and start guiding the mentee on
how they should improve themselves.
So this is how it started. And then we thought: why not just enable it for
the users themselves? Then they can ask questions, see how they're doing,
and improve themselves. We don't need to have these feedback sessions every
time; they can ask the question whenever they want and work on the points
that they need to improve on.
But there was one catch: the current chatbot allows anybody to request
anybody else's feedback. So that's when we started looking at agents.
As an oversimplification, an agent is basically a custom functionality that
you want to put into the thought process of an AI. In Amazon Bedrock, you
have Agents, where you can create the specific agent that you're looking
for and select the model that it needs to run on.
And the beauty of it is that the actual action execution happens inside an
AWS Lambda, which is a serverless component, and you can create multiple
Lambdas to be used in the same agent. And you can chain them all using the
prompt that you're writing. So when you create the agent, you provide the
prompt, which says how it needs to interact with these action groups to
provide the response that you're looking for.
But in our case, it's very straightforward. We are telling the AI agent:
you are going to receive a query, and this is the flow you need to follow.
Your flow is: identify the user email, identify what sort of question
they're asking — is it a question about themselves, or are they asking a
question about other users — then create these two inputs and call an
action group.
An action group basically means it needs to invoke a Lambda. This Lambda is
going to get these two inputs: the email and what kind of query they're
asking.
of admin users and regular users.
Now, if the question is by an admin, we allow the request.
But if a user, regular user is asking a question for other people's feedback.
We deny the request and we have instructed the agent in such a way
that based on the response it receives from the action group, it should either
allow the request or deny the request.
So when you create the agent, the first thing you do is provide
instructions to the agent. You can also enable memory — by default it is
disabled, but if you want to have persistent sessions, you can enable
memory and consume it accordingly.
Next, you create the action groups. Action groups are basically a bunch of
Lambdas. You provide the name of the action group in the instructions that
you write for the agent, and you tell it what it needs to do, how it needs
to call this action, and what the next action it needs to take is.
Apart from this, you can also provide knowledge bases to an agent — the
knowledge base that we created previously, you can add it over here — and
now the agent has access to both the action groups and the knowledge base.
So you can basically tell it: take these three actions, query this
knowledge base, and then call another action. Anything that you want, based
on the workflow that you're looking for, can be done by using action groups
and knowledge bases in conjunction with an agent.
So this is how we define an action group. You start by giving a name, you
select some basic parameters, and you create the Lambda. Now, this is the
beauty of it: here I'm specifying two parameters, query type and email. In
my prompt, I have mentioned the instructions that it needs to identify
these two parameters and then call the action group. So when the Lambda is
invoked, it is always going to get these two parameters.
Now, this is what the Lambda looks like — just simple code. We have a list
of admin users, and when the input comes in, we just check against that
list. If it's an admin, we allow the request. If it's not, we just throw a
response saying you're not authorized to run this query. Then the agent
takes this response, summarizes it, and provides it to the end user.
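A minimal sketch of that action-group Lambda, assuming Bedrock Agents' function-details event and response shape, with the admin list inlined instead of a database:

```python
ADMINS = {"admin@antstack.io"}  # placeholder admin list

def lambda_handler(event, context):
    # Bedrock passes the parameters the agent extracted from the prompt.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    email, query_type = params.get("email"), params.get("query_type")

    if query_type == "other_user" and email not in ADMINS:
        body = "DENY: user is not authorized to query other people's feedback."
    else:
        body = "ALLOW: proceed with the query."

    # Response envelope the agent expects back from an action group.
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event["actionGroup"],
            "function": event["function"],
            "functionResponse": {"responseBody": {"TEXT": {"body": body}}},
        },
    }
```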
And this is how it looks. In the first one, the email is
random@antstack.io, which is obviously a fake email, and the query is: list
Sandeep's top feedbacks. So basically some other person is asking for my
feedback, and obviously "random" is not part of the admins group. So the
agent is going to respond saying you don't have access to run this query.
In the next scenario, the email is my own email ID, and I'm querying my
feedback: just list my feedback and summarize it in 20 words. And of
course, the AI is going to query the knowledge base and provide the result
accordingly.
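Calling the agent from code looks roughly like this — agent, alias, and session IDs are placeholders, and the completion streams back in chunks:

```python
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

response = agent_runtime.invoke_agent(
    agentId="AGENT1234",        # placeholder
    agentAliasId="ALIAS1234",   # placeholder
    sessionId="session-1",
    inputText="email: user@antstack.io — list my feedbacks and "
              "summarize them in 20 words",
)

# The completion comes back as an event stream of byte chunks.
answer = "".join(
    event["chunk"]["bytes"].decode("utf-8")
    for event in response["completion"]
    if "chunk" in event
)
print(answer)
```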
Now, once we added this functionality — once we were able to segregate the
queries between what a user is asking and what an admin is asking — we
enabled that chatbot for all the users. Everybody could look at their
feedback, ask questions about how they can improve themselves, or what the
top three positive comments are, what the negative comments are, and get an
idea of how they're doing and move forward with that.
At this point, we wanted to see what more we could do — how much further we
could push the system — and that's when we started looking into an
ambitious plan. What we wanted to do was this: we had skill assessments for
each of these users, so we wanted to see if we could integrate all of that
into one platform. At the end of it, if a user asks a question — how can I
improve my career? — we take all the feedback that user has received, we
take their skill sheets, we take their current role designation and what
their next role designation is, summarize all of this, and point the user
in the right direction to get their promotion.
Imagine doing this. You don't have to sit on a call with your boss at the
end of a year to know whether you're going to get the promotion or not. You
can query this chatbot, say, once in three months or once a month, and see
what you're required to do to get that promotion.
We call this section Divide and Conquer. You can do all these things in a
single prompt or a single agent, but it makes the system complicated: every
time you write a prompt which has multiple things, multiple steps to do,
the AI becomes inconsistent in such a system. That's why we call this
divide and conquer, where we split the execution — split the
functionalities — into smaller AI chunks and consume them that way.
Amazon Bedrock has a beautiful way to let us do this: it's called prompt
flows. Prompt flows are basically — if you're aware of Step Functions — a
step function for AI. It allows you to create different prompts, Lambdas,
S3 retrievals, knowledge bases, agents — all these things — and it allows
you to create a map or a workflow out of them and take the actions that
way.
So in our case, we split the entire functionality that we wanted into
smaller pieces, which gives us more control. The other advantage you get by
doing this is that for each of these chunks, you can have a different AI
model doing that particular thing. So if you have some small classification
that needs to be done, you don't need to run a Claude 3.5 or a Claude 3.7;
you might as well just run a smaller model, which is more cost efficient.
It might take a little bit longer, but it is way more cost effective than
using a very large, expensive model.
So here's what we did. Our first step is a prompt which identifies the
intent of the user. This prompt just identifies who the user is and whether
the question is for them or for somebody else, and then it passes on to a
Lambda which does the role check for the user. This Lambda is connected to
a database which has the list of admins and the regular users. Based on the
input it's getting, it's going to identify whether the question is from an
admin or a regular user, and whether they're asking for themselves or for
others, and provide the response accordingly.
Then it goes into a condition block. This condition checks the output
received from the Lambda — in the Lambda, we say the next step is proceed,
or the next step is end. Based on this, it is going to take the next
action: if the action is to end the execution, it moves to a flow output,
which is basically the end of the prompt flow; if it's not, then it goes
into the classify question prompt.
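The decision itself is tiny; a sketch of the role check, leaving out the flow node's exact event envelope — the "next_step" value is just our own convention for the condition node to compare:

```python
ADMINS = {"admin@antstack.io"}  # placeholder admin lookup

def check_role(email: str, about_self: bool) -> dict:
    # The condition node branches on this field: proceed vs end the flow.
    if about_self or email in ADMINS:
        return {"next_step": "proceed"}
    return {"next_step": "end"}
```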
Now, this is where the beauty comes in. We wanted to enable the chatbot to
do much more than just summarize feedback, right? So this classify question
prompt identifies what kind of question the user is asking: is it a
question about their feedback, or is it a question about their overall
career at the company? Based on this, it is going to take the next steps.
After classifying this information, it is going to call the knowledge base.
The knowledge base has the information of all the feedback that the user
has received. So irrespective of whether the question is about their career
or about their feedback, we need to query this knowledge base to summarize
the feedback.
After getting this information, it goes into a Lambda called the data
enricher. This gets the intent of the question — whether it is a question
about their career or a question about their feedback — and it also gets
the output from the knowledge base: everything the knowledge base has
summarized, plus the raw data.
Now this Lambda, based on the intent of the user, is going to query
multiple other sources. The other sources it's going to query are our
classification of user roles — basically, what is their current designation
and what is going to be the next designation — and it's going to query
their skill sets: what are the skills that they have, and, to move to the
next role, what are all the skills they need to have as mandatory. It gets
all this data from the multiple sources we have stored.
It combines all this information and sends it to the final prompt. This is
the summarizer prompt, which takes as input all the data that we have
received so far along with the user's question, summarizes all of it, and
provides a neat response to the user, so the user can use this bot to help
themselves out.
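Invoking the whole flow from code can look roughly like this, assuming the InvokeFlow API with placeholder flow and alias IDs and our own input document shape:

```python
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")

response = agent_runtime.invoke_flow(
    flowIdentifier="FLOW1234",        # placeholder
    flowAliasIdentifier="ALIAS1234",  # placeholder
    inputs=[{
        "nodeName": "FlowInputNode",
        "nodeOutputName": "document",
        # The flow input is an arbitrary JSON document; this shape is ours.
        "content": {"document": {
            "email": "user@antstack.io",
            "query": "How can I improve my career? What are my setbacks?",
        }},
    }],
)

# Node outputs stream back as events; print the flow's final output.
for event in response["responseStream"]:
    if "flowOutputEvent" in event:
        print(event["flowOutputEvent"]["content"]["document"])
```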
So this is what it looked like. Let's say the user is asking a question
about somebody else and they're a regular user: the execution fails, and we
throw an unauthorized error. Next, a user is asking a question about their
own feedback: it queries the knowledge base; it doesn't query the role, it
doesn't query the skill sheets or anything like that — just a plain
feedback summary.
And this is the interesting part. In this particular query, the user is
asking: how do I improve my career? What are my setbacks? So we query the
feedback, we query this person's skill sheets, we query this person's
current role and what the next role is, and summarize all that information.
And this is how the AI responds: it provides the strengths based on all
this information, and based on what people have said, it provides the areas
of improvement. It also says what you need to do to get your next
promotion. And it's not just some random information — all of this is
accumulated out of the feedback that this person has received and the
expectations that we have set for a particular role. So if you take these
points seriously, then you will improve in your career and you will be on
the right track for the next promotion.
Another nice thing to note about this: the query also asks, what are my
setbacks? This user has not received any feedback so negative that it has
become a setback in his career. So the chatbot is going to respond that
there are no major setbacks. It is what it is.
So imagine doing this in your company. You don't have to wait until the end
of the year to get to know whether you are getting the promotion or not.
You can ask a chatbot once in three months, once a month, anytime, and see
if you're on the right track to get a promotion. Imagine how simple your
conversation with your boss at the end of the year becomes: by looking at
this feedback, you can provide the data to your boss telling them why you
need that promotion.
Now, coming to the pricing. Amazon Bedrock, like any other AI provider,
charges based on the number of tokens you consume — both the input tokens
and the output tokens. If you're using the knowledge base, there is no
explicit charge for the knowledge base itself, but you're charged for the
underlying vector database and the queries you run on top of it. It's
basically like CloudFormation: you're not charged for CloudFormation, but
you're charged for the resources it deploys.
In a similar fashion, the same thing goes for agents. You're not charged
extra for using the agents, but you're charged for all the resources you
create under them — the Lambdas, the knowledge base, and the queries that
you run: how many tokens they consume.
Now, coming to the time — this is the most fascinating part, at least for
me. To build the entire thing I showed in this presentation takes less than
30 minutes if you know what you're doing. It just takes less than 30
minutes. The first time I did prompt flows and everything else, the entire
thing took less than two hours.
That is how easy it is to get started with services like Amazon Bedrock. It
just makes your life so much easier. You don't have to think a lot about
what is happening; you can just build it, test it, and then understand what
exactly is happening in the system.
And that's it for my session. Thank you everyone for joining. You can reach
out to me on LinkedIn or GitHub, or you can email me at my official email
ID. It's a pleasure meeting you all, and I hope to catch up with you soon.
Thank you everyone.