Conf42 Large Language Models (LLMs) 2024 - Online

Vectoring Into The Future: AWS Empowered RAG Systems for LLMs

Abstract

Dive into AWS’ innovative toolkit for Retrieval Augmented Generation (RAG) systems. Harness the power of vector databases, SageMaker JumpStart, and Amazon Bedrock to supercharge your Large Language Models (LLMs) and reshape your GenAI landscape!

Summary

  • Samuel Baruffi is a solutions architect with AWS. He talks about vectoring into the future of retrieval augmented generation (RAG) systems using large language models, and closes with a quick demo showcasing the capabilities of Bedrock and OpenSearch.
  • AWS is quickly growing the list of services and capabilities that support customers. With a single model you can now perform a combination of tasks that wouldn't have been possible in the past, and what is possible today may advance very quickly in the near future.
  • Vector embeddings are semantic representations of words, created by translating text into mathematical vectors. They carry the semantic understanding of the text being embedded. Several AWS databases can store these vector embeddings.
  • OpenSearch is one of the main vector databases on AWS, with vector support in the OpenSearch Service. Vector support on AWS is also available through DocumentDB, Amazon MemoryDB for Redis now offers vector storage too, and last but not least there is Amazon Neptune Analytics.
  • Amazon Bedrock is the easiest way to build generative AI applications on AWS. With Amazon Bedrock you have a broad choice of models. Bedrock has full encryption and privacy capabilities and does not use your data to train any of those models.
  • The demo showcases Knowledge Bases for Amazon Bedrock. What knowledge bases achieve is to automate all the ingestion and retrieval of a RAG system for you. With a knowledge base you can use a single API call to do the retrieval and generation.
  • The service offers an easy, scalable way to create generative AI on Bedrock, using the vector engine for OpenSearch Serverless as the database. The demo shows what happens when you ask a specific question to a foundational model without a RAG system.
  • A knowledge base for Bedrock allows you to either just retrieve the data or retrieve and generate; both are demonstrated, in the console and via APIs. Over 3,300 new features and services were launched by AWS in 2022. A very powerful RAG system.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Thanks for joining my session. My name is Samuel Baruffi. I am a solutions architect with AWS, and I'm very excited to talk about vectoring into the future: AWS retrieval augmented generation systems using large language models. A quick agenda for today: we're going to quickly talk about what foundational models and large language models are. Then we're going to talk about some of the capabilities that are very easy and important to use when it comes to generative AI. Then we're going to talk about some limitations of those foundational models. Those models are amazing; they have revolutionized, and are still revolutionizing, many industries across the world, but they have limitations. So we're going to talk about what those limitations are, and about potential solutions, especially retrieval augmented generation, RAG for short. Then we're going to talk about what types of databases can help us improve those foundational models with RAG. We'll go through the list of currently supported database offerings on AWS for vectors and explain at a high level the capabilities and differences across those offerings. After that, we're going to talk about Amazon Bedrock, a managed generative AI service on AWS that allows you to very easily consume different foundational models: image generation, text-to-text, and also embeddings. Then we're going to talk about Amazon Bedrock knowledge bases, which bring powerful RAG systems into the Bedrock ecosystem and allow users to very easily configure retrieval augmented generation using Bedrock foundational models and managed AWS vector databases. We're going to talk about how those two worlds come together to empower companies and users to create very powerful generative AI solutions. And at the end, we're going to do a quick demo showcasing the capabilities of Bedrock and OpenSearch. So without further ado, let's get started. So what are foundational models? Before transformers and generative AI, traditional machine learning models were trained and deployed for specific tasks. You might have some models for a specific generation task, some models that could do Q&A or power bots, some models that could do certain types of predictions. They were really task-specific models, and you needed to deploy all of these different models to achieve a collection of different tasks. With traditional machine learning models, you would have a lot of labeled data, and you would train each model on that specific labeled data. Foundational models, using transformers, enable users to do all of those tasks within a single model that has been trained on unlabeled data. Foundational models are sometimes also referred to as general models: they have good word representations and can do many different tasks that in the past required selecting different models. It is very powerful, because with a single model you can now perform a combination of tasks that wouldn't have been possible before. So generative AI can be used for many, many different use cases.
Here, the capabilities of generative AI are grouped into four categories. You can enhance customer experience with agent assistance, personalization, or chatbots. You can also help boost employee productivity with conversational search: say you have a vast amount of internal data and you want to make it very easy for users internally to consume that data, foundational models can help solve that problem very well. You can also improve business operations: if you're doing a lot of document processing that maybe was done by manual labor before, you can use foundational models for entity extraction, document processing, or document generation. And then, of course, creativity: with stable diffusion models you can create many different images, do video enhancements, and create music. Those generative AI models are not only for text generation; you can generate images and videos too. And this is a fast-paced, evolving technology, so what is capable today might very quickly advance in the near future. Text-to-text models are really, really powerful today, images have become very powerful, and now we can see that video generation is just starting to get more powerful than ever. So what does AWS offer in terms of generative AI? AWS is very quickly growing the list of services and capabilities that help customers use generative AI. We have Amazon SageMaker, which is the platform for any machine learning or AI requirement: training, inference, evaluation, data ingestion, data cleaning, you name it. When it comes to generative AI, Amazon SageMaker has a foundational model hub called JumpStart, where you can deploy many different foundational models within SageMaker; SageMaker deploys the infrastructure for you and manages it. We also have Amazon Bedrock, a completely managed service with a pay-as-you-go approach, where you can select from a variety of model providers and models; we're going to present some of those models in a couple of slides. Amazon has also done a lot of innovation at the hardware level. There is Amazon EC2 Trn1, training instances with proprietary accelerators for machine learning training from Amazon that really optimize cost performance for companies that want to train their own models, whether foundational models or traditional machine learning models. We also have Amazon EC2 Inf2, short for Inferentia2, a chip optimized for accelerating inference from your machine learning models, foundational or otherwise. And last but not least, Amazon CodeWhisperer, a generative AI-powered coding assistant that helps developers with code completion and security scans; you can chat with your code and receive recommendations and help fixing bugs and so forth. So those are the things AWS offers for generative AI capabilities.
And there is a lot more that goes on within those services in terms of functionality and features. As powerful as foundational models are, there are limitations to large language models. Large language models and foundational models are terms sometimes used together; large language models are the models that generate text or embeddings and that really made generative AI as we know it today possible. But what are some of those limitations? First of all, there is limited contextual understanding. Because the model has been pre-trained, it only knows information up to its training date, and it is not going to know proprietary, private information. So it has limited contextual understanding of what you are asking; you might ask an ambiguous question and run into a contextual limitation. It also lacks domain-specific knowledge. If you work for company A, and company A has a lot of private documentation that was never on the internet, or even if it was, the model might not be an expert in that specific domain. These models are known to not be very good in specific domains, especially domains without much data on the internet, which is what most of these models are trained on. The next one is a big one: a lack of explainability and interpretability. It is very common for these large language models to hallucinate, which just means that a generated output states inaccurate, non-factual information. There is very little explainability of why that information came to be. The way those models work is just predicting the next word, and they might output a lot of non-factual data, and it's really hard to know why they have done that. So they lack explainability and interpretability. And again, inaccurate information is what we just described: you might ask a question, the model might give you a very confident-sounding answer, but in fact it's a made-up answer that is neither accurate nor factual. So, with those limitations known, what solutions can we put in place to help solve this problem? There is something called vector embeddings. What are vector embeddings? Embeddings are semantic representations of words, created by translating text into mathematical vectors of floating-point numbers using a foundational model. An embedding model is just a large language model that is able to convert text into an array of float numbers, a vector. So if a user inputs "New York" and runs it through an embedding model, the vector representation of New York might be the one you see here. There are different dimensions of vector embeddings: the bigger the dimension, the more float numbers you're going to have in the vector array. And why are vector embeddings important? Because those mathematical arrays of float numbers carry the semantic understanding of the text that you are embedding. We're going to talk in a moment about why that matters.
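To make the idea concrete, here is a minimal sketch of generating an embedding with boto3 and the Amazon Titan text embeddings model on Bedrock. The region and model ID are illustrative assumptions; adjust them for your own account and model access.

```python
import json

import boto3

# Assumes AWS credentials are configured and Bedrock model access is enabled;
# the region and model ID are illustrative defaults.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.invoke_model(
    modelId="amazon.titan-embed-text-v1",
    body=json.dumps({"inputText": "New York"}),
)

embedding = json.loads(response["body"].read())["embedding"]
print(len(embedding))  # Titan text embeddings v1 returns a 1,536-dimension vector
print(embedding[:5])   # the first few float values carrying the semantics
```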
It's really important when you have, say, terabytes of data that you want to store and very easily retrieve based on semantic understanding. You're not doing an exact-match search; you're asking a question, and that question might relate to context in your text. That is also known as semantic search: the numbers carry a representation of the text itself. When you run into the kinds of limitations in large language models that we just described, one of the most common and best approaches is to add the ability to retrieve context from your vector space: the matching text chunks are looked up through their vectors and added as context to your large language model. But one of the challenges, once you create all these embeddings, say you have multiple internal documents, maybe PDFs, that you have translated into vectors, is: what do you do with those vectors? Here is where vector databases play a big role. You want to store those vector embeddings in a database, and once they are stored, you have the ability to retrieve, via semantic search, chunks of text that are similar to the question or topic you are asking about. So how does this vector embedding system work? In this diagram, you start with some raw data; it could be images, documents, or audio. For the sake of simplicity, for today's presentation, let's focus on text. Say you have a Word document and you want to create embeddings, which behind the scenes are arrays of vectors. First, you chunk the document into pieces, because there are limits on how much text you can encode into one vector, and the limit depends on the embedding model you use. Once you have created the chunks, you run each one through the model and say: here is a chunk of text, create a dense vector encoding for me. That returns the array of float numbers that becomes your vector embedding. You can also create sparse vector encodings, a different approach that can be more optimized for some retrieval workloads. Once you have those vectors, you store them in a database, and we're going to talk about the databases AWS offers for storing them. Finally, you build applications that query the database using semantic understanding, with techniques like k-NN and a few others: you ask a question and find the vectors closest to the query you provided. You then take the matching chunks from the database, whose text is stored alongside the vectors, and consume that text as context in whatever text-to-text foundational model you have available. The sketch below walks through that loop end to end.
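As an illustration of that loop, here is a minimal in-memory sketch: chunk a document, embed each chunk, and rank chunks by cosine similarity against the query vector. A real system would persist the vectors in one of the databases discussed next; the file name, chunk size, and `embed` helper are assumptions for illustration.

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def embed(text: str) -> list[float]:
    """Hypothetical helper: convert text to a Titan embedding vector."""
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: higher means more semantically similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(y * y for y in b) ** 0.5
    return dot / (norm_a * norm_b)

# Naive fixed-size chunking; production pipelines usually overlap chunks.
document = open("shareholder-letter.txt").read()  # assumed local file
chunks = [document[i : i + 1000] for i in range(0, len(document), 1000)]

# Embed every chunk, then rank by similarity to the question's embedding.
index = [(chunk, embed(chunk)) for chunk in chunks]
query_vector = embed("How many new features did AWS launch in 2022?")
best_chunk, _ = max(index, key=lambda item: cosine(query_vector, item[1]))
print(best_chunk)  # the chunk you would pass to the LLM as context
```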
Now let's look at the databases AWS offers for storing those vectors. AWS provides a wide array of databases with vector capabilities, as you can see on the list. We're going to go through most of them, and I'll explain at a high level how they differ from each other. There are search engines like OpenSearch; relational databases like Aurora PostgreSQL and RDS for PostgreSQL; document databases like DocumentDB; in-memory databases like MemoryDB; and graph databases like Neptune. All of those databases now have the capability to store vectors and run vector queries. Let's start with the first one. Amazon Aurora is a managed relational database on AWS. The Aurora PostgreSQL flavor now has the capability to handle vectors through an open-source extension called pgvector, which lets you store vector embeddings in your relational database. If you're already storing your data relationally and just want to add a vector representation of it, you can install pgvector on both Amazon Aurora and RDS for PostgreSQL. Once you store those embeddings, it supports different algorithms such as k-NN, ANN, HNSW, and IVFFlat; these are different approaches for retrieving the closest matching embeddings and chunks of text. For PostgreSQL apps, the good thing is you don't need any driver change; you literally just install the extension on Amazon Aurora or RDS for PostgreSQL and continue to use your database. So this is a very good solution for existing PostgreSQL users, or for anyone who prefers relational databases. It's really powerful and there is a lot of integration; if you have an ML background but you are focused on relational databases, I would recommend taking a look at Amazon Aurora with pgvector. And speaking of pgvector: pgvector is an open-source PostgreSQL extension designed for efficient vector similarity search, perfect for combining machine learning with your databases. It supports storing vectors alongside your traditional data types while maintaining PostgreSQL's robustness features such as ACID compliance and point-in-time recovery. pgvector handles exact and approximate nearest-neighbor search and accommodates various distance measures such as L2, inner product, and cosine distance; those are just different mathematical expressions for computing semantic similarity. pgvector with Aurora is also integrated with Amazon Bedrock knowledge bases, which we're going to talk about in a moment. You have configurable recall rates through parameters like HNSW's ef_search and IVFFlat's probes. And pgvector can scale to support over 1 billion vectors, with vectors of up to 16,000 dimensions. So if you have relational databases and you want to store vectors, this could be the place you go; the sketch below shows the basic usage.
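For a feel of pgvector in practice, here is a minimal sketch using psycopg2 against a PostgreSQL endpoint (Aurora or RDS). The connection details, table name, and placeholder vector are all assumptions; in a real pipeline the vector would come from an embedding model, as sketched earlier.

```python
import psycopg2

# Placeholder connection details for your Aurora/RDS PostgreSQL endpoint.
conn = psycopg2.connect(
    host="my-cluster.cluster-xyz.us-east-1.rds.amazonaws.com",
    dbname="mydb", user="myuser", password="mypassword",
)
cur = conn.cursor()

# Enable the extension, then store each text chunk next to its embedding.
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute(
    "CREATE TABLE IF NOT EXISTS documents "
    "(id bigserial PRIMARY KEY, chunk text, embedding vector(1536));"
)

query_vector = [0.1] * 1536  # placeholder; use a real Titan embedding here
literal = "[" + ",".join(map(str, query_vector)) + "]"

cur.execute(
    "INSERT INTO documents (chunk, embedding) VALUES (%s, %s);",
    ("AWS launched over 3,300 new features and services in 2022.", literal),
)

# Nearest-neighbor search: <=> is pgvector's cosine distance operator
# (<-> is L2 distance and <#> is negative inner product).
cur.execute(
    "SELECT chunk FROM documents ORDER BY embedding <=> %s LIMIT 3;",
    (literal,),
)
print(cur.fetchall())
conn.commit()
```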
Next, let's talk about a very powerful service: Amazon OpenSearch. Amazon OpenSearch Service is a NoSQL search and analytics engine that has been built from the beginning for scalability as a distributed search database. There are different deployment types for OpenSearch. You can have the managed service, which runs instances behind the scenes for you, but there is also the capability to deploy OpenSearch Serverless, where you don't manage any servers at all; it abstracts all of that away from you. OpenSearch also has the capability to store vectors using the k-NN plugin, and it supports algorithms such as k-NN, ANN, HNSW, and IVFFlat, so you can see it has algorithm capabilities similar to Aurora PostgreSQL. And if you have DynamoDB tables, you can use zero-ETL replication from DynamoDB to move the data into OpenSearch Service and vectorize it as well. So who is OpenSearch Service a good fit for? If you are already an OpenSearch user, or if you prefer NoSQL and want to do hybrid search: say you have a piece of text and you want to search both a field of the text and its vector semantics, OpenSearch supports that capability. I really like OpenSearch, and we'll use it in the demo later, because you can very easily and cost-efficiently deploy an OpenSearch Serverless vector database that behind the scenes manages all the index sharding and data manipulation for you, and it can scale to over a billion vectors with very high performance, with the same dimensionality limit as Aurora. You also have configurable recall rates via different segment settings and ef_search. And similar to Amazon Aurora, OpenSearch is one of the main vector databases on AWS and integrates very well with knowledge bases on Bedrock. OpenSearch also has a plugin called neural search that provides a very seamless integration between your text ingestion and the vector embedding creation: it can talk to Bedrock, OpenAI, or Cohere, and by using neural search it automatically handles generating the embeddings for you. A sketch of a k-NN index and query follows.
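For a sense of what the k-NN setup looks like, here is a minimal sketch with the opensearch-py client against an OpenSearch Serverless collection. The host, index name, field names, and dimension are assumptions; the "aoss" service name signs requests for serverless (managed domains use "es").

```python
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

# Placeholder endpoint for your OpenSearch Serverless collection.
credentials = boto3.Session().get_credentials()
auth = AWSV4SignerAuth(credentials, "us-east-1", "aoss")
client = OpenSearch(
    hosts=[{"host": "xyz.us-east-1.aoss.amazonaws.com", "port": 443}],
    http_auth=auth,
    use_ssl=True,
    connection_class=RequestsHttpConnection,
)

# A k-NN enabled index: "vector" stores the embedding for semantic search,
# and "text" keeps the original chunk, which also enables hybrid search.
client.indices.create(
    index="bedrock-sample-index",
    body={
        "settings": {"index.knn": True},
        "mappings": {
            "properties": {
                "vector": {
                    "type": "knn_vector",
                    "dimension": 1536,
                    "method": {"name": "hnsw", "engine": "faiss"},
                },
                "text": {"type": "text"},
            }
        },
    },
)

# Semantic search: return the k chunks nearest to the query embedding.
results = client.search(
    index="bedrock-sample-index",
    body={
        "size": 3,
        "query": {"knn": {"vector": {"vector": [0.1] * 1536, "k": 3}}},
    },
)
for hit in results["hits"]["hits"]:
    print(hit["_source"]["text"])
```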
Continuing on, vector support on AWS is also made available through DocumentDB. DocumentDB is a very fast, cloud-native document database, again a NoSQL database, with MongoDB API compatibility. It is a managed service with different provisioned deployment options, and it supports the same algorithms I mentioned before: k-NN, ANN, IVFFlat. If you're already using DocumentDB or MongoDB, you can simply elevate your existing data with vector search. The good thing about DocumentDB, if you're very familiar with document databases and JSON-heavy usage, because document databases are really powerful with JSON, is that by just enabling vector capabilities on your DocumentDB it becomes very, very powerful. Continuing, this is a very interesting service: Amazon MemoryDB for Redis now has a feature, currently in preview and hopefully soon generally available, that adds vector storage, indexing, and search capabilities to MemoryDB, which is already a very popular, performant, multi-AZ database. MemoryDB, like the name says, stores all the data in memory and is Redis API compatible, and it's a fully managed service. It supports the different vector search algorithms we mentioned, and it can support vectors of up to 32,000 dimensions. This is ideal if you have a workload that requires single-digit millisecond latency and high throughput for your vectors. Say you are building a chatbot that should respond really quickly while doing retrieval augmented generation; MemoryDB might be the best place to look because of that very powerful capability. And last but not least, you also have Amazon Neptune Analytics. Amazon Neptune is the Amazon graph database, and Neptune Analytics gives you memory-optimized graph database engines for analytics. You deploy it with different discrete capacity levels. It supports the HNSW similarity algorithm, and you can see the dimension limit for vectors is much bigger, up to 65,000. It complements, as an addition on top of, your Amazon Neptune database. Why would you use Neptune Analytics for your vector database? If you have use cases where you need to combine vector search with graph traversals, this is a very good approach. You can also use the Neptune database with serverless deployment, but Neptune Analytics only supports discrete capacity levels at this time. I know I covered these databases very quickly; if you're curious to learn more, I highly recommend a quick search of the AWS documentation on how they work. But now I want to talk about Amazon Bedrock. I mentioned at the beginning of my presentation that Amazon Bedrock is the easiest way to build generative AI applications on AWS, and the amazing thing about Bedrock is that it's a completely managed service for GenAI models. You have a choice of multiple models from industry-leading foundational model providers, available with a single API call. You can also customize and fine-tune models using your own organization's data. And Bedrock has taken security as job number zero: it has full encryption and privacy capabilities, and it does not use your data to train any of those models, so it is an enterprise-grade, secure, and private service. With Amazon Bedrock you have a broad choice of models, as you can see here; this list is current as of today, March 30, 2024, as I'm recording this session. Right now there are seven different model providers: AI21 Labs, Amazon, Anthropic, Cohere, Meta, Mistral, and Stability AI. Those models have different capabilities. You have text-to-text models, foundational models where you send text and get text back through next-word prediction. You also have embedding models such as Amazon Titan Text Embeddings and Amazon Titan Multimodal Embeddings, plus an embedding model from Cohere, Cohere Embed Multilingual. On top of that, you have the ability to use Bedrock to generate images with Stability AI's Stable Diffusion XL 1.0 and with Titan Image Generator. It's pay-as-you-go: you pay per token you consume, and you can choose which models you have access to. As a taste of the API, here is what a direct text-to-text call looks like.
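To show what calling a text-to-text model on Bedrock looks like, and to mirror the question asked in the demo, here is a minimal sketch using the InvokeModel API with Claude Instant. The region is an assumption; the request body follows Anthropic's Human/Assistant text-completion format on Bedrock.

```python
import json

import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

# Claude Instant on Bedrock uses Anthropic's Human/Assistant prompt format.
body = json.dumps({
    "prompt": "\n\nHuman: How many new features and services did AWS "
              "launch in 2022?\n\nAssistant:",
    "max_tokens_to_sample": 300,
})

response = bedrock.invoke_model(
    modelId="anthropic.claude-instant-v1",
    body=body,
)
print(json.loads(response["body"].read())["completion"])
# Without RAG the model typically replies that it does not know the exact
# count, which is the limitation the knowledge base addresses next.
```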
The demo I'm going to show you today is Knowledge Bases for Amazon Bedrock, and this is where I try to bring the whole presentation together. Knowing the limitations of large language models that I discussed at the beginning, one of the ways you can work around them is by creating a RAG system, retrieval augmented generation. Retrieval augmented generation means bringing pieces of your data as text into the context before sending a request to a foundational model. To do that, the first thing you need is a vector database where you can store all the vector embeddings of your domain-specific data. You retrieve the data at query time, the matching chunks are looked up through their vectors, and that text is placed into the context of your query to the foundational model. It can be very cumbersome to build this complete RAG solution yourself. What Knowledge Bases for Amazon Bedrock achieves is to automate all the ingestion and retrieval of this RAG system for you. You connect your knowledge base with a database; there are currently several supported vector databases, which I'll show in a moment. Then you select an embedding model. Then you put your data in S3, the Simple Storage Service, and as soon as the data lands in S3 you can sync the knowledge base, which behind the scenes creates the embeddings and stores them in the database. When you make a call to the knowledge base, you can decide whether that call just retrieves the data from your database, or does retrieve-and-generate, which means: retrieve the data from my vector database, send it to the foundational model, generate a response with my context-aware information, and give the answer back. And you can select the model used as the foundational model, and the embedding model as well. Knowledge bases on Bedrock currently support several databases: the vector engine for OpenSearch Serverless, Redis Enterprise Cloud, Pinecone, and Amazon Aurora. There are more capabilities coming soon; for example, MongoDB is coming as one of the supported vector databases on Amazon Bedrock, and hopefully in the future more of the databases I covered today will also be available. The last thing I want to show is that with a knowledge base for Bedrock, you can use a single API call to do the retrieval and generation. If you look at this diagram: with a single API call, on number one you have a search query. Let's give an example: you ask a proprietary question about your company, and the foundational model doesn't know the answer. So you issue a search query, and Bedrock knowledge bases realize a retrieval is needed on your vector database. On number two, it goes there and does the retrieval: behind the scenes it calls your vector database and retrieves the matching embedding and its chunk of text. On four, it sends that text as context to your Bedrock text-to-text foundational model. And then finally, it sends back the generated answer. And you can see here, soon it's also going to support S3. Both call styles are sketched below.
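Here is a minimal sketch of both call styles against a knowledge base using the bedrock-agent-runtime client: retrieve-and-generate in one call, and retrieve-only to get the raw chunks back. The knowledge base ID, model ARN, and region are placeholders.

```python
import boto3

agent = boto3.client("bedrock-agent-runtime", region_name="us-east-1")
question = "How many new features and services did AWS launch in 2022?"

# Retrieve and generate: one API call does the vector search, stuffs the
# retrieved chunks into the prompt, and returns a grounded answer.
response = agent.retrieve_and_generate(
    input={"text": question},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "ABCD1234",  # placeholder knowledge base ID
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/"
                        "anthropic.claude-instant-v1",
        },
    },
)
print(response["output"]["text"])  # the generated answer
print(response["citations"])       # the source chunks it was grounded on

# Retrieve only: get the matching chunks back and do your own generation.
results = agent.retrieve(
    knowledgeBaseId="ABCD1234",
    retrievalQuery={"text": question},
    retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
)
for item in results["retrievalResults"]:
    print(item["content"]["text"][:100])  # each retrieved chunk
```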
Now let's jump into a quick demo of Knowledge Bases for Bedrock. The demo will be very straightforward. I have downloaded some files, the Amazon shareholder letters, so you can see here I have the 2019, 2020, 2021, and 2022 letters. What I want to show is that I have already created an OpenSearch database, linked that database to a Bedrock knowledge base, and the knowledge base created the vectors automatically from S3. First, let me show you: I created an S3 bucket, and to that bucket I just uploaded those four files; I could have as many files as I wanted here. Then, of course, I created an OpenSearch database; you can see it's an OpenSearch Serverless database with a collection I called bedrock-sample. There is also a dashboard for this database that I'm going to show in a moment. The interesting part begins in Bedrock, the service that gives you an easy, scalable way to create generative AI. The first thing we're going to do is ask a very specific question to a foundational model without a RAG system, so without using the knowledge base. I recall there is a specific figure in the 2022 letter; let me download the file and search for "features". Here it is: AWS continues to deliver new capabilities, with over 3,300 new features and services launched in 2022. So let's ask the foundational model, without RAG, about exactly this. Let's go to Bedrock and choose one of the Anthropic models; let's go with Claude Instant, because I know it's a fast model, and ask: how many new features and services did AWS launch in 2022? Let's wait for it to finish, and it says: I do not have the exact count of new features and services launched in 2022. What this means is that the foundational model itself doesn't have that information, but the document we have does. So how do you put these two pieces together? The first thing we can do is look at the knowledge base. I've already created one; let me show you how it works. You create a knowledge base, then you choose a data source; in this case the data source is S3, the same bucket with those files I showed you. Then you choose the model you want Bedrock knowledge bases to use to create the vectors for you; we are using a model offered within Bedrock, the Titan Embeddings model version 1.2. Then, after that, you choose the database where you want to store those vectors.
You want a database where the vectors can be stored so you can retrieve them after the fact. If you look here, we have a vector database: we're using the vector engine for Amazon OpenSearch Serverless. We have created the index name; OpenSearch works with multiple indexes, and within those indexes you can have a combination of items and documents. And we've said: when you create new vectors, add the vector into the vector field on that document, and add the text itself into the text field. Remember, OpenSearch can do hybrid search; in this case we're just going to do semantic search, which runs a similarity algorithm on top of your vectors. Before I ask a question here, let me show you the OpenSearch dashboard, where you can run OpenSearch commands to see the data. This query returns all the different documents, all the different IDs within that specific index. You can see this index, called bedrock-sample-index-665, is the same vector index we configured. And this OpenSearch Serverless vector database contains nothing more than the vectors from the S3 files we uploaded. Each specific item here holds a chunk of one of those files. We can copy any chunk; in this case I've already selected one, and I want to show you how the vector is stored. If you inspect it, you can see the index and the ID. There is a sequence number because this specific file has been parsed into multiple chunks; this one is sequence number 13. And here you can see the vector, a bunch of numbers; I'm just going to minimize this. This is where the Titan embedding model has been called to generate the vector. And here is the text. So what Bedrock knowledge bases automatically did for me was take this chunk, run this chunk of text through my embedding model, and generate the vector. The snippet below shows the equivalent inspection in code.
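The same inspection can be done programmatically; here is a minimal sketch, reusing the same client setup as the earlier OpenSearch example. The endpoint, index, and field names are placeholders matching the demo's configuration.

```python
import boto3
from opensearchpy import OpenSearch, RequestsHttpConnection, AWSV4SignerAuth

# Same client setup as the earlier sketch; the endpoint is a placeholder.
auth = AWSV4SignerAuth(boto3.Session().get_credentials(), "us-east-1", "aoss")
client = OpenSearch(
    hosts=[{"host": "xyz.us-east-1.aoss.amazonaws.com", "port": 443}],
    http_auth=auth, use_ssl=True, connection_class=RequestsHttpConnection,
)

# List a few of the documents the knowledge base sync wrote into the index:
# each one holds a text chunk plus the Titan embedding Bedrock created for it.
hits = client.search(
    index="bedrock-sample-index-665",  # the demo's index name
    body={"size": 5, "query": {"match_all": {}}},
)
for hit in hits["hits"]["hits"]:
    source = hit["_source"]
    print(hit["_id"], source["text"][:80])      # the chunk of the PDF
    print(len(source["vector"]), "dimensions")  # its embedding vector
```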
Now, this is the chunk I want to show Bedrock knowledge bases will automatically retrieve and generate an answer from. Remember, we tried with just the foundational model and it didn't know; but now we have this piece of text, with the vector embedding itself, that has the information. So let's go back to Bedrock, to the knowledge base test tab, and select a model; let's use the same Claude Instant as before. A knowledge base for Bedrock allows you to either just retrieve the data, or retrieve and generate; I'm going to show you both. So I ask: how many new features and services did AWS launch in 2022? This is exactly the same question I asked the model before, and it said it didn't know. First, I'm going to generate an answer: this will retrieve the piece of text, send it to Claude Instant as the model, and finally generate an answer based on that. You can see it says "retrieving and generating the response", and voilà, it worked: over 3,300 new features and services were launched by AWS in 2022. And you can see I have the source details; if I click through to the source, you can see it actually retrieved from my database the same chunk I was showing before, the one that contains this piece of data. So what Bedrock did automatically, with a single API call, was retrieve the chunk, bring its text back, add that text as the context of my question, and finally send it to my Claude Instant model to give the answer you see here. What you can also do: if we clear this and disable "generate response", I ask the same question, how many new features and services did AWS launch in 2022, and click run. Since I told it not to generate a response, it just does the retrieval. If you go to the source details, it has retrieved multiple chunks for me, and as expected, one of the chunks contains the 3,300 figure. In this case it returned multiple chunks; you can decide how many chunks to retrieve, with a maximum-number-of-chunks setting. And finally, I want to show that everything I'm doing in the console you can also run via APIs. What you see here, retrieve-and-generate, is the API that I'm calling; let me run this for you with the same question: how many new features and services did AWS launch in 2022? This calls a specific function, which calls the bedrock-agent-runtime client's RetrieveAndGenerate API. I pass information like my knowledge base ID, the model ID I want to use, and the session ID, and it generates the answer back for me. If I run this, you can see it's running, and the answer comes back here. So you don't need to use only the console; there are plenty of APIs you can use. And we can actually see the citations here, the same citation I had before; the API response comes with a citations part automatically. And this is pretty good, because the combination of Knowledge Bases for Bedrock and OpenSearch Serverless is super powerful: it pretty much removes all the cumbersome, manual steps you would otherwise need in order to create an advanced, very powerful RAG system. I hope you enjoyed this. Please feel free to reach out if you have any questions. Have a great conference, and talk to you soon.
...

Samuel Baruffi

Principal Solutions Architect @ AWS



