Conf42 Machine Learning 2025 - Online

- premiere 5PM GMT

Vector Embeddings and RAG Demystified

Abstract

In this talk I demystify the world of vector embeddings and explore their pivotal role in enhancing our interactions with vast data sets, ensuring AI systems can offer personalized, contextually relevant responses.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi everyone. Thank you for joining this Conf42 conference and for joining my session. I am Veliswa Boya and I'm a Developer Advocate at Amazon Web Services. I will keep my session to less than 30 minutes: I will do a very high-level introduction of vector embeddings and retrieval augmented generation, and we'll take a look at how these two actually complement each other. If you have any questions or any feedback while you're watching this session, please find me on LinkedIn, I'm there as Veliswa Boya, and I would love to hear your feedback. So let's not waste any more time. Let's get started.

Have you ever wondered how music applications suggest songs to you, or how shopping applications suggest products that perfectly match your taste? To understand how all of this works, you would have to dive into the world of vector databases, where data isn't just stored as rows in tables, but is also mapped out as geometric points in space.

So now let's take a look at a simple conversation with a large language model. Throughout this talk I will be interchangeably using the words large language model, LLM, foundation model, and base model; every time I use any of these, just know that I'm talking about the same thing. Looking at a brief conversation: let's say you've got an application that you've built, and the user of your application goes in and they type "The capital of France is". Because your application has generative AI capabilities, there is completion of the sentence: the application returns "Paris". So now we have the portion that the user inputted, and then we've got the completion, or the portion that the AI returned through these LLMs. So we've got "Paris" returned by AI. How does all of this work? How did it happen that we got this return of "Paris" back to the front end for the user?

Now, let's look at large language models, defining them before we look at how they work. Large language models, also known as LLMs, are very large deep learning models that are pre-trained on vast amounts of data. The underlying transformer is a set of neural networks that consists of an encoder and a decoder with self-attention capabilities. This means that the encoder and the decoder extract meaning from a sequence of text, and they understand the relationships between the words and phrases that are in this text. This also means that we cannot send raw text directly to models, because machine learning models only understand numbers. So when that text was typed by the user in the front end, there would have had to be a conversion to numbers so that there can be interaction with the models. What we are looking at is a numerical representation of that sentence that we saw earlier on.

Looking at another example, let's consider the terms coffee and tea. In a hypothetical vocabulary space, these two could be transformed into numerical vectors. So if we visualize this in a three-dimensional vector space, coffee might be represented as one set of numbers and tea might be represented as another set of numbers. These numerical vectors carry semantic information. This indicates that coffee and tea are conceptually similar to each other because they have an association with hot beverages, which means they will likely be positioned closer together in the vector space. Now we are adding another word, cappuccino. Cappuccino could be something in between; cappuccino has some coffee in it. It might not look like it on my graph, but it's very likely that cappuccino will be stored a lot closer to where coffee is, because there's coffee in cappuccino.
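To make the idea of "closer together in the vector space" concrete, here is a minimal sketch, not from the talk's slides, that measures proximity with cosine similarity. The three-dimensional vectors below are made-up toy values rather than real embeddings; real embedding models produce hundreds or thousands of dimensions.

```python
# Toy illustration of vector proximity with cosine similarity.
# The "embeddings" below are invented 3-D values, not real model output.
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-D "embeddings" for the words in the example.
coffee     = [0.90, 0.10, 0.05]
tea        = [0.85, 0.15, 0.10]
cappuccino = [0.92, 0.08, 0.30]
car        = [0.05, 0.95, 0.60]  # an unrelated word, for contrast

print(cosine_similarity(coffee, tea))         # high: both are hot beverages
print(cosine_similarity(coffee, cappuccino))  # high: cappuccino contains coffee
print(cosine_similarity(coffee, car))         # lower: unrelated concept
```

The higher the cosine similarity, the closer two points sit in the vector space, which is exactly the relationship the coffee, tea, and cappuccino example describes.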
The actual positioning of these words is based on a training process, and also on how these words are used in the large amount of text that is used for training. We can think of these words on paper as vectors, and one way to look at them is as arrows, all of them starting from one of the corners, with the direction of the arrow carrying the meaning of the word. Now I can set up a scale on the sides and write down the distance from the corner to each of the words. I can use these numbers to uniquely identify these words in this two-dimensional space that we have. By doing this, we have effectively translated words into numbers, and these numbers embed some of the meaning of the original words. Of course, two dimensions, like one piece of paper, are not enough to map the complexity of a vocabulary into numbers. For this we need more dimensions and more numbers, hundreds if not thousands of them.

So where do we stand right now? We've already looked at the text that was captured by the user in the front end, where they said "The capital of France is". Then we saw the input context represented as tokens, the numbers that we saw earlier on. And we've since gone and defined what embeddings are: these numerical representations of how these words are stored. So we've learned that converting our data into vectors is the first thing that we need to do. Now let's think about this: does this mean that LLMs can answer all of our questions?

Embeddings transform data into numerical vectors, making them highly adaptable tools. They enable us to apply mathematical operations to assess similarities, or to integrate these words into various machine learning models. Their uses are diverse, ranging from search and similarity assessments to categorization and topic identification, and this flexibility is great because it makes embeddings a fundamental component in many data-driven applications. There are different ways to create embeddings; we'll look at a demo later on using Boto3 on AWS, and you could also use LangChain. Boto3 we can use with a Bedrock client, which we'll be looking at later on.

I talked about a demo, so let's look at a demo of how we can create embeddings using Boto3 with a Bedrock client. What we are now seeing on the screen is sample code that we can use for this. Just stepping through it: we see right at the beginning that we initialize a session with AWS using Boto3, and we then create a client for the Bedrock Runtime service. Next, we define a get_embedding function, which takes text as input and utilizes the Amazon Titan Embeddings model to transform this text into embeddings. Once the embedding is generated, the function returns the embedding vector; we can see there that it's returned as a response.
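The slide code itself isn't reproduced in this transcript, so here is a minimal sketch along the lines of what is described: a Boto3 session, a Bedrock Runtime client, and a get_embedding function that calls an Amazon Titan Embeddings model. The region and the exact model ID are assumptions here; use whichever Titan Embeddings model is enabled in your own account.

```python
# Sketch of creating an embedding with Boto3 and the Bedrock Runtime client,
# as described on the slide. Region and model ID are assumed placeholders.
import json
import boto3

session = boto3.Session()                          # initialize a session with AWS
bedrock = session.client("bedrock-runtime",        # client for the Bedrock Runtime service
                         region_name="us-east-1")  # assumed region

def get_embedding(text: str) -> list[float]:
    """Transform the input text into an embedding vector using Amazon Titan Embeddings."""
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",      # assumed Titan Embeddings model ID
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    body = json.loads(response["body"].read())
    return body["embedding"]                       # the embedding vector returned by the model

print(len(get_embedding("The capital of France is")))  # prints the vector's dimensionality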
So now, moving on from vector embeddings, let's look at retrieval augmented generation, or RAG, and at how these two actually talk to each other, how they relate, or how they complement each other. To define RAG, imagine that you have a database, or you've got a document in PDF, and it is storing the courses that are available to the internal staff at your organization. Staff can just look and see what kind of training they can have, and this is all internal to your organization. So can an LLM then answer questions that are related to these internal courses? Can it answer questions such as: how many members of staff have completed a particular course? How long on average does it take an employee to complete a course? Which course is the most popular? This is where customization of the response from the LLM becomes a necessity.

Retrieval augmented generation, also known as RAG, is a process of optimizing the output of a large language model. It does this by referencing an authoritative knowledge base that is outside of the model's training data sources before generating a response and returning that to the user. LLMs are trained on vast volumes of data, and they use billions of parameters to generate original output for tasks like answering questions, translating languages, or completing a sentence. RAG extends this already powerful capability of LLMs to specific domains or to a specific organization's internal knowledge base, such as which courses are being taken by staff at this organization. It does all of that without the need to retrain the LLM. It's a very cost-effective approach to improving LLM output so that it remains relevant, accurate, and useful in various contexts.

So you might feel that this is too much of a task. How do I now, in addition to the LLMs that I already have, bring this authoritative information or data close to the LLMs so that I get customized responses? You could be thinking: how do I manage the multiple data sources? How do I create vector embeddings for large volumes of data? How do I do incremental updates to vector stores? What about the coding effort that might be involved, the scaling of this retrieval mechanism, and the orchestration of all of this?

On AWS we have Amazon Bedrock Knowledge Bases, and this feature gives foundation models and agents contextual information from your private data sources. For RAG with Amazon Bedrock Knowledge Bases, you get fully managed support for end-to-end workflows. You get to securely connect your foundation models and your agents to data sources, you get to easily retrieve relevant data and augment prompts, and you get to provide source attribution. There is, as we saw earlier on with the previous slide, the text generation workflow portion of engaging with retrieval augmented generation. This you get to do in Amazon Bedrock: this is where you augment the prompt, after you've received the input from the user and before the response gets sent back to them. There's also the Retrieve API, which is used to get more context that then enriches or augments the prompt. All of this happens within Amazon Bedrock Knowledge Bases. And then ultimately there's another API, the RetrieveAndGenerate API, which does the retrieval and the generation of the response that then gets sent back to the user in the front end.

To get started with Amazon Bedrock, you can scan the QR code that you see. If you want to learn more about large language models, there's another QR code that you can scan. And there's also training that I've seen from one of my colleagues on multimodal RAG and embeddings with Amazon Nova and Bedrock on AWS; you can consult that course to learn more and dive deeper into the topic that I introduced today.
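As a rough illustration of the two APIs just mentioned, here is a hedged sketch of how they might be called from Boto3 through the bedrock-agent-runtime client. The knowledge base ID and model ARN are placeholders, and the parameter shapes reflect my reading of the API rather than the talk's slides, so check them against the current AWS documentation.

```python
# Sketch of the Knowledge Bases Retrieve and RetrieveAndGenerate calls via Boto3.
# Knowledge base ID, model ARN, and region are placeholders / assumptions.
import boto3

agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

KB_ID = "YOUR_KNOWLEDGE_BASE_ID"  # placeholder
MODEL_ARN = "arn:aws:bedrock:us-east-1::foundation-model/amazon.titan-text-express-v1"  # placeholder

# Retrieve API: fetch relevant chunks from the knowledge base so you can
# augment the prompt yourself before calling a model.
retrieved = agent_runtime.retrieve(
    knowledgeBaseId=KB_ID,
    retrievalQuery={"text": "Which internal course is the most popular?"},
)
for result in retrieved["retrievalResults"]:
    print(result["content"]["text"])

# RetrieveAndGenerate API: retrieval plus generation in one call; the response
# already contains the generated answer to send back to the user.
answer = agent_runtime.retrieve_and_generate(
    input={"text": "Which internal course is the most popular?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": KB_ID,
            "modelArn": MODEL_ARN,
        },
    },
)
print(answer["output"]["text"])
```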
And with that, I thank you very much. Thank you for coming to this session, or for watching this session, and I look forward to hearing from you on LinkedIn with the feedback that you have on the topic we just went through today. Thank you very much and enjoy the rest of the conference. Bye-bye.
...

Veliswa Boya

Senior Developer Advocate @ AWS



