Transcript
This transcript was autogenerated.
Hi everyone.
Thank you for joining this Conf42 conference and for joining my session.
I am er and I'm a developer advocate at Amazon Web Services.
I will keep my session to less than 30 minutes. I will do a very high-level introduction of vector
embeddings and retrieval augmented generation, and we'll take a look at how
these two actually complement each other.
If you have any questions or any feedback while you're watching this
session, please find me on LinkedIn.
I'm there as er, and I would love to hear your feedback.
So let's not waste any more time.
Let's get started.
Have you ever wondered how music applications suggest songs to you,
or how shopping applications suggest products that perfectly match your taste?
To understand how all of this works, you would have to dive into the
world of vector databases, and this is where data isn't just stored as
rows in tables, but it's also mapped out as geometric points in space.
So now let's take a look at a simple conversation with a large language
model. Throughout this talk I will be interchangeably using
the terms large language model, LLM, foundation model, and base model.
Every time I use any of these, just know that I'm talking about the same thing.
So looking at a brief conversation.
Let's say you've got an application that you've built, and the user of your
application goes in and types, "The capital of France is".
Because your application has
generative AI capabilities, there is completion of the sentence:
the application returns "Paris".
So now we have the portion that the user inputted, and then we've
got the portion that the AI returned through the LLM.
So we've got "Paris" returned by the AI.
So how does all of this work?
How did it happen that we got this response of "Paris" back to
the front end for the user?
First, let's define large language models before
we look at how they work.
Large language models, also known as LLMs, are very
large deep learning models that are pre-trained on vast amounts of data.
The underlying transformer is a set of neural networks that consists
of an encoder and a decoder with self-attention capabilities.
This means that the encoder and the decoder extract meaning from a
sequence of text and understand the relationships between the words
and phrases in that text.
So this means then that we cannot send raw text directly to models because machine
learning models only understand numbers.
So when the user typed that text in the front end, there had to be a
conversion to numbers so that there could be an interaction with the model.
What we are looking at, then, is a numerical representation of
the sentence that we saw earlier on.
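To make that concrete, here is a minimal, illustrative sketch of turning a sentence into numbers. The vocabulary and the IDs are made up for this example; real tokenizers used by LLMs work on subwords and have vocabularies of tens of thousands of entries.

# A toy vocabulary that maps each word to a number. Real tokenizers
# (for example byte-pair encoding) split text into subwords instead.
vocabulary = {"the": 1, "capital": 2, "of": 3, "france": 4, "is": 5}

def to_token_ids(text):
    # Lowercase the text, split on spaces, and look up each word's ID.
    return [vocabulary[word] for word in text.lower().split()]

print(to_token_ids("The capital of France is"))  # [1, 2, 3, 4, 5]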
Looking at another example, let's consider the terms coffee and tea. In a hypothetical
vocabulary space, these two could be transformed into numerical vectors.
So if we visualize this in a three-dimensional vector
space, coffee might be represented as one set of numbers and tea might be
represented as another set of numbers.
So these numerical vectors then carry semantic information.
This indicates that coffee and tea are conceptually similar to
each other because they have an association with hot beverages.
This means then that they likely will be positioned closer
together in the vector space.
Now we are adding another word: cappuccino.
And cappuccino could be something in between.
Cappuccino has some coffee in it.
It might not look like it on my graph, but it's very likely that cappuccino will
be stored a lot closer to where coffee is because there's coffee in cappuccino.
The actual positioning of these words is based on a training process
and on how these words are used in the large amount
of text that is used for training.
So we can think of these words on paper as vectors, and one way to look at them is
as arrows, all of them starting from one of the corners, with the direction of the arrow
carrying the meaning of the word.
Now I can set up a scale on the sides
and write down the distance from the corner to each of the words.
I can use these numbers to uniquely identify these words in the two-dimensional
space that we have.
And by doing this, we have effectively translated words into numbers,
and these numbers embed some of the meaning of the original words.
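As a small sketch of that idea, here are hypothetical two-dimensional coordinates for coffee, tea, and cappuccino, with the distance between points standing in for how related the words are. The coordinates are invented for illustration; real embeddings are learned during training and have many more dimensions.

import math

# Made-up 2D coordinates for the three words, purely for illustration.
words = {
    "coffee": (2.0, 8.0),
    "tea": (3.0, 7.0),
    "cappuccino": (2.2, 7.8),
}

def distance(a, b):
    # Euclidean distance between two 2D points: smaller means more related.
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)

print(distance(words["coffee"], words["cappuccino"]))  # small: closely related
print(distance(words["coffee"], words["tea"]))          # a bit larger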
Of course, two dimensions, like one piece of paper, are not enough to map the complexity
of a vocabulary into numbers.
For this we need more dimensions and more numbers:
hundreds, if not thousands, of them.
So right now, where do we stand?
We've already looked at the text that was captured from the user in the front end:
"The capital of France is".
Then we saw the input context represented as tokens, the
numbers that we saw earlier on.
And since then we've defined what embeddings are, which
are the numerical representations of how these words are stored.
So we've learned that converting our data into vectors is the
first thing that we need to do.
So now let's think about this.
Does this mean that LLMs can then answer all of our questions?
So embeddings transform data into numerical vectors, making
them highly adaptable tools.
They enable us to apply mathematical operations to assess similarities
or integrate these words into various machine learning models.
Their uses are diverse, ranging from search and similarity assessments to categorization
and topic identification, and this flexibility is great because it
makes embeddings a fundamental component in many data-driven applications.
There are different ways to create embeddings.
Just some that we could consider: we'll look at a demo later on using
Boto3 on AWS, and you could also use LangChain.
We can use Boto3 with a Bedrock client, which we will be looking at later on.
And I talked about a demo, so let's look at a demo of how we can create embeddings using Boto3
with a Bedrock client.
What we are now seeing on the screen is sample code that
we can use for this. Just stepping through it:
We see right at the beginning that we initialize a session with AWS using
Boto3, and we then create a client for the Bedrock Runtime service.
Next, we then define a get embedding function, which takes text as
input and also then utilizes the Amazon Titan Embeddings model to
transform this text into embeddings.
And once the embedding is generated, the function then
returns the embedding vector.
We can see there that's returned as a response.
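A minimal sketch of the kind of code being described might look like this, assuming the Titan text embeddings model ID amazon.titan-embed-text-v1 and the us-east-1 Region; adjust both to whatever is enabled in your own account.

import json
import boto3

# Initialize a session with AWS using Boto3 and create a client for the
# Bedrock Runtime service. Region and model ID are assumptions for this sketch.
session = boto3.Session()
bedrock_runtime = session.client("bedrock-runtime", region_name="us-east-1")

def get_embedding(text):
    # Ask the Amazon Titan Embeddings model to transform the text into a vector.
    response = bedrock_runtime.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
        contentType="application/json",
        accept="application/json",
    )
    # The embedding vector comes back in the response body.
    response_body = json.loads(response["body"].read())
    return response_body["embedding"]

print(len(get_embedding("The capital of France is")))  # dimensionality of the vector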
So now, moving on from vector embeddings, let's look at retrieval augmented
generation, or RAG, and at how these two actually talk to
each other, how they relate, or how they complement each other.
Now let's define RAG.
Imagine that you have a database or you've got a document.
This is in PDF, and this is storing courses that are available for your
internal staff at your organization.
So staff can just look and see what kind of training they can have, and this
is all internal to your organization.
So can an LLM then answer any questions that are related
to these internal courses?
Can they answer questions such as, how many members of staff have
completed a particular course?
How long on average does it take an employee to complete a course and
which course is the most popular?
And this is the point at which customization of the response from
the LLM becomes a necessity.
Retrieval augmented generation, also known as RAG, is a process of optimizing the output
of a large language model.
It does this by referencing an authoritative knowledge
base that is outside of its training data sources before generating a
response and returning that to the user.
LLMs are trained on vast volumes of data, and they use billions of parameters to
generate original output for tasks like answering questions, translating
languages, or completing a sentence.
RAG extends this already powerful capability of LLMs to specific domains
or to a specific organization's internal knowledge base, such as what courses are
being taken by staff at this organization.
It does all of that without the need to retrain the LLM.
It's a very cost-effective approach to improving LLM output
so that it remains relevant, accurate, and useful in various contexts.
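To make that flow concrete, here is a conceptual sketch of RAG over the internal courses example. It reuses the get_embedding and cosine_similarity helpers sketched earlier, and call_llm is a hypothetical placeholder for whichever text generation model you choose to invoke.

# Tiny in-memory "knowledge base" of internal course documents (made up).
course_documents = [
    "Course: Cloud Fundamentals. 120 staff members have completed it.",
    "Course: Security Essentials. Average completion time is three weeks.",
]

def retrieve(question, documents, top_k=1):
    # Embed the question, then return the documents whose embeddings
    # are most similar to it.
    question_vector = get_embedding(question)
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(question_vector, get_embedding(doc)),
        reverse=True,
    )
    return ranked[:top_k]

def call_llm(prompt):
    # Placeholder: in Amazon Bedrock this would be an invoke_model call
    # to a text generation model of your choice.
    return f"(model response to: {prompt!r})"

def answer_with_rag(question):
    # Augment the prompt with retrieved context before generating.
    context = "\n".join(retrieve(question, course_documents))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

print(answer_with_rag("Which course is the most popular?"))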
So you might feel that this is too much of a task.
How do I now, in addition to the LLMs that I already have,
bring this authoritative information or data close to the LLMs so
that I get customized responses?
So you could be thinking, how do I manage the multiple data sources?
How do I create vector embeddings for large volumes of data?
How do I handle incremental updates to vector stores, the coding effort
that might be involved, the scaling of this retrieval mechanism, and
the orchestration of all of this?
So on AWS we have Amazon Bedrock Knowledge Bases, and this feature gives foundation
models and agents contextual information from your private data sources.
For RAG with Amazon Bedrock Knowledge Bases, you get fully managed
support for end-to-end workflows.
You get to securely connect your foundation models
and your agents to data sources.
You get to easily retrieve relevant data and augment prompts, and you
get to provide source attribution.
There is, as we saw on the previous slide, the text generation
workflow portion of engaging with retrieval augmented generation.
This you get to do in Amazon Bedrock.
This is where you augment the prompt, after you've received the input
from the user and before the response gets sent back to the user.
And there's also the Retrieve API, which is used to get more context;
that context is then used to enrich or augment the prompt.
All of this is happening within Amazon Bedrock Knowledge Bases.
And then ultimately there's another API, the RetrieveAndGenerate API,
which does the retrieval and the generation of the response that then gets
sent back to the user in the front end.
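As a sketch of what calling the RetrieveAndGenerate API can look like with Boto3, assuming a knowledge base already exists: the knowledge base ID and model ARN below are placeholders to replace with your own values, and the model shown is just one example of a model that may be used for generation in your Region.

import boto3

# Client for the Bedrock Agent Runtime, which exposes the Retrieve and
# RetrieveAndGenerate APIs for knowledge bases.
bedrock_agent_runtime = boto3.client("bedrock-agent-runtime", region_name="us-east-1")

response = bedrock_agent_runtime.retrieve_and_generate(
    input={"text": "Which internal course is the most popular?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            # Placeholders: use your own knowledge base ID and a supported model ARN.
            "knowledgeBaseId": "YOUR_KNOWLEDGE_BASE_ID",
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
        },
    },
)

# The generated answer, grounded in what was retrieved from the knowledge base.
print(response["output"]["text"])
# Citations provide source attribution back to the retrieved documents.
print(response.get("citations", []))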
So to get started with Amazon Bedrock, you can scan the QR code that you see.
And if you want to find out more or learn more about large language models, there's
another QR code that you can scan.
There's also training that I've seen from one of my colleagues on
multimodal RAG and embeddings with Amazon Nova and Bedrock on AWS.
You can consult that course to learn more and dive deeper into the
topic that I introduced today.
And with that, I thank you very much for coming to this session or for
watching this session, and I look forward to
hearing from you on LinkedIn about the feedback that you have regarding the
topic that we went through today.
Thank you very much and enjoy the rest of the conference.
Bye-bye.