Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hey everyone, so I am Moder, and this is going to be my short talk on cost-benefit analysis for businesses when it comes to fine-tuning LLMs, or deciding between fine-tuning and the other techniques out there, in order to get to the final result for the use case the company wants to solve.
So in every use case, the central question that every business operator or stakeholder needs to answer is: is it possible for us to just use the pre-trained models that are out there from OpenAI, from Claude? The reason is that they work out of the box. There is no need to do any fine-tuning there. They are fast, and they are affordable.
The next most important question is: what is our use case, and does that use case need any data that is not public, that is private? In other words, do you want the LLM to go over that data and understand patterns in it, in order to answer some really important question, or to call some really important function or tool if you are building an agentic AI, and then do something based on that? So if you have your own private data, is it required or not? That's a really important question that every stakeholder needs to answer.
And then, based on that answer, let's say you do need your data for the use case you want to solve. Then the question is: do you fine-tune your model on that data, or do you, every time you want to answer some question, go and retrieve the most important bits and pieces of your private data and add them to the model's context without fine-tuning it? These are two different techniques, I would say, for using your private data.
So to answer the question of what you choose (do you choose fine-tuning, or do you choose RAG, where you retrieve the most important bits of data and use them as context for the LLM), I think it's really important to first understand what fine-tuning is and what the fundamentals are, before we can easily answer it.
So fine-tuning is all about adjusting the weights of the model, and the new weights will carry knowledge of that private data as well. Fine-tuning also means you modify how the model behaves: the model needs to unlearn or relearn something, because you want it to focus on your data rather than only on what it learned in the past.
The way LLMs, or any deep learning models, work is that as you train and keep on training, the latest training always takes precedence over the older training. So if you take a large language model that's pre-trained on general natural language data and fine-tune it on medical data, that LLM will eventually become more focused on, and learn more about, medical data, and in order to do that it will also unlearn some of the stuff it learned before. So fine-tuning, in a sense, updates your model's weights.
Within the fine-tuning domain, there are two ways you can fine-tune. Number one is that you update all the weights in every layer of your model, which is really expensive, because, as I said before, large models are really big: they can have 100 to 150 billion parameters or more. So if you want to update the entire model, you need a really big hardware setup, which is really expensive.
The other way of fine-tuning, which is also provided out of the box by OpenAI, Anthropic (Claude) and other companies, is fine-tuning just a few layers towards the end. It's called parameter-efficient fine-tuning, because you are only fine-tuning the last few layers. The advantage is that the last few layers will retain the knowledge learned from your own private data, whereas all the other layers in the model, which are already trained on other aspects of language such as grammar, emotion and so on, keep the knowledge you still need. So that's the overall idea of fine-tuning just a few layers, and normally that's much cheaper and much faster. If you have a small amount of data, you can easily fine-tune, the cost won't be super high, the time requirement is also pretty low, and then you can move forward with that.
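To make the "last few layers" idea concrete, here is a minimal sketch assuming a Hugging Face causal language model; GPT-2 is used purely as a small stand-in, and layer names differ between architectures, so treat it as illustrative rather than a recipe.

```python
# Minimal sketch (illustrative only): freeze everything except the last few
# transformer blocks and the output head, so only those weights get updated.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2")  # small stand-in base model

# Freeze all parameters first.
for param in model.parameters():
    param.requires_grad = False

# Unfreeze only the last two transformer blocks and the output head.
for block in model.transformer.h[-2:]:
    for param in block.parameters():
        param.requires_grad = True
for param in model.lm_head.parameters():
    param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Training {trainable:,} of {total:,} parameters")
```

Only the unfrozen parameters would receive gradient updates during training, which is what keeps this approach cheap compared to full fine-tuning.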
So let me recap everything right now across prompt engineering, fine-tuning, RAG and training from scratch. If you are fine-tuning the entire model, it's almost like building the model from scratch; not exactly the same, but in the end you have to update so many different layers and weights across the entire model. You will end up having your own custom model, which you will have full control over, but it's going to be really expensive to train it and then maintain it on your servers.
Training from scratch is a really important pathway if you have terabytes or petabytes of data and you know you won't be able to get the results you need by just fine-tuning the last few layers, so you need to train the majority of the model. The other important consideration is that if you want to keep your model private as well, then training from scratch, or taking an open source model and retraining it on your data, makes a lot of sense.
Now let's say you have a fairly small amount of data, and you may or may not be concerned about privacy. Based on that privacy factor, you will decide to use either an open source model or a closed source one, for example from OpenAI or Anthropic (Claude). Once you have decided between closed and open source, since you have a small amount of data, fine-tuning the last few layers makes a lot of sense. As a result, the model will be better able to answer questions in the domain of the data you have given it. If you have given it really specialized health-tech data, for example, then you can expect the fine-tuned model to answer health-tech questions better than a model that is not fine-tuned.
And why would you not use fine-tuning? Let's say your data is evolving a lot. To give one more example of when you should use RAG rather than fine-tuning: say you have an LLM or an AI agent that buys and sells stock based on the news. Since the news is evolving all the time, it makes no sense to fine-tune the LLM on all the news. It makes more sense to retrieve the latest news in seconds and then use that as context for the LLM to answer the really important question of whether to execute a trade or not. In this case you need up-to-date information, and fine-tuning is not feasible because you need to produce results really fast and the data is also changing over time.
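As a rough illustration of that retrieve-then-prompt pattern, here is a minimal, self-contained sketch; the news items and the word-overlap scoring are stand-ins for a real news feed and a real embedding-based retriever.

```python
# Minimal RAG sketch (illustrative): keep fresh news in a small store,
# retrieve the most relevant items for a query, and paste them into the
# prompt as context instead of fine-tuning on them.
# A real system would use an embedding model plus a vector database; here a
# simple word-overlap score stands in for semantic similarity.

def score(query: str, doc: str) -> float:
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q) or 1)

news_store = [
    "Regulator approves hypothetical merger between ACME and Foo Corp.",
    "ACME quarterly earnings beat analyst expectations.",
    "Severe weather disrupts shipping routes in the Pacific.",
]

query = "Should we buy ACME stock today?"
top_context = sorted(news_store, key=lambda doc: score(query, doc), reverse=True)[:2]

prompt = (
    "You are a trading assistant. Using ONLY the context below, decide "
    "whether to recommend a trade and explain why.\n\n"
    "Context:\n- " + "\n- ".join(top_context) + f"\n\nQuestion: {query}"
)
print(prompt)  # this prompt would then be sent to whichever LLM you use
```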
We can talk more about RAG as well; there are multiple types of RAG that work in different use cases, and that could be part two of this talk in the future.
And then the last point over here is prompt engineering, which is all about adding some context into the prompt. If you only have a small amount of context that you want the LLM to take into account when answering a question or doing some task, then it makes sense to just add that context directly, pretty much in your system prompt.
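As a small illustration, here is what "just put the context in the system prompt" can look like, assuming the OpenAI Python SDK with an API key in the environment; the model name and the example context are placeholders.

```python
# Minimal sketch of prompt engineering: small, static context goes straight
# into the system prompt instead of being fine-tuned into the model.
from openai import OpenAI

client = OpenAI()

company_context = (
    "Our store ships within the EU only. Returns are accepted within 30 days. "
    "Support hours are 9:00-17:00 CET."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=[
        {"role": "system", "content": "You are a support assistant. " + company_context},
        {"role": "user", "content": "Can I return an item after six weeks?"},
    ],
)
print(response.choices[0].message.content)
```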
So in short: you add small content to the system prompt; if you have data that's changing a lot, then it makes sense to retrieve that data through RAG; if you have a decent amount of data in some domain and you want the LLM to work in that domain, then fine-tuning makes more sense; and if you have a very large amount of data, then building your LLM from scratch makes more sense.
And of course, every method we just discussed has different costs. Training from scratch is going to be a really big hassle for sure. Fine-tuning is a bit cheaper, and for fine-tuning the most important thing you need is compute, which you can get fairly easily. Even if you are using OpenAI, they provide you with a way to fine-tune OpenAI models, so you don't need to care about compute resources there; they will charge you more per token for a fine-tuned model versus one that is not fine-tuned, but the compute burden on your side won't be much. If you want to fine-tune an open source model, then for sure you have to think about compute resources.
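For reference, hosted fine-tuning with OpenAI looks roughly like the sketch below; the exact model names and options depend on what your account offers, so treat it as illustrative.

```python
# Sketch of hosted fine-tuning with the OpenAI API: no training hardware to
# manage on your side. Assumes a prepared JSONL training file.
from openai import OpenAI

client = OpenAI()

# 1. Upload the training data.
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Start a fine-tuning job on a base model (example model name).
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",
)
print("Fine-tuning job started:", job.id)

# 3. Once the job finishes, the resulting fine-tuned model id can be used
#    in chat completion calls like any other model.
```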
The biggest cost will be in data preparation, because there is a certain format the data needs to have, and your data may be spread around in bits and pieces. So data collection and preparation is something that takes a lot of time, and that is where you also need manpower and an investment in engineering.
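As an example of what that preparation step can produce, here is a sketch that writes scattered Q&A pairs into the chat-style JSONL format assumed by the hosted fine-tuning sketch above; the exact schema varies by provider, and the example records are made up.

```python
# Sketch of the data-preparation step: collect scattered Q&A pairs into a
# chat-style JSONL file (one JSON object per line) for hosted fine-tuning.
import json

raw_pairs = [
    ("How do I reset my password?", "Go to Settings > Security and click 'Reset password'."),
    ("Where is my invoice?", "Invoices are emailed monthly and available under Billing."),
]

with open("training_data.jsonl", "w") as f:
    for question, answer in raw_pairs:
        record = {
            "messages": [
                {"role": "system", "content": "You are our support assistant."},
                {"role": "user", "content": question},
                {"role": "assistant", "content": answer},
            ]
        }
        f.write(json.dumps(record) + "\n")
```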
And fine-tuning is not just a one-time thing. It's quite possible your data changes every three, four or six months, and depending on that you have to keep fine-tuning as well, so you need a schedule for fine-tuning your LLMs.
Speaking about the hidden costs, the most important one is delayed implementation, because there is data preparation involved and then the fine-tuning itself, which can take a few days depending on the size of the data. And then there is maintaining that custom model as well. So yeah, it all adds up. But if your use case is one where you want your model to be an expert in a really niche domain, and you have your own private data in that domain, then fine-tuning is the way to go, and the benefits will outweigh the costs.
As I said before, on cost reduction: what I see everyone doing is that people don't retrain or fine-tune the entire LLM, just because that's super hard. In fact, they almost always go for an approach where they only fine-tune one part of the model, the last few layers. As a result, it's much quicker, and you don't need as much hardware as you would otherwise. And the model will be better able to learn the knowledge from your custom data, with that knowledge stored in a few layers, which is even better than spreading knowledge about your custom data across a 65-billion-parameter model, which is pretty big. There are also a couple of parameter-efficient fine-tuning techniques out there right now. So yeah, fine-tuning the last few layers is the way to go, pretty much always.
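One widely used parameter-efficient technique (not named explicitly in this talk) is LoRA, where small adapter matrices are trained while the base weights stay frozen. A minimal sketch with the Hugging Face peft library, using a small model as a stand-in for a large one, might look like this:

```python
# Sketch of parameter-efficient fine-tuning with LoRA via the `peft` library:
# tiny adapter matrices are trained while the base model's weights stay frozen.
# Model name and target module names are illustrative and depend on the
# architecture you actually use.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # small stand-in base model

lora_config = LoraConfig(
    r=8,                        # rank of the adapter matrices
    lora_alpha=16,              # scaling factor
    target_modules=["c_attn"],  # attention projection in GPT-2; varies by model
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # typically well under 1% of all weights
```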
Moving on to the ROI. The entire ROI equation depends on multiple things: how periodically you want to retrain everything, say every three or four months; what the inference cost is, because it's quite possible that the inference cost of a fine-tuned model is considerably higher; and the initial fine-tuning, where you have to prepare the data, is really important as well. All of these factors go into your equation to find your ROI: what the costs are and what the advantages are. In the end, on average, I would say the cost is maybe 15 to 20 percent higher for a fine-tuned model than for one that isn't, so if the benefits you get out of it are more than that, then it totally makes sense.
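Just to show the shape of that equation, here is a back-of-the-envelope sketch; every number in it is a made-up placeholder, not a benchmark.

```python
# Back-of-the-envelope ROI sketch: one-off data preparation, periodic
# re-tuning runs, and a per-query inference premium, weighed against the
# assumed value of better answers. All figures are placeholder assumptions.
data_prep_cost = 15_000          # engineering time to collect and format data ($)
training_cost_per_run = 2_000    # one fine-tuning run ($)
runs_per_year = 4                # re-tune roughly every three months
queries_per_year = 1_000_000
inference_premium = 0.0002       # extra $ per query for the fine-tuned model
value_per_query_gain = 0.03      # assumed $ value of the accuracy gain per query

annual_cost = (
    data_prep_cost
    + training_cost_per_run * runs_per_year
    + inference_premium * queries_per_year
)
annual_benefit = value_per_query_gain * queries_per_year

roi = (annual_benefit - annual_cost) / annual_cost
print(f"Annual cost ${annual_cost:,.0f}, benefit ${annual_benefit:,.0f}, ROI {roi:.0%}")
```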
Speaking about the benefits, there are quite a lot. If you want the LLM to be an expert in a domain, then fine-tuning on that domain will give you roughly 15 to 30 percent improvement in the accuracy of results compared to out-of-the-box LLMs from OpenAI, Claude and other vendors. The next most important thing is that, as the model has more knowledge about the domain and that knowledge is fed into a few layers of the model where it can go and fetch the information, we also see around 50 to 60 percent less hallucination. So in use cases where you need higher accuracy, fine-tuning is the way to go here as well.
Moving on: since you have spent the effort fine-tuning, you don't have to pass as much context in your input or system prompt. As a result, there are cases where you can expect lower inference costs, just because you need fewer tokens. And if reasoning is really important for you, since the model already has better context baked in, the reasoning you get out of the LLM in its output will also be better than before.
The other benefits that are extremely important to consider: if you have your own LLM fine-tuned on your own proprietary data, that also gives you an advantage because it's your own IP. It's a unique model, an expert in some domain, that you own. It's something that will surely go onto the company's books as an asset, something you can think about leveraging, selling, or even licensing in the future, and that gives you a really big advantage over everyone else. And if you are fine-tuning an open source model, it's quite possible that the model fine-tuned on your domain will perform much better than any out-of-the-box top model from the top providers, and in that case you end up owning the entire LLM, an expert in that domain, which is a really big thing.
If you're going the open source route, there's also the advantage of enhanced security, meaning the sensitive data your LLM is going to be trained or fine-tuned on will stay within your systems, and that's a much more secure way of doing things. That's really important if you are in the healthcare space, or a government organization that really cares about data, or an industry where the data is very sensitive and you can't share it. For compliance reasons, you have to go with the open source option there.
I've also compiled here different metrics that people have reported on how fine-tuning helped them get better results in different domains. For example, LLMs are trained over a wide array of language data, and legal data is a bit different: it has its own lingo. A person who hasn't gone through law school will surely have a problem reading really big legal documents, right? That is also the case with out-of-the-box LLMs; they're really good, but they do struggle sometimes. That is where the legal profession has found that if you fine-tune these models on legal data, you do get better results.
That is where, like the legal profession has found that if you fine
tune that, these models on the legal data, you do get better results.
The same also goes for the healthcare.
it's much better also the financial services.
For example, if you have your own priority data, then you wanna train the
model on, that's much better as well.
Maybe you have a model to execute some trades as before, and then in the,
in that model, that execute train.
One really important input is like what's happening in the entire world
based on news and all that stuff.
And that's where can easily come in.
And then the fine tuned model will be able to pick up patterns from
the world much better as well.
In manufacturing, fine-tuning is super important as well. And at the same time, the AI agent providers in customer service out there have found that fine-tuning LLMs to better understand a client's business makes the LLM perform better and gives them a competitive edge over other vendors, because here you are building your own mini LLM that understands the business. If you are providing your customers with really personalized customer support on top of that, they will surely get better results than they would from other AI customer support vendors, and better results are the most important metric out there.
What are some of the technical considerations if you are a tech person going over fine-tuning, RAG and the rest? The most important thing, as we discussed before, is open source versus closed source, and then whether you want to fine-tune the last few layers or the entire model; that's really important to consider as well. And then there is the RAG architecture, given that you need data in real time that's evolving as well. Some of the most important techniques I've found so far are listed here; there are different concepts you'll want to understand, and that could be another talk where I go into detail on the different RAG techniques out there.
Lastly, a really high-level comparison between RAG and fine-tuning. The cost of fine-tuning is largely fixed, whereas with RAG, every time you call the LLM you might want to call the retrieval step to fetch relevant information from your databases, so you have a recurring real-time cost. The initial cost is really high with fine-tuning; with RAG it's much lower. With fine-tuning you always have to retrain or re-fine-tune at intervals as you get more and more data, whereas with RAG you just have to update your knowledge bases, and every time the LLM needs some information from the knowledge base it can extract real-time, up-to-date information. On the other hand, in the RAG pipeline you have to run retrieval, pass the results to the LLM, and add the retrieved data as extra tokens in the call, so you have more token usage and higher latency compared to a fine-tuned model. Reasoning is better, or about the same, with both techniques.
In terms of infrastructure, you need different infrastructure in the two cases. If you are using out-of-the-box LLM providers, there's no infrastructure required for fine-tuning, because they just want you to provide data in a certain format; once you upload the data to OpenAI, it does the fine-tuning for you, and there's no need to build or maintain anything on your side. On the other hand, for RAG you do need to build a vector database to store your embeddings, so there is some investment required there as well.
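As a toy illustration of that storage side, here is a self-contained sketch of an in-memory "vector store" and similarity search; a real deployment would use a proper embedding model and a dedicated vector database such as pgvector, Pinecone or Qdrant, and the hashed bag-of-words below is only a stand-in so the example runs without external services.

```python
# Minimal sketch of the RAG storage side: turn documents into vectors, keep
# them in an in-memory index, and search by cosine similarity.
import numpy as np

DIM = 256

def embed(text: str) -> np.ndarray:
    # Stand-in embedding: hash each word into a fixed-size vector and normalize.
    vec = np.zeros(DIM)
    for word in text.lower().split():
        vec[hash(word) % DIM] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

docs = [
    "Refunds are processed within 14 days of the return being received.",
    "Premium support is available 24/7 for enterprise customers.",
    "Shipping to the EU takes 3-5 business days.",
]
index = np.stack([embed(d) for d in docs])   # the "vector database"

query_vec = embed("How long do refunds take?")
best = int(np.argmax(index @ query_vec))     # cosine similarity (unit-length vectors)
print("Retrieved context:", docs[best])
```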
I think we already covered data sensitivity before.
So, in summary, you'll choose fine-tuning when the knowledge is not changing rapidly and you have a really high volume of queries, where it makes sense to fine-tune rather than use RAG because the per-query retrieval cost would add up. You will also use fine-tuning when there is a really large amount of domain knowledge the LLM needs in order to answer questions, and when you need really fast results or really high accuracy; that's where you will surely go for fine-tuning as well.
You will choose RAG where the data is evolving a lot and you need to extract relevant data in real time. You also use RAG if you want citations: a fine-tuned LLM won't know which document contained the information it learned, but with RAG, since you are adding the extra knowledge the LLM needs directly into the context, it can easily cite it. So if you need citations, RAG is the way to go. You will also use RAG if you only have a small amount of data, where fine-tuning wouldn't make sense and you can just say, hey, I'll add that data in the context. And once again, if you don't want to spend a lot of time on the initial fine-tuning, or there's no budget for it, that's also where RAG rather than fine-tuning is the way to go.
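Purely as a summary device, the rules of thumb above could be written down like this; the inputs and thresholds are deliberately oversimplified, not a real decision framework.

```python
# A rough decision helper that only encodes the rules of thumb from this talk.
def choose_approach(data_changes_fast: bool, needs_citations: bool, data_volume: str) -> str:
    if data_changes_fast or needs_citations:
        return "RAG"
    if data_volume == "small":
        return "prompt engineering (put the context in the system prompt)"
    if data_volume == "huge":
        return "train or heavily retrain a model from scratch"
    return "fine-tuning (ideally only the last few layers)"

print(choose_approach(data_changes_fast=True, needs_citations=False, data_volume="large"))
# -> RAG
```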
I also have a few case studies mentioned here. I will be adding the link to this artifact as well, so people can easily learn from it.
So once again, the key takeaways: we have different techniques out there. There's full fine-tuning, there's fine-tuning only a few layers, and there's RAG. Depending on the scenario, every technique is important. We learned that fine-tuning can be an investment where you end up building your own proprietary model that works really well in the domain you operate in, something you can license or sell in the future, an asset that belongs to you.
We also learned that there are costs: the direct cost of fine-tuning is the initial money you have to spend, where you need data preparation, and there are also indirect costs associated with it. And fine-tuning can be done faster if you only fine-tune a small part of the model, which also means your knowledge will be better learned by the model, because it's all confined to that small part.
And there is also the possibility that your use case will need both RAG and fine-tuning. It's quite possible that there are two types of information: fixed and evolving. The evolving information can become part of the RAG pipeline, while what's fixed and available in really large quantity can be used for fine-tuning.
Take customer support: all the past tickets that were resolved can pretty much be used for fine-tuning the LLM, to tell it, hey, this is how you resolve a customer support query. And then the information about policies, refunds, and what's available in stock in the company is the evolving information, and that can be part of the RAG side. So there can be a use case where you need both of these things, depending on the context.
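A sketch of that hybrid pattern, with a hypothetical fine-tuned model id and a placeholder retrieval helper, could look like this:

```python
# Sketch of the hybrid pattern: a model fine-tuned on historical support
# tickets carries the "how to resolve things" knowledge, while evolving facts
# (stock levels, current policies) are retrieved and passed in as context.
# The fine-tuned model id and the retrieve() helper are hypothetical.
from openai import OpenAI

client = OpenAI()

def retrieve(query: str) -> str:
    # Placeholder for a real vector-store lookup of policies / stock data.
    return "Current policy: refunds within 30 days. Item #4411 is in stock."

question = "A customer wants to return item #4411 after three weeks."
context = retrieve(question)

response = client.chat.completions.create(
    model="ft:gpt-4o-mini:your-org:support:abc123",  # hypothetical fine-tuned model id
    messages=[
        {"role": "system", "content": "Resolve support tickets. Use this context: " + context},
        {"role": "user", "content": question},
    ],
)
print(response.choices[0].message.content)
```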
Yep. So, at a really high level, that's what this is all about: why use fine-tuning, and why it is important for every business. Stay tuned for more talks in the future. Bye.