Conf42 Large Language Models (LLMs) 2025 - Online

- premiere 5PM GMT

Fine-tuning LLMs: A Cost-Benefit Analysis for Businesses


Abstract

As businesses scramble to integrate Large Language Models (LLMs) into their workflows, a pertinent question arises: Is fine-tuning worth the investment? Fine-tuning offers domain-specific precision and improved user satisfaction, but requires substantial upfront data, computational power, and skilled personnel. This presentation analyzes the economics of fine-tuning: its intricacies, when it is a good investment, when retrieval-augmented generation (RAG) is a more suitable option, and how companies can balance their need for custom solutions against cost considerations.


Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hey everyone, I'm Muddassar, and this is my short talk on a cost-benefit analysis for businesses when it comes to fine-tuning LLMs, or deciding between fine-tuning and the other techniques out there, in order to get to the result the company's use case needs. In every use case, the central question every business stakeholder needs to answer is: can we just use the pre-trained models that are out there from OpenAI or from Claude? They work out of the box, there is no fine-tuning needed, they are fast, and they are affordable. The next most important question is: in our use case, is there a need for any data that is not public, that is private? If so, you want the LLM to go over that data and understand the patterns in it, in order to answer some really important question, or to call some really important function or tool if you are building an AI agent, and then act based on that. So if you have your own private data, is it actually required? That's a really important question every stakeholder needs to answer. Let's say you do need your own data for the use case you want to solve. Then the question is: do you fine-tune your model on that data, or do you, every time you want to answer a question, go and retrieve the most important bits and pieces of your private data and add them to the model's context without fine-tuning it? These are two different techniques for using your private data. To answer the question of which to choose, fine-tuning or RAG, where you retrieve the most relevant bits of data and use them as context for the LLM, I think it's really important to first understand the fundamentals.
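The decision flow described here can be sketched in a few lines of Python. The flag names are my own illustration, not something from the talk, and a real decision would also weigh cost, latency, and privacy:

```python
def choose_approach(needs_private_data: bool,
                    data_changes_often: bool,
                    context_is_small: bool) -> str:
    """Toy decision helper for the flow described above.

    Flag names are illustrative; real decisions also weigh cost,
    latency, privacy, and data volume.
    """
    if not needs_private_data:
        return "pre-trained model out of the box"
    if context_is_small:
        return "prompt engineering (put the data in the prompt)"
    if data_changes_often:
        return "RAG (retrieve fresh data as context)"
    return "fine-tuning on the private data"

print(choose_approach(False, False, False))  # pre-trained model out of the box
print(choose_approach(True, True, False))    # RAG (retrieve fresh data as context)
```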
First of all, what is fine-tuning, and what are the fundamentals, so we can answer this question more easily? Fine-tuning is all about adjusting the weights of the model, and the new weights will have knowledge of your private data as well. Fine-tuning also modifies how the model behaves: the model needs to unlearn or relearn some things, because you want it to focus on your data. The way LLMs, or any deep learning models, work is that as you keep training, the latest training always takes precedence over the older training. So if you take a large language model that's pre-trained on general natural language data and fine-tune it on medical data, that LLM will become more focused on, and learn more about, medical data; and in order to do that it will also unlearn some of what it learned before. So fine-tuning, in a sense, updates your model's weights. Within the fine-tuning domain, there are two ways to fine-tune. Number one, you update all the weights in every layer of your model, which is really expensive, because the models we have right now are really big; they can have tens of billions of parameters or more. If you want to update the entire model, you need a really big hardware setup, which is expensive. The other way of fine-tuning, which is also provided out of the box by OpenAI, Claude, and other companies, is fine-tuning just a few layers towards the end.
This is called parameter-efficient fine-tuning, because you only fine-tune the last few layers. The advantage is that those last few layers can retain the knowledge learned from your own private data, whereas all the other layers in the model, which are already trained on other aspects of language, grammar, emotions, and so on, keep the knowledge you still need. That's the overall idea of fine-tuning just a few layers, and normally it's much cheaper and much faster. If you have a small amount of data, you can easily fine-tune it; the cost won't be super high and the time requirement is pretty low as well, and then you can move forward with that. Let me recap everything across prompt engineering, fine-tuning, RAG, and training from scratch. If you are fine-tuning the entire model, it's almost the same as building the model from scratch, because you have to update so many layers and weights across the whole model. You will end up with your own custom model that you have full control over, but it's going to be really expensive to train and then maintain on a server. Training from scratch is an important pathway if you have terabytes or petabytes of data and you know you won't get the results you need by just fine-tuning the last few layers; you need to train the majority of the model. The other important consideration is privacy: if you want to keep your model private, then training from scratch, or taking an open source model and retraining it on your data, makes a lot of sense.
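The "last few layers" idea can be sketched with a toy model. The layer names and parameter counts below are made up for illustration; the point is just how much smaller the trainable fraction becomes when everything except the end of the model is frozen:

```python
# Toy illustration of parameter-efficient fine-tuning by freezing layers.
# Layer sizes are made up; real models have billions of parameters.
layers = [
    {"name": "embed",  "params": 1_000_000, "trainable": True},
    {"name": "block1", "params": 4_000_000, "trainable": True},
    {"name": "block2", "params": 4_000_000, "trainable": True},
    {"name": "block3", "params": 4_000_000, "trainable": True},
    {"name": "head",   "params": 1_000_000, "trainable": True},
]

def freeze_all_but_last(model, k):
    """Mark every layer except the last k as frozen (not updated)."""
    for layer in model[:-k]:
        layer["trainable"] = False
    return model

def trainable_params(model):
    return sum(l["params"] for l in model if l["trainable"])

full = trainable_params(layers)     # full fine-tune touches every weight
freeze_all_but_last(layers, 2)
partial = trainable_params(layers)  # only the last two layers remain trainable
print(f"updating {partial / full:.0%} of the weights")
```

In a real framework such as PyTorch the same idea is expressed by setting `requires_grad = False` on the frozen parameters.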
Now let's say you have a very small amount of data. Based on the privacy factor, you will decide to either use an open source model or a closed source one, for example OpenAI or Claude. Once you have decided between closed and open source, since you have a small amount of data, fine-tuning the last few layers makes a lot of sense. As a result, the model will be better able to answer questions in the domain of the data you have given it. If you have given it really specialized health-tech data, for example, then you can expect the fine-tuned model to answer health-tech questions better than a model that's not fine-tuned. And when would you not use fine-tuning? Let's say your data is evolving a lot. To give one more example of when you should use RAG rather than fine-tuning: say you have an LLM or an AI agent which buys and sells stock based on the news. Since the news is constantly evolving, it makes no sense to fine-tune the LLM on all the news. It makes more sense to retrieve the latest news in seconds and use it as context for the LLM to answer the really important question of whether to execute a trade or not. In this case you need up-to-date information, and fine-tuning is not feasible because you need to act really fast and the data is changing over time. There is a lot more to say about RAG; there are multiple types of RAG that work in different use cases, and that can be part two of this talk in the future.
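The news-trading example boils down to a retrieve-then-prompt loop. A minimal sketch, using crude keyword overlap in place of the embedding search a real RAG system would use, with made-up headlines:

```python
def score(query: str, doc: str) -> int:
    """Crude relevance: count shared lowercase words (real RAG uses embeddings)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k most relevant documents for the query."""
    return sorted(docs, key=lambda d: score(query, d), reverse=True)[:k]

# Hypothetical, constantly-changing news feed
news = [
    "Acme Corp shares fall after weak earnings report",
    "Central bank holds interest rates steady",
    "Acme Corp announces new product line",
]

query = "should we trade Acme Corp shares today"
context = retrieve(query, news)
prompt = "Use only this context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"
print(prompt)
```

When the feed changes, only the `news` list is updated; the model itself is never retrained.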
The last point here is prompt engineering. That's all about adding some context in the prompt, when you only have a small amount of context that you want the LLM to take into account when answering a question or doing some task. If that context is pretty small, it makes sense to just add it to your system prompt. So in short: you add small content to the system prompt; if you have data that's changing a lot, it makes sense to retrieve that data through RAG; if you have a decent amount of data on some domain and you want the LLM to work on that domain, fine-tuning makes more sense; and if you have a really large amount of data, then building your LLM from scratch makes more sense. Of course, every method we just discussed has different costs. If you are training from scratch, that's going to be a really big hassle for sure. Fine-tuning is a bit cheaper, and the most important thing you need, compute resources, you can get easily. Even if you are using OpenAI, they provide you with a way to fine-tune OpenAI models, so you don't need to care about compute resources; they will charge you more per token for the fine-tuned model versus the one that's not fine-tuned, but the compute cost won't be too much. If you want to fine-tune an open source model, then for sure you have to think about compute. The most important cost will be data preparation, because there is a certain format the data needs to have, and your data may be spread around in bits and pieces. Compiling, collecting, and preparing data takes a great deal of time, and that is where you need manpower and an investment into engineering as well.
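The system-prompt approach can be sketched as building a chat-style message list, the format used by most LLM chat APIs. The policy text and question here are made up for the example:

```python
# Prompt engineering: put a small, stable piece of context into the
# system prompt instead of fine-tuning. The policy text is made up.
refund_policy = "Refunds are accepted within 30 days with a receipt."

def build_messages(user_question: str) -> list[dict]:
    """Chat-style message list (the format used by most LLM chat APIs)."""
    return [
        {"role": "system",
         "content": f"You are a support assistant. Company policy: {refund_policy}"},
        {"role": "user", "content": user_question},
    ]

messages = build_messages("Can I return a jacket I bought last week?")
print(messages[0]["content"])
```

This only works while the injected context stays small; once it grows, the extra tokens on every call start pushing you toward RAG or fine-tuning.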
Also, fine-tuning is not just a one-time thing. It's quite possible your data is changing every three, four, or six months, and depending on that you have to keep fine-tuning as well, so you need a schedule for fine-tuning your LLMs. Speaking about the hidden costs, the most important one is delayed implementation, because there is data preparation involved and then the fine-tuning itself, which can take a few days depending on the size of the data. And then there is maintaining that custom model as well. So yeah, it all adds up. But if your use case is one where you want your model to be an expert in a really niche domain, and you have your own private data on that domain, then fine-tuning is the way to go, and the benefits will outweigh the costs. As I said before, on cost reduction: what I see is that people don't retrain or fine-tune the entire LLM, just because that's super hard. They always go for an approach where they only fine-tune one part of the model, the last few layers, and as a result it's much quicker and you don't need as much hardware as before. The model will also be better able to learn the knowledge from your custom data, because that knowledge will be stored in a few layers of the model, which is even better than spreading knowledge about your custom data across a 65-billion-parameter model; that's pretty big. There are a couple of fine-tuning techniques out there right now, but fine-tuning the last few layers is almost always the way to go. Moving on to the ROI.
The entire ROI equation depends on multiple things. For example, how periodically do you want to retrain: every three months, every four months? What's the inference cost? It's quite possible that the inference cost of a fine-tuned model is way higher; that's pretty important. And the initial fine-tuning, where you have to prepare data, matters as well. All these factors go into your equation to find your ROI: what's the cost, what's the advantage? On average, I would say the cost is around 15 to 20% higher for a fine-tuned model than the other way around. So if the benefits you get out of it are more than that, it totally makes sense. And speaking about the benefits, there are quite a lot. For example, if you want the LLM to be an expert in a domain, then fine-tuning on that domain can give you 15 to 30% improvement in the accuracy of results compared to out-of-the-box LLMs from OpenAI, Claude, and other vendors. The next important thing is that, as the model has more knowledge about the domain, and that knowledge is fed into a few layers of the model where the model can go and fetch it, we also see around 50 to 60% less hallucination. So in use cases where you need higher accuracy, fine-tuning is the way to go here as well. Moving on: since you have spent on fine-tuning, you don't have to pass as much context in your input or system tokens, so there are cases where you can expect cheaper inference calls, just because you need fewer tokens. And if reasoning is really important for you, then since the model has better context, the reasoning you get in the LLM's output will also be much better than before.
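The ROI trade-off above can be put into a simple break-even sketch: a one-time fine-tuning cost plus a cheaper per-call cost, against a base model that must carry retrieved context tokens on every call. All the dollar figures below are made up for the example; the talk's own numbers (a roughly 15 to 20% higher cost for the fine-tuned model) would slot into the same equation:

```python
# Illustrative break-even sketch for fine-tuning vs. passing RAG context.
# All numbers are made up for the example; plug in your own.
upfront_cost = 5_000.0     # data prep + fine-tuning job, one-time
ft_cost_per_call = 0.0020  # fine-tuned model: short prompt, higher token price
rag_cost_per_call = 0.0035 # base model + retrieved context tokens per call

def break_even_calls(upfront, ft_call, rag_call):
    """Number of calls after which fine-tuning's upfront cost pays off."""
    saving_per_call = rag_call - ft_call
    if saving_per_call <= 0:
        return None  # fine-tuning never pays off on cost alone
    return upfront / saving_per_call

calls = break_even_calls(upfront_cost, ft_cost_per_call, rag_cost_per_call)
print(f"break-even after ~{calls:,.0f} calls")
```

This is why the talk ties fine-tuning to high query volume: the savings only accumulate per call.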
The other benefits that are extremely important to consider: if you have your own LLM fine-tuned on your own proprietary data, that also gives you an advantage because it's your own IP. It's your own unique model, an expert in some domain, that you own. This is something that will surely go onto the asset books of the company, something you can think about leveraging, selling, or even licensing in the future. That gives you a really huge advantage over everyone. And if you are fine-tuning an open source model, it's quite possible that the model fine-tuned on your domain will perform much better than any out-of-the-box top model from the top providers, and in that case you end up owning the entire LLM, which is a really big thing: an expert in that domain. Going the open source route also has the advantage of enhanced security: the sensitive data your LLM is trained or fine-tuned on stays within your system, which is a much more secure way of doing things. That's really important if you are in the healthcare space, or a government organization that totally cares about its data, or in any industry where data is so sensitive you can't share it; for compliance reasons, you have to go with open source there. I've also compiled here different metrics that people have reported on how fine-tuning helped them get better results in different domains. For example, LLMs are trained over a wide array of language data, but legal data is a bit different; it has its own lingo. A person who hasn't gone through law school will surely have a problem reading really big legal documents, right? That is also the case with out-of-the-box LLMs: they're really good, but they do struggle sometimes. The legal profession has found that if you fine-tune these models on legal data, you do get better results. The same goes for healthcare; it's much better there too. Also in financial services: if you have your own proprietary data that you want to train the model on, that's much better as well. Maybe you have a model to execute trades, as before, and one really important input to that model is what's happening in the world based on the news; that's where fine-tuning can come in, and the fine-tuned model will be able to pick up patterns from the world much better. Fine-tuning is super important in manufacturing as well. And at the same time, all the AI agent providers in customer service have found that fine-tuning LLMs to better understand a client's business makes the LLM perform better and gives them a competitive edge over other vendors, because you are building your own mini LLM that understands the business; if you provide your customers with really personalized customer support, they will surely get better results than from other AI customer support vendors. And better results are the most important metric out there. So, what are some technical considerations if you are a tech person going over fine-tuning, RAG, and the rest? The most important things, as we discussed before, are open source versus closed source, and whether you want to train the last few layers or the entire model; that's really important to consider.
And if you're going for the RAG architecture, given that you need real-time data that's evolving, there are a number of important techniques and concepts you'll want to understand; that can be another talk where I go into detail on the different RAG techniques out there. Lastly, a really high-level comparison between RAG and fine-tuning. The cost for fine-tuning is fixed, whereas for RAG, every time you call the LLM you might call the retriever to fetch relevant information from your databases, so you have a real-time cost for sure. So the initial cost is really high with fine-tuning; with RAG it's much lower. With fine-tuning you always have to retrain, or re-fine-tune, at some interval once you get more and more data, whereas with RAG you just have to update your knowledge base, and every time the LLM needs to extract information from the knowledge base, it can extract real-time, up-to-date information. In the RAG pipeline you have to call the retriever, pass the result to the LLM, and add the retrieved data as tokens into the LLM call, so you have more token usage and higher latency compared to fine-tuning. Reasoning is better, or about the same, with both techniques. In terms of infrastructure, you need different infrastructure in the two cases. If you are using out-of-the-box LLM providers, there's no infrastructure required for fine-tuning; they just want you to provide data in a certain format, and once you upload the data to OpenAI, it does the fine-tuning for you; there's nothing to build or maintain on your side. On the other hand, for RAG you do need to build a vector database to store your embeddings, so there is some investment required there as well.
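The vector database mentioned above can be sketched as a tiny in-memory store. Real RAG stacks use an embedding model and a dedicated store such as FAISS or pgvector; the hand-made 3-dimensional vectors and document names here are stand-ins for real embeddings:

```python
import math

# Minimal in-memory "vector store". The 3-d vectors are hand-made
# stand-ins for the embeddings a real model would produce.
store = [
    ("refund policy doc",  [0.9, 0.1, 0.0]),
    ("shipping times doc", [0.1, 0.8, 0.1]),
    ("stock levels doc",   [0.0, 0.2, 0.9]),
]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query_vec, k=1):
    """Return the k stored documents closest to the query embedding."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [name for name, _ in ranked[:k]]

print(nearest([0.85, 0.15, 0.05]))  # the refund-policy doc ranks first
```

Updating the knowledge base is then just inserting or replacing rows in `store`, with no retraining involved.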
I think we already covered data sensitivity before. So in summary: you'll choose fine-tuning when the knowledge is not changing rapidly and you have a really high volume of queries, because then paying RAG's per-call cost every time makes less sense. You'll also use fine-tuning when there is a large amount of domain knowledge the LLM needs in order to answer questions, and when you need really fast results and really high accuracy; that's where you'll surely go for fine-tuning. And you'll choose RAG where the data is evolving a lot and you need to extract relevant data in real time. You'll also use RAG if you want citations: a fine-tuned LLM won't know which document contained the information it learned, but with RAG, since you are adding the extra knowledge the LLM needs into the context, it can easily cite. If you need citations, RAG is the way to go. You'll also use RAG if you have only a small amount of data, so that fine-tuning wouldn't make sense; you just add that data in the context. And once again, if you don't want to spend a lot of time on initial fine-tuning, or there's no budget, that's also where RAG is the way to go. I also have a few case studies mentioned here; I will be adding the link to this artifact so people can easily learn from it. So once again, the key takeaways: we have different techniques out there. There's full fine-tuning, there's fine-tuning only a few layers, and there's RAG, and depending on the scenario, each technique is really important. We learned that fine-tuning can be an investment where you end up building your own proprietary model that works really well in the domain you operate in.
That's something you can license or sell in the future; it's an asset that belongs to you. We also learned about the costs: the direct cost of fine-tuning is the initial cost you have to spend on data preparation, and there are indirect costs associated with it as well. Fine-tuning can be done faster if you only fine-tune a small part of the model, which also means the knowledge will be better learned, because it's all confined to that small part. And there is also the possibility that your use case will need both RAG and fine-tuning. It's quite possible there are two types of information: what's fixed and what's evolving. The evolving information can become part of the RAG side, while what's fixed, and available in large quantities, can be used for fine-tuning. In customer support, for example, all the past tickets that were resolved can be used to fine-tune the LLM, to tell it: hey, this is how you resolve a customer support query. And the information about policies, refunds, and what's available in stock, the evolving information in the company, can be part of the RAG side. So there can be a use case where you need both of these things, depending on the context. So that's, on a really high level, what this is all about: why use fine-tuning, and why is it important for every business. Stay tuned for more talks in the future. Bye.
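The hybrid customer-support split described above amounts to routing each query to the right side of the system. A minimal sketch, where the keyword rule and topic list are a deliberate simplification of a real query router:

```python
import re

# Hybrid sketch: stable knowledge lives in the fine-tuned model; evolving
# knowledge (stock, prices, policies) comes in via retrieval. The keyword
# rule and topic list below are a deliberate simplification.
EVOLVING_TOPICS = {"stock", "inventory", "refund", "policy", "price"}

def route(query: str) -> str:
    """Send evolving-data questions to RAG, everything else to the fine-tuned model."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    if words & EVOLVING_TOPICS:
        return "RAG: retrieve current data, then call the model"
    return "fine-tuned model: answer from learned ticket history"

print(route("How do I reset my password?"))
print(route("Is this jacket in stock?"))
```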

Muddassar Sharif

Co-founder & CTO @ Virtuans.ai



