Maximizing Efficiency with AI Tools as Catalysts

Abstract

Learn how to use Large Language Models to optimise workflows by automating repetitive tasks, boosting productivity. This talk illustrates suitable applications and example architectures to empower teams’ self-service.

Summary

Don't over commit. Build out a small suite of working examples for others to use. Different use cases are going to benefit from different learning models. Empower teams to self service.
The main beneficiaries of all of these LLM assisted tools are going to be the less technical folks. Make sure it's easy for teams to do the right thing. The high performers are already using these tools. It's in your interest to make sure that they can utilize these tools in ways that won't cause your business harm.
There are tools like Jan AI or Ollama. Your teams can interact with these tools entirely locally and can even process their data locally. These local only tools are slower than running on dedicated, better hardware or using a remote API, but they enable folks to experiment and see whether this style of tool will actually work for them.

Transcript

This transcript was autogenerated. To make changes, submit a PR.

Here's the too long, didn't listen list. The whole point of this talk is not to tell you exactly what to do; it's to show you system patterns I've seen work well and to give you ideas. The key takeaways are going to be: don't over commit, because things are improving rapidly. Build out a small suite of working examples for others to use. Empower teams to self service. Ensure that it's easy for teams to do the right thing. And be aware that there are limitations to all of this.
So, don't over commit. Things are improving rapidly. Between starting this script and recording it, the best model changed three times, and I didn't even know about the local only architecture for non technical folks, which I'll be describing later. What this means is it's better to focus on the style of problem you want to solve and the building blocks, rather than over committing to any one vendor or technology. Different use cases are going to benefit from different learning models. Don't waste opportunities by enforcing that your teams have to use a single LLM. They will need different tools for different purposes, and that's fine.
So, build out a small suite of working examples. Not everyone will understand how LLMs can be used to make them productive, or what styles of problems are even suitable for use with large language models or other techniques. This can be mitigated by having a library of real examples for your business, which teams can interact with and then emulate or improve upon for their specific challenges. A library of non examples or forbidden uses, as a documentation piece, will also help. If there are data sources which you are prohibited from using by law or contract, list them out. Make it clear, so that teams don't accidentally misuse any of this.
Empower teams to self service. So another architecture I'm going to be talking about is retrieval augmented generation.
If you make it possible for teams to add their own personas and their own data to a centralized service, then this is going to reduce the barriers for teams to try out using these tools. After all, you want your teams to be spending their time improving how they work and what data they have, rather than reinventing all of this middleware. The main beneficiaries of all of these LLM assisted tools are going to be the less technical folks, who are unlikely to be able to code the machinery that makes all of these tools work together. They can certainly provide the raw data that makes it work. So, for example, support teams will have user manuals, runbooks, example tickets. All of these are really useful context to make off the shelf tools work so much better, to make them really shine. These teams should be able to get the benefits without having to learn how to program as well.
Also, make sure it's easy for teams to do the right thing. The high performers are already using these tools, regardless of what policies you've put in place. It's in your interest to make sure that they can utilize these tools in ways that won't cause your business harm one way or the other. A great way of doing this is by having golden paths. So have approved tools, and paths for teams to get to use the things that they actually want to use, with data that they can use with them. Just saying no isn't going to stop folks.
Now the bit you've all been waiting for: the example architectures. These are the architectures I've seen work well, and they will help your teams improve their own efficiencies. First up, local processing. There are tools like Jan AI or Ollama (there'll be links at the end) where you can have a front end, provided your teams have suitable hardware. So say you have macOS and you've got modern MacBooks: these run these tools fairly well. On Windows, if they've got dedicated GPUs, say you've got some rendering workstations, they'll again work fairly well.
Your teams can interact with these tools all locally and can even process the data locally. You can also run it on top of documents that you provide. Just taking one example, this Jan AI tool: you can actually disable all use of remote APIs and load in LLMs that you have already approved, and then they can be used for device local processing. So if you have strict data residency requirements, you can just run it on your machine. The data never leaves your machine, the documents never leave your machine, and the results of the LLM running don't leave the machine. But you can also use this a different way. Say, for example, you've set up the next architecture I'll show, or you've already set up a proxy, or you've already got some approved tools. You can have these local agents call those tools rather than going off to some external third party vendor. Another advantage of these local only tools: although they are slower than running on dedicated, better hardware or using a remote API where a company will do all that for you, they enable folks to actually experiment and see whether this style of tool will actually work for them, whether it will actually give them any benefit, without you having to invest very much in it at all. You just have to download it and run it.
The architecture you possibly have been waiting for: retrieval augmented generation. Rather than having to spend $10 or $100 million training your own AI model, you can use something off the shelf. There are many out there. I'm not going to name names or recommend any specific ones, because they'll all be out of date by the time this airs. But this is the architecture. This means that you can use somebody else's model, but make it relevant for your business, your data, whatever it is. So what this architecture looks like is: the user, through whatever it is, maybe it's Jan, maybe it's some custom front end, it doesn't really matter, puts in their query. Heck, it could even be with a chatbot.
And that gets sent off to your server, whatever it is. That server will then take that query and send it over to what's commonly a vector database. This is just doing a search for relevant chunks of documentation which you've already put there. So this could be your internal wikis, this could be runbooks, this can be whatever it is. This can be many, many, many PDFs. And all this stage is doing is sending you back chunks of documentation that might help. So if the user is asking "How do I do business process X for customer Y?", this might bring back the runbook for that process. It might bring in some extra information about that customer. This then gets sent back to your server. And then you will put together the prompt, where you tell the language model what to do. You'll put in the query, you'll put in this extra context, you'll send it over to the thing that does text generation, and you'll get back your response. So for example, the prompt might be: "You are a helpful service desk employee. Help the users as much as you can, based on only the information that is provided in the context." Then you might list the context. So it would say: this is customer X, this is the relevant process document. And then after that you would put the user's question, like "How do I do this process for this customer?" The response gets sent back, and then the person at the start can use it.
This architecture can be quite nice because of this server in the middle. Let's pretend that you've got some sort of chat system. You could allow teams to add different personas, and the personas will tell your API server to use a different prompt, to use a different model, to use a different set of data to enrich these queries. Again, teams generally will be able to say, "I want to do this kind of thing. Here are some examples."
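As an illustration of the retrieval augmented generation flow described above, here is a minimal Python sketch. It is not the speaker's implementation. The `Persona` class, the `retrieve` and `build_prompt` functions, the model name, and the sample documents are all hypothetical, and a toy bag-of-words similarity stands in for a real embedding model and vector database. It only assembles the prompt; sending that prompt to a text generation model is left out.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Persona:
    """Hypothetical per-team configuration held by the server in the middle."""
    system_prompt: str
    model: str            # assumed model identifier, not a real product name
    documents: list[str]  # chunks the team has supplied


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Stand-in for the vector database search: return the top_k closest chunks."""
    query_vector = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(query_vector, embed(d)), reverse=True)
    return ranked[:top_k]


def build_prompt(persona: Persona, query: str) -> str:
    """Combine the persona's instructions, the retrieved context, and the user's question."""
    context = "\n".join(f"- {chunk}" for chunk in retrieve(query, persona.documents))
    return f"{persona.system_prompt}\n\nContext:\n{context}\n\nQuestion: {query}"


# Example persona; the documents are invented for illustration only.
SERVICE_DESK = Persona(
    system_prompt=(
        "You are a helpful service desk employee. Help the users as much as you can "
        "based on only the information that is provided in the context."
    ),
    model="example-model",
    documents=[
        "Runbook for process X: open a ticket, confirm the customer account, then run the export.",
        "Customer Y notes: exports must be approved by their account manager first.",
        "Office guide: the kitchen is cleaned on Fridays.",
    ],
)

if __name__ == "__main__":
    print(build_prompt(SERVICE_DESK, "How do I do process X for customer Y?"))
```

Keeping the prompt, model, and document set together in one persona object is one way a central server could let teams add their own use cases without touching the retrieval or generation code.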
You can work with them to get the prompt, and they'll probably be able to say, "Here is my big stack of documentation that I think is relevant for this." How exactly the vector database works to pull out the relevant pieces of those documents is dependent on what system you're using. But just think of it as: it goes off, finds relevant information, brings it back, and then adds it all together for the language generation.
So now onto the limitations. These large language models, LLMs, can make stuff up. This is commonly called hallucination. They are a piece of software designed to make plausible text. What that means is that they have no concept of truth or lies; they just generate text that looks plausible. This can mean that if you don't provide your own process document, the model will just make something up that seems plausible. And that's a real challenge. Make sure that whenever you're using these tools, you have a human over the loop. What I mean by that is that humans are checking the output of these tools before it goes any further. So if you're using it to improve your documentation, have someone review that.
Next, outdated knowledge. Once these models are trained, they typically don't get updated with fresh information for a while, or even at all. So you might be dealing with stale things like: this library used to work this way, but now it works another way. The model doesn't know that; it's going to tell you the old way of doing things.
Because of what I've just said, these tools are also generally better for templates or skeletons of things rather than fine detail. So one example of this might be: if you want a pitch deck for a specific industry, it can give you the broad strokes, but you're going to need an expert to put in those fine details, the things that actually make it relevant and particularly useful. You can use it for things like "make me a bash for loop", for example. I can never get that right.
I can have the bot do that, and then I fill in the specific logic that I actually want.
These models can get very expensive. If you're trying to buy equipment so that you can run them on your own hardware at sufficient speed, that can get very expensive, if you can even get the hardware at all. There are very long waitlists for some of this equipment. If you're using some of the cutting edge models, they will essentially bill you by token, which is roughly two or three characters. The price per token looks like a really small number, but again, if you're enriching your queries with large documents, that can get very expensive very quickly. So estimate your costs. Choose appropriately.
And again, this is also fast moving. The models and the vendors are evolving and changing so rapidly that it's unlikely that the model you choose today is the one you would choose in a month, or in two months, or even next year. So you want to make sure that you have flexibility. You probably don't want to sign large upfront contracts. You probably don't want to sign large long term contracts. No one knows who's going to be the top performer in even six months. There are a lot of interesting things happening, which is good, but it also makes it a real challenge.
Key takeaways once again: don't over commit. Things are improving rapidly. Build out a small suite of working examples for folks to build on top of. Empower your teams to self service. Ensure that it's easy for teams to do the right thing. These tools have limitations. And there are some example architectures that you can use.
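To make the advice to estimate your costs concrete, here is a rough Python sketch of a per-token cost estimate. It is an illustration, not a pricing tool. The function names, the prices, and the traffic figures are all hypothetical, and the default of three characters per token simply follows the rough figure quoted in the talk. Real tokenizers and real price lists differ by vendor and model, so substitute your own numbers.

```python
import math


def estimate_tokens(text_chars: int, chars_per_token: float = 3.0) -> int:
    """Rough token count from a character count, rounded up.

    chars_per_token is an assumption; measure it with your vendor's tokenizer.
    """
    return math.ceil(text_chars / chars_per_token)


def estimate_monthly_cost(
    query_chars: int,
    context_chars: int,
    response_chars: int,
    queries_per_month: int,
    input_price_per_1k: float,
    output_price_per_1k: float,
    chars_per_token: float = 3.0,
) -> float:
    """Estimated monthly spend when input and output tokens are billed separately."""
    # The retrieved context is sent along with every query, so it is billed as input.
    input_tokens = estimate_tokens(query_chars + context_chars, chars_per_token)
    output_tokens = estimate_tokens(response_chars, chars_per_token)
    per_query = (input_tokens / 1000) * input_price_per_1k
    per_query += (output_tokens / 1000) * output_price_per_1k
    return per_query * queries_per_month


if __name__ == "__main__":
    # Hypothetical workload: short questions, short answers, 10,000 queries a month.
    common = dict(
        query_chars=200,
        response_chars=1_500,
        queries_per_month=10_000,
        input_price_per_1k=0.01,   # made-up price per 1,000 input tokens
        output_price_per_1k=0.03,  # made-up price per 1,000 output tokens
    )
    plain = estimate_monthly_cost(context_chars=0, **common)
    enriched = estimate_monthly_cost(context_chars=30_000, **common)
    print(f"without retrieved context: {plain:.2f} per month")
    print(f"with 30,000 characters of context per query: {enriched:.2f} per month")
```

Comparing the two figures shows how enriching every query with large documents can come to dominate the bill, even when the price per token looks tiny.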

Conf42 Machine Learning 2024 - Online

May 30 2024

Richard Finlay Tweed

Senior Site Reliability Engineer @ Thought Machine
