Conf42 Large Language Models (LLMs) 2024 - Online

Unleash the Power of GenAI: Generate Growth and Innovation with Data

Abstract

Your data is the key to unlocking GenAI’s true potential! This session dives deep into real-world GenAI use cases across various industries. We’ll then explore how to optimize your data for these applications. Discover practical strategies to transform your data into the fuel that ignites GenAI’s power. Learn how to unlock fresh ideas, streamline processes, and achieve exceptional results – propelling your business forward with innovation and growth.

Summary

  • Generative AI has taken the world by storm. The true power of generative AI goes beyond a search engine or a chatbot. Data is the foundation for building your GenAI application. Generate growth and innovation with data.
  • When you build a GenAI application, you want it to be unique to your business needs and to your customer base. Your data is your key differentiator. Lack of the right data foundation or data strategy was one of the top challenges in implementing generative AI.
  • GenAI uses what are called vector embeddings, which represent words, phrases, and entities as numerical vectors in a multidimensional space. This enables GenAI to understand similarities and relationships between words and entities. By comparing embeddings, a GenAI model can produce more relevant and contextual responses to the question asked.
  • Many of our customers are using vectors for their GenAI applications. We store vectors and business data together. This helps you break down data silos and empowers your team to build GenAI applications. There are a couple of ways we can help you.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, welcome to today's session, Unleash the Power of GenAI: Generate Growth and Innovation with Data. My name is Akanksha Sheoran. I am Principal Data Lead at AWS for the UK public sector. Now, generative AI has taken the world by storm. I'm sure all of you have heard about applications like ChatGPT, and it just shows how powerful the latest machine learning models have become. The true power of generative AI goes beyond a search engine or a chatbot like ChatGPT. It will essentially transform how companies and organizations operate, now and in the future. Just to share some perspective here, Goldman Sachs forecasted a 7 trillion dollar increase in global GDP. They also predict that GenAI will lift productivity growth by one and a half percentage points over a ten-year period. This is just a small glimpse of the potential of GenAI. Now, I've worked with a lot of customers, enterprise customers and public sector customers, and I think it is safe to say that everyone acknowledges the power of GenAI. They are comfortable thinking big with GenAI, or they are actually making plans for how GenAI can be used in their organization. But what I find is that nearly everyone I speak with focuses on foundation models and LLMs more broadly. So in the iceberg that you see, the tip of the iceberg is the GenAI application, right? There is more under the surface, more than meets the eye at first glance. And this is my favorite slide. So what enables you to drive the value of GenAI? GenAI applications are still applications, and like any other application, you need a database underneath. So you eventually need an operational database to support your user experience and your GenAI applications. If you look at the right of the slide, you need a storage layer, and you need a database layer that has purpose-built databases like document databases, graph databases, and vector-enabled databases. And then there are data integrations.
You need the sources of your data, whether the data is going to come in batches or via streaming, and you set up your pipelines to keep up with data changes. Finally, you also have to consider governance: the processes that ensure data quality, privacy, and security. So while it is very tempting to think about generative AI at a surface level, as the tip of the iceberg, it is really the data that you need to nail down effectively, using a modern data architecture. Data is, I could say, the foundation for building your GenAI application. And in the upcoming slides we're going to talk more about why data is so important. Next slide. Another data point that we have is from a McKinsey report, and there's a link to that report that you can see. Companies that have not yet found ways to effectively harmonize and provide access to their data will be unable to fine-tune their generative AI, which means that eventually they won't be able to use GenAI for their customers or unlock its full potential. Doing this requires a very clear data and infrastructure strategy. Now, why does data matter so much? I've been talking about data for a few slides now, so let's see why it matters. When you want to build your GenAI application, you want it to be unique to your business needs and unique for your customer base. Your data is your differentiator; in fact, it is your key differentiator. Let me give you more thoughts on this. When you think about it, every company has access to some foundation models, right? Some of them are easily available in the marketplace. Some of them are easily available on GitHub, where you can download and use them.
But the companies that will be successful are the ones that build a GenAI application with real business data, data with business value, so that the application caters to their customers' history, their needs, their usage patterns, and so on. So data is the difference between a generic GenAI application and one that knows your business and your customers deeply. I've seen this with many customers who have taken off-the-shelf foundation models and not really taken care of the data, and eventually they don't see much benefit from those applications. Whereas I've seen organizations that work backwards from their data to build the GenAI application, or, even if they take an off-the-shelf GenAI application, combine it with their own business data. And that really helps them serve their customers in a better way. Now, using data for GenAI doesn't mean that you have to go and build your own model. There are a couple of types of company: some organizations will build and train their own large language models with vast amounts of data, and many will use their organizational data to fine-tune foundation models for their unique business use case or needs. But underlying all of this, the key message is that data is your differentiator. In a recent survey of CDOs, we found that 93% of CDOs said that a data strategy, and its role in making generative AI custom to their business, is one of the most important things they can work on. And on the right-hand side, 37% of CDOs agreed that the lack of the right data foundation or data strategy was one of the top challenges in implementing generative AI.
So the data foundation matters for generative AI because access to high-quality data about your organization and your customers improves the accuracy and reliability of these GenAI models and their responses. Now, I would like to share an example of an online travel agency that wants to generate personalized travel itineraries. To do this, what you would use as an organization is the customer profile data in your databases. Based on this data, you would tailor the recommendations using things like past trip history, travel preferences, hotel preferences, preferences of family members, ages of family members, and so on. Then you marry that data with other company data like flight details, hotel inventory, and promotions. So if you look at this, there are two different kinds of data sets that you're using. So again, it is very important where these data sets reside and how easily you can access them. The more easily you can access and secure the data, the more easily your application can generate the personalized travel itinerary. Now, we have another example here. I've been talking about the powerful capabilities of GenAI to create content. But to make this content relevant to your organization, you would want to customize it with things like your own brand logo and your own brand guidebook, your previous ad content from your data lake, and company data like real-time inventory from your transactional database. So you are ultimately going to use GenAI, but underneath you are using the data from all your different traditional or transactional databases as well.
Now, to get high-quality data for GenAI, you need a strong data foundation. In fact, I'm sure many of you listening would have already talked about, or would already have, a data strategy in your organization. Whether you're working towards it or running into some issues with it is a different matter, but we can definitely help you with that process. GenAI makes this data foundation more critical than ever, because your data is your differentiator. I've met so many organizations that were not really ready to adopt cloud and were not thinking about data strategy. But now, with GenAI, it is becoming more and more critical for them to make this a priority. Your data has to be up to date, complete, accurate, discoverable, and available. Those are the key things for your data strategy. Now, these are a couple of approaches to GenAI that we have. We have purpose-built LLMs, then we have fine-tuning of LLMs, and then we have RAG (retrieval-augmented generation). For the purpose of this presentation, I'll pick the RAG use case and work through it. With RAG, the external data used to augment your prompts can come from multiple data sources. It could include documents, different repositories, databases, and APIs. And RAG helps the model adjust its output with data retrieved as and when needed, so that it can respond with the right information. So this is just a quick overview of what RAG is. And this is a very high-level reference architecture for RAG. You'll notice two sides to the story. On the left-hand side you have the processes that occur in the end-user critical path, that is, the end user interacts with the application and is waiting for a response. And on the right-hand side are the processes that happen behind the scenes, like ingestion from data sources, batch and stream processing, and data integration with pipelines.
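
The behind-the-scenes side of the RAG architecture described above (ingest from sources, process, populate the vector store) can be sketched roughly as follows. This is a minimal illustration, not the architecture shown in the talk: `chunk`, `toy_embed`, `ingest`, and the in-memory `vector_store` dictionary are all hypothetical names, and the hash-based embedding is only a stand-in for a real embedding model.

```python
import hashlib


def chunk(text, max_words=8):
    """Split a document into pieces of at most `max_words` words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_embed(text, dims=4):
    """Stand-in for an embedding model (assumption): hash each word into a bucket."""
    vec = [0.0] * dims
    for word in text.lower().split():
        bucket = hashlib.sha256(word.encode("utf-8")).digest()[0] % dims
        vec[bucket] += 1.0
    return vec


def ingest(source_docs, store):
    """Batch step: chunk each document, embed each chunk, upsert by (doc_id, chunk_no)."""
    for doc_id, text in source_docs.items():
        for n, piece in enumerate(chunk(text)):
            store[(doc_id, n)] = {"text": piece, "embedding": toy_embed(piece)}
    return store


# Hypothetical source document; a real pipeline would read from files, databases, or APIs.
docs = {
    "returns-faq": "Customers can return unworn shoes within thirty days "
                   "of the order date for a full refund."
}
vector_store = {}
ingest(docs, vector_store)
print(len(vector_store), "chunks stored")
```

Keying each entry by document ID and chunk number means that re-running the job overwrites existing chunks instead of duplicating them, which is one simple way to keep up with data changes. A document that shrinks would leave stale chunks behind, so a real pipeline would also need a delete step.
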
You need these for populating your vector databases and your various enterprise databases or data warehouses. Now notice the data governance, the data warehouse, and the vector data store. These are very critical. What I am seeing is that more and more customers are modernizing their entire infrastructure by moving it to the cloud, and this includes relational and non-relational databases. We'll talk more about the vector data store that is on the screen, but before we go there, let's look at the critical path for the end user. This is, again, one scenario. I don't have animation right now, so I'll go by the numbers on the screen, just to give you an example of what happens underneath. Number one, the end user interacts with the GenAI application, typically by posing a question. Number two, the application loads the relevant prompt template, and you can create your own templates based on the different rules that you have. Number three concerns the question posed by the user, which could be a new question or part of an ongoing conversation. In the latter case, we have to look into the history data store to allow the user to pick up where they left off. A very good example is when you go online and use a chat option or an online help option. This is a critical workflow: the application needs to pick up where the customer left off, so we need to pull that state into the right context. What was the context in which they were asking that question? Number four, the application needs to query for profile or other situational data.
This would typically come out of a data store. For example, if you're returning something, the application would go back to your historical data store and ask when you purchased it, the details of the order, and so on. Number five, the application tokenizes the original question to get a set of embeddings from the LLM. Number six, with those question embeddings, it performs a similarity search in the vector data store. This uses some form of nearest-neighbor search algorithm, and it searches along with some context. Number seven, once all that data is synthesized into a prompt, it is sent to the LLM to get a response. Number eight, the application updates the conversation state and history according to the new interaction. And number nine, finally, is the response that you see on the screen. So I've mentioned data stores, retrieving historical information, and all of that. If you dive deep into the different layers, you can see which data services you should consider for this architecture, and one of these is vector databases. So let me go to the next slide and talk more about what a vector data store is. Perfect. Now, vector embeddings represent words, phrases, and entities as numerical vectors in a multidimensional space. In the example that you see, words or items with similar meanings are mapped closer to each other in this space. This kind of semantic representation is what enables GenAI to understand similarities and relationships between words and entities.
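
The nine numbered steps of the end-user critical path described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the implementation from the talk: the prompt template wording, the vocabulary-count `embed` function, the in-memory stores, and `fake_llm` are all hypothetical stand-ins for a real embedding model, real databases, and a real LLM call.

```python
import math

# Hypothetical prompt template (step 2); the wording is an assumption.
PROMPT_TEMPLATE = (
    "Context:\n{context}\n\n"
    "Customer profile: {profile}\n"
    "Conversation so far:\n{history}\n\n"
    "Question: {question}\nAnswer:"
)

VOCAB = ["return", "refund", "shipping", "order", "shoes", "delivery"]


def embed(text):
    """Stand-in for an embedding model: counts occurrences of VOCAB words."""
    words = text.lower().replace("?", " ").replace(".", " ").split()
    return [float(words.count(term)) for term in VOCAB]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# In-memory stand-ins for the vector store, profile store, and history store.
DOCUMENTS = [
    "Refund policy: you can return shoes within 30 days of the order date.",
    "Shipping policy: standard delivery takes three to five days.",
]
VECTOR_STORE = [(doc, embed(doc)) for doc in DOCUMENTS]
PROFILES = {"u1": "last order: golf shoes, purchased 10 days ago"}
HISTORY = {}


def fake_llm(prompt):
    """Placeholder for the model call; a real application would call an LLM here."""
    return f"[model response to a {len(prompt)}-character prompt]"


def answer(user_id, question, top_k=1):
    history = HISTORY.setdefault(user_id, [])            # step 3: conversation state
    profile = PROFILES.get(user_id, "unknown")           # step 4: profile data
    query_vec = embed(question)                          # step 5: embed the question
    ranked = sorted(VECTOR_STORE,                        # step 6: similarity search
                    key=lambda item: cosine(query_vec, item[1]),
                    reverse=True)
    context = "\n".join(doc for doc, _ in ranked[:top_k])
    history_text = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in history) or "(none)"
    prompt = PROMPT_TEMPLATE.format(                     # steps 2 and 7: fill the template
        context=context, profile=profile, history=history_text, question=question)
    response = fake_llm(prompt)                          # step 7: send to the LLM
    history.append((question, response))                 # step 8: update history
    return prompt, response                              # step 9: show the response


prompt, response = answer("u1", "Can I return my shoes for a refund?")  # step 1
print(prompt)
```

Every step except the model call is a data lookup, which is the point the talk is making: the quality of the response depends on how easily the application can reach the history, profile, and vector stores.
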
For example, if you say sandals, high heels, color, comfort, fit, all of these are related things, and the embeddings help the model relate them. Next slide. Okay, now, vector embeddings are essentially numerical representations of your data, including audio or video data. Humans can understand the meaning of all these words, but machines cannot; machines only understand numbers. So, to make them understand, we have to translate the data into a format that is suitable for machine learning or for the GenAI application, and that format is what we call vector embeddings. This is a very good example of vector embeddings. By assigning numbers to different words, we can view the vectors in a multidimensional space, as I said, and then measure the distance between them. For instance, if you look at this graph, cat is closer to kitten, whereas dog is closer to puppy. By comparing embeddings in this way, the GenAI model can produce more relevant and contextual responses for the question that was asked or for a matching word. So this is basically how the whole vector assignment works. Another example, just to give you more insight: vectors give superpowers to semantic search for use cases like rich media search or product recommendations, such as when you go on a website and get product recommendations based on your previous purchase history, what you have typed in, or what you have viewed on that portal. In the screenshot that you see on the screen, you can see that semantic search greatly enhances the accuracy of the query output. One of the queries you see there is bright color golf shoes.
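
The cat, kitten, dog, and puppy comparison above can be reproduced with a small sketch. The three-dimensional vectors below are invented purely for illustration (real embeddings come from a model and have hundreds or thousands of dimensions), and cosine similarity is used here as one common way to measure closeness; the talk does not say which distance measure the slide uses.

```python
import math

# Toy 3-dimensional embeddings; the numbers are invented for illustration.
EMBEDDINGS = {
    "cat": [0.90, 0.80, 0.10],
    "kitten": [0.85, 0.90, 0.15],
    "dog": [0.10, 0.80, 0.90],
    "puppy": [0.15, 0.90, 0.85],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(word):
    """Return the other word whose vector is most similar to `word`."""
    query = EMBEDDINGS[word]
    others = (w for w in EMBEDDINGS if w != word)
    return max(others, key=lambda w: cosine_similarity(query, EMBEDDINGS[w]))


print(nearest("cat"))  # kitten
print(nearest("dog"))  # puppy
```

This brute-force comparison against every stored vector is fine for four words. Vector-enabled databases typically use approximate nearest-neighbor indexes so that the same lookup stays fast over millions of vectors.
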
A query like that is very, very specific, and attaching vectors and numbers to the search query is what makes the results very precise in scenarios like this. Okay, so I've spoken about vectors, and now let's talk about how vectors and data work together. Many of our customers are using vectors for their GenAI applications. One piece of feedback that we have got from them is that they would like vector capabilities enabled in their existing databases. That gives them more confidence, because those databases already meet their requirements for scalability and availability and provide durability, storage, and high compute. When your vectors and business data are stored in the same place, your application will run faster, because you don't have to worry about data sync, data movement, or data silos at all. So we store vectors and business data together, and that is why we have enabled vector search across the multiple services that you see. We have Amazon OpenSearch, Aurora PostgreSQL, RDS for PostgreSQL, Neptune, DocumentDB, and DynamoDB, which also has zero-ETL for faster retrieval. So this is our famous flywheel. You unify, where you make sure that you break down your data silos; you innovate by building new GenAI applications; and you modernize your data infrastructure. The beauty of this flywheel is that you can start your data modernization or data strategy from anywhere. I've met customers who say, I'm going to start with innovate, where I build GenAI applications, work on my LLMs and my use cases, then go on to modernizing my data infrastructure, and then think about removing data silos and making more use of that data.
I have also come across customers who say, yes, we would like to go to the cloud first, have a modern data infrastructure on the cloud, have a great data strategy, use all the benefits of the cloud, and then go around the flywheel and innovate. So the beauty of this flywheel is that you can start from anywhere, and once you are in the flywheel, it just powers itself and goes from there. This flywheel also avoids the risk of getting locked into a proprietary format. It will help you break down data silos and empower your team to build GenAI applications. Now, building a data foundation to fuel your generative AI applications: AWS provides a wide variety of comprehensive services for each use case. We have integrations with vector databases and zero-ETL, so you can easily connect to your different data stores. If you're already in the cloud and using any of our services, we have zero-ETL in most of our services, which will help you connect to and access your data all around. We also have very good data governance capabilities available to secure your data in the cloud and to apply policies or user access controls, and we have a lot of services aligned to that as well. Now, where can we help? You can go to our Generative AI Innovation Center page and request a conversation or, if you're already one of our customers, reach out to your account team. There are a few ways we can help you. One is getting buy-in from your executives on data strategy. The next is that we can help your organization envision how data can drive some of your business outcomes, maybe with a POC or a first pilot. And then we also have options to modernize your data foundation. So there are a couple of ways we can help you.
And you can reach out to the AWS Generative AI Innovation Center for more help. These are two very good workshops. They are very technical, deep-dive workshops. So if you are interested in learning more and getting your hands dirty, scan the code, register for these workshops, and go from there. So this is my last slide. Thank you so much, everyone, for joining today's session, and have a great day. Thank you.
...

Akanksha Sheoran

Principal Data Lead @ AWS

Akanksha Sheoran's LinkedIn account


