Conf42 Observability 2025 - Online

- premiere 5PM GMT

RAG Beyond Chatbots: Transforming Customer Support with Observable Intelligence


Abstract

This presentation reveals how Retrieval-Augmented Generation (RAG) is fundamentally transforming customer support operations: organizations implementing it achieve up to 40% improvements in first-contact resolution rates, 35% reductions in average handling time, and 25-point increases in Net Promoter Scores.

Unlike traditional AI chatbots constrained by static knowledge bases requiring multi-day update cycles, RAG systems represent a paradigm shift by dynamically retrieving information from enterprise knowledge sources before generating responses—with observable knowledge updates in near real-time. Our analysis across enterprise deployments demonstrates RAG systems reducing agent escalation rates by over 60% while expanding support coverage to 85%+ of total inquiries.

We’ll explore RAG’s observable advantages with precise metrics: 95% reduction in hallucinations when properly implemented, 72% decrease in knowledge maintenance costs through dynamic integration without manual updates, 43% improvement in contextual understanding across multi-turn conversations (reducing conversation length), and measurable knowledge transfer efficiency to human agents (improving agent satisfaction by 38%).

The presentation explores implementation requirements through an observability lens, examining how organizations achieving the highest ROI focused on knowledge base instrumentation and monitoring. Companies with properly structured and observable repositories demonstrated 3.5x higher retrieval accuracy than those using unprocessed document collections. We’ll conclude with emerging trends in the observability space, including multimodal RAG systems showing 57% improvement for visually-oriented support issues and proactive support capabilities reducing ticket creation by up to 30%.

This data-driven exploration equips customer experience leaders with actionable insights into how observable RAG architectures can transform support operations while significantly reducing operational costs and creating sustainable competitive advantage.


Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. Welcome to Conf42 2025. My name is Vaibhav, and today I'll walk you through how Retrieval-Augmented Generation, also known as RAG, offers a huge improvement over conventional chatbots in the enterprise support domain, especially when it comes to accuracy, scalability, and knowledge integration.

Let's first understand how traditional chatbots work. Traditional chatbots, even the more modern ones that use intent classification or basic NLP, rely on static knowledge, hardcoded flows, or pre-trained models with limited ability to adapt. They're built using rule-based systems that scale poorly and break easily on edge cases. They're trained on a snapshot of data, which becomes obsolete very quickly, and their session state management is often brittle or non-existent, making multi-turn context retention weak or absent. In other words, none of these scale in a real-world customer service environment, where data is continuously evolving and diverse user queries arrive every day. The result is maintenance overhead, because you need to manually update knowledge bases continuously, and a risk of hallucination when the model doesn't know the answer but still guesses. Most critically, the system can't handle novel or low-frequency queries unless they're explicitly encoded in the training set or a dialog tree, which creates a feedback loop of high operational overhead and poor user trust.

Here is where RAG comes into the picture. RAG shifts the paradigm by introducing real-time document retrieval as the very first step before response generation. The first step is understanding the user's query using transformers with semantic parsing. The query is embedded via BERT or sentence transformers, then matched against a vector store like FAISS, Pinecone, or Weaviate to retrieve the top-k passages. Once the latest information is retrieved, it is fed into the prompt for the language model, enabling grounded generation. And if updates are detected in any of the source data, the delta of that information gets indexed automatically and is made available in the next retrieval cycle, saving the effort of fine-tuning the model again and again.

Along with that, traditional bots typically fail at handling composite or multi-intent queries due to linear dialog flows and limited memory. Here RAG introduces multi-stage reasoning: complex inputs are parsed into smaller, structured sub-questions using semantic and syntactic segmentation, and each component is independently queried against the vector store, so the system can pull from multiple domains or knowledge clusters. The retrieved snippets are then fused, using either concatenation, a specialized summarization layer, or fusion-in-decoder, and the language model generates a comprehensive, unified answer covering all parts of the input, something that is very difficult to achieve with a traditional bot.
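[Editor's note] Here is a minimal sketch of the retrieve-then-generate flow just described, using sentence-transformers for the embeddings and FAISS as the vector store. The model name, the sample documents, and the prompt template are illustrative assumptions, not details from the talk.

```python
# Minimal RAG retrieval sketch: embed documents, index them in FAISS,
# retrieve top-k passages for a query, and ground the prompt on them.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works

documents = [
    "Refunds are processed within 5 business days.",
    "Password resets require a verified email address.",
]
doc_vecs = encoder.encode(documents, normalize_embeddings=True)

# Inner product over normalized vectors is cosine similarity.
index = faiss.IndexFlatIP(doc_vecs.shape[1])
index.add(np.asarray(doc_vecs, dtype="float32"))

def retrieve(query: str, k: int = 2) -> list[str]:
    """Embed the query and return the top-k matching passages."""
    q = encoder.encode([query], normalize_embeddings=True)
    _, ids = index.search(np.asarray(q, dtype="float32"), k)
    return [documents[i] for i in ids[0]]

# Grounded generation: retrieved passages go into the prompt so the LLM
# answers from enterprise knowledge rather than parametric memory.
question = "How long do refunds take?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

For the multi-intent case described above, the same `retrieve` call would simply run once per decomposed sub-question before the snippets are fused.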
Due to all of this, RAG achieves higher accuracy than traditional chatbots and plain LLMs because of its ability to ground responses in real data. The language model is conditioned on retrieved documents, so the outputs are based on actual enterprise knowledge rather than stale parametric memory. And if, say, the relevant documents for some outage or downtime aren't retrieved, the system can return fallback messages or uncertainty signals instead of guessing, improving trust and transparency. Along with that, RAG systems can estimate confidence based on retrieval scores and response entropy, enabling you to set thresholds and either escalate uncertain responses or ask clarifying questions whenever the estimated confidence falls below the threshold.

Apart from accuracy, the other very big benefit a RAG-based chatbot brings is improved user experience: it can be augmented with session-level memory using embeddings or key-value stores. It can then track user intent, slot fills, and previous interactions to maintain context across multi-turn conversations, which results in dramatically more coherent dialogue flows and higher task-completion rates, elevating the user's experience.

Beyond this, even with advanced automation in customer support, there will be a point when a conversation needs escalation. Having an AI agent that stores context and intent helps bring the human agent up to speed very quickly, because the RAG-based system already has all of that on record. With RAG we can achieve a structured handoff to human agents by delivering a package containing the conversation history, the retrieved documents, and the generated responses. An agent no longer needs to re-ask questions or dig for background; they get full context, including the intent chain and prior decision points, so they're already up to speed on the customer's issue. This in turn reduces average handling time and improves both agent satisfaction and first-contact resolution rates. From a design view, all of this integrates easily with whatever CRM application the organization uses for its support operations.

Now, in a live system, documents in the knowledge base are bound to change, so to keep a RAG-based chatbot up to date we need to establish continuous knowledge-update pipelines. As soon as content such as a policy document or support article is updated in a source system like Confluence or whatever CMS the company uses, it needs to be ingested and queued for reprocessing. A background job then vectorizes that content using embedding models like BGE or MiniLM and updates the semantic index in a vector database such as FAISS or an equivalent. Whenever a user query comes in afterward, the retriever immediately surfaces the latest relevant content, ensuring the response reflects the current state without any model fine-tuning. The generator model conditions its output on the updated context, enabling it to answer questions about a new feature, a policy change, or a known issue on the same day the document was changed, so the response always reflects the latest data.
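[Editor's note] The following is a toy sketch of that continuous knowledge-update pipeline, again using sentence-transformers and FAISS. The document IDs, sample texts, and the `ingest_update()` hook (imagined as being triggered by, say, a Confluence webhook) are illustrative assumptions.

```python
# Continuous knowledge-update sketch: when the CMS signals a change, the
# changed document is re-embedded and the semantic index is refreshed,
# so the next retrieval cycle already sees the new content.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in for BGE/MiniLM

doc_store: dict[str, str] = {
    "kb-101": "Refunds are processed within 5 business days.",
}
index = None
id_map: list[str] = []  # FAISS row number -> document ID

def rebuild_index() -> None:
    """Re-embed the knowledge base and refresh the semantic index."""
    global index, id_map
    ids, texts = zip(*doc_store.items())
    vecs = encoder.encode(list(texts), normalize_embeddings=True)
    index = faiss.IndexFlatIP(vecs.shape[1])
    index.add(np.asarray(vecs, dtype="float32"))
    id_map = list(ids)

def ingest_update(doc_id: str, new_text: str) -> None:
    # Called when the CMS signals a change. This toy version rebuilds the
    # whole index; a production pipeline would upsert only the delta.
    doc_store[doc_id] = new_text
    rebuild_index()

rebuild_index()
# A policy change lands in the CMS; no model fine-tuning is involved.
ingest_update("kb-101", "Refunds are now processed within 3 business days.")
```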
All of this works fine in the ideal world, but to have a well-performing RAG-based chatbot we need a quality knowledge base, because RAG's performance is directly tied to the quality of the knowledge base behind it. In other words, this is a garbage-in, garbage-out situation. To avoid that, we need to make sure all redundant or near-duplicate documents are removed from the content base, and we should normalize conflicting entries that can confuse the retrieval mechanism as well as the LLM when generating responses. There should be a structured KB using a clear topical taxonomy, with fallback links for situations where content is not available. Along with that, there should be metadata tagging on all content so it's easily available for filtering at retrieval time.

At a high level, the technology stack involved in each layer of a RAG-based chatbot looks like this. For the knowledge base, there should be proper document vectorization and metadata tagging for easy retrieval. The retrieval system should be equipped with vector databases and semantic search so relevant passages can be selected easily. The language model can be any of the LLMs on the market for actually generating the responses. The last piece is security, which is the most crucial: even after retrieval, there should be a layer of access controls and query filtering in place to make sure there is no data breach and no unauthorized data gets sent or displayed to customers.

Now, coming to the future avenues of RAG-based chatbots: their context awareness and ability to work with real-time data have opened the door to hyper-personalized, behavior-aware support experiences. There can be customer-specific retrieval, basically segmenting vector indexes per customer profile or organization. RAG can also detect anomalies in usage logs, since those logs can be made readily available to the chatbot, letting it predict an outage and proactively surface health information. This is one of the use cases we are currently working on; our goal is to reach a state where we detect an outage even before it happens. Along with that, since the models are connected to live data, even if a bot has been trained or embedded with one language, it can always retrieve new information, so it's possible to have the bot respond in any other language as per the user's need, which makes it perfect for an organization with a global presence and a customer base spread across the globe.
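[Editor's note] Here is a minimal sketch combining two points from above: the security layer (access controls and query filtering after retrieval) and customer-specific segmentation. The metadata schema and the access policy are illustrative assumptions, not details from the talk.

```python
# Access-controlled, metadata-filtered retrieval: every chunk carries
# tags, and results are filtered by the caller's entitlements before
# anything reaches the generation prompt.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    product: str         # metadata tag for topical filtering
    visibility: str      # "public" or "internal"
    org_id: str | None   # customer-specific docs are segmented per org

def check_access(chunk: Chunk, org_id: str) -> bool:
    """Allow public docs, or private docs belonging to the caller's org."""
    return chunk.visibility == "public" or chunk.org_id == org_id

def filter_results(candidates: list[Chunk], org_id: str, product: str) -> list[Chunk]:
    # Metadata narrows retrieval to the relevant product line, then the
    # access check prevents cross-tenant leakage into generated answers.
    return [c for c in candidates
            if c.product == product and check_access(c, org_id)]
```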
And along with this, the next evolution is multimodal RAG, where inputs and retrieved content are no longer limited to text. One of the emerging capabilities is visual search and retrieval: a customer uploads an image, say a photo of a damaged product or an error screenshot, the RAG model finds a match against a corresponding visual KB, and based on that the chatbot is able to troubleshoot the customer's issue. Beyond that, the system can retrieve instructional videos based on text the customer provides, offering more helpful resources for solving the issue, and later it could be a fusion of both text and video, and vice versa, implemented within the RAG-based chatbot. Architecturally, enabling these features requires some sort of vision encoder, with CLIP and BLIP-2 among the most widely used, plus a cross-modal retriever, which will certainly help expand RAG's use cases beyond text.

Now, here are some stats we collected that show tangible improvement in customer support operations across multiple KPIs, with direct impact on operational efficiency and customer satisfaction. The efficiency gains stem primarily from improved first-contact resolution, which, as you can see, sits around 75 to 85% compared to 45 to 55% before, and from the reduced need for clarification exchanges, another area where the RAG-based approach has proven tremendously helpful in enhancing the user's experience when seeking support.

To wrap up: the evolution from traditional AI chatbots to RAG-powered systems represents a transformative advancement in customer support technology. It enables grounded, real-time answers that scale with knowledge changes, so customers are always given the latest information; it reduces retraining cycles and lowers operational overhead, which saves enterprise organizations a lot of money; and it provides context continuity even across long interactions, so customers don't get irritated explaining the same things again and again. And since this is an evolving technology, there is plenty of room for further evolution in this domain. So thank you so much, everyone, for your time. I'll be happy to answer any questions about RAG and how it can be used to improve customer support operations. Thank you.
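[Editor's note] As a companion to the multimodal discussion above, here is a hedged sketch of the visual-search flow: a CLIP encoder embeds both the customer's uploaded image and the visual KB into one vector space, so a damaged-product photo or error screenshot can be matched to a reference image. The file paths and KB contents are placeholders.

```python
# Visual retrieval sketch using CLIP via sentence-transformers: embed the
# visual KB and the uploaded image, then pick the closest match.
from PIL import Image
from sentence_transformers import SentenceTransformer, util

clip = SentenceTransformer("clip-ViT-B-32")

kb_images = ["kb/cracked_screen.png", "kb/error_dialog.png"]  # hypothetical visual KB
kb_vecs = clip.encode([Image.open(p) for p in kb_images])

query_vec = clip.encode(Image.open("uploads/customer_photo.png"))
scores = util.cos_sim(query_vec, kb_vecs)[0]
best_match = kb_images[int(scores.argmax())]
# The troubleshooting article linked to best_match is then fed to the
# generator as context, exactly like a text retrieval hit.
```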
...

Vaibhav Fanindra Mahajan


