Conf42 Platform Engineering 2025 - Online


Building Agentic AI Platforms: Lessons from Autonomous Mortgage Processing at Scale


Abstract

Ready to build AI platforms that think and act autonomously? Discover battle-tested architectural patterns from mortgage AI systems that process complex workflows with minimal human intervention.


Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello, and thanks for joining my presentation on building agentic AI platforms: lessons from autonomous mortgage processing. If you talk to someone who went through getting a mortgage, they would typically say it was like being stuck in bureaucratic quicksand for about two months. But here is what industry leaders are now thinking: what if this entire process could run by itself? What if, instead of humans shuffling papers between departments, you had intelligent systems that could actually think through problems and coordinate with each other? That is exactly what I want to talk to you about today. I have spent over a decade providing solutions to various business functions across the financial industry, and I've spent the last few years watching this space evolve from a theoretical idea into something that's actually transforming how complex business processes work.

Just to set the context, let me give you a scenario that's happening right now in the mortgage industry. Say someone applies for a loan on a Friday evening. By Monday morning, the system could have already verified their income through multiple sources, ordered and reviewed property appraisals, reviewed flood zone data, coordinated with insurance providers, and flagged any potential compliance issues. All weekend, while everyone was sleeping, AI agents could have been working together like a well-orchestrated team. This is not science fiction anymore; this is what agentic systems can do today.

So let's talk about what makes these systems different, because I think there's a lot of confusion out there about what agentic AI actually means. Think about the AI tools you use today, maybe ChatGPT or some other automation in your company. You ask it something, it gives you an answer. You feed it data, it processes the data. But you're always the one driving it, right? You are the conductor, and the AI is just an instrument in your orchestra. Agentic AI completely flips that model. Imagine if your AI could actually understand the bigger picture of what you are trying to accomplish and then go figure out how to make it happen: talking to other systems, making decisions when roadblocks come up, and learning something new when its predetermined actions are not working.

Here's a real-world example. Traditional systems break whenever they encounter something unexpected; that's a known fact. Say someone applies for a mortgage and their primary income comes not from traditional sources but from cryptocurrency mining. This is something the system has not seen before. A traditional rule-based system would just stop, give out an error message, and wait for manual intervention to come and fix it. In an agentic system, however, the agent would recognize that this is an income verification challenge, research IRS guidelines on crypto income, look for similar cases in historical data, and maybe even try consulting external tax services. Once it identifies a solution, it may create its own verification pathway on the fly. It can also document this new approach so future crypto-income cases get handled automatically, without going through the whole identification process again. That's the difference: it's not just about processing data, it's actually thinking through problems the way we humans do.
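To make that behavior concrete, here is a minimal sketch of the "novel case" pattern, assuming a hypothetical pathway registry and a simplified research step; none of these names come from the speaker's actual system.

```python
# Hypothetical sketch of the novel-case behavior described above: an income
# verification agent tries its known pathways, and when none apply it
# assembles a new one and registers it so future applications with the same
# income type are handled automatically.

from typing import Callable, Optional

# Registry of known verification pathways, keyed by income type (invented).
PATHWAYS: dict[str, Callable[[dict], dict]] = {
    "w2": lambda app: {"verified": True, "method": "payroll_api"},
    "self_employed": lambda app: {"verified": True, "method": "tax_transcripts"},
}

def research_new_pathway(income_type: str) -> Optional[Callable[[dict], dict]]:
    """Stand-in for the agent's research step: consult guidelines,
    historical cases, and external services to build a pathway."""
    if income_type == "crypto_mining":
        # e.g. combine exchange statements with IRS guidance on mined income
        return lambda app: {"verified": True, "method": "exchange_plus_irs_guidance"}
    return None  # genuinely unknown: escalate to a human

def verify_income(application: dict) -> dict:
    income_type = application["income_type"]
    pathway = PATHWAYS.get(income_type)
    if pathway is None:
        pathway = research_new_pathway(income_type)
        if pathway is None:
            return {"verified": False, "escalate_to_human": True}
        # Document the new approach so the next crypto case is automatic.
        PATHWAYS[income_type] = pathway
    return pathway(application)

print(verify_income({"income_type": "crypto_mining"}))
```

The key design point is the last step: the agent persists what it learned, which is what turns a one-off workaround into permanent platform capability.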
Let me take the next few minutes to walk you through six areas that I think are crucial if you're seriously considering building these kinds of systems. And I'm going to be honest with you: I'm going to talk about what doesn't work just as much as what works, because in my own experience the failures are far more instructive than the successes.

Let's start with a sense of just how complex these mortgage workflows are. Over the years, I actually sat down and mapped out a typical mortgage process, and it has 47 separate, discrete steps. Yes, 47, and now you know why your mortgage takes so long to process end to end. Each of these steps requires human judgment, and they're all interconnected in complex dependency chains. You've got income verification talking to employment verification, which feeds into credit analysis, which impacts risk assessment, which affects pricing, which in turn drives compliance checks. All of this happens across multiple systems that were built years or even decades apart and were never designed to work together.

Here's a scenario that perfectly captures the old way of doing things. Imagine an applicant whose employer recently switched payroll companies, which happens; we've been through such situations as well. The new payroll system outputs data in a completely different format than what the lender expects. In the traditional process, this application sits in a queue for weeks while a human agent behind the scenes manually figures out how to interpret the new format. They make phone calls, they reach out to other teams or even vendors for clarification, and eventually they build a solution. Meanwhile, dozens of other applications from employees of the same company pile up behind the scenes, all with the same problem. With the agentic approach instead, when the income verification agent encounters this new format, it automatically recognizes the pattern, tries multiple verification pathways, and within hours has established a new integration protocol. It then shares that knowledge across the system so that all future applications from this employer get processed seamlessly. If you compare both scenarios, this is not about making the old process faster; it's about reimagining how the entire business workflow operates. That takes a lot of change, not just in technology but in how we understand these processes going forward.

Now I want to dive into the technical side of the architecture. If you want to build one of these systems, there are three architectural foundations that you have to get right. Miss any one of them and the whole thing falls apart; each part of this architecture is crucial to the success of agentic AI.

Component number one is workflow orchestration. One mistake I see constantly is that teams start with a simple queue-based system because the process feels straightforward: agent A finishes and sends a message to agent B, agent B finishes and sends a message to agent C. Sounds reasonable. But then you hit a scenario where, going back to the mortgage example, you're processing a mortgage for a property in a flood zone during hurricane season. Suddenly you need a property inspection, flood insurance verification, evacuation route analysis, and an updated property valuation, all running concurrently, all dependent on each other, and all on completely different processing timelines. In a simple queue system, everything locks up, because everything is on a different timeline with different dependencies. You need orchestration that can handle dynamic dependencies, resource contention, and failure scenarios. It's like the difference between a traffic light and an air traffic control system: one is straightforward and rule-based, the other requires a lot of intelligence behind the scenes.
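Here is a toy illustration of that kind of dependency-aware orchestration, my own sketch rather than any particular framework: tasks declare what they depend on, and everything whose dependencies are satisfied runs concurrently instead of waiting in a single queue.

```python
# Minimal dependency-aware orchestrator: run every task whose dependencies
# are already done, concurrently, until the whole workflow completes.

import asyncio

async def run(name: str, seconds: float) -> str:
    await asyncio.sleep(seconds)  # stand-in for real agent work
    print(f"done: {name}")
    return name

# task name -> (dependencies, coroutine factory); names are illustrative
TASKS = {
    "property_inspection":   (set(), lambda: run("property_inspection", 0.2)),
    "flood_insurance_check": ({"property_inspection"},
                              lambda: run("flood_insurance_check", 0.1)),
    "evacuation_analysis":   (set(), lambda: run("evacuation_analysis", 0.1)),
    "updated_valuation":     ({"property_inspection", "evacuation_analysis"},
                              lambda: run("updated_valuation", 0.1)),
}

async def orchestrate(tasks):
    done: set[str] = set()
    pending = dict(tasks)
    while pending:
        # Everything whose dependencies are met runs concurrently.
        ready = [n for n, (deps, _) in pending.items() if deps <= done]
        if not ready:
            raise RuntimeError("dependency cycle or stuck workflow")
        results = await asyncio.gather(*(pending[n][1]() for n in ready))
        done.update(results)
        for n in ready:
            del pending[n]

asyncio.run(orchestrate(TASKS))
```

A real platform would add retries, timeouts, and resource limits on top, but the core loop, "run what's unblocked, in parallel", is the part a plain queue cannot give you.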
The second component is event-driven communication. Think of events as the nervous system of these platforms. Here's how it works: when a credit score gets updated during processing, the credit agent broadcasts a "credit score updated" event. The risk assessment agent sees this and immediately recalculates loan terms. The pricing agent adjusts interest rates. The compliance agent checks regulatory boundaries. All of this happens automatically, in parallel, within milliseconds. The beauty of this architecture is that none of these agents has to know what the other agents are doing; they just respond to the events relevant to their own domain. It's like a really sophisticated gossip network where everyone pays attention only to the rumors they care about, and we all know how well that works.

The third component is ML pipeline integration. This isn't the typical machine learning setup where you run models on historical data and generate reports. We are talking about real-time inference that is deeply integrated into the business process itself. Think of a fraud detection model analyzing every document uploaded, every phone call made, every API interaction happening within the system. The model maintains confidence scores that change dynamically, and whenever confidence drops below a certain threshold, it can automatically trigger additional verification steps without human involvement, or pull in a human until a verification agent is brought up to speed.

Now let's dive deeper into event-driven architecture, because this is where I see a lot of systems either succeed brilliantly or fail spectacularly. Picture a situation we've all been in at some point in our careers: it's 2:00 AM on a Saturday and your property valuation service goes down, or any service for that matter. Maybe it's a data center power outage, maybe it's a network issue; it doesn't matter for the purposes of this conversation. In a traditional system, every single mortgage application requiring a property valuation would fail immediately, causing a massive backlog that teams would have to triage on Monday morning as a crisis response. In a well-designed event-driven system, those property valuation requests just queue gracefully. When the issue is resolved and the service comes back online, the entire backlog gets processed automatically. You have zero data loss, zero manual intervention, and no angry customers calling in on Monday morning. The trick is in getting the architecture and the design right.
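As a bare-bones illustration of that coordination style, here is a tiny in-process publish/subscribe sketch with invented agent names; a real platform would use a durable broker, but the decoupling principle is the same.

```python
# Toy event bus: agents subscribe to the event types they care about and
# never call each other directly.

from collections import defaultdict
from typing import Callable

subscribers: dict[str, list[Callable[[dict], None]]] = defaultdict(list)

def subscribe(event_type: str, handler: Callable[[dict], None]) -> None:
    subscribers[event_type].append(handler)

def publish(event_type: str, payload: dict) -> None:
    for handler in subscribers[event_type]:
        handler(payload)  # in a real platform this would be async and durable

# Each agent reacts independently; none of them know about the others.
subscribe("credit_score_updated", lambda e: print("risk agent: recalculating terms", e))
subscribe("credit_score_updated", lambda e: print("pricing agent: adjusting rate", e))
subscribe("credit_score_updated", lambda e: print("compliance agent: checking boundaries", e))

publish("credit_score_updated", {"application_id": "A-123", "score": 712})
```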
You need to design your events with just the right amount of information. Too little context and your agents can't make intelligent decisions; too much context and you create the tight coupling that kills your scalability in the future. In a typical mortgage system you might have 4,000 or even more than 5,000 different event types, and each has to be carefully crafted for a significant event that occurs during your mortgage process. For example, an "income verification completed" event includes only the verified amount, a confidence score, the verification method, and any anomalies encountered, but it should not contain the raw bank statements. This keeps the agents loosely coupled while still enabling smart coordination.

Another benefit of event-driven architecture is that you essentially get a flight recorder, a black box, for your entire business process. When something goes wrong, and something always goes wrong, you can replay the exact sequence of events and see precisely why and where a decision went sideways. For regulated industries, especially financial services, this is absolute gold.

Now I want to give a quick reality check. We need to talk about all the ways these systems can break, because if you're thinking about building one, you need to know what you're getting into. Every organization has legacy integration nightmares. The mainframe, for example, is still indispensable in the financial industry, so let's talk about the mainframe legacy integration nightmare, the scenario that gives most developers and platform engineers bad dreams. You are building this beautiful, modern agentic system, and then you discover that some loan data or metadata lives on a mainframe built in the 1980s. There are no APIs. You get one batch file every day containing the previous day's updated data, and the formats are such that even XML looks like modern technology by comparison. You end up building what can be called a legacy translation agent: an entire AI component whose job is just to speak mainframe. It converts modern event streams into formats that would once have been cutting-edge and translates the responses back into something the other agents can understand. This agent acts as a bridge between your legacy systems and your modern agentic platform. Sometimes these translation agents become the most sophisticated component in the entire system.
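A miniature of what such a translation agent does, with an entirely invented fixed-width record layout and field names, might look like this: parse one record from the daily batch file and re-emit it as a lean, modern event.

```python
# Hypothetical legacy translation step: fixed-width mainframe record in,
# modern event out. Layout is invented for illustration.

from decimal import Decimal

def parse_mainframe_record(line: str) -> dict:
    """Assumed layout: loan id (10 chars), status code (2),
    balance in cents (12, zero-padded)."""
    return {
        "loan_id": line[0:10].strip(),
        "status": {"01": "active", "02": "closed"}.get(line[10:12], "unknown"),
        "balance": Decimal(line[12:24]) / 100,
    }

def to_event(record: dict) -> dict:
    # The rest of the platform only ever sees this event shape,
    # never the raw batch format.
    return {"type": "loan_snapshot_received", "payload": record}

raw = "LN00012345" + "01" + "000001234500"
print(to_event(parse_mainframe_record(raw)))
```

Note that the event carries derived, typed fields rather than the raw record, which is exactly the minimal-payload discipline described above.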
The next challenge is state management complexity. Here's a fun problem in distributed state management: mortgage applications are long-running processes that can stay active for 60 days or more. Sometimes your rate lock may extend to 90, 180, or even 360 days depending on the contract with your mortgage provider. During this time applicants can change, property values can fluctuate, interest rates can shift, and new regulations can come into effect. Dozens of autonomous agents have to stay synchronized, because each of these aspects is handled by a different agent. The naive approach is to store the current state in a database somewhere, so that when an agent crashes and restarts, especially in a distributed system, it has context it did not have before. What you actually end up needing is something like event sourcing, where you store every single thing that ever happened to an application, and any agent can rebuild its complete understanding just by replaying the event stream.

The third challenge is scalability optimization. Credit analysis for a standard W-2 employee might take 30 seconds, but that is just one slice of the applicant pool. Throw in self-employed applicants, who may have stock income, cryptocurrency income, and international assets, and suddenly you are looking at 20 minutes of processing time. How do you plan infrastructure for that kind of variability? You need adaptive resource allocation that monitors agent behavior patterns and automatically spins up capacity when complex cases start hitting the system. During tax season, which also lines up with the summer home-buying season, mortgage applications spike, and the system might scale your income verification capacity by 300 percent, without anyone touching a configuration file, all happening intelligently and automatically behind the scenes.

Now, in financial services, and really in any regulated industry, you can't just say "the AI did it." Every decision has to be explainable, auditable, and legally defensible. That's not just a headache; it's a necessity in these industries. Imagine a regulator walks in and asks why a loan denial happened eight months ago. In traditional systems, that's days of forensic detective work going through log files and database records, with no guarantee you'll actually find everything. With proper decision provenance architecture, you can pull up a complete decision tree in seconds: which agents were involved, what data they accessed, which algorithms ran, what business rules fired, and even the exact model versions that were active at the moment the decision was taken. This level of transparency is actually better than what most human-driven processes can offer.

The mortgage industry, like any financial industry, is very dynamic, and the compliance side is just as dynamic. Regulations change constantly, often on very short notice. New guidelines for cryptocurrency income verification might arrive and require implementation within 30 days. Instead of rushing code changes into production, a well-designed compliance agent can monitor regulatory feeds, automatically adjust business rules, and even run impact simulations before changes take effect in production. Data minimization also becomes critical. Your credit scoring agent never needs to see full bank statements; it can work from aggregated spending patterns and derived data. Your identity verification agent confirms identity and then immediately discards the biometric data. This architecture not only protects privacy, it enables intelligent processing.
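Before moving on, here is one minimal shape such decision-provenance records could take; the field names are my own illustration, not the speaker's schema. The point is that every agent decision is written down with enough context to reconstruct it months later.

```python
# Hypothetical decision-provenance ledger entry.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DecisionRecord:
    application_id: str
    agent: str                  # which agent decided
    model_version: str          # exact model active at decision time
    inputs_used: list[str]      # references to data, not raw copies
    rules_fired: list[str]      # business rules that applied
    outcome: str
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

ledger: list[DecisionRecord] = []

ledger.append(DecisionRecord(
    application_id="A-123",
    agent="underwriting_agent",
    model_version="risk-model-2.4.1",
    inputs_used=["credit_report:ref-88", "income_verification:ref-91"],
    rules_fired=["dti_below_43pct", "flood_zone_insurance_required"],
    outcome="approved_with_conditions",
))

# Eight months later, the regulator's question is one filter away:
print([r for r in ledger if r.application_id == "A-123"])
```

Storing references to inputs rather than the inputs themselves also keeps this audit trail consistent with the data-minimization principle above.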
Now I want to take a moment to talk about monitoring and observability. Monitoring agentic systems is like monitoring a team of consultants rather than monitoring servers, and this fundamental shift in thinking plays a critical role when implementing agentic AI in your organization. You care less about CPU utilization and more about decision quality and business outcomes, just as you would with human agents. For example, you might track agent confidence patterns. If your property valuation agent suddenly starts expressing low confidence in its decisions, take that as a signal: it could be market volatility, newer property types the model hasn't seen yet, or data quality issues that crop up when vendor feeds or vendor systems change. All of this gives you an early warning before problems hit the customer experience. The metrics that matter most are things like: are agents making decisions that lead to successful outcomes? Are they catching risks that human underwriters might miss? Are processing times improving while quality stays high? One critical metric is the decision reversal rate: how often your human agents disagree with the AI agents. When the reversal rate is high, you know that particular AI agent requires attention. It may be a training data issue, or something more fundamental that needs to be addressed, and you can take that agent offline while you work on bringing it back up to the required threshold.

I'd also like to share some practical lessons about infrastructure that can save you not only money but months of pain. On deployment complexity: most organizations still use a blue-green deployment approach, and blue-green becomes incredibly tricky with long-running processes, especially mortgage applications that can take 30 days to process. When an application is 30 days into processing, how do you deploy a critical security update? How do you update agent versions without breaking applications in flight? This is where you need versioned agent deployment, where each agent version can handle requests from older process versions: new applications get the newer agents, while existing applications continue with the older ones. Think of a restaurant where meals from previous orders are still being prepared while you update the menu.

Configuration management is also key. Make your agents pull configuration from external systems at runtime. This means you can adjust business rules, update model parameters, or modify decision thresholds without deploying code or taking downtime. During the early pandemic, lending criteria changed almost daily, and runtime configuration management was the key differentiator between the organizations that adapted quickly and the ones that fell behind.
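A sketch of the versioned-deployment idea, under the assumption that each application is pinned to an agent version at intake (names and versions are invented), might look like this:

```python
# Versioned agent routing: in-flight applications stay on the agent version
# they started with; new applications get the latest version.

AGENT_VERSIONS = {
    "1.0": lambda app: f"income check v1.0 for {app['id']}",
    "2.0": lambda app: f"income check v2.0 for {app['id']}",
}
LATEST = "2.0"

def route(application: dict) -> str:
    # Pin the version at intake; never change it mid-flight.
    version = application.setdefault("pinned_agent_version", LATEST)
    return AGENT_VERSIONS[version](application)

in_flight = {"id": "A-100", "pinned_agent_version": "1.0"}  # started weeks ago
brand_new = {"id": "A-200"}

print(route(in_flight))   # still served by v1.0
print(route(brand_new))   # gets v2.0
```

In the restaurant analogy, old orders keep cooking from the old menu while new orders use the new one; old versions are retired only once no in-flight application still references them.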
Organizational and cultural factors are just as critical as the technical ones, and this is where most technical teams fail entirely. The technology is often the easy part for a technologist; getting people and the organization to adapt is much harder. Picture a change management scenario: your most experienced mortgage underwriter has been doing their job for 20 years. They can spot problems that are not captured in any rule book, and they know how to handle edge scenarios and new scenarios. Their initial reaction to autonomous agents is probably going to be skepticism, or even hostility. The successful approach is not to force adoption but to involve them in every step of the decision-making process and in making the system better. Turn them into agent trainers rather than competitors, so they can help identify edge cases and teach nuanced decision-making patterns to the AI. When domain experts like an experienced underwriter become invested in improving your system rather than competing with it, it changes the whole landscape.

You also need people who understand distributed systems and machine learning and mortgage underwriting. These unicorn skill sets don't exist in traditional job descriptions. Cross-training domain experts often works better than trying to hire people who know everything, and it helps with employee retention too.

The third aspect is governance balance and oversight. Finding the right balance, and avoiding the extremes, is important. Micromanaging every agent decision kills autonomy and defeats the purpose of having agentic AI in your organization, but zero oversight is unacceptable and brings in a lot of risk. Exception-based monitoring is where you find the right balance: agents operate independently but alert humans when they hit a situation outside their confidence boundaries.

The next topic is future directions and emerging patterns, and the future gets even more interesting. We are starting to see emergent behaviors where agents develop their own coordination strategies without explicit instructions or explicit programming. For instance, a fraud detection agent might learn to tip off an income verification agent when it spots suspicious patterns, creating collaborative investigation workflows where it is more effective for both agents to work together than alone. These emergent behaviors often surprise even the engineers who built the system, and this is where the industry is heading. The next aspect is large language model integration. Large language models have become a game changer. Imagine telling your system "new regulations require additional verification for properties in flood zones" and having it automatically understand and implement the necessary changes across all relevant agents. We are moving toward LLM interfaces for business that can manage the rules directing business processes. The last aspect is cross-organizational agents. The next frontier is agents that work across company boundaries instead of within a company silo. Imagine mortgage processing where your agents coordinate directly with AI agents or human agents from external vendors, such as title companies, insurance providers, or appraisal services, all working together seamlessly while maintaining security and privacy requirements and without compromising any compliance or regulatory standards.
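Coming back to the exception-based oversight pattern mentioned a moment ago, here is a small illustrative sketch; the threshold value and function names are assumptions of mine, not the speaker's numbers.

```python
# Exception-based oversight: the agent acts on its own inside its confidence
# boundary and escalates to a human outside it.

CONFIDENCE_FLOOR = 0.85  # illustrative threshold for human review

def score(application: dict) -> tuple[str, float]:
    # Stand-in for the real model; unusual income types lower confidence.
    if application.get("income_type") == "crypto_mining":
        return ("approve", 0.62)
    return ("approve", 0.97)

def decide(application: dict) -> dict:
    decision, confidence = score(application)   # the agent's own judgment
    if confidence >= CONFIDENCE_FLOOR:
        return {"decision": decision, "by": "agent", "confidence": confidence}
    # Outside the boundary: park the decision and alert a human reviewer.
    return {"decision": "pending_human_review", "by": "human_queue",
            "confidence": confidence}

print(decide({"id": "A-1", "income_type": "w2"}))
print(decide({"id": "A-2", "income_type": "crypto_mining"}))
```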
The architectural patterns we've discussed are not limited to the mortgage or financial industry; they apply well beyond it. Healthcare organizations are using similar approaches for patient care coordination. Supply chain companies are building agent networks that automatically adapt to disruptions. Any complex, multi-step process with lots of coordination requirements is a good candidate for agentic automation. Whether you are processing loans, managing patient care, or optimizing supply chains, the core principles are the same: autonomous agents, event-driven communication, comprehensive observability, and adaptive orchestration, all pillars of a good architecture and governance framework.

I want to leave everyone with four lessons that can save you from the most common failures. The first: orchestration is everything. Do not underestimate coordination complexity. Simple queue-based systems will break when real-world scenarios start arriving; they may work well in initial testing or an initial deployment, but over the long term they start breaking on edge cases and new scenarios, so invest in sophisticated orchestration from day one. The second: you need observability from the very beginning. If you can't see what your agents are doing, you can't trust them with critical processes, so comprehensive monitoring has to be a critical first step in your architecture, not an afterthought. The third: compliance isn't optional. Audit trails, decision transparency, and regulatory adaptability should be built directly into your system architecture; compliance is not something you can add later, it has to be incorporated from day one. And the final one: we are at the beginning of a fundamental shift in how complex business processes work. Agentic AI platforms aren't just automating existing tasks; they are reimagining entire industries, and the organizations that figure this out first will have a significant competitive advantage.

Thanks for spending time with me today. I hope this presentation gives you a practical foundation for thinking about how to build agentic AI in your own context. If you're considering building these kinds of systems, my advice is to start small, focus on solid architectural foundations, and don't be afraid to fail fast, because it is from those failures that you build solid applications. The patterns I've shared today should give you a roadmap for avoiding some of the most common pitfalls organizations hit when they start on the agentic AI journey. My final thought: the future belongs to the organizations that can successfully blend human expertise with autonomous AI capabilities, and the technical patterns we've discussed here are a blueprint for building that future. Thank you.
...

Naganarendar Chitturi

Senior Solutions Architect @ Newrez



