Conf42 Platform Engineering 2025 - Online

- premiere 5PM GMT

Building Scalable AI-Powered Platforms: Microservices Architecture for Enterprise CPQ Systems


Abstract

See how we built an AI platform processing 100K+ requests with 99.9% uptime! Live demos of auto-scaling ML pipelines, self-healing systems, and developer tools that cut deployment time 65%. Real architecture patterns for enterprise AI platforms that actually work in production.


Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hi, this is Kishore Epuri. I'm a CPQ Solution Architect with more than 15 years of experience building CPQ applications for enterprise organizations. I've worked with Fortune companies in manufacturing, semiconductors, and energy and power utility services, building their CPQ ecosystems and helping them generate proposals, quotations, and complex pricing solutions. Today's topic is how we can power CPQ applications and systems with AI platforms. Nowadays we keep hearing everywhere about implementing AI in every technology, so the question is: how can we leverage AI tools in CPQ, and what benefits does an AI platform with a microservices architecture bring to CPQ applications? We'll go into the details: how it works, what it looks like, the deployments, the configurations, the ROI we get from these features, and the current challenges. Today, sales teams run into issues generating quotations, getting dynamic pricing, and building complex configurations, and all of that takes time. We encapsulated all of this as microservices running on an AI platform.

From an architecture standpoint, the system processes 100K+ requests monthly with 99.9% uptime, meaning essentially no outages, with complex AI workloads running at enterprise scale. We engineered the platform so that even under a sudden spike of requests, the trained models can still generate the right configuration with accurate pricing and accurate quoting. That is what the architecture delivers.

Going to the next slide: CPQ means Configure, Price, Quote, and demand for it keeps growing. How do we handle complex configurations with real-time data? Products are configured using live data, and pricing decisions are accurate. If a rep tries to give a larger discount, the approval process is dynamic: the system learns that most customers are asking for roughly this discount, so it can approve this particular quote at the same discount. The same trained models support forecasting: given the list of quotes within the quarter, what is the expected revenue? And when a user configures a product, we do guided selling and upselling: if you buy this particular product, you can get a related product at a lower price, and if you combine them into a bundle, you get a bigger discount. The AI trains on the backend data and suggests to the user: for the configuration you are building, most customers also buy this product, and here is how much you would save.
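As a rough illustration of that guided-selling idea, here is a minimal sketch, assuming a precomputed co-purchase table and a flat bundle discount; all product names, prices, and the discount rate are illustrative, not the production logic:

```python
# Minimal guided-selling sketch (hypothetical data and names): suggest a
# bundle from co-purchase frequency and report the savings versus buying
# the items separately.
from dataclasses import dataclass

@dataclass
class Suggestion:
    product_id: str
    bundle_price: float
    savings: float

# Toy co-purchase statistics a trained model might produce offline.
CO_PURCHASE = {"router-x1": [("support-plan", 0.72), ("rack-kit", 0.41)]}
LIST_PRICE = {"router-x1": 1200.0, "support-plan": 300.0, "rack-kit": 150.0}
BUNDLE_DISCOUNT = 0.10  # assumed flat 10% bundle discount

def suggest_bundle(configured_product: str) -> Suggestion | None:
    """Return the most frequently co-purchased add-on as a bundle offer."""
    candidates = CO_PURCHASE.get(configured_product)
    if not candidates:
        return None
    add_on, _score = max(candidates, key=lambda c: c[1])
    separate = LIST_PRICE[configured_product] + LIST_PRICE[add_on]
    bundled = separate * (1 - BUNDLE_DISCOUNT)
    return Suggestion(add_on, round(bundled, 2), round(separate - bundled, 2))

print(suggest_bundle("router-x1"))
# Suggestion(product_id='support-plan', bundle_price=1350.0, savings=150.0)
```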
All of this AI runs as microservices behind the scenes that solve for and help you get the right configuration with the right pricing. By building on microservices, the response time is around 200 milliseconds, just 0.2 seconds. With this platform technology we addressed challenges like providing the abstractions and tooling necessary to deploy and manage AI workloads at scale. This talk presents an approach to building a microservices-based CPQ platform that seamlessly combines enterprise-grade reliability with developer productivity: the microservices plug into the AI tools, get trained, and are then used by the CPQ platform for a better quoting process.

Here is the high-level architecture. First, the user request: a user inputs what they want to buy and selects the configuration options. After those inputs are captured, the request goes through an API gateway, the background interface that calls the microservices. The microservices are built to use machine learning models and AI tools to generate the product rules and data. For a given configuration, they generate the bill of materials, or the set of products that could be offered, plus an optimal price, so that users can generate a quotation. It is all included in the microservices: the way requests are received and orchestrated, with the flexibility enterprise workloads need, caching, multi-threading, and handling of concurrent requests across all the interfaces that interact with the AI services and supporting infrastructure components. We use synchronous API calls and asynchronous event streaming on a case-by-case basis: real-time pricing calculations where latency matters, and batch processing where model training is needed. Real-time model training is also taken care of in the overall architecture.

For the AI and machine learning service integration, we have three integration patterns. The first is gradient boosting engines, which handle pricing optimization behind standard API interfaces. The second is neural networks, which power configuration recommendations through dedicated microservices. The third is real time: dynamic price adjustments based on market conditions. While the trained model suggests a price, it also consults several microservices on demand about current market conditions and adjusts the price dynamically. So pricing is not a static thing.
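To make the pricing-service pattern concrete, here is a minimal sketch of what such an endpoint could look like, assuming FastAPI; the route, request fields, and the stubbed model logic are illustrative stand-ins for the gradient boosting engine:

```python
# Hypothetical pricing microservice endpoint (FastAPI); the model call
# is stubbed where the real gradient boosting engine would be invoked.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class PriceRequest(BaseModel):
    product_id: str
    quantity: int
    region: str

class PriceResponse(BaseModel):
    unit_price: float
    total: float

def predict_unit_price(req: PriceRequest) -> float:
    """Stand-in for the gradient-boosting model's prediction."""
    base = 100.0  # illustrative base price
    volume_discount = min(req.quantity, 50) * 0.002  # up to 10% off
    return round(base * (1 - volume_discount), 2)

@app.post("/v1/price", response_model=PriceResponse)
def price(req: PriceRequest) -> PriceResponse:
    unit = predict_unit_price(req)
    return PriceResponse(unit_price=unit, total=round(unit * req.quantity, 2))
```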
The serving layer can leverage any industry-standard framework, including TensorFlow Serving, MLflow, and custom inference services. The data science team can choose the appropriate serving solution for each specific model type and its performance requirements, while the platform provides unified monitoring and management capabilities across all serving frameworks, ensuring consistency in operational practices regardless of the underlying technology. It really comes as a package, as a service, so these microservices can be used with any industry framework. That is the high-level service integration.

Coming to event-driven communication patterns: beyond the integration patterns and the ML side, there is an event-based side. When some change happens, triggers fire in the event-driven architecture. This gives loose coupling between services while maintaining data consistency and enabling real-time approaches. Kafka serves as the central event backbone and handles millions of pricing events: when requests hit the platform, the pricing events flow directly into the Kafka services, with standardized schemas enabling backward compatibility as the platform evolves. So the microservices keep evolving while the events keep flowing, feeding the data that trains the models. Event sourcing captures the complete history of pricing decisions and configuration changes and provides audit trails: what prices were quoted over the last few days, the overall pricing history, and what products users configured over a given period. All those audit trails are available for analytics; the events maintain immutable records for debugging, compliance, and analytical purposes. That, at a high level, is the event-driven communication pattern.
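Here is a minimal sketch of publishing such a pricing event, assuming the kafka-python client; the topic name and event fields (including the correlation ID and timestamp mentioned later in the talk) are illustrative, not the real schema:

```python
# Sketch of publishing a pricing event to Kafka (kafka-python client);
# topic name and event fields are assumptions, not the real schema.
import json
import time
import uuid
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

event = {
    "event_type": "price_calculated",
    "correlation_id": str(uuid.uuid4()),  # ties the event to one request
    "timestamp": time.time(),
    "product_id": "router-x1",
    "unit_price": 94.00,
}

# Key by product so all events for a product land in the same partition,
# preserving per-product ordering for event sourcing.
producer.send("pricing.events", key=b"router-x1", value=event)
producer.flush()
```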
Next: what does the infrastructure look like, and what does the deployment cycle look like for these microservices? One part is multi-cloud infrastructure management with Terraform as infrastructure as code; the other is cross-cloud networking. With Terraform, the entire infrastructure is defined with modular configurations across different cloud providers, including AWS, Azure, and Google Cloud Platform. These are standardized patterns for the common components, with built-in security controls and cost optimization settings. Cross-cloud networking uses service mesh technology to provide secure, encrypted communication between services, along with advanced traffic management, canary deployments, circuit breaking, and automatic retries. We leverage both of these so we avoid vendor lock-in.

This infrastructure code undergoes the same rigorous review and testing process as application code, ensuring reliability and security at the infrastructure layer. That is multi-cloud infrastructure management at a high level.

Coming to scalability and performance optimization: for any tool, performance is key, and on top of it comes scalability. It is not something you buy or install once; the world keeps evolving and changing quite often, so we cannot ship something static whose performance can never improve. We ensure the services are scalable and can keep evolving with industry standards. The auto-scaling policies respond to both traditional metrics and AI-specific indicators: the horizontal pod autoscaler configurations for the machine learning services include custom metrics exported from the model-serving frameworks, ensuring that scaling decisions reflect actual model performance. On performance optimization, one of the key characteristics we added is an intelligent node selection algorithm that considers both cost and performance characteristics when adding capacity. GPU-enabled nodes are provisioned only when neural network workloads require them, optimizing infrastructure cost while maintaining performance. So at a high level, it is scalable and performance-optimized.

Coming to deployments: everyone knows we have a Git repository, and the microservices are also managed through Git, with the configuration and deployment manifests stored there. We use blue-green deployment: a new model version is fully warmed up, and Argo CD automatically syncs the desired state from Git with the actual cluster state. With automated testing throughout the deployment pipelines, deployments have zero downtime. For example, if you are adding new features to the models, you do not need a downtime window: the pipeline automatically disconnects one model and connects another inside the platform, and traffic is shifted from one version to the other. If any degradation in model performance is detected, it alerts us: hey, there is an issue, take a look. So GitOps and continuous deployment are very tightly connected.
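As a sketch of the kind of promotion gate such a pipeline might apply before shifting traffic, here is a minimal check of the new version's metrics against the live baseline; the thresholds and metric names are illustrative assumptions, not the production policy:

```python
# Sketch of a deployment promotion gate (illustrative thresholds):
# compare the new (green) version's metrics against the live (blue)
# baseline before shifting all traffic to it.
MAX_LATENCY_REGRESSION = 1.10   # allow up to +10% p95 latency
MAX_ERROR_RATE = 0.001          # 0.1%, in line with a 99.9% SLO

def should_promote(blue: dict, green: dict) -> bool:
    """Decide whether to shift traffic to the green deployment."""
    if green["error_rate"] > MAX_ERROR_RATE:
        return False
    if green["p95_latency_ms"] > blue["p95_latency_ms"] * MAX_LATENCY_REGRESSION:
        return False
    return True

blue = {"p95_latency_ms": 180.0, "error_rate": 0.0004}
green = {"p95_latency_ms": 188.0, "error_rate": 0.0005}
print(should_promote(blue, green))  # True: within tolerance, promote
```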
The second topic is developer experience and tooling. We discussed deployments; now, what should the developer experience be, and what APIs and SDKs are supported for the microservices? Typically we look at two types of APIs: standardized and custom. The self-service APIs provide standard interfaces for common operations like model deployment, feature engineering, and performance monitoring. These APIs abstract away the complexity of the underlying infrastructure, allowing developers to focus on business logic rather than operational concerns. The other part is the software development kits: Python, Java, and Go SDKs provide idiomatic interfaces for interacting with the platform services. The SDKs also double as data scientist tooling: these environments ship helper libraries for common tasks like feature extraction, model evaluation, and deployment. That covers the developer experience and the tools we use.

Once we develop, the next question is where that development happens. We understand the SDKs and APIs are very scalable, and the data science work needs an environment too. For local development, we replicate the production platform architecture with lightweight alternatives such as Docker Compose configurations, including mock ML models for common scenarios. This approach enables developers to test complex workflows entirely on their local machine: instead of taking a backup of production, it is a simple, lightweight mechanism, and we can reproduce any issues or challenges in the complex workflows, increasing development velocity. The platform also includes sophisticated testing frameworks for ML workloads, addressing the unique challenges of testing probabilistic systems. For testing, we have continuous integration pipelines with automated testing: a full test suite runs against the current codebase, covering performance benchmarks and integration tests. The CI maintains historical performance data and automatically flags regressions in model accuracy or inference speed.
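A minimal sketch of such a CI regression gate, in pytest style; the baseline file path, thresholds, and the stubbed evaluation are illustrative assumptions:

```python
# Sketch of a CI regression gate for model quality (pytest style;
# thresholds and file paths are illustrative assumptions).
import json
import pathlib

BASELINE_FILE = pathlib.Path("benchmarks/pricing_model_baseline.json")
MAX_ACCURACY_DROP = 0.01   # fail CI if accuracy drops more than 1 point
MAX_LATENCY_MS = 50.0      # inference latency budget from the SLOs

def evaluate_candidate_model() -> dict:
    """Stand-in for running the eval suite against the new model build."""
    return {"accuracy": 0.942, "p95_latency_ms": 41.3}

def test_no_model_regression():
    baseline = json.loads(BASELINE_FILE.read_text())
    current = evaluate_candidate_model()
    assert current["accuracy"] >= baseline["accuracy"] - MAX_ACCURACY_DROP
    assert current["p95_latency_ms"] <= MAX_LATENCY_MS
```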
Now, integrated development environments. We have local environments where we can test; how do we connect them with the production environment, and what tools are needed? Cloud-based environments include all the necessary tools and configurations for production deployment, with GitHub-hosted codebases and pre-configured workspaces accessible from anywhere. Integration with popular IDEs through remote development extensions keeps developers in familiar tooling while benefiting from cloud-based compute. At a high level, this gives an integrated development environment that connects smoothly across dev, test, stage, and production through the CI/CD platform.

Coming to observability: once deployment is done and we go live, how do we calibrate the metrics? What metrics do we collect, how is usage trending, and how do we get notified when any error happens in the application? Metrics collection, distributed tracing, and visualization in dashboards provide real-time visibility into system behavior and performance, including model behavior, along with intelligent alerting: ML algorithms analyze the historical metric data to establish dynamic baselines and detect anomalies. Metrics collection captures both traditional application metrics and model-specific instrumentation. So once we go live, these microservices provide observability and reliability out of the box: metrics visualized in dashboards, distributed tracing, and intelligent alerting if there are any issues.

Coming to the SLI/SLO framework: availability is 99.9%, which allows at most roughly 43 minutes of downtime per month. The other side is response time: it is not just whether the services are up, but how the responses behave, which is what we covered in the previous slides on performance and scalability. For performance, the target is 200 milliseconds: 95% of pricing calculations complete within 200 milliseconds using these microservices, maintaining responsiveness so the user never feels any slowness. For the ML side, the average inference latency is 50 milliseconds, enabling real-time pricing without perceptible delay. The service level indicators go beyond traditional availability and latency metrics to include AI-specific measures: the key SLIs include model inference accuracy, prediction consistency across replicas, and feature freshness for real-time predictions. So at a high level, downtime is minimal, response time is fast, suggestions reach customers almost instantly, and ML inference at 50 milliseconds is negligible; the user sees no slowness, and the platform is effectively always available. That is how the framework looks.

Coming to incident response: we talked about the infrastructure first, then development, the APIs that are used, the pipeline integration to production, and then observability, the alerts and what downtime looks like. Now, in the worst case, if an incident does happen, what is the response, what is the recovery, and how do we ensure it? If some issue occurs, circuit breakers prevent cascading failures automatically. Automated runbooks guide troubleshooting, so incident response procedures can leverage them: for a given failure, here is the fix to apply. We also have an automated ChatGPT integration with the incident management and communication platforms: the moment a user hits an error, that error is interpreted on the backend and the user is guided, hey, there was an issue with X, Y, Z, please wait, we are working on it, and an incident is created automatically.
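Here is a minimal circuit breaker sketch to illustrate the failure-containment idea; the failure threshold and cooldown are illustrative, not the production configuration:

```python
# Minimal circuit breaker sketch (illustrative, not the production
# implementation): after N consecutive failures the breaker opens and
# fails fast, then allows a trial call after a cooldown.
import time

class CircuitBreaker:
    def __init__(self, max_failures: int = 5, reset_after: float = 30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.time() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.time()  # open the circuit
            raise
        self.failures = 0  # success closes the circuit again
        return result
```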
Then there is disaster recovery: all of a sudden, what happens to our critical data? It is all running through the ML platform: we gather the data, train the models behind the microservices, and serve the results. So what is the disaster recovery process? We always take regular backups and run regular restoration tests, so if anything happens we have a restore mechanism from our backup services, plus automated DNS updates for traffic redirection.

Coming to the important topic of data platform integration: one part is how the microservices architecture looks from the front end, but the backbone of everything is the data. Without the data, even if you build the microservices, the output will not be worth much; that is the major point. There is a Kafka backbone: which sources do we consume, and how do we stream data to the consumers based on what they need? We process millions of pricing events daily, and the topics are organized by business domain with standardized naming conventions and retention policies. Events include timestamps, correlation IDs, and partition information, enabling detailed traceability and debugging. That is one important part of the data platform architecture, the event streaming system.

The other part is the feature store and ML data management. The feature store serves as a central repository for ML features, ensuring consistency between the training and serving environments; it is built on top of Feast. Batch features support the model training workflows, while online features enable low-latency access during inference. We use Apache Spark for large-scale data processing, with optimized algorithms for the common transformations. Then there is data lineage: lineage tracking provides complete visibility into the feature generation process, from the raw data sources through the transformation steps to the final feature values. This transparency proves invaluable for debugging model issues and for ensuring compliance with data governance requirements.
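Here is a minimal sketch of online feature retrieval at inference time, assuming the Feast feature store with an already-configured feature repository; the feature view, feature names, and entity keys are illustrative assumptions:

```python
# Sketch of fetching fresh online features before pricing a product,
# assuming Feast; feature and entity names are illustrative.
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # points at a Feast feature repo

features = store.get_online_features(
    features=[
        "product_stats:avg_selling_price_30d",
        "product_stats:demand_index",
    ],
    entity_rows=[{"product_id": "router-x1"}],
    full_feature_names=True,
).to_dict()

print(features["product_stats__avg_selling_price_30d"])
```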
Coming to security and compliance, a very important aspect, because we are dealing with the organization's data. Nowadays AI tries to capture all of that data in order to guide us, but at the same time we need to protect it. So what are the security measures, and what is the compliance? When you input a request, the request has to reside within our architecture and within the compliance guidance; it should not travel outside our walls. That is why all service-to-service communication runs over mutual TLS, with certificate rotation automated through a certificate manager. Network policies enforce stringent segmentation between services the moment a request is sent, limiting communication to explicitly authorized paths. For identity management, it is OpenID Connect and OAuth 2.0, key- and client-credential-based with roles; there is nothing like a layman's username and password. It is all secure identity management with OAuth 2.0 and OpenID Connect.

Then secrets management: we store everything in a vault, so the keys cannot be easily captured. They live in the secured vault, and credentials rotate on a short interval, on the order of every 15 minutes; you need to connect to the VPN and authenticate to the vault with short-lived credentials to retrieve a key. The same kind of rotation also happens for the database credentials. These are the key aspects of security and compliance, and every layer of the platform follows zero-trust principles, assuming no implicit trust between components. At the end of the day it is all fronted by single sign-on with multi-factor authentication before anyone can use the backend services, so the data cannot be tampered with; it is a very strong security mechanism.

Coming to data privacy: we talked about security and the mechanisms of TLS, OpenID, and OAuth; now, privacy. Only certain people should see certain data, even within the organization, and there can be industry-specific requirements: one role may see a given price while another role should not. Data is classified accordingly and the rules are embedded: this particular user is only allowed to configure the model, and the price is hidden because they belong to the engineering department. There are also privacy-preserving techniques: we add controlled noise to protect the data, and we use federated learning across distributed datasets. There is a right-to-be-forgotten implementation, and data retention and deletion policies automatically remove expired data according to regulatory requirements and business policies, so you do not have to track it manually. The platform maintains detailed audit logs of all data access and modifications, supporting compliance reporting and forensic analysis.

Coming to the other important topic, performance optimization, which we touched on lightly in a previous slide. First is multi-layer caching: if you keep hitting the same request, the response is cached and returned very quickly and very intelligently; even when you make a small parameter change, you still get the answer fast.
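A minimal sketch of such a multi-layer cache, assuming an in-process LRU in front of Redis; the key format, TTL, and stubbed pricing call are illustrative:

```python
# Multi-layer cache sketch (illustrative): check an in-process LRU
# first, then Redis, then compute the price. Key and TTL choices are
# assumptions, not the production policy.
import json
from functools import lru_cache

import redis

r = redis.Redis(host="localhost", port=6379)
TTL_SECONDS = 300  # shared-cache freshness window for pricing data

def compute_price(product_id: str, quantity: int) -> dict:
    """Stand-in for the expensive pricing-model call."""
    return {"product_id": product_id, "quantity": quantity, "total": 9400.0}

@lru_cache(maxsize=4096)          # layer 1: per-process memory
def get_price(product_id: str, quantity: int) -> dict:
    key = f"price:{product_id}:{quantity}"
    cached = r.get(key)            # layer 2: shared Redis cache
    if cached is not None:
        return json.loads(cached)
    result = compute_price(product_id, quantity)
    r.setex(key, TTL_SECONDS, json.dumps(result))
    return result
```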
The other part is database optimization: the PostgreSQL instances use advanced features including parallel query execution and just-in-time compilation, while NoSQL databases, including MongoDB and Cassandra, handle specific workloads. For resource utilization, spot instances handle batch processing workloads, with automated failover to on-demand instances when spot capacity is unavailable. ML training jobs run during off-peak hours, when compute resources are less expensive; even though fewer resources are available then, the training jobs still get scheduled and completed. And content delivery network integration accelerates the delivery of static assets and API responses to global users: edge locations cache pricing catalogs and configuration data, which reduces latency for international customers. That is mainly how we approach the performance optimization strategy.

Coming to the case study: we have the microservices we talked about, how to use them in CPQ, the deployment strategies, the APIs behind the scenes, the performance optimization, and the scalability. We said a lot of good things in these slides; how can we be sure it all works, and what were the results? We did a high-level analysis of a system built from these microservices to understand its capacity. Capacity increased tenfold, from 10,000 to over 100K monthly requests, through horizontal scaling and performance optimization. Even at ten times the load, response time improved by 75%, from around 800 milliseconds down to under 200 milliseconds. And there was a 65% decrease in time to market for new features, measured to production, so new product rollouts are much quicker.

Building scalable AI-powered platforms requires careful consideration of architecture, operations, and developer experience. This microservices-based CPQ platform demonstrates that enterprise-grade reliability and AI innovation can coexist when supported by appropriate platform engineering practices. In one line, what I am trying to say is: AI is everywhere and it is giving very positive results, so take that as an advantage, plug AI platforms into the CPQ technologies, and keep evolving, making configuration, pricing, and quoting very efficient, very responsive, very optimized, and very accurate, synchronous and event-driven, with trained models and robust platform engineering that grows by using AI to drive business decisions.

To conclude, the focus is on scalability, reliability, and the greater experience an organization can create. These platforms not only meet current needs but also adapt to future challenges, because the industry landscape keeps evolving and changing day by day, by utilizing these AI-powered enterprise systems. That is the overall presentation I wanted to share with you. I hope you enjoyed it. Thank you, everyone. Have a great day. Good night. See you. Bye.
...

Kishore Epuri

@ OSMANIA UNIVERSITY


