Conf42 MLOps 2025 - Online

- premiere 5PM GMT

Deploying AI Literature Agents at Scale: MLOps Strategies for Biomedical Research Platforms

Abstract

From 1M papers per year to instant answers. See how we built production AI agents that revolutionized biomedical research: 90% accuracy, 80% time savings, and 100% actionable MLOps blueprints with live demos.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Good morning and good afternoon. Thank you, Conf42 and the organizers, for giving me this amazing opportunity. Let me start with a brief background of the problem we are going to talk about and how to solve it.

In the biomedical research field, more than a million research papers are published every year, and if you take the citations of those papers into account, that tsunami of information multiplies. When researchers come to do research in this field, they first do a literature review; that's the first step in any research process. When they attempt it, they have to go through these millions of records. They can do a quick search somewhere, but in spite of that, finding the relevant literature is very difficult. First of all, there may be good literature out there that stays hidden from the researcher's search, which is not good. There are also processing issues: if they take these millions of records into account, the literature review will take forever. To sum up, researchers spend, on average, more than 80 percent of their time just on this literature review. That is the clear problem statement.

Our solution to this problem is intelligent AI literature agents. It contains four subdomains: domain-adapted LLMs, where LLM is large language model; an advanced RAG system, that is, an advanced retrieval-augmented generation system; a precision NER system, where NER is nothing but named entity recognition, for genes, proteins, and drug interactions; and a robust MLOps infrastructure.

Let's go through the agenda. We'll start by talking about the MLOps architecture, and then we'll move on to the biomedical NLP pipeline. We'll also talk about vector database scaling after that, followed by monitoring and observability. Finally, we'll close with continuous experimentation and deployment strategies.

MLOps architecture overview. When we talk about MLOps, everyone might have heard of it: MLOps is how you operationalize the machine learning pipeline. It's a critical part; just by developing the models, we cannot achieve everything. We have to implement them and make them production ready. MLOps plays a crucial role in any machine learning lifecycle. If you ask me to define an effective MLOps architecture, I would say a few components are necessary. The first is containerized microservices, for modular scalability and versioning. Then a model registry: models keep getting updated, so we need to have them versioned in the registry (a sketch of this follows below). Then asynchronous processing pipelines for efficient batch citation updates, because research publishing is very vibrant and citation results and citation counts vary every day, so we need the latest information to give the researcher a correct picture of the literature. Then clear separation of embedding generation from retrieval services, a real-time inference API layer engineered for sub-second responses, and a dedicated evaluation environment to ensure rigorous scientific validation.
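To make the registry and serving separation concrete, here is a minimal sketch of how such a setup could look, assuming MLflow as the model registry and FastAPI for the real-time inference layer; both tools, and the registered model name, are illustrative choices rather than details from the talk.

```python
# Minimal sketch: versioned model registry + real-time inference API.
# MLflow, FastAPI, and the model name "biomed-literature-ranker" are
# assumptions for illustration, not the talk's actual stack.
import mlflow.pyfunc
import numpy as np
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Pin a specific registered version so deployments are reproducible;
# promoting a new version in the registry is the release step.
MODEL_URI = "models:/biomed-literature-ranker/3"
model = mlflow.pyfunc.load_model(MODEL_URI)

class Query(BaseModel):
    text: str

@app.post("/rank")
def rank(query: Query):
    # pyfunc models accept a pandas DataFrame; one row per query.
    prediction = model.predict(pd.DataFrame([{"text": query.text}]))
    return {"model_uri": MODEL_URI, "score": float(np.asarray(prediction).ravel()[0])}
```

Because the service only knows the registry URI, retraining and rollback become registry operations rather than code changes, which is the point of keeping the registry separate from serving.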
When we look at the biomedical NLP pipeline, there are a few challenges with domain adaptation, because biomedical is a field that only people who are really into biotechnology, biology, and healthcare work in, and it has lots of jargon and technical terms. Not everyone can understand this biomedical vocabulary, not even all of the researchers out there. So it poses some challenges, because biomedical language is complex: highly specialized vocabulary across sub-disciplines; entity relationships requiring domain expertise, like gene-protein interactions, which need a domain expert to interpret properly for the researcher; contextual meaning that changes across research areas; and the need for continual updates as new discoveries emerge. Traditional NLP pipelines fail in this domain without specialized adaptation techniques. So not every machine learning engineer can do NLP on this biomedical data; they need a certain knowledge of the domain, its language, and a certain level of understanding.

Model training and deployment workflow. Model training starts with data ingestion, cleaning, and exploratory data analysis; those are typical steps in any machine learning project. Once you have done all that, you develop the model, check it against the test set and validation set, and you have a working model. So imagine data ingestion happens, then you run the training pipeline and the evaluation framework. We need specialized metrics for biomedical accuracy, with domain expert review (a sketch of entity-level metrics follows below), and then we finally deploy the models.

Scaling vector databases for 30-plus million citations. The operational challenges are efficiently generating embeddings for a massive document corpus, balancing recall against computational cost, and managing seamless index updates; those are the operational issues with vector embeddings. Our MLOps solution to counter them is asynchronous embedding generation pipelines with batch processing, hierarchical indexing strategies, read replicas with staged updates, and optimized query caching (both the batching and the indexing are sketched below). These are the techniques to counter those shortfalls, or challenges, I would say.

RAG architecture and production implementation. There are several key components: domain-specific embedding models, a multi-stage retrieval pipeline (see the reranking sketch below), citation network enrichment, and context window optimization. With respect to performance metrics, we have to evaluate against a baseline to make sure performance has indeed improved: sub-second query latency, 90 percent accuracy in entity recognition, and a citation relevance score above 85 percent. Such KPIs are really important when evaluating performance. It's not just about doing things; we also need to make sure we are doing the right thing and getting the expected results within the stipulated time. And the final one is the reduction in researchers' literature search time.
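To make the evaluation framework above more concrete: for biomedical NER, plain token accuracy is misleading, so entity-level precision, recall, and F1 are the usual specialized metrics. Here is a minimal sketch using the seqeval library; the library choice and the toy GENE/DRUG tags are my assumptions, not details from the talk.

```python
# Entity-level NER evaluation sketch using seqeval (an assumed tool choice).
# Tags follow the BIO scheme; the GENE/DRUG labels are illustrative.
from seqeval.metrics import classification_report, f1_score

# One sentence: "BRCA1 interacts with tamoxifen resistance pathways"
y_true = [["B-GENE", "O", "O", "B-DRUG", "O", "O"]]
y_pred = [["B-GENE", "O", "O", "O", "O", "O"]]  # model missed the drug entity

print(classification_report(y_true, y_pred))
print("entity-level F1:", f1_score(y_true, y_pred))
```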
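For the asynchronous embedding generation with batch processing, the core idea is to embed the corpus offline in fixed-size batches rather than per request. A minimal sketch assuming the sentence-transformers library; the checkpoint name is a generic placeholder, since the talk only says the embedding models are domain specific.

```python
# Batch embedding generation sketch; sentence-transformers and the checkpoint
# are assumptions standing in for the talk's domain-specific model.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder checkpoint

def embed_corpus(abstracts, batch_size=256):
    # encode() batches internally; normalized vectors make inner product
    # equal to cosine similarity, which simplifies the index below.
    return model.encode(
        abstracts,
        batch_size=batch_size,
        normalize_embeddings=True,
        show_progress_bar=True,
    )

vectors = embed_corpus(["BRCA1 mutation and PARP inhibitor response.",
                        "Tamoxifen resistance mechanisms in breast cancer."])
```

In production this loop would run as an asynchronous worker fed by the citation-update pipeline, writing vectors to the index in staged batches rather than blocking any user-facing request.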
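Hierarchical indexing is what keeps the recall-versus-cost balance manageable at 30-plus million vectors: a coarse quantizer routes each query to a handful of clusters instead of scanning the whole corpus. A minimal sketch with FAISS, which is my assumption; the talk describes the strategy, not the library.

```python
# Hierarchical (IVF) index sketch with FAISS -- an assumed implementation.
import faiss
import numpy as np

d, n = 384, 100_000            # embedding dim and corpus size (toy scale)
xb = np.random.rand(n, d).astype("float32")

nlist = 1024                   # number of coarse clusters
quantizer = faiss.IndexFlatIP(d)
index = faiss.IndexIVFFlat(quantizer, d, nlist, faiss.METRIC_INNER_PRODUCT)

index.train(xb)                # learn the coarse clusters
index.add(xb)                  # add citation embeddings

index.nprobe = 16              # clusters searched per query: the recall-vs-cost knob
scores, ids = index.search(xb[:1], k=10)
```

Raising nprobe improves recall at the cost of latency, which is exactly the trade-off described above; read replicas of this index can then serve queries while a staged copy absorbs updates.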
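A multi-stage retrieval pipeline typically means a cheap dense first pass followed by a more expensive reranker over the short list. A sketch using a sentence-transformers CrossEncoder; the specific checkpoint is a placeholder rather than the talk's model.

```python
# Two-stage retrieval sketch: dense recall, then cross-encoder rerank.
# The checkpoint is a placeholder; the talk names the pattern, not the models.
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query, candidates, top_k=5):
    # Stage 2: score each (query, candidate) pair jointly -- slower but more
    # precise than the stage-1 vector search that produced `candidates`.
    scores = reranker.predict([(query, doc) for doc in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return ranked[:top_k]

hits = rerank("PARP inhibitors in BRCA-mutated cancers",
              ["Olaparib maintenance therapy in ovarian cancer.",
               "An unrelated methods paper on sequencing pipelines."])
```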
Monitoring and observability framework. Once we deploy the model, the data may change; it keeps evolving. The model that worked efficiently when we first implemented it may not work over time. So we need to constantly monitor the models and then do the fine-tuning. Model performance tracking, scientific accuracy validation, and user interaction analysis are really important. Once we are done with the work, we cannot say it's all done and we never need to revisit it. We have to come back again, with certain metrics to evaluate every time, and if the system is derailing or deviating from the desired outcome, we need to intervene and fine-tune the model, or perhaps enhance the ecosystem, to produce the right output.

Real-time monitoring dashboard. Our custom monitoring solution provides both ML engineers and scientific stakeholders with visibility into model drift detection (sketched at the end of this section), citation accuracy, system health and performance metrics, user query patterns and success rates, resource utilization and scaling triggers, and A/B testing performance comparison.

Continuous experimentation methodology. We have implemented an A/B testing framework that balances experimentation with scientific reliability. It enables targeted cohort testing by research domain, measures both technical metrics and research outcomes, provides statistical confidence for biomedical applications (see the z-test sketch below), and supports multivariate testing across model components. This approach has yielded an 80 percent reduction in literature search time while maintaining scientific rigor in the results.

Infrastructure scaling patterns. We have to have a baseline infrastructure: core components sized for average load with two-times redundancy. On top of that sits an elastic scaling layer: autoscaling inference endpoints based on query volume, scheduled scaling events, and predictive scaling for known search patterns. For example, toward the academic term, if you observe that demand increases, you may have to plan for the scaling in advance. There is also specialized compute: GPU allocation for embedding generation and inference. Our infrastructure scales efficiently across both batch processing needs and real-time query patterns. This is really important, making sure we are not doing less or more; we are doing just the right amount.

Key MLOps learnings from biomedical AI. As we already saw, integrating domain expertise is really important; without involvement from domain scientists, it's difficult to do MLOps right. Developing scientific validation pipelines is another one: we cannot be sure the performance has improved just because the time to search has come down; we also have to measure against other scientific variables and KPIs. Enabling continuous corpus updates is really important too: the corpus keeps growing, so we have to keep up to date with the latest and greatest information. And prioritizing explainability: biomedical research applications necessitate significantly higher transparency and interpretability standards compared to consumer AI systems.
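To ground the drift detection from the monitoring dashboard above: one common approach, and an assumption on my part since the talk does not name a method, is to compare a live window of scores against a reference window with a two-sample Kolmogorov-Smirnov test.

```python
# Drift-detection sketch using a two-sample KS test (an assumed method;
# the talk names drift detection but not the statistic).
import numpy as np
from scipy.stats import ks_2samp

reference = np.random.normal(0.70, 0.05, 5_000)   # e.g. relevance scores at launch
live      = np.random.normal(0.62, 0.07, 5_000)   # e.g. last week's scores

stat, p_value = ks_2samp(reference, live)
if p_value < 0.01:
    # In production this would trigger an alert or open a retraining ticket.
    print(f"Drift suspected: KS={stat:.3f}, p={p_value:.2e}")
```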
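Likewise, for the statistical confidence behind the A/B framework, a two-proportion z-test over a success metric (say, whether a session ends with a saved citation) is one standard choice; statsmodels and the success definition here are my assumptions, not the talk's stack.

```python
# A/B significance sketch with a two-proportion z-test (assumed method).
# "Success" is illustrative, e.g. the researcher saved a retrieved citation.
from statsmodels.stats.proportion import proportions_ztest

successes = [1_840, 1_620]   # variant B vs. variant A
sessions  = [4_000, 4_000]

z, p_value = proportions_ztest(count=successes, nobs=sessions)
print(f"z={z:.2f}, p={p_value:.4f}")  # ship B only if p is below threshold
```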
And with that, I'm concluding. Thank you for this opportunity; thanks a lot.
...

Nishanth Joseph Paulraj

Senior Data Engineer @ Thermo Fisher Scientific



