Conf42 Machine Learning 2025 - Online

- premiere 5PM GMT

Data Quality and Validation in ML Pipelines: Great Expectations, Deequ, and TensorFlow Data Validation

Abstract

In machine learning, data quality isn’t just a nice-to-have—it’s make or break. Bad data can silently derail your models, leading to poor predictions, wasted resources, and lost trust. In this talk, we’ll explore how to bring data validation into your ML pipelines using three powerful open-source tools: Great Expectations, Deequ, and TensorFlow Data Validation. We’ll look at how each tool helps catch issues like missing values, schema drift, and unexpected data distributions before they become bigger problems. You’ll see how they work, where they shine, and how to choose the right one for your workflow—whether you’re building batch pipelines, streaming systems, or end-to-end ML platforms. If you care about building reliable, production-ready ML systems, this session will give you the practical tools to keep your data in check.
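
To make the three kinds of issues named above concrete, here is a minimal plain-Python sketch of a batch check for missing values, schema drift, and a shifted distribution. It is not the API of Great Expectations, Deequ, or TensorFlow Data Validation; the column names, `EXPECTED_SCHEMA`, `validate_batch`, and the 25% tolerance are all hypothetical choices made for illustration.

```python
from statistics import mean

# Hypothetical expected schema (an assumption): column name -> Python type.
EXPECTED_SCHEMA = {"age": int, "premium": float}


def validate_batch(rows, reference_mean, tolerance=0.25):
    """Return a list of human-readable issues found in a batch of dict rows."""
    issues = []
    for i, row in enumerate(rows):
        # Schema drift: columns added or dropped relative to the expected schema.
        extra = sorted(set(row) - set(EXPECTED_SCHEMA))
        dropped = sorted(set(EXPECTED_SCHEMA) - set(row))
        if extra or dropped:
            issues.append(f"row {i}: schema drift (extra={extra}, dropped={dropped})")
        # Missing values and type mismatches in the expected columns.
        for col, expected_type in EXPECTED_SCHEMA.items():
            if col not in row:
                continue
            value = row[col]
            if value is None:
                issues.append(f"row {i}: missing value in '{col}'")
            elif not isinstance(value, expected_type):
                issues.append(f"row {i}: '{col}' has type {type(value).__name__}")
    # Unexpected distribution: compare the batch mean of 'premium' against a
    # reference mean (for example, one computed on training data).
    premiums = [r["premium"] for r in rows if isinstance(r.get("premium"), float)]
    if premiums:
        batch_mean = mean(premiums)
        if abs(batch_mean - reference_mean) > tolerance * abs(reference_mean):
            issues.append(f"distribution shift in 'premium': batch mean {batch_mean:.2f}")
    return issues


# Made-up example batch: one null, one unexpected column, one outlier premium.
batch = [
    {"age": 30, "premium": 100.0},
    {"age": None, "premium": 110.0},
    {"age": 41, "premium": 500.0, "zip": "54452"},
]
issues = validate_batch(batch, reference_mean=100.0)
for issue in issues:
    print(issue)
```

Dedicated validation tools generally replace hand-written checks like these with declarative rules and schemas, which is the main reason to adopt one rather than maintain ad hoc code.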

...

Sunil Kumar Mudusu

Lead AI Engineer/Data Engineer @ Church Mutual Insurance Company



