Conf42 Golang 2025 - Online

- premiere 5PM GMT

Data Mesh Unleashed: Scalable Architectures for AI-Driven Enterprise Intelligence

Abstract

Unlock the future of ERP testing with Golang, AI, and cloud-native solutions! Learn how test orchestration can slash execution times by 70%, boost test accuracy by 90%, and deliver results 74% faster. Discover how to scale testing, optimize resources, and achieve unmatched efficiency in ERP systems.

Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone. I'm Arun Subramanian, and I'm honored to be speaking here at Conf42 Golang 2025. The title of today's talk is Data Mesh Unleashed: Scalable Architectures for AI-Driven Enterprise Intelligence. We are going to explore how data mesh reshapes traditional thinking about data platforms, especially in the context of AI and rapidly scaling organizations. This isn't just a technical transformation; it's an organizational one, and it's redefining how teams deliver trusted, scalable, and usable data products.
Let's start with the basics. What exactly is data mesh? In short, it's a decentralized approach to data architecture. Instead of managing all data in one central warehouse or lake, data mesh distributes responsibility to the teams that generate and use the data: the domains. This means product, marketing, sales, or finance teams can own, manage, and serve their data directly using shared platform tools. Why do this? Because centralized models don't scale. They create bottlenecks, long lead times, and poor alignment between data producers and consumers. Data mesh flips the model, empowering domain teams while still maintaining interoperability through shared standards and governance.
Data mesh is based on four core principles, each of which addresses a key limitation of traditional, centralized data systems.
Domain-oriented data ownership. This principle places responsibility for data in the hands of the teams that generate and use it. For example, the payments team in a fintech company owns all transactional data. They know the context, like settlement delays, transaction reversals, and fraud patterns, far better than a centralized data team. This ownership means faster decisions, more relevant data models, and quicker iteration.
Self-serve data infrastructure. Just like DevOps built internal platforms to help engineers ship code faster, we build data platforms that abstract complexity: ingestion, storage, transformation, cataloging, and monitoring. Consider a marketing team that wants to expose campaign performance. Instead of submitting a ticket to data engineering, they use internal tooling to publish a data product via a declarative config or a pipeline UI, all self-serve.
Federated governance. This principle ensures standardization without centralization. Every domain adheres to a common metadata model, lineage tracking, access control, and regulatory compliance, but enforcement happens automatically through tooling. Think of GitOps for data: policies live in code, and deviations trigger alerts or block deployment.
Data as a product. Teams don't just expose raw data. They deliver curated, documented, versioned, and reliable data products. Let's say the sales team publishes a monthly sales performance dataset. It should have metadata, freshness indicators, contact info for the data steward, and alerts if the pipeline fails. It's treated like a production-grade API.
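To make the "data as a product" idea concrete, here is a minimal Go sketch of the kind of descriptor a domain team might publish alongside its dataset. The type and field names are illustrative assumptions for this talk's example, not any specific platform's API:

```go
package main

import (
	"fmt"
	"time"
)

// DataProduct is a hypothetical descriptor published with a dataset:
// ownership, contract version, and a freshness SLA.
type DataProduct struct {
	Name         string        // e.g. "monthly-sales-performance"
	Domain       string        // owning domain, e.g. "sales"
	Steward      string        // contact for questions and incidents
	Version      string        // schema/contract version
	FreshnessSLA time.Duration // how stale the data may be
	LastUpdated  time.Time     // set by the publishing pipeline
}

// IsFresh reports whether the product currently meets its freshness SLA.
// A real platform would page the steward when this returns false.
func (p DataProduct) IsFresh(now time.Time) bool {
	return now.Sub(p.LastUpdated) <= p.FreshnessSLA
}

func main() {
	sales := DataProduct{
		Name:         "monthly-sales-performance",
		Domain:       "sales",
		Steward:      "sales-data-steward@example.com",
		Version:      "v2.1.0",
		FreshnessSLA: 24 * time.Hour,
		LastUpdated:  time.Now().Add(-3 * time.Hour),
	}
	fmt.Printf("%s (owned by %s) fresh: %v\n",
		sales.Name, sales.Domain, sales.IsFresh(time.Now()))
}
```

The point of the sketch is that the contract (owner, version, SLA) travels with the data, which is what turns a raw dataset into a product.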
Now let's talk about how a data warehouse differs from a data mesh. Traditional data warehouses offer a single source of truth: structured, centralized, and controlled. But let's look at a real-world challenge. A healthcare company wants to add real-time lab test results to their BI dashboards. In a warehouse model, this means coordination between data engineers, analysts, and business teams, sometimes taking weeks, and schema changes can impact dozens of downstream jobs.
In contrast, in a data mesh model, the clinical lab domain owns the real-time data stream. They expose a curated, versioned data product that other domains, like patient care or billing, can consume immediately. The sales or analytics teams don't have to wait; they use the product via defined APIs or SQL interfaces. This removes bottlenecks and enables parallel innovation. While a warehouse scales compute, a mesh scales ownership and speed.
Now compare data lakes with data mesh. Data lakes were a step forward from rigid warehouses. They let us store raw, semi-structured, and unstructured data cheaply, which is great for ML use cases and future-proofing. But that flexibility came at a cost: ownership was unclear, documentation was often missing, and it became hard to find or trust anything. A common example: a data scientist wants customer feedback data for sentiment analysis. In the lake, there are five datasets with names like customer reviews, feedback dump, version two, et cetera. No documentation, no guarantees. The result: days lost exploring poorly labeled data. Data mesh fixes this by assigning clear ownership. For example, the customer experience domain owns all feedback data. Their data product is named, documented, and searchable via a metadata catalog. Consumers can explore lineage, understand transformations, and raise issues with the owning team. So while a lake stores data, a mesh productizes it, adding quality, traceability, and trust.
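As a sketch of what consuming a product "via defined APIs" could look like, here is a hypothetical Go client that checks a data product's metadata before depending on it. The endpoint, payload shape, and field names are assumptions for illustration, not a real service:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// productMetadata mirrors a hypothetical metadata endpoint the
// customer-experience domain exposes alongside the data itself.
type productMetadata struct {
	Owner       string    `json:"owner"`
	Version     string    `json:"version"`
	LastUpdated time.Time `json:"last_updated"`
	Lineage     []string  `json:"lineage"` // upstream sources
}

func main() {
	// Assumed internal URL; a real mesh would resolve this via a catalog.
	const base = "https://data.internal.example.com/customer-experience/feedback"

	resp, err := http.Get(base + "/metadata")
	if err != nil {
		fmt.Println("metadata fetch failed:", err)
		return
	}
	defer resp.Body.Close()

	var meta productMetadata
	if err := json.NewDecoder(resp.Body).Decode(&meta); err != nil {
		fmt.Println("decode failed:", err)
		return
	}

	// Inspect ownership, freshness, and lineage before using the data.
	fmt.Printf("owner=%s version=%s updated=%s lineage=%v\n",
		meta.Owner, meta.Version, meta.LastUpdated.Format(time.RFC3339), meta.Lineage)
}
```

Because the contract is machine-readable, a consumer can fail fast on a stale or unowned product instead of discovering the problem days later in a dashboard.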
Let's talk about the key benefits of data mesh. Why are teams making the shift? Let's look at tangible benefits, grounded in examples.
Scalability. At a large retail company, the centralized data team had a backlog of over 150 feature requests. With data mesh, teams like logistics, inventory, and promotions now publish and manage their own data products. The backlog shrinks and velocity increases, not because we hired more engineers, but because we distributed responsibility.
Reduced IT dependency. A global telecom company reported a 70% drop in ticket volume to the central data engineering team after adopting self-serve infrastructure. Product teams manage and build their own pipelines using internal platforms, reducing handoffs and accelerating delivery.
Faster decision making. In a logistics startup, the operational teams own shipment tracking data. Because it's in their control, they can expose near real-time metrics, like delays and ETA variance, without going through central teams. This leads to instant feedback loops, daily optimization, and reduced delivery times.
And ML teams often spend 60 to 80% of their time wrangling data. Data mesh reduces this by delivering ready-to-use, reliable, well-documented data products. For example, a fraud detection model can plug into a domain-owned stream of payment events with consistent schemas, rather than scraping logs or waiting for ETL jobs.
Now, the enabling technologies for data mesh. Implementing a data mesh at scale requires a robust technical foundation. Here are some of the key enabling technologies.
Apache Iceberg: an open table format that brings ACID compliance, schema evolution, and time travel to big data. For example, if the product catalog domain updates a product name, Iceberg ensures consumers can time travel and compare changes, supporting reproducibility in ML models.
Delta Lake: tightly integrated with the Spark ecosystem, Delta Lake provides transactional integrity over data lakes. For instance, a finance team using Delta can perform streaming and batch processing over the same table while enforcing schema consistency.
Snowflake: with its separation of compute and storage and its powerful data-sharing capabilities, Snowflake allows domain teams to share data products across departments and regions securely. A global marketing team can publish campaign data once, and teams in other countries can query it without duplication.
AWS S3: as the foundation layer, S3 is used for durable, scalable object storage. It works with Iceberg or Delta to serve as the backend for domain-owned data products. With versioning and IAM-based access, it enforces governance policies at the object level.
And tools like Airflow, Dagster, and Prefect orchestrate pipelines, while DataHub, OpenMetadata, and Amundsen provide metadata management, lineage, and discoverability. For example, a data scientist can look up a data product, see its owner, freshness, and lineage, and assess it before using it.
Now let's talk about real-world implementations, with examples. Zalando: as one of Europe's largest e-commerce platforms, Zalando embraced data mesh to decentralize data ownership. Each domain team now owns its data products and publishes them with standardized metadata and SLAs. A product manager in logistics can access inventory predictions without having to wait for a central team. JPMorgan Chase: in highly regulated environments like finance, federated governance is critical. JPMorgan used policy as code and lineage enforcement to decentralize while maintaining compliance. The risk analysis team can safely build models on credit transaction data with trust in its provenance and controls. Netflix: while not branding it a data mesh, Netflix follows the principles closely. Teams publish data through APIs and internal tooling, and other teams can subscribe, transform, and build insights. Their internal data marketplace allows personalization and content teams to share behavioral insights in real time.
Now, let's talk about the business impact data mesh creates for organizations. A 40% increase in team autonomy at a large online marketplace: domain teams build and deploy their own products, improving responsiveness to change. A 35% improvement in data quality at a fintech startup: domain ownership led to faster root cause analysis and issue resolution. A 50% increase in collaboration: cross-functional teams at a healthcare provider now co-design data contracts, ensuring shared understanding. A 60% reduction in time to insight at a SaaS company: the product analytics team reduced dashboard refresh latency from hours to minutes by owning their own event pipeline. These aren't just metrics; they represent faster feedback loops, better trust, and tighter integration between data producers and consumers.
Now let's talk about some of the challenges in implementing data mesh. While data mesh is powerful, it comes with challenges. Cultural change: not every team is ready to own data. Teams must shift from "data is someone else's job" to "we own and serve this." For example, an HR team owning their recruiting funnel data may need upskilling to support schema changes and documentation. Platform maturity: without strong tooling, the burden on domains increases. A company that launched a mesh too early, without governance templates, saw schema drift and broken pipelines. Security and compliance: distributed ownership expands the attack surface. Role-based access, audit logging, and PII tagging must be automated. For example, a healthcare organization must enforce HIPAA policies even as domain teams manage patient data.
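As a minimal sketch of the "policies live in code" idea from the federated-governance and compliance discussion, here is a hypothetical Go pre-publish check that blocks a data product whose descriptor violates baseline rules. The rules and types are illustrative assumptions, not any specific tool's API:

```go
package main

import (
	"errors"
	"fmt"
)

// Descriptor is a hypothetical pre-publish view of a data product.
type Descriptor struct {
	Name      string
	Owner     string   // steward contact; required by policy
	PIIFields []string // columns containing personal data
	PIITagged bool     // whether PII fields carry access-control tags
}

// policies is a minimal viable policy set; in a GitOps-style setup these
// rules would live in version control next to the platform code.
var policies = []func(Descriptor) error{
	func(d Descriptor) error {
		if d.Owner == "" {
			return errors.New("no owner: every product needs a steward")
		}
		return nil
	},
	func(d Descriptor) error {
		if len(d.PIIFields) > 0 && !d.PIITagged {
			return errors.New("PII fields present but not tagged for access control")
		}
		return nil
	},
}

// check runs every policy; a CI gate would block deployment on any error.
func check(d Descriptor) []error {
	var violations []error
	for _, p := range policies {
		if err := p(d); err != nil {
			violations = append(violations, err)
		}
	}
	return violations
}

func main() {
	d := Descriptor{Name: "patient-visits", PIIFields: []string{"patient_name"}}
	for _, v := range check(d) {
		fmt.Println("policy violation:", v)
	}
}
```

Running the check in CI is what makes governance federated rather than manual: deviations trigger alerts or block the deployment, with no central review board in the loop.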
Beyond tooling, clear communication, training, and incentives are needed. Teams need to know the why and the how of the new model, and be given the tools to succeed.
Now, let's talk about some of the best practices for adopting data mesh. Organizational readiness assessment: evaluate whether your teams align naturally by domain, and whether they have the skills and leadership support to own data. Start with a pilot domain: pick a domain with a clear value case and a motivated team, for example, a marketing team that wants to optimize campaign spend using real-time engagement data. Build the data platform as a product: create golden paths for ingestion, documentation, validation, and publishing, and include observability and alerting by default. Education and enablement: offer training in product thinking, data quality, security, and platform tools, and celebrate wins, for example, a domain team saving time through automation. And keep governance in check: start with minimal viable policies and evolve them based on feedback. Use tooling to enforce standards without creating manual overhead.
Now let's talk about the future of enterprise data management. We know the future of enterprise data is going to be decentralized, governed, and real time. With data mesh, we are not just improving performance; we are rethinking how data flows through an organization. As AI and real-time personalization become standard, an organization needs to scale not just storage, but trust, discoverability, and velocity. Imagine an enterprise where product teams publish features, marketing launches campaigns, and fraud systems train new models, all powered by the same interconnected, trusted mesh of data products. That's where we are headed, and data mesh provides the roadmap. Thank you.
...

Arun Vivek Supramanian

Senior Data Engineer @ Amazon


