Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone.
I'm Arun Subramanian, and I'm honored to be speaking here at Conf42 Golang 2025.
The title of today's talk is Data Mesh Unleashed: The Scalable Architecture
for AI-Driven Enterprise Intelligence.
We are going to explore how data mesh reshapes traditional thinking about data
platforms, especially in the context of AI and rapidly scaling organizations.
This isn't just a technical transformation.
It's an organizational one, and it's redefining how teams deliver trusted,
scalable, and usable data products.
Let's start with the basics.
What exactly is data mesh?
In short, it's a decentralized approach to data architecture.
Instead of managing all data in one central warehouse or lake, data
mesh distributes responsibility to the teams that generate
and use the data: the domains.
This means product, marketing, sales, or finance teams can
own, manage, and serve their data directly using shared platform tools.
Why do this?
Because centralized models don't scale.
They create bottlenecks, long lead times, and poor alignment between
data producers and consumers.
Data mesh flips the model, empowering domain teams
while still maintaining interoperability through shared standards and governance.
Now, data mesh is based on four core principles.
Each of these addresses a key limitation of traditional, centralized data
systems. The first is domain-oriented data ownership.
This principle places responsibility for data in the hands of the teams
that generate and use it.
For example, the payments team in a FinTech company
owns all transactional data.
They know the context, like settlement delays, transaction reversals,
and fraud patterns, far better than a centralized data team.
What this ownership means: faster decisions, more relevant data models,
and quicker iteration.
The second principle is self-serve data infrastructure.
Just like DevOps built internal platforms to help engineers ship code faster,
we build data platforms that abstract away complexity: ingestion,
storage, transformation, cataloging, and monitoring.
Consider a marketing team that wants to expose campaign performance data.
Instead of submitting a ticket to data engineering, they use internal
tooling to publish a data product via a declarative config or a pipeline UI.
All self-serve.
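To make that concrete, and since this is a Go conference, here's a minimal Go sketch of what parsing such a declarative config might look like. The spec fields, the yaml library choice, and the publishing flow are illustrative, not any particular platform's format.

```go
package main

import (
	"fmt"

	"gopkg.in/yaml.v3" // assumed YAML parser; any would do
)

// DataProduct mirrors a hypothetical declarative config a domain
// team might submit to the internal platform.
type DataProduct struct {
	Name    string   `yaml:"name"`
	Domain  string   `yaml:"domain"`
	Owner   string   `yaml:"owner"`
	Source  string   `yaml:"source"`
	Refresh string   `yaml:"refresh"`
	Outputs []string `yaml:"outputs"`
}

const spec = `
name: campaign_performance
domain: marketing
owner: marketing-data@example.com
source: kafka://events.campaign_clicks
refresh: hourly
outputs: [sql, rest]
`

func main() {
	var p DataProduct
	if err := yaml.Unmarshal([]byte(spec), &p); err != nil {
		panic(err)
	}
	// A real platform would now provision the pipeline;
	// here we just confirm the spec parsed.
	fmt.Printf("publishing %s for domain %s\n", p.Name, p.Domain)
}
```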
The third principle is federated governance.
This principle ensures standardization without centralization: every domain
adheres to a common metadata model,
lineage tracking, access control, and regulatory compliance, but enforcement
happens automatically through tooling.
Think of it as GitOps for data: policies live in code, and
deviations trigger alerts or block deployments.
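As a rough illustration, a policy gate in Go might look something like the following. The Product fields and the rules are invented for the example; real policy engines are much richer.

```go
package main

import (
	"fmt"
	"os"
)

// Product is a data product's metadata as a policy check might
// see it; the fields here are illustrative.
type Product struct {
	Name       string
	Owner      string
	PIITags    []string
	HasLineage bool
}

// checkPolicy mimics "policies live in code": every data product
// must declare an owner, tag its PII fields, and emit lineage.
func checkPolicy(p Product) []string {
	var violations []string
	if p.Owner == "" {
		violations = append(violations, "missing owner")
	}
	if len(p.PIITags) == 0 {
		violations = append(violations, "no PII tagging declared")
	}
	if !p.HasLineage {
		violations = append(violations, "lineage not emitted")
	}
	return violations
}

func main() {
	p := Product{Name: "payments.transactions", Owner: "payments-team"}
	if v := checkPolicy(p); len(v) > 0 {
		fmt.Println("policy violations:", v)
		os.Exit(1) // block the deployment, like a failed CI gate
	}
}
```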
The fourth principle: data as a product.
Teams don't just expose raw data.
They deliver curated, documented, versioned, and reliable data products.
Let's say the sales team publishes a monthly sales performance dataset.
It should have metadata, freshness indicators, contact info for
the data steward, and alerts if the pipeline fails.
It's treated like a production-grade API.
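In Go terms, serving that metadata contract might look like this hedged sketch. The endpoint path, the fields, and the 24-hour freshness threshold are all illustrative.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
	"time"
)

// Metadata is the kind of contract a data-as-a-product team might
// expose alongside the data itself; the shape is illustrative.
type Metadata struct {
	Dataset     string    `json:"dataset"`
	Steward     string    `json:"steward"`
	LastRefresh time.Time `json:"last_refresh"`
	Fresh       bool      `json:"fresh"`
}

func main() {
	http.HandleFunc("/v1/sales/monthly-performance/metadata",
		func(w http.ResponseWriter, r *http.Request) {
			last := time.Now().Add(-2 * time.Hour) // stand-in for real pipeline state
			json.NewEncoder(w).Encode(Metadata{
				Dataset:     "monthly_sales_performance",
				Steward:     "sales-data@example.com",
				LastRefresh: last,
				Fresh:       time.Since(last) < 24*time.Hour, // freshness SLA
			})
		})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```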
Now let's talk about how a data warehouse differs from a data mesh.
Traditional data warehouses offer a single source of truth: structured,
centralized, and controlled.
But let's look at a real world challenge.
A healthcare company wants to add real-time lab test results
into their BI dashboards.
In a warehouse model, this means coordination between data engineers,
analysts, and business teams, sometimes taking weeks.
Schema changes can impact dozens of downstream jobs.
In contrast, in a data mesh model, the clinical lab domain owns the
real-time data stream. They expose a curated, versioned data product that
other domains, like patient care or billing, can consume immediately.
The sales or analytics teams don't have to wait.
They use the product via defined APIs or SQL interfaces.
This removes bottlenecks and enables parallel innovation.
While a warehouse scales compute, a mesh scales ownership and speed.
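To sketch the consumption path just mentioned, here's what pulling such a data product over a defined API could look like in Go. The endpoint URL and the result schema are invented for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// LabResult is the consumer-side view of the clinical lab domain's
// data product; the endpoint and schema are hypothetical.
type LabResult struct {
	PatientID string  `json:"patient_id"`
	Test      string  `json:"test"`
	Value     float64 `json:"value"`
}

func main() {
	resp, err := http.Get("https://data.example.com/clinical-lab/v2/results?since=1h")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var results []LabResult
	if err := json.NewDecoder(resp.Body).Decode(&results); err != nil {
		panic(err)
	}
	fmt.Printf("fetched %d lab results without filing a ticket\n", len(results))
}
```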
Now compare data lakes with data mesh.
Data lakes were a step forward from rigid warehouses.
They let us store raw, semi-structured, and unstructured data cheaply.
Great for ML use cases and future-proofing, but the
flexibility came at a cost.
Ownership was unclear.
Documentation was often missing, and it became hard to find or trust anything.
A common example: a data scientist wants customer feedback data
for sentiment analysis.
In the lake, there are five datasets named things like customer reviews,
feedback dump, reviews version two, et cetera.
No documentation, no guarantees.
The result: days lost exploring unlabeled data.
Data mesh fixes this by assigning clear ownership.
For example, the customer experience domain owns all feedback data.
Their data product is named, documented, and searchable
via a metadata catalog. Consumers can explore lineage, understand
transformations, and raise issues to the owning team.
So while a lake stores data, a mesh prioritizes it, adding
quality, traceability, and trust.
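Here's a hedged Go sketch of that catalog lookup. Tools like DataHub or Amundsen expose richer APIs; this endpoint and response shape are made up just to show the idea.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// CatalogEntry is an invented response shape for a metadata catalog
// lookup (real catalogs expose far more: freshness, schema, SLAs).
type CatalogEntry struct {
	Name     string   `json:"name"`
	Owner    string   `json:"owner"`
	Upstream []string `json:"upstream"` // lineage: where the data came from
}

func main() {
	resp, err := http.Get("https://catalog.example.com/api/datasets/customer_feedback")
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var e CatalogEntry
	if err := json.NewDecoder(resp.Body).Decode(&e); err != nil {
		panic(err)
	}
	fmt.Printf("%s is owned by %s, derived from %v\n", e.Name, e.Owner, e.Upstream)
}
```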
Let's talk about key benefits of data mesh.
So why are teams making the shift?
Let's look at tangible benefits, grounded in examples.
First, scalability. At a large retail company, the centralized data team had
a backlog of over 150 feature requests.
With data mesh, teams like logistics, inventory,
and promotions now publish and manage their own data products.
The backlog shrinks and velocity increases, not because we hired more
engineers, but because we distributed responsibility.
Second, reduced IT dependency.
A global telecom company reported a 70% drop in ticket volume to
the central data engineering team after adopting self-serve infrastructure.
Product teams build and manage their own pipelines using internal
platforms, reducing handoffs and accelerating delivery.
Third, faster decision-making. At a logistics startup,
the operational teams own shipment tracking data.
Because it's in their control, they can expose near real-time
metrics, like delays and ETA variance, without going through central teams.
This leads to instant feedback loops, daily optimization,
and reduced delivery times.
And ML teams often spend 60 to 80% of their time wrangling data.
Data mesh reduces this by delivering ready-to-use, reliable,
well-documented data products.
For example, a fraud detection model can plug into a domain-owned stream of payment
events with consistent schemas, rather than scraping logs or waiting for ETL jobs.
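A minimal Go sketch of plugging into such a domain-owned event stream follows. The stream is stubbed with two sample events, and the schema fields are illustrative.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// PaymentEvent is the consistent, versioned schema the payments
// domain would guarantee; field names are illustrative.
type PaymentEvent struct {
	SchemaVersion int    `json:"schema_version"`
	TxID          string `json:"tx_id"`
	AmountCents   int64  `json:"amount_cents"`
	Country       string `json:"country"`
}

func main() {
	// Stand-in for a Kafka or event-stream consumer.
	stream := strings.NewReader(
		`{"schema_version":3,"tx_id":"t1","amount_cents":4200,"country":"DE"}
{"schema_version":3,"tx_id":"t2","amount_cents":999900,"country":"US"}`)

	sc := bufio.NewScanner(stream)
	for sc.Scan() {
		var e PaymentEvent
		if err := json.Unmarshal(sc.Bytes(), &e); err != nil {
			continue // a stable, owned schema makes this rare
		}
		// The fraud model would score the event here.
		fmt.Printf("scoring %s (%d cents)\n", e.TxID, e.AmountCents)
	}
}
```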
Now, enabling technologies for data mesh. Implementing a data mesh at scale
requires a robust technical foundation.
Here are some of the key enabling technologies.
First, Apache Iceberg: an open table format that brings ACID compliance,
schema evolution, and time travel to big data.
For example, if the product catalog domain updates a product name,
Iceberg ensures consumers can time travel and compare changes, supporting
reproducibility in ML models.
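As a hedged sketch, assuming a Trino engine fronting the Iceberg catalog and the trino-go-client driver, a time-travel query from Go could look like this. The table, columns, and timestamp are placeholders.

```go
package main

import (
	"database/sql"
	"fmt"

	_ "github.com/trinodb/trino-go-client/trino" // assumed query engine driver
)

func main() {
	// Assumes a Trino endpoint fronting an Iceberg catalog.
	db, err := sql.Open("trino",
		"http://analyst@trino.example.com:8080?catalog=iceberg&schema=product")
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// Iceberg time travel: read the catalog as it looked before the
	// rename, so an ML training run stays reproducible.
	rows, err := db.Query(
		`SELECT id, name FROM catalog_items
		 FOR TIMESTAMP AS OF TIMESTAMP '2025-01-01 00:00:00 UTC'`)
	if err != nil {
		panic(err)
	}
	defer rows.Close()

	for rows.Next() {
		var id int64
		var name string
		if err := rows.Scan(&id, &name); err != nil {
			panic(err)
		}
		fmt.Println(id, name)
	}
}
```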
Next, Delta Lake. Tightly integrated with the Spark ecosystem,
Delta Lake provides transactional integrity over data lakes. For instance,
a finance team using Delta can perform streaming and batch processing
over the same table while enforcing schema consistency.
Then Snowflake. With its separation of compute and storage
and its powerful data sharing capabilities,
Snowflake allows domain teams to share data products across
departments and regions securely.
A global marketing team can publish campaign data once, and
teams in other countries can query it without duplication.
Then AWS S3. Often the foundation layer, S3 is used for durable,
scalable object storage. It works with Iceberg or Delta to serve as the
backend for domain-owned data products. With versioning and IAM-based
access, it enforces governance policies at the object level.
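Here's a small sketch using the AWS SDK for Go v2 to read one specific version of a domain-owned object. The bucket, key, and version ID are placeholders; the access control itself lives in IAM policy, not in this code.

```go
package main

import (
	"context"
	"fmt"
	"io"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/config"
	"github.com/aws/aws-sdk-go-v2/service/s3"
)

func main() {
	ctx := context.Background()
	cfg, err := config.LoadDefaultConfig(ctx) // picks up IAM credentials
	if err != nil {
		panic(err)
	}
	client := s3.NewFromConfig(cfg)

	// Read a specific version of a domain-owned data product file.
	out, err := client.GetObject(ctx, &s3.GetObjectInput{
		Bucket:    aws.String("acme-payments-data-products"),
		Key:       aws.String("transactions/2025/01/part-0000.parquet"),
		VersionId: aws.String("3HL4kqtJvjVBH40Nrjfkd"),
	})
	if err != nil {
		panic(err) // IAM denies here if the caller lacks domain access
	}
	defer out.Body.Close()

	n, _ := io.Copy(io.Discard, out.Body)
	fmt.Println("read bytes:", n)
}
```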
And then tools like Airflow, Dagster, and Prefect orchestrate
pipelines, while DataHub, OpenMetadata, and Amundsen provide metadata
management, lineage, and discoverability.
For example, a data scientist can look up a data product, see its owner,
freshness, and lineage, and assess it before using it.
Now let's talk about real-world implementations, with examples.
First, Zalando. As one of Europe's largest e-commerce platforms, Zalando
embraced data mesh to decentralize data ownership.
Each domain team now owns its data products and publishes them with
standardized metadata and SLAs.
A product manager in logistics can access inventory predictions without
having to wait for a central team.
Next, JPMorgan Chase.
In highly regulated environments like finance, federated governance is critical.
JPMorgan used policy as code and lineage enforcement to decentralize
while maintaining compliance.
The risk analysis team can safely build models on credit transaction data with
trust in its provenance and controls.
Then Netflix. While not branding it a data mesh,
Netflix follows the principles closely.
Teams publish data through APIs and internal tooling, and other teams can
subscribe, transform, and build insights.
Their internal data marketplace allows personalization and content teams to
share behavioral insights in real time.
Now, let's talk about the business impact data mesh
creates for organizations.
A 40% increase in team autonomy at a large online marketplace:
domain teams build and deploy their own products, improving responsiveness
to change. A 35% improvement in data quality at a FinTech startup:
domain ownership led to faster root cause analysis and issue resolution.
A 50% increase in collaboration:
cross-functional teams at a healthcare provider now co-design
data contracts, ensuring shared understanding. A 60% reduction in
time to insight at a SaaS company:
the product analytics team reduced dashboard refresh latency
from hours to minutes by owning their own event pipeline.
These aren't just metrics. They represent
faster feedback loops, better trust, and tighter integration
between data producers and consumers.
Now let's talk about some of the challenges in implementing data mesh.
While data mesh is powerful, it comes with challenges like cultural change.
Not every team is ready to own data.
Teams must shift from "data is someone else's job" to
"we own and serve this." For example, an HR team owning
their recruiting funnel data may need upskilling to support
schema changes and documentation.
Platform maturity.
Without strong tooling, the burden on domains increases.
A company that launched mesh too early without governance templates
saw schema drift and broken pipelines.
Security and compliance: distributed ownership expands the attack surface.
Role-based access, audit logging, and PII tagging must be automated.
For example, a healthcare organization must enforce HIPAA policies even
as domain teams manage patient data. And finally, clear communication,
training, and incentives are needed.
Teams need to know the why and how of the new model, and be
given the tools to succeed.
Now, let's talk about some of the best practices for adopting data mesh.
First, an organizational readiness assessment.
Evaluate whether your teams align naturally by domain and whether they have the
skills and leadership support to own data.
Next, start with a pilot domain.
Pick a domain with a clear value case and a motivated team.
For example, a marketing team that wants to optimize campaign spend
using real-time engagement data.
Then, build the data platform as a product.
Create golden paths for ingestion, documentation, validation, and publishing,
and include observability and alerting by default.
And also, education and enablement.
Offer training in product thinking, data quality, security, and platform tools.
Celebrate wins, for example, a domain team saving time through automation.
Lastly, keep governance in check.
Start with minimal viable policies and evolve based on feedback.
Use tooling to enforce standards without creating manual overhead.
Now let's talk about the future of enterprise data management.
So we know the future of enterprise data is going to be decentralized,
governed, and real time.
With data mesh, we are not just improving performance; we are rethinking
how data flows in an organization. With AI and real-time personalization
becoming standard, an organization needs to scale not just storage, but
trust, discoverability, and velocity.
Imagine an enterprise where product teams publish features, marketing
launches campaigns, and fraud systems train new models, all powered by
the same interconnected, trusted mesh of data products.
That's where we are headed, and data mesh provides the roadmap.
Thank you.