Transcript
Hello everyone.
I'm Man Mohana.
Today I'm presenting how organizations solve IoT-scale challenges using
Snowflake plus a structured feature store architecture to reduce cost while
improving reliability, consistency, ML performance, and development agility.
This approach comes from real deployments: manufacturing, utilities,
energy grids, and smart monitoring systems. We'll break down what problems
exist, where pipelines fail, how feature stores solve them, and how Snowflake's
architecture brings performance and cost control that traditional systems cannot.
Data explosion.
The scale of IoT today is unprecedented.
Many organizations think of IoT as just sensors, but each sensor
represents a continuous telemetry stream. Manufacturing lines
push vibration, temperature, torque, error logs, and much more:
millions of readings per hour.
Cities run environmental networks, power grids, traffic monitors, and more.
Vehicle fleets push engine health, speed, GPS, and weather, all streamed continuously.
Traditional infrastructure was never built for this, so the
challenge isn't just storing data.
It's sustaining throughput while preserving contextual meaning at volume.
Three problems always show up.
Number one, the number of sensors and readings per hour explodes, and
pipelines must continuously consume them without pausing or buffering.
Number two, these streams are continuous, so nothing ever stops.
Ingestion must be always on, fault tolerant, and incremental.
Number three, heterogeneity.
Different sample rates, precisions, and formats.
When smart meters report every few seconds and vehicles report in irregular
bursts, you get temporal misalignment, and that breaks simple SQL analytics.
You need infrastructure that can ingest, clean, normalize,
model, and serve at streaming velocity.
IoT data is meaningless without historical context.
A vibration reading right now tells you nothing unless compared to last hour,
last week, or seasonal patterns.
So ML features must include rolling-window metrics, lag signals, trend
acceleration, and seasonal behavior.
This is where ad hoc scripts fall apart.
Doing consistent temporal processing across thousands of sensors or across
dozens of models is impossible without structured, repeatable engineering.
So the feature store becomes not a luxury but a survival strategy.
In many real deployments, over 70% of ML accuracy actually
comes from temporal context.
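To make that concrete, here is a minimal sketch of such temporal features in Python, assuming readings arrive in a pandas DataFrame with hypothetical columns device_id, ts (a timestamp), and vibration.

```python
import pandas as pd

def add_temporal_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add rolling-window, lag, and trend features per device (illustrative)."""
    df = df.sort_values(["device_id", "ts"]).copy()
    grouped = df.groupby("device_id")["vibration"]
    # Rolling-window metric: mean of the last 12 readings per device.
    df["vib_roll_mean_12"] = grouped.transform(
        lambda s: s.rolling(12, min_periods=1).mean())
    # Lag signal: the previous reading for the same device.
    df["vib_lag_1"] = grouped.shift(1)
    # Trend acceleration: second difference of the signal.
    df["vib_accel"] = grouped.diff().groupby(df["device_id"]).diff()
    # Seasonal context: hour of day, so models can learn daily patterns.
    df["hour_of_day"] = df["ts"].dt.hour
    return df
```

The point is not these particular columns; it's that every model reads the same definitions instead of re-deriving them.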
Now, data quality.
IoT data quality issues are unavoidable: null values, network gaps,
calibration drift, noise.
The danger is not bad data.
It's inconsistent handling across tools, teams, et cetera.
Data engineers handle ingestion one way.
Data scientists write manual transformations.
Operations teams build BI aggregations.
The result of all of it: the same metrics
computed three different ways.
Models break when teams update logic independently, and
trusting predictions becomes impossible.
This fragmentation
escalates cost and destroys trust.
So what do we do? We centralize transformations into a
uniform feature store.
In real world pipelines, I've consistently seen three to six copies of the same
logic written by different teams because there is no centralized feature store.
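As an illustration of what centralizing transformations can mean in code, here is a single shared cleaning routine that every team would call instead of maintaining its own copy; the clipping quantiles and gap limit are assumptions for the sketch.

```python
import pandas as pd

def clean_sensor_series(s: pd.Series, max_gap: int = 5) -> pd.Series:
    """One canonical cleaning routine shared by every team."""
    # Tame extreme noise instead of letting each team pick its own outlier rule.
    s = s.clip(lower=s.quantile(0.001), upper=s.quantile(0.999))
    # Fill only short network gaps; long outages stay null so they remain visible.
    s = s.interpolate(limit=max_gap)
    # Calibration-drift correction would also live here, in exactly one place.
    return s
```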
Feature store concept.
A feature store fixes that fragmentation: a centralized repository of canonical
features, with reusable definitions shared across multiple ML models. In practice,
feature reuse cuts engineering effort by 60 to 80%,
because once the logic exists, it automatically benefits every model.
Versioning guarantees reproducibility.
Optimized serving delivers low-latency inference.
Think not about six teams building six pipelines
to achieve the same thing,
but about creating one pipeline for six teams.
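Here is a minimal sketch of what reusable, versioned feature definitions can look like; the registry structure and names are illustrative, not any particular product's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict
import pandas as pd

@dataclass(frozen=True)
class FeatureDefinition:
    name: str
    version: int
    compute: Callable[[pd.DataFrame], pd.Series]  # one canonical transformation

REGISTRY: Dict[str, FeatureDefinition] = {}

def register(feature: FeatureDefinition) -> None:
    # Versioned key, so old and new definitions can coexist reproducibly.
    REGISTRY[f"{feature.name}_v{feature.version}"] = feature

# Every model that needs rolling vibration stats reuses this one definition.
register(FeatureDefinition(
    name="vib_roll_mean_12",
    version=1,
    compute=lambda df: df.groupby("device_id")["vibration"]
                         .transform(lambda s: s.rolling(12, min_periods=1).mean()),
))
```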
Snowflake architecture gives us a unique advantage for IoT.
Number one, VARIANT enables ingestion of semi-structured JSON
from sensors without schema gymnastics.
Time Travel means historical reproducibility:
if data changed, models remain reproducible.
Streams plus materialized views let us do incremental change capture,
meaning we only process new sensor data,
not terabytes of history.
Incremental updates routinely reduce compute load by 40 to 60% because you
stop reprocessing historical data
that hasn't changed.
In short, Snowflake trades brute-force processing for
intelligent incremental work.
And that's the core cost reducer.
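As a sketch of that incremental pattern, here are the kinds of statements involved, with hypothetical table, stream, and column names; they would be run through whatever Snowflake client you already use.

```python
# Capture changes on the raw table; the stream tracks rows added since the
# last time it was consumed.
CREATE_STREAM = """
CREATE OR REPLACE STREAM raw_readings_stream
  ON TABLE raw_sensor_readings;
"""

# Consuming the stream inside a DML statement advances its offset, so each
# run processes only new sensor data, not terabytes of history.
INCREMENTAL_LOAD = """
INSERT INTO transformed_features
SELECT
    device_id,
    reading_ts,
    payload:vibration::FLOAT AS vibration  -- VARIANT column, no schema gymnastics
FROM raw_readings_stream;
"""
```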
This architecture works at any scale because it's modular.
The raw layer preserves truth: timestamps, device metadata. The transform layer
performs all feature logic: windowing, smoothing, health metrics.
The serving layer is denormalized for direct, fast lookup by ML systems.
This eliminates duplicate pipelines; features become
software artifacts, not scattered scripts.
This layered approach is how teams safely expand to hundreds or hundreds
of thousands of sensors without rewriting pipelines each time.
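A rough sketch of the three layers as plain functions, just to show the separation of concerns; the column names and the smoothing window are assumptions.

```python
import pandas as pd

def raw_layer(readings: pd.DataFrame) -> pd.DataFrame:
    # Preserve truth: keep timestamps and device metadata untouched.
    return readings.copy()

def transform_layer(raw: pd.DataFrame) -> pd.DataFrame:
    # All feature logic lives here: windowing, smoothing, health metrics.
    out = raw.sort_values(["device_id", "ts"]).copy()
    out["vib_smooth"] = (out.groupby("device_id")["vibration"]
                            .transform(lambda s: s.rolling(5, min_periods=1).mean()))
    return out

def serving_layer(features: pd.DataFrame) -> pd.DataFrame:
    # Denormalized: the latest feature row per device, ready for fast lookup.
    return features.sort_values("ts").groupby("device_id").tail(1)
```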
dbt turns analytics into maintainable software:
explicit dependency graphs, reusable macros for rolling statistics or anomaly
scores, unit tests that prevent data drift,
version control, and code review.
I've seen automated dbt tests
catch a huge amount of bad upstream sensor data long before
it pollutes dashboards or models.
This enforces discipline.
Every feature is testable, traceable, reproducible, and documented.
That's how IoT ML can scale beyond pipelines to team infrastructure.
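dbt tests themselves are declared in SQL and YAML; as a language-neutral sketch of the kind of checks they encode, here is a Python version with hypothetical column names and thresholds.

```python
import pandas as pd

def check_sensor_batch(df: pd.DataFrame, max_null_rate: float = 0.02) -> None:
    """Fail loudly before bad upstream data reaches dashboards or models."""
    null_rate = df["vibration"].isna().mean()
    assert null_rate <= max_null_rate, f"vibration null rate {null_rate:.1%} too high"
    assert df["ts"].is_monotonic_increasing, "timestamps arrived out of order"
    assert df["temperature"].between(-40, 150).all(), "temperature outside plausible range"
```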
SQL handles 80% of transformations. For the
specialized 20%, spectral analysis,
signal decomposition, anomaly scoring,
Snowflake executes Python inside its compute.
This is critical because no raw data is copied out of Snowflake.
Governance stays intact.
Libraries like pandas, SciPy,
and scikit-learn are available at scale.
This gives us hybrid power: SQL speed with Python expressiveness.
Isn't it great?
This hybrid approach keeps SQL fast for the bulk work while leveraging Python for
the roughly 20% of cases that require signal-processing sophistication.
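As a sketch of that specialized 20%, here is a simple spectral anomaly score of the kind you might package as a Python routine running inside Snowflake; the normalization and the assumption that both windows have the same length are illustrative choices.

```python
import numpy as np

def spectral_anomaly_score(window: np.ndarray, baseline_spectrum: np.ndarray) -> float:
    """Compare the spectral shape of the current window to a healthy baseline."""
    # Magnitude spectrum of the current window (mean removed).
    spectrum = np.abs(np.fft.rfft(window - window.mean()))
    # Normalize both shapes so amplitude differences don't dominate.
    spectrum = spectrum / (np.linalg.norm(spectrum) + 1e-9)
    baseline = baseline_spectrum / (np.linalg.norm(baseline_spectrum) + 1e-9)
    # Higher score means the current spectral shape deviates more from healthy.
    return float(np.linalg.norm(spectrum - baseline))
```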
Cost optimization: we deliberately minimize cost.
Compute is matched per model:
a tiny warehouse for simple aggregations, medium for wide joins,
large only for parallel tasks.
Matching workloads to warehouse size commonly yields 50%-plus compute
savings in production. Storage is tiered: granular for recent data,
downsampled for older data,
cold archive for compliance.
Caching ensures repeated model lookups
cost zero compute; caching can eliminate nearly all costs for repeated lookups,
especially for inference windows.
Cost governance becomes architectural, not accidental.
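A small sketch of the workload-to-warehouse matching described above; the warehouse names and the mapping itself are assumptions, not a prescription.

```python
# Match each workload to the smallest warehouse that handles it comfortably.
WAREHOUSE_BY_WORKLOAD = {
    "simple_aggregations": "WH_XS",  # tiny warehouse is enough
    "wide_joins":          "WH_M",
    "parallel_backfill":   "WH_L",   # large only when the work is truly parallel
}

def warehouse_for(workload: str) -> str:
    # Default to the cheapest option so cost control is the path of least resistance.
    return WAREHOUSE_BY_WORKLOAD.get(workload, "WH_XS")

# e.g. issue f"USE WAREHOUSE {warehouse_for('wide_joins')}" before the job's SQL.
```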
Performance patterns and trade-offs.
We support multiple access patterns.
Batch mode is perfect for ML training and mass scoring.
Point lookups have
acceptable latency but benefit from caching. Denormalized tables
intentionally trade a bit of storage to enable blazing-fast inference.
In many deployments, denormalizing improves serving latency by two to five x.
In IoT ML, latency
predictability is more valuable than pure speed.
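To illustrate the denormalized serving pattern and cached point lookups, here is a toy sketch; the table layout, values, and cache size are made up for illustration.

```python
from functools import lru_cache
import pandas as pd

# One wide, denormalized row per device: serving features joined ahead of time.
SERVING = pd.DataFrame({
    "device_id":        ["dev-001", "dev-002"],
    "vib_roll_mean_12": [0.41, 0.77],
    "vib_lag_1":        [0.39, 0.80],
    "temp_mean_1h":     [61.2, 74.9],
}).set_index("device_id")

@lru_cache(maxsize=10_000)
def features_for(device_id: str) -> tuple:
    # Repeated lookups for the same device hit the cache, not the warehouse.
    return tuple(SERVING.loc[device_id])
```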
Production operations and governance.
Production means governance: monitoring data quality, feature drift, and pipeline
health. Monitoring feature drift is critical.
I've seen unnoticed drift degrade model accuracy by 20 to 30% within a month.
Access control
comes via RBAC plus row-level security, multiple feature versions stay active
simultaneously, and cost tagging per model and team eliminates budget surprises.
This creates a robust operational backbone supporting multiple IoT workloads
safely.
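Here is a minimal sketch of the kind of drift check behind that point, comparing a feature's recent serving distribution to its training-time distribution; the threshold is an illustrative assumption.

```python
import numpy as np

def drift_alert(training_values: np.ndarray, recent_values: np.ndarray,
                max_shift: float = 0.5) -> bool:
    """Flag drift when the feature's mean shifts too far from training time."""
    # Shift of the mean, measured in training-time standard deviations.
    std = training_values.std() + 1e-9
    shift = abs(recent_values.mean() - training_values.mean()) / std
    return shift > max_shift  # True means investigate before accuracy quietly degrades
```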
Real deployments.
For example, in manufacturing, unified features improved accuracy and reduced
engineering effort significantly.
In smart buildings, data quality tests caught failures
before dashboards or ML models
polluted decisions.
For utility meters, Snowflake's columnar design plus clustering
handled millions of customers at lower storage cost and lower latency.
Bottom line: feature stores plus Snowflake mean
predictable cost and reliable insight.
Key takeaways: start with access patterns;
architecture follows usage. A quality-first
mindset saves millions
later.
Modular transformations reduce risk and onboarding pain.
Treat features as engineered assets:
tested, versioned, governed.
This is not a one-off pipeline, it's a platform strategy.
Treating features as engineered assets with tests, lineage, and
reuse is what unlocks these gains:
typically 50 to 70% less engineering effort and noticeably
higher ML reliability.
So the key idea is consistency delivers both cost savings and accuracy.
This approach scales whether you're handling thousands
or millions of IoT signals.
Thank you for listening.
IoT data doesn't have to be chaotic or expensive.
Structured feature engineering on Snowflake drives predictability,
scalability, and trust.
Happy to dive deeper or answer any questions.
Please drop your questions in the forum and I can get back to you.
Thank you.