Transcript
This transcript was autogenerated. To make changes, submit a PR.
Welcome to today's talk about building real-time feature
pipelines and JavaScript's role in modern data-driven applications.
In the next few minutes, I'll walk through how JavaScript can power the entire
feature lifecycle: ingesting events, computing features, caching them for
sub-millisecond reads, and running ML inference at the edge and in the browser.
Coming to the evolution of web applications, we've moved from batch-driven,
server-rendered pages to dynamic, personalized applications.
Modern users expect real-time updates and context-aware UI. That shift requires
features computed now, not overnight, and delivered with predictable latency.
In traditional architectures, as you can see, we had server-rendered
pages, batch data processing, and feature updates that were
consistently and significantly delayed.
In modern data-driven applications, users expect real-time feature
computation, instant model inference, and hyper-personalized content
tailored to their needs, their activity, and their preferences.
Coming to the feature pipeline challenge: real-time pipelines face
three hard problems.
Fast feature computation, fresh yet efficient caching, and consistent
logic across Node servers, edge workers, and browsers.
In feature computation, we process raw data into useful features at scale
with as little data garbage as possible, while maintaining
low latency and high throughput.
Within caching, we want to balance freshness with performance.
We want data that is as close to real time as possible, but at the same time,
with intelligent storage and retrieval mechanisms, we need to make
sure that performance stays intact.
If we miss any one of these, we get stale experiences, training-serving skew, and
jittery performance, which eventually results in cross-environment inconsistency.
Now, coming to why we need JavaScript for feature pipelines:
JavaScript is universal.
The same modules run on Node, at the edge, and in the browser. It has a
rich ecosystem for streams, caches, and inference, and its async/await model
maps cleanly to event-driven workloads.
When it comes to native async support, JavaScript has
built-in async and await patterns and an event-driven architecture.
That means it's readily suited for real-time ingestion of data
streams, and its non-blocking I/O operations are
essential for low-latency pipelines.
Coming to the ecosystem, JavaScript leverages mature libraries for data
processing, streaming, and caching. From Apache Kafka clients to TensorFlow.js,
the JavaScript ecosystem provides production-ready
tools for every pipeline stage.
As I said previously, when it comes to streaming
data with Kafka, the event flow looks like this: it starts with
ingestion, then transform, then publish. With Kafka in Node.js,
we consume user events, compute rolling counts or session stats, and publish
features to Redis or downstream topics.
We keep transforms idempotent and windowed, so retries
and backpressure don't corrupt results.
Here's an example, which you can see on the slide, of consuming user events
and producing computed features for downstream topics.
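Below is a minimal sketch of that ingest, transform, publish flow, assuming the kafkajs client; the broker address, topic names, and five-minute window are illustrative placeholders rather than anything prescribed in the talk.

```javascript
// Sketch only: assumes the `kafkajs` package; topics and window size are illustrative.
const { Kafka } = require('kafkajs');

const kafka = new Kafka({ clientId: 'feature-pipeline', brokers: ['localhost:9092'] });
const consumer = kafka.consumer({ groupId: 'feature-computation' });
const producer = kafka.producer();

// Rolling per-user event counts over a 5-minute window, kept in memory.
const WINDOW_MS = 5 * 60 * 1000;
const eventsByUser = new Map();

function rollingCount(userId, now) {
  const events = (eventsByUser.get(userId) || []).filter((t) => now - t < WINDOW_MS);
  events.push(now);
  eventsByUser.set(userId, events);
  return events.length;
}

async function run() {
  await consumer.connect();
  await producer.connect();
  await consumer.subscribe({ topic: 'user-events', fromBeginning: false });

  await consumer.run({
    eachMessage: async ({ message }) => {
      const event = JSON.parse(message.value.toString());
      const now = Date.now();

      // Transform: compute a simple windowed feature per user.
      const feature = {
        userId: event.userId,
        eventsLast5Min: rollingCount(event.userId, now),
        computedAt: now,
      };

      // Publish: write the computed feature to a downstream topic.
      await producer.send({
        topic: 'user-features',
        messages: [{ key: String(event.userId), value: JSON.stringify(feature) }],
      });
    },
  });
}

run().catch(console.error);
```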
Using Kafka, we build a real-time feature serving architecture,
which essentially contains four building blocks.
One is a feature computation service in Node, where Node.js
microservices process the incoming data streams and compute features
using the corresponding business logic and statistical
transformations, which we build as data engineers or data scientists
based on the needs of the company.
Two is a Redis feature store with sensible TTLs,
where the computed features are cached in Redis
with configurable time-to-live values, enabling sub-millisecond
retrieval for online inference.
Three is about the importance of WebSockets, which
push updates instead of polling.
And four is edge or browser inference,
so the UI reacts instantly, because that's the end goal:
an application that interacts with the user in real time.
Browser applications use cached features with TensorFlow.js,
ONNX.js, or similar libraries to run ML models locally,
eliminating server round trips.
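A sketch of building blocks two and three, assuming the `redis` (node-redis v4) and `ws` packages; the key naming scheme and the 60-second TTL are illustrative choices, not values from the talk.

```javascript
// Sketch only: caches computed features in Redis with a TTL and pushes
// updates to connected clients over WebSockets instead of polling.
const { createClient } = require('redis');
const { WebSocketServer } = require('ws');

const redis = createClient({ url: 'redis://localhost:6379' });
const wss = new WebSocketServer({ port: 8080 });

const FEATURE_TTL_SECONDS = 60; // illustrative TTL

// Called by the feature computation service whenever a feature is recomputed.
async function storeAndBroadcastFeature(userId, feature) {
  // Cache in Redis with a configurable TTL for fast online reads.
  await redis.set(`features:${userId}`, JSON.stringify(feature), {
    EX: FEATURE_TTL_SECONDS,
  });

  // Push the fresh feature to every connected client instead of waiting for a poll.
  const payload = JSON.stringify({ type: 'feature-update', userId, feature });
  for (const socket of wss.clients) {
    if (socket.readyState === socket.OPEN) socket.send(payload);
  }
}

// Online read path used at inference time.
async function getFeature(userId) {
  const cached = await redis.get(`features:${userId}`);
  return cached ? JSON.parse(cached) : null;
}

redis.connect().catch(console.error);
module.exports = { storeAndBroadcastFeature, getFeature };
```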
When it comes to implementing efficient caching strategies
with Redis or a similar pub/sub server, we use multilayer caching
by access pattern: browser memory for on-page hot features,
Redis for shared online reads, and an edge cache
for global reach. We balance freshness and cost with TTLs plus pub/sub or
WebSockets to refresh just in time.
As you can see in the diagram, we have a browser memory cache, and between
the CDN and the browser we have a distributed cache, which is
replicated across regions and is readily available to users based on
their needs, their activity, and the traffic flowing to and from
the server for that website, because some websites or
applications do have peak and off-peak times.
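A browser-side sketch of that multilayer lookup: an in-memory cache for on-page hot features, falling back to a feature API that would sit in front of the edge cache and Redis. The `/api/features/...` endpoint and the 30-second freshness window are assumptions for illustration.

```javascript
// Sketch only: layer 1 is browser memory; the fetch falls through to the
// assumed feature API backed by the edge cache and Redis.
const memoryCache = new Map();
const LOCAL_FRESHNESS_MS = 30 * 1000;

async function getFeatures(userId) {
  const entry = memoryCache.get(userId);

  // Layer 1: browser memory, fastest and scoped to the page.
  if (entry && Date.now() - entry.fetchedAt < LOCAL_FRESHNESS_MS) {
    return entry.features;
  }

  // Layers 2 and 3: edge cache and Redis sit behind this request.
  const response = await fetch(`/api/features/${encodeURIComponent(userId)}`);
  if (!response.ok) {
    // Serve a stale local copy on failure rather than breaking the UI.
    return entry ? entry.features : null;
  }

  const features = await response.json();
  memoryCache.set(userId, { features, fetchedAt: Date.now() });
  return features;
}
```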
When it comes to client-side ML inference with JavaScript,
today's browsers and edge runtimes can run real-time ML.
For example, OpenAI recently released a browser called Atlas, built
on top of ChatGPT, that runs AI models locally within the system.
So we want to load a lightweight model, read the cached features,
call predict, and render.
What this gives us is reduced latency, better
privacy protection, and lower serving costs.
Here we have an example of using TensorFlow.js, which runs a model
locally on the computer.
It performs feature engineering, extracts the features,
and makes a prediction based on the user's activity or usage patterns.
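A minimal sketch of that client-side path: load a lightweight TensorFlow.js model, turn cached features into a tensor, call predict, and render. The model URL, the feature names, the target element, and the `getFeatures` helper from the earlier caching sketch are all assumptions.

```javascript
// Sketch only: model path, feature names, and getFeatures() are illustrative.
import * as tf from '@tensorflow/tfjs';

let model;

async function predictForUser(userId) {
  // Load the lightweight model once and reuse it.
  if (!model) {
    model = await tf.loadLayersModel('/models/engagement/model.json');
  }

  // Read cached features (browser memory or the feature API).
  const features = await getFeatures(userId);
  if (!features) return null;

  // Simple client-side feature engineering: order the inputs the model expects.
  const input = tf.tensor2d([[features.eventsLast5Min, features.sessionLengthSec]]);
  const output = model.predict(input);
  const [score] = await output.data();

  // Free the memory held by the tensors.
  input.dispose();
  output.dispose();

  return score;
}

// Example: render the prediction into the page.
predictForUser('user-123').then((score) => {
  if (score !== null) {
    document.querySelector('#recommendation-score').textContent = score.toFixed(3);
  }
});
```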
When it comes to ensuring server-client logic consistency, one
of the biggest challenges in feature pipelines is maintaining identical
behavior across different environments.
We want to eliminate inconsistent feature computation, which
leads to training-serving skew and degraded model performance,
even though we are already deploying a lightweight model onto the system.
So we want to keep the data as pristine and as clean as possible so that we
avoid unpredictable user experiences.
So we ship isomorphic modules for the feature logic,
we test with the same fixtures in Node and the browser, we lock schemas with
TypeScript types or JSON Schema, and we monitor distributions to catch drift early.
These are the four steps we perform to ensure the server-client
logic behaves consistently
across different environments.
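A sketch of what such an isomorphic module could look like: one pure function imported by both the Node computation service and the browser, with a small guard standing in for the schema lock. The field names and the 30-minute session gap are illustrative; in practice the schema would live in TypeScript types or JSON Schema as mentioned above.

```javascript
// Sketch only: field names and session gap are illustrative assumptions.
const SESSION_GAP_MS = 30 * 60 * 1000;

function assertEvent(event) {
  // Stand-in for a locked schema (TypeScript types or JSON Schema in practice).
  if (typeof event.userId !== 'string' || typeof event.timestamp !== 'number') {
    throw new TypeError('event must have a string userId and numeric timestamp');
  }
}

// Pure, environment-free feature logic: identical results on server and client,
// so the same test fixtures can be run in Node and in the browser.
export function sessionCount(events) {
  events.forEach(assertEvent);
  const sorted = [...events].sort((a, b) => a.timestamp - b.timestamp);

  let sessions = 0;
  let lastTimestamp = -Infinity;
  for (const event of sorted) {
    if (event.timestamp - lastTimestamp > SESSION_GAP_MS) sessions += 1;
    lastTimestamp = event.timestamp;
  }
  return sessions;
}
```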
And once we do this, for performance tuning we want to make sure that
we monitor the performance.
For that, we track important metrics:
throughput, error rates, resource usage, feature freshness, and
end-to-end latency from ingestion to UI.
We use these signals to tune window sizes, cache TTLs, model
size, and fan-out strategies.
Some of the key metrics to track here are the latency
distributions, which are very important:
P50, P95, and P99 latency
for feature computation, retrieval, and inference operations.
We watch these latencies along with other attributes like features per second,
cache hit ratio, WebSocket delivery success, and skew between training and serving.
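As a small worked example of those percentiles, here is how P50, P95, and P99 could be computed from recorded latency samples; in production this would normally come from a metrics library or observability stack, so the helper and sample values below are purely illustrative.

```javascript
// Sketch only: nearest-rank percentile over recorded samples.
function percentile(samples, p) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const index = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.max(0, index)];
}

// Example: latencies (in milliseconds) recorded around feature retrieval calls.
const retrievalLatenciesMs = [1.2, 0.8, 3.4, 0.9, 15.0, 1.1, 2.2, 0.7, 1.3, 40.5];

console.log('P50:', percentile(retrievalLatenciesMs, 50));
console.log('P95:', percentile(retrievalLatenciesMs, 95));
console.log('P99:', percentile(retrievalLatenciesMs, 99));
```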
These metrics help us understand where to optimize first.
In some cases we might need to increase resource utilization during
peak times, as I said, or scale down during off-peak times to
save costs, or in some cases we might need to tweak the feature
engineering so that we serve fresh features rather than stale ones.
When it comes to practical application patterns, we generally think about
three common wins. First, recommendations that update with every interaction,
which are heavily used in recommendation engines, whether for a streaming
app or an e-commerce application,
where we build personalized product or content recommendations
based on user interactions, item similarity computations, and
different A/B testing strategies.
Second, personalized UI and predictive prefetching.
That gives us dynamic content filtering and contextual UI
adaptations, and based on the local model's prediction of the
user's next step, it prefetches the data
the user might need in the future
and adapts notification timing.
Third, live analytics with client-side aggregations, which
are streamed over WebSockets for us to monitor in the backend.
Some of the architectural best practices to achieve these goals
are: decouple compute from serving so you can
scale and version them independently;
design for failure with fallback values and graceful TTLs; and version
features like APIs so you can roll forward or roll back safely.
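A sketch of two of those practices together: versioned feature keys so reads can roll forward or back safely, and fallback defaults so a cache miss or Redis error degrades gracefully. The key scheme, version string, and default values are illustrative; `redis` here stands for a connected node-redis client like the one in the earlier feature-store sketch.

```javascript
// Sketch only: key scheme, version, and defaults are illustrative assumptions.
const FEATURE_VERSION = 'v2';
const DEFAULT_FEATURES = { eventsLast5Min: 0, sessionLengthSec: 0 };

async function readFeatures(redis, userId, version = FEATURE_VERSION) {
  try {
    const raw = await redis.get(`features:${version}:${userId}`);
    if (!raw) return { ...DEFAULT_FEATURES, fallback: true };
    return { ...JSON.parse(raw), fallback: false };
  } catch (err) {
    // Design for failure: never let a cache outage take down serving.
    console.error('feature read failed, serving defaults', err);
    return { ...DEFAULT_FEATURES, fallback: true };
  }
}
```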
To achieve this, when it comes to the implementation roadmap, we start simple.
We ship offline features with a cache, and then slowly start adding
real-time computation with Kafka.
Then we introduce client-side inference
with small models, which can run locally and
learn user patterns.
And then we scale based on observed metrics, which is completely data
driven and data backed, not based on assumptions.
Coming to some key takeaways for building these
end-to-end pipelines with JavaScript.
JavaScript is basically one language across the stack, and that reduces friction:
it can be used on the UI side, on the client side, and on the server side,
so it works across the entire stack, which reduces friction between
engineers, keeps the code readable, and makes maintenance easier.
Streaming plus caching delivers fresh, low-latency features.
Client-side ML is production ready and privacy friendly.
Together, these enable hyper-personalized experiences at scale.
Thank you so much.
I'm Chen Shekar Shari Kuri, and I'm happy to connect and discuss more.
Thank you.