Conf42 JavaScript 2025 - Online

- premiere 5PM GMT

Building Real-Time Feature Pipelines: JavaScript's Role in Modern Data-Driven Applications


Abstract

Unlock JavaScript’s AI superpowers! Build intelligent web apps that adapt in real time. Learn to create scalable feature pipelines, integrate ML inference, and deliver personalized experiences. From Node.js to browser AI, transform how users interact with your applications.


Transcript

This transcript was autogenerated. To make changes, submit a PR.
Welcome to today's talk about building real-time feature pipelines and JavaScript's role in modern data-driven applications. In the next few minutes, I'll walk through how JavaScript can power the entire feature lifecycle: ingesting events, computing features, caching them for sub-millisecond reads, and running ML inference at the edge and in the browser.

Coming to the evolution of web applications, we've moved from batch-driven, server-rendered pages to dynamic, personalized applications. Modern users expect real-time updates and context-aware UIs, and that shift requires features computed now, not overnight, and delivered with predictable latency. In traditional architectures, as you can see, we had server-rendered pages, batch data processing, and feature updates that were consistently delayed. In modern data-driven applications, users expect real-time feature computation, instant model inference, and hyper-personalized content modeled on their needs, their activity, and their preferences.

That brings us to the feature pipeline challenge. Real-time pipelines face three hard problems: fast feature computation, fresh yet efficient caching, and consistent logic across Node servers, edge workers, and browsers. Feature computation means processing raw data into useful features at scale, with as little garbage data as possible, while maintaining low latency and high throughput. With caching, we want to balance freshness with performance: data as close to real time as possible, but with intelligent storage and retrieval mechanisms that keep performance intact. If we miss any one of these, we get stale experiences, training-serving skew, and jittery performance, usually because cross-environment consistency has broken down.

Now, why JavaScript for feature pipelines? JavaScript is universal: the same modules run on Node, at the edge, and in the browser, and it has a rich ecosystem for streams, caches, and inference, while async/await maps cleanly to event-driven workloads. On native async support, JavaScript has built-in async/await patterns and an event-driven architecture, which means it is readily suited to real-time ingestion and real-time data streams, and its non-blocking I/O is essential for low-latency pipelines. On the ecosystem side, JavaScript leverages mature libraries for data processing, streaming, and caching; from Apache Kafka clients to TensorFlow.js, the ecosystem provides production-ready tools for every pipeline stage.

When it comes to streaming data with Kafka, the event flow looks like this: ingest, then transform, then publish. With Kafka in Node.js, we consume user events, compute rolling counts or session stats, and publish features to Redis or downstream topics. We keep transforms idempotent and windowed, so retries and back pressure don't corrupt results. The slide shows an example of consuming user events and producing derived features for downstream topics; a minimal sketch of that pattern follows below.
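As a rough reconstruction of the kind of slide example described here, and not the speaker's exact code, this is a minimal sketch using the kafkajs and ioredis libraries. The topic names, Redis keys, window size, and rolling-count logic are illustrative assumptions.

```javascript
// Sketch: consume user events from Kafka, compute a rolling per-user event
// count, cache it in Redis with a TTL, and publish the derived feature to a
// downstream topic. Names and window sizes are assumptions for illustration.
const { Kafka } = require('kafkajs');
const Redis = require('ioredis');

const kafka = new Kafka({ clientId: 'feature-pipeline', brokers: ['localhost:9092'] });
const consumer = kafka.consumer({ groupId: 'feature-computation' });
const producer = kafka.producer();
const redis = new Redis(); // defaults to localhost:6379

const WINDOW_SECONDS = 300; // 5-minute rolling window (assumed)

async function run() {
  await Promise.all([consumer.connect(), producer.connect()]);
  await consumer.subscribe({ topic: 'user-events', fromBeginning: false });

  await consumer.run({
    eachMessage: async ({ message }) => {
      const event = JSON.parse(message.value.toString());
      const key = `features:user:${event.userId}:event_count_5m`;

      // Windowed count: increment and refresh the TTL so the feature expires
      // if the user goes quiet. A production transform would also deduplicate
      // by event id so retries stay idempotent, as mentioned in the talk.
      const count = await redis.incr(key);
      await redis.expire(key, WINDOW_SECONDS);

      // Publish the computed feature for downstream consumers.
      await producer.send({
        topic: 'user-features',
        messages: [{
          key: String(event.userId),
          value: JSON.stringify({ userId: event.userId, eventCount5m: count, ts: Date.now() }),
        }],
      });
    },
  });
}

run().catch((err) => {
  console.error('feature pipeline failed', err);
  process.exit(1);
});
```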
Using Kafka this way, we build a real-time feature-serving architecture with four building blocks. One is a feature computation service in Node, where Node.js microservices process the incoming data streams and compute features using the corresponding business logic and statistical transformations, which we build as data engineers or data scientists based on the needs of the company. Two is a Redis feature store with sensible TTLs: the computed features are cached in Redis with configurable time-to-live values, enabling sub-millisecond retrieval for online inference. Three is WebSockets, which push updates instead of polling. And four is edge or browser inference, so the UI reacts instantly, because that's the end goal: the point where the application interacts with the user. Browser applications use the cached features with TensorFlow.js, ONNX Runtime Web, or similar libraries to run ML models locally, eliminating server round trips.

When it comes to implementing efficient caching strategies with Redis or a similar pub/sub-capable store, we use multilayer caching by access pattern: browser memory for on-page hot features, Redis for shared online reads, and an edge cache for global reach. We balance freshness and cost with TTLs plus pub/sub or WebSockets to refresh just in time. As you can see in the diagram, we have a browser memory cache, and between the CDN and the browser there is a distributed cache spread across regions, readily available to users based on their needs, their activity, and the traffic flowing to and through the servers for that application, because some applications do have peak and off-peak times.

When it comes to client-side ML inference with JavaScript, browsers and edge runtimes can run real-time ML today. Recently, for example, OpenAI released a browser called Atlas that is built around ChatGPT and runs AI models locally on the system. The pattern is: load a lightweight model, read the cached features, call predict, and render. This reduces latency, protects privacy, and lowers serving costs. The slide shows an example using TensorFlow.js that runs a model locally on the user's machine, performs the feature engineering, extracts the features, and makes a prediction based on the user's activity or usage patterns.

When it comes to ensuring server-client logic consistency, one of the biggest challenges in feature pipelines is maintaining identical behavior across different environments. Inconsistent feature computation leads to training-serving skew, and model performance degrades; since we are already deploying a lightweight model onto the device, we want to keep the data as pristine and clean as possible so that we avoid unpredictable user experiences. So we ship isomorphic modules for the feature logic, we test with the same fixtures in Node and in the browser, we lock schemas with TypeScript types or JSON Schema, and we monitor distributions to catch drift early. Those are the four steps we perform to ensure the server and client logic behave consistently across environments; a small sketch of the shared-module-plus-browser-inference pattern follows below.
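To make both the TensorFlow.js slide example and the isomorphic-module point concrete, here is a minimal, hypothetical sketch: a shared feature-extraction module that can be imported by both the Node service and the browser, and a browser-side TensorFlow.js call that reads cached features pushed over a WebSocket and runs predict locally. The model path, feature names, and WebSocket endpoint are assumptions, not the speaker's actual code.

```javascript
// features.js — shared (isomorphic) feature logic: the same module is imported
// by the Node computation service and by the browser, so both environments
// compute features identically. Feature names here are illustrative.
export function extractFeatures(userActivity) {
  return [
    userActivity.eventCount5m || 0,     // rolling count from the pipeline
    userActivity.sessionLengthSec || 0, // current session length
    userActivity.itemsViewed || 0,      // items viewed this session
  ];
}
```

```javascript
// client.js — browser-side inference sketch using TensorFlow.js.
// Assumes a small model exported to /models/engagement/model.json and a
// WebSocket endpoint that pushes refreshed features; both are hypothetical.
import * as tf from '@tensorflow/tfjs';
import { extractFeatures } from './features.js';

async function main() {
  const model = await tf.loadLayersModel('/models/engagement/model.json');
  const socket = new WebSocket('wss://example.com/features');

  socket.onmessage = async (msg) => {
    const cached = JSON.parse(msg.data); // features pushed, not polled

    // Run inference locally: no server round trip for the prediction itself.
    const input = tf.tensor2d([extractFeatures(cached)]);
    const output = model.predict(input);
    const scores = await output.data();

    render(scores[0]);   // e.g. reorder recommendations in the UI
    input.dispose();     // free memory held by the tensors
    output.dispose();
  };
}

function render(score) {
  console.log('engagement score', score);
}

main();
```

Because the same extractFeatures function runs in training pipelines, on the server, and in the browser, the feature vectors stay identical across environments, which is exactly what keeps training-serving skew in check.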
Once those pieces are in place, we move on to performance tuning and monitoring. For that, we track a set of important metrics: throughput, error rates, resource usage, feature freshness, and end-to-end latency from ingestion to UI, and we use these signals to tune window sizes, cache TTLs, model size, and fan-out strategies. Some of the key metrics to track are the latency distributions, which are very important: P50, P95, and P99 latency for feature computation, retrieval, and inference operations. We watch these latencies along with other attributes like features per second, cache hit ratio, WebSocket delivery success, and skew between training and serving. These metrics help us understand where to optimize first. In some cases we might need to increase resource utilization, as I said, during peak times; in other cases we need to scale down during off-peak times to save costs; and in some cases we might need to rework the feature engineering so that we serve fresh features rather than stale ones.

When it comes to practical application patterns, we generally think about three common wins. First, recommendations that update with every interaction, which are heavily used in recommendation engines, be it a streaming app or an e-commerce application, where we build personalized product or content recommendations based on user interactions, item-similarity computations, and different A/B testing strategies. Second, personalized UI and predictive prefetching: dynamic content filtering and contextual UI adaptations, where, based on the local model's prediction of the user's next step, the application prefetches the data the user is likely to need and adapts notification timing. Third, live analytics with client-side aggregations, which are streamed over WebSockets so we can monitor them in the backend.

Some of the architectural best practices to achieve these goals: decouple compute from serving so you can scale each independently, design for failure with fallback values and graceful TTL expiry, and version features like APIs so you can roll forward or roll back safely.

When it comes to the implementation roadmap, we start simple: we ship offline features with a cache, then slowly start adding real-time computation with Kafka, then introduce client-side inference with small models that can run locally and learn user patterns, and then we scale based on observed metrics, which is completely data-driven and data-backed, not based on assumptions.

Coming to the key takeaways for building these end-to-end pipelines with JavaScript: JavaScript is one language across the stack, and that reduces friction because it can be used on the UI side, the client side, and the server side; using it across the entire stack reduces friction between engineers, keeps the code readable, and makes maintenance easier. Streaming plus caching delivers fresh, low-latency features. Client-side ML is production-ready and privacy-friendly. Together, these enable hyper-personalized experiences at scale. Thank you so much. I'm Chandra Sekhar Chilkuri, and I'm happy to connect and discuss more. Thank you.
...

Venkata Chandra Sekhar Sastry Chilkuri

Senior Data Engineer @ Apple



