Conf42 Rustlang 2025 - Online

- premiere 5PM GMT

Rust-Powered Data Engineering: Building Performance-Critical Systems for Global Impact

Abstract

Discover how Rust is revolutionizing data engineering with 10x performance gains and zero-cost safety! Learn to build petabyte-scale pipelines that process global climate data in real-time while using 60% less compute. See live demos of memory-safe systems serving billions of users daily.


Transcript

This transcript was autogenerated. To make changes, submit a PR.
Hello everyone, good morning, good afternoon. I'm joining you from India. Today I'm excited to present Rust-Powered Data Engineering: Building Performance-Critical Systems for Global Impact. In a world drowning in data, with some 175 zettabytes processed annually, how do we build systems that are fast, secure, and ready for global challenges? That's what we will explore.

Imagine processing petabytes of data without crashes or breaches. Traditional tools like Java or Python often fall short: garbage collection causes unpredictable pauses, and memory leaks risk corruption. We need to rethink for safety and scale. Rust offers the performance of a low-level language with high-level safety.

So why Rust? Let's break it down. First, memory safety: Rust's ownership model tracks who owns data, preventing bugs like null-pointer dereferences at compile time, with no runtime overhead. Zero-cost abstractions let you write clean code without slowing down petabyte-scale processing. Fearless concurrency: the compiler catches race conditions, so no more debugging deadlocks. And predictability: no garbage collector means consistent performance, like C++. In my experience, I've seen teams spend weeks on memory issues in C++, and Rust eliminates that.

These aren't hypotheticals. Rust delivers 10x the throughput of JVM tools in streaming, 99.99% uptime for massive workloads, 80% lower latency at the edge, and 60% cost savings with zero-copy in production. This means handling billions of records without hiccups. Think about your own pipeline: could you use an 80% latency cut?

Rust isn't alone. Tokio powers async I/O for high throughput. Apache Arrow enables fast columnar data, great for analytics. DataFusion is like a Rust-based Spark for queries. The ecosystem is booming, making Rust practical today. But does this work in the real world? Let's take a look at the case studies. The first case study is a climate monitoring system.
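The ownership rules described above can be seen in a minimal sketch. Everything here is illustrative, not from the talk: `Record`, `consume`, and `inspect` are hypothetical names for a pipeline record and two functions that take it by move and by shared borrow.

```rust
// Minimal sketch of Rust's ownership and borrowing rules.
// `Record` is a hypothetical pipeline record type for illustration.
#[derive(Debug)]
struct Record {
    sensor_id: u32,
    reading: f64,
}

// Takes ownership: the caller can no longer use the record afterwards.
fn consume(rec: Record) -> f64 {
    rec.reading
}

// Borrows immutably: the caller keeps ownership.
fn inspect(rec: &Record) -> u32 {
    rec.sensor_id
}

fn main() {
    let rec = Record { sensor_id: 7, reading: 21.5 };

    // Any number of shared borrows is fine.
    let id = inspect(&rec);

    // Moving `rec` into `consume` ends its lifetime here.
    let reading = consume(rec);

    // Uncommenting the next line fails at compile time ("use of moved value"),
    // which is exactly the null/use-after-free class of bug caught for free:
    // println!("{:?}", rec);
    println!("sensor {} read {}", id, reading);
}
```

The safety check happens entirely in the compiler, so there is no runtime cost, which is the "zero-cost" part of the claim.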
Terabytes of satellite data are stored in remote locations, needing real-time analysis on limited hardware. The Rust solution was to develop efficient pipelines that run flawlessly on the edge, which delivered 65% faster processing, 40% cheaper infrastructure, and reach into remote areas. This powers global climate impact: think monitoring deforestation in real time. For environmental engineers here, this means deploying AI where it's needed most.

The second case study is a public health system: national-scale data for epidemic tracking, with no room for error. Rust ensures memory safety and privacy at compile time, with queries hitting 8 milliseconds on billions of records, zero breaches, and near-perfect uptime during a crisis. It even saves lives by enabling instant insights.

These wins come from Rust's core features, so let's deep-dive on those. Rust's ownership is key. It prevents corruption by enforcing single ownership or borrowing, catching buffer overflows at compile time. This makes data flows clear and maintainable, perfect for complex pipelines. It's like a strict librarian: you can borrow a book, but you can't keep it forever or lend it out twice.

Next is async processing with Tokio, which reads lines with zero copy. At the source it parses the stream efficiently, then it can transform the data in parallel without locks, and in the next step serialize safely to the data sink. This avoids overhead in high-volume data. The loop reads without blocking and reuses its buffers.

The code creates Arrow arrays and batches with zero copy, so we share data without duplication. Columnar data is used for fast queries, and it also integrates with Python, which is ideal for mixed-language teams. And if you're in analytics, Arrow's columnar format is a great fit.
Arrow's columnar format speeds up aggregation massively. For building ethical data systems with Rust, we also need to look at efficient query execution. DataFusion optimizes SQL execution: you can register your CSV, run a query like average temperature with filters, and get zero-copy results. This is DataFusion in action, vectorized for speed.

Rust isn't just fast, it is ethical. Ownership enforces privacy boundaries, and it enables efficient bias detection in ML without slowdowns. It is also sustainable, because processing this data takes less energy, which in turn cuts carbon. Rust lets us build responsibly at scale.

This slide covers lock-free concurrency with atomics, and the code shows a safe counter, where streaming with Tokio and Arrow provides high throughput. It also optimizes for energy savings. So start small: try a Rust pipeline in your next project. These are plug-and-play experiments, so try them to see this in action.

On the future trends of Rust data engineering: Rust is cloud-native, ML is integrated, and specialized tooling gives us performance and safety at lower cost, with fewer vulnerabilities and global scalability. This is why Rust is the future of data engineering. In summary, Rust transforms data engineering for impact: fast, safe, and ethical programming. Thank you.
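The lock-free counter mentioned on the slide can be sketched with the standard library's atomics alone; the thread count and per-thread workload below are illustrative numbers, not from the talk.

```rust
// Lock-free record counter shared across threads, using only std.
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let processed = Arc::new(AtomicU64::new(0));
    let mut handles = Vec::new();

    // Four worker threads each "process" 1000 records.
    for _ in 0..4 {
        let counter = Arc::clone(&processed);
        handles.push(thread::spawn(move || {
            for _ in 0..1000 {
                // No mutex, no blocking: fetch_add is a single atomic operation.
                counter.fetch_add(1, Ordering::Relaxed);
            }
        }));
    }

    for h in handles {
        h.join().unwrap();
    }

    // All 4000 increments are visible once the threads are joined.
    assert_eq!(processed.load(Ordering::Relaxed), 4000);
    println!("processed {} records", processed.load(Ordering::Relaxed));
}
```

`Ordering::Relaxed` is enough here because only the final total matters; a counter that also guards other shared data would need stronger ordering.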

Ritesh Kumar Sinha

Associate Vice President - Relationship Management @ Kotak Mahindra Bank



