Transcript
This transcript was autogenerated. To make changes, submit a PR.
Hello everyone.
Good morning.
Good afternoon.
I'm joining you from India.
Today I'm excited to present Rust-powered data engineering: building performance-critical systems for global impact. We live in a world drowning in data, with 175 zettabytes processed annually.
How do we build systems that are fast, secure, and ready for global challenges?
That's what we will explore.
Imagine processing petabytes of data without crashes or breaches.
Traditional tools like Java or Python often fall short: garbage collection causes unpredictability, and memory leaks risk corruption.
We need to rethink for safety and scale.
Rust offers the performance of a low-level language with high-level safety.
So why Rust?
Let's break it down.
First, memory safety. Rust's ownership model tracks who owns data, preventing bugs like null pointer dereferences at compile time, with no runtime overhead.
Zero-cost abstractions let you write clean code without slowing down petabyte-scale processing.
Fearless concurrency: the compiler catches race conditions, so no more debugging deadlocks. And there's predictability too.
No garbage collection means consistent performance, like C++ but safe. In my experience, I've seen teams spend weeks on memory issues in C++, and Rust eliminates that.
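To make the zero-cost abstractions point concrete, here is a minimal sketch, not code from the slides, of an iterator chain that compiles down to a plain loop:

```rust
// Iterator adapters are zero-cost: this compiles to a tight loop with
// no intermediate allocations, despite the high-level style.
fn total_bytes(record_sizes: &[u64]) -> u64 {
    record_sizes
        .iter()
        .filter(|&&size| size > 0) // skip empty records
        .sum()                     // folds in place, no temporary Vec
}

fn main() {
    let sizes = vec![512, 0, 1024, 2048];
    println!("total: {} bytes", total_bytes(&sizes));
}
```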
These aren't hypotheticals.
Rust delivers 10x the throughput of JVM tools in streaming, 99.99% uptime for massive workloads, 80% lower latency at the edge, and 60% savings with zero-copy in production.
This means handling billions of records without hiccups.
Think about your own pipeline.
Could you use an 80% latency cut?
Rust isn't alone.
Tokio powers async I/O for high throughput.
Apache Arrow enables fast columnar data.
Great for analytics.
DataFusion is like a Rust-based Spark for queries.
The ecosystem is booming, making Rust practical today. But does this work in the real world?
Let's take a look at the case studies.
The first case study is on a climate monitoring system.
Terabytes of satellite data are stored in remote spots, needing real-time analysis on limited hardware.
The Rust solution was to develop efficient pipelines that run flawlessly on the edge, which gave us 65% faster processing, 40% cheaper infrastructure, and reach into remote areas using this edge technology.
This powers global climate impact. Think monitoring deforestation in real time.
For environmental engineers here, this means deploying AI where it's needed the most.
In public health systems, national-scale data for epidemic tracking leaves no room for error. Rust ensures memory safety and privacy at compile time.
Queries hit eight milliseconds on billions of records, with zero breaches and near-perfect uptime during a crisis. This even saves lives by enabling instant insights.
These wins come from Rust's core features, so let's deep dive into those.
Rust's ownership is key. It prevents corruption by enforcing single ownership or borrowing, and it catches buffer overflows, all at compile time. This makes data flows clear and maintainable, perfect for complex pipelines. It's like a strict librarian: you can borrow a book, but obviously you can't keep it forever or lend it out twice.
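As an illustration of that librarian analogy, here is a minimal sketch of what the compiler enforces; it isn't code from the slides:

```rust
// Each value has exactly one owner; moves hand it over, borrows lend it out.
fn consume(batch: Vec<u8>) -> usize {
    batch.len() // takes ownership; `batch` is dropped when this returns
}

fn inspect(batch: &[u8]) -> usize {
    batch.len() // only borrows; the caller keeps ownership
}

fn main() {
    let batch = vec![1u8, 2, 3];
    println!("{}", inspect(&batch)); // lending the book is fine
    println!("{}", consume(batch));  // the book is handed over for good
    // println!("{}", batch.len());  // compile error: use after move --
    //                               // you can't lend out what you gave away
}
```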
Here is async processing with Tokio, which reads lines with zero copy. It processes without copying data at the source, parses the streams efficiently, and once it has parsed the streamed data, it can transform the data in parallel without locks, and in the next step serialize safely to the data sink. This avoids overhead in high-volume data. The loop also reads without blocking and clears buffers for reuse.
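The slide's code isn't captured in the transcript, but a minimal sketch of that pattern with Tokio, using a made-up file name and a placeholder parse step, might look like this:

```rust
// Cargo.toml: tokio = { version = "1", features = ["full"] }
use tokio::fs::File;
use tokio::io::{AsyncBufReadExt, BufReader};

#[tokio::main]
async fn main() -> std::io::Result<()> {
    // Open the source without blocking the executor.
    let file = File::open("events.log").await?;
    let reader = BufReader::new(file);
    let mut lines = reader.lines();

    // Read line by line; the internal buffer is reused between reads.
    while let Some(line) = lines.next_line().await? {
        // Placeholder "parse": inspect the record instead of copying it around.
        let len = line.len();
        if len > 0 {
            println!("parsed record of {len} bytes");
        }
    }
    Ok(())
}
```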
The code creates Arrow arrays and record batches. With zero copy, we share data without duplication, and the columnar data is used for fast queries. It also integrates with Python, which is ideal for mixed-language teams. And if you're in analytics, Arrow's columnar format speeds up aggregations massively.
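Again as a minimal sketch rather than the slide's exact code, building an Arrow record batch in Rust, with made-up column names, could look like this:

```rust
// Cargo.toml: arrow = "<current version>"
use std::sync::Arc;

use arrow::array::{Float64Array, StringArray};
use arrow::datatypes::{DataType, Field, Schema};
use arrow::error::ArrowError;
use arrow::record_batch::RecordBatch;

fn main() -> Result<(), ArrowError> {
    // Columnar layout: each column is one contiguous array.
    let schema = Arc::new(Schema::new(vec![
        Field::new("station", DataType::Utf8, false),
        Field::new("temperature", DataType::Float64, false),
    ]));

    let stations = StringArray::from(vec!["alpha", "bravo", "charlie"]);
    let temps = Float64Array::from(vec![21.5, 19.0, 23.7]);

    // A RecordBatch holds reference-counted column buffers, so passing it
    // around (or handing it to Python via pyarrow) shares data without copies.
    let batch = RecordBatch::try_new(schema, vec![Arc::new(stations), Arc::new(temps)])?;
    println!("{} rows x {} columns", batch.num_rows(), batch.num_columns());
    Ok(())
}
```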
Building ethical data systems with Rust, we also need to take a look at efficient query execution. This is where SQL is turned into optimized execution: you can register your CSV, for example, run something like AVG(temperature), average temperature with filters, and it can provide you zero-copy results. This itself is an example of DataFusion in action, vectorized for speed.
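A minimal DataFusion sketch of that idea, with an assumed CSV path, table name, and temperature column, might be:

```rust
// Cargo.toml: datafusion = "<current version>", tokio = { version = "1", features = ["full"] }
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let ctx = SessionContext::new();

    // Register a CSV file as a table (path and schema inference are assumptions).
    ctx.register_csv("readings", "readings.csv", CsvReadOptions::new()).await?;

    // SQL is planned and executed as a vectorized query over Arrow batches.
    let df = ctx
        .sql("SELECT station, AVG(temperature) AS avg_temp \
              FROM readings WHERE temperature > 0 GROUP BY station")
        .await?;

    df.show().await?; // results stay in Arrow's columnar format
    Ok(())
}
```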
Rust isn't just fast, it is ethical. Ownership enforces privacy boundaries, and it allows efficient bias detection in ML without slowdowns. It is also sustainable, because it takes less energy to process this data, which in turn cuts carbon. Rust lets us build responsibly at scale.
This slide talks about lock-free concurrency with atomics, and the code shows a safe counter.
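The counter code itself isn't in the transcript; a minimal sketch of a lock-free counter with atomics could look like this:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    // Shared counter, updated lock-free from many threads.
    let processed = Arc::new(AtomicU64::new(0));

    let handles: Vec<_> = (0..8)
        .map(|_| {
            let processed = Arc::clone(&processed);
            thread::spawn(move || {
                for _ in 0..1_000 {
                    // No mutex, no locks: one atomic instruction per update.
                    processed.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    println!("records processed: {}", processed.load(Ordering::Relaxed));
}
```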
Streaming with Tokio and Arrow provides higher throughput. It also optimizes for energy savings.
So start small.
Try a Rust pipeline in your next project.
These are just plug-and-play experiments to see this in action.
Now, trends: the future trends of Rust data engineering.
Obviously, Rust is cloud native, ML is integrated, and there are specialized tools, which give us performance and safety at lower cost, fewer vulnerabilities, and global scalability. This is why Rust is the future of data engineering.
In summary, Rust transforms data engineering for impact: fast, safe, and ethical programming.
Thank you.