Conf42: Cloud Native 2024

Enriching the data vs Filtering in Spark

Spark is known for its in-memory computation. But in-memory computation, particularly inner joins on large datasets, makes it hard to trace back how data was filtered out at each stage. This talk highlights lessons learned in production and how we pivoted towards one approach over the other.
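
To make the contrast in the title concrete, here is a minimal sketch in plain Python, not Spark code and not the speaker's implementation. It assumes that "filtering" means dropping unmatched rows, as an inner join does, and that "enriching" means keeping every row and annotating it with the match result, as a left join plus a flag column would. The data, the field names (`txn_id`, `account_id`, `is_eligible`), and the helper functions are all hypothetical.

```python
# Hypothetical sample data; names and fields are illustrative only.
transactions = [
    {"txn_id": 1, "account_id": "A"},
    {"txn_id": 2, "account_id": "B"},
    {"txn_id": 3, "account_id": "C"},
]
eligible_accounts = {"A", "C"}  # stand-in for the right-hand side of the join


def filter_by_join(rows, eligible):
    """Inner-join style: rows without a match disappear from the output."""
    return [row for row in rows if row["account_id"] in eligible]


def enrich_by_join(rows, eligible):
    """Left-join style: every row is kept and annotated with the match result."""
    return [{**row, "is_eligible": row["account_id"] in eligible} for row in rows]


filtered = filter_by_join(transactions, eligible_accounts)
enriched = enrich_by_join(transactions, eligible_accounts)

# The enriched output still contains the rejected row, so it can be traced.
dropped = [row["txn_id"] for row in enriched if not row["is_eligible"]]
print(len(filtered), len(enriched), dropped)
```

In this sketch the filtered list keeps only the two matching rows, while the enriched list keeps all three and records why a row would be excluded, which is the property that makes later backtracing possible.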

Gokul Prabagaren
