Spark is known for its in-memory computation. But in-memory computation, particularly inner joins on large datasets, makes it hard to trace back how records were filtered out at each stage. This talk highlights lessons learned in production and how we pivoted towards one approach over the other.
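
To make the backtracing problem concrete, here is a minimal sketch in plain Python rather than Spark. The datasets (`orders`, `customers`) and the helper names are hypothetical, chosen only for illustration. It shows how an inner join silently drops unmatched records, and how computing the unmatched keys separately (roughly what a `left_anti` join does in Spark) recovers what was filtered out.

```python
from typing import Dict, List, Tuple


def inner_join(left: Dict[int, str], right: Dict[int, str]) -> Dict[int, Tuple[str, str]]:
    """Keep only the keys present on both sides, as an inner join does."""
    return {key: (left[key], right[key]) for key in left if key in right}


def dropped_keys(left: Dict[int, str], right: Dict[int, str]) -> List[int]:
    """Return the left-side keys that have no match on the right side."""
    return sorted(key for key in left if key not in right)


# Hypothetical toy data: order 2 has no matching customer.
orders = {1: "book", 2: "lamp", 3: "desk"}
customers = {1: "Ana", 3: "Raj"}

joined = inner_join(orders, customers)    # order 2 disappears without any error
missing = dropped_keys(orders, customers)  # the record the join filtered out

print(joined)
print(missing)
```

The joined result alone gives no hint that a record was lost; only the separate `dropped_keys` pass makes the filtering visible, which is the kind of stage-by-stage bookkeeping that becomes expensive on large datasets.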