October 27, 2020

How Apache Spark 3.0 and Delta Lake Enhance Data Lake Reliability

Available on-demand

Apache Spark™ has become the de facto open source standard for big data processing thanks to its ease of use and performance. The open source Delta Lake project extends Spark with new capabilities such as ACID transactions, schema enforcement and time travel.

These features help ensure that data lakes and data pipelines can deliver high-quality, reliable data to downstream data teams for successful data analytics and machine learning projects.
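To make two of those capabilities concrete, here is a toy, in-memory sketch of the ideas behind time travel and schema enforcement. It is an illustration only: Delta Lake actually implements these on top of a transaction log over Parquet files, and the `ToyDeltaTable` class here is entirely hypothetical, not a Delta Lake API.

```python
# Toy sketch: every write "commits" a new table version (enabling time
# travel reads of older versions), and writes whose columns don't match
# the declared schema are rejected (schema enforcement).
class ToyDeltaTable:
    def __init__(self, schema):
        self.schema = set(schema)   # expected column names
        self.versions = [[]]        # versions[v] = full table state at version v

    def append(self, rows):
        # Schema enforcement: reject rows whose columns don't match.
        for row in rows:
            if set(row) != self.schema:
                raise ValueError(f"schema mismatch: {sorted(row)}")
        # "Commit" atomically by appending a new version.
        self.versions.append(self.versions[-1] + rows)

    def read(self, version=None):
        # Time travel: read any past version; default is the latest.
        v = len(self.versions) - 1 if version is None else version
        return self.versions[v]

table = ToyDeltaTable(schema=["id", "event"])
table.append([{"id": 1, "event": "click"}])
table.append([{"id": 2, "event": "view"}])

print(len(table.read()))            # 2 rows at the latest version
print(len(table.read(version=1)))   # 1 row when reading version 1 ("time travel")
try:
    table.append([{"id": 3, "extra": "oops"}])  # wrong columns
except ValueError as e:
    print("rejected:", e)           # schema enforcement in action
```

The key point the sketch captures is that readers always see a complete, committed version of the table, and malformed writes never corrupt it.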

In this webinar, learn the advantages of combining Apache Spark 3.0 and Delta Lake. You’ll also get a walk-through of Apache Spark 3.0 as part of our Databricks Runtime 7.0 Beta.

Here’s what we’ll cover:

How to use Apache Spark for big data processing and how to simplify architectures with unified batch and streaming

How Delta Lake addresses the technical challenges around data lake architectures and ensures reliable data for Spark processing

How the new Adaptive Query Execution framework within Spark 3.0 yields query performance gains of up to 1.5x

How Dynamic Partition Pruning significantly speeds up performance
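To give a feel for that last item, here is a toy, plain-Python sketch of what dynamic partition pruning does: instead of scanning every partition of a large fact table, the join first evaluates the filter on the small dimension table, collects the surviving join keys, and scans only the matching fact partitions. The table names and shapes below are made up for illustration; Spark 3.0 performs this inside its query planner, not in user code.

```python
# Hypothetical fact table, partitioned by region.
fact_by_partition = {
    "us":   [{"region": "us", "sales": 10}, {"region": "us", "sales": 30}],
    "eu":   [{"region": "eu", "sales": 20}],
    "apac": [{"region": "apac", "sales": 40}],
}
# Small dimension table joined against the fact table.
dim_regions = [
    {"region": "us", "active": True},
    {"region": "eu", "active": False},
    {"region": "apac", "active": True},
]

def pruned_join(fact, dim, predicate):
    # Step 1: run the filter on the dimension side first.
    keep = {d["region"] for d in dim if predicate(d)}
    # Step 2: scan only the fact partitions that can match (the "pruning").
    scanned = [p for p in fact if p in keep]
    rows = [row for p in scanned for row in fact[p]]
    return scanned, rows

scanned, rows = pruned_join(fact_by_partition, dim_regions,
                            lambda d: d["active"])
print(scanned)                        # the "eu" partition is never read
print(sum(r["sales"] for r in rows))  # total over scanned partitions only
```

Skipping whole partitions this way is where the speedup comes from: the cost of the scan shrinks with the selectivity of the dimension-side filter.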

Get up to speed on the latest contributions from the Spark community for fast and scalable data processing. And find out how you can try them today on Databricks for free.

Watch on-demand today!

Denny Lee, Staff Developer Advocate
