< Back

How to Collaborate with Data Lake Management Communities

March 11, 2021 6:00 PM

General introduction and background on Data Lake Management.

Chris Crosbie Google

General introduction and background on Data Lake Management.

Modernizing Apache Hive Metastore for the next decade

Feng Lu Google

The Apache Hive project was introduced in 2010 and has enjoyed widespread popularity in the community for a decade.
Being the de facto technical metadata standard, Apache Hive Metastore is a critical backbone for building data lake
applications. The recent need of providing ACID properties, time travel, end to end security, and streaming support
on tabular datasets renders the Hive Metastore data model inadequate. We present a number of open source initiatives
at Google that aim to uplift and rejuvenate Hive Metastore for the next decade with design rationales explained. Namely,
first-class versioned metadata support for new table formats (e.g., Iceberg, Delta), native gRPC interface for Hive Metastore
API access, and running Hive Metastore on cloud-native databases like Cloud Spanner. We will wrap up the talk with development
progress and call for community contributions.

How Delta Lake Address Data Lake Challenges

Ajay Singh Databricks

Delta Lake is an open source storage layer that brings reliability to data lakes. Delta Lake provides ACID transactions, scalable metadata handling, and unifies streaming and batch data processing. Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs. Join the session to learn about specific challenges of Data Lakes and how Delta Lake addresses them to bring the benefits of data warehousing on data lakes.