Fine grained root cause and impact analysis with CDAP Lineage
Yuki Jung Google
Lineage is a critical aspect of data governance in large enterprises, and provides traceability for data as it flows through a data system. It can unlock various use cases such as root cause analysis (discover the cause of a bad data event) and impact analysis (gauge the impact of a change before making the change). In this talk, the speaker will demonstrate how CDAP’s granular data lineage capabilities can solve these use cases for enterprises.
Accelerating EMR Spark with Alluxio on S3
Dipti Borkar Alluxio
Apache Spark and Alluxio are cousin open-source projects that originated from UC Berkeley’s AMPLab. Running Spark with Alluxio is a popular stack particularly for hybrid environments. In this session, Dipti will briefly introduce Alluxio, share the top 10 tips for performance tuning for real-world workloads, and demo Alluxio with Spark.
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh
Zhamak Dehghani ThoughtWorks
Many enterprises are investing in their next generation data lake, with the hope of democratizing data at scale to provide business insights and ultimately make automated intelligent decisions. Data platforms based on the data lake architecture have common failure modes that lead to unfulfilled promises at scale. To address these failure modes we need to shift from the centralized paradigm of a lake, or its predecessor data warehouse. We need to shift to a paradigm that draws from modern distributed architecture: considering domains as the first class concern, applying platform thinking to create self-serve data infrastructure, and treating data as a product.