Fine grained root cause and impact analysis with CDAP Lineage
Yuki Jung Google
Lineage is a critical aspect of data governance in large enterprises, and provides traceability for data as it flows through a data system. It can unlock various use cases such as root cause analysis (discover the cause of a bad data event) and impact analysis (gauge the impact of a change before making the change). In this talk, the speaker will demonstrate how CDAP’s granular data lineage capabilities can solve these use cases for enterprises.
Accelerating workloads and bursting remote data with Google Dataproc using Alluxio
Dipti Borkar & Roderick Yao Alluxio
Google Cloud Dataproc is a popular managed on-demand service to run Spark, Presto and many other compute workloads. Alluxio, an open source data orchestration technology, helps speed up Dataproc workloads by providing a distributed caching layer within the Dataproc Cluster. In addition, Alluxio enables “Zero-copy” bursting allowing users to run compute workloads even on data that’s remote on-prem or another cloud. In this session, Dipti from Alluxio and Roderick from Google Cloud will share an overview of Alluxio and Google Dataproc and the benefits the two together bring. It will include a demo of initializing a Dataproc cluster with Alluxio to run workloads on remote data.