EDW Optimization with Hadoop and CDAP
Sagar Kapare, Cask
The cost of maintaining a traditional Enterprise Data Warehouse (EDW) is skyrocketing as legacy systems buckle under the weight of exponentially growing data and increasingly complex processing needs. Hadoop, with its massive horizontal scalability, and CDAP, which offers pre-built pipelines for EDW offload in a drag-and-drop studio environment, can help.
Sagar demonstrates Cask’s solution for building code-free, scalable, and enterprise-grade pipelines that deliver an easy-to-use and efficient EDW offload. He also shows how interactive data preparation, data pipeline automation, and fast querying over voluminous data can help unlock new use cases.
Future-proof, portable batch and streaming pipelines using Apache Beam
Malo Denielou, Google
Apache Beam is a top-level Apache project that aims to provide a unified API for efficient and portable data processing pipelines. Beam handles both batch and streaming use cases and neatly separates properties of the data from runtime characteristics, allowing pipelines to be portable across multiple runtimes, both open source (e.g., Apache Flink, Apache Spark, Apache Apex) and proprietary (e.g., Google Cloud Dataflow). This talk covers the basics of Apache Beam, describes the main concepts of the programming model, and discusses the current state of the project (new Python support, first stable version). We also illustrate the concepts with a use case running on several runners.
Turning a data pond into a data lake with Apache NiFi
Gene Peters, Telligent Data
In recent years, there has been a drive for organizations to consolidate their analytic data — both internal and external — into a central source of truth: the data lake. But how do you actually go about populating this lake in a scalable, low-latency fashion? Enter Apache NiFi. From piping third-party vendor data accessed through RESTful APIs into Apache Kafka clusters, to syncing on-premises HDFS with a cloud-based object store, NiFi provides the glue to bring together the many varied components of a big data ecosystem. At Telligent Data, we use Apache NiFi as the backbone of the software and services we provide. This talk covers how to take advantage of NiFi’s real-time streaming capabilities to replicate siloed data sources into a unified data lake.