Realtime Cube Updates with Kylin/Kafka Integration
Seshu Adunuthula eBay
Apache Kylin is an open source Distributed Analytics Engine contributed by eBay that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, with support for extremely large data sets.
Kylin has traditionally supported end-of-day processing of cubes, resulting in multi-hour cube build times depending on the number of rows added and …
In this talk we will introduce the concept of “Cube Segments”: the ability to build cubes on micro-batches of data consumed from Kafka topics.
We will also present an internal use case in which an SEO attribution report that previously had a 24+ hour processing window is now available within minutes.
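The abstract does not describe how segments are stored, so the following is only a toy in-memory model, not Kylin's API or storage format. Each micro-batch (standing in for records consumed from a Kafka topic) is pre-aggregated into its own segment covering a time or offset range, contiguous segments can be merged, and a query sums the matching cuboid across segments. The names `Segment`, `build_segment`, `merge_segments`, and `query` are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Segment:
    """Pre-aggregated cube data for one micro-batch, covering [start, end)."""
    start: int
    end: int
    # Cuboid key: tuple of dimension names (in cube order) -> measure sums per value tuple.
    cuboids: dict


def build_segment(rows, dims, measure, start, end):
    """Aggregate one micro-batch into every cuboid (all subsets of dims)."""
    cuboids = {}
    for k in range(len(dims) + 1):
        for combo in combinations(dims, k):
            agg = Counter()
            for row in rows:
                agg[tuple(row[d] for d in combo)] += row[measure]
            cuboids[combo] = agg
    return Segment(start, end, cuboids)


def merge_segments(a, b):
    """Combine two contiguous segments into one covering both ranges."""
    if a.end != b.start:
        raise ValueError("segments must be contiguous")
    cuboids = {}
    for combo, agg in a.cuboids.items():
        merged = Counter(agg)
        merged.update(b.cuboids[combo])  # adds sums; keeps zero-valued keys
        cuboids[combo] = merged
    return Segment(a.start, b.end, cuboids)


def query(segments, group_by):
    """Answer a GROUP BY by summing the matching cuboid across segments.

    group_by must list dimension names in the same order as the cube's dims.
    """
    total = Counter()
    for seg in segments:
        total.update(seg.cuboids[group_by])
    return total


if __name__ == "__main__":
    dims = ("site", "channel")
    # Two micro-batches, e.g. as read from a Kafka topic over two intervals.
    batch1 = [{"site": "US", "channel": "seo", "clicks": 3},
              {"site": "UK", "channel": "seo", "clicks": 2}]
    batch2 = [{"site": "US", "channel": "seo", "clicks": 4},
              {"site": "US", "channel": "paid", "clicks": 1}]
    seg1 = build_segment(batch1, dims, "clicks", start=0, end=100)
    seg2 = build_segment(batch2, dims, "clicks", start=100, end=200)
    print(query([seg1, seg2], ("site",)))  # Counter({('US',): 8, ('UK',): 2})
```

In this model, each new batch only adds a small segment, so fresh data becomes queryable after one small build rather than a full cube rebuild; periodically merging contiguous segments keeps the number of segments a query has to scan bounded.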
High Volume Streaming Analytics with CDAP
Jia-long Wu Lotame
In this talk, we’ll present the design of our new data stream processing application at Lotame and describe how the new platform significantly reduces cluster resource utilization while enabling faster updates of client audience data and better ad-hoc query support.
We will examine the challenges of counting uniques in a high-volume stream processing environment and present a novel approach using time-windowed HyperLogLog aggregates. We’ll also discuss how CDAP enabled us to roll out this new platform quickly, and share valuable lessons and best practices we learned during the development cycle.
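The abstract does not describe Lotame’s implementation, so the following is only a minimal sketch, assuming hourly windows and a textbook HyperLogLog with SHA-1 hashing and a linear-counting correction for small cardinalities; the class names `HyperLogLog` and `WindowedUniques` are hypothetical. It shows the property that makes windowed HyperLogLog aggregates useful: per-window sketches can be merged register by register to count distinct users over any range of windows, whereas summing per-window unique counts would double-count users seen in several windows.

```python
import hashlib
import math
from collections import defaultdict


class HyperLogLog:
    """Textbook HyperLogLog with 2**p registers (illustrative, not production-grade)."""

    def __init__(self, p: int = 12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m
        # Bias-correction constant alpha_m, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # 64-bit hash taken from the first 8 bytes of a SHA-1 digest.
        h = int.from_bytes(hashlib.sha1(item.encode("utf-8")).digest()[:8], "big")
        idx = h >> (64 - self.p)               # top p bits select a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other: "HyperLogLog") -> "HyperLogLog":
        # Union of two sketches = register-wise maximum.
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different precision")
        out = HyperLogLog(self.p)
        out.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return out

    def count(self) -> float:
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate


class WindowedUniques:
    """One HLL sketch per fixed time window (hypothetical helper)."""

    def __init__(self, window_seconds: int = 3600, p: int = 12):
        self.window_seconds = window_seconds
        self.p = p
        self.windows = defaultdict(lambda: HyperLogLog(self.p))

    def add(self, timestamp: int, user_id: str) -> None:
        window_start = timestamp - timestamp % self.window_seconds
        self.windows[window_start].add(user_id)

    def uniques(self, start: int, end: int) -> float:
        # Merge every window whose start lies in [start, end).
        merged = HyperLogLog(self.p)
        for window_start, sketch in self.windows.items():
            if start <= window_start < end:
                merged = merged.merge(sketch)
        return merged.count()


if __name__ == "__main__":
    agg = WindowedUniques(window_seconds=3600)
    for hour in range(24):
        # Each hour sees 1,000 users; consecutive hours overlap heavily.
        for i in range(hour * 100, hour * 100 + 1000):
            agg.add(hour * 3600 + 5, f"user-{i}")
    print(round(agg.uniques(0, 3600)))       # true value: 1000
    print(round(agg.uniques(0, 24 * 3600)))  # true value: 3300
```

Here each user appears in up to ten hourly windows, so adding the 24 hourly counts would give about 24,000, while the merged sketch estimates close to the true 3,300 using a fixed 4,096 registers per window.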
Introducing Athena, a Stream Processing Platform
Yuanchi Ning Uber
Athena is a stream processing platform for Uber’s near-real-time analytics applications, built using Samza. We will discuss some of the existing and upcoming use cases and how they impact Uber partners and riders. The talk will cover the tooling built around Samza to ease user onboarding, such as a deployment manager, integration with the Typesafe Config system, a unit test framework, Graphite integration, metric whitelisting, and more. We’ll also go over some of the issues observed during this process.