Using Apache Kylin, High Volume Streaming Analytics, Introducing Athena

August 18, 2015 6:00 PM

Realtime Cube Updates with Kylin/Kafka Integration

Seshu Adunuthula eBay

Apache Kylin is an open source distributed analytics engine, contributed by eBay, that provides a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, with support for extremely large data sets.

Kylin has traditionally supported end-of-day cube processing, which results in multi-hour cube build times depending on the number of rows added and the dimension cardinality.

In this talk we will introduce the concept of “Cube Segments”: the ability to build cubes on micro-batches of data consumed from Kafka topics.

We will also present an internal use case in which an SEO Attribution report that previously had a 24+ hour processing window is now available within minutes.

High Volume Streaming Analytics with CDAP

Jia-long Wu Lotame

In this talk, we’ll present the design of our new data stream processing application at Lotame and describe how we achieved a significant reduction in cluster resource utilization while allowing faster updates of client audience data and better ad-hoc query support with the new platform.

We will examine the challenges of counting uniques in a high-volume stream processing environment, and present a novel approach using time-windowed HyperLogLog aggregates. We’ll also discuss how CDAP enabled us to roll out this new platform quickly, and share some valuable lessons and best practices we learned during the development cycle.
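For readers unfamiliar with the idea, the following is a minimal, generic sketch of time-windowed HyperLogLog aggregation. It is not Lotame's implementation, which the abstract does not describe. The class names `HyperLogLog` and `WindowedUniques`, the 60-second window, and the register count are all assumptions chosen for illustration. The sketch keeps one HyperLogLog per time bucket and answers a range query by merging the buckets that the range covers.

```python
import hashlib
import math
from collections import defaultdict


class HyperLogLog:
    """Minimal HyperLogLog with 2**p registers (illustrative, not tuned)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # Take a 64-bit hash; the top p bits select a register.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # The union of two sketches is the element-wise max of registers.
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)  # valid for m >= 128
        estimate = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            estimate = self.m * math.log(self.m / zeros)
        return estimate


class WindowedUniques:
    """One HyperLogLog per fixed-size time bucket (hypothetical helper)."""

    def __init__(self, window_seconds=60, p=12):
        self.window_seconds = window_seconds
        self.p = p
        self.windows = defaultdict(lambda: HyperLogLog(p))

    def record(self, timestamp, user_id):
        bucket = int(timestamp) // self.window_seconds
        self.windows[bucket].add(user_id)

    def uniques(self, start, end):
        """Estimate distinct ids in the buckets overlapping [start, end)."""
        result = HyperLogLog(self.p)
        first = int(start) // self.window_seconds
        last = (int(end) - 1) // self.window_seconds
        for bucket in range(first, last + 1):
            if bucket in self.windows:
                result = result.merge(self.windows[bucket])
        return result.count()
```

Because merging is an element-wise maximum, an id that appears in several windows is counted only once in the merged estimate, which is what makes per-window sketches composable into arbitrary time ranges.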

Introducing Athena stream processing platform

Yuanchi Ning Uber

Athena is a stream processing platform for Uber’s near-real-time analytics applications, built using Samza. We will discuss some of the existing and upcoming use cases and how they impact Uber partners and riders. The talk will go through the tooling built around Samza for easier user on-boarding, such as a deployment manager, integration with the Typesafe Config system, a unit test framework, Graphite integration, metric whitelisting and so on. We’ll also go over some of the issues observed during this process.