Building an ECA Rules Engine for IoT with CDAP
Bhooshan Mogal, Cask
Event-condition-action (ECA) rules, where actions are triggered by events under specific conditions, are the basis of many IoT use cases. In this talk, Bhooshan will explain the fundamentals of an Apache Hadoop-based ECA framework. The framework allows continuous ingestion of any kind of data, e.g. from device sensors. It contains a dynamic, distributable rules engine that can apply rules to incoming data in real time. Users can then take pluggable actions when these rules match. Bhooshan will demonstrate Cask’s solution, which leverages Spark Streaming as the real-time engine and offers REST APIs for easily building custom applications. His demo will also show how to send events into the system, create rules easily and code-free, execute rules, and send notifications.
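To make the ECA pattern concrete, here is a minimal sketch of a rules engine with pluggable actions. The class names, rule format, and sensor fields are illustrative assumptions for this sketch, not CDAP's actual API:

```python
# Minimal sketch of an event-condition-action (ECA) rules engine.
# Names and the event/rule shapes are illustrative, not CDAP's API.
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]   # evaluated against each incoming event
    action: Callable[[dict], Any]       # pluggable action run when condition holds

@dataclass
class RulesEngine:
    rules: list = field(default_factory=list)

    def add_rule(self, rule: Rule) -> None:
        self.rules.append(rule)

    def process(self, event: dict) -> list:
        """Apply every rule to an incoming event; return the names of rules that fired."""
        fired = []
        for rule in self.rules:
            if rule.condition(event):
                rule.action(event)
                fired.append(rule.name)
        return fired

# Example: send a notification when a temperature reading exceeds a threshold.
notifications = []
engine = RulesEngine()
engine.add_rule(Rule(
    name="high-temp",
    condition=lambda e: e.get("sensor") == "temp" and e.get("value", 0) > 75,
    action=lambda e: notifications.append(f"ALERT: temp={e['value']}"),
))
```

In the real system this loop would run inside Spark Streaming over a continuous event stream, with rules defined code-free via the REST APIs rather than as inline lambdas.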
Demonstrating the Benefits of Hyper-Acceleration for both Batch and Streaming Spark Processing
Roop Ganguly, BigStream
Today, full-stack developers, BI/analytics teams, and IT ops are discovering that a growing number of Apache Spark workloads require maximum compute capacity and performance. At this year’s Spark Summit, Spark creator Matei Zaharia cited a “compute bottleneck” for Spark applications and argued that acceleration strategies in hardware (FPGAs/GPUs) and software must become “first class resources” in any Spark project. In this presentation, we will present two case studies that demonstrate viable acceleration strategies for Spark applications.
The first demonstrates performance increases for the well-known TPC-DS decision support benchmark using software-based acceleration on Amazon EMR. Software acceleration is able to provide a 2-4x speedup for these benchmarks without a single line of code change. The second case presents a configurable FPGA-based acceleration strategy applied to an online adtech ETL application. This approach generates speedups of nearly 7x, while simultaneously solving an unbounded-delay issue for the unaccelerated code.
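For clarity on how figures like "2-4x" and "nearly 7x" are derived, speedup is conventionally the unaccelerated runtime divided by the accelerated runtime. The timings below are hypothetical, not the benchmark's measured results:

```python
# Speedup = baseline (unaccelerated) runtime / accelerated runtime.
def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Return the acceleration factor for a workload."""
    if accelerated_seconds <= 0:
        raise ValueError("accelerated runtime must be positive")
    return baseline_seconds / accelerated_seconds

# Hypothetical TPC-DS query timings in seconds (illustrative only):
factor = speedup(120.0, 40.0)  # a 3x speedup, within the reported 2-4x range
```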
Improving Application and Cluster Performance for Big Data Stacks
Kunal Agarwal, Unravel Data
Long gone are the days of the piecemeal approach to Big Data… at least we’d like them to be. Logs are hard to interpret, and they don’t provide a comprehensive record of what happens on the Big Data stack. Each application on the stack may create a hundred tasks, and hence a hundred data streams. Logs are also historical by nature and difficult to use to pinpoint problems in real time, while infrastructure monitoring provides only low-level visibility: it can give you a CPU utilization graph or a network I/O graph, but won’t tell you much about applications. What’s missing is an intelligent 360-degree view into the entire Big Data stack. In this session, we will explore new ways to monitor and optimize your Big Data app performance, resource utilization, and data management in the age of DataOps. The session will include use cases from companies that made improvements using these techniques, including Box and Autodesk.
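One way to picture the gap between low-level metrics and an application-level view is to roll per-task samples up by application, as a monitoring layer might. The record shapes and field names below are illustrative assumptions, not Unravel's data model:

```python
# Sketch: collapse per-task metric samples into one summary per application,
# the kind of correlation a "360-degree" monitoring view performs.
# Field names (app_id, cpu_seconds, bytes_read) are illustrative assumptions.
from collections import defaultdict

def app_level_view(task_metrics: list) -> dict:
    """Aggregate raw per-task samples into one summary row per application."""
    summary = defaultdict(lambda: {"tasks": 0, "cpu_seconds": 0.0, "bytes_read": 0})
    for sample in task_metrics:
        app = summary[sample["app_id"]]
        app["tasks"] += 1
        app["cpu_seconds"] += sample["cpu_seconds"]
        app["bytes_read"] += sample["bytes_read"]
    return dict(summary)

# A hundred task streams collapse into one row per application:
samples = [
    {"app_id": "etl-job", "cpu_seconds": 12.5, "bytes_read": 1_000_000},
    {"app_id": "etl-job", "cpu_seconds": 7.5, "bytes_read": 500_000},
    {"app_id": "bi-query", "cpu_seconds": 3.0, "bytes_read": 250_000},
]
view = app_level_view(samples)
```

A CPU graph alone shows the cluster is busy; the aggregated view above is what lets you say *which application* is responsible.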