
Fluentd & Docker, Turbocharging CDAP, and Building a Data Science Platform

October 14, 2015 6:00 PM

Fluentd and Docker Integration

John Hammink, Treasure Data

New to Docker? Application logging? As with any production application framework, logging is an essential piece. Traditionally, though, logging in complex architectures has been a mess.

Fluentd is an open source data collector that lets you unify data collection and consumption for better use and understanding of your data. Based on open source software and built by the team that went on to found Treasure Data, Fluentd acts as a common interface between log sources and log destinations, using community-developed input and output plugins around a common core. For more information, see http://www.fluentd.org.

See firsthand how Fluentd and Docker work together to simplify the logging story for your container-based apps.

This presentation includes a live demo. We’ll:

  1. Log in to a Docker instance and build a container;
  2. Configure Fluentd as the logging driver for our container;
  3. Send events through Fluentd and route them to Treasure Data;
  4. Query the Treasure Data interface to view our events;
  5. Discuss where we can go from here.

Additionally, we’ll look at ways to route our queried data to other systems, including Amazon Redshift, Postgres, S3, Riak, Tableau and more.
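
As a rough illustration of step 2 of the demo outline, the sketch below assembles a `docker run` command line that selects Docker's `fluentd` logging driver. It is not taken from the talk: the helper name `fluentd_run_command`, the `ubuntu` image, and the `docker.demo` tag are hypothetical, and it assumes a Fluentd agent listening on its usual forward port, `localhost:24224`. The `tag` log option is the name used by current Docker releases; older releases may use a different option name.

```python
import shlex


def fluentd_run_command(image, fluentd_address="localhost:24224", tag=None, extra_args=()):
    """Build a `docker run` argv that ships the container's stdout/stderr to Fluentd.

    Hypothetical helper for illustration only; it builds the command and does not run it.
    """
    argv = [
        "docker", "run",
        "--log-driver=fluentd",                             # use the Fluentd logging driver
        "--log-opt", f"fluentd-address={fluentd_address}",  # where the Fluentd agent listens
    ]
    if tag is not None:
        # Optional tag that Fluentd can use to route these events.
        argv += ["--log-opt", f"tag={tag}"]
    argv.append(image)
    argv.extend(extra_args)  # command to run inside the container
    return argv


if __name__ == "__main__":
    cmd = fluentd_run_command("ubuntu", tag="docker.demo", extra_args=["echo", "Hello Fluentd"])
    print(shlex.join(cmd))
```

Running the printed command on a Docker host should send the container's output to the Fluentd agent, which can then route it onward according to its own configuration.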

Turbocharging CDAP Applications With Ampool

Milind Bhandarkar, Ampool

In this talk, we will describe how Ampool takes advantage of CDAP’s extensibility to enable fast analytical data pipelines. CDAP provides consistent developer interfaces for both processing frameworks and storage abstractions. This allows developers to build Hadoop solutions using multiple processing paradigms, with uniform capabilities to build, test, deploy & manage across multiple environments. More importantly, CDAP provides core data abstractions that allow applications to be decoupled from storage engines such as HDFS & HBase. We will outline Ampool’s vision of the modern data architecture and how CDAP & Ampool are working together to realize it. We will also provide details about extending CDAP’s data abstractions to allow unprecedented speeds for Hadoop solutions built with CDAP.

Tips for Building a Data Science Platform

David Chaiken, Altiscale

Data scientists need to be advocates for a self-serve, Hadoop-based environment that is productive, reliable and a joy to use. This talk presents five tips to make your Big Data environment successful, and shows how best-of-breed tools like Spark fit together with the components of the Hadoop ecosystem.

These tips are valid whether the environment is built on premises, on top of infrastructure as a service, or deployed as a service. That said, the talk concludes by pointing out that buying the underlying platform as a service is the fastest path to deriving business value from big data.