Introducing Pachyderm, and Big Data at TubeMogul

April 27, 2016 6:00 PM

Introducing Pachyderm

Joe Doliner, Pachyderm

Pachyderm is a big data analytics platform deployed with Kubernetes and Docker. Pachyderm is inspired by the Hadoop ecosystem but shares no code with it. Instead, we leverage the container ecosystem to provide the broad functionality of Hadoop with the ease of use of Docker.

In this talk, we’ll show you how you can build streaming data workflows.

There are two bold new ideas in Pachyderm:
• Containers as the core primitive for computation — which means each stage in your workflow can be written using any languages or libraries you want.
• Version Control for data — view diffs of your data and incrementally process only the new data as it streams in.
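The second idea can be illustrated with a minimal sketch. This is not Pachyderm's API or implementation; it is a toy model that assumes a "commit" is just a mapping from file name to content hash, and every name here (`snapshot`, `diff`, `process_incrementally`, `count_lines`) is hypothetical. It shows how a diff between two commits lets a stage run only on data that is new or changed.

```python
import hashlib


def snapshot(files):
    """Map each file name to a hash of its contents (a toy 'commit')."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in files.items()}


def diff(old_commit, new_commit):
    """Return the names that are new or changed relative to old_commit."""
    return sorted(
        name for name, digest in new_commit.items()
        if old_commit.get(name) != digest
    )


def process_incrementally(files, old_commit, results, stage):
    """Run `stage` only on files that differ from old_commit; reuse prior results."""
    new_commit = snapshot(files)
    changed = diff(old_commit, new_commit)
    updated = dict(results)
    for name in changed:
        updated[name] = stage(files[name])
    return new_commit, updated, changed


def count_lines(data):
    """Hypothetical workflow stage: count the lines in one file."""
    return len(data.splitlines())


# First batch: everything is new, so everything is processed.
day1 = {"a.log": b"x\ny\n"}
commit1, results1, changed1 = process_incrementally(day1, {}, {}, count_lines)

# Second batch: only the file added since commit1 is processed.
day2 = {**day1, "b.log": b"z\n"}
commit2, results2, changed2 = process_incrementally(day2, commit1, results1, count_lines)
print(changed2)
```

The sketch ignores deleted files and keeps everything in memory; a real system would also have to store commits durably and run each stage in its own container.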

These ideas lead directly to a system that’s much more powerful, flexible, and easy to use. Pachyderm is open source, so check it out on GitHub.

Leveraging Big Data at TubeMogul to convert Events --> Insights --> Actions

Murtaza Doctor & John Trenkle, TubeMogul

TubeMogul is a leader in digital advertising, delivering our clients’ creative content to desktops, mobile phones, programmatic TV and, ultimately, any device that can show engaging ads to users. Over the course of 10 years, the scale of data flowing through our RTB (Real-Time Bidding) system has increased exponentially. As this flow has increased, our data ecosystem has evolved to handle the collection and ETL of this data for billing clients and for fueling optimization, machine learning, and analytics. In this talk we’ll discuss the path we’ve followed, which has employed Hadoop, Hive, Spark, and Presto, as well as Cascading and other variations, to fulfill specific functions of our system. We’ll talk about specific use cases in our platform and end with a hint of the directions in which this trajectory is taking us.