Centralized Metadata for Multi-Cloud Data Pipelines
Rohit Sinha, Google
Enterprises are increasingly adopting multi-cloud strategies to meet their business requirements. For ETL workloads, running data pipelines in different environments provides flexibility and higher resiliency. A key challenge in this move is building a centralized control plane that coordinates pipelines across all environments and collects their metadata in one place, giving a unified view of business, technical, and operational metadata. In this talk, we will discuss use cases that centralized control and metadata capabilities can enable for multi-cloud data pipelines.
Rohit Sinha is a software engineer at Google, where he works on CDAP (cdap.io), the open source Big Data application platform. Before joining Google, he worked at Cask, where he was responsible for building software fueling the next generation of Big Data applications.
Build Apps Without Pipelines: Shortest Path from Complex Data to Live Apps
Anirudh Ramanathan, Rockset
With the vast array of datasets available today, investment management firms have a significant opportunity to use non-traditional, alternative data to enhance their research. Efficiently combining and analyzing disparate data streams, often real-time and semi-structured, to support investment decisions is critical to staying competitive in this space.
In this talk, we will use Rockset, a serverless search and analytics engine, to demonstrate how developers can easily plug in alternative data and build apps on those data sets. Specifically, we will work with the JSON data stream from Twitter’s Firehose streams API and NASDAQ’s Company Lookup directory exported in CSV format. Using Rockset, we will show how these two data sets can be loaded and then immediately queried and joined using SQL, without any upfront data preparation or complex data pipelines.
Rockset was founded by former members of Facebook’s online data team, who helped create RocksDB, Facebook’s TAO, Unicorn, and HDFS.
Anirudh Ramanathan leads Product Engineering at Rockset. He is an Apache Spark committer and a Kubernetes maintainer. Prior to this, he was on the Kubernetes team at Google where he worked on Google Kubernetes Engine, core controllers, and founded SIG Big Data, a group focused on containerized Big Data and ML workloads (Apache Airflow, Kubeflow, JupyterHub and HDFS).
Modernizing Big Data Infrastructure with Docker and Kubernetes
Ravikumar Alluboyina & Tushar Doshi, Robin Systems
Docker has become the de facto choice for enterprises, and many are going further by orchestrating clusters with Kubernetes. Unfortunately, both were designed for stateless use cases, so running data-centric workloads such as NoSQL, Big Data, or RDBMSs on them can seem like a stretch. Robin addresses this with a hyper-converged Kubernetes platform designed for stateful workloads. In this talk, we will demonstrate a use case that runs data-intensive apps as a service (NoSQL, Big Data, or even RDBMSs) with an app-store experience that is simpler, faster, and easier.
Ravikumar and Tushar are both Senior Software Engineers at Robin Systems.