Making sense of Apache Bigtop, ODPi and
why it all matters to Apache Apex
Roman Shaposhnik, rvs@apache.org,
@rhatr
Director of Open Source Strategy,
Pivotal Inc.
A slide deck built via the “Apache Way”
• Bigtop community contributors
• Roman Shaposhnik
• Konstantin Boudnik
• Nate D'Amico
• Evans Ye & Darren Chen (Trend Micro)
What is Apache Bigtop?
• Apache Bigtop is to Hadoop what Debian is to Linux
• A 100% open, community-driven distribution of a big data
management platform based on Apache Hadoop
• A place where all communities around big data come
together
• The thing everybody (Pivotal, Cloudera, Hortonworks,
WANDisco, IBM, Amazon, TrendMicro) is building off of
• A cutting edge, quickly evolving distribution and a set
of tools
[Diagram: GNU Software on the Linux kernel; Hadoop Ecosystem (Pig, Hive, Spark) on Hadoop (HDFS + YARN + MR)]
ODPi is a nonprofit organization committed to simplification &
standardization of the big data ecosystem with a common reference
specification called ODPi Core.
As a shared industry effort, ODPi is focused on promoting and advancing the state of Apache Hadoop®
and Big Data Technologies for the Enterprise.
[Timeline: February 2015, September 2015, December 2015]
What has ODPi done so far (1.0.1)?
• Runtime specification
• https://github.com/odpi/specs/blob/master/ODPi-Runtime.md
• Validation testsuite
• http://repo.odpi.org/ODPi/1.0/acceptance-tests/
• Reference implementation binaries
• http://repo.odpi.org/ODPi/1.0/{centos6, ubuntu-14.04}
What are we working on?
• Operations specification
• https://github.com/odpi/specs/blob/master/ODPi-Operations.md
• ISV “ODPi compatible” policy
• Expanding ODPi core beyond Apache Hadoop & Ambari
• Hive
• ????
• How can you help?
• Share use cases
• Test against reference implementation
• Contribute to upstream ASF projects
What’s in Bigtop?
• A set of binary packages
• just like CDH/PHD/HDP/ODPi/etc.
• Integration code
• Packaging code
• Deployment code
• Orchestration code
• Validation code
• Continuous Integration infrastructure
Integration/packaging
• Linux packages
• RPM, DEB
• RHEL/CentOS(Fedora), SLES(OpenSUSE), Debian, Ubuntu
• VirtualBox, VMWare, etc. VM images
• Challenge: Linux packaging is node-centric
• “smart” tarballs
• Docker or BOSH images
Integration testing based on iTest
• Clean-room provisioning
• these ain’t your gramp’s unit tests
• Versioned test artifacts
• JVM-based test artifacts
• Matching stacks of components and integration tests
• Plug’n’play architecture: Gradle/Groovy, JARs/artifacts
Puppet 3.x deployment
• Master-less puppet
• $ puppet apply bigtop-deploy/puppet/manifests/site.pp # on each node
• Cluster topology is kept in Hiera
bigtop::hadoop_head_node: "hadoopmaster.example.com"
hadoop::hadoop_storage_dirs:
  - "/mnt"
hadoop_cluster_node::cluster_components:
  - yarn
  - zookeeper
bigtop::bigtop_repo_uri:
"http://bigtop-
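A topology file like the one above can also be generated rather than written by hand. The following is a minimal sketch, not part of Bigtop: `render_hiera` is a hypothetical helper that emits the head node, storage directories, and component list under the same Hiera keys shown on the slide, using only the Python standard library.

```python
def render_hiera(head_node, storage_dirs, components):
    """Render a Hiera-style YAML document (hypothetical helper, not a Bigtop API)."""
    lines = [f'bigtop::hadoop_head_node: "{head_node}"']
    lines.append("hadoop::hadoop_storage_dirs:")
    # Storage directories are quoted strings, one list item per directory.
    lines.extend(f'  - "{d}"' for d in storage_dirs)
    lines.append("hadoop_cluster_node::cluster_components:")
    # Component names are plain YAML scalars.
    lines.extend(f"  - {c}" for c in components)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        render_hiera("hadoopmaster.example.com", ["/mnt"], ["yarn", "zookeeper"]),
        end="",
    )
```

The output could be written to the Hiera data file on each node before running `puppet apply`.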
One click Bigtop provisioning
Who is this for?
• For Hadoop app developers, cluster admins, users
• Run a Hadoop cluster to test your code on
• Try & test configurations before applying to Production
• Play around with Bigtop Big Data Stack
• For contributors
• Easy to test your packaging, deployment, testing code
• For vendors
• CI out of the box -> patching upstream code made easier
Works great, but…
• Need to add the Vagrant public key to Docker images
• Too many issues with the auto-created boot2docker
host VM
• A bug in the Docker provider has been open for almost
2 years
• 'Waiting for machine to boot' hangs indefinitely
• Cannot share the same code across different providers
anyway
• Not all Docker options are supported in the Vagrantfile
• Does not support Docker Swarm
Docker Compose
Implementation
• Create docker containers:
• docker-compose scale bigtop=3
• Volumes:
• Bigtop Puppet configurations
• Bigtop Puppet code
• /etc/hosts
• Compatible with Docker Machine and Swarm
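Sharing /etc/hosts as a volume presumably lets the scaled containers resolve one another by name. A minimal sketch of how such a file might be assembled is shown here; the `render_hosts` helper, the IP addresses, the `bigtopN` names, and the `example.com` domain are all assumptions for illustration and are not taken from the Bigtop provisioner.

```python
def render_hosts(nodes, domain="example.com"):
    """Build /etc/hosts content from (ip, short_name) pairs.

    `nodes` and `domain` are illustrative inputs, not values used by Bigtop.
    """
    lines = ["127.0.0.1 localhost"]
    for ip, name in nodes:
        # Fully qualified name first, short alias second.
        lines.append(f"{ip} {name}.{domain} {name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Three containers, matching `docker-compose scale bigtop=3`.
    nodes = [(f"172.17.0.{i + 2}", f"bigtop{i + 1}") for i in range(3)]
    print(render_hosts(nodes), end="")
```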
Docker Machine and Swarm
Juju orchestration
$ juju bootstrap
$ juju deploy hadoop-processing
https://jujucharms.com/hadoop-processing/
Juju orchestration
$ juju add-unit slave -n 2
Juju orchestration
$ juju action do namenode/0 smoke-test
$ juju action do resourcemanager/0 smoke-test
$ watch -n 0.5 juju action status
Early Mission Accomplished
Foundation for commercial Hadoop distros/services
Leveraged by app providers…
Blueprints for data engineering
• BigPetStore
• Data Generator
• Examples using tools in Hadoop ecosystem to process
data
• Build system and tests for integrating tools and multiple
JVM languages
• Started by Dr. Jay Vyas, principal software engineer at
Red Hat, Inc.
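To give a feel for what a data generator of this kind does, here is a minimal sketch of a seeded synthetic purchase generator. It is not BigPetStore's actual model: the field names, the product catalogue, and the `generate_transactions` function are assumptions made for illustration.

```python
import random

# Illustrative catalogue; not BigPetStore's actual product list.
PRODUCTS = ["dog food", "cat litter", "fish flakes", "bird seed"]


def generate_transactions(n, seed=0, customers=100, stores=10):
    """Return n pseudo-random purchase records; the same seed gives the same data."""
    rng = random.Random(seed)  # private RNG so results are reproducible
    return [
        {
            "customer_id": rng.randrange(customers),
            "store_id": rng.randrange(stores),
            "product": rng.choice(PRODUCTS),
            "quantity": rng.randint(1, 5),
        }
        for _ in range(n)
    ]


if __name__ == "__main__":
    for tx in generate_transactions(3, seed=42):
        print(tx)
```

Seeding the generator keeps test data reproducible, so downstream jobs can be checked against the same input on every run.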
Datamodel
Transaction Purchase Model
Lambda/Stream Architectures
HDFS + Zookeeper +
New focus and target end users
Data engineers vs distro builders
Enhance Operations/Deployment
Reference implementations & tutorials
Data data data…
Smarter/Realistic test data
-bigpetstore
-bigtop-bazaar
-weather data gen
Tutorial/Learning Data sets
-githubarchive.org
-more tbd…
Thank You, Q&A

 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 

Último (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex

  • 1. Making sense of Apache Bigtop, ODPi and why it all matters to Apache Apex Roman Shaposhnik, rvs@apache.org, @rhatr Director of Open Source Strategy, Pivotal Inc.
  • 2. A slide deck built via “Apache Way” • Bigtop community contributors • Roman Shaposhnik • Konstantin Boudnik • Nate D'Amico • Evans Ye & Darren Chen (Trend Micro)
  • 3. What is Apache Bigtop? • Apache Bigtop is to Hadoop what Debian is to Linux • A 100% open, community-driven distribution of a big data management platform based on Apache Hadoop • A place where all communities around big data come together • The thing everybody (Pivotal, Cloudera, Hortonworks, WANDisco, IBM, Amazon, TrendMicro) is building off of • A cutting edge, quickly evolving distribution and a set of tools
  • 5. Hadoop Ecosystem (Pig, Hive, Spark) Linux kernel Hadoop (HDFS + YARN + MR)
  • 6. ODPi is a nonprofit organization committed to simplification & standardization of the big data ecosystem with a common reference specification called ODPi Core. As a shared industry effort, ODPi is focused on promoting and advancing the state of Apache Hadoop® and Big Data Technologies for the Enterprise.
  • 7. February 2015, September 2015, December 2015
  • 8.
  • 9. What has ODPi done so far (1.0.1)? • Runtime specification • https://github.com/odpi/specs/blob/master/ODPi-Runtime.md • Validation test suite • http://repo.odpi.org/ODPi/1.0/acceptance-tests/ • Reference implementation binaries • http://repo.odpi.org/ODPi/1.0/{centos6, ubuntu-14.04}
  • 10. What are we working on? • Operations specification • https://github.com/odpi/specs/blob/master/ODPi-Operations.md • ISV “ODPi compatible” policy • Expanding ODPi core beyond Apache Hadoop & Ambari • Hive • ???? • How can you help? • Share use cases • Test against reference implementation • Contribute to upstream ASF projects
  • 11. What’s in Bigtop? • A set of binary packages • just like CDH/PHD/HDP/ODPi/etc. • Integration code • Packaging code • Deployment code • Orchestration code • Validation code • Continuous Integration infrastructure
  • 12. Integration/packaging • Linux packages • RPM, DEB • RHEL/CentOS(Fedora), SLES(OpenSUSE), Debian, Ubuntu • VirtualBox, VMware, etc. VM images • Challenge: Linux packaging is node-centric • “smart” tarballs • Docker or BOSH images
  • 13. Integration testing based on iTest • Clean-room provisioning • these ain’t your gramp’s unit tests • Versioned test artifacts • JVM-based test artifacts • Matching stacks of components and integration tests • Plug’n’play architecture: Gradle/Groovy, JARs/artifacts
  • 14. Puppet 3.x deployment • Master-less puppet • $ puppet apply bigtop-deploy/puppet/manifests/site.pp # on each node • Cluster topology is kept in Hiera bigtop::hadoop_head_node: "hadoopmaster.example.com" hadoop::hadoop_storage_dirs: - "/mnt" hadoop_cluster_node::cluster_components: - yarn - zookeeper bigtop::bigtop_repo_uri: "http://bigtop-
  • 15. One click Bigtop provisioning
  • 16. Who is this for? • For Hadoop app developers, cluster admins, users • Run a Hadoop cluster to test your code on • Try & test configurations before applying to Production • Play around with Bigtop Big Data Stack • For contributors • Easy to test your packaging, deployment, testing code • For vendors • CI out of the box, which makes patching upstream code easier
  • 17. Works great, but… • Need to add vagrant public key into docker images • Too many issues with auto-created boot2docker hosting VM • A bug in the docker provider has stayed open for almost 2 years • 'Waiting for machine to boot' hangs indefinitely • Cannot share the same code for different providers anyway • Not all the docker options are supported in Vagrantfile • Does not support Docker Swarm
  • 19. Implementation • Create docker containers: • docker-compose scale bigtop=3 • Volumes: • Bigtop Puppet configurations • Bigtop Puppet code • /etc/hosts • Compatible with Docker Machine and Swarm
  • 21. Juju orchestration $ juju bootstrap $ juju deploy hadoop-processing
  • 23. Juju orchestration $ juju add-unit slave -n 2
  • 24. Juju orchestration $ juju action do namenode/0 smoke-test $ juju action do resourcemanager/0 smoke-test $ watch -n 0.5 juju action status
  • 25. Early Mission Accomplished • Foundation for commercial Hadoop distros/services • Leveraged by app providers…
  • 26.
  • 27. Blueprints for data engineering • BigPetStore • Data Generator • Examples using tools in Hadoop ecosystem to process data • Build system and tests for integrating tools and multiple JVM languages • Started by Dr. Jay Vyas, principal software engineer at Red Hat, Inc.
  • 31. New focus and target end users • Data engineers vs distro builders • Enhance Operations/Deployment • Reference implementations & tutorials
  • 32. Data data data… • Smarter/Realistic test data • bigpetstore • bigtop-bazaar • weather data gen • Tutorial/Learning Data sets • githubarchive.org • more tbd…
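
Slide 14 keeps the cluster topology in Hiera, but the transcript flattens the YAML onto one line. As a rough illustration only, the sketch below renders those same keys back into YAML text from a Python dict. The key names and most values are taken from the slide; the helper `render_hiera` and the repo URI are hypothetical (the slide's URI is truncated, so a placeholder stands in for it), and nothing here is part of Bigtop itself.

```python
# Sketch: render the Hiera keys from slide 14 as YAML text.
# Key names come from the slide; render_hiera is a hypothetical helper,
# and the repo URI is a placeholder because the slide's value is truncated.

def render_hiera(settings):
    """Render a flat dict of strings and lists of strings as simple YAML."""
    lines = []
    for key, value in settings.items():
        if isinstance(value, list):
            # A list becomes a YAML block sequence under its key.
            lines.append(f"{key}:")
            lines.extend(f"  - {item}" for item in value)
        else:
            # Scalars are emitted as double-quoted strings.
            lines.append(f'{key}: "{value}"')
    return "\n".join(lines) + "\n"


site = {
    "bigtop::hadoop_head_node": "hadoopmaster.example.com",
    "hadoop::hadoop_storage_dirs": ["/mnt"],
    "hadoop_cluster_node::cluster_components": ["yarn", "zookeeper"],
    # Placeholder, NOT the real repository location.
    "bigtop::bigtop_repo_uri": "http://repo.example.invalid/bigtop",
}

if __name__ == "__main__":
    print(render_hiera(site), end="")
```

The resulting text could be saved as the Hiera data file that the master-less `puppet apply` run on each node reads; where that file lives depends on the local Hiera setup and is not specified on the slide.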