SlideShare una empresa de Scribd logo
1 de 16
Descargar para leer sin conexión
HKG18-220: State of Big Data on
AArch64 - Apache BigTop
Ganesh Raju, Jun He & Naresh Bhat
Big Data Team, LEG
Agenda
● Why and What is Bigtop
● Bigtop patches for AArch64
● Challenges of porting
● Walk through of setup and installation process
● Demo on Provisioning and Smoke Tests
Why Apache Bigtop?
● Hadoop is a collection of many components
○ Numerous versions (Dependency hell)
○ Lots of patches
○ No stable development environment with certified
binaries
○ No proper integrated tests
● With Bigtop - Build, Deploy in cluster with puppet,
Configure, Install and Test
● Juju orchestration
● Blueprints
● Seamless integration into CI
What is in Apache Bigtop?
● Output
○ A set of binaries(deb and rpm) just like HDP, ODPi, CDH, etc
○ Docker images
○ Docker sandbox images
● Integration code, Packaging code, Deployment code, Orchestration code
● Validation code
○ Integration tests
■ Clean slate provisioning
■ Dependency integration artifacts
■ Versioned test artifacts
■ Plug and play artifacts
■ JVM-based artifacts
○ Packaging tests
○ Smoke tests
● Continuous Integration
Consumers of Bigtop
Some of consumers of Bigtop
● ODPi
● Hortonworks
● Amazon
● Canonical
● EMC
● Pivotal
● Infosys
● Capgemini
● Ebay
● Intel
● TrendMicro
● WANdisco
Bigtop v1.2
A few highlights of the v1.2 series release include:
● 6 Distros, 2 archs (x86, and ppc64le) supported
○ ARM support with v1.3
● A newly introduced Bigtop Sandbox feature
● A faster Docker Provisioner which is rewritten to fully embrace Docker
ecosystem
● OpenJDK 8 support
● Hadoop 2.7.3, Spark 2.1.1, HBase 1.1.9, and Zeppelin 0.72 are used
● And many upgrades of the ecosystem projects
(Apex, Crunch, Flume, Ignite, Mahout, Oozie, Phoenix, and many others)
Components of Bigtop
alluxio v1.0.1 Greenplum gpdb v5.0.0-alpha.0 Apache pig v0.15.0
Apache ambari v2.5.0 Apache hadoop v2.7.3 Quantcast qfs v1.1.4
Apache apex v3.5.0 Apache hama v0.7.0 Apache solr v4.10.4
groovy v2.4.10 Apache hbase v1.1.3 Apache spark 1.1 v1.6.2
Apache commons - jsvc v1.0.15 Apache hive v1.2.1 Apache spark 2.0 v2.1.1
Apache tomcat v6.0.45 Apache hue v3.11.0 Apache sqoop v1 v1.4.6
bigtop_utils v1.2.0 Apache ignite v1.9.0 Apache sqoop v2 v1.99.4
Apache crunch v0.14.0 Apache kafka v0.10.1.1 Apache tajo v0.11.1
Pig UDF datafu v1.3.0 kite v1.1.0 Apache tez v0.6.2
Apache flink v1.1.3 Apache mahout v0.12.2 ycsb v0.4.0
Apache flume v1.7.0 Apache oozie v4.3.0 Apache zeppelin v0.7.0
Apache giraph v1.1.0 Apache phoenix v4.9.0-HBase-1.1 Apache zookeeper v3.4.6
Contributions from ARM Ecosystem
● AArch64 CI nodes are running on Linaro DevCloud
○ 3 distros are supported: Debian-9, Fedora-26, Ubuntu-16.04
https://ci.bigtop.apache.org/job/Bigtop-trunk-packages/
Contributions from ARM Ecosystem
● Patches to enable components on AArch64
○ Build
■ Hadoop, Solr, Hbase, Ignite, …
○ Package
■ Hama, Solr, Oozie, Hue, …
○ Deploy
■ Service scripts, Automation scripts, Dockerfiles,…
○ Test
■ SmokeTests and Provisioner settings
Contributions from ARM Ecosystem
● Lessons learned
○ Dependency issues
■ Native binaries: protobuf, phantomjs, …
■ Jars: levedb-jni, ignite-shmem, jffi, …
■ Version mismatch: slf4j, log4j, log4j2, …
○ Repos
■ Official release did not support aarch64
● Had to create private/local repo
Challenges
● Cyclic references take a lot of effort to fix
● Though most big data companies all use Bigtop, there has not been
contributions coming in from them
● With founders moving out end of last year, and lead committers changing
focus, the project has lost momentum
Roadmap
● Make Bigtop a 1st class citizen on Kubernetes
● Test out Puppet deployment code in variety of different scenarios including
developers spinning up test clusters via Docker deployer
● Improve Docker deployer to be more developer friendly and hook it back into
Gradle
● Provide predefined sample stacks for specific use cases. For example:
○ Machine Learning: Hadoop+Spark+Zeppelin
○ Streaming: Hadoop+Kafka+Flink
● Create more tests
● Enable Ambari to install Bigtop stack. Utilize the work done for ODPi
Sandbox
● What is Sandbox?
A tool to build and run big data pseudo cluster using Docker
● Command to generate sandbox image
$ ./build.sh -a bigtop -o centos-7 -c "hdfs, yarn, spark, ignite"
● You can do a dry run using option “--dryrun” command
● How to run the sandbox image
$ docker run -d -p 50070:50070 bigtop/sandbox:centos-7_hdfs
Smoke Tests
● Uses yaml file to configure located under
<BIGTOP_SRC_TOP>/provisioner/docker/
● Components to configure
○ docker image
○ distro type
○ components to install
○ components to test
○ JDK
● Environment check
$ ./docker-hadoop.sh -E
● Execution
$ cd provisioner/docker
$ ./docker-hadoop.sh -C <smoke_test_cfg_yaml> -c <node_count> -s -d
Glossary
● Linaro Collaborate page
● Bigtop wiki page
● Smoke Test Collaborate page
● Smoke Test Results
Thank You
#HKG18
HKG18 keynotes and videos on: connect.linaro.org
For further information: www.linaro.org

Más contenido relacionado

La actualidad más candente

Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Databricks
 
Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2
Giuseppe Paterno'
 
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
HostedbyConfluent
 

La actualidad más candente (20)

How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
Ceph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing GuideCeph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing Guide
 
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 Best Practice of Compression/Decompression Codes in Apache Spark with Sophia... Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
Best Practice of Compression/Decompression Codes in Apache Spark with Sophia...
 
Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2Filesystem Comparison: NFS vs GFS2 vs OCFS2
Filesystem Comparison: NFS vs GFS2 vs OCFS2
 
Ceph RBD Update - June 2021
Ceph RBD Update - June 2021Ceph RBD Update - June 2021
Ceph RBD Update - June 2021
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
 
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
 
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
The Future of Column-Oriented Data Processing With Apache Arrow and Apache Pa...
 
Greenplum User Case
Greenplum User Case Greenplum User Case
Greenplum User Case
 
Presto overview
Presto overviewPresto overview
Presto overview
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
 
Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)Network Programming: Data Plane Development Kit (DPDK)
Network Programming: Data Plane Development Kit (DPDK)
 
Room 3 - 7 - Nguyễn Như Phúc Huy - Vitastor: a fast and simple Ceph-like bloc...
Room 3 - 7 - Nguyễn Như Phúc Huy - Vitastor: a fast and simple Ceph-like bloc...Room 3 - 7 - Nguyễn Như Phúc Huy - Vitastor: a fast and simple Ceph-like bloc...
Room 3 - 7 - Nguyễn Như Phúc Huy - Vitastor: a fast and simple Ceph-like bloc...
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path Forward
 
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon KimHDFS on Kubernetes—Lessons Learned with Kimoon Kim
HDFS on Kubernetes—Lessons Learned with Kimoon Kim
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
Ceph and RocksDB
Ceph and RocksDBCeph and RocksDB
Ceph and RocksDB
 
Apache Spark overview
Apache Spark overviewApache Spark overview
Apache Spark overview
 

Similar a State of Big Data on ARM64 / AArch64 - Apache Bigtop

Similar a State of Big Data on ARM64 / AArch64 - Apache Bigtop (20)

Leveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioningLeveraging docker for hadoop build automation and big data stack provisioning
Leveraging docker for hadoop build automation and big data stack provisioning
 
Leveraging Docker for Hadoop build automation and Big Data stack provisioning
Leveraging Docker for Hadoop build automation and Big Data stack provisioningLeveraging Docker for Hadoop build automation and Big Data stack provisioning
Leveraging Docker for Hadoop build automation and Big Data stack provisioning
 
How bigtop leveraged docker for build automation and one click hadoop provis...
How bigtop leveraged docker for build automation and  one click hadoop provis...How bigtop leveraged docker for build automation and  one click hadoop provis...
How bigtop leveraged docker for build automation and one click hadoop provis...
 
How bigtop leveraged docker for build automation and one click hadoop provis...
How bigtop leveraged docker for build automation and  one click hadoop provis...How bigtop leveraged docker for build automation and  one click hadoop provis...
How bigtop leveraged docker for build automation and one click hadoop provis...
 
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache ApexMaking sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
Making sense of Apache Bigtop's role in ODPi and how it matters to Apache Apex
 
Delivering a bleeding edge community-led openstack distribution: RDO
Delivering a bleeding edge community-led openstack distribution: RDO Delivering a bleeding edge community-led openstack distribution: RDO
Delivering a bleeding edge community-led openstack distribution: RDO
 
Docker module 1
Docker module 1Docker module 1
Docker module 1
 
Road to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache HopRoad to NODES - Handling Neo4j Data with Apache Hop
Road to NODES - Handling Neo4j Data with Apache Hop
 
Write microservice in golang
Write microservice in golangWrite microservice in golang
Write microservice in golang
 
Apigee deploy grunt plugin.1.0
Apigee deploy grunt plugin.1.0Apigee deploy grunt plugin.1.0
Apigee deploy grunt plugin.1.0
 
Red Hat Forum Benelux 2015
Red Hat Forum Benelux 2015Red Hat Forum Benelux 2015
Red Hat Forum Benelux 2015
 
Cloud native IPC for Microservices Workshop @ Containerdays 2022
Cloud native IPC for Microservices Workshop @ Containerdays 2022Cloud native IPC for Microservices Workshop @ Containerdays 2022
Cloud native IPC for Microservices Workshop @ Containerdays 2022
 
Multi-stage Docker builds to make building easy!
Multi-stage Docker builds to make building easy!Multi-stage Docker builds to make building easy!
Multi-stage Docker builds to make building easy!
 
Deploying software at Scale
Deploying software at ScaleDeploying software at Scale
Deploying software at Scale
 
Galera on kubernetes_no_video
Galera on kubernetes_no_videoGalera on kubernetes_no_video
Galera on kubernetes_no_video
 
Gocd – Kubernetes/Nomad Continuous Deployment
Gocd – Kubernetes/Nomad Continuous DeploymentGocd – Kubernetes/Nomad Continuous Deployment
Gocd – Kubernetes/Nomad Continuous Deployment
 
Docker to the Rescue of an Ops Team
Docker to the Rescue of an Ops TeamDocker to the Rescue of an Ops Team
Docker to the Rescue of an Ops Team
 
Docker to the Rescue of an Ops Team
Docker to the Rescue of an Ops TeamDocker to the Rescue of an Ops Team
Docker to the Rescue of an Ops Team
 
ODPi (Open Data Platform Initiative) - Linaro Connect
ODPi (Open Data Platform Initiative) - Linaro ConnectODPi (Open Data Platform Initiative) - Linaro Connect
ODPi (Open Data Platform Initiative) - Linaro Connect
 
BKK16-400B ODPI - Standardizing Hadoop
BKK16-400B ODPI - Standardizing HadoopBKK16-400B ODPI - Standardizing Hadoop
BKK16-400B ODPI - Standardizing Hadoop
 

Más de Ganesh Raju

Certificate_DataStax_Cassandra
Certificate_DataStax_CassandraCertificate_DataStax_Cassandra
Certificate_DataStax_Cassandra
Ganesh Raju
 

Más de Ganesh Raju (9)

Technology trends, disruptions and Opportunities
Technology trends, disruptions and OpportunitiesTechnology trends, disruptions and Opportunities
Technology trends, disruptions and Opportunities
 
ODPi (Open Data Platform Initiative) - Standardizing Hadoop Ecosystem: Linaro...
ODPi (Open Data Platform Initiative) - Standardizing Hadoop Ecosystem: Linaro...ODPi (Open Data Platform Initiative) - Standardizing Hadoop Ecosystem: Linaro...
ODPi (Open Data Platform Initiative) - Standardizing Hadoop Ecosystem: Linaro...
 
Apache Ambari on ARM Server - Linaro Connect
Apache Ambari on ARM Server - Linaro ConnectApache Ambari on ARM Server - Linaro Connect
Apache Ambari on ARM Server - Linaro Connect
 
Exploring Github Data with Apache Drill on ARM64
Exploring Github Data with Apache Drill on ARM64 Exploring Github Data with Apache Drill on ARM64
Exploring Github Data with Apache Drill on ARM64
 
Apache Bigtop and ARM64 / AArch64 - Empowering Big Data Everywhere
Apache Bigtop and ARM64 / AArch64 - Empowering Big Data EverywhereApache Bigtop and ARM64 / AArch64 - Empowering Big Data Everywhere
Apache Bigtop and ARM64 / AArch64 - Empowering Big Data Everywhere
 
Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64Data Analytics and Machine Learning: From Node to Cluster on ARM64
Data Analytics and Machine Learning: From Node to Cluster on ARM64
 
Smart City Big Data Visualization on 96Boards - Linaro Connect Las Vegas 2016
Smart City Big Data Visualization on 96Boards - Linaro Connect Las Vegas 2016Smart City Big Data Visualization on 96Boards - Linaro Connect Las Vegas 2016
Smart City Big Data Visualization on 96Boards - Linaro Connect Las Vegas 2016
 
Technology Trends, Disruptions and Opportunities
Technology Trends, Disruptions and OpportunitiesTechnology Trends, Disruptions and Opportunities
Technology Trends, Disruptions and Opportunities
 
Certificate_DataStax_Cassandra
Certificate_DataStax_CassandraCertificate_DataStax_Cassandra
Certificate_DataStax_Cassandra
 

Último

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Último (20)

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 

State of Big Data on ARM64 / AArch64 - Apache Bigtop

  • 1. HKG18-220: State of Big Data on AArch64 - Apache BigTop Ganesh Raju, Jun He & Naresh Bhat Big Data Team, LEG
  • 2. Agenda ● Why and What is Bigtop ● Bigtop patches for AArch64 ● Challenges of porting ● Walk through of setup and installation process ● Demo on Provisioning and Smoke Tests
  • 3. Why Apache Bigtop? ● Hadoop is a collection of many components ○ Numerous versions (Dependency hell) ○ Lots of patches ○ No stable development environment with certified binaries ○ No proper integrated tests ● With Bigtop - Build, Deploy in cluster with puppet, Configure, Install and Test ● Juju orchestration ● Blueprints ● Seamless integration into CI
  • 4. What is in Apache Bigtop? ● Output ○ A set of binaries(deb and rpm) just like HDP, ODPi, CDH, etc ○ Docker images ○ Docker sandbox images ● Integration code, Packaging code, Deployment code, Orchestration code ● Validation code ○ Integration tests ■ Clean slate provisioning ■ Dependency integration artifacts ■ Versioned test artifacts ■ Plug and play artifacts ■ JVM-based artifacts ○ Packaging tests ○ Smoke tests ● Continuous Integration
  • 5. Consumers of Bigtop Some of consumers of Bigtop ● ODPi ● Hortonworks ● Amazon ● Canonical ● EMC ● Pivotal ● Infosys ● Capgemini ● Ebay ● Intel ● TrendMicro ● WANdisco
  • 6. Bigtop v1.2 A few highlights of the v1.2 series release include: ● 6 Distros, 2 archs (x86, and ppc64le) supported ○ ARM support with v1.3 ● A newly introduced Bigtop Sandbox feature ● A faster Docker Provisioner which is rewritten to fully embrace Docker ecosystem ● OpenJDK 8 support ● Hadoop 2.7.3, Spark 2.1.1, HBase 1.1.9, and Zeppelin 0.72 are used ● And many upgrades of the ecosystem projects (Apex, Crunch, Flume, Ignite, Mahout, Oozie, Phoenix, and many others)
  • 7. Components of Bigtop alluxio v1.0.1 Greenplum gpdb v5.0.0-alpha.0 Apache pig v0.15.0 Apache ambari v2.5.0 Apache hadoop v2.7.3 Quantcast qfs v1.1.4 Apache apex v3.5.0 Apache hama v0.7.0 Apache solr v4.10.4 groovy v2.4.10 Apache hbase v1.1.3 Apache spark 1.1 v1.6.2 Apache commons - jsvc v1.0.15 Apache hive v1.2.1 Apache spark 2.0 v2.1.1 Apache tomcat v6.0.45 Apache hue v3.11.0 Apache sqoop v1 v1.4.6 bigtop_utils v1.2.0 Apache ignite v1.9.0 Apache sqoop v2 v1.99.4 Apache crunch v0.14.0 Apache kafka v0.10.1.1 Apache tajo v0.11.1 Pig UDF datafu v1.3.0 kite v1.1.0 Apache tez v0.6.2 Apache flink v1.1.3 Apache mahout v0.12.2 ycsb v0.4.0 Apache flume v1.7.0 Apache oozie v4.3.0 Apache zeppelin v0.7.0 Apache giraph v1.1.0 Apache phoenix v4.9.0-HBase-1.1 Apache zookeeper v3.4.6
  • 8. Contributions from ARM Ecosystem ● AArch64 CI nodes are running on Linaro DevCloud ○ 3 distros are supported: Debian-9, Fedora-26, Ubuntu-16.04 https://ci.bigtop.apache.org/job/Bigtop-trunk-packages/
  • 9. Contributions from ARM Ecosystem ● Patches to enable components on AArch64 ○ Build ■ Hadoop, Solr, Hbase, Ignite, … ○ Package ■ Hama, Solr, Oozie, Hue, … ○ Deploy ■ Service scripts, Automation scripts, Dockerfiles,… ○ Test ■ SmokeTests and Provisioner settings
  • 10. Contributions from ARM Ecosystem ● Lessons learned ○ Dependency issues ■ Native binaries: protobuf, phantomjs, … ■ Jars: levedb-jni, ignite-shmem, jffi, … ■ Version mismatch: slf4j, log4j, log4j2, … ○ Repos ■ Official release did not support aarch64 ● Had to create private/local repo
  • 11. Challenges ● Cyclic references take a lot of effort to fix ● Though most big data companies all use Bigtop, there has not been contributions coming in from them ● With founders moving out end of last year, and lead committers changing focus, the project has lost momentum
  • 12. Roadmap ● Make Bigtop a 1st class citizen on Kubernetes ● Test out Puppet deployment code in variety of different scenarios including developers spinning up test clusters via Docker deployer ● Improve Docker deployer to be more developer friendly and hook it back into Gradle ● Provide predefined sample stacks for specific use cases. For example: ○ Machine Learning: Hadoop+Spark+Zeppelin ○ Streaming: Hadoop+Kafka+Flink ● Create more tests ● Enable Ambari to install Bigtop stack. Utilize the work done for ODPi
  • 13. Sandbox ● What is Sandbox? A tool to build and run big data pseudo cluster using Docker ● Command to generate sandbox image $ ./build.sh -a bigtop -o centos-7 -c "hdfs, yarn, spark, ignite" ● You can do a dry run using option “--dryrun” command ● How to run the sandbox image $ docker run -d -p 50070:50070 bigtop/sandbox:centos-7_hdfs
  • 14. Smoke Tests ● Uses yaml file to configure located under <BIGTOP_SRC_TOP>/provisioner/docker/ ● Components to configure ○ docker image ○ distro type ○ components to install ○ components to test ○ JDK ● Environment check $ ./docker-hadoop.sh -E ● Execution $ cd provisioner/docker $ ./docker-hadoop.sh -C <smoke_test_cfg_yaml> -c <node_count> -s -d
  • 15. Glossary ● Linaro Collaborate page ● Bigtop wiki page ● Smoke Test Collaborate page ● Smoke Test Results
  • 16. Thank You #HKG18 HKG18 keynotes and videos on: connect.linaro.org For further information: www.linaro.org