The State of the Apache
  Hadoop Ecosystem

         Doug Cutting
      Cloudera & Apache
Outline
● the ecosystem
    ○   why we need it
    ○   what it is
    ○   why it's strong
    ○   how it can evolve
●   highlights
    ○ current
    ○ next
●   wrap up
Why are we here?

Hardware has improved
  ● exponentially for decades
  ● both storage and compute

We can now store and process much more!
  ○ yet we have been slow to leverage it


Analyzing more data makes us smarter.
  ○ Norvig's Unreasonable Effectiveness of Data
The Ecosystem is the System
● Hadoop has become the kernel
  ○ of the distributed operating system for Big Data
  ○ a de-facto industry standard


● No one uses the kernel alone

● A collection of projects at Apache
Strengths of Apache
Mandates diversity & transparency
  ○ you control your fate

Insures against vendor lock-in
   ○ can't buy the ASF

Allows competing projects
    ○ survival of the fittest

Ecosystem as loose federation
   ○ lets platform evolve
What's new?
● Apache Hadoop 0.20.205
    ○ append
    ○ security


●   CDH3
    ○ Mahout included
    ○ Avro support across components
What's next?
● Apache Hadoop 0.23
   ○ HDFS
     ■ performance
     ■ scalability (federation)
     ■ availability (HA)
   ○ MR2


● CDH4
   ○ includes Hadoop 0.23
   ○ BigTop-based


● S4, Giraph, Crunch, Blur, ...
Apache BigTop (incubating)
Ecosystem as a project
  ○ integration tests
  ○ compatible versions
  ○ common packaging
  ○ release is a set

Basis for CDH
  ○ like Fedora is for RHEL

Community driven

Includes:
  ● Hadoop
  ● HBase
  ● Zookeeper
  ● Avro
  ● Hive
  ● Pig
  ● Oozie
  ● Flume
  ● Mahout
  ● ...
Join the community
Hadoop and Big Data are still young.
  Hardware trends will continue.

Hadoop started with just two developers.
  Now it has hundreds.
  You can be the next.
  What do you need?
Thanks!
Questions?

Hadoop World 2011 Keynote: The State of the Apache Hadoop Ecosystem
