Hadoop: Today and Tomorrow
Steve Loughran – Hortonworks
stevel at hortonworks.com
@steveloughran

London, April 2012




© Hortonworks Inc. 2012
About me:
• HP Labs:
   –Deployment, cloud infrastructure, Hadoop-in-Cloud
• Apache – member and committer
   –Ant (author, Ant in Action), Axis 2
   –Hadoop
       –Dynamic deployments
       –Diagnostics on failures
       –Cloud infrastructure integration
• Joined Hortonworks in 2012
   –UK based: R&D + customer engagement



About Hortonworks
 From developing and running the world's largest Hadoop clusters to
 advancing open source Apache Hadoop for the broader market



                                  Hadoop at Yahoo!
                                     40K+ Servers
                                     170PB Storage
                                   5M+ Monthly Jobs
                                   1000+ Active Users



                                          2011




                               HDP, training & support

Where is Hadoop?
• Today: Hadoop 1.x
  –Status & Roadmap

• Tomorrow: Hadoop 2.x
  –YARN
  –HDFS HA

• Enterprise integration



Releases slowed with Hadoop take-up
Timeline: 0.20.0, 0.20.1, 0.20.2, 0.21.0, 0.20.20{3,4,5}.0




• 64 Releases
• Branches from the last 2.5 years:
   –0.20.{0,1,2} – Stable release without security
   –0.20.2xx.y – Stable release with security

   –0.21.0 – released, unstable, deprecated
   –0.22.0 – orphan, unstable, lack of community



Now: two release branches, one dev
Hadoop 1.x
• Stable, used in production systems
• The one to use today


Hadoop 2.0
• The successor
• Not quite ready for use


Hadoop 2.x "trunk"
• Where features & fixes first go in
• If you want to help, start here
Today: Hadoop 1.x
• A stable Hadoop release from the ASF
   –Merges various Hadoop 0.20.* branches
     (security, HBase support, …)
   –A stable branch for patching and back-porting
• Highlights:
   –Security
   –HBase support (“append” operation)
   –WebHDFS
   –“new” MapReduce APIs complete & usable
   –Distribution packaging includes RPM files



WebHDFS: fast direct HTTP access
~:$ GET http://nnode:50070/webhdfs/v1/results/part-r-00000.csv?op=open

GATE4,eb8bd736445f415e18886ba037f84829,55000,2007-01-14,14:01:54,
GATE4,ec58edcce1049fa665446dc1fa690638,8030803000,2007-01-14,13:52:31,
GATE4,b6f07ce00f09035a6683c5e93e3c04b8,30000,2007-01-28,12:41:11,
GATE4,a1bc345b756090854e9dd0011087c6c0,30000,2007-01-28,12:59:33,
...




 Potential Uses:
   Out of cluster access to HDFS
   Cross-cluster, cross version HDFS access
   Native filesystem clients


                         dfs.webhdfs.enabled=true
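
The URL in the request above follows a fixed pattern, so a client can assemble it from its parts. A minimal sketch, assuming the host, port and path shown on the slide; the helper name webhdfs_url is hypothetical and not part of any Hadoop client library:

```python
from urllib.parse import quote, urlencode


def webhdfs_url(namenode, path, op, port=50070, **params):
    """Build a WebHDFS URL shaped like the one on the slide.

    The name and signature are illustrative only. The layout
    http://<namenode>:<port>/webhdfs/v1<path>?op=<op> is taken from the slide.
    """
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # op goes first, followed by any extra query parameters the caller supplies
    query = urlencode({"op": op, **params})
    return f"http://{namenode}:{port}/webhdfs/v1{quote(path)}?{query}"


url = webhdfs_url("nnode", "/results/part-r-00000.csv", "open")
print(url)
```

Fetching the resulting URL, for instance with urllib.request.urlopen(url), needs a reachable NameNode with dfs.webhdfs.enabled=true, so that step is left out of the sketch.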
Hortonworks Data Platform HDP1
Based on Hadoop 1.0, adds
 –HCatalog for table and schema management
 –Open APIs for metadata, data movement, app & job management
 –Consumable “standard Hadoop” stack:
   Hadoop 1.0.x core (HDFS, MapReduce)
   Pig 0.9.x data flow programming language
   Hive 0.8.x SQL-like language
   HBase 0.92.x column table datastore
   HCatalog 0.3.x table and schema management
   ZooKeeper 3.4.x coordinator

Post-SQL KVS & Column Tables




Project Voldemort


Analysis tooling maturing




                              Pig

                                    DataFu

Ingress


                                             Kafka



                             Fluentd


                               facebook / scribe
Keep an eye on the graph layer



        Apache
        Giraph
                                 Hama
                             Workshop:
                             Beyond MapReduce



Tomorrow: Hadoop 2.0
• HDFS Federation
  – Clear separation of Namespace and Block Storage
  – Snapshots
  – Improved scalability and isolation
• HDFS HA
  – Active/Standby failover of Namenodes
• Next Generation MapReduce architecture (aka YARN)
  – New architecture enables other application types to plug in
  – Resource Manager a foundation for HA and fault tolerance
• Performance!


                                In beta 2012
HDFS HA
Diagram:
• ZK ensemble (ZK, ZK, ZK): Heartbeat from each FailoverController
• FailoverController Active / FailoverController Standby: Cmds; Monitor Health of NN, OS, HW
• NN Active, NN Standby
• DN, DN, DN: Block Reports to Active & Standby
• DN fencing: Update cmds from one
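
The active/standby arrangement on this slide can be illustrated with a toy simulation. This is a sketch under loud assumptions, not HDFS code: ToyLock stands in for the ZK ensemble, a controller that stops being healthy simply gives up the lock (standing in for its heartbeat lapsing), and the DataNode decides whom to obey by looking at the lock holder. All class and method names are hypothetical.

```python
class ToyLock:
    """Stand-in for the ZK ensemble: at most one holder at a time (assumption)."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, name):
        if self.holder is None:
            self.holder = name
        return self.holder == name

    def release(self, name):
        if self.holder == name:
            self.holder = None


class ToyFailoverController:
    """Monitors one NameNode and competes for the lock on its behalf."""

    def __init__(self, nn_name, lock):
        self.nn_name = nn_name
        self.lock = lock
        self.healthy = True  # result of the "monitor health" check

    def tick(self):
        """Run one monitoring round and return 'active' or 'standby'."""
        if not self.healthy:
            # Giving up the lock models the heartbeat to ZK lapsing.
            self.lock.release(self.nn_name)
            return "standby"
        return "active" if self.lock.try_acquire(self.nn_name) else "standby"


class ToyDataNode:
    """Applies update commands from one NameNode only (the fencing idea)."""

    def __init__(self, lock):
        self.lock = lock
        self.applied = []

    def command(self, sender, cmd):
        if sender != self.lock.holder:
            return False  # sender is not the current active: ignore it
        self.applied.append(cmd)
        return True


lock = ToyLock()
fc1 = ToyFailoverController("nn1", lock)
fc2 = ToyFailoverController("nn2", lock)
dn = ToyDataNode(lock)

before = [fc1.tick(), fc2.tick()]          # nn1 wins the lock first
accepted_nn1 = dn.command("nn1", "replicate block 1")
rejected_nn2 = dn.command("nn2", "delete block 1")

fc1.healthy = False                        # simulate a failure on nn1
after = [fc1.tick(), fc2.tick()]           # nn2 takes over
stale_nn1 = dn.command("nn1", "delete block 1")
```

The point of the sketch is only the ordering: the standby becomes active after the failed side has lost the lock, and the DataNode stops applying updates from the old active at the same moment.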
YARN: foundation of a datacentre OS
Diagram:
• Two Clients, one Resource Manager
• Three Node Managers, hosting Containers and App Mstrs
• Legend: MapReduce Status, Job Submission, Node Status, Resource Request

 Multiple topology-aware applications in a single cluster
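
One way to picture a topology-aware resource request is a toy allocator that tries the nodes an application prefers and falls back to any node with free capacity. This is an illustrative sketch only: ToyResourceManager and its request method are hypothetical names, and nothing here reflects the actual YARN API or scheduler.

```python
class ToyResourceManager:
    """Toy allocator: tracks free container slots per node (assumption)."""

    def __init__(self, free_slots):
        self.free = dict(free_slots)  # node name -> free container slots

    def request(self, app, preferred_nodes):
        """Grant one container, preferring the nodes the app asked for.

        Returns (app, node) on success, or None when the cluster is full.
        """
        candidates = [n for n in preferred_nodes if self.free.get(n, 0) > 0]
        if not candidates:
            # No preferred node has room: fall back to any node with a slot.
            candidates = [n for n, slots in self.free.items() if slots > 0]
        if not candidates:
            return None
        node = candidates[0]
        self.free[node] -= 1
        return (app, node)


rm = ToyResourceManager({"node1": 1, "node2": 1, "node3": 1})
first = rm.request("mapreduce", ["node2"])   # preferred node has room
second = rm.request("graph-job", ["node2"])  # node2 is full, falls back
third = rm.request("mapreduce", ["node3"])
fourth = rm.request("graph-job", ["node1"])  # nothing left anywhere
```

Two different applications end up sharing the same three nodes, which is the situation the slide's closing line describes.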
Microsoft embraces Hadoop




 Good for enterprises & developers
 Great for end users!

Oracle accepts NoSQL
May 2011:
  “Don't be risking your data on NoSQL databases.”

Sept 2011:
   “Oracle NoSQL Database provides network-accessible
multi-terabyte distributed key/value pair storage with
predictable latency.”

• Oracle needs compatible SQL & NoSQL business plans
• It also needs to justify high-end servers over “commodity” x86 boxes
• Could drive Hadoop-centric JVM development


Open Source “Enterprise” Tooling

Application Layer
• Spring Data for Hadoop in Beta
• Cascading → Apache 2.0 License


OS Layer
• RedHat building Hadoop story
• Canonical assisting Hadoop packaging




What does all this mean?




Facebook: 45 PB, Yahoo!: 180+ PB




Hadoop has the momentum
• Platform: stable version & evolving version
• Tooling & layers: ecosystem
• Commercial training and support
• Adoption by enterprise vendors




Hadoop is the Big Data Platform




Get involved with the Apache project!

• Join the -user mailing lists
  – common-user@hadoop.apache.org
  – hdfs-user@hadoop.apache.org
  – mapreduce-user@hadoop.apache.org
• File bug reports in JIRA
• Contribute to the documentation
• Add: patches, tests, features, …



Questions?

hortonworks.com





Más contenido relacionado

La actualidad más candente

Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...StampedeCon
 
The power of hadoop in cloud computing
The power of hadoop in cloud computingThe power of hadoop in cloud computing
The power of hadoop in cloud computingJoey Echeverria
 
Hortonworks Technical Workshop - build a yarn ready application with apache ...
Hortonworks Technical Workshop -  build a yarn ready application with apache ...Hortonworks Technical Workshop -  build a yarn ready application with apache ...
Hortonworks Technical Workshop - build a yarn ready application with apache ...Hortonworks
 
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...Cloudera, Inc.
 
Hortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices WorkshopHortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices WorkshopHortonworks
 
Cloudera Sessions - Clinic 1 - Getting Started With Hadoop
Cloudera Sessions - Clinic 1 - Getting Started With HadoopCloudera Sessions - Clinic 1 - Getting Started With Hadoop
Cloudera Sessions - Clinic 1 - Getting Started With HadoopCloudera, Inc.
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course WorkshopDataWorks Summit
 
OSDC 2013 | Introduction into Hadoop by Olivier Renault
OSDC 2013 | Introduction into Hadoop by Olivier RenaultOSDC 2013 | Introduction into Hadoop by Olivier Renault
OSDC 2013 | Introduction into Hadoop by Olivier RenaultNETWAYS
 
Data Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big DataData Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big DataCloudera, Inc.
 
Enabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNEnabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNDataWorks Summit
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks
 
Hortonworks for Financial Analysts Presentation
Hortonworks for Financial Analysts PresentationHortonworks for Financial Analysts Presentation
Hortonworks for Financial Analysts PresentationHortonworks
 
YARN Ready - Integrating to YARN using Slider Webinar
YARN Ready - Integrating to YARN using Slider WebinarYARN Ready - Integrating to YARN using Slider Webinar
YARN Ready - Integrating to YARN using Slider WebinarHortonworks
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureDataWorks Summit
 
Apache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARNApache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARNHortonworks
 
Writing Yarn Applications Hadoop Summit 2012
Writing Yarn Applications Hadoop Summit 2012Writing Yarn Applications Hadoop Summit 2012
Writing Yarn Applications Hadoop Summit 2012Hortonworks
 
SPSMEL 2012 - SQL 2012 AlwaysOn Availability Groups for SharePoint 2010 / 2013
SPSMEL 2012 - SQL 2012 AlwaysOn Availability Groups for SharePoint 2010 / 2013SPSMEL 2012 - SQL 2012 AlwaysOn Availability Groups for SharePoint 2010 / 2013
SPSMEL 2012 - SQL 2012 AlwaysOn Availability Groups for SharePoint 2010 / 2013Michael Noel
 
Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks
 
Moving from C#/.NET to Hadoop/MongoDB
Moving from C#/.NET to Hadoop/MongoDBMoving from C#/.NET to Hadoop/MongoDB
Moving from C#/.NET to Hadoop/MongoDBMongoDB
 
Authoring and Hosting Applications on YARN using Slider
Authoring and Hosting Applications on YARN using SliderAuthoring and Hosting Applications on YARN using Slider
Authoring and Hosting Applications on YARN using SliderDataWorks Summit
 

La actualidad más candente (20)

Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
Apache Hadoop YARN – Multi-Tenancy, Capacity Scheduler & Preemption - Stamped...
 
The power of hadoop in cloud computing
The power of hadoop in cloud computingThe power of hadoop in cloud computing
The power of hadoop in cloud computing
 
Hortonworks Technical Workshop - build a yarn ready application with apache ...
Hortonworks Technical Workshop -  build a yarn ready application with apache ...Hortonworks Technical Workshop -  build a yarn ready application with apache ...
Hortonworks Technical Workshop - build a yarn ready application with apache ...
 
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
Hadoop World 2011: The Hadoop Stack - Then, Now and in the Future - Eli Colli...
 
Hortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices WorkshopHortonworks Technical Workshop - Operational Best Practices Workshop
Hortonworks Technical Workshop - Operational Best Practices Workshop
 
Cloudera Sessions - Clinic 1 - Getting Started With Hadoop
Cloudera Sessions - Clinic 1 - Getting Started With HadoopCloudera Sessions - Clinic 1 - Getting Started With Hadoop
Cloudera Sessions - Clinic 1 - Getting Started With Hadoop
 
Internet of things Crash Course Workshop
Internet of things Crash Course WorkshopInternet of things Crash Course Workshop
Internet of things Crash Course Workshop
 
OSDC 2013 | Introduction into Hadoop by Olivier Renault
OSDC 2013 | Introduction into Hadoop by Olivier RenaultOSDC 2013 | Introduction into Hadoop by Olivier Renault
OSDC 2013 | Introduction into Hadoop by Olivier Renault
 
Data Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big DataData Science Day New York: The Platform for Big Data
Data Science Day New York: The Platform for Big Data
 
Enabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARNEnabling Diverse Workload Scheduling in YARN
Enabling Diverse Workload Scheduling in YARN
 
Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix Hortonworks Technical Workshop: HBase and Apache Phoenix
Hortonworks Technical Workshop: HBase and Apache Phoenix
 
Hortonworks for Financial Analysts Presentation
Hortonworks for Financial Analysts PresentationHortonworks for Financial Analysts Presentation
Hortonworks for Financial Analysts Presentation
 
YARN Ready - Integrating to YARN using Slider Webinar
YARN Ready - Integrating to YARN using Slider WebinarYARN Ready - Integrating to YARN using Slider Webinar
YARN Ready - Integrating to YARN using Slider Webinar
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 
Apache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARNApache Ambari: Managing Hadoop and YARN
Apache Ambari: Managing Hadoop and YARN
 
Writing Yarn Applications Hadoop Summit 2012
Writing Yarn Applications Hadoop Summit 2012Writing Yarn Applications Hadoop Summit 2012
Writing Yarn Applications Hadoop Summit 2012
 
SPSMEL 2012 - SQL 2012 AlwaysOn Availability Groups for SharePoint 2010 / 2013
SPSMEL 2012 - SQL 2012 AlwaysOn Availability Groups for SharePoint 2010 / 2013SPSMEL 2012 - SQL 2012 AlwaysOn Availability Groups for SharePoint 2010 / 2013
SPSMEL 2012 - SQL 2012 AlwaysOn Availability Groups for SharePoint 2010 / 2013
 
Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search Hortonworks Technical Workshop - HDP Search
Hortonworks Technical Workshop - HDP Search
 
Moving from C#/.NET to Hadoop/MongoDB
Moving from C#/.NET to Hadoop/MongoDBMoving from C#/.NET to Hadoop/MongoDB
Moving from C#/.NET to Hadoop/MongoDB
 
Authoring and Hosting Applications on YARN using Slider
Authoring and Hosting Applications on YARN using SliderAuthoring and Hosting Applications on YARN using Slider
Authoring and Hosting Applications on YARN using Slider
 

Similar a Hadoop: today and tomorrow

How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopPOSSCON
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGskumpf
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataHortonworks
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSHortonworks
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Mac Moore
 
Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Hortonworks
 
Introduction to Microsoft HDInsight and BI Tools
Introduction to Microsoft HDInsight and BI ToolsIntroduction to Microsoft HDInsight and BI Tools
Introduction to Microsoft HDInsight and BI ToolsDataWorks Summit
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitDataWorks Summit
 
Agile analytics applications on hadoop
Agile analytics applications on hadoopAgile analytics applications on hadoop
Agile analytics applications on hadoopHortonworks
 
Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015Mac Moore
 
Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Hortonworks
 
Hadoop Overview
Hadoop Overview Hadoop Overview
Hadoop Overview EMC
 
YARN - Strata 2014
YARN - Strata 2014YARN - Strata 2014
YARN - Strata 2014Hortonworks
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemShivaji Dutta
 
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and FutureHadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and FutureVinod Kumar Vavilapalli
 
Introduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramIntroduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramHortonworks
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureDataWorks Summit
 

Similar a Hadoop: today and tomorrow (20)

Inside hadoop-dev
Inside hadoop-devInside hadoop-dev
Inside hadoop-dev
 
How YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in HadoopHow YARN Enables Multiple Data Processing Engines in Hadoop
How YARN Enables Multiple Data Processing Engines in Hadoop
 
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUGReal-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
Real-Time Processing in Hadoop for IoT Use Cases - Phoenix HUG
 
Supporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big DataSupporting Financial Services with a More Flexible Approach to Big Data
Supporting Financial Services with a More Flexible Approach to Big Data
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFSDiscover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
 
Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015Storm Demo Talk - Colorado Springs May 2015
Storm Demo Talk - Colorado Springs May 2015
 
Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]Discover.hdp2.2.h base.final[2]
Discover.hdp2.2.h base.final[2]
 
Introduction to Microsoft HDInsight and BI Tools
Introduction to Microsoft HDInsight and BI ToolsIntroduction to Microsoft HDInsight and BI Tools
Introduction to Microsoft HDInsight and BI Tools
 
Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?Hortonworks - What's Possible with a Modern Data Architecture?
Hortonworks - What's Possible with a Modern Data Architecture?
 
Internet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop SummitInternet of Things Crash Course Workshop at Hadoop Summit
Internet of Things Crash Course Workshop at Hadoop Summit
 
Agile analytics applications on hadoop
Agile analytics applications on hadoopAgile analytics applications on hadoop
Agile analytics applications on hadoop
 
Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015Storm Demo Talk - Denver Apr 2015
Storm Demo Talk - Denver Apr 2015
 
Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]Discover.hdp2.2.ambari.final[1]
Discover.hdp2.2.ambari.final[1]
 
Hadoop Overview
Hadoop Overview Hadoop Overview
Hadoop Overview
 
YARN - Strata 2014
YARN - Strata 2014YARN - Strata 2014
YARN - Strata 2014
 
Munich HUG 21.11.2013
Munich HUG 21.11.2013Munich HUG 21.11.2013
Munich HUG 21.11.2013
 
Introduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystemIntroduction to the Hadoop EcoSystem
Introduction to the Hadoop EcoSystem
 
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and FutureHadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
Hadoop Summit Europe Talk 2014: Apache Hadoop YARN: Present and Future
 
Introduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready ProgramIntroduction to the Hortonworks YARN Ready Program
Introduction to the Hortonworks YARN Ready Program
 
Apache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and FutureApache Hadoop YARN: Present and Future
Apache Hadoop YARN: Present and Future
 

Más de Steve Loughran

The age of rename() is over
The age of rename() is overThe age of rename() is over
The age of rename() is overSteve Loughran
 
What does Rename Do: (detailed version)
What does Rename Do: (detailed version)What does Rename Do: (detailed version)
What does Rename Do: (detailed version)Steve Loughran
 
Put is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit EditionPut is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit EditionSteve Loughran
 
@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!Steve Loughran
 
PUT is the new rename()
PUT is the new rename()PUT is the new rename()
PUT is the new rename()Steve Loughran
 
Extreme Programming Deployed
Extreme Programming DeployedExtreme Programming Deployed
Extreme Programming DeployedSteve Loughran
 
What does rename() do?
What does rename() do?What does rename() do?
What does rename() do?Steve Loughran
 
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and HiveDancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and HiveSteve Loughran
 
Apache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User GroupApache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User GroupSteve Loughran
 
Spark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSpark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSteve Loughran
 
Hadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object StoresHadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object StoresSteve Loughran
 
Apache Spark and Object Stores
Apache Spark and Object StoresApache Spark and Object Stores
Apache Spark and Object StoresSteve Loughran
 
Household INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony EraHousehold INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony EraSteve Loughran
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionSteve Loughran
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateSteve Loughran
 
Slider: Applications on YARN
Slider: Applications on YARNSlider: Applications on YARN
Slider: Applications on YARNSteve Loughran
 

Más de Steve Loughran (20)

Hadoop Vectored IO
Hadoop Vectored IOHadoop Vectored IO
Hadoop Vectored IO
 
The age of rename() is over
The age of rename() is overThe age of rename() is over
The age of rename() is over
 
What does Rename Do: (detailed version)
What does Rename Do: (detailed version)What does Rename Do: (detailed version)
What does Rename Do: (detailed version)
 
Put is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit EditionPut is the new rename: San Jose Summit Edition
Put is the new rename: San Jose Summit Edition
 
@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!@Dissidentbot: dissent will be automated!
@Dissidentbot: dissent will be automated!
 
PUT is the new rename()
PUT is the new rename()PUT is the new rename()
PUT is the new rename()
 
Extreme Programming Deployed
Extreme Programming DeployedExtreme Programming Deployed
Extreme Programming Deployed
 
Testing
TestingTesting
Testing
 
I hate mocking
I hate mockingI hate mocking
I hate mocking
 
What does rename() do?
What does rename() do?What does rename() do?
What does rename() do?
 
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and HiveDancing Elephants: Working with Object Storage in Apache Spark and Hive
Dancing Elephants: Working with Object Storage in Apache Spark and Hive
 
Apache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User GroupApache Spark and Object Stores —for London Spark User Group
Apache Spark and Object Stores —for London Spark User Group
 
Spark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object storesSpark Summit East 2017: Apache spark and object stores
Spark Summit East 2017: Apache spark and object stores
 
Hadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object StoresHadoop, Hive, Spark and Object Stores
Hadoop, Hive, Spark and Object Stores
 
Apache Spark and Object Stores
Apache Spark and Object StoresApache Spark and Object Stores
Apache Spark and Object Stores
 
Household INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony EraHousehold INFOSEC in a Post-Sony Era
Household INFOSEC in a Post-Sony Era
 
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 editionHadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
Hadoop and Kerberos: the Madness Beyond the Gate: January 2016 edition
 
Hadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the GateHadoop and Kerberos: the Madness Beyond the Gate
Hadoop and Kerberos: the Madness Beyond the Gate
 
Slider: Applications on YARN
Slider: Applications on YARNSlider: Applications on YARN
Slider: Applications on YARN
 
YARN Services
YARN ServicesYARN Services
YARN Services
 

Hadoop: today and tomorrow

  • 1. Hadoop: Today and Tomorrow Steve Loughran– Hortonworks stevel at hortonworks.com @steveloughran London, April 2012 © Hortonworks Inc. 2012
  • 2. About me: • HP Labs: –Deployment, cloud infrastructure, Hadoop-in-Cloud • Apache – member and committer –Ant (author, Ant in Action), Axis 2 –Hadoop –Dynamic deployments –Diagnostics on failures –Cloud infrastructure integration • Joined Hortonworks in 2012 –UK based: R&D + customer engagement
  • 3. About Hortonworks From developing and running the world's largest Hadoop clusters to advancing open source Apache Hadoop for the broader market Hadoop at Yahoo! 40K+ Servers 170PB Storage 5M+ Monthly Jobs 1000+ Active Users 2011 HDP, training & support
  • 4. Where is Hadoop? • Today: Hadoop 1.x –Status & Roadmap • Tomorrow: Hadoop 2.x –YARN –HDFS HA • Enterprise integration
  • 5. Releases slowed with Hadoop take up 0.20.0 0.20.1 0.20.2 0.21.0 0.20.20{3,4,5}.0 • 64 Releases • Branches from the last 2.5 years: –0.20.{0,1,2} – Stable release without security –0.20.2xx.y – Stable release with security –0.21.0 – released, unstable, deprecated –0.22.0 – orphan, unstable, lack of community
  • 6. Now: two release branches, one dev Hadoop 1.x • Stable, used in production systems • The one to use today Hadoop 2.0 • The successor • Not quite ready for use Hadoop 2.x "trunk" • Where features & fixes first go in • If you want to help –start here
  • 7. Today: Hadoop 1.x • A stable Hadoop release from the ASF –Merges various Hadoop 0.20.* branches (security, HBase support, …) –A stable branch for patching and back-porting • Highlights: –Security –HBase support (“append” operation) –WebHDFS –“new” MapReduce APIs complete & usable –Distribution packaging includes RPM files
  • 8. WebHDFS: fast direct HTTP access ~:$ GET http://nnode:50070/webhdfs/v1/results/part-r-00000.csv?op=open GATE4,eb8bd736445f415e18886ba037f84829,55000,2007-01-14,14:01:54, GATE4,ec58edcce1049fa665446dc1fa690638,8030803000,2007-01-14,13:52:31, GATE4,b6f07ce00f09035a6683c5e93e3c04b8,30000,2007-01-28,12:41:11, GATE4,a1bc345b756090854e9dd0011087c6c0,30000,2007-01-28,12:59:33, ... Potential uses: • Out of cluster access to HDFS • Cross-cluster, cross-version HDFS access • Native filesystem clients Enabled with: dfs.webhdfs.enabled=true
  • 9. Hortonworks Data Platform HDP1 Based on Hadoop 1.0, adds –HCatalog for table and schema management –Open APIs for metadata, data movement, app & job management –Consumable “standard Hadoop” stack: Hadoop 1.0.x core (HDFS, MapReduce); Pig 0.9.x data flow programming language; Hive 0.8.x SQL-like language; HBase 0.92.x column table datastore; HCatalog 0.3.x table and schema management; ZooKeeper 3.4.x coordinator
  • 10. Post-SQL KVS & Column Tables Project Voldemort
  • 11. Analysis tooling maturing Pig DataFu
  • 12. Ingress Kafka Fluentd facebook / scribe
  • 13. Keep an eye on the graph layer Apache Giraph Hama Workshop: Beyond MapReduce
  • 14. Tomorrow: Hadoop 2.0 • HDFS Federation – Clear separation of Namespace and Block Storage – Snapshots – Improved scalability and isolation • HDFS HA – Active/Standby failover of Namenodes • Next Generation MapReduce architecture (aka YARN) – New architecture enables other application types to plug in – Resource Manager a foundation for HA and fault tolerance • Performance! In beta 2012
  • 15. HDFS HA [Diagram: three ZK nodes; Heartbeats from two FailoverControllers (Active and Standby); each FailoverController sends Cmds to its NN and monitors the health of the NN, OS and HW; an Active NN and a Standby NN; DNs send Block Reports to Active & Standby; DN fencing: Update cmds from one]
  • 16. YARN: foundation of a datacentre OS [Diagram: Clients, a Resource Manager, and Node Managers hosting App Mstr and Container processes; legend: MapReduce Status, Job Submission, Node Status, Resource Request] Multiple topology-aware applications in a single cluster
  • 17. Microsoft embraces Hadoop Good for enterprises & developers Great for end users!
  • 18. Oracle accepts NoSQL May 2011: “Don't be risking your data on NoSQL databases.” Sept 2011: “Oracle NoSQL Database provides network-accessible multi-terabyte distributed key/value pair storage with predictable latency.” • Oracle need compatible SQL & NoSQL business plans • & to justify high-end servers over “commodity” x86 boxes • Could drive Hadoop-centric JVM development
  • 19. Open Source “Enterprise” Tooling Application Layer • Spring Data for Hadoop in Beta • Cascading → Apache 2.0 License OS Layer • RedHat building Hadoop story • Canonical assisting Hadoop packaging
  • 20. What does all this mean?
  • 21. facebook: 45 PB, Yahoo! 180+PB
  • 22. Hadoop has the momentum • Platform: stable version & evolving version • Tooling & layers: ecosystem • Commercial training and support • Adoption by enterprise vendors
  • 23. Hadoop is the Big Data Platform
  • 24. Get involved with the Apache project! • Join the -user mailing lists – common-user@hadoop.apache.org – hdfs-user@hadoop.apache.org – mapreduce-user@hadoop.apache.org • File bug reports in JIRA • Contribute to the documentation • Add: patches, tests, features, …
  • 25. Questions? hortonworks.com
  • 26. hortonworks.com
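
The WebHDFS request on slide 8 can be sketched in Python. This is a minimal illustration, not part of the original deck: the helper names `webhdfs_url` and `read_file` are hypothetical, and the host `nnode`, port 50070 and file path are taken from the slide's example. The standard library's `urlopen` follows the NameNode's 307 redirect to a DataNode without extra code, which is the "handled transparently" behaviour the speaker notes describe.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def webhdfs_url(namenode, path, op="OPEN", port=50070, **params):
    """Build a WebHDFS REST URL for `path` (hypothetical helper).

    The /webhdfs/v1 prefix and the `op` query parameter follow the
    WebHDFS REST API; host and port are assumptions from slide 8.
    """
    query = urlencode({"op": op, **params})
    return f"http://{namenode}:{port}/webhdfs/v1{quote(path)}?{query}"


def read_file(namenode, path, port=50070):
    """Fetch a file's bytes over HTTP (needs a reachable cluster).

    urlopen follows the NameNode's 307 redirect to the DataNode
    that actually serves the data.
    """
    with urlopen(webhdfs_url(namenode, path, port=port)) as response:
        return response.read()


if __name__ == "__main__":
    # Only builds and prints the URL; no cluster is contacted here.
    print(webhdfs_url("nnode", "/results/part-r-00000.csv"))
```

Calling `read_file("nnode", "/results/part-r-00000.csv")` against a cluster with `dfs.webhdfs.enabled=true` should return the CSV rows shown on the slide, assuming the caller can reach both the NameNode and the DataNodes.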

Editor's notes

  1. Picking on what is really new in this release compared to just merges and stability, WebHDFS is something interesting. Set one config option and the DNs and NNs become web servers (using the chosen auth mechanism), offering read and write access to the data. This is integral to the cluster: you ask the NN for data, which triggers a 307 redirect to a DN with the data, which serves it up. The redirect is handled transparently by all HTTP clients set up to follow redirects.
  2. This is what we're going to be shipping based on Hadoop 1.0, a packaging of the core Hadoop stack with management tooling
  3. There's a set of NoSQL databases running on or near Hadoop. Apache HBase is the key one: look at the Facebook papers on FB chat to see how this works in the field. Cassandra is not directly dependent on Hadoop, but you can run Pig and Hive queries against its data, and it implements the HDFS filesystem API so you can host TTs on the same nodes as your Cassandra data and get data-local work. Accumulo is going to be mentioned as it is in incubation, donated to the ASF by the NSA. Apparently it has good security on access to keys and values, which shows that some orgs put security ahead of other features in NoSQL-land, and that government orgs are starting to play in this space, and contribute code back.
  4. Don't write at the Java level if you can help it; both Pig and Hive are a lot more productive. SQL houses should play with Hive. Pig is very good for experimentation, and its ability to call User Defined Functions lets you re-use tuned Java libraries such as LinkedIn's DataFu.
  5. Lots of ways to get data in. Most are focused on streaming from other servers in the same datacentre, like web servers, and collecting the logs. Scribe is designed to scale up well, with the option of discarding data under heavy load. Kafka is from LinkedIn, nice code which can hook up behind log4j.
  6. If you are doing anything w/ social networks, connecting events, locations together etc, the graph layer should be of interest: it's up and coming as the next layer in the stack. There are two projects in the Apache incubator. Hama: graph layer with a big driver being a telco. Giraph: ex Y!, LinkedIn are using this. There's a workshop after Berlin Buzzwords on "beyond MR" that I'm co-organising; Giraph will be one of the topics there (along w/ YARN and Stratosphere).
  7. Hadoop
  8. This is the architecture of HDFS HA, skipping bits of the details and the roadmap of when features come out. Active/Standby HA, not shared-write (much, much harder). Failover initially manual, moving to automated. Failover controllers monitor NN health, and heartbeat to ZK so that others in the ZK farm can detect failures. DNs report to both NNs, but only listen to one.
  9. Hadoop 2 breaks up the JT into two parts: the Resource Manager, which manages allocation of resources on servers, and the JT, which now becomes one of the possible “Application Masters” that can be deployed in a cluster. Breaking this up allows you to run different JTs for different users and different versions of the MR APIs (Facebook do this in their clusters with a static striping of TTs today), and to run other topology-aware applications.
  10. The NoSQL business plan is a key issue here: politics and marketing, not technology. DB business pricing always put an upper financial limit on big data. Oracle liked to own the customer data (and had loyal DBA support). The move to vertical solutions promised best hardware and discounting opportunities, but removed flexibility ('the IBM model'). Hadoop challenges this: generic servers with many HDDs, open source software. They will need to add something to Hadoop/HDFS that stops you moving away or getting support from others. Looking at the hardware, that could either be very-low latency IPC (benefits?) or something integrating SSD into the system (preheating SSD caching of queued job data, …?). Closing on a brighter note, my colleagues and I have tales of terror from playing w/ JVM options on a big cluster, as you can be confident of reaching all corner cases within a short period of time. If Oracle start using Hadoop as a driver for JVM performance and qualification, and return those tweaks to OpenJDK, we all benefit.
  11. Last but very much not least, there's growing integration of Hadoop in the OSS world. At the app level, Spring has a Spring Data for Hadoop project in Beta, which lets you integrate HDFS, MR and Pig jobs within a Spring application, as well as Cascading. You can do workflows here and really integrate with enterprise apps, especially if you use Spring already. Cascading, the Hadoop workflow language, has moved to an Apache License, to remove worries about GPL contamination of your code. Also of interest is the fact that the Linux vendors are taking Hadoop seriously, which can only improve testing and stability of Hadoop. Finally, off this sheet: an R connector for Hadoop, so the statisticians get integration from their world, R, to the new datasets.
  12. Facebook, Prineville: 45 PB, one single cluster. Yahoo!: 180 PB. It means that Hadoop installations are becoming the largest known storage and compute systems on the planet. It's unlikely anyone in this audience's storage or B/W requirements will be as big, but for those in the audience who want them to become as big, Hadoop makes it possible both technically and financially.
  13. The other thing it means is this: nothing else has the momentum and the support. People may say "ours is better", but that's like saying Solaris was better than Linux, or the 68K was better than the Intel 8086. Better doesn't win. More valuable does, and because of its growing support, layers above, adoption and the ecosystem, Hadoop has the edge. This isn't an excuse to get complacent: Spring killed Java EE, even though EJB once had everything going for it.
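
The DataNode behaviour in the HDFS HA note ("DNs report to both, but only listen to one") can be illustrated with a toy model. This is a sketch under stated assumptions, not HDFS code: the `DataNode` class, its method names and the NameNode ids `nn1`/`nn2` are all hypothetical, and real fencing involves more than a single "active" field.

```python
class DataNode:
    """Toy DataNode: reports to every NameNode, obeys only the active one."""

    def __init__(self, namenodes, active):
        self.namenodes = set(namenodes)
        if active not in self.namenodes:
            raise ValueError(f"unknown NameNode: {active}")
        self.active = active
        self.executed = []  # commands this DataNode has accepted

    def block_report_targets(self):
        # Block reports go to active AND standby, so the standby's
        # view of block locations stays warm for a fast failover.
        return sorted(self.namenodes)

    def failover(self, new_active):
        # In the slide's design this decision comes from the
        # FailoverControllers via ZooKeeper; here it is just a call.
        if new_active not in self.namenodes:
            raise ValueError(f"unknown NameNode: {new_active}")
        self.active = new_active

    def handle_command(self, sender, command):
        # Fencing: drop commands from any NameNode that is not active.
        if sender != self.active:
            return False
        self.executed.append(command)
        return True


if __name__ == "__main__":
    dn = DataNode(["nn1", "nn2"], active="nn1")
    print(dn.block_report_targets())
    print(dn.handle_command("nn2", "delete blk_1"))  # standby is ignored
    dn.failover("nn2")
    print(dn.handle_command("nn2", "delete blk_1"))  # new active is obeyed
```

The point of the design is that a stale NameNode that still believes it is active cannot damage data, because the DataNodes stop acting on its commands once they have switched to the new active.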