Hadoop 23 (dotNext): Experiences, Customer Impact & Deployment
  Hadoop User Group Sunnyvale Meetup – 17 October 2012
  Viraj Bhat: viraj@yahoo-inc.com
About Me
 • Principal Engineer in the Yahoo! Grid Team since May 2008
 • PhD from Rutgers University, NJ
    – Specialization in Data Streaming, Grid, Autonomic Computing
 • Worked on streaming data from live simulations executing in
   NERSC (CA), ORNL (TN) to Princeton Plasma Physics Lab (PPPL -
   NJ)
    – Library introduces less than 5% overhead on computation
 • PhD Thesis on In-Transit data processing for peta-scale simulation
   workflows
 • Developed CorbaCoG kit for Globus
 • Active contributor to Apache Hadoop, Pig, HCat and developer of
   Hadoop Vaidya



                                   -2-
Agenda

 • Overview and Introduction
 • YARN
 • Federation
 • Hadoop 23 Experiences




                               -3-
Hadoop Technology Stack at Yahoo!
 • HDFS – Distributed File System
 • Map/Reduce – Data Processing Paradigm
 • HBase and HFile – columnar storage
 • PIG – Data Processing Language
 • HIVE – SQL-like query processing language
 • HCatalog – Table abstraction on top of big data; allows interaction
   with Pig and Hive
 • Oozie – Workflow Management System

 [Diagram: technology stack showing Oozie, HCatalog, Hive, PIG, Map Reduce,
  HBase, File Format (HFile) and HDFS]


                                         -4-
Hadoop 0.23 (dotNext) Highlights
 • First major Hadoop release adopted by Yahoo! in over 2
   years (after Hadoop 0.20)
    – Built and stabilized by the Yahoo! Champaign Hadoop team
 • Primary focus is scalability
    – YARN aka MRv2 – Job run reliability
        • Agility & Evolution
    – HDFS Federation – larger namespace & scalability
        • Larger aggregated namespace
        • Helps with storage consolidation at Yahoo!
        • Undergoing customer testing

 • Hadoop 23 release does not target availability
    – Addressed in Hadoop 2.0 and beyond

                                    -5-
Hadoop 23 Story at Yahoo!
 • Extra effort was made at Yahoo! to certify applications
   with Hadoop 23
 • Sufficient time was provided for users to test their
   applications on Hadoop 23
 • Users are encouraged to get accounts on a sandbox cluster
   which has Hadoop 23 installed, to test whether their
   applications run
 • Roll Out Plan – In Progress
    – Q4 2012 through Q1 2013: Hadoop 23 will be installed in
      a phased manner on 50k nodes at Yahoo!
    – 3 Large Customer Grids were successfully upgraded to
      Hadoop 23

                             -6-
YET ANOTHER RESOURCE
NEGOTIATOR (YARN)

NEXT GENERATION OF HADOOP MAP-REDUCE


                 -7-
Hadoop MapReduce in Hadoop 1.0.2
 • JobTracker
    – Manages cluster resources and job
      scheduling
 • TaskTracker
    – Per-node agent
    – Manages tasks




                               -8-
Paradigm shift with Hadoop 23
 • Split up the two major functions of JobTracker
    – Cluster resource management
    – Application life-cycle management
 • MapReduce becomes a user-land library




                                -9-
Components of YARN
 • Resource Manager
    – Global resource scheduler
    – Hierarchical queues
 • Node Manager
    – Per-machine agent
    – Manages the life cycle of containers
    – Container resource monitoring
 • Application Master
    – Per-application
    – Manages application scheduling and task execution




                                  - 10 -
Architecture of YARN




                       - 11 -
Experiences of YARN – High Points
 • Scalable
    – Largest YARN cluster in the world built at Yahoo!, running
      Hadoop 0.23.3, with no scalability issues so far
    – Ran tests to validate that YARN should scale to 10,000 nodes
 • Surprisingly Stable
 • Web Services
 • Better Utilization of Resources at Yahoo!
    – No fixed partitioning between Map and Reduce Tasks
    – Latency from resource available to resource re-assigned is far
      better than 1.x in big clusters
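
The utilization point above can be illustrated with a toy model. This is only a sketch: the slot counts, memory sizes and function names are hypothetical and do not come from the slides or from YARN itself. It compares how many tasks one node can run when map and reduce capacity is fixed in advance versus when the same memory is handed out as generic containers.

```python
def fixed_slot_tasks(map_slots: int, reduce_slots: int,
                     pending_maps: int, pending_reduces: int) -> int:
    """Tasks a node can run when map and reduce slots are partitioned up front."""
    return min(map_slots, pending_maps) + min(reduce_slots, pending_reduces)


def container_tasks(node_memory_mb: int, map_mb: int, reduce_mb: int,
                    pending_maps: int, pending_reduces: int) -> int:
    """Tasks a node can run when memory is one pool of containers.

    Greedy illustration only: maps are placed first, reduces take what is left.
    """
    maps = min(pending_maps, node_memory_mb // map_mb)
    remaining_mb = node_memory_mb - maps * map_mb
    reduces = min(pending_reduces, remaining_mb // reduce_mb)
    return maps + reduces


# Hypothetical node: 8 map slots of 1024 MB plus 4 reduce slots of 2048 MB,
# which is the same 16384 MB when treated as a single pool.
MAP_MB, REDUCE_MB = 1024, 2048
NODE_MB = 8 * MAP_MB + 4 * REDUCE_MB

if __name__ == "__main__":
    # Map-heavy phase: 20 maps waiting, no reduces ready yet.
    print(fixed_slot_tasks(8, 4, 20, 0))
    print(container_tasks(NODE_MB, MAP_MB, REDUCE_MB, 20, 0))
```

In the map-only phase of this made-up example the reduce slots sit idle under fixed partitioning, while the pooled model can give their memory to map tasks.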




                                  - 14 -
Performance (0.23.3 vs. 1.0.2)
 • HDFS
    – Read (Throughput 5.37% higher)
 • MapReduce
    – Sort (Runtime 4.59% smaller, Throughput 3.98% higher)
    – Shuffle (Shuffle Time 13.25% smaller)
    – Gridmix (Runtime 5.29% smaller)
    – Small Jobs – Uber AM (Word Count 3.5x faster, 27.7x
      fewer resources)



                             - 15 -
Synergy with new Compute Paradigms
 • MPI (www.open-mpi.org nightly snapshot)
 • Machine Learning (Spark)
 • Real-time Streaming (S4 and Storm coming soon)
 • Graph Processing (GIRAPH-13 coming soon)




                         - 16 -
The Not So Good
 • Oozie on YARN can have potential deadlocks (MAPREDUCE-4304)
    – UberAM can mitigate this
 • Some UI scalability issues (YARN-151, MAPREDUCE-4720)
    – Some pages download very large tables and paginate in
      JavaScript
 • Minor incompatibilities in the distributed cache
 • No generic history server (MAPREDUCE-3061)
 • AM failures are hard to debug (MAPREDUCE-4428, MAPREDUCE-3688)




                                 - 17 -
HADOOP 23 FEATURES
HDFS FEDERATION

           - 18 -
Non Federated HDFS Architecture
 • Single Namespace Volume
    – Namespace Volume = Namespace + Blocks
 • Single namenode with a namespace
    – Entire namespace is in memory
    – Provides Block Management
 • Datanodes store block replicas
    – Block files stored on local file system

 [Diagram: Namespace layer with a single Namenode holding the namespace (NS)
  and Block Management; Block Storage layer with Datanodes providing Storage]
                                                             - 20 -
Limitation - Single Namespace
 • Scalability
    – Storage scales horizontally - namespace doesn’t
    – Limited number of files, dirs and blocks
        • 250 million files and blocks at 64GB Namenode heap size
 • Performance
    – File system operations throughput limited by a single node
        • 120K read ops/sec and 6000 write ops/sec
 • Poor Isolation
    – All the tenants share a single namespace
        • Separate volume for tenants is not possible
    – Lacks separate namespace for different categories of applications
        • Experimental apps can affect production apps
        • Example - HBase could use its own namespace
 • Isolation is a problem, even in a small cluster
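
A rough back-of-the-envelope reading of the scalability figure above, as a sketch only. It assumes that "64GB" means 64 GiB and that the whole heap is attributed to the 250 million files and blocks; the slide states neither, so the result is an order-of-magnitude estimate rather than a measured per-object cost.

```python
# Assumptions (not from the slides): 64GB is read as 64 GiB, and the entire
# Namenode heap is spent on file and block objects.
HEAP_BYTES = 64 * 2**30
OBJECTS = 250_000_000


def bytes_per_object(heap_bytes: int, objects: int) -> float:
    """Average Namenode heap available per file or block object."""
    return heap_bytes / objects


if __name__ == "__main__":
    print(f"{bytes_per_object(HEAP_BYTES, OBJECTS):.0f} bytes of heap per object")
```

Under these assumptions each namespace object has a few hundred bytes of heap budget, which is why the namespace size is bounded by the memory of a single Namenode while block storage keeps scaling with the number of Datanodes.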


                                     - 21 -
HDFS Federation
 • An administrative/operational feature for better managing resources required at Yahoo!
 • Multiple independent Namenodes and Namespace Volumes in a cluster
    – Namespace Volume = Namespace + Block Pool
 • Block Storage as generic storage service
    – Set of blocks for a Namespace Volume is called a Block Pool
    – DNs store blocks for all the Namespace Volumes – no partitioning

 [Diagram: Namespace layer with Namenodes NN-1 ... NN-k ... NN-n holding
  NS1 ... NS k ... Foreign NS n; Block Storage layer with Block Pools
  Pool 1 ... Pool k ... Pool n on Common Storage shared by Datanode 1,
  Datanode 2 ... Datanode m]
                                                         - 22 -
Managing Namespaces
 • Federation has multiple namespaces
 • Client-side implementation of mount tables
    – No single point of failure
    – No hotspot for root and top level directories
 • Applications using Federation should use the viewfs:// scheme
    – The viewfs:// URI scheme can be used as the default file system,
      replacing the hdfs:// scheme

 [Diagram: client-side mount-table rooted at / with entries data, project,
  home and tmp, mapped onto namespaces NS1, NS2, NS3 and NS4]
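
The idea of a client-side mount table can be sketched as follows. This is a toy model, not the ViewFs implementation: the top-level directory names come from the diagram on this slide, while the Namenode host names, the directory-to-namespace assignment and the `resolve` helper are hypothetical.

```python
from typing import Dict

# Hypothetical mount table: host names and the mapping of directories to
# Namenodes are invented for illustration.
MOUNT_TABLE: Dict[str, str] = {
    "/data": "hdfs://nn1.example.com/data",
    "/project": "hdfs://nn2.example.com/project",
    "/home": "hdfs://nn3.example.com/home",
    "/tmp": "hdfs://nn4.example.com/tmp",
}


def resolve(view_path: str, mount_table: Dict[str, str]) -> str:
    """Map a client-visible path to the Namenode URI that serves it."""
    for mount_point, target in mount_table.items():
        # Match whole path components only, so /database does not hit /data.
        if view_path == mount_point or view_path.startswith(mount_point + "/"):
            return target + view_path[len(mount_point):]
    raise FileNotFoundError(f"no mount point covers {view_path}")


if __name__ == "__main__":
    print(resolve("/data/feeds/2012", MOUNT_TABLE))
```

Because the lookup runs entirely in the client, no single server has to answer for the root or the top-level directories, which is the point made in the bullets above.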




                                   - 23 -
Hadoop 23 Federation
 • Federation testing is underway
    – Many ecosystem components such as Pig have completed testing
    – Real load testing will only be possible when multiple co-located
      Grids transition to Hadoop 23
 • Adoption of Federation will allow for better consolidation of
   storage resources
    – Many data feeds are duplicated across various Grids




                                 - 24 -
HADOOP 23 IMPACT ON END
USERS AND ECOSYSTEM
DEVELOPERS




           - 25 -
Hadoop 23 Command Line
 • New environment variables:
    – $HADOOP_COMMON_HOME
    – $HADOOP_MAPRED_HOME
    – $HADOOP_HDFS_HOME

 • Using the hadoop command to execute mapred or hdfs sub-commands
   has been deprecated
    – Old usage (will still work)
             – hadoop queue -showacls
             – hadoop fs -ls
             – hadoop job -kill <job_id>

    – New usage
             – mapred queue -showacls
             – hdfs dfs -ls <path>
             – mapred job -kill <job_id>
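
For scripts that still contain the old invocations, the old/new pairs above could be applied mechanically. The helper below is a minimal sketch under stated assumptions: `modernize` and `NEW_ENTRY_POINTS` are hypothetical names, the mapping covers only the three pairs listed on this slide, and `hadoop job` is assumed to be the old form of the job command.

```python
import shlex
from typing import Dict, List

# Only the sub-commands shown on the slide; anything else passes through.
NEW_ENTRY_POINTS: Dict[str, List[str]] = {
    "queue": ["mapred", "queue"],
    "job": ["mapred", "job"],
    "fs": ["hdfs", "dfs"],
}


def modernize(command: str) -> str:
    """Rewrite an old-style `hadoop <sub-command> ...` line to its new form."""
    args = shlex.split(command)
    if len(args) >= 2 and args[0] == "hadoop" and args[1] in NEW_ENTRY_POINTS:
        args = NEW_ENTRY_POINTS[args[1]] + args[2:]
    # Re-quote each argument so paths with spaces survive the round trip.
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    for old in ("hadoop queue -showacls",
                "hadoop fs -ls /user/alice",
                "hadoop job -kill job_201210170001_0042"):
        print(old, "->", modernize(old))
```

The job id and path in the usage lines are made up. Since the old forms still work, a rewrite like this is a cleanup step rather than a prerequisite for running on Hadoop 23.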




                                     - 26 -
Hadoop 23 Map Reduce
 • An application compiled against Hadoop 1.0.2 will not work as-is
   on Hadoop 0.23

 • Hadoop 0.23 is API compatible with Hadoop 0.20.205 and
   Hadoop 1.0.2
     – Not binary compatible

 • Hadoop Java programs will not require any code change; however,
   users have to recompile with Hadoop 0.23
     – If code change is required, please let us know.

 • Streaming applications should work without modifications

 • Hadoop Pipes (using C/C++ interface) applications will require
   recompilation with new libraries


                                   - 27 -
Hadoop 23 Compatibility - Pig
 • Pig versions 0.9.2, 0.10 and beyond will be fully supported on Hadoop
   0.23
     – Packaging problem: generating 2 different pig.jar files for the
       different versions of Hadoop

 • No changes to the Pig script if it uses relative paths in HDFS

 • Changes to the Pig script are required if an absolute HDFS path
   (hdfs://) is used
     – HDFS Federation part of Hadoop 23 requires the usage of viewfs:// (HDFS
       discussion to follow)
     – Change the hdfs:// scheme to the viewfs:// scheme

 • Java UDFs must be re-compiled with Hadoop 23 compatible jar
     – Custom Loaders and Storers in Pig are affected
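
One way to find the scripts that need attention is to scan them for absolute hdfs:// URIs. The following is a hedged sketch, not a Pig tool: the function name, the regular expression and the sample script (including its host name and paths) are all hypothetical, and Pig comments or parameter substitution are not handled.

```python
import re
from typing import List, Tuple

# Matches an absolute HDFS URI up to the closing quote, e.g.
# hdfs://namenode:8020/data/in
HDFS_URI = re.compile(r"hdfs://[^\s'\";)]+")


def find_absolute_hdfs_paths(script: str) -> List[Tuple[int, str]]:
    """Return (line number, URI) for every hdfs:// URI found in a script."""
    hits = []
    for lineno, line in enumerate(script.splitlines(), start=1):
        for match in HDFS_URI.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# Made-up Pig script: lines 1 and 3 use absolute paths, line 2 is relative.
SAMPLE = """\
raw = LOAD 'hdfs://nn1.example.com:8020/data/clicks' USING PigStorage();
users = LOAD 'input/users' USING PigStorage();
STORE raw INTO 'hdfs://nn1.example.com:8020/tmp/out';
"""

if __name__ == "__main__":
    for lineno, uri in find_absolute_hdfs_paths(SAMPLE):
        print(f"line {lineno}: {uri}")
```

The sketch only reports candidates. What each path should become under viewfs:// depends on the cluster's mount table, so the replacement itself is left to the script owner.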




                                       - 28 -
Hadoop 23 Compatibility - Oozie
 • Oozie 3.1.4 and later versions are compatible with Hadoop 23

 • No changes in workflow definition or job properties
    – No need to redeploy the Oozie coordinator jobs

 • Java code, streaming, pipes apps need to be recompiled with
   Hadoop 0.23 jars for binary compatibility

 • Existing user workflow and coordinator definitions (XML) should
   continue to work as expected

 • It is the user's responsibility to package the right Hadoop 23
   compatible jars
    – Hadoop 23 compatible pig.jar needs to be packaged for Pig action



                                     - 29 -
Hadoop 23 - Oozie Dev Challenges
 • Learning curve for maven builds
    – Build iterations, local maven staging repo staleness
 • Queue configurations and container allocations require revisiting
   the design
 • Many iterations of Hadoop 23 deployment
    – Overhead to test Oozie compatibility with each new release
 • Initial deployment of YARN did not have a view of the
   Application Master (AM) logs
    – Manual ssh to AM for debugging launcher jobs




                                 - 30 -
Hadoop 23 Compatibility - Hive
 • Hive versions 0.9 and later are fully supported

 • Hive SQL/scripts should continue to work without any
   modification

 • Java UDFs in Hive must be re-compiled with Hadoop
   23 compatible hive.jar




                            - 31 -
Hadoop 23 – Hive Dev Challenges
 • Code in MiniMRCluster that fetches the stack trace from the
   JobTracker is deprecated and no longer works
    – Extra time spent debugging and rewriting test cases
 • Incompatibility of HDFS commands between Hadoop 1.0.2
   and 0.23
    – -rmr vs. -rm -r
    – mkdir vs. mkdir -p
    – Results in fixing tests in new ways or inventing workarounds
      so that they run in both Hadoop 1.0.2 and Hadoop 0.23
 • As Hive uses the MapRed APIs, more work is required for
   certification
    – Would be good to move to the MapReduce APIs (as Pig does)
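
A workaround of the kind mentioned above could choose the shell arguments by Hadoop version. This is a sketch only: the helper names are hypothetical, the version check treats 0.23.x and 2.x as "new" and 0.20.x and 1.x as "old", and it assumes, as the slide's pairing suggests, that plain mkdir on 1.0.2 behaves like mkdir -p on 0.23.

```python
from typing import List


def uses_new_fs_shell(version: str) -> bool:
    """True for 0.23.x and 2.x or later; False for 0.20.x and 1.x."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return major >= 2 or (major == 0 and minor >= 23)


def recursive_delete(version: str, path: str) -> List[str]:
    """Arguments for a recursive delete on the given Hadoop version."""
    if uses_new_fs_shell(version):
        return ["hadoop", "fs", "-rm", "-r", path]
    return ["hadoop", "fs", "-rmr", path]


def make_dirs(version: str, path: str) -> List[str]:
    """Arguments for creating a directory together with its parents."""
    if uses_new_fs_shell(version):
        return ["hadoop", "fs", "-mkdir", "-p", path]
    return ["hadoop", "fs", "-mkdir", path]


if __name__ == "__main__":
    for version in ("1.0.2", "0.23.3"):
        print(version, recursive_delete(version, "/tmp/test_out"))
        print(version, make_dirs(version, "/tmp/test_in/a/b"))
```

Returning an argument list rather than a string keeps the result usable with `subprocess.run` without extra quoting. The paths in the usage lines are made up.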


                                - 32 -
Hadoop 23 - HCat
 • HCat versions 0.4 and later are certified to work with
   Hadoop 23




                            - 33 -
Hadoop 23 Job History Log Format
 • History API & log format have changed
    – Affects all applications and tools that directly use the Hadoop
      History API
    – Logs stored as Avro serialization in JSON format
 • Affected many tools which rely on Job Logs
    – Hadoop Vaidya – had to be rewritten with the new
      JobHistoryParser




                                  - 34 -
Hadoop 23 Queues
 • Hadoop 23 has support for Hierarchical Queues
    – At Yahoo! it has been configured as a flat queue to limit
      customer disruption
    – Customer testing is being conducted




                                 - 35 -
32/64 bit JDK 1.7
 • Currently certifying Hadoop 23 and its ecosystem on 32 bit
   1.7 JDK

 • 64 bit 1.7 JDK certification for Hadoop and its ecosystem
   will be taken up in Q1 2013




                               - 36 -
Hadoop 23 Operations and Services
 • Grid Operations at Yahoo! transitioned Hadoop 1.0.2
   Namenode to Hadoop 23 smoothly
    – No data was lost
 • Matched the container configurations on Hadoop 23 clusters
   with the old Map Reduce slots
    – Map Reduce slots were configured based on memory, hence the
      transition was smooth
 • Scheduling, planning and migration of Hadoop 1.0.2
   applications to Hadoop 23 for 100+ customers was a
   major task for solutions
    – Many issues caught at the last minute needed emergency
      fixes (globbing, pig.jar packaging, change in mkdir command)
    – Hadoop 0.23.4 build planned

                                - 37 -
Acknowledgements
 • YARN – Robert Evans, Thomas Graves, Jason Lowe
 • Pig – Rohini Palaniswamy
 • Hive and HCatalog – Chris Drome
 • Oozie – Mona Chitnis and Mohammad Islam
 • Services and Operations – Rajiv Chittajallu and Kimsukh
   Kundu




                              - 38 -
References
 • 0.23 Documentation
    – http://people.apache.org/~acmurthy/hadoop-0.23/
 • 0.23 Release Notes
    – http://people.apache.org/~acmurthy/hadoop-0.23/hadoop-project-dist/hadoop-common/releasenotes.html
 • YARN Documentation
    – http://people.apache.org/~acmurthy/hadoop-0.23/hadoop-yarn/hadoop-yarn-site/YARN.html
 • HDFS Federation Documentation
    – http://people.apache.org/~acmurthy/hadoop-0.23/hadoop-yarn/hadoop-yarn-site/Federation.html



                               - 39 -

More Related Content

What's hot

HBase @ Twitter
HBase @ TwitterHBase @ Twitter
HBase @ Twitterctrezzo
Β 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryCloudera, Inc.
Β 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop EcosystemJ Singh
Β 
Hadoop Ecosystem Overview
Hadoop Ecosystem OverviewHadoop Ecosystem Overview
Hadoop Ecosystem OverviewGerrit van Vuuren
Β 
Presentation
PresentationPresentation
Presentationch samaram
Β 
Deploying Grid Services Using Hadoop
Deploying Grid Services Using HadoopDeploying Grid Services Using Hadoop
Deploying Grid Services Using HadoopGeorge Ang
Β 
TriHUG - Beyond Batch
TriHUG - Beyond BatchTriHUG - Beyond Batch
TriHUG - Beyond Batchboorad
Β 
NoSQL: Cassadra vs. HBase
NoSQL: Cassadra vs. HBaseNoSQL: Cassadra vs. HBase
NoSQL: Cassadra vs. HBaseAntonio Severien
Β 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteDataWorks Summit
Β 
White paper hadoop performancetuning
White paper hadoop performancetuningWhite paper hadoop performancetuning
White paper hadoop performancetuningAnil Reddy
Β 
Hadoop 2 - Beyond MapReduce
Hadoop 2 - Beyond MapReduceHadoop 2 - Beyond MapReduce
Hadoop 2 - Beyond MapReduceUwe Printz
Β 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconYiwei Ma
Β 
Hadoop 101
Hadoop 101Hadoop 101
Hadoop 101EMC
Β 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars GeorgeJAX London
Β 
Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010Gavin Heavyside
Β 
Hadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
Hadoop World 2011: HDFS Federation - Suresh Srinivas, HortonworksHadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
Hadoop World 2011: HDFS Federation - Suresh Srinivas, HortonworksCloudera, Inc.
Β 

What's hot (19)

Introduction to h base
Introduction to h baseIntroduction to h base
Introduction to h base
Β 
Cloud computing era
Cloud computing eraCloud computing era
Cloud computing era
Β 
HBase @ Twitter
HBase @ TwitterHBase @ Twitter
HBase @ Twitter
Β 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster Recovery
Β 
The Hadoop Ecosystem
The Hadoop EcosystemThe Hadoop Ecosystem
The Hadoop Ecosystem
Β 
Hadoop Ecosystem Overview
Hadoop Ecosystem OverviewHadoop Ecosystem Overview
Hadoop Ecosystem Overview
Β 
Presentation
PresentationPresentation
Presentation
Β 
Deploying Grid Services Using Hadoop
Deploying Grid Services Using HadoopDeploying Grid Services Using Hadoop
Deploying Grid Services Using Hadoop
Β 
TriHUG - Beyond Batch
TriHUG - Beyond BatchTriHUG - Beyond Batch
TriHUG - Beyond Batch
Β 
NoSQL: Cassadra vs. HBase
NoSQL: Cassadra vs. HBaseNoSQL: Cassadra vs. HBase
NoSQL: Cassadra vs. HBase
Β 
In-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great TasteIn-memory Caching in HDFS: Lower Latency, Same Great Taste
In-memory Caching in HDFS: Lower Latency, Same Great Taste
Β 
Hadoop ppt1
Hadoop ppt1Hadoop ppt1
Hadoop ppt1
Β 
White paper hadoop performancetuning
White paper hadoop performancetuningWhite paper hadoop performancetuning
White paper hadoop performancetuning
Β 
Hadoop 2 - Beyond MapReduce
Hadoop 2 - Beyond MapReduceHadoop 2 - Beyond MapReduce
Hadoop 2 - Beyond MapReduce
Β 
Facebook keynote-nicolas-qcon
Facebook keynote-nicolas-qconFacebook keynote-nicolas-qcon
Facebook keynote-nicolas-qcon
Β 
Hadoop 101
Hadoop 101Hadoop 101
Hadoop 101
Β 
Intro to HBase - Lars George
Intro to HBase - Lars GeorgeIntro to HBase - Lars George
Intro to HBase - Lars George
Β 
Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010Introduction to Hadoop - ACCU2010
Introduction to Hadoop - ACCU2010
Β 
Hadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
Hadoop World 2011: HDFS Federation - Suresh Srinivas, HortonworksHadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
Hadoop World 2011: HDFS Federation - Suresh Srinivas, Hortonworks
Β 

Similar to Oct 2012 HUG: Hadoop .Next (0.23) - Customer Impact and Deployment

Petabyte scale on commodity infrastructure
Petabyte scale on commodity infrastructurePetabyte scale on commodity infrastructure
Petabyte scale on commodity infrastructureelliando dias
Β 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDYVenneladonthireddy1
Β 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop OverviewBrian Enochson
Β 
Scaling Storage and Computation with Hadoop
Scaling Storage and Computation with HadoopScaling Storage and Computation with Hadoop
Scaling Storage and Computation with Hadoopyaevents
Β 
Big data and hadoop anupama
Big data and hadoop anupamaBig data and hadoop anupama
Big data and hadoop anupamaAnupama Prabhudesai
Β 
Distributed Data processing in a Cloud
Distributed Data processing in a CloudDistributed Data processing in a Cloud
Distributed Data processing in a Cloudelliando dias
Β 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File Systemelliando dias
Β 
Bigdata workshop february 2015
Bigdata workshop  february 2015 Bigdata workshop  february 2015
Bigdata workshop february 2015 clairvoyantllc
Β 
Strata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and FutureStrata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and FutureCloudera, Inc.
Β 
Hadoop - HDFS
Hadoop - HDFSHadoop - HDFS
Hadoop - HDFSKavyaGo
Β 
Managing growth in Production Hadoop Deployments
Managing growth in Production Hadoop DeploymentsManaging growth in Production Hadoop Deployments
Managing growth in Production Hadoop DeploymentsDataWorks Summit
Β 
Giraffa - November 2014
Giraffa - November 2014Giraffa - November 2014
Giraffa - November 2014Plamen Jeliazkov
Β 
Hadoop: Components and Key Ideas, -part1
Hadoop: Components and Key Ideas, -part1Hadoop: Components and Key Ideas, -part1
Hadoop: Components and Key Ideas, -part1Sandeep Kunkunuru
Β 

Similar to Oct 2012 HUG: Hadoop .Next (0.23) - Customer Impact and Deployment (20)

Petabyte scale on commodity infrastructure
Petabyte scale on commodity infrastructurePetabyte scale on commodity infrastructure
Petabyte scale on commodity infrastructure
Β 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
Β 
Asbury Hadoop Overview
Asbury Hadoop OverviewAsbury Hadoop Overview
Asbury Hadoop Overview
Β 
Scaling Storage and Computation with Hadoop
Scaling Storage and Computation with HadoopScaling Storage and Computation with Hadoop
Scaling Storage and Computation with Hadoop
Β 
Big data and hadoop anupama
Big data and hadoop anupamaBig data and hadoop anupama
Big data and hadoop anupama
Β 
Distributed Data processing in a Cloud
Distributed Data processing in a CloudDistributed Data processing in a Cloud
Distributed Data processing in a Cloud
Β 
Hadoop Distributed File System
Hadoop Distributed File SystemHadoop Distributed File System
Hadoop Distributed File System
Β 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
Β 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
Β 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
Β 
Bigdata workshop february 2015
Bigdata workshop  february 2015 Bigdata workshop  february 2015
Bigdata workshop february 2015
Β 
Strata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and FutureStrata + Hadoop World 2012: HDFS: Now and Future
Strata + Hadoop World 2012: HDFS: Now and Future
Β 
Hadoop - HDFS
Hadoop - HDFSHadoop - HDFS
Hadoop - HDFS
Β 
Managing growth in Production Hadoop Deployments
Managing growth in Production Hadoop DeploymentsManaging growth in Production Hadoop Deployments
Managing growth in Production Hadoop Deployments
Β 
Giraffa - November 2014
Giraffa - November 2014Giraffa - November 2014
Giraffa - November 2014
Β 
Hadoop: Components and Key Ideas, -part1
Hadoop: Components and Key Ideas, -part1Hadoop: Components and Key Ideas, -part1
Hadoop: Components and Key Ideas, -part1
Β 
Hadoop DB
Hadoop DBHadoop DB
Hadoop DB
Β 
Evolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage SubsystemEvolving HDFS to Generalized Storage Subsystem
Evolving HDFS to Generalized Storage Subsystem
Β 
Drop acid
Drop acidDrop acid
Drop acid
Β 
Big data Hadoop
Big data  Hadoop   Big data  Hadoop
Big data Hadoop
Β 

More from Yahoo Developer Network

Developing Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaDeveloping Mobile Apps for Performance - Swapnil Patel, Verizon Media
Developing Mobile Apps for Performance - Swapnil Patel, Verizon MediaYahoo Developer Network
Β 
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...
Athenz - The Open-Source Solution to Provide Access Control in Dynamic Infras...Yahoo Developer Network
Β 
Athenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanAthenz & SPIFFE, Tatsuya Yano, Yahoo Japan
Athenz & SPIFFE, Tatsuya Yano, Yahoo JapanYahoo Developer Network
Β 
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...
Athenz with Istio - Single Access Control Model in Cloud Infrastructures, Tat...Yahoo Developer Network
Β 
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathBig Data Serving with Vespa - Jon Bratseth, Distinguished Architect, Oath
Big Data Serving with Vespa - Jon Bratseth, Distinguished Architect, OathYahoo Developer Network
Β 
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuHow @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenu
How @TwitterHadoop Chose Google Cloud, Joep Rottinghuis, Lohit VijayaRenuYahoo Developer Network
Β 
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolThe Future of Hadoop in an AI World, Milind Bhandarkar, CEO, Ampool
The Future of Hadoop in an AI World, Milind Bhandarkar, CEO, AmpoolYahoo Developer Network
Β 
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...
Apache YARN Federation and Tez at Microsoft, Anupam Upadhyay, Adrian Nicoara,...Yahoo Developer Network
Β 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...Yahoo Developer Network
Β 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathYahoo Developer Network
Β 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Yahoo Developer Network
Β 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathYahoo Developer Network
Β 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsYahoo Developer Network
Β 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Yahoo Developer Network
Β 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondYahoo Developer Network
Β 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Yahoo Developer Network
Β 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...Yahoo Developer Network
Β 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexYahoo Developer Network
Β 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsYahoo Developer Network
Β 

Oct 2012 HUG: Hadoop .Next (0.23) - Customer Impact and Deployment

  • 1. Hadoop 23 (dotNext): Experiences, Customer Impact & Deployment
      Hadoop User Group Sunnyvale Meet up – 17 October 2012
      Viraj Bhat: viraj@yahoo-inc.com
  • 2. About Me
      • Principal Engineer in the Yahoo! Grid Team since May 2008
      • PhD from Rutgers University, NJ
          – Specialization in Data Streaming, Grid, Autonomic Computing
      • Worked on streaming data from live simulations executing in NERSC (CA), ORNL (TN) to Princeton Plasma Physics Lab (PPPL - NJ)
          – Library introduces less than 5% overhead on computation
      • PhD Thesis on In-Transit data processing for peta-scale simulation workflows
      • Developed CorbaCoG kit for Globus
      • Active contributor to Apache Hadoop, Pig, HCat and developer of Hadoop Vaidya
  • 3. Agenda
      • Overview and Introduction
      • YARN
      • Federation
      • Hadoop 23 Experiences
  • 4. Hadoop Technology Stack at Yahoo!
      • HDFS – Distributed File System
      • Map/Reduce – Data Processing Paradigm
      • HBase and HFile – columnar storage
      • PIG – Data Processing Language
      • HIVE – SQL-like query processing language
      • HCatalog – Table abstraction on top of big data; allows interaction with Pig and Hive
      • Oozie – Workflow Management System
      [Stack diagram, top to bottom: Oozie; HCatalog; Hive and PIG; Map Reduce; HBase; File Format (HFile); HDFS]
  • 5. Hadoop 0.23 (dotNext) Highlights
      • First major Hadoop release adopted by Yahoo! in over 2 years (after Hadoop 0.20)
          – Built and stabilized by the Yahoo! Champaign Hadoop team
      • Primary focus is scalability
          – YARN aka MRv2 – Job run reliability
              • Agility & Evolution
          – HDFS Federation – larger namespace & scalability
              • Larger aggregated namespace
              • Helps with better storage consolidation at Yahoo!
              • Undergoing customer testing
      • Hadoop 23 release does not target availability
          – Addressed in Hadoop 2.0 and beyond
  • 6. Hadoop 23 Story at Yahoo!
      • Extra effort was taken at Yahoo! to certify applications with Hadoop 23
      • Sufficient time was provided for users to test their applications in Hadoop 23
      • Users are encouraged to get accounts on a sandbox cluster with Hadoop 23 installed to test whether their applications run
      • Roll Out Plan – In Progress
          – Q4 2012 through Q1 2013: Hadoop 23 will be installed in a phased manner on 50k nodes at Yahoo!
          – 3 Large Customer Grids were successfully upgraded to Hadoop 23
  • 7. YET ANOTHER RESOURCE NEGOTIATOR (YARN): NEXT GENERATION OF HADOOP MAP-REDUCE
  • 8. Hadoop MapReduce in Hadoop 1.0.2
      • JobTracker
          – Manages cluster resources and job scheduling
      • TaskTracker
          – Per-node agent
          – Manages tasks
  • 9. Paradigm shift with Hadoop 23
      • Split up the two major functions of JobTracker
          – Cluster resource management
          – Application life-cycle management
      • MapReduce becomes a user-land library
  • 10. Components of YARN
      • Resource Manager
          – Global resource scheduler
          – Hierarchical queues
      • Node Manager
          – Per-machine agent
          – Manages the life-cycle of containers
          – Container resource monitoring
      • Application Master
          – Per-application
          – Manages application scheduling and task execution
  • 14. Experiences of YARN – High Points
      • Scalable
          – Largest YARN cluster in the world built at Yahoo!, running Hadoop 0.23.3, with no scalability issues so far
          – Ran tests to validate that YARN should scale to 10,000 nodes
      • Surprisingly Stable
      • Web Services
      • Better Utilization of Resources at Yahoo!
          – No fixed partitioning between Map and Reduce Tasks
          – Latency from resource available to resource re-assigned is far better than 1.x in big clusters
  • 15. Performance (0.23.3 vs. 1.0.2)
      • HDFS
          – Read (Throughput 5.37% higher)
      • MapReduce
          – Sort (Runtime 4.59% smaller, Throughput 3.98% higher)
          – Shuffle (Shuffle Time 13.25% smaller)
          – Gridmix (Runtime 5.29% smaller)
          – Small Jobs – Uber AM (Word Count 3.5x faster, 27.7x fewer resources)
  • 16. Synergy with new Compute Paradigms
      • MPI (www.open-mpi.org nightly snapshot)
      • Machine Learning (Spark)
      • Real-time Streaming (S4 and Storm coming soon)
      • Graph Processing (GIRAPH-13 coming soon)
  • 17. The Not So Good
      • Oozie on YARN can have potential deadlocks (MAPREDUCE-4304)
          – UberAM can mitigate this
      • Some UI scalability issues (YARN-151, MAPREDUCE-4720)
          – Some pages download very large tables and paginate in JavaScript
      • Minor incompatibilities in the distributed cache
      • No generic history server (MAPREDUCE-3061)
      • AM failures hard to debug (MAPREDUCE-4428, MAPREDUCE-3688)
  • 18. HADOOP 23 FEATURES: HDFS FEDERATION
  • 19. Non Federated HDFS Architecture
  • 20. Non Federated HDFS Architecture
      • Single Namespace Volume
          – Namespace Volume = Namespace + Block Storage
      • Single namenode with a namespace
          – Entire namespace is in memory
          – Provides Block Management
      • Datanodes store block replicas
          – Block files stored on local file system
      [Diagram: Namespace layer (Namenode: NS, Block Management) above Block Storage layer (Datanodes, Storage)]
  • 21. Limitation - Single Namespace
      • Scalability
          – Storage scales horizontally; the namespace doesn’t
          – Limited number of files, dirs and blocks
              • 250 million files and blocks at 64GB Namenode heap size
      • Performance
          – File system operations throughput limited by a single node
              • 120K read ops/sec and 6000 write ops/sec
      • Poor Isolation
          – All the tenants share a single namespace
              • Separate volume for tenants is not possible
          – Lacks separate namespace for different categories of applications
              • Experimental apps can affect production apps
              • Example - HBase could use its own namespace
              • Isolation is a problem, even in a small cluster
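The scalability figure on slide 21 implies a rough heap cost per namespace object. The following is a minimal sketch of that arithmetic, not a measurement: it assumes "64GB" means 64 GiB and that heap use grows linearly with the number of files and blocks, neither of which the slide states.

```python
# Figures quoted on slide 21.
HEAP_GB = 64
OBJECTS = 250_000_000  # files and blocks combined

# Assumption: "GB" is read as GiB (2**30 bytes).
HEAP_BYTES = HEAP_GB * 2**30


def bytes_per_object(heap_bytes: int, objects: int) -> float:
    """Average Namenode heap consumed per file or block."""
    return heap_bytes / objects


def heap_gb_needed(objects: int, per_object_bytes: float) -> float:
    """Heap (in GiB) needed for a given object count, assuming linear growth."""
    return objects * per_object_bytes / 2**30


per_object = bytes_per_object(HEAP_BYTES, OBJECTS)
print(f"{per_object:.0f} bytes of heap per file or block")
print(f"{heap_gb_needed(1_000_000_000, per_object):.0f} GB heap for 1 billion objects")
```

Under these assumptions a single Namenode would need a heap four times larger to hold one billion objects, which is the pressure Federation relieves by spreading the namespace over several Namenodes.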
  • 22. HDFS Federation
      [Diagram: Namenodes NN-1 ... NN-k ... NN-n hold namespaces NS1 ... NS k ... Foreign NS n; each namespace has a block pool (Pool 1 ... Pool k ... Pool n); Datanode 1, Datanode 2 ... Datanode m provide common block storage]
      • An administrative/operational feature for better managing resources required at Yahoo!
      • Multiple independent Namenodes and Namespace Volumes in a cluster
          › Namespace Volume = Namespace + Block Pool
      • Block Storage as generic storage service
          › Set of blocks for a Namespace Volume is called a Block Pool
          › DNs store blocks for all the Namespace Volumes – no partitioning
  • 23. Managing Namespaces
      • Federation has multiple namespaces
      • Client-side implementation of mount tables
          – No single point of failure
          – No hotspot for root and top level directories
      • Applications using Federation should use the viewfs:// scheme
          – The viewfs:// URI scheme can be used as the default file system, replacing the hdfs:// scheme
      [Diagram: client-side mount table rooted at / with entries data, project, home, tmp, backed by namespaces NS1 to NS4]
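The client-side mount table on slide 23 can be pictured as a longest-prefix lookup done entirely in the client. The sketch below is illustrative only and is not the ViewFs implementation: the directory names come from the slide's diagram, while the Namenode hostnames and the directory-to-namespace assignment are hypothetical.

```python
# Hypothetical mount table: top-level directories from the slide, each mapped
# to a made-up Namenode URI. The real assignment is not given in the talk.
MOUNT_TABLE = {
    "/data": "hdfs://nn1.example.com/data",
    "/project": "hdfs://nn2.example.com/project",
    "/home": "hdfs://nn3.example.com/home",
    "/tmp": "hdfs://nn4.example.com/tmp",
}


def resolve(path: str, mount_table: dict = MOUNT_TABLE) -> str:
    """Map a client-visible path to a namespace URI via longest-prefix match."""
    best = None
    for mount in mount_table:
        # Match the mount point itself or anything strictly below it.
        if path == mount or path.startswith(mount + "/"):
            if best is None or len(mount) > len(best):
                best = mount
    if best is None:
        raise LookupError(f"no mount point covers {path}")
    return mount_table[best] + path[len(best):]


print(resolve("/data/feeds/clicks"))
print(resolve("/home/viraj"))
```

Because every client holds its own copy of the table, no central server sits on the lookup path, which is what the slide means by no single point of failure and no hotspot for top-level directories.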
  • 24. Hadoop 23 Federation
      • Federation Testing is underway
          – Many ecosystem components such as Pig have completed testing
          – Real load testing will only be possible when multiple co-located Grids transition to Hadoop 23
      • Adoption of Federation will allow for better consolidation of storage resources
          – Many data feeds are duplicated across various Grids
  • 25. HADOOP 23 IMPACT ON END USERS AND ECOSYSTEM DEVELOPERS
  • 26. Hadoop 23 Command Line
      • New environment variables:
          – $HADOOP_COMMON_HOME
          – $HADOOP_MAPRED_HOME
          – $HADOOP_HDFS_HOME
      • hadoop command to execute mapred or hdfs sub-commands has been deprecated
          – Old usage (will work)
              – hadoop queue -showacls
              – hadoop fs -ls
              – hadoop mapred job -kill <job_id>
          – New usage
              – mapred queue -showacls
              – hdfs dfs -ls <path>
              – mapred job -kill <job_id>
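The old-to-new command mapping on slide 26 is mechanical enough to script when migrating wrapper scripts. The helper below is a hypothetical sketch that covers only the three prefixes listed on the slide; `modernize` is not a Hadoop tool, and any other command is returned unchanged.

```python
import shlex

# Deprecated command prefixes and their replacements, taken from slide 26.
NEW_PREFIX = {
    ("hadoop", "queue"): ["mapred", "queue"],
    ("hadoop", "fs"): ["hdfs", "dfs"],
    ("hadoop", "mapred", "job"): ["mapred", "job"],
}


def modernize(command: str) -> str:
    """Rewrite a deprecated `hadoop ...` invocation to its 0.23 form."""
    tokens = shlex.split(command)
    # Try longer prefixes first so the most specific rule wins.
    for old in sorted(NEW_PREFIX, key=len, reverse=True):
        if tuple(tokens[:len(old)]) == old:
            return shlex.join(NEW_PREFIX[old] + tokens[len(old):])
    return command  # not one of the listed deprecated forms


print(modernize("hadoop fs -ls /user/viraj"))
print(modernize("hadoop queue -showacls"))
```

Since the old forms still work in Hadoop 23, a rewrite like this is optional and can be rolled out gradually.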
  • 27. Hadoop 23 Map Reduce
      • An application built against Hadoop 1.0.2 will not work in Hadoop 0.23 without recompilation
      • Hadoop 0.23 is API compatible with Hadoop 0.20.205 and Hadoop 1.0.2
          – Not binary compatible
      • Hadoop Java programs will not require any code change; however, users have to recompile with Hadoop 0.23
          – If a code change is required, please let us know
      • Streaming applications should work without modifications
      • Hadoop Pipes (using C/C++ interface) applications will require recompilation with the new libraries
  • 28. Hadoop 23 Compatibility - Pig
      • Pig versions 0.9.2, 0.10 and beyond will be fully supported on Hadoop 0.23
          – Packaging problem: generating 2 different pig.jar files for different versions of Hadoop
      • No changes in a Pig script if it uses relative paths in HDFS
      • Changes in a Pig script are required if an HDFS absolute path (hdfs://) is used
          – The HDFS Federation part of Hadoop 23 requires the usage of viewfs://
          – Change the hdfs:// scheme to the viewfs:// scheme
      • Java UDFs must be re-compiled with a Hadoop 23 compatible jar
          – Custom Loaders and Storers in Pig are affected
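Finding the Pig scripts that need the hdfs:// to viewfs:// change from slide 28 can be automated with a simple scan. The sketch below only reports absolute hdfs:// URIs and deliberately does not rewrite them, because the right viewfs:// path depends on the cluster's mount table; the sample script, hostname, and function name are made up for illustration.

```python
import re

# Matches an absolute hdfs:// URI up to the closing quote, semicolon,
# parenthesis, or whitespace.
HDFS_URI = re.compile(r"""hdfs://[^\s'";)]+""")


def find_absolute_hdfs_paths(script: str) -> list:
    """Return (line number, URI) pairs for every hdfs:// URI in a Pig script."""
    hits = []
    for lineno, line in enumerate(script.splitlines(), start=1):
        for match in HDFS_URI.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# Hypothetical Pig script: one absolute path (needs review), one relative path.
SAMPLE = """\
raw = LOAD 'hdfs://nn1.example.com:8020/data/feeds/clicks' USING PigStorage();
out = FILTER raw BY $0 IS NOT NULL;
STORE out INTO 'results/clicks';
"""

for lineno, uri in find_absolute_hdfs_paths(SAMPLE):
    print(f"line {lineno}: {uri}")
```

Scripts with no hits use only relative paths and, per the slide, need no changes.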
  • 29. Hadoop 23 Compatibility - Oozie
      • Oozie 3.1.4 and later versions are compatible with Hadoop 23
      • No changes in workflow definition or job properties
          – No need to redeploy the Oozie coordinator jobs
      • Java code, streaming, and pipes apps need to be recompiled with Hadoop 0.23 jars for binary compatibility
      • Existing user workflow and coordinator definitions (XML) should continue to work as expected
      • It is the user's responsibility to package the right Hadoop 23 compatible jars
      • A Hadoop 23 compatible pig.jar needs to be packaged for the Pig action
  • 30. Hadoop 23 - Oozie Dev Challenges
      • Learning curve for maven builds
          – Build iterations, local maven staging repo staleness
      • Queue configurations, container allocations require revisiting the design
      • Many iterations of Hadoop 23 deployment
          – Overhead to test Oozie compatibility with new release
      • Initial deployment of YARN did not have a view of the Application Master (AM) logs
          – Manual ssh to AM for debugging launcher jobs
  • 31. Hadoop 23 Compatibility - Hive
      • Hive versions 0.9 and upwards are fully supported
      • Hive SQL/scripts should continue to work without any modification
      • Java UDFs in Hive must be re-compiled with a Hadoop 23 compatible hive.jar
  • 32. Hadoop 23 – Hive Dev Challenges
      • Code in MiniMRCluster that fetches the stack trace from the JobTracker is deprecated and no longer works
          – Extra time spent debugging and rewriting test cases
      • Incompatibility of HDFS commands between Hadoop 1.0.2 and 0.23
          – -rmr vs. -rm -r
          – mkdir vs. mkdir -p
          – Results in fixing tests in new ways or inventing workarounds so that they run in both Hadoop 1.0.2 and Hadoop 0.23
      • As Hive uses the MapRed APIs, more work is required for certification
          – Would be good to move to the MapReduce APIs (as Pig does, for example)
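One workaround for the shell incompatibilities on slide 32 is to pick the flags at run time so the same test runs on both releases. The sketch below is a hypothetical helper, not code from Hive: it assumes 0.23.x and 2.x take the new flags while 0.20.x and 1.x take the old ones, and it only builds argument lists rather than invoking Hadoop.

```python
def uses_new_shell(version: str) -> bool:
    """Assumption: 0.23.x and 2.x+ accept `-rm -r` and `-mkdir -p`;
    0.20.x and 1.x (such as 1.0.2) expect the older forms."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return major >= 2 or (major == 0 and minor >= 23)


def rm_recursive_args(path: str, version: str) -> list:
    """Arguments for a recursive delete on the given Hadoop version."""
    if uses_new_shell(version):
        return ["-rm", "-r", path]
    return ["-rmr", path]


def mkdir_parents_args(path: str, version: str) -> list:
    """Arguments for creating a directory together with its parents."""
    if uses_new_shell(version):
        return ["-mkdir", "-p", path]
    return ["-mkdir", path]


for version in ("1.0.2", "0.23.3"):
    print(version, rm_recursive_args("/tmp/hive_test", version),
          mkdir_parents_args("/tmp/hive_test/a/b", version))
```

A test harness could prepend `hadoop fs` or `hdfs dfs` to these lists and pass them to a subprocess call.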
  • 33. Hadoop 23 - HCat
      • HCat versions 0.4 and upwards are certified to work with Hadoop 23
  • 34. Hadoop 23 Job History Log Format
      • History API & Log format have changed
          – Affects all applications and tools that directly use the Hadoop History API
          – Logs stored as Avro serialization in JSON format
      • Affected many tools which rely on Job Logs
          – Hadoop Vaidya had to be rewritten with the new JobHistoryParser
  • 35. Hadoop 23 Queues
      • Hadoop 23 has support for Hierarchical Queues
          – At Yahoo! it has been configured as a flat queue to limit customer disruption
          – Customer testing is being conducted
  • 36. 32/64 bit JDK 1.7
      • Currently certifying Hadoop 23 and its ecosystem on 32 bit JDK 1.7
      • 64 bit JDK 1.7 certification for Hadoop and its ecosystem will be taken up in Q1 2013
  • 37. Hadoop 23 Operations and Services
      • Grid Operations at Yahoo! transitioned the Hadoop 1.0.2 Namenode to Hadoop 23 smoothly
          – No data was lost
      • Matched the container configurations on Hadoop 23 clusters with the old Map Reduce slots
          – Map Reduce slots were configured based on memory, hence the transition was smooth
      • Scheduling, planning and migration of Hadoop 1.0.2 applications to Hadoop 23 for about 100+ customers was a major task for solutions
          – Many issues caught at the last minute needed emergency fixes (globbing, pig.jar packaging, change in mkdir command)
          – Hadoop 0.23.4 build planned
  • 38. Acknowledgements
      • YARN – Robert Evans, Thomas Graves, Jason Lowe
      • Pig – Rohini Paliniswamy
      • Hive and HCatalog – Chris Drome
      • Oozie – Mona Chitnis and Mohammad Islam
      • Services and Operations – Rajiv Chittajallu and Kimsukh Kundu
  • 39. References
      • 0.23 Documentation – http://people.apache.org/~acmurthy/hadoop-0.23/
      • 0.23 Release Notes – http://people.apache.org/~acmurthy/hadoop-0.23/hadoop-project-dist/hadoop-common/releasenotes.html
      • YARN Documentation – http://people.apache.org/~acmurthy/hadoop-0.23/hadoop-yarn/hadoop-yarn-site/YARN.html
      • HDFS Federation Documentation – http://people.apache.org/~acmurthy/hadoop-0.23/hadoop-yarn/hadoop-yarn-site/Federation.html