HDFS Federation
Sanjay Radia, Hadoop Architect
Yahoo! Inc



Apache Hadoop
India Summit 2011
Outline

• HDFS - Quick overview
• Scaling HDFS - Federation

Hadoop Components

HDFS         Distributed file system
MapReduce    Distributed computation
HBase        Column store
Pig          Dataflow language
Hive         Data warehouse
Zookeeper    Distributed coordination
Avro         Data serialization
Oozie        Workflow
HDFS

[Diagram: HDFS architecture]
• Namenode holds two tables:
    – Namespace State: hierarchical namespace, File Name → BlockIDs
    – Block Map: Block ID → Block Locations
• Namespace metadata & journal, plus a Backup Namenode
• Datanodes store the blocks (Block ID → Data) and send heartbeats & block reports to the Namenode
• Each block (b1 ... b6) is replicated on several Datanodes

Horizontally scale IO and storage
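
The two Namenode tables and the block reports can be sketched in a few lines. This is a minimal toy model, not the HDFS implementation: the class name `Namenode`, its method names and the Datanode IDs are all hypothetical, and a block report is assumed to replace everything previously known about that Datanode.

```python
from collections import defaultdict


class Namenode:
    """Toy model of the Namespace State and the Block Map (hypothetical names)."""

    def __init__(self):
        self.namespace = {}                # file name -> list of block IDs
        self.block_map = defaultdict(set)  # block ID -> set of Datanode IDs

    def add_file(self, name, block_ids):
        self.namespace[name] = list(block_ids)

    def block_report(self, datanode_id, block_ids):
        # Assumption: a report is a full listing, so forget the old one first.
        for locations in self.block_map.values():
            locations.discard(datanode_id)
        for block_id in block_ids:
            self.block_map[block_id].add(datanode_id)

    def get_block_locations(self, name):
        # Join the two tables: file name -> block IDs -> Datanodes.
        return [(b, sorted(self.block_map.get(b, ()))) for b in self.namespace[name]]


nn = Namenode()
nn.add_file("/data/f", ["b1", "b2"])
nn.block_report("dn1", ["b1", "b2"])
nn.block_report("dn2", ["b1"])
locations = nn.get_block_locations("/data/f")
```

Keeping the namespace and the block map as separate tables is what later lets the block layer be split off from the namespace.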
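HDFS
Client reads and writes

[Diagram: client read and write paths]
• Read: 1. open (client to Namenode), 2. read (client from Datanodes)
• Write: 1. create (client to Namenode), 2. write (client to a Datanode, which passes the write on to further Datanodes)
• End-to-end checksum
• Namenode holds the Namespace State and the Block Map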
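
A rough sketch of what an end-to-end checksum buys on the read path. Everything here is an assumption for illustration: Datanodes are modelled as plain dicts, `write_block` and `read_block` are hypothetical helpers, and `zlib.crc32` stands in for whatever checksum the real system uses.

```python
import zlib


def write_block(datanodes, block_id, data):
    """Store the data with a checksum computed once by the writer."""
    checksum = zlib.crc32(data)
    for dn in datanodes:
        dn[block_id] = (data, checksum)


def read_block(datanodes, block_id):
    """Return the first replica whose data still matches its checksum."""
    for dn in datanodes:
        if block_id not in dn:
            continue
        data, checksum = dn[block_id]
        if zlib.crc32(data) == checksum:
            return data
        # Corrupt replica: fall through and try the next Datanode.
    raise IOError(f"no valid replica of {block_id}")


dn1, dn2 = {}, {}
write_block([dn1, dn2], "b1", b"hello")
dn1["b1"] = (b"jello", dn1["b1"][1])  # simulate corruption on one replica
recovered = read_block([dn1, dn2], "b1")
```

Because the checksum is computed by the writer and verified by the reader, corruption anywhere in between (disk or network) is detected and the reader can fall back to another replica.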
HDFS Architecture:
    Computation close to the data

[Diagram: input data is split into Block 1, Block 2 and Block 3, each replicated three times across the Hadoop cluster; a MAP task runs next to a replica of each block, and a Reduce step combines the map outputs into the results]
Quiz: What Is the Common Attribute?




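HDFS
Actively maintain data reliability

[Diagram: re-replication of a bad or lost block replica]
• 1. replicate: the Namenode asks a Datanode to replicate the block
• 2. copy: the block is copied from one Datanode to another
• 3. blockReceived: the receiving Datanode reports the new replica to the Namenode
• Datanodes periodically check block checksums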
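
The "replicate" step can be sketched as a planning function over the block map. This is an illustrative sketch only: `plan_replication` is a hypothetical helper, the target of 3 replicas is an assumption (the slide does not state one), and the real Namenode also weighs rack placement and load, which is ignored here.

```python
def plan_replication(block_map, live_datanodes, target=3):
    """Return (block ID, source Datanode, destination Datanode) copy orders.

    block_map: block ID -> set of Datanode IDs believed to hold a replica.
    live_datanodes: set of Datanode IDs that are still sending heartbeats.
    """
    plan = []
    for block_id, holders in sorted(block_map.items()):
        good = sorted(h for h in holders if h in live_datanodes)
        missing = target - len(good)
        if missing <= 0 or not good:
            continue  # fully replicated, or no surviving replica to copy from
        candidates = sorted(live_datanodes - set(good))
        for dest in candidates[:missing]:
            plan.append((block_id, good[0], dest))
    return plan


block_map = {"b1": {"dn1", "dn2", "dn3"}, "b2": {"dn1", "dn4"}}
live = {"dn1", "dn2", "dn3", "dn5"}  # dn4 has stopped sending heartbeats
orders = plan_replication(block_map, live)
```

Each order corresponds to one "2. copy" arrow; once the destination holds the block, its "3. blockReceived" message would add it to the block map.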
Hadoop at Yahoo!

• Total Nodes = 43,936
• Total Storage = 206 PB
• 1M+ Monthly Hadoop Jobs

[Chart: growth over time, 2006 Qtr1 to 2010 Qtr3; vertical axis 0 to 250,000]

Availability SLA (%)
Sandbox       99.69
Research      99.47
Production    99.85

Nodes running Hadoop at Yahoo! (over 43,000 in total)
Sandbox        7,803
Research      22,334
Production    13,687
Scaling Hadoop
 Early Gains
  •   Simple design allowed rapid improvements
      •   Namespace is all in RAM, simpler locking
      •   Improved memory usage in 0.16, JVM Heap configuration (Suresh Srinivas)

 Growth in the number of files and in storage is limited by how much RAM can be added to the namenode
      • 50 GB heap = 200M “fs objects” = 100M names + 100M blocks
          •   14 PB of storage (50 MB block size)

          •   4K nodes

  - Job Tracker carries out both job lifecycle management and scheduling
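
The heap and storage figures above can be checked with a little arithmetic. The sketch below assumes binary units (GiB, MiB, PiB) and 3 replicas per block; neither is stated on the slide, but under those assumptions the numbers line up with the quoted 14 PB.

```python
HEAP_BYTES = 50 * 2**30    # 50 GB namenode heap (assumed binary units)
FS_OBJECTS = 200_000_000   # 100M names + 100M blocks
BLOCKS = 100_000_000
BLOCK_BYTES = 50 * 2**20   # 50 MB per block
REPLICATION = 3            # assumption: three replicas of every block

# Namenode memory spent per file-system object.
bytes_per_object = HEAP_BYTES / FS_OBJECTS

# Raw Datanode storage addressed by that heap, in PiB.
raw_storage_pib = BLOCKS * BLOCK_BYTES * REPLICATION / 2**50

print(f"{bytes_per_object:.0f} bytes of heap per fs object")
print(f"{raw_storage_pib:.1f} PiB of raw storage")
```

The point of the calculation is that every extra block costs namenode heap, so storage capacity is ultimately bounded by the RAM of a single machine.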

 Yahoo’s Response:
  •   HDFS Federation: horizontal scaling of namespace (0.22)

  •   Next Generation of Map-Reduce - Complete overhaul of job tracker/task tracker

 Goal:
  •   Clusters of 6000 nodes, 100,000 cores & 10k concurrent jobs, 100 PB raw storage per cluster



6 May 2010
Scaling the Name Service:
Options

[Chart, not to scale: # clients (1x to 100x) against # names (100M to 20B). Options plotted:]
• All NS in memory
• Archives
• Separate Bmaps from NN
• Partial NS (Cache) in memory
• Multiple Namespace volumes: good isolation properties
• Partial NS in memory with Namespace volumes
• Distributed NNs
• Block-reports for billions of blocks require rethinking the block layer
Opportunity:
       Vertical & Horizontal scaling
           Vertical scaling
              More RAM, Efficiency in memory usage
              First class archives (tar/zip like)
              Partial namespace in main memory



           Horizontal: Federation


   Horizontal scaling/federation benefits:
   –   Scale
   –   Isolation, Stability, Availability
   –   Flexibility
   –   Other Namenode implementations or non-HDFS namespaces

Block (Object) Storage Subsystem

• Shared storage provided as pools of blocks
• Namespaces (HDFS, others) use one or more block-pools
• Note: HDFS has 2 layers today – we are generalizing/extending it.

[Diagram: two layers]
• Namespace layer: NS1 ... NS k ... Foreign NS n
• Block storage layer: Pools 1 ... Pools k ... Pools n (the block pools), spread over Datanode 1, Datanode 2 ... Datanode m, with a Balancer
1st Phase:
B-Pool management inside Namenode

[Diagram: Namenodes NN-1 ... NN-k ... NN-n each hold one namespace (NS1 ... NS k ... Foreign NS n) and manage its block pool (Pools 1 ... Pools k ... Pools n). The block pools are stored on the shared Datanode 1, Datanode 2 ... Datanode m, with a Balancer.]

Future: Move Block mgt into separate nodes
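Future:
Move block management out

[Diagram: namespaces NS1 ... NS k ... Foreign NS n sit above a separate Block Manager that owns the block pools (Pools 1 ... Pools k ... Pools n), stored on Datanode 1, Datanode 2 ... Datanode m, with a Balancer.]
• Client steps: 1. Open (name server), 2. getBlockLocations (Block Manager), 3. ReadBlock (Datanode)
• The Block Manager is easier to scale horizontally than the name server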
What is an HDFS Cluster

Current
• HDFS Cluster
   – 1 Namespace
   – A set of blocks
• Implemented as
   – 1 Namenode
   – Set of DNs

New
• HDFS Cluster
   – N Namespaces
   – Set of block-pools
       • Each block-pool is a set of blocks
       • Phase 1: 1 BP per NS
            – Implies N block-pools
• Implemented as
   – N Namenodes
   – Set of DNs
       • Each DN stores the blocks for each block-pool
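
The "New" column can be modelled as a small data structure. This is a sketch under stated assumptions, not the HDFS-1052 code: the class names, the `BP-<namespace>` pool ID format and the method names are all invented for illustration, and it follows Phase 1 by giving each namespace exactly one block pool.

```python
from dataclasses import dataclass, field


@dataclass
class Datanode:
    name: str
    # block-pool ID -> set of block IDs stored for that pool
    pools: dict = field(default_factory=dict)

    def store(self, pool_id, block_id):
        self.pools.setdefault(pool_id, set()).add(block_id)

    def block_report(self, pool_id):
        # A Datanode reports each pool separately, to the Namenode that owns it.
        return sorted(self.pools.get(pool_id, set()))


@dataclass
class FederatedCluster:
    namenodes: dict = field(default_factory=dict)  # namespace -> block-pool ID
    datanodes: list = field(default_factory=list)

    def add_namespace(self, ns):
        pool_id = f"BP-{ns}"  # hypothetical ID format; Phase 1: 1 BP per NS
        self.namenodes[ns] = pool_id
        return pool_id


cluster = FederatedCluster(datanodes=[Datanode("dn1")])
p1 = cluster.add_namespace("ns1")
p2 = cluster.add_namespace("ns2")
dn = cluster.datanodes[0]
dn.store(p1, "b1")
dn.store(p2, "b1")  # same block ID, different pool: no clash
dn.store(p2, "b7")
```

Scoping block IDs by pool is what lets the Namenodes stay independent: none of them needs to coordinate with the others when allocating blocks.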
Managing Namespaces

•   Federation has multiple namespaces
    – Don’t you need a single global namespace?
    – Key is to share the data and the names used to access the shared data.
•   A global namespace is one way to do that – but even there we talk of several large “global” namespaces
•   Client-side mount table is another way to share
     – Shared mount-table => “global” shared view
     – Personalized mount-table => per-application view
         • Share the data that matter by mounting it

[Diagram: client-side mount-table with root / and mount points data, project, home, tmp]
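
A client-side mount table is essentially a prefix lookup done before any Namenode is contacted. The sketch below is illustrative only: `resolve` is a hypothetical helper, the `hdfs://nn1` to `hdfs://nn4` targets are made-up Namenode addresses, and longest-prefix matching is an assumed resolution rule.

```python
def resolve(mount_table, path):
    """Map a client path to the namespace that serves it (longest prefix wins)."""
    best = None
    for mount_point in mount_table:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        raise KeyError(f"no mount point covers {path}")
    remainder = path[len(best):].lstrip("/")
    target = mount_table[best].rstrip("/")
    return f"{target}/{remainder}" if remainder else target


# Mount points from the slide; the targets are hypothetical.
mount_table = {
    "/data": "hdfs://nn1/data",
    "/project": "hdfs://nn2/project",
    "/home": "hdfs://nn3/home",
    "/tmp": "hdfs://nn4/tmp",
}
resolved = resolve(mount_table, "/home/alice/f.txt")
```

Handing every client the same table gives the shared "global" view; handing an application its own table gives the personalized view, with no change on the Namenodes.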
HDFS Federation Across Clusters

[Diagram: Cluster 1 and Cluster 2 each have an application mount-table rooted at / with entries home, tmp, data and project; mount points link to namespaces across the two clusters]
Nameserver as container for namespaces
• Nameserver as a container for namespaces
  • Each namespace has its own separate state
      •   Persistent state in shared storage (e.g. BookKeeper)
• Each nameserver serves a set of namespaces
  • Selected based on isolation and capacity
  • A namespace can be moved between nameservers

[Diagram: several Nameservers on top of shared persistent storage for namespace metadata (e.g. BookKeeper)]
Summary
 Federated HDFS (Jira HDFS-1052)
  •   Scale by adding independent Namenodes
      • Preserves the robustness of the Namenodes
      • Not much code change to the Namenode

  •   Generalizes the Block storage layer
      • Analogous to SANs and LUNs
      • Can add other implementations of the Namenodes
      • Even other name services (HBase?)
      • Could move the Block management out of the Namenode in the future
      • But to truly scale to tens or hundreds of billions of blocks we need to rethink the block map and block
        reports

  •   Benefits
      • Scale number of file names and blocks
      • Improved isolation and hence availability




Q&A





Bird.i: Earth Observation Data Made Socialhuguk
 
Aiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine IntelligenceAiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine Intelligencehuguk
 
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive huguk
 
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...huguk
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthyhuguk
 

More from huguk (20)

Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, TrifactaData Wrangling on Hadoop - Olivier De Garrigues, Trifacta
Data Wrangling on Hadoop - Olivier De Garrigues, Trifacta
 
ether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp introether.camp - Hackathon & ether.camp intro
ether.camp - Hackathon & ether.camp intro
 
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and HadoopGoogle Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
Google Cloud Dataproc - Easier, faster, more cost-effective Spark and Hadoop
 
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
Using Big Data techniques to query and store OpenStreetMap data. Stephen Knox...
 
Extracting maximum value from data while protecting consumer privacy. Jason ...
Extracting maximum value from data while protecting consumer privacy.  Jason ...Extracting maximum value from data while protecting consumer privacy.  Jason ...
Extracting maximum value from data while protecting consumer privacy. Jason ...
 
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM WatsonIntelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
Intelligence Augmented vs Artificial Intelligence. Alex Flamant, IBM Watson
 
Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink Streaming Dataflow with Apache Flink
Streaming Dataflow with Apache Flink
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
 
Jonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & PitchingJonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & Pitching
 
Signal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News MonitoringSignal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News Monitoring
 
Dean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your StartupDean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your Startup
 
Peter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapultPeter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapult
 
Cytora: Real-Time Political Risk Analysis
Cytora:  Real-Time Political Risk AnalysisCytora:  Real-Time Political Risk Analysis
Cytora: Real-Time Political Risk Analysis
 
Cubitic: Predictive Analytics
Cubitic: Predictive AnalyticsCubitic: Predictive Analytics
Cubitic: Predictive Analytics
 
Bird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made SocialBird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made Social
 
Aiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine IntelligenceAiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine Intelligence
 
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive
 
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
 

Federated HDFS

  • 1. HDFS Federation. Sanjay Radia, Hadoop Architect, Yahoo! Inc. Apache Hadoop India Summit 2011
  • 2. Outline: HDFS quick overview; Scaling HDFS with Federation. Hadoop Components: HDFS (distributed file system), MapReduce (distributed computation), HBase (column store), Pig (dataflow language), Hive (data warehouse), Zookeeper (distributed coordination), Avro (data serialization), Oozie (workflow)
  • 3. 3
  • 4. HDFS (architecture diagram): The Namenode holds the namespace state (hierarchical namespace: file name → block IDs) and the block map (block ID → block locations), with namespace metadata and journal and a Backup Namenode. Datanodes hold the blocks (block ID → data) and send heartbeats and block reports to the Namenode. Horizontally scale IO and storage
  • 5. HDFS: Client reads and writes (diagram). Reading client: 1. open (Namenode), 2. read (Datanodes). Writing client: 1. create (Namenode), 2. write (Datanodes), with further writes between Datanodes. End-to-end checksum
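The two-step read and write paths on this slide can be sketched roughly as follows. This is a toy in-memory model, not the HDFS client API: the class and function names are hypothetical, CRC32 via `zlib` stands in for whatever checksum HDFS uses, and the writer sends each block to every Datanode directly instead of through a Datanode-to-Datanode pipeline.

```python
import zlib


class Datanode:
    """Toy datanode: stores block data together with its checksum."""

    def __init__(self):
        self.blocks = {}  # block id -> (data, checksum)

    def write_block(self, block_id, data, checksum):
        self.blocks[block_id] = (data, checksum)

    def read_block(self, block_id):
        return self.blocks[block_id]


class Namenode:
    """Toy namenode: namespace state plus block map, all in memory."""

    def __init__(self):
        self.namespace = {}  # file name -> list of block ids
        self.block_map = {}  # block id -> list of datanodes

    def create(self, name, block_ids, locations):
        self.namespace[name] = list(block_ids)
        for block_id in block_ids:
            self.block_map[block_id] = locations[block_id]

    def open(self, name):
        return [(b, self.block_map[b]) for b in self.namespace[name]]


def client_write(namenode, name, chunks, datanodes):
    block_ids = [f"{name}#{i}" for i in range(len(chunks))]
    locations = {b: list(datanodes) for b in block_ids}
    namenode.create(name, block_ids, locations)      # step 1: create
    for block_id, data in zip(block_ids, chunks):    # step 2: write
        checksum = zlib.crc32(data)                  # computed by the client
        for dn in datanodes:
            dn.write_block(block_id, data, checksum)


def client_read(namenode, name):
    out = b""
    for block_id, locations in namenode.open(name):  # step 1: open
        for dn in locations:                         # step 2: read
            data, checksum = dn.read_block(block_id)
            if zlib.crc32(data) == checksum:         # verified by the client
                out += data
                break
        else:
            raise IOError(f"no valid replica of {block_id}")
    return out
```

Because the checksum is computed by the writing client and verified by the reading client, a replica corrupted anywhere in between is detected and the reader falls back to another replica.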
  • 6. HDFS Architecture: Computation close to the data (diagram: within a Hadoop cluster, the data is split into Block 1, Block 2 and Block 3, each block shown three times; MAP tasks run next to the blocks and feed a Reduce step that produces the Results)
  • 7. Quiz: What Is the Common Attribute?
  • 8. HDFS: Actively maintain data reliability (diagram: Namenode with namespace state and block map, above the Datanodes). On a bad/lost block replica: 1. replicate, 2. copy, 3. blockReceived. Datanodes periodically check block checksums
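The "replicate, copy, blockReceived" loop can be illustrated with a small planning function. This is only a sketch under stated assumptions: the function name and data shapes are hypothetical, the target of three replicas is an assumption the slide does not state, and the real Namenode's placement policy is not modelled.

```python
def plan_copies(block_map, live_datanodes, target=3):
    """Return (block_id, source, destination) copy instructions.

    block_map:      block id -> list of datanodes believed to hold a replica
    live_datanodes: set of datanodes that are still sending heartbeats
    target:         desired replica count (assumed, not taken from the slide)
    """
    plans = []
    for block_id, holders in sorted(block_map.items()):
        live = [dn for dn in holders if dn in live_datanodes]
        if not live:
            continue  # no surviving replica to copy from
        missing = max(target - len(live), 0)
        candidates = [dn for dn in sorted(live_datanodes) if dn not in live]
        for destination in candidates[:missing]:
            plans.append((block_id, live[0], destination))
    return plans


block_map = {"b1": ["dn1", "dn2", "dn3"], "b2": ["dn2", "dn3", "dn4"]}
live = {"dn1", "dn2", "dn4", "dn5"}  # dn3 has been lost
print(plan_copies(block_map, live))
# [('b1', 'dn1', 'dn4'), ('b2', 'dn2', 'dn1')]
```

In this model, each destination Datanode would report blockReceived back to the Namenode once the copy completes, at which point the block map is updated.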
  • 9. Hadoop at Yahoo!: Over 43,000 nodes running Hadoop. Total nodes = 43,936; total storage = 206 PB; 1M+ monthly Hadoop jobs. Nodes by cluster type: Sandbox 7,803; Research 22,334; Production 13,687. Availability SLA: Sandbox 99.69; Research 99.47; Production 99.85. (Chart: nodes running Hadoop at Yahoo! by quarter, 2006 Qtr1 to 2010 Qtr3)
  • 10. Scaling Hadoop. Early gains: simple design allowed rapid improvements; namespace is all in RAM, simpler locking; improved memory usage in 0.16, JVM heap configuration (Suresh Srinivas). Growth in the number of files and in storage is limited by how much RAM can be added to the namenode: 50G heap = 200M “fs objects” = 100M names + 100M blocks; 14PB of storage (50MB blocksize); 4K nodes; Job Tracker carries out both job lifecycle management and scheduling. Yahoo’s response: HDFS Federation, horizontal scaling of the namespace (0.22); Next Generation of Map-Reduce, a complete overhaul of job tracker/task tracker. Goal: clusters of 6000 nodes, 100,000 cores and 10k concurrent jobs, 100 PB raw storage per cluster. 6 May 2010
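As a back-of-the-envelope check of the sizing figures on this slide, the snippet below derives the implied memory cost per "fs object" and extrapolates it. It assumes decimal gigabytes and a constant per-object cost; the one-billion-object case is a hypothetical input, not a figure from the talk.

```python
# Figures taken from the slide.
HEAP_BYTES = 50 * 10**9     # "50G heap" (decimal gigabytes assumed)
FS_OBJECTS = 200 * 10**6    # 100M names + 100M blocks

bytes_per_object = HEAP_BYTES / FS_OBJECTS


def heap_needed_gb(num_objects, per_object=bytes_per_object):
    """Heap in decimal GB, assuming memory per object stays constant."""
    return num_objects * per_object / 10**9


print(bytes_per_object)          # 250.0
print(heap_needed_gb(10**9))     # 250.0
```

At roughly 250 bytes per object, a single namenode holding a billion objects would need a heap of about 250 GB under these assumptions, which is the vertical limit the slide is pointing at.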
  • 11. Scaling the Name Service: Options (chart, not to scale; horizontal axis: # names, 100M, 200M, 1B, 2B, 10B, 20B; vertical axis: # clients, 1x, 4x, 20x, 50x, 100x). Options shown: all NS in memory; archives; partial NS (cache) in memory; separate Bmaps from NN; multiple namespace volumes; partial NS in memory with namespace volumes; distributed NNs. Annotations: good isolation properties; block reports for billions of blocks require rethinking the block layer
  • 12. Opportunity: Vertical & Horizontal scaling. Vertical scaling (single Namenode): more RAM, efficiency in memory usage; first-class archives (tar/zip like); partial namespace in main memory. Horizontal scaling/federation benefits: scale; isolation, stability, availability; flexibility; other Namenode implementations or non-HDFS namespaces
  • 13. Block (Object) Storage Subsystem: Shared storage provided as pools of blocks. Namespaces (HDFS, others) use one or more block-pools. Note: HDFS has 2 layers today; we are generalizing/extending it. (Diagram: namespaces NS1 ... NS k ... Foreign NS n, with Pools 1, Pools k, Pools n, on top of block storage made of block pools on Datanode 1, Datanode 2, ... Datanode m, plus a Balancer)
  • 14. 1st Phase: B-Pool management inside Namenode (diagram: NN-1 ... NN-k ... NN-n serve NS1 ... NS k ... Foreign NS n and manage Pools 1, Pools k, Pools n; the block pools live on Datanode 1, Datanode 2, ... Datanode m, plus a Balancer). Future: move block-pool management into separate nodes
  • 15. Future: Move block management out (diagram: NS1 ... NS k ... Foreign NS n with Pools 1, Pools k, Pools n, above a separate Block Manager and block pools on Datanode 1, Datanode 2, ... Datanode m, plus a Balancer). Client steps: 1. Open, 2. getBlockLocations, 3. ReadBlock. The Block Manager is easier to scale horizontally than the name server
  • 16. What is an HDFS Cluster? Current: an HDFS cluster has 1 namespace and a set of blocks; implemented as 1 Namenode and a set of DNs. New: an HDFS cluster has N namespaces and a set of block-pools, where each block-pool is a set of blocks (Phase 1: 1 BP per NS, which implies N block-pools); implemented as N Namenodes and a set of DNs, where each DN stores the blocks for each block-pool
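The "new" cluster layout on this slide can be modelled with a few small classes. This is an illustrative sketch only: the class names, the `BP-<namespace>` pool identifiers and the round-robin placement are assumptions, and replication is left out to keep it short.

```python
from dataclasses import dataclass, field


@dataclass
class Datanode:
    name: str
    # block-pool id -> set of block ids stored on this datanode
    pools: dict = field(default_factory=dict)

    def store(self, pool_id, block_id):
        self.pools.setdefault(pool_id, set()).add(block_id)


@dataclass
class Namenode:
    namespace: str
    pool_id: str  # Phase 1: exactly one block pool per namespace
    files: dict = field(default_factory=dict)  # path -> list of block ids


class FederatedCluster:
    """N independent namenodes sharing one set of datanodes."""

    def __init__(self, namespaces, datanode_names):
        self.namenodes = {ns: Namenode(ns, f"BP-{ns}") for ns in namespaces}
        self.datanodes = [Datanode(n) for n in datanode_names]

    def write(self, namespace, path, block_ids):
        nn = self.namenodes[namespace]
        nn.files[path] = list(block_ids)
        # Round-robin placement; every datanode may hold blocks of any pool.
        for i, block_id in enumerate(block_ids):
            dn = self.datanodes[i % len(self.datanodes)]
            dn.store(nn.pool_id, block_id)


cluster = FederatedCluster(["ns1", "ns2"], ["dn1", "dn2"])
cluster.write("ns1", "/a", ["b1", "b2"])
cluster.write("ns2", "/b", ["b1"])
print(cluster.datanodes[0].pools)
```

Keying each Datanode's storage by block pool is what lets the Namenodes stay independent: in this model the same block id can appear in two pools without conflict.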
  • 17. Managing Namespaces: Federation has multiple namespaces; don’t you need a single global namespace? The key is to share the data and the names used to access the shared data. A global namespace is one way to do that, but even there we talk of several large “global” namespaces. A client-side mount table is another way to share: a shared mount-table gives a “global” shared view; a personalized mount-table gives a per-application view. Share the data that matter by mounting it. (Diagram: client-side mount-table with root / and mount points project, home, tmp, data)
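A client-side mount table amounts to a longest-prefix lookup from a path to the namespace that serves it. The sketch below shows the idea in isolation; `MountTable` and the namespace identifiers are hypothetical names, not Hadoop's actual client API.

```python
class MountTable:
    """Maps mount points to namespaces and resolves paths by longest prefix."""

    def __init__(self, mounts):
        # mounts: mount point -> namespace identifier
        self.mounts = {self._norm(k): v for k, v in mounts.items()}

    @staticmethod
    def _norm(path):
        # Collapse duplicate and trailing slashes; always keep a leading slash.
        return "/" + "/".join(part for part in path.split("/") if part)

    def resolve(self, path):
        path = self._norm(path)
        best = None
        for mount in self.mounts:
            covers = path == mount or path.startswith(mount.rstrip("/") + "/")
            if covers and (best is None or len(mount) > len(best)):
                best = mount
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        # Path relative to the mount point, as seen by the target namespace.
        return self.mounts[best], self._norm(path[len(best):])


table = MountTable({
    "/home": "ns-home",        # hypothetical namespace identifiers
    "/tmp": "ns-tmp",
    "/data": "ns-data",
    "/project": "ns-project",
})
print(table.resolve("/home/alice/report.txt"))
# ('ns-home', '/alice/report.txt')
```

Giving every client the same table yields the shared "global" view; giving an application its own table yields the personalized view, with no change on the server side.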
  • 18. HDFS Federation Across Clusters (diagram: an application mount-table in Cluster 1 and an application mount-table in Cluster 2, each with root / and mount points home, tmp, data, project, drawn above Cluster 1 and Cluster 2)
  • 19. Nameserver as container for namespaces: Each namespace has its own separate state. Persistent state is kept in shared storage (e.g. BookKeeper). Each nameserver serves a set of namespaces, selected based on isolation and capacity. A namespace can be moved between nameservers. (Diagram: several nameservers above shared persistent storage for namespace metadata, e.g. BookKeeper)
  • 20. Summary: Federated HDFS (Jira HDFS-1052). Scale by adding independent Namenodes: preserves the robustness of the Namenodes; not much code change to the Namenode. Generalizes the block storage layer: analogous to SANs and LUNs; can add other implementations of the Namenodes, even other name services (HBase?); could move the block management out of the Namenode in the future. But to truly scale to 10s or 100s of billions of blocks we need to rethink the block map and block reports. Benefits: scale the number of file names and blocks; improved isolation and hence availability