SlideShare una empresa de Scribd logo
1 de 28
Descargar para leer sin conexión
Deploying Grid Services
                         Using Hadoop

                           Allen Wittenauer
                             April 11, 2008


Yahoo! @ ApacheCon                             1
Agenda

      •    Grid Computing At Yahoo!
      •    Quick Overview of Hadoop
             – “Hadoop for Systems Administrators”
      •    Scaling Hadoop Deployments
      •    Yahoo!’s Next Generation Grid Infrastructure
      •    Questions (and maybe even Answers!)




Yahoo! @ ApacheCon                                        2
Grid Computing at Yahoo!


      • Drivers
          – 500M unique users per month
          – Billions of interesting events per day
          – “Data analysis is the inner-loop at Yahoo!”
      • Yahoo! Grid Vision and Focus
          – On-demand, shared access to vast pool of resources
          – Support for massively parallel execution (1000s of processors)
          – Data Intensive Super Computing (DISC)
          – Centrally provisioned and managed
          – Service-oriented, elastic
      • What We’re Not
          – Not “Grid” in the sense of scientific community (Globus, etc)
          – Not focused on public or 3rd-party utility (Amazon EC2/S3, etc)



Yahoo! @ ApacheCon                                                            3
Yahoo! Grid Services


      • Operate multiple grids within Yahoo!
      • 10,000s nodes, 100,000s cores, TBs RAM, PBs disk
      • Support large internal user community
             – Account management, training, etc
      • Manage data needs
             – Ingest TBs per day
      • Deploy and manage software (Hadoop, Pig, etc)




Yahoo! @ ApacheCon                                         4
“Grid” Computing

      •    What we really do is utility computing
             – Nobody knows what “utility computing” is, but everyone has heard of
               “grid computing”
                     • Grid computing implies sharing across resources owned by multiple,
                       independent organizations
                     • Utility computing implies sharing one owner’s resources by multiple,
                       independent customers
      •    Ultimate goal is to provide shared compute and storage resources
             – Instead of going to a hardware committee to provision balkanized
               resources, a project allocates a part of its budget for use on Yahoo!’s
               shared grids
             – Pay as you go
                     • Only buy 100 computers for 15 minutes of compute time vs. 100 computers
                       24x7




Yahoo! @ ApacheCon                                                                            5
What is a Yahoo! Grid Service?

      •    Thousands of machines using basic network hardware
             – It’s hard to program for many machines
      •    Clustering and sharing software
             – Hadoop, HOD, Torque, Maui, and other bits...
      •    Petabytes of data
             – It’s an engineering challenge to load so much data from many sources
      •    Attached development environment
             – A clean, well lit place to interact with a grid
      •    User support
             – Learning facilitation
      •    Usage tracking and billing
             – Someone has to pay the bills...




Yahoo! @ ApacheCon                                                              6
Quick MapReduce Overview

      •    Application writer specifies                Input 0       Input 1      Input 2
             – two functions: Map and Reduce
             – set of input files
      •    Workflow                                    Map 0          Map 1        Map 2
             – input phase generates a number of
               FileSplits from input files (one per
               Map Task)                                             Shuffle
             – Map phase executes user function
               to transform key/value inputs into
               new key/value outputs                      Reduce 0             Reduce 1
             – Framework sorts and shuffles
             – Reduce phase combines all k/v’s
               with the same key into new k/v’s
             – Output phase writes the resulting            Out 0                Out 1
               pairs to files
      •    cat * | grep | sort | uniq -c | cat > out
Yahoo! @ ApacheCon                                                                        7
Hadoop MapReduce: Process Level

      •    Job
             – Map Function + Reduce Function + List of inputs




                                                       Job
                                      Job
                                                     Tracker



                                         Task                     Task
                                        Tracker                  Tracker


                                      Task    Task             Task   Task


Yahoo! @ ApacheCon                                                           8
Hadoop Distributed File System


                                       Data
                                                FSFS
                                                 FS
                                                  FS
                                       Node


                            Data      Status

                                      Name
                          Metadata
                                      Node
                            Data
                                      Status

                                       Data
                                                FSFS
                                                 FS
                                                  FS
                                       Node




Yahoo! @ ApacheCon                                     9
Gateways: Multi-user Land

      •    Provided for two main purposes
             – Meaningful development interaction with a compute cluster
                     • High bandwidth, low latency, and few network barriers enable a tight
                       development loop when creating MapReduce jobs
             – Permission and privilege separation
                     • Limit exposure to sensitive data
                          – Hadoop 0.15 and lower lack users, permissions, etc.
                          – Hadoop 0.16 has users and “weak” permissions
      •    Characteristics
             – “Replacement” Lab machine
             – World-writable local disk space
                     • Any single-threaded processing
                     • Code debugging                                             Gateway




Yahoo! @ ApacheCon                                                                            11
Compute Cluster and Name Nodes

      •    Compute Cluster Node
             – Users cannot login!
             – All nodes run both MapReduce and HDFS
               frameworks
             – usually 500 to 2000 machines
             – Each cluster kept relatively homogenous
             – Hardware configuration
                                                                 Task
                     • 2xSockets (2 or 4 core)
                                                                Tracker
                     • 4x500-750G                       Name
                     • 6G-8G RAM                        Node
                                                               Data Node
      •    Name Node
             – 16G RAM
                     • 14G Java heap = 18-20 million files



Yahoo! @ ApacheCon                                                         12
Queuing and Scheduling

      •    Hadoop does not have an advanced scheduling
           system
             – MapReduce JobTracker manages one or more jobs
               running within a set of machines
             – Works well for “dedicated” applications, but does not
               work so well for shared resources
      •    Grid Services are intended to be a shared multi-user,
           multi-application environment
             – Need to combine Hadoop with an external queuing and
               scheduling system...




Yahoo! @ ApacheCon                                                     13
Hadoop On Demand (HoD)

                                                                            Resource
      •    Wrapper around PBS commands
                                                                           Mgmt/Sched
             – We use freely available Torque and Maui
      •    Big win: virtual private JobTracker clusters                       maui
             – Job isolation
             – Users create clusters of the size they need
                                                                           pbs_server
             – Submit jobs to their private JT
      •    Big costs:
             –   Lose data locality
                                                                            pbs_mom
             –   Increased complexity
             –   Lose a node for private JobTracker
             –   Single reducer doesn’t free unused nodes
                     • ~ 30% efficiency lost!
      •    Looking at changing Hadoop scheduling
             – Task scheduling flexibility combined with node elasticity

Yahoo! @ ApacheCon                                                              14
HoD Job Scheduling




                     1+3 Nodes   1+4 Nodes   1+5 Nodes



                     T       T           T    T



                     J       T   T       T    J      NN




Yahoo! @ ApacheCon                                        15
The Reality of a 1000 Node Grid



                          200   400    500    10




Yahoo! @ ApacheCon                                     16
Putting It All Together


                     Gateways
                                                RM


                                          JT         DN
                                          TT         DN
                                          JT         DN
                                          TT         DN
                                          TT         DN
                                          JT         DN



                                                NN


Yahoo! @ ApacheCon                                        17
Network Configuration




                                      Core1           Core2         Core3       Core4



                              GE

                             2xGE

                                         Switch         Switch              Switch
                     40 hosts/racks      H    H   H     H   H   H           H   H   H




Yahoo! @ ApacheCon                                                                      18
Yahoo!’s Next Generation
   Grid Infrastructure
     A Work In Progress




                           19
Background Information

      •    Internal deployments
             – Mostly Yahoo! proprietary technologies
      •    M45
             – Educational outreach grid
             – Non-Yahoo!’s using Yahoo! resources
                     • Legal required us not to use any Y! technology!
      •    Decision made to start from scratch!
             – Hard to share best practices
             – Potential legal issues
             – Don’t want to support two ways to do the same operation
      •    Internal grids converting to be completely OSS as possible
             – Custom glue code to deal with any Y!<-->OSS incompatibilities
                     • user and group data



Yahoo! @ ApacheCon                                                             20
Naming and Provisioning Services

      •    Naming services
             – Kerberos for secure authentication
             – DNS for host resolution
             – LDAP for everything else
      •    ISC DHCP
             – Reads table information from LDAP
             – In pairs for redundancy
      •    Kickstart
             – We run RHEL 5.x
             – base image + bcfg2
      •    bcfg2
             – host customization
             – centralized configuration management


Yahoo! @ ApacheCon                                      21
NFS for Multi-user Support

      •    NFS
             – Home Directories
             – Project Directories
                     • Group shared data


      •    Grids with service level agreements (SLAs) shouldn’t use NFS !
             – Single point of failure
                     • HA-NFS == $$$
             – Performance
             – Real data should be in HDFS




Yahoo! @ ApacheCon                                                          22
Big Picture
                                                      Remote Data Center
                         LDAP/KRB5/                           LDAP/KRB5/
                                             MMR
                            DNS                                  DNS
                     Read Only Replication


                                               DHCP             BCFG2/YUM/
                      LDAP/KRB Slaves
                                               Data              DHCP Farm



                             GRID1            GRID2              ...       GRIDx



                                               NFS:
                                              home
                                         project (per grid)
Yahoo! @ ApacheCon                                                                 23
Self Healing/Reporting

      •    Torque
             – Use the Torque node health mechanism to disable/fix ‘sick’ nodes
                     • Great reduction in amount of support issues
                     • Address problems in bulk


      •    Nagios
             – Usual stuff
             – Custom hooks into Torque


      •    Simon
             – Yahoo!’s distributed cluster and application monitoring tools
             – Similar to Ganglia
             – On the roadmap to be open sourced



Yahoo! @ ApacheCon                                                                24
Node Usage Report




Yahoo! @ ApacheCon                       25
Ranges and Groups

      •    Range: group of hosts
             – example: @GRID == all grid hosts
             – custom tools to manipulate hosts based upon ranges:
                     • ssh -r @GRID uptime
                          – Report uptime on all of the hosts in @GRID


      •    Netgroup
             – Used to implement ranges
             – The most underrated naming service switch ever?
                     • Cascaded!
                     • Scalable!
                     • Supported in lots of useful places!
                          – PAM (e.g., _succeed_if on Linux)
                          – NFS




Yahoo! @ ApacheCon                                                       26
Some Links

      •    Apache Hadoop
             – http://hadoop.apache.org/

      •    Yahoo! Hadoop Blog
             – http://developer.yahoo.com/blogs/hadoop/

      •    M45 Press Release
             – http://research.yahoo.com/node/1879

      •    Hadoop Summit and DISC Slides
             – http://research.yahoo.com/node/2104




Yahoo! @ ApacheCon                                        27
28

Más contenido relacionado

La actualidad más candente

Common and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopCommon and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopBrock Noland
 
Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...Krishnan Parasuraman
 
Hadoop 101
Hadoop 101Hadoop 101
Hadoop 101EMC
 
Oct 2012 HUG: Hadoop .Next (0.23) - Customer Impact and Deployment
Oct 2012 HUG: Hadoop .Next (0.23) - Customer Impact and DeploymentOct 2012 HUG: Hadoop .Next (0.23) - Customer Impact and Deployment
Oct 2012 HUG: Hadoop .Next (0.23) - Customer Impact and DeploymentYahoo Developer Network
 
Deploying Grid Services Using Apache Hadoop
Deploying Grid Services Using Apache HadoopDeploying Grid Services Using Apache Hadoop
Deploying Grid Services Using Apache HadoopAllen Wittenauer
 
Bigdata workshop february 2015
Bigdata workshop  february 2015 Bigdata workshop  february 2015
Bigdata workshop february 2015 clairvoyantllc
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
Architecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric BaldeschwielerArchitecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric Baldeschwielerlucenerevolution
 
Cloudera Sessions - Clinic 1 - Getting Started With Hadoop
Cloudera Sessions - Clinic 1 - Getting Started With HadoopCloudera Sessions - Clinic 1 - Getting Started With Hadoop
Cloudera Sessions - Clinic 1 - Getting Started With HadoopCloudera, Inc.
 
Hadoop scalability
Hadoop scalabilityHadoop scalability
Hadoop scalabilityWANdisco Plc
 
Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)Marcel Krcah
 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryCloudera, Inc.
 
Hadoop World 2011: Mike Olson Keynote Presentation
Hadoop World 2011: Mike Olson Keynote PresentationHadoop World 2011: Mike Olson Keynote Presentation
Hadoop World 2011: Mike Olson Keynote PresentationCloudera, Inc.
 
Large scale computing with mapreduce
Large scale computing with mapreduceLarge scale computing with mapreduce
Large scale computing with mapreducehansen3032
 

La actualidad más candente (20)

Common and unique use cases for Apache Hadoop
Common and unique use cases for Apache HadoopCommon and unique use cases for Apache Hadoop
Common and unique use cases for Apache Hadoop
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 
Hadoop Ecosystem Overview
Hadoop Ecosystem OverviewHadoop Ecosystem Overview
Hadoop Ecosystem Overview
 
Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
Hadoop World 2011: Building Scalable Data Platforms ; Hadoop & Netezza Deploy...
 
Hadoop 101
Hadoop 101Hadoop 101
Hadoop 101
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Oct 2012 HUG: Hadoop .Next (0.23) - Customer Impact and Deployment
Oct 2012 HUG: Hadoop .Next (0.23) - Customer Impact and DeploymentOct 2012 HUG: Hadoop .Next (0.23) - Customer Impact and Deployment
Oct 2012 HUG: Hadoop .Next (0.23) - Customer Impact and Deployment
 
Deploying Grid Services Using Apache Hadoop
Deploying Grid Services Using Apache HadoopDeploying Grid Services Using Apache Hadoop
Deploying Grid Services Using Apache Hadoop
 
Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0Hadoop 2.0 handout 5.0
Hadoop 2.0 handout 5.0
 
Bigdata workshop february 2015
Bigdata workshop  february 2015 Bigdata workshop  february 2015
Bigdata workshop february 2015
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Architecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric BaldeschwielerArchitecting the Future of Big Data & Search - Eric Baldeschwieler
Architecting the Future of Big Data & Search - Eric Baldeschwieler
 
Cloudera Sessions - Clinic 1 - Getting Started With Hadoop
Cloudera Sessions - Clinic 1 - Getting Started With HadoopCloudera Sessions - Clinic 1 - Getting Started With Hadoop
Cloudera Sessions - Clinic 1 - Getting Started With Hadoop
 
Hadoop Overview kdd2011
Hadoop Overview kdd2011Hadoop Overview kdd2011
Hadoop Overview kdd2011
 
Hadoop scalability
Hadoop scalabilityHadoop scalability
Hadoop scalability
 
Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)Hadoop in Practice (SDN Conference, Dec 2014)
Hadoop in Practice (SDN Conference, Dec 2014)
 
Hadoop Backup and Disaster Recovery
Hadoop Backup and Disaster RecoveryHadoop Backup and Disaster Recovery
Hadoop Backup and Disaster Recovery
 
Drill njhug -19 feb2013
Drill njhug -19 feb2013Drill njhug -19 feb2013
Drill njhug -19 feb2013
 
Hadoop World 2011: Mike Olson Keynote Presentation
Hadoop World 2011: Mike Olson Keynote PresentationHadoop World 2011: Mike Olson Keynote Presentation
Hadoop World 2011: Mike Olson Keynote Presentation
 
Large scale computing with mapreduce
Large scale computing with mapreduceLarge scale computing with mapreduce
Large scale computing with mapreduce
 

Similar a Deploying Grid Services Using Hadoop

Distributed Data processing in a Cloud
Distributed Data processing in a CloudDistributed Data processing in a Cloud
Distributed Data processing in a Cloudelliando dias
 
Hadoop on Azure, Blue elephants
Hadoop on Azure,  Blue elephantsHadoop on Azure,  Blue elephants
Hadoop on Azure, Blue elephantsOvidiu Dimulescu
 
Hadoop distributed computing framework for big data
Hadoop distributed computing framework for big dataHadoop distributed computing framework for big data
Hadoop distributed computing framework for big dataCyanny LIANG
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introductionSandeep Singh
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetCloudera, Inc.
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDYVenneladonthireddy1
 
Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big DataJoe Alex
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosLester Martin
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoopbddmoscow
 
Scaling Storage and Computation with Hadoop
Scaling Storage and Computation with HadoopScaling Storage and Computation with Hadoop
Scaling Storage and Computation with Hadoopyaevents
 
4. hadoop גיא לבנברג
4. hadoop  גיא לבנברג4. hadoop  גיא לבנברג
4. hadoop גיא לבנברגTaldor Group
 
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.MaharajothiP
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.pptSathish24111
 

Similar a Deploying Grid Services Using Hadoop (20)

Distributed Data processing in a Cloud
Distributed Data processing in a CloudDistributed Data processing in a Cloud
Distributed Data processing in a Cloud
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Hadoop.pptx
Hadoop.pptxHadoop.pptx
Hadoop.pptx
 
Hadoop and Distributed Computing
Hadoop and Distributed ComputingHadoop and Distributed Computing
Hadoop and Distributed Computing
 
Hadoop on Azure, Blue elephants
Hadoop on Azure,  Blue elephantsHadoop on Azure,  Blue elephants
Hadoop on Azure, Blue elephants
 
Hadoop distributed computing framework for big data
Hadoop distributed computing framework for big dataHadoop distributed computing framework for big data
Hadoop distributed computing framework for big data
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introduction
 
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring BudgetHBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
HBaseCon 2012 | Building a Large Search Platform on a Shoestring Budget
 
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
02 Hadoop.pptx HADOOP VENNELA DONTHIREDDY
 
Introduction to Hadoop and Big Data
Introduction to Hadoop and Big DataIntroduction to Hadoop and Big Data
Introduction to Hadoop and Big Data
 
Big data Hadoop
Big data  Hadoop   Big data  Hadoop
Big data Hadoop
 
Hadoop, Taming Elephants
Hadoop, Taming ElephantsHadoop, Taming Elephants
Hadoop, Taming Elephants
 
Hadoop tutorial
Hadoop tutorialHadoop tutorial
Hadoop tutorial
 
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive DemosHadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
Hadoop Demystified + MapReduce (Java and C#), Pig, and Hive Demos
 
Big Data Developers Moscow Meetup 1 - sql on hadoop
Big Data Developers Moscow Meetup 1  - sql on hadoopBig Data Developers Moscow Meetup 1  - sql on hadoop
Big Data Developers Moscow Meetup 1 - sql on hadoop
 
Scaling Storage and Computation with Hadoop
Scaling Storage and Computation with HadoopScaling Storage and Computation with Hadoop
Scaling Storage and Computation with Hadoop
 
4. hadoop גיא לבנברג
4. hadoop  גיא לבנברג4. hadoop  גיא לבנברג
4. hadoop גיא לבנברג
 
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.
P.Maharajothi,II-M.sc(computer science),Bon secours college for women,thanjavur.
 
Hadoop Tutorial.ppt
Hadoop Tutorial.pptHadoop Tutorial.ppt
Hadoop Tutorial.ppt
 
Hadoop Primer
Hadoop PrimerHadoop Primer
Hadoop Primer
 

Más de George Ang

Wrapper induction construct wrappers automatically to extract information f...
Wrapper induction   construct wrappers automatically to extract information f...Wrapper induction   construct wrappers automatically to extract information f...
Wrapper induction construct wrappers automatically to extract information f...George Ang
 
Opinion mining and summarization
Opinion mining and summarizationOpinion mining and summarization
Opinion mining and summarizationGeorge Ang
 
Huffman coding
Huffman codingHuffman coding
Huffman codingGeorge Ang
 
Do not crawl in the dust 
different ur ls similar text
Do not crawl in the dust 
different ur ls similar textDo not crawl in the dust 
different ur ls similar text
Do not crawl in the dust 
different ur ls similar textGeorge Ang
 
大规模数据处理的那些事儿
大规模数据处理的那些事儿大规模数据处理的那些事儿
大规模数据处理的那些事儿George Ang
 
腾讯大讲堂02 休闲游戏发展的文化趋势
腾讯大讲堂02 休闲游戏发展的文化趋势腾讯大讲堂02 休闲游戏发展的文化趋势
腾讯大讲堂02 休闲游戏发展的文化趋势George Ang
 
腾讯大讲堂03 qq邮箱成长历程
腾讯大讲堂03 qq邮箱成长历程腾讯大讲堂03 qq邮箱成长历程
腾讯大讲堂03 qq邮箱成长历程George Ang
 
腾讯大讲堂04 im qq
腾讯大讲堂04 im qq腾讯大讲堂04 im qq
腾讯大讲堂04 im qqGeorge Ang
 
腾讯大讲堂05 面向对象应对之道
腾讯大讲堂05 面向对象应对之道腾讯大讲堂05 面向对象应对之道
腾讯大讲堂05 面向对象应对之道George Ang
 
腾讯大讲堂06 qq邮箱性能优化
腾讯大讲堂06 qq邮箱性能优化腾讯大讲堂06 qq邮箱性能优化
腾讯大讲堂06 qq邮箱性能优化George Ang
 
腾讯大讲堂07 qq空间
腾讯大讲堂07 qq空间腾讯大讲堂07 qq空间
腾讯大讲堂07 qq空间George Ang
 
腾讯大讲堂08 可扩展web架构探讨
腾讯大讲堂08 可扩展web架构探讨腾讯大讲堂08 可扩展web架构探讨
腾讯大讲堂08 可扩展web架构探讨George Ang
 
腾讯大讲堂09 如何建设高性能网站
腾讯大讲堂09 如何建设高性能网站腾讯大讲堂09 如何建设高性能网站
腾讯大讲堂09 如何建设高性能网站George Ang
 
腾讯大讲堂01 移动qq产品发展历程
腾讯大讲堂01 移动qq产品发展历程腾讯大讲堂01 移动qq产品发展历程
腾讯大讲堂01 移动qq产品发展历程George Ang
 
腾讯大讲堂10 customer engagement
腾讯大讲堂10 customer engagement腾讯大讲堂10 customer engagement
腾讯大讲堂10 customer engagementGeorge Ang
 
腾讯大讲堂11 拍拍ce工作经验分享
腾讯大讲堂11 拍拍ce工作经验分享腾讯大讲堂11 拍拍ce工作经验分享
腾讯大讲堂11 拍拍ce工作经验分享George Ang
 
腾讯大讲堂14 qq直播(qq live) 介绍
腾讯大讲堂14 qq直播(qq live) 介绍腾讯大讲堂14 qq直播(qq live) 介绍
腾讯大讲堂14 qq直播(qq live) 介绍George Ang
 
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍George Ang
 
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍George Ang
 
腾讯大讲堂16 产品经理工作心得分享
腾讯大讲堂16 产品经理工作心得分享腾讯大讲堂16 产品经理工作心得分享
腾讯大讲堂16 产品经理工作心得分享George Ang
 

Más de George Ang (20)

Wrapper induction construct wrappers automatically to extract information f...
Wrapper induction   construct wrappers automatically to extract information f...Wrapper induction   construct wrappers automatically to extract information f...
Wrapper induction construct wrappers automatically to extract information f...
 
Opinion mining and summarization
Opinion mining and summarizationOpinion mining and summarization
Opinion mining and summarization
 
Huffman coding
Huffman codingHuffman coding
Huffman coding
 
Do not crawl in the dust 
different ur ls similar text
Do not crawl in the dust 
different ur ls similar textDo not crawl in the dust 
different ur ls similar text
Do not crawl in the dust 
different ur ls similar text
 
大规模数据处理的那些事儿
大规模数据处理的那些事儿大规模数据处理的那些事儿
大规模数据处理的那些事儿
 
腾讯大讲堂02 休闲游戏发展的文化趋势
腾讯大讲堂02 休闲游戏发展的文化趋势腾讯大讲堂02 休闲游戏发展的文化趋势
腾讯大讲堂02 休闲游戏发展的文化趋势
 
腾讯大讲堂03 qq邮箱成长历程
腾讯大讲堂03 qq邮箱成长历程腾讯大讲堂03 qq邮箱成长历程
腾讯大讲堂03 qq邮箱成长历程
 
腾讯大讲堂04 im qq
腾讯大讲堂04 im qq腾讯大讲堂04 im qq
腾讯大讲堂04 im qq
 
腾讯大讲堂05 面向对象应对之道
腾讯大讲堂05 面向对象应对之道腾讯大讲堂05 面向对象应对之道
腾讯大讲堂05 面向对象应对之道
 
腾讯大讲堂06 qq邮箱性能优化
腾讯大讲堂06 qq邮箱性能优化腾讯大讲堂06 qq邮箱性能优化
腾讯大讲堂06 qq邮箱性能优化
 
腾讯大讲堂07 qq空间
腾讯大讲堂07 qq空间腾讯大讲堂07 qq空间
腾讯大讲堂07 qq空间
 
腾讯大讲堂08 可扩展web架构探讨
腾讯大讲堂08 可扩展web架构探讨腾讯大讲堂08 可扩展web架构探讨
腾讯大讲堂08 可扩展web架构探讨
 
腾讯大讲堂09 如何建设高性能网站
腾讯大讲堂09 如何建设高性能网站腾讯大讲堂09 如何建设高性能网站
腾讯大讲堂09 如何建设高性能网站
 
腾讯大讲堂01 移动qq产品发展历程
腾讯大讲堂01 移动qq产品发展历程腾讯大讲堂01 移动qq产品发展历程
腾讯大讲堂01 移动qq产品发展历程
 
腾讯大讲堂10 customer engagement
腾讯大讲堂10 customer engagement腾讯大讲堂10 customer engagement
腾讯大讲堂10 customer engagement
 
腾讯大讲堂11 拍拍ce工作经验分享
腾讯大讲堂11 拍拍ce工作经验分享腾讯大讲堂11 拍拍ce工作经验分享
腾讯大讲堂11 拍拍ce工作经验分享
 
腾讯大讲堂14 qq直播(qq live) 介绍
腾讯大讲堂14 qq直播(qq live) 介绍腾讯大讲堂14 qq直播(qq live) 介绍
腾讯大讲堂14 qq直播(qq live) 介绍
 
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
 
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
腾讯大讲堂15 市场研究及数据分析理念及方法概要介绍
 
腾讯大讲堂16 产品经理工作心得分享
腾讯大讲堂16 产品经理工作心得分享腾讯大讲堂16 产品经理工作心得分享
腾讯大讲堂16 产品经理工作心得分享
 

Último

Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...itnewsafrica
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Mark Simos
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...Karmanjay Verma
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...itnewsafrica
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 

Último (20)

Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...Zeshan Sattar- Assessing the skill requirements and industry expectations for...
Zeshan Sattar- Assessing the skill requirements and industry expectations for...
 
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
Tampa BSides - The No BS SOC (slides from April 6, 2024 talk)
 
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...React JS; all concepts. Contains React Features, JSX, functional & Class comp...
React JS; all concepts. Contains React Features, JSX, functional & Class comp...
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
Irene Moetsana-Moeng: Stakeholders in Cybersecurity: Collaborative Defence fo...
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 

Deploying Grid Services Using Hadoop

  • 1. Deploying Grid Services Using Hadoop Allen Wittenauer April 11, 2008 Yahoo! @ ApacheCon 1
  • 2. Agenda • Grid Computing At Yahoo! • Quick Overview of Hadoop – “Hadoop for Systems Administrators” • Scaling Hadoop Deployments • Yahoo!’s Next Generation Grid Infrastructure • Questions (and maybe even Answers!) Yahoo! @ ApacheCon 2
  • 3. Grid Computing at Yahoo! • Drivers – 500M unique users per month – Billions of interesting events per day – “Data analysis is the inner-loop at Yahoo!” • Yahoo! Grid Vision and Focus – On-demand, shared access to vast pool of resources – Support for massively parallel execution (1000s of processors) – Data Intensive Super Computing (DISC) – Centrally provisioned and managed – Service-oriented, elastic • What We’re Not – Not “Grid” in the sense of scientific community (Globus, etc) – Not focused on public or 3rd-party utility (Amazon EC2/S3, etc) Yahoo! @ ApacheCon 3
  • 4. Yahoo! Grid Services • Operate multiple grids within Yahoo! • 10,000s nodes, 100,000s cores, TBs RAM, PBs disk • Support large internal user community – Account management, training, etc • Manage data needs – Ingest TBs per day • Deploy and manage software (Hadoop, Pig, etc) Yahoo! @ ApacheCon 4
  • 5. “Grid” Computing • What we really do is utility computing – Nobody knows what “utility computing” is, but everyone has heard of “grid computing” • Grid computing implies sharing across resources owned by multiple, independent organizations • Utility computing implies sharing one owner’s resources by multiple, independent customers • Ultimate goal is to provide shared compute and storage resources – Instead of going to a hardware committee to provision balkanized resources, a project allocates a part of its budget for use on Yahoo!’s shared grids – Pay as you go • Only buy 100 computers for 15 minutes of compute time vs. 100 computers 24x7 Yahoo! @ ApacheCon 5
  • 6. What is a Yahoo! Grid Service? • Thousands of machines using basic network hardware – It’s hard to program for many machines • Clustering and sharing software – Hadoop, HOD, Torque, Maui, and other bits... • Petabytes of data – It’s an engineering challenge to load so much data from many sources • Attached development environment – A clean, well lit place to interact with a grid • User support – Learning facilitation • Usage tracking and billing – Someone has to pay the bills... Yahoo! @ ApacheCon 6
  • 7. Quick MapReduce Overview • Application writer specifies Input 0 Input 1 Input 2 – two functions: Map and Reduce – set of input files • Workflow Map 0 Map 1 Map 2 – input phase generates a number of FileSplits from input files (one per Map Task) Shuffle – Map phase executes user function to transform key/value inputs into new key/value outputs Reduce 0 Reduce 1 – Framework sorts and shuffles – Reduce phase combines all k/v’s with the same key into new k/v’s – Output phase writes the resulting Out 0 Out 1 pairs to files • cat * | grep | sort | uniq -c | cat > out Yahoo! @ ApacheCon 7
  • 8. Hadoop MapReduce: Process Level • Job – Map Function + Reduce Function + List of inputs Job Job Tracker Task Task Tracker Tracker Task Task Task Task Yahoo! @ ApacheCon 8
  • 9. Hadoop Distributed File System Data FSFS FS FS Node Data Status Name Metadata Node Data Status Data FSFS FS FS Node Yahoo! @ ApacheCon 9
  • 10.
  • 11. Gateways: Multi-user Land • Provided for two main purposes – Meaningful development interaction with a compute cluster • High bandwidth, low latency, and few network barriers enable a tight development loop when creating MapReduce jobs – Permission and privilege separation • Limit exposure to sensitive data – Hadoop 0.15 and lower lack users, permissions, etc. – Hadoop 0.16 has users and “weak” permissions • Characteristics – “Replacement” Lab machine – World-writable local disk space • Any single-threaded processing • Code debugging Gateway Yahoo! @ ApacheCon 11
  • 12. Compute Cluster and Name Nodes • Compute Cluster Node – Users cannot login! – All nodes run both MapReduce and HDFS frameworks – usually 500 to 2000 machines – Each cluster kept relatively homogenous – Hardware configuration Task • 2xSockets (2 or 4 core) Tracker • 4x500-750G Name • 6G-8G RAM Node Data Node • Name Node – 16G RAM • 14G Java heap = 18-20 million files Yahoo! @ ApacheCon 12
  • 13. Queuing and Scheduling • Hadoop does not have an advanced scheduling system – MapReduce JobTracker manages one or more jobs running within a set of machines – Works well for “dedicated” applications, but does not work so well for shared resources • Grid Services are intended to be a shared multi-user, multi-application environment – Need to combine Hadoop with an external queuing and scheduling system... Yahoo! @ ApacheCon 13
  • 14. Hadoop On Demand (HoD) Resource • Wrapper around PBS commands Mgmt/Sched – We use freely available Torque and Maui • Big win: virtual private JobTracker clusters maui – Job isolation – Users create clusters of the size they need pbs_server – Submit jobs to their private JT • Big costs: – Lose data locality pbs_mom – Increased complexity – Lose a node for private JobTracker – Single reducer doesn’t free unused nodes • ~ 30% efficiency lost! • Looking at changing Hadoop scheduling – Task scheduling flexibility combined with node elasticity Yahoo! @ ApacheCon 14
  • 15. HoD Job Scheduling 1+3 Nodes 1+4 Nodes 1+5 Nodes T T T T J T T T J NN Yahoo! @ ApacheCon 15
  • 16. The Reality of a 1000 Node Grid 200 400 500 10 Yahoo! @ ApacheCon 16
  • 17. Putting It All Together Gateways RM JT DN TT DN JT DN TT DN TT DN JT DN NN Yahoo! @ ApacheCon 17
  • 18. Network Configuration Core1 Core2 Core3 Core4 GE 2xGE Switch Switch Switch 40 hosts/racks H H H H H H H H H Yahoo! @ ApacheCon 18
  • 19. Yahoo!’s Next Generation Grid Infrastructure A Work In Progress 19
  • 20. Background Information • Internal deployments – Mostly Yahoo! proprietary technologies • M45 – Educational outreach grid – Non-Yahoo!’s using Yahoo! resources • Legal required us not to use any Y! technology! • Decision made to start from scratch! – Hard to share best practices – Potential legal issues – Don’t want to support two ways to do the same operation • Internal grids converting to be completely OSS as possible – Custom glue code to deal with any Y!<-->OSS incompatibilities • user and group data Yahoo! @ ApacheCon 20
  • 21. Naming and Provisioning Services • Naming services – Kerberos for secure authentication – DNS for host resolution – LDAP for everything else • ISC DHCP – Reads table information from LDAP – In pairs for redundancy • Kickstart – We run RHEL 5.x – base image + bcfg2 • bcfg2 – host customization – centralized configuration management Yahoo! @ ApacheCon 21
  • 22. NFS for Multi-user Support • NFS – Home Directories – Project Directories • Group shared data • Grids with service level agreements (SLAs) shouldn’t use NFS ! – Single point of failure • HA-NFS == $$$ – Performance – Real data should be in HDFS Yahoo! @ ApacheCon 22
  • 23. Big Picture Remote Data Center LDAP/KRB5/ LDAP/KRB5/ MMR DNS DNS Read Only Replication DHCP BCFG2/YUM/ LDAP/KRB Slaves Data DHCP Farm GRID1 GRID2 ... GRIDx NFS: home project (per grid) Yahoo! @ ApacheCon 23
  • 24. Self Healing/Reporting • Torque – Use the Torque node health mechanism to disable/fix ‘sick’ nodes • Great reduction in amount of support issues • Address problems in bulk • Nagios – Usual stuff – Custom hooks into Torque • Simon – Yahoo!’s distributed cluster and application monitoring tools – Similar to Ganglia – On the roadmap to be open sourced Yahoo! @ ApacheCon 24
  • 25. Node Usage Report Yahoo! @ ApacheCon 25
  • 26. Ranges and Groups • Range: group of hosts – example: @GRID == all grid hosts – custom tools to manipulate hosts based upon ranges: • ssh -r @GRID uptime – Report uptime on all of the hosts in @GRID • Netgroup – Used to implement ranges – The most underrated naming service switch ever? • Cascaded! • Scalable! • Supported in lots of useful places! – PAM (e.g., _succeed_if on Linux) – NFS Yahoo! @ ApacheCon 26
  • 27. Some Links • Apache Hadoop – http://hadoop.apache.org/ • Yahoo! Hadoop Blog – http://developer.yahoo.com/blogs/hadoop/ • M45 Press Release – http://research.yahoo.com/node/1879 • Hadoop Summit and DISC Slides – http://research.yahoo.com/node/2104 Yahoo! @ ApacheCon 27
  • 28. 28