Hadoop MapReduce
Introduction and Deep Insight
              July 9, 2012
                Anty Rao
       Big Data Engineering Team
              Hanborq Inc.
Outline
•   Architecture
•   Job Tracker
•   Task Tracker
•   Map/Reduce internal
•   Optimization
•   YARN



Architecture

[Diagram: the MapReduce client talks to the JobTracker over RPC. Three TaskTrackers exchange heartbeats with the JobTracker, and each TaskTracker runs three Child JVMs.]
Job Tracker




Job Tracker
• Manages cluster resources
• Job scheduling




Implementation Overview




ExpireLaunchingTasks
• A thread that times out tasks that have been
  assigned to task trackers but have not
  reported back yet.
• Once a task has reported back, its task tracker
  takes over responsibility for monitoring its
  execution, such as killing it if it becomes
  unresponsive.
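
A minimal sketch of the timeout bookkeeping described above. It is not the JobTracker's actual code: the class name LaunchingTaskMonitor, its methods, and the timeout value are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: tracks task attempts that were handed to a task tracker
// but have not reported back yet, and expires those that wait too long.
class LaunchingTaskMonitor {
    private final long timeoutMillis; // assumed value, chosen by the caller
    private final Map<String, Long> launchTimes = new HashMap<>();

    LaunchingTaskMonitor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called when the job tracker assigns an attempt to a task tracker.
    void taskAssigned(String attemptId, long nowMillis) {
        launchTimes.put(attemptId, nowMillis);
    }

    // Called on the first status report; the task tracker monitors it from here on.
    void taskReported(String attemptId) {
        launchTimes.remove(attemptId);
    }

    // Removes and returns every attempt that has waited longer than the timeout.
    List<String> expire(long nowMillis) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = launchTimes.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMillis - entry.getValue() > timeoutMillis) {
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }
}
```

A background thread would call expire() periodically and hand the returned attempts back to the scheduler.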



ExpireTrackers
• Monitors task tracker status and expires
  task trackers that have gone down.
• After a task tracker dies, reschedules all tasks
  that resided on the dead task tracker.




RetireJobs
• Removes old finished jobs that
  have been around too long.
• The job tracker can’t retain every finished job’s
  info.
• There is also an upper limit on the number of retained jobs
  per user.
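
The two retirement rules above (age and per-user limit) can be sketched as follows. This is an illustration only: FinishedJob, JobRetirer and the parameter names are hypothetical, and the real thread may apply the rules differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record of a finished job kept in job tracker memory.
class FinishedJob {
    final String id;
    final String user;
    final long finishMillis;

    FinishedJob(String id, String user, long finishMillis) {
        this.id = id;
        this.user = user;
        this.finishMillis = finishMillis;
    }
}

class JobRetirer {
    // Returns the IDs of jobs to retire: those older than retainMillis, plus
    // anything beyond the newest maxPerUser jobs of each user.
    static List<String> selectRetired(List<FinishedJob> jobs, long nowMillis,
                                      long retainMillis, int maxPerUser) {
        List<FinishedJob> newestFirst = new ArrayList<>(jobs);
        newestFirst.sort((a, b) -> Long.compare(b.finishMillis, a.finishMillis));
        Map<String, Integer> keptPerUser = new HashMap<>();
        List<String> retired = new ArrayList<>();
        for (FinishedJob job : newestFirst) {
            int kept = keptPerUser.getOrDefault(job.user, 0);
            boolean tooOld = nowMillis - job.finishMillis > retainMillis;
            if (tooOld || kept >= maxPerUser) {
                retired.add(job.id);
            } else {
                keptPerUser.put(job.user, kept + 1);
            }
        }
        return retired;
    }
}
```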




JobInitThread
• Initializes jobs that have just been
  created.
• Job initialization includes:
  – Creating split info per map
  – Creating map tasks
  – Creating reduce tasks
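
A rough sketch of the initialization steps above: one map task per split, plus a configured number of reduce tasks. The names (JobInitSketch, Split, the "m_"/"r_" task labels) are made up for this example, and the fixed-size split computation is a simplification of what an input format would do.

```java
import java.util.ArrayList;
import java.util.List;

class JobInitSketch {
    // Hypothetical split descriptor: a byte range of one input file.
    static final class Split {
        final String path;
        final long offset;
        final long length;

        Split(String path, long offset, long length) {
            this.path = path;
            this.offset = offset;
            this.length = length;
        }
    }

    // Cuts a file of fileLength bytes into splits of at most splitSize bytes.
    static List<Split> computeSplits(String path, long fileLength, long splitSize) {
        List<Split> splits = new ArrayList<>();
        for (long offset = 0; offset < fileLength; offset += splitSize) {
            splits.add(new Split(path, offset, Math.min(splitSize, fileLength - offset)));
        }
        return splits;
    }

    // One map task per split; the reduce count comes from the job configuration.
    static List<String> createTasks(List<Split> splits, int numReduces) {
        List<String> tasks = new ArrayList<>();
        for (int i = 0; i < splits.size(); i++) {
            tasks.add("m_" + i);
        }
        for (int i = 0; i < numReduces; i++) {
            tasks.add("r_" + i);
        }
        return tasks;
    }
}
```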




TaskCommitQueue
• A thread that performs all HDFS-related
  operations for tasks:
  – Promotes outputs of COMMIT_PENDING tasks
  – Discards outputs of FAILED/KILLED tasks
• Task trackers are in charge of all operations
  on the local file system.




HTTP Server
• Serves job tracker status
• Serves the status of all jobs
  – Per-job metrics
• Serves job history




Key Data Structures
• JobInProgress
   – Maintains all the info needed to keep a job on the straight and narrow.
   – Keeps the job’s JobProfile and its latest JobStatus, plus a set of
     tables for bookkeeping of its tasks.
   – When a task tracker is lost, it is penalized for each job that had
     tasks running on it at that time.

• TaskInProgress
   – Maintains all the info needed for a task during the lifetime of its owning
     job.
   – A given task might be speculatively executed or re-executed.
   – Maintains a separate task state for each task attempt.




The whole life of a job




The life of a job
• Client
  – The user writes a custom mapper and reducer; the client
    computes the splits and uploads the job configuration file, jar
    file, and split meta info to HDFS
  – Submits the job to the job tracker
• Job Tracker
  – Initializes the job: reads the job split info, determines the final
    number of maps, and creates all needed map and reduce
    tasks along with the structures that represent
    them
  – Task trackers pull tasks through heartbeats


The life of a job
• Task Tracker
  – Pulls tasks from the job tracker through heartbeats
  – Initializes the job, only once per job
  – Initializes the task
     • Downloads all needed jar files, configuration files, and
       distributed cache files from HDFS to local disk
     • Creates a staging working directory for the task on local disk
     • Localizes the configuration file
  – Builds the Java launch options and sets up the Child
    JVM




The life of a job
• Child JVM
  – Uses RPC with the task tracker to get its task info
  – Does the actual work: executes the map or
    reduce function, reporting status regularly
    so that the task tracker does not kill it as unresponsive.
  – Retrieves map completion events from the task tracker, if
    needed, and reports fetch failures to the TT
  – When the task is done, reports COMMIT_PENDING or
    SUCCEEDED state to the TT


Task Tracker




Task Tracker
• Per-node agent
• Manage tasks




Implementation Overview
TT Main Thread
• Heartbeats with the JT periodically to report task
  status and retrieve directives, which include
  launch task, kill job, and kill task
  actions
• Kills tasks that stay unresponsive beyond the configured time
  period
• If there isn’t enough disk space to
  accommodate all running tasks, picks tasks to
  kill
• If the TT expires, it reinitializes itself.


taskCleanupThread
• Thread dedicated to processing cleanup
  actions assigned by the JT
  – Kill job action
  – Kill task action




directoryCleanupThread
• Before a task executes, an execution
  environment is created
  – Create staging directory
  – Copy configuration file
  – Etc.
• While a task runs, it may produce multiple
  intermediate files in the local staging directory
• After a job or task completes or fails, this thread deletes all
  of these leftover directories and files.


taskLauncher
• Localizes the job
• Localizes the task
• Creates a TaskRunner thread to manage the
  Child JVM




TaskRunner
• It’s a thread
• Two types
  – MapTaskRunner
  – ReduceTaskRunner
• Main duties
  – Builds the Java launch options and
    execution environment
  – In charge of launching and killing the Child JVM.


MapEventsFetcherThread
• When there are reduce tasks in the shuffle
  phase, uses RPC with the JT to fetch map
  completion events, on a per-job basis.




Child JVM
• Actually executes the map/reduce function
• Reports status to the TT periodically
• Retrieves map completion events from the TT
  for reduce tasks if needed.




Key data structures
• Running Jobs
  – JobID
  – JobConf
  – Set<TaskInProgress>
• TaskInProgress
  – Task
  – TaskStatus
  – TaskRunner


Map/Reduce Internal




Map/Reduce Programming Model

         Source: Hadoop: The Definitive Guide
Map Phase Diagram
Steps of Map Phase
• Records emitted by the map function are put into a circular
  buffer continually
• When buffer usage exceeds
  io.sort.mb * io.sort.spill.percent, a spill starts: it
  sorts records by partition, then by key, and writes
  the buffer out to disk together with an index file
  that records the position where each partition
  begins.
• The final merge combines all the intermediate spill files into a
  single large file, plus an index file.
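
The spill steps above can be sketched in a few lines. This is a simplified illustration, not the map-side collector's code: SpillSketch, Record and the method names are hypothetical, and 100 and 0.80 in the usage note are only example values for io.sort.mb and io.sort.spill.percent.

```java
import java.util.Arrays;
import java.util.List;

class SpillSketch {
    // Hypothetical in-memory record: target partition plus key and value.
    static final class Record {
        final int partition;
        final String key;
        final String value;

        Record(int partition, String key, String value) {
            this.partition = partition;
            this.key = key;
            this.value = value;
        }
    }

    // Buffer occupancy in bytes at which a spill starts:
    // io.sort.mb * io.sort.spill.percent.
    static long spillThresholdBytes(int ioSortMb, double spillPercent) {
        return Math.round(ioSortMb * 1024L * 1024L * spillPercent);
    }

    // Orders records the way a spill writes them: by partition, then by key.
    static void sortForSpill(List<Record> records) {
        records.sort((a, b) -> a.partition != b.partition
                ? Integer.compare(a.partition, b.partition)
                : a.key.compareTo(b.key));
    }

    // Mimics the index file: position of the first record of each partition,
    // or -1 when a partition received no records.
    static int[] partitionStarts(List<Record> sorted, int numPartitions) {
        int[] starts = new int[numPartitions];
        Arrays.fill(starts, -1);
        for (int i = 0; i < sorted.size(); i++) {
            int p = sorted.get(i).partition;
            if (starts[p] == -1) {
                starts[p] = i;
            }
        }
        return starts;
    }
}
```

With a 100 MB buffer and a spill fraction of 0.80, spillThresholdBytes(100, 0.80) gives 83,886,080 bytes, that is, a spill starts once 80 MB of the buffer is in use.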


Main map-side tuning Knobs




Reduce Phase Diagram




<property>
  <name>mapred.tasktracker.indexcache.mb</name>
  <value>10</value>
  <description>The maximum memory that a task tracker allows for the
    index cache that is used when serving map outputs to reducers.
  </description>
</property>




Steps of Reduce Phase
• Pull data over from the maps. If there is space
  available in memory and the size of the map output is
  less than
  25% * HeapSize * mapred.job.shuffle.input.buffer.percent,
  keep the output in memory; otherwise
  store it directly on disk.
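
The memory-or-disk decision above is simple arithmetic and can be sketched as below. The class and method names are hypothetical, and the figures in the usage note are arbitrary round numbers, not recommended settings.

```java
class ShuffleBufferSketch {
    // The 25% cap on a single map output, as stated on the slide.
    static final double MAX_SINGLE_SEGMENT_FRACTION = 0.25;

    // Heap share reserved for shuffle data:
    // HeapSize * mapred.job.shuffle.input.buffer.percent.
    static long shuffleBufferBytes(long heapBytes, double inputBufferPercent) {
        return (long) (heapBytes * inputBufferPercent);
    }

    // True when a fetched map output may be kept in memory: it must be smaller
    // than 25% of the shuffle buffer and there must still be room for it.
    static boolean fitsInMemory(long mapOutputBytes, long usedBytes,
                                long heapBytes, double inputBufferPercent) {
        long buffer = shuffleBufferBytes(heapBytes, inputBufferPercent);
        long maxSingle = (long) (buffer * MAX_SINGLE_SEGMENT_FRACTION);
        return mapOutputBytes < maxSingle && usedBytes + mapOutputBytes <= buffer;
    }
}
```

For a heap of 1,000,000 bytes and a buffer fraction of 0.5, the shuffle buffer is 500,000 bytes and a single map output must be smaller than 125,000 bytes to stay in memory.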




Steps of Reduce Phase (Cont.)
• The merge operation merges and sorts data
  from memory and/or disk and writes the result to
  disk. It comes in two
  flavors:
  – In-memory merge
     • Can be triggered when
       accumulated memory usage exceeds
       mapred.job.shuffle.merge.percent.
  – On-disk merge
     • Triggered when the number of
       files on disk exceeds the configured threshold.


Steps of Reduce Phase (Cont.)
• When shuffle and sort complete, the following
  constraints must be satisfied before data is fed
  to the reduce function:
  – memory used for buffering reduce input
    can’t exceed
    mapred.job.reduce.input.buffer.percent;
  – the number of files on disk can’t exceed io.sort.factor




Notes about Reduce
• Shuffle and sort use a percentage of the reduce heap
  to buffer shuffle data, because the reduce function can’t
  start until shuffle and sort complete. In the
  map phase, by contrast, the buffer size is
  determined by io.sort.mb.
• Reduce input may consist of multiple files, not
  necessarily a single file. A heap-based
  iterator feeds them to the reduce function.
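
A heap-based iterator over several sorted inputs is a standard k-way merge. The sketch below shows the idea with in-memory lists of string keys standing in for the sorted files; MergeIteratorSketch and Head are hypothetical names, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

class MergeIteratorSketch {
    // Pairs the current head of one segment with the iterator it came from.
    private static final class Head {
        final String key;
        final Iterator<String> rest;

        Head(String key, Iterator<String> rest) {
            this.key = key;
            this.rest = rest;
        }
    }

    // Merges individually sorted segments into one sorted sequence without
    // first combining them into a single file.
    static List<String> merge(List<List<String>> sortedSegments) {
        PriorityQueue<Head> heap = new PriorityQueue<>((a, b) -> a.key.compareTo(b.key));
        for (List<String> segment : sortedSegments) {
            Iterator<String> it = segment.iterator();
            if (it.hasNext()) {
                heap.add(new Head(it.next(), it));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Head smallest = heap.poll();
            merged.add(smallest.key);
            if (smallest.rest.hasNext()) {
                heap.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return merged;
    }
}
```

The heap holds only one entry per segment, so memory use depends on the number of segments rather than on their total size.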


Reduce-side Key Parameters
Optimization Tuning
• We can make use of
  mapred.job.reduce.input.buffer.percent, which
  specifies how much memory can be spared
  as reduce input buffer
• Compare the following
  cases
  – Case-1
  – Case-2
  – Case-3

Case-1

All reduce input resides on disk
Case-2

Part of the reduce input in memory, plus the rest on
disk
Case-3

Much better: all data in memory
• If the reduce function doesn’t stress memory too
  much, we can spare some memory to
  buffer reduce input and boost overall
  performance.
• What’s more, if the input data is small, we can
  let reduces hold all intermediate data in
  memory, with no disk access involved.



Optimization




Shuffle: Netty Server & Batch Fetch (1)
• Less TCP connection overhead.
• Reduces the effect of TCP slow start.
• More importantly, better shuffle scheduling in
  the reduce phase results in better overall
  performance.
Shuffle: Netty Server & Batch Fetch (2)
One connection per map
• Each fetch thread in a reduce
  copies one map output per
  connection, even if there are
  many outputs on the TT.
Batch fetch
• A fetch thread copies multiple map
  outputs per connection.
• While it does so, the fetch thread owns that TT:
  other fetch threads can’t fetch
  outputs from it during the copying
  period.
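
The difference between the two schemes comes down to how pending map outputs are grouped before connecting. Below is a small sketch of the batch side, with hypothetical names (BatchFetchSketch, MapOutput) and plain strings standing in for hosts and map attempt IDs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class BatchFetchSketch {
    // A finished map output waiting to be copied: the task tracker host that
    // holds it and the map attempt that produced it.
    static final class MapOutput {
        final String host;
        final String mapId;

        MapOutput(String host, String mapId) {
            this.host = host;
            this.mapId = mapId;
        }
    }

    // Groups pending outputs by host so that a single connection to each
    // task tracker can fetch the whole batch.
    static Map<String, List<String>> batchesByHost(List<MapOutput> pending) {
        Map<String, List<String>> batches = new LinkedHashMap<>();
        for (MapOutput output : pending) {
            batches.computeIfAbsent(output.host, h -> new ArrayList<>()).add(output.mapId);
        }
        return batches;
    }
}
```

With one connection per map, the number of connections equals the number of pending outputs; with batching it equals the number of distinct hosts.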
Sort Avoidance
• Many real-world jobs require shuffling but not sorting, and
  sorting adds considerable overhead.
    – Hash Aggregation
    – Hash Join
    – … etc.

• When sorting is turned off, the mapper feeds data to the reducer,
  which passes it directly to the Reduce() function, bypassing
  the intermediate sorting step.
    – Spilling, partitioning, merging and reducing become more efficient.

• How to turn off sorting?
    – JobConf job = (JobConf) getConf();
    – job.setBoolean("mapred.sort.avoidance", true);

• MAPREDUCE-4039
Sort Avoidance: Spill and Partition
• During a spill, records are compared by partition
  only.
• Partition ordering uses counting sort [O(n)],
  not quicksort [O(n log n)].
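
A minimal sketch of ordering records by partition with counting sort. It assumes each record's partition number is already known and returns record indices rather than moving data; PartitionCountingSort is a hypothetical name, not the class used in the patch.

```java
class PartitionCountingSort {
    // Returns record indices grouped by partition number, in
    // O(n + numPartitions) time. Records of the same partition keep their
    // original relative order.
    static int[] order(int[] partitions, int numPartitions) {
        int[] start = new int[numPartitions + 1];
        for (int p : partitions) {
            start[p + 1]++;                 // count records per partition
        }
        for (int p = 0; p < numPartitions; p++) {
            start[p + 1] += start[p];       // prefix sums give each partition's first slot
        }
        int[] order = new int[partitions.length];
        for (int i = 0; i < partitions.length; i++) {
            order[start[partitions[i]]++] = i;  // place record i in its partition's next slot
        }
        return order;
    }
}
```

For partitions {2, 0, 1, 0, 2} the result is {1, 3, 2, 0, 4}: the two records of partition 0 first, then partition 1, then partition 2.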
Sort Avoidance: Early Reduce (Remove shuffle barrier)
• Currently the reduce function can’t start until
  all map outputs have been fetched.
• When sorting is unnecessary, the reduce function
  can start as soon as any map
  output is available.
• Greatly improves overall performance!
Sort Avoidance: Bytes Merge
• No overhead from
  key/value
  serialization/deserialization or comparison
• Doesn’t deal with
  records, just bytes
• Just concatenates
  byte streams
  together: reads in
  bytes, writes out bytes.
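
The idea above, reduced to its core: merging without sorting is plain byte concatenation. The sketch uses in-memory byte arrays in place of on-disk segments, and BytesMergeSketch is a hypothetical name.

```java
import java.util.List;

class BytesMergeSketch {
    // Joins segments as raw bytes; no record is deserialized or compared.
    static byte[] concatenate(List<byte[]> segments) {
        int total = 0;
        for (byte[] segment : segments) {
            total += segment.length;
        }
        byte[] merged = new byte[total];
        int position = 0;
        for (byte[] segment : segments) {
            System.arraycopy(segment, 0, merged, position, segment.length);
            position += segment.length;
        }
        return merged;
    }
}
```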
Sort Avoidance: Sequential Reduce Input
• Input files are read sequentially to feed the reduce
  function, so there are no disk seeks and
  performance is better.
YARN
(Yet Another Resource Negotiator)




Current Limitations
• Hard partition of resources into map and
  reduce slots
  – Low resource utilization
• Lacks support for alternate paradigms
  – Iterative applications implemented using
    MapReduce are 10x slower.
  – Hacks for the likes of MPI/Graph Processing
• Lack of wire-compatible protocols
  – Client and cluster must be of the same version
  – Applications and work flows cannot migrate to
    different clusters

Current Limitations (Cont.)
• Scalability
  – Maximum cluster size: 4,000 nodes
  – Maximum concurrent tasks: 40,000
  – Coarse synchronization in JobTracker
• Single point of failure
  – Failure kills all queued and running jobs
  – Jobs need to be re-submitted by user
• Restart is very tricky due to complex state



Yarn Architecture




Architecture
• Resource Manager
  – Global resource scheduler
  – Hierarchical queues
• Node Manager
  – Per-machine agent
  – Manages the life cycle of containers
  – Container resource monitoring
• Application Master
  – Per-application
  – Manages application scheduling and task execution
  – E.g. MapReduce Application Master


Design Centre
• Split up the two major functions of the
  JobTracker
  – Cluster resource management
  – Application life-cycle management
• MapReduce becomes a user-land library




Code
• MapReduce Classic
  – Mess
• Yarn
  – better




Questions?



ant.rao@gmail.com




Secondary Sort
• Goal: have the values of each key arrive at the reducer sorted
• Solution
  – setOutputKeyComparatorClass
  – setOutputValueGroupingComparator
  – Partitioner
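
The three pieces listed above work together through a composite key. The sketch below simulates their roles with plain Java and no Hadoop classes; SecondarySortSketch, CompositeKey and the method names are hypothetical, and the integer value field is just an example payload.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SecondarySortSketch {
    // Composite key: the natural key plus the value to sort by.
    static final class CompositeKey {
        final String naturalKey;
        final int value;

        CompositeKey(String naturalKey, int value) {
            this.naturalKey = naturalKey;
            this.value = value;
        }
    }

    // Role of the partitioner: route by natural key only, so that all
    // records of one natural key reach the same reduce.
    static int partition(CompositeKey key, int numReduces) {
        return (key.naturalKey.hashCode() & Integer.MAX_VALUE) % numReduces;
    }

    // Role of the output key comparator: natural key first, then value.
    static int sortCompare(CompositeKey a, CompositeKey b) {
        int byKey = a.naturalKey.compareTo(b.naturalKey);
        return byKey != 0 ? byKey : Integer.compare(a.value, b.value);
    }

    // Role of the value grouping comparator: natural key only.
    static int groupCompare(CompositeKey a, CompositeKey b) {
        return a.naturalKey.compareTo(b.naturalKey);
    }

    // Simulates what one reduce sees: one group per natural key, with the
    // values of each group already in sorted order.
    static Map<String, List<Integer>> reduceInput(List<CompositeKey> keys) {
        List<CompositeKey> sorted = new ArrayList<>(keys);
        sorted.sort(SecondarySortSketch::sortCompare);
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        CompositeKey previous = null;
        for (CompositeKey key : sorted) {
            if (previous == null || groupCompare(previous, key) != 0) {
                groups.put(key.naturalKey, new ArrayList<>());
            }
            groups.get(key.naturalKey).add(key.value);
            previous = key;
        }
        return groups;
    }
}
```

In a real job, the comparators would be registered through the two JobConf setters named on the slide, and the partitioner through the job's partitioner setting.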





Hadoop MapReduce Introduction and Deep Insight

  • 1. Hadoop MapReduce Introduction and Deep Insight July 9, 2012 Anty Rao Big Data Engineering Team Hanborq Inc.
  • 2. Outline • Architecture • Job Tracker • Task Tracker • Map/Reduce internal • Optimization • YARN 2
  • 3. Architecture MapReduce RPC JobTracker Client beat Heart TaskTracker TaskTracker TaskTracker Child Child Child Child Child Child Child Child Child JVM JVM JVM JVM JVM JVM JVM JVM JVM 3
  • 5. Job Tracker • Manages cluster resources • Job scheduling 5
  • 7. ExpireLaunchingTasks • A thread to timeout tasks that have been assigned to task trackers, but have not reported back yet. • After get report from task tracker, task tracker take over the responsibility of monitoring task execution, such as killing unresponsive task. 7
  • 8. ExpireTrackers • Monitors task tracker status and expires task trackers that have gone down. • After a task tracker dies, reschedules all tasks that resided on the dead task tracker.
  • 9. RetireJobs • Removes old finished jobs that have been around too long. • The job tracker can’t retain the info of every finished job. • There is also an upper limit on the number of retained jobs on a per-user basis.
  • 10. JobInitThread • Initializes jobs that have just been created. • Job initialization includes – Creating split info per map – Creating map tasks – Creating reduce tasks
  • 11. TaskCommitQueue • A thread that performs all HDFS-related operations for tasks – Promotes outputs of COMMIT_PENDING tasks – Discards outputs of FAILED/KILLED tasks • All local file system operations are the responsibility of the task trackers.
  • 12. HTTP Server • Serves job tracker status • Serves the status of all jobs – Per-job metrics • Serves job history status
  • 13. Key Data Structures • JobInProgress – Maintains all the info for keeping a job on the straight and narrow. – Keeps the job's JobProfile and its latest JobStatus, plus a set of tables for bookkeeping of its tasks. – Penalizes a task tracker for each of the jobs that had tasks running on it when it was lost. • TaskInProgress – Maintains all the info needed for a task during the lifetime of its owning job. – A given task might be speculatively executed or re-executed. – Maintains multiple task states, one for each task attempt.
  • 14. The whole life of a job
  • 16. The life of a job • Client – The user creates a custom mapper and reducer; the client computes splits and uploads the job configuration file, jar file, and split meta info to HDFS – Submits the job to the job tracker • Job Tracker – Initializes the job, reads in the job split info, determines the final number of maps, and creates all needed map and reduce tasks along with the structures that represent them – Tasks are pulled by task trackers through heartbeats
  • 17. The life of a job • Task Tracker – Pulls tasks from the job tracker through heartbeats – Initializes the job, only once per job – Initializes the task • Downloads all needed jar files, configuration files, and distributed cache files from HDFS to local disk • Creates a staging working directory for the task on local disk • Localizes the configuration file – Builds the Java launch options and sets up the Child JVM
  • 18. The life of a job • Child JVM – Uses RPC with the task tracker to get its task info – Does the actual work: executes the map or reduce function, reporting status regularly during this period so that the task tracker does not kill it as unresponsive – Retrieves map completion events from the task tracker if needed, and reports fetch failures to the TT – When the task is done, reports the COMMIT_PENDING or SUCCEEDED state to the TT
  • 20. Task Tracker • Per-node agent • Manages tasks
  • 22. TT Main Thread • Heartbeats with the JT periodically to report task status and retrieve directives, which include launch-task, kill-job, and kill-task actions • Kills tasks that have been unresponsive for longer than the configured time period • If there isn’t enough disk space to accommodate all running tasks, picks tasks to kill • If the TT has expired, reinitializes itself
  • 23. taskCleanupThread • Thread dedicated to processing cleanup actions assigned by the JT – Kill job action – Kill task action
  • 24. directoryCleanupThread • Before a task executes, an execution environment is created – Create staging directory – Copy configuration file – Etc. • While a task runs, it may produce multiple intermediate files in the local staging directory • After the job/task completes or fails, all of these leftover directories and files are deleted.
  • 25. taskLauncher • Localizes the job • Localizes the task • Creates a TaskRunner thread to manage the Child JVM
  • 26. TaskRunner • It’s a thread • Two types – MapTaskRunner – ReduceTaskRunner • Main duties – Builds the Java launch options and execution environment – Is in charge of launching and killing the Child JVM
  • 27. MapEventsFetcherThread • When there are reduce tasks in the shuffle phase, uses RPC with the JT to fetch map completion events, on a per-job basis.
  • 28. Child JVM • Actually executes the map/reduce function • Reports status to the TT periodically • Retrieves map completion events from the TT for reduce tasks if needed.
  • 29. Key data structures • Running Jobs – JobID – JobConf – Set<TaskInProgress> • TaskInProgress – Task – TaskStatus – TaskRunner
  • 31. Map/Reduce Programming Model (figure from Hadoop: The Definitive Guide)
  • 33. Steps of Map Phase • Records emitted by the map function are put into a circular buffer continually. • When buffer usage exceeds io.sort.mb * io.sort.spill.percent, a spill starts: it sorts records by partition and then by key, and writes the buffer out to disk along with an index file indicating the position where each partition begins. • A final merge combines all the intermediate spill files into a single large file, plus an index file.
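As a rough illustration of the spill rule above, the following sketch computes the threshold and orders buffered records the way a spill would (partition first, then key). The class `MapSpillSketch`, the `Rec` type, and the use of `String` keys are assumptions made for the example; the real map-side buffer works on serialized bytes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; not Hadoop's actual map output buffer.
class MapSpillSketch {
    // One buffered map output record (illustrative fields only).
    static final class Rec {
        final int partition;
        final String key;

        Rec(int partition, String key) {
            this.partition = partition;
            this.key = key;
        }
    }

    // Buffer usage in bytes above which a spill starts:
    // io.sort.mb (in MB) multiplied by io.sort.spill.percent.
    static long spillThresholdBytes(int ioSortMb, double spillPercent) {
        return (long) (ioSortMb * 1024L * 1024L * spillPercent);
    }

    // Spill order: by partition, then by key within each partition.
    static List<Rec> sortForSpill(List<Rec> buffered) {
        List<Rec> out = new ArrayList<>(buffered);
        out.sort(Comparator.comparingInt((Rec r) -> r.partition)
                           .thenComparing(r -> r.key));
        return out;
    }
}
```

Because records of the same partition end up contiguous, the index file only needs one start offset per partition.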
  • 36.
      <property>
        <name>mapred.tasktracker.indexcache.mb</name>
        <value>10</value>
        <description>The maximum memory that a task tracker allows for the index cache that is used when serving map outputs to reducers.</description>
      </property>
  • 37. Steps of Reduce Phase • Pull data over from the maps. If there is space available in memory and the size of the map output is less than 25% * HeapSize * mapred.job.shuffle.input.buffer.percent, keep the output in memory; otherwise store it directly on disk.
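The memory-versus-disk decision can be written out as a small calculation. This is a sketch under assumptions: the class and method names are hypothetical, and "space available" is modelled simply as the bytes already buffered plus the new output fitting into the shuffle buffer.

```java
// Hypothetical sketch of the in-memory vs. on-disk shuffle decision.
class ShuffleMemoryDecisionSketch {
    // A single map output may use at most this fraction of the shuffle buffer
    // (the 25% mentioned on the slide).
    static final double MAX_SINGLE_SEGMENT_FRACTION = 0.25;

    // Shuffle buffer size: HeapSize * mapred.job.shuffle.input.buffer.percent.
    static long shuffleBufferBytes(long heapBytes, double inputBufferPercent) {
        return (long) (heapBytes * inputBufferPercent);
    }

    // True if the map output should be kept in memory, false if it goes to disk.
    static boolean fitsInMemory(long mapOutputBytes, long usedBytes,
                                long heapBytes, double inputBufferPercent) {
        long buffer = shuffleBufferBytes(heapBytes, inputBufferPercent);
        long maxSingle = (long) (buffer * MAX_SINGLE_SEGMENT_FRACTION);
        boolean smallEnough = mapOutputBytes < maxSingle;
        boolean spaceAvailable = usedBytes + mapOutputBytes <= buffer; // assumed model
        return smallEnough && spaceAvailable;
    }
}
```

For example, with a 1 GB heap and a buffer percent of 0.5, the shuffle buffer is 512 MB and any single map output of 128 MB or more goes straight to disk.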
  • 38. Steps of Reduce Phase (Cont.) • The merge operation merges and sorts data from memory and/or disk and writes the result to disk. It comes in two flavors: – In-memory merge • Triggered when the accumulated in-memory data exceeds mapred.job.shuffle.merge.percent. – On-disk merge • Triggered when the number of files on disk exceeds the configured threshold.
  • 39. Steps of Reduce Phase (Cont.) • When shuffle and sort complete, the following constraints must be satisfied before the reduce function is fed: – Memory used for buffering reduce input can’t exceed mapred.job.reduce.input.buffer.percent; – The number of files on disk can’t exceed io.sort.factor
  • 40. Notes about Reduce • Shuffle and sort use a percentage of the reduce heap to buffer shuffle data, because reduce can’t start until shuffle and sort complete. In the map phase, by contrast, the buffer size is determined by io.sort.mb. • Reduce input may consist of multiple files, not necessarily a single file; a heap iterator over them feeds the reduce function.
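The "heap iterator" idea is a k-way merge: each sorted segment contributes its current head to a min-heap, and the smallest head is handed out next. The sketch below shows the technique on in-memory lists of strings; `HeapMergeSketch` and `Cursor` are illustrative names, and the real reduce input consists of serialized segments in memory and on disk.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of a heap-based k-way merge over sorted runs.
class HeapMergeSketch {
    // The current head of one sorted run plus an iterator over the rest of it.
    private static final class Cursor {
        String head;
        final Iterator<String> rest;

        Cursor(String head, Iterator<String> rest) {
            this.head = head;
            this.rest = rest;
        }
    }

    // Merges already-sorted runs into one sorted sequence.
    static List<String> merge(List<List<String>> sortedRuns) {
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> a.head.compareTo(b.head));
        for (List<String> run : sortedRuns) {
            Iterator<String> it = run.iterator();
            if (it.hasNext()) {
                heap.add(new Cursor(it.next(), it));
            }
        }
        List<String> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();      // run with the smallest current head
            out.add(c.head);
            if (c.rest.hasNext()) {      // advance that run and re-insert it
                c.head = c.rest.next();
                heap.add(c);
            }
        }
        return out;
    }
}
```

No merged file is ever materialized: only one head per run is held at a time, which is why the reduce input does not need to be a single file.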
  • 42. Optimization Tuning • We can make use of mapred.job.reduce.input.buffer.percent, which specifies how much memory can be spared for use as the reduce input buffer • Compare the following cases – Case-1 – Case-2 – Case-3
  • 43. Case-1: all reduce input resides on disk
  • 44. Case-2: part of the reduce input in memory, the rest on disk
  • 45. Case-3: much better, all data in memory
  • 46. • If the reduce function doesn’t stress memory too much, we can spare some memory to buffer reduce input and boost overall performance. • What’s more, if the input data is small, we can let the reduces hold all intermediate data in memory, with no disk access involved.
  • 48. Shuffle: Netty Server & Batch Fetch (1) • Less TCP connection overhead. • Reduces the effect of TCP slow start. • More importantly, better shuffle scheduling in the reduce phase results in better overall performance.
  • 49. Shuffle: Netty Server & Batch Fetch (2) • One connection per map – Each fetch thread in a reduce copies one map output per connection, even when there are many outputs on the TT. • vs. Batch fetch – A fetch thread copies multiple map outputs per connection. – This fetch thread takes over the TT; other fetch threads can’t fetch outputs from this TT during the copying period.
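The difference between the two fetch strategies comes down to how pending map outputs are grouped. The sketch below groups them by task tracker host so that one connection can carry a whole batch. All names (`BatchFetchPlanSketch`, `MapOutput`, `planBatches`, `connectionsSaved`) are hypothetical and the networking itself is left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: planning batch fetches per task tracker host.
class BatchFetchPlanSketch {
    // A completed map output waiting to be fetched (illustrative fields).
    static final class MapOutput {
        final String host;
        final String mapId;

        MapOutput(String host, String mapId) {
            this.host = host;
            this.mapId = mapId;
        }
    }

    // Groups pending outputs by host; each entry is one connection's batch.
    static Map<String, List<String>> planBatches(List<MapOutput> pending) {
        Map<String, List<String>> byHost = new LinkedHashMap<>();
        for (MapOutput m : pending) {
            byHost.computeIfAbsent(m.host, h -> new ArrayList<>()).add(m.mapId);
        }
        return byHost;
    }

    // Connections avoided compared with opening one connection per map output.
    static int connectionsSaved(List<MapOutput> pending) {
        return pending.size() - planBatches(pending).size();
    }
}
```

With many map outputs per host, the number of connections drops from one per map output to one per host, which is where the reduced setup and slow-start cost comes from.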
  • 50. Sort Avoidance • Many real-world jobs require shuffling but not sorting, and the sorting brings much overhead. – Hash Aggregation – Hash Join – … etc. • When sorting is turned off, the mapper feeds data to the reducer, which passes the data directly to the reduce() function, bypassing the intermediate sorting step. – Spilling, partitioning, merging and reducing become more efficient. • How to turn off sorting? – JobConf job = (JobConf) getConf(); – job.setBoolean("mapred.sort.avoidance", true); • MAPREDUCE-4039
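Hash aggregation shows why such jobs need no sorted input: a hash map collects the running aggregate per key in whatever order the records arrive. The sketch below is a generic illustration of that technique, not code from the patch; `HashAggregationSketch` and `countByKey` are made-up names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: aggregating unsorted shuffle input with a hash map.
class HashAggregationSketch {
    // Counts occurrences per key; the input order does not matter.
    static Map<String, Long> countByKey(Iterable<String> unsortedKeys) {
        Map<String, Long> counts = new HashMap<>();
        for (String key : unsortedKeys) {
            counts.merge(key, 1L, Long::sum);
        }
        return counts;
    }
}
```

The trade-off is memory: the map must hold one entry per distinct key, whereas a sort-based reduce only holds one key group at a time.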
  • 51. Sort Avoidance: Spill and Partition • When spilling, records are compared by partition only. • Partition ordering uses counting sort [O(n)], not quicksort [O(n log n)].
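Counting sort works here because partition numbers are small integers in a known range. The sketch below orders record indices by partition in O(n + numPartitions) and keeps the arrival order within each partition. The class and method names are assumptions for illustration.

```java
// Hypothetical sketch: ordering buffered records by partition with counting sort.
class PartitionCountingSortSketch {
    // partitionOf[i] is the partition of record i.
    // Returns record indices grouped by partition, stable within a partition.
    static int[] orderByPartition(int[] partitionOf, int numPartitions) {
        int[] start = new int[numPartitions + 1];
        for (int p : partitionOf) {
            start[p + 1]++;                    // count records per partition
        }
        for (int p = 0; p < numPartitions; p++) {
            start[p + 1] += start[p];          // prefix sums: first slot of each partition
        }
        int[] order = new int[partitionOf.length];
        for (int i = 0; i < partitionOf.length; i++) {
            order[start[partitionOf[i]]++] = i; // place record in its partition's next slot
        }
        return order;
    }
}
```

No key comparison happens at all, which is the point of the slide: only the partition number is examined.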
  • 52. Sort Avoidance: Early Reduce (Remove shuffle barrier) • Currently the reduce function can’t start until all map outputs have been fetched. • When sorting is unnecessary, the reduce function can start as soon as any map output is available. • Greatly improves overall performance!
  • 53. Sort Avoidance: Bytes Merge • No overhead of key/value serialization/deserialization or comparison • Doesn’t care about records, just bytes • Just concatenates byte streams together – read in bytes, write out bytes.
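A bytes merge amounts to copying streams end to end. The sketch below treats each segment as opaque bytes, which is an assumption: any per-segment framing, such as end markers or checksums in a real intermediate file format, is ignored here. `BytesMergeSketch` and its methods are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: merging segments by concatenating their bytes.
class BytesMergeSketch {
    // Copies every segment to the output in order; returns total bytes copied.
    static long concatenate(List<? extends InputStream> segments, OutputStream out)
            throws IOException {
        byte[] buf = new byte[64 * 1024];
        long total = 0;
        for (InputStream in : segments) {
            int n;
            while ((n = in.read(buf)) != -1) { // no record parsing, no comparison
                out.write(buf, 0, n);
                total += n;
            }
        }
        return total;
    }

    // Convenience wrapper for in-memory segments.
    static byte[] mergeInMemory(List<byte[]> segments) {
        List<InputStream> inputs = new ArrayList<>();
        for (byte[] segment : segments) {
            inputs.add(new ByteArrayInputStream(segment));
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            concatenate(inputs, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Compared with a sorting merge, the cost is a plain sequential copy with no deserialization of keys or values.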
  • 54. Sort Avoidance: Sequential Reduce Input • Input files are read sequentially to feed the reduce function, so there are no disk seeks and performance is better.
  • 55. YARN (Yet Another Resource Negotiator)
  • 56. Current Limitations • Hard partitioning of resources into map and reduce slots – Low resource utilization • Lacks support for alternate paradigms – Iterative applications implemented using MapReduce are 10x slower. – Hacks for the likes of MPI/Graph Processing • Lack of wire-compatible protocols – Client and cluster must be of the same version – Applications and workflows cannot migrate to different clusters
  • 57. Current Limitations (Cont.) • Scalability – Maximum cluster size: 4,000 nodes – Maximum concurrent tasks: 40,000 – Coarse synchronization in JobTracker • Single point of failure – Failure kills all queued and running jobs – Jobs need to be re-submitted by users • Restart is very tricky due to complex state
  • 59. Architecture • Resource Manager – Global resource scheduler – Hierarchical queues • Node Manager – Per-machine agent – Manages the life-cycle of containers – Container resource monitoring • Application Master – Per-application – Manages application scheduling and task execution – E.g. MapReduce Application Master
  • 60. Design Centre • Split up the two major functions of the JobTracker – Cluster resource management – Application life-cycle management • MapReduce becomes a user-land library
  • 61. Code • MapReduce Classic – Mess • YARN – better
  • 63. Secondary Sort • Want to sort by value • Solution – setOutputKeyComparatorClass – setOutputValueGroupingComparator – Partitioner
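The three pieces named on this slide cooperate as follows: the value is moved into a composite key, the key comparator sorts on the full composite key, the grouping comparator compares only the natural key so one reduce call sees all of its values, and the partitioner hashes only the natural key so those records reach the same reducer. The sketch below simulates this in plain Java without any Hadoop classes. `SecondarySortSketch`, `CompositeKey`, and the method names are hypothetical stand-ins for the classes a job would register through the `JobConf` setters.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of secondary sort; no Hadoop classes involved.
class SecondarySortSketch {
    // Natural key plus the value we want sorted.
    static final class CompositeKey {
        final String naturalKey;
        final int value;

        CompositeKey(String naturalKey, int value) {
            this.naturalKey = naturalKey;
            this.value = value;
        }
    }

    // Role of the output key comparator: natural key, then value.
    static final Comparator<CompositeKey> SORT =
        Comparator.comparing((CompositeKey k) -> k.naturalKey)
                  .thenComparingInt(k -> k.value);

    // Role of the value grouping comparator: natural key only.
    static final Comparator<CompositeKey> GROUP =
        Comparator.comparing((CompositeKey k) -> k.naturalKey);

    // Role of the partitioner: hash of the natural key only.
    static int partition(CompositeKey k, int numReduces) {
        return (k.naturalKey.hashCode() & Integer.MAX_VALUE) % numReduces;
    }

    // What one reducer sees: one group per natural key, values in sorted order.
    static Map<String, List<Integer>> reduceInput(List<CompositeKey> keys) {
        List<CompositeKey> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        CompositeKey prev = null;
        List<Integer> current = null;
        for (CompositeKey k : sorted) {
            if (prev == null || GROUP.compare(prev, k) != 0) {
                current = new ArrayList<>();   // a new reduce() call starts here
                groups.put(k.naturalKey, current);
            }
            current.add(k.value);
            prev = k;
        }
        return groups;
    }
}
```

If the partitioner hashed the whole composite key instead, records with the same natural key could land on different reducers and the grouping would be incomplete.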