HADOOP

HPC4 Seminar
IPM
December 2011

Omid Djoudi
od90125@yahoo.com
Hadoop
Scale up: Multi Processing machines -> expensive
Scale out: Commodity hardware

Cost efficiency = performance per unit cost
-> commodity hardware roughly 12 times higher than SMP


Communication between nodes is faster in SMP,
but data-intensive applications need a workload spread over a cluster of machines
-> network transfer is inevitable


Hadoop
Hadoop :
Open-source framework for implementing Map/Reduce in a distributed
  environment.


Initially developed at Yahoo, based on Google's designs.
-> Map/Reduce framework: Yahoo implementation of Google's MapReduce.
-> HDFS – Hadoop Distributed File System: modelled on the Google File System (GFS).


Moved to open-source license - Apache Project.

Yahoo (20,000 servers), Google, Amazon, eBay, Facebook

Hadoop
Suitable for processing TBs and PBs of data

Reliability – commodity machines have less reliable disks
-> mean time between failures ≈ 1000 days per machine
-> a 10,000-server cluster experiences about 10 failures a day


Redundant distribution and processing
-> Data is distributed in n replicas
-> Code is spread m times across "slots" in the cluster, with m > n




Hadoop
Sequential access
-> Data too big to fit in memory, random access expensive
-> Data is accessed sequentially: no seeks, no binary-tree search


“Shared nothing” architecture
-> State would be unmaintainable in a highly asynchronous environment


Values are represented as lists of <key,value> pairs
-> Limits explicit communication between nodes
-> Keys carry the information used to move data across the cluster



HDFS
Distributed File System – Decouple namespace from data.
Partition a dataset across a cluster

File system targeted at “very large” files – TB, PB
-> A small number of large files keeps namespace management tractable


Files are written once – no update or append
-> allows the distribution of files into blocks to be optimised


Fault-tolerance, redundancy of data
Targeted at batch processing : High throughput, but high latency!
HDFS
Files are divided into blocks (64 MB, 128 MB) if size(file) > size(block)
-> Each file / block is the atomic input for a map instance.


HDFS block >> disk block: reduces the number of disk seeks relative to the amount
  of data read, and allows streaming

Blocks simplify storage and management:
metadata is maintained outside the data, separating security and failure
  management from the intensive disk operations.


Replication done at block level.
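
As an illustration (not from the slides), the block size and replication factor can be chosen per file through the HDFS Java API; this minimal sketch assumes a reachable HDFS cluster and uses the classic FileSystem.create overload:

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.FSDataOutputStream;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;

   public class BlockSizeExample {
       public static void main(String[] args) throws Exception {
           Configuration conf = new Configuration();      // reads core-site.xml / hdfs-site.xml
           FileSystem fs = FileSystem.get(conf);

           // Per-file override of the cluster defaults (dfs.block.size, dfs.replication):
           // 128 MB blocks, 3 replicas. The path is hypothetical.
           Path p = new Path("/tmp/example.dat");
           long blockSize = 128L * 1024 * 1024;
           short replication = 3;
           FSDataOutputStream out = fs.create(p, true, 4096, replication, blockSize);
           out.writeBytes("hello hdfs\n");
           out.close();
       }
   }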

HDFS
Replica placement strategy – minimise transfers across rack network switches
  while keeping the load balanced: 1 replica on the local rack, 2 on remote racks
-> write bandwidth optimisation: data transits 2 network switches instead of 3.


Affinity – a map is executed on a node where its block is present. If not
   possible, use rack awareness to minimise the distance between
   process and data -> move the program to the data.

Cluster rebalancing – additional replicas can be created
   dynamically under high demand.

HDFS
Scalability and performance are limited by the single namespace
server (NameNode) architecture

NameNode and DataNode decoupled for scalability:
-> Metadata operations are fast; data operations are heavy and slow
-> With one server for both, data operations would dominate and the
    namespace response would become a bottleneck


The whole namespace is held in RAM, with periodic checkpoints and a journal on disk
-> limits the number of files:
1 GB of metadata ≈ 1 PB of physical storage

Submission




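As a rough, hypothetical illustration of the job submission path (client -> JobTracker -> TaskTrackers), a minimal identity job submitted with the classic Hadoop Java API could look like this; the input/output paths and the identity map/reduce are assumptions, not from the deck:

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.Path;
   import org.apache.hadoop.io.LongWritable;
   import org.apache.hadoop.io.Text;
   import org.apache.hadoop.mapreduce.Job;
   import org.apache.hadoop.mapreduce.Mapper;
   import org.apache.hadoop.mapreduce.Reducer;
   import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
   import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

   public class SubmitExample {
       public static void main(String[] args) throws Exception {
           Configuration conf = new Configuration();
           Job job = new Job(conf, "identity job");        // Hadoop 0.20/1.x style
           job.setJarByClass(SubmitExample.class);
           job.setMapperClass(Mapper.class);               // identity map
           job.setReducerClass(Reducer.class);             // identity reduce
           job.setOutputKeyClass(LongWritable.class);      // key/value types of TextInputFormat
           job.setOutputValueClass(Text.class);
           FileInputFormat.addInputPath(job, new Path(args[0]));
           FileOutputFormat.setOutputPath(job, new Path(args[1]));
           // The client computes the input splits, copies the job resources to HDFS and
           // submits the job to the JobTracker, which assigns tasks to TaskTrackers.
           System.exit(job.waitForCompletion(true) ? 0 : 1);
       }
   }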
Communications
Synchronisation:
(DataNode -> NameNode)

Heartbeat (every 3 seconds) :
- Total disk
- Used disk
- Amount of data being transferred by the node (used for load balancing)

Block report (every hour or on demand) :
- List of Block IDs
- Length
- Generation stamp


Communications
Synchronisation:
(NameNode -> DataNode)

Reply to the heartbeat from the DataNode

May contain instructions:
- Replicate a block to other nodes
- Remove a local replica
- Shut down
- Send a block report




Communications
Synchronisation:
(TaskTracker -> JobTracker)

Heartbeat :
- Available slots for map and reduce
- Pull mode


(JobTracker -> TaskTracker)

Heartbeat :
- Task allocation information


Benchmark
Tera-Sort benchmark

1800 machines, each with dual 2 GHz Intel Xeon (hyperthreading) and 4 GB of
  memory
maximum 1 reduce per machine
10^10 records of 100 bytes each (1 TB of input data) – records
  distributed so that the reducers are balanced
Map: extract 10 bytes as the key, the original line as the value, emit (key, value)
Reduce: identity function
M = 10000, M_size = 64 MB, R = 4000

Benchmark
Input rate – peaks at about 13 GB/s and stops once the map
   phase has finished.
Higher than the shuffle and reduce rates because of
   data locality.


Shuffle rate – starts as soon as the first map output has
   been generated. It pauses after the first batch of 1800
   reducers (1 reducer per machine) has received its data, and
   resumes once those reducers finish and free their slots.

Reduce rate – the first writes are faster; then the second
   round of shuffles begins and the rate decreases slightly.
   The rate is lower than the shuffle rate because 2 copies are
   written for the output.

Tuning
Increase the number of reducers

If there are more reducers than available slots, faster machines will
   execute more reducer instances

+ Better load balancing
+ Lower cost of a failure
- Higher global overhead
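
A one-line sketch of this knob with the standard Java API – the value is purely illustrative (e.g. a couple of waves on an assumed number of reduce slots):

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.mapreduce.Job;

   public class ReducerCount {
       public static void main(String[] args) throws Exception {
           Job job = new Job(new Configuration(), "tuned job");
           // More reduce tasks than reduce slots -> several waves of reducers:
           // better load balancing and cheaper re-execution of a failed reducer,
           // at the cost of extra per-task overhead.
           job.setNumReduceTasks(3600);   // illustrative, e.g. ~2 x 1800 slots
       }
   }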




Tuning
In-mapper combining
Combiner execution is optional and left to the framework's discretion.

To force aggregation inside map():
- State preservation – all map() calls for a split run in a single JVM instance
- Emit the whole aggregated result at once in a close/cleanup hook

map (filename, file-contents):
   sums = new associative_array(int,int)
   for each number in file-contents:
       sums[number] += number^2
   # in the close/cleanup hook, emit each aggregated sum once
   for each number in sums:
       emit (number, sums[number])
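
A minimal Java sketch of the same pattern with the standard Mapper API; the sum-of-squares aggregation mirrors the pseudocode above and is only an example:

   import java.io.IOException;
   import java.util.HashMap;
   import java.util.Map;
   import org.apache.hadoop.io.IntWritable;
   import org.apache.hadoop.io.LongWritable;
   import org.apache.hadoop.io.Text;
   import org.apache.hadoop.mapreduce.Mapper;

   public class InMapperCombiner
           extends Mapper<LongWritable, Text, IntWritable, LongWritable> {

       // State preserved across map() calls of one mapper (one JVM, one input split).
       private final Map<Integer, Long> sums = new HashMap<Integer, Long>();

       @Override
       protected void map(LongWritable offset, Text line, Context context) {
           for (String token : line.toString().trim().split("\\s+")) {
               if (token.isEmpty()) continue;
               int number = Integer.parseInt(token);
               Long current = sums.get(number);
               sums.put(number, (current == null ? 0L : current) + (long) number * number);
           }
       }

       // Hook executed once per mapper: emit each aggregated sum exactly once.
       @Override
       protected void cleanup(Context context) throws IOException, InterruptedException {
           for (Map.Entry<Integer, Long> e : sums.entrySet()) {
               context.write(new IntWritable(e.getKey()), new LongWritable(e.getValue()));
           }
       }
   }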


Tuning
Configuration parameters

Map parameters
io.sort.mb: size of the in-memory buffer for the map output
io.sort.spill.percent: how full the buffer gets before it is spilled to disk.
Spilling runs in the background, but the map blocks if the buffer fills up before the
    disk has finished flushing.
-> Increase the buffer size if the map outputs are small
-> Increase the buffer size and spill percentage to keep spilling fluid – possible if the disk
    can handle parallel writes efficiently
task.tracker.http.threads: number of threads on map nodes serving reduce fetch
    requests
-> Increase for big clusters and large jobs
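
A hedged example of setting the map-side knobs named above programmatically (they would normally live in mapred-site.xml); the values are purely illustrative:

   import org.apache.hadoop.conf.Configuration;

   public class MapSideTuning {
       public static void main(String[] args) {
           Configuration conf = new Configuration();
           conf.setInt("io.sort.mb", 200);                  // map output buffer, in MB
           conf.setFloat("io.sort.spill.percent", 0.90f);   // start spilling at 90% full
       }
   }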


Tuning
Configuration parameters

Merge/Sort parameters
mapred.job.shuffle.input.buffer.percent: percentage of the reduce node's RAM
       used to hold the map outputs.
Outputs are written to disk after reaching mapred.job.shuffle.merge.percent of that
   memory, or after mapred.inmem.merge.threshold map output files have accumulated


-> Increase the memory usage if reduce tasks are small and the number of
   mappers is not much larger than the number of reducers



Tuning
Configuration parameters

io.sort.factor
Merge factor: the number of map output streams merged in one round when building
the reduce input – after a merge round, the number of files left for the reduce is
         nb_received_map / io.sort.factor


-> Increase it if the nodes have plenty of memory.
-> Take into account mapred.job.shuffle.input.buffer.percent, which reduces the
    memory available for the merge.
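
In the same hedged style, the shuffle/merge parameters above could be set as follows; again the values are only illustrative, not recommendations:

   import org.apache.hadoop.conf.Configuration;

   public class ShuffleMergeTuning {
       public static void main(String[] args) {
           Configuration conf = new Configuration();
           conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.70f); // reduce-side RAM for map outputs
           conf.setFloat("mapred.job.shuffle.merge.percent", 0.66f);        // spill threshold within that buffer
           conf.setInt("mapred.inmem.merge.threshold", 1000);               // or spill after this many map outputs
           conf.setInt("io.sort.factor", 25);                               // streams merged in one round
       }
   }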




Scheduler
The scheduler is based on jobs and not tasks.

FIFO
Each job will use all the available resources, penalising other users

Fair scheduler
Cluster is shared fairly between different users
-> Pool of jobs per user
-> Preemption if new jobs change the sharing balance and a pool holds more than its
    fair share

No affinity score is calculated for tasks during scheduling; data affinity is only
    considered after the scheduler has selected a task
-> a serious handicap for a data grid!
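
As a hedged sketch (the property and class names are the standard Hadoop 1.x ones, not taken from the slides), the Fair Scheduler is enabled on the JobTracker roughly like this:

   import org.apache.hadoop.conf.Configuration;

   public class FairSchedulerSetup {
       public static void main(String[] args) {
           // These settings would normally live in the JobTracker's mapred-site.xml.
           Configuration conf = new Configuration();
           conf.set("mapred.jobtracker.taskScheduler",
                    "org.apache.hadoop.mapred.FairScheduler");
           // One pool per user is the default: pools are named after this job property.
           conf.set("mapred.fairscheduler.poolnameproperty", "user.name");
       }
   }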


Weakness
JobTracker tied to resources:
-> harder to draw on a pool of available resources
-> No dynamic scalability: resource planning must be fixed in advance
-> Harder to define SLAs in a shared grid


Pull mode between TaskTracker and JobTracker
-> Peak/valley issue – idle periods between polls. The heartbeat frequency can be
    increased, but at the risk of network saturation


No possibility to pin resources (slave nodes) to a job




Weakness
Reduce phase can only begin after the end of the map phase
M = nbr of maps, R = nbr of reduces
M_slots = nbr of map slots, R_slots = nbr of reduce slots
Tm = average duration of a map, Tr = average duration of a reduce
Total time = Tm * max(1, M/M_slots) + Tr * max(1, R/R_slots)

If the reduce phase could begin as soon as the first map result is available:
-> R would grow, since there would be at least as many reduces as map
    outputs -> R_new = max(M,R) = M most of the time
Total time = max(Tr,Tm) * max(1, 2*M / (M_slots+R_slots))
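
A worked comparison with assumed numbers (purely illustrative, not from the slides): take M = 10000, R = 4000, M_slots = R_slots = 2000 and Tm = Tr = 1.

Sequential phases:  Total time = 1 * max(1, 10000/2000) + 1 * max(1, 4000/2000) = 5 + 2 = 7
Overlapped phases:  R_new = max(M,R) = 10000
                    Total time = max(1,1) * max(1, 2*10000 / (2000+2000)) = 5

Under these assumptions the overlapped pipeline would finish in 5 time units instead of 7.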




ADDONS - HBASE
Database – storage organised by column
-> Data warehousing

Based on Google BigTable
Stores huge amounts of data
Efficient access to elements by (row, column)

Not all columns need to be present for every row!
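
A minimal sketch of the (row, column) access model with the classic HBase Java client; the table and column names are hypothetical and the table is assumed to already exist:

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.hbase.HBaseConfiguration;
   import org.apache.hadoop.hbase.client.Get;
   import org.apache.hadoop.hbase.client.HTable;
   import org.apache.hadoop.hbase.client.Put;
   import org.apache.hadoop.hbase.client.Result;
   import org.apache.hadoop.hbase.util.Bytes;

   public class HBaseExample {
       public static void main(String[] args) throws Exception {
           Configuration conf = HBaseConfiguration.create();
           HTable table = new HTable(conf, "users");                 // hypothetical table

           // Write a single cell: row "user1", column family "info", qualifier "city".
           Put put = new Put(Bytes.toBytes("user1"));
           put.add(Bytes.toBytes("info"), Bytes.toBytes("city"), Bytes.toBytes("Tehran"));
           table.put(put);

           // Read it back by (row, column) – rows need not share the same columns.
           Result result = table.get(new Get(Bytes.toBytes("user1")));
           byte[] city = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("city"));
           System.out.println(Bytes.toString(city));

           table.close();
       }
   }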




ADDONS - PIG LATIN
High-level data-flow language on top of Hadoop, used for data analysis.


  fileA:
  User1, a
  User1, b
  User2, c

  Log = LOAD 'fileA' AS (user, value);
  Grp = GROUP Log BY user;
  Count = FOREACH Grp GENERATE group, COUNT(Log);
  STORE Count INTO 'outputFile';

  outputFile:
  User1, 2
  User2, 1



ADDONS - HIVE
High-level, SQL-like language for loading and querying data



CREATE TABLE T (a INT, b STRING)…
LOAD DATA INPATH “file_name” INTO TABLE T;
SELECT * FROM …

Allows joins and more powerful features




CONCLUSION
• Hadoop is a fair middleware for distributed data processing

• Restrictive usage: High volume data and <key,value>
  processing

• No clear separation between process and resource
  management

• But…very active project, evolution will bring improvements



