Multi-level aggregation for
   Hadoop MapReduce




                                              Tsuyoshi Ozawa
                                                         NTT



      © 2012 NTT Software Innovation Center
Overview
• Background
  • Shuffle cost
• Approach
  • Multi-level aggregation
• Progress
  • Discussion on MAPREDUCE-4502
    • The design note is available on this JIRA
  • Prototype that launches a combiner per node
MapReduce Architecture
• MapReduce
 • Programming model for large-scale processing
 • 3 processing phases

   [Figure: Map Phase (four Map tasks) -> Shuffle Phase -> Reduce Phase (two Reduce tasks)]

Shuffle Phase
• What happens?
  • Reducers retrieve the outputs of Mappers
    • Read on the Mapper side -> write on the Reducer side
• Problem
  • Can become the bottleneck of a job
    • Causes disk I/O
    • Causes network I/O
• Current solution for aggregation workloads
  • Combiner
    • Reduces I/O by aggregating on the mapper side
    • Apps: WordCount, N-gram counting, co-occurrence frequency counting

      WordCount example (data is aggregated, so it gets smaller):
      (apple, 1,1,1,1) => (apple, 4)
      (banana, 1,1)    => (banana, 2)
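To make the WordCount example concrete, here is a minimal plain-Java sketch of what a sum combiner does to the output of one map task. It does not use Hadoop: the names SumCombinerSketch, Pair and combine are illustrative assumptions, and a real job would instead register a Reducer subclass as its combiner.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch (no Hadoop dependency) of a sum combiner applied to the
// output of a single map task: all values for the same key become one record.
class SumCombinerSketch {

    // A (key, value) record as emitted by a WordCount-style mapper.
    record Pair(String key, int value) {}

    // Groups one map task's output by key and sums the values,
    // like a combiner whose reduce() adds up the counts for each word.
    static Map<String, Integer> combine(List<Pair> mapOutput) {
        Map<String, Integer> combined = new TreeMap<>();
        for (Pair p : mapOutput) {
            combined.merge(p.key(), p.value(), Integer::sum);
        }
        return combined;
    }

    public static void main(String[] args) {
        List<Pair> mapOutput = List.of(
            new Pair("apple", 1), new Pair("banana", 1), new Pair("apple", 1),
            new Pair("apple", 1), new Pair("banana", 1), new Pair("apple", 1));
        Map<String, Integer> combined = combine(mapOutput);
        // prints: 6 records -> 2 records: {apple=4, banana=2}
        System.out.println(mapOutput.size() + " records -> " + combined.size()
            + " records: " + combined);
    }
}
```

Note that the combiner only sees the records of its own map task; keys that also occur in other map tasks are not merged here.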
Limitation of combiners
• Scope is limited to a single MapTask
Limitation of combiners (1)
   • Scope is limited to a single MapTask
          1. Many-core environment
                • Xeon E5 series: 16 threads/CPU => 16 outputs are generated
                • These files must be transferred over the network

[Figure: aggregation per map. Each Map task's IFiles go through that map's own Combiner, leaving one IFile per map; all of these IFiles are still sent to Reduce. Still large…]
Limitation of combiners(2)
   • Scope is limited to a single MapTask
           1. Many-core environment
              • Xeon E5 series: 16 threads/CPU => 16 outputs are generated
           2. Processing middle-scale data (TB scale)
              • Processing larger data needs more network bandwidth and disk I/O

[Figure: aggregation per map. Each map's Combiner writes its own IFile, and all raw IFiles must be sent over racks to the Reducer (links labeled 1GbE and 10GbE).]
Multi-level aggregation
   • Aggregate the results of maps per node/rack

[Figure: map outputs are combined per node and then per rack (Aggregation Per Node, Aggregation Per Rack), so a smaller IFile is sent over racks to the Reducer (links labeled 1GbE and 10GbE).]
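The difference between per-map and multi-level aggregation can be illustrated with a small, hypothetical simulation. It is neither Hadoop code nor the NTT prototype: it only counts how many (key, value) records a single reducer would fetch when outputs are combined per map, per node, or per rack. It assumes every map task has already run a per-map combiner, and it ignores bytes, link speeds, and which rack hosts the reducer.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation (not Hadoop code): counts the records a reducer
// would fetch when map outputs are combined per map, per node, or per rack.
class MultiLevelAggregationSketch {

    enum Level { PER_MAP, PER_NODE, PER_RACK }

    // Merges several already-combined outputs into one, summing values per key.
    static Map<String, Integer> merge(List<Map<String, Integer>> outputs) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> out : outputs) {
            out.forEach((k, v) -> merged.merge(k, v, Integer::sum));
        }
        return merged;
    }

    // cluster.get(r).get(n) holds the per-map combined outputs of node n in rack r.
    static int recordsFetchedByReducer(List<List<List<Map<String, Integer>>>> cluster,
                                       Level level) {
        int records = 0;
        for (List<List<Map<String, Integer>>> rack : cluster) {
            List<Map<String, Integer>> nodeOutputs = new ArrayList<>();
            for (List<Map<String, Integer>> node : rack) {
                if (level == Level.PER_MAP) {
                    // Every map output is shipped as-is.
                    for (Map<String, Integer> mapOut : node) records += mapOut.size();
                } else {
                    // Node-level combine: one output per node.
                    nodeOutputs.add(merge(node));
                }
            }
            if (level == Level.PER_NODE) {
                for (Map<String, Integer> nodeOut : nodeOutputs) records += nodeOut.size();
            } else if (level == Level.PER_RACK) {
                // Rack-level combine: one output per rack.
                records += merge(nodeOutputs).size();
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // 2 racks x 2 nodes x 16 map tasks; every map task saw both words.
        List<List<List<Map<String, Integer>>>> cluster = new ArrayList<>();
        for (int r = 0; r < 2; r++) {
            List<List<Map<String, Integer>>> rack = new ArrayList<>();
            for (int n = 0; n < 2; n++) {
                List<Map<String, Integer>> node = new ArrayList<>();
                for (int m = 0; m < 16; m++) node.add(Map.of("apple", 4, "banana", 2));
                rack.add(node);
            }
            cluster.add(rack);
        }
        // prints PER_MAP: 128, then PER_NODE: 8, then PER_RACK: 4 (one per line)
        for (Level level : Level.values()) {
            System.out.println(level + ": " + recordsFetchedByReducer(cluster, level));
        }
    }
}
```

With 16 map tasks per node, per-map combining still leaves 16 records per key on every node; merging per node and then per rack reduces that to one record per key per node or per rack, which is the effect the slide aims for.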
Design Concept
• Minimize overhead
  • Adding a new task type causes a lot of overhead
  • Modify the Mapper to aggregate at its final stage
• Keep the current MapReduce design
  • Tolerate failures of a few machines
  • Each aggregation must run in a YARN container
• From the Hadoop user's point of view
  • Easy to switch the feature on/off
    (ideally by adding only one line)
      public static void main(String[] argv) {
               …
               conf.setCombinerClass(Reducer.class);
               // Proposed switches; not part of the existing Hadoop API
               conf.enableNodeLevelAggregation();
               conf.enableRackLevelAggregation();
               …
      }
Progress
• Prototype
  • Modified the Mapper to call the combiner function at its last stage

• Benchmark
  • Environment
    •   40 nodes
    •   Core 2 Duo 2.4GHz x2
    •   Memory 4GB
    •   1GbE
  • Configuration
    • Reducers: 1
  • Input
    • Text generated by RandomTextWriter
  • Benchmark program
    • WordCount with in-mapper combining
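The benchmark program uses in-mapper combining: rather than relying on a separate combiner pass, the mapper itself accumulates counts across all of its input records and emits them once at the end. Below is a minimal plain-Java sketch, assuming one InMapperCombiningWordCount object per map task. Its map and cleanup methods only mirror the roles of Hadoop's Mapper.map() and Mapper.cleanup(); this is not the benchmark's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java sketch of in-mapper combining (no Hadoop dependency): partial
// counts are kept in memory for the whole map task and emitted once at the end.
class InMapperCombiningWordCount {

    // Partial counts held for the lifetime of one map task.
    private final Map<String, Integer> counts = new HashMap<>();

    // Called once per input line (the role of Mapper.map()); emits nothing.
    void map(String line) {
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) counts.merge(word, 1, Integer::sum);
        }
    }

    // Called once when the task finishes (the role of Mapper.cleanup()):
    // returns one (word, count) pair per distinct word.
    Map<String, Integer> cleanup() {
        return Map.copyOf(counts);
    }

    public static void main(String[] args) {
        InMapperCombiningWordCount mapper = new InMapperCombiningWordCount();
        mapper.map("apple banana apple");
        mapper.map("apple banana apple");
        // Two pairs (apple=4, banana=2) instead of six (word, 1) pairs.
        System.out.println(mapper.cleanup());
    }
}
```

The trade-off is memory: the in-memory table grows with the number of distinct words a task sees, so practical implementations typically flush it when it gets too large.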
Prototype Benchmark: Job Time

[Chart: job time with multi-level aggregation ON vs. OFF]

• About 2 times faster
• Shuffle cost is reduced to at most 50%
TODOs
• Node-level aggregation with fault tolerance (FT)
• Rack-level aggregation with FT
  • The design note is available at MAPREDUCE-4502
    • The umbilical protocol needs to be changed to support FT


• Support for high-level languages
  • Pig/Hive support: enable it when a “GROUP BY”
    statement is issued
    • In other cases, multi-level aggregation may be switched off
Summary
• Multi-level aggregation: combining the
  results of maps per node/rack
  • Node-/rack-level combiner
  • Needs an extended umbilical protocol for FT
• Benchmark with the prototype version
  • 1.7 times faster
  • Shuffle cost reduced to at most 50%
• TODOs
  • Fault tolerance
  • Pig/Hive support
• Special thanks for the discussions to
  Chris, Karthik, Siddarsh, Robert, Bikas

• Any feedback is welcome!