SlideShare una empresa de Scribd logo
1 de 22
Descargar para leer sin conexión
MapReduce
@shot6
Cloudera
                   Avro	
           Sqoop	
  
Desktop	
  



 Pig	
        Hive	
          HBase	
           Chukwa	
  


 Map                                       Zoo
                         HDFS	
  
Reduce	
                                  Keeper	
  

                   Core	
  
Cloudera
                   Avro	
           Sqoop	
  
Desktop	
  



 Pig	
        Hive	
          HBase	
           Chukwa	
  


 Map                                       Zoo
                         HDFS	
  
Reduce	
                                  Keeper	
  

                   Core	
  
•                 MapReduce

     –    Mapper/Reducer
• 
MapReduce                      	
•              WordCount
• 
• 
     – Mapper/Reducer       Job   ⾏行行
     – InputFormat/OutputFormat         ⽅方
     – HDFS(FileSystem)
     –     Writable     ⽅方
WordCount	
•  Hadoop          Hello World
•                   API
   (org.apache.hadoop.mapreduce)
•  API
Grep	
•  grep
  – grepJob/sortJob 2
        ⾏行行
  – JobConf/Mapper/Reducer            ⽅方
  – Mapper RegexMapper     ⾏行行   <Text,
    Long> SequenceFileFormat
  – sortJob
  –                                ⼒力力
  – 
Grep
                  -	
•  JobConf
•  Mapper
•  Reducer
o.a.hadoop.mapred.JobConf	
• 
     –           mapred-default.xml
     –    conf/mapred-site.xml
     – XML    ⾝身
       DOM
     – ⾃自        ⽬目    ⼿手
     –  ⼦子
       •  JobConf child = new JobConf(   Conf,   jar
                                 );
mapred-site.xml	
<configuration>
<!–                 -->
<property>
 <key>mapred.job.tracker</key>
 <value>your-site:9001</value>
</property>
</configuration>
o.a.hadoop.mapred.Mapper	
•  Mapper
•  InputSplit    Mapper
•  MapTask/MapRunner
•  map(KEY, VALUE, COLLECTOR,
   REPORTER)
     – KEY:Map      VALUE:Map
     – COLLECTOR:
     – REPORTER:                     API
•         MapReduceBase
o.a.hadoop.mapred.MapTask	
•  Map
•  initiazlize              (Task Reducer    )
  –                                     ⽣生
  –               (o.a.h.mapred.TaskStatus.State)
       •  RUNNING, SUCCEEDED, FAILED, UNASSIGNED,
          KILLED, COMMIT_PENDING, FAILED_UNCLEAN,
          KILLED_UNCLEAN
  – OutputCommiter ⽣生
       •  Task        ⼒力力            ⾏行行
       •                                         ⼒力力
          – mapred.work.output.dir
o.a.h.mapred.MapTask cont	
•  run        runOldMapper
•  JobClient
   InputSplit
•  RecordReader
o.a.h.mapred.MapTask cont2	
•  Reduce
  –              spill                   (*            )
       •  $mapred.local.dir/taskTracker/jobcache/$
          {taskid}/output/spill${spillNumber}.out
  – Reducer
                 ⼒力力
       •  Combiner        min.num.spills.for.combine
                          combiner
  –              RecordWriter                 ⼒力力
•  MapRunner
o.a.h.mapred.MapRunner	
•  MapRunnable
  – mapred.map.runner.class
  – Hadoop
    PipeMapRunner
  –               Map
    MultiThreadedMapRunner
o.a.h.mapred.MapRunner
                cont	
•  run(RecordReader, OutputCollector,
   Reporter)
     – RecordReader: InputFormat Split
         Reader(InputFormat/RecordReader
                           )
• 
     – RecordReader
     – 
                ⾝身
     – 
MapTask	
      MapRunner	
              Mapper	
         Record            Output
                                                         Reader	
          Collector	
       Input
      Split⽣生 	
  
                                                          	
                                                                                   	

                                                                             Spill
              & run	
                            createKey()                SpillThread
                                                 createValue()	
                    	
  

                                                 next(key, value)	

            EOF         	
     Map(key, value,
                                                                           Spill
                               outputCollector, reporter)
m(_ _)m
•  Mapper
     – JobConf
     – Mapper/MapRunner/MapTask
• 
     – Reducer
       •  Reducer   ⾏行行
       •  Reducer                 ⾏行行
     – InputFormat/RecordReader
o.a.h.mapred.Reducer	
•  Reducer
•  InputSplit      Mapper
•  ReduceTask/ReduceRunner
•  reduce(KEY, Iterator<VALUE>,
   COLLECTOR, REPORTER)
     – KEY:    Iterator<VALUE>:
     – COLLECTOR:
     – REPORTER:                       API
•         MapReduceBase
o.a.h.mapred.ReduceTask	
•  SHUFFLE
•  ReduceTask.ReduceCopier
  – fetchOutputs(            Merger.MergeQueue)
    •  Map                            x   mapred.reduce.parallel.copies

         – MapOutputCopier
    •  Map
          ⾏行行 LocalFSMerger
    •                  ⾏行行 InMemFSMergeThread
    •  GetMapEventsThread
         – Map
         – <     , MapOutputLocation(taskId, host, httpUrl)>
    •    ⼀一 TaskTracker                                         ⼯工
o.a.h.mapred.ReduceTask	
•  run(RecordReader, OutputCollector,
   Reporter)
•  SORT
  – Memory, disk                        ⽣生
    •  RowKeyValueItetator
  – Reducer ⽣生
  – RecordWriter ⽣生
  – ReduceValuesIterator       ⾏行行

Más contenido relacionado

La actualidad más candente

Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebookragho
 
Apache beam — promyk nadziei data engineera na Toruń JUG 28.03.2018
Apache beam — promyk nadziei data engineera na Toruń JUG 28.03.2018Apache beam — promyk nadziei data engineera na Toruń JUG 28.03.2018
Apache beam — promyk nadziei data engineera na Toruń JUG 28.03.2018Piotr Wikiel
 
SQL to Hive Cheat Sheet
SQL to Hive Cheat SheetSQL to Hive Cheat Sheet
SQL to Hive Cheat SheetHortonworks
 
Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export Rupak Roy
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingMitsuharu Hamba
 
HadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software FrameworkHadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software FrameworkThoughtWorks
 
Hadoop導入事例 in クックパッド
Hadoop導入事例 in クックパッドHadoop導入事例 in クックパッド
Hadoop導入事例 in クックパッドTatsuya Sasaki
 
Introduction to scoop and its functions
Introduction to scoop and its functionsIntroduction to scoop and its functions
Introduction to scoop and its functionsRupak Roy
 
Infrastructure as Code with Terraform
Infrastructure as Code with TerraformInfrastructure as Code with Terraform
Infrastructure as Code with TerraformMario IC
 
Lua: the world's most infuriating language
Lua: the world's most infuriating languageLua: the world's most infuriating language
Lua: the world's most infuriating languagejgrahamc
 
HBase + Hue - LA HBase User Group
HBase + Hue - LA HBase User GroupHBase + Hue - LA HBase User Group
HBase + Hue - LA HBase User Groupgethue
 
Build your own_map_by_yourself
Build your own_map_by_yourselfBuild your own_map_by_yourself
Build your own_map_by_yourselfMarc Huang
 
REST Active Resource - 7º Encontro do GURU Sorocaba
REST Active Resource - 7º Encontro do GURU SorocabaREST Active Resource - 7º Encontro do GURU Sorocaba
REST Active Resource - 7º Encontro do GURU SorocabaLucas Renan
 
Hive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive TeamHive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive TeamZheng Shao
 

La actualidad más candente (20)

Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebook
 
Apache beam — promyk nadziei data engineera na Toruń JUG 28.03.2018
Apache beam — promyk nadziei data engineera na Toruń JUG 28.03.2018Apache beam — promyk nadziei data engineera na Toruń JUG 28.03.2018
Apache beam — promyk nadziei data engineera na Toruń JUG 28.03.2018
 
SQL to Hive Cheat Sheet
SQL to Hive Cheat SheetSQL to Hive Cheat Sheet
SQL to Hive Cheat Sheet
 
Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export
 
Hive commands
Hive commandsHive commands
Hive commands
 
Sql cheat sheet
Sql cheat sheetSql cheat sheet
Sql cheat sheet
 
Shark - Lab Assignment
Shark - Lab AssignmentShark - Lab Assignment
Shark - Lab Assignment
 
Hive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReadingHive vs Pig for HadoopSourceCodeReading
Hive vs Pig for HadoopSourceCodeReading
 
HadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software FrameworkHadoopThe Hadoop Java Software Framework
HadoopThe Hadoop Java Software Framework
 
Hadoop導入事例 in クックパッド
Hadoop導入事例 in クックパッドHadoop導入事例 in クックパッド
Hadoop導入事例 in クックパッド
 
Introduction to scoop and its functions
Introduction to scoop and its functionsIntroduction to scoop and its functions
Introduction to scoop and its functions
 
Infrastructure as Code with Terraform
Infrastructure as Code with TerraformInfrastructure as Code with Terraform
Infrastructure as Code with Terraform
 
Lua: the world's most infuriating language
Lua: the world's most infuriating languageLua: the world's most infuriating language
Lua: the world's most infuriating language
 
HBase + Hue - LA HBase User Group
HBase + Hue - LA HBase User GroupHBase + Hue - LA HBase User Group
HBase + Hue - LA HBase User Group
 
Build your own_map_by_yourself
Build your own_map_by_yourselfBuild your own_map_by_yourself
Build your own_map_by_yourself
 
REST Active Resource - 7º Encontro do GURU Sorocaba
REST Active Resource - 7º Encontro do GURU SorocabaREST Active Resource - 7º Encontro do GURU Sorocaba
REST Active Resource - 7º Encontro do GURU Sorocaba
 
Hive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive TeamHive User Meeting March 2010 - Hive Team
Hive User Meeting March 2010 - Hive Team
 
Using spaces (Drupal)
Using spaces (Drupal)Using spaces (Drupal)
Using spaces (Drupal)
 
Advanced Sqoop
Advanced Sqoop Advanced Sqoop
Advanced Sqoop
 
What's New In JDK 10
What's New In JDK 10What's New In JDK 10
What's New In JDK 10
 

Similar a サンプルから見るMap reduceコード

Hadoop MapReduce Streaming and Pipes
Hadoop MapReduce  Streaming and PipesHadoop MapReduce  Streaming and Pipes
Hadoop MapReduce Streaming and PipesHanborq Inc.
 
MAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptxMAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptxHARIKRISHNANU13
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture EMC
 
Hive Anatomy
Hive AnatomyHive Anatomy
Hive Anatomynzhang
 
Introduction to Spark on Hadoop
Introduction to Spark on HadoopIntroduction to Spark on Hadoop
Introduction to Spark on HadoopCarol McDonald
 
Hadoop first mr job - inverted index construction
Hadoop first mr job - inverted index constructionHadoop first mr job - inverted index construction
Hadoop first mr job - inverted index constructionSubhas Kumar Ghosh
 
Large Scale Data Processing & Storage
Large Scale Data Processing & StorageLarge Scale Data Processing & Storage
Large Scale Data Processing & StorageIlayaraja P
 
Elephant in the cloud
Elephant in the cloudElephant in the cloud
Elephant in the cloudrhatr
 
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...IndicThreads
 
Brust hadoopecosystem
Brust hadoopecosystemBrust hadoopecosystem
Brust hadoopecosystemAndrew Brust
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce ParadigmDilip Reddy
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce ParadigmDilip Reddy
 
Hadoop M/R Pig Hive
Hadoop M/R Pig HiveHadoop M/R Pig Hive
Hadoop M/R Pig Hivezahid-mian
 

Similar a サンプルから見るMap reduceコード (20)

Hadoop MapReduce Streaming and Pipes
Hadoop MapReduce  Streaming and PipesHadoop MapReduce  Streaming and Pipes
Hadoop MapReduce Streaming and Pipes
 
Lecture 2 part 3
Lecture 2 part 3Lecture 2 part 3
Lecture 2 part 3
 
mapreduce ppt.ppt
mapreduce ppt.pptmapreduce ppt.ppt
mapreduce ppt.ppt
 
L3.fa14.ppt
L3.fa14.pptL3.fa14.ppt
L3.fa14.ppt
 
Osd ctw spark
Osd ctw sparkOsd ctw spark
Osd ctw spark
 
MAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptxMAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptx
 
Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Hadoop Overview kdd2011
Hadoop Overview kdd2011Hadoop Overview kdd2011
Hadoop Overview kdd2011
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
 
Hive Anatomy
Hive AnatomyHive Anatomy
Hive Anatomy
 
Introduction to Spark on Hadoop
Introduction to Spark on HadoopIntroduction to Spark on Hadoop
Introduction to Spark on Hadoop
 
Hadoop london
Hadoop londonHadoop london
Hadoop london
 
Hadoop first mr job - inverted index construction
Hadoop first mr job - inverted index constructionHadoop first mr job - inverted index construction
Hadoop first mr job - inverted index construction
 
Large Scale Data Processing & Storage
Large Scale Data Processing & StorageLarge Scale Data Processing & Storage
Large Scale Data Processing & Storage
 
Elephant in the cloud
Elephant in the cloudElephant in the cloud
Elephant in the cloud
 
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...Processing massive amount of data with Map Reduce using Apache Hadoop  - Indi...
Processing massive amount of data with Map Reduce using Apache Hadoop - Indi...
 
Brust hadoopecosystem
Brust hadoopecosystemBrust hadoopecosystem
Brust hadoopecosystem
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
 
MapReduce Paradigm
MapReduce ParadigmMapReduce Paradigm
MapReduce Paradigm
 
Hadoop M/R Pig Hive
Hadoop M/R Pig HiveHadoop M/R Pig Hive
Hadoop M/R Pig Hive
 

Más de Shinpei Ohtani

AWS Lambda and Amazon API Gateway
AWS Lambda and Amazon API GatewayAWS Lambda and Amazon API Gateway
AWS Lambda and Amazon API GatewayShinpei Ohtani
 
ECS for Docker Meetup #4
ECS for Docker Meetup #4ECS for Docker Meetup #4
ECS for Docker Meetup #4Shinpei Ohtani
 
JVM的な何か@JVM Operation Casual Talk
JVM的な何か@JVM Operation Casual TalkJVM的な何か@JVM Operation Casual Talk
JVM的な何か@JVM Operation Casual TalkShinpei Ohtani
 
Amazon kinesisで広がるリアルタイムデータプロセッシングとその未来
Amazon kinesisで広がるリアルタイムデータプロセッシングとその未来Amazon kinesisで広がるリアルタイムデータプロセッシングとその未来
Amazon kinesisで広がるリアルタイムデータプロセッシングとその未来Shinpei Ohtani
 
Amazon Elastic MapReduce@Hadoop Conference Japan 2011 Fall
Amazon Elastic MapReduce@Hadoop Conference Japan 2011 FallAmazon Elastic MapReduce@Hadoop Conference Japan 2011 Fall
Amazon Elastic MapReduce@Hadoop Conference Japan 2011 FallShinpei Ohtani
 
プログラマブルクラウドの薦め
プログラマブルクラウドの薦めプログラマブルクラウドの薦め
プログラマブルクラウドの薦めShinpei Ohtani
 
サンプルから見るMapReduceコード
サンプルから見るMapReduceコードサンプルから見るMapReduceコード
サンプルから見るMapReduceコードShinpei Ohtani
 
Hadoopソースリーディング第1回アジェンダ
Hadoopソースリーディング第1回アジェンダHadoopソースリーディング第1回アジェンダ
Hadoopソースリーディング第1回アジェンダShinpei Ohtani
 
Hadoopソースリーディング第1回アジェンダ
Hadoopソースリーディング第1回アジェンダHadoopソースリーディング第1回アジェンダ
Hadoopソースリーディング第1回アジェンダShinpei Ohtani
 
Struts2を始めよう!
Struts2を始めよう!Struts2を始めよう!
Struts2を始めよう!Shinpei Ohtani
 

Más de Shinpei Ohtani (17)

Amazon Aurora
Amazon AuroraAmazon Aurora
Amazon Aurora
 
AWS Lambda and Amazon API Gateway
AWS Lambda and Amazon API GatewayAWS Lambda and Amazon API Gateway
AWS Lambda and Amazon API Gateway
 
ECS for Docker Meetup #4
ECS for Docker Meetup #4ECS for Docker Meetup #4
ECS for Docker Meetup #4
 
JVM的な何か@JVM Operation Casual Talk
JVM的な何か@JVM Operation Casual TalkJVM的な何か@JVM Operation Casual Talk
JVM的な何か@JVM Operation Casual Talk
 
Amazon kinesisで広がるリアルタイムデータプロセッシングとその未来
Amazon kinesisで広がるリアルタイムデータプロセッシングとその未来Amazon kinesisで広がるリアルタイムデータプロセッシングとその未来
Amazon kinesisで広がるリアルタイムデータプロセッシングとその未来
 
Amazon Elastic MapReduce@Hadoop Conference Japan 2011 Fall
Amazon Elastic MapReduce@Hadoop Conference Japan 2011 FallAmazon Elastic MapReduce@Hadoop Conference Japan 2011 Fall
Amazon Elastic MapReduce@Hadoop Conference Japan 2011 Fall
 
プログラマブルクラウドの薦め
プログラマブルクラウドの薦めプログラマブルクラウドの薦め
プログラマブルクラウドの薦め
 
サンプルから見るMapReduceコード
サンプルから見るMapReduceコードサンプルから見るMapReduceコード
サンプルから見るMapReduceコード
 
Hadoopソースリーディング第1回アジェンダ
Hadoopソースリーディング第1回アジェンダHadoopソースリーディング第1回アジェンダ
Hadoopソースリーディング第1回アジェンダ
 
Hadoopソースリーディング第1回アジェンダ
Hadoopソースリーディング第1回アジェンダHadoopソースリーディング第1回アジェンダ
Hadoopソースリーディング第1回アジェンダ
 
はやわかりHadoop
はやわかりHadoopはやわかりHadoop
はやわかりHadoop
 
T2 Web Framework
T2 Web FrameworkT2 Web Framework
T2 Web Framework
 
T2 Hacks
T2 HacksT2 Hacks
T2 Hacks
 
T2 webframework
T2 webframeworkT2 webframework
T2 webframework
 
Struts2を始めよう!
Struts2を始めよう!Struts2を始めよう!
Struts2を始めよう!
 
Struts2 in a nutshell
Struts2 in a nutshellStruts2 in a nutshell
Struts2 in a nutshell
 
ASP.NET MVC 1.0
ASP.NET MVC 1.0ASP.NET MVC 1.0
ASP.NET MVC 1.0
 

Último

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 

Último (20)

Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 

サンプルから見るMap reduceコード

  • 2. Cloudera Avro   Sqoop   Desktop   Pig   Hive   HBase   Chukwa   Map Zoo HDFS   Reduce   Keeper   Core  
  • 3. Cloudera Avro   Sqoop   Desktop   Pig   Hive   HBase   Chukwa   Map Zoo HDFS   Reduce   Keeper   Core  
  • 4. •  MapReduce –  Mapper/Reducer • 
  • 5. MapReduce •  WordCount •  •  – Mapper/Reducer Job ⾏行行 – InputFormat/OutputFormat ⽅方 – HDFS(FileSystem) –  Writable ⽅方
  • 6. WordCount •  Hadoop Hello World •  API (org.apache.hadoop.mapreduce) •  API
  • 7. Grep •  grep – grepJob/sortJob 2 ⾏行行 – JobConf/Mapper/Reducer ⽅方 – Mapper RegexMapper ⾏行行 <Text, Long> SequenceFileFormat – sortJob –  ⼒力力 – 
  • 8. Grep - •  JobConf •  Mapper •  Reducer
  • 9. o.a.hadoop.mapred.JobConf •  –  mapred-default.xml –  conf/mapred-site.xml – XML ⾝身 DOM – ⾃自 ⽬目 ⼿手 –  ⼦子 •  JobConf child = new JobConf( Conf, jar );
  • 10. mapred-site.xml <configuration> <!– --> <property> <key>mapred.job.tracker</key> <value>your-site:9001</value> </property> </configuration>
  • 11. o.a.hadoop.mapred.Mapper •  Mapper •  InputSplit Mapper •  MapTask/MapRunner •  map(KEY, VALUE, COLLECTOR, REPORTER) – KEY:Map VALUE:Map – COLLECTOR: – REPORTER: API •  MapReduceBase
  • 12. o.a.hadoop.mapred.MapTask •  Map •  initiazlize (Task Reducer ) –  ⽣生 –  (o.a.h.mapred.TaskStatus.State) •  RUNNING, SUCCEEDED, FAILED, UNASSIGNED, KILLED, COMMIT_PENDING, FAILED_UNCLEAN, KILLED_UNCLEAN – OutputCommiter ⽣生 •  Task ⼒力力 ⾏行行 •  ⼒力力 – mapred.work.output.dir
  • 13. o.a.h.mapred.MapTask cont •  run runOldMapper •  JobClient InputSplit •  RecordReader
  • 14. o.a.h.mapred.MapTask cont2 •  Reduce –  spill (* ) •  $mapred.local.dir/taskTracker/jobcache/$ {taskid}/output/spill${spillNumber}.out – Reducer ⼒力力 •  Combiner min.num.spills.for.combine combiner –  RecordWriter ⼒力力 •  MapRunner
  • 15. o.a.h.mapred.MapRunner •  MapRunnable – mapred.map.runner.class – Hadoop PipeMapRunner –  Map MultiThreadedMapRunner
  • 16. o.a.h.mapred.MapRunner cont •  run(RecordReader, OutputCollector, Reporter) – RecordReader: InputFormat Split Reader(InputFormat/RecordReader ) •  – RecordReader –  ⾝身 – 
  • 17. MapTask MapRunner Mapper Record Output Reader Collector Input Split⽣生   Spill & run createKey() SpillThread createValue()   next(key, value) EOF   Map(key, value, Spill outputCollector, reporter)
  • 19. •  Mapper – JobConf – Mapper/MapRunner/MapTask •  – Reducer •  Reducer ⾏行行 •  Reducer ⾏行行 – InputFormat/RecordReader
  • 20. o.a.h.mapred.Reducer •  Reducer •  InputSplit Mapper •  ReduceTask/ReduceRunner •  reduce(KEY, Iterator<VALUE>, COLLECTOR, REPORTER) – KEY: Iterator<VALUE>: – COLLECTOR: – REPORTER: API •  MapReduceBase
  • 21. o.a.h.mapred.ReduceTask •  SHUFFLE •  ReduceTask.ReduceCopier – fetchOutputs( Merger.MergeQueue) •  Map x mapred.reduce.parallel.copies – MapOutputCopier •  Map ⾏行行 LocalFSMerger •  ⾏行行 InMemFSMergeThread •  GetMapEventsThread – Map – < , MapOutputLocation(taskId, host, httpUrl)> •  ⼀一 TaskTracker ⼯工
  • 22. o.a.h.mapred.ReduceTask •  run(RecordReader, OutputCollector, Reporter) •  SORT – Memory, disk ⽣生 •  RowKeyValueItetator – Reducer ⽣生 – RecordWriter ⽣生 – ReduceValuesIterator ⾏行行