2011/7/10 @a_bicky
• Takeshi Arabiki
    ‣ Twitter: @a_bicky
    ‣ Hatena: id:a_bicky
• R
• http://d.hatena.ne.jp/a_bicky/
• MapReduce
•           MapReduce
• MapReduce
•           MapReduce
•
•
MapReduce
MapReduce
•                                         TB PB

    Facebook   20TB

•                                 ”   ”
                      ”     ”


    ‣

    ‣
    ‣                      etc.

    ↑ MPI                                  orz

                          MapReduce
MapReduce
• Google

•                         map               reduce
    >>> from functools import reduce  # needed on Python 3
    >>> list(map(lambda x: x ** 2, range(1, 6)))   # map
    [1, 4, 9, 16, 25]
    >>> reduce(lambda a, b: a + b, range(1, 6))    # reduce
    15

•                                                                          OK

•

    [Figure: Google's stack. MapReduce and Big Table (a KVS) sit on top of the Google File System.]
Hadoop

• Google
•            MapReduce            Hadoop   MapReduce



    [Figure: the Google stack and its Hadoop counterparts]
    Google                      Hadoop
    MapReduce                   Hadoop MapReduce
    Big Table (KVS)             HBase (KVS)
    Google File System          Hadoop Distributed File System (HDFS)
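Hadoop MapReduce
    [Figure: JobClient and JobTracker; the JobTracker assigns map tasks and reduce tasks.
     Map phase: the mappers read their input from HDFS.
     Shuffle phase: mapper output is copied to the reducers and sorted (copy & sort).
     Reduce phase: the reducers write their output to HDFS.]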
MapReduce
WordCount
    [Figure: JobClient, JobTracker, and an input file on HDFS containing two lines]
    the end of money is
    the end of love
WordCount
    [Figure: the JobTracker assigns the map tasks and the reduce task. Each input line on HDFS goes to its own mapper.]
    mapper 1: the end of money is
    mapper 2: the end of love
WordCount
    [Figure, Map phase: each mapper emits one (word, 1) pair per word.]
    mapper 1 (the end of money is): the 1, end 1, of 1, money 1, is 1
    mapper 2 (the end of love):     the 1, end 1, of 1, love 1
WordCount
    [Figure, Shuffle phase: the mapper output is copied to the reducer and sorted by key (copy & sort).]
    mapper 1 output: the 1, end 1, of 1, money 1, is 1
    mapper 2 output: the 1, end 1, of 1, love 1
    reducer input:   end 1, end 1, is 1, love 1, money 1, of 1, of 1, the 1, the 1
WordCount
    [Figure: the values that share a key are grouped before they reach the reducer.]
    end <1, 1>, is <1>, love <1>, money <1>, of <1, 1>, the <1, 1>
WordCount
    [Figure, Reduce phase: the reducer sums the values of each key and writes the result to HDFS.]
    reducer input:  end <1, 1>, is <1>, love <1>, money <1>, of <1, 1>, the <1, 1>
    reducer output: end 2, is 1, love 1, money 1, of 2, the 2
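The WordCount walk-through above can be replayed in memory. The following is a minimal sketch, not Hadoop code: the function names `map_phase`, `shuffle_phase` and `reduce_phase` are made up for illustration, and everything runs in a single process instead of on separate mappers and reducers.

```python
from collections import defaultdict

def map_phase(line):
    """Emit one (word, 1) pair per word, as each mapper does in the figures."""
    return [(word, 1) for word in line.split()]

def shuffle_phase(pairs):
    """Group values by key and order the keys, mimicking copy & sort."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())

def reduce_phase(key, values):
    """Sum the values collected for one key."""
    return key, sum(values)

lines = ["the end of money is", "the end of love"]
mapped = [pair for line in lines for pair in map_phase(line)]
grouped = shuffle_phase(mapped)       # e.g. ("end", [1, 1])
counts = [reduce_phase(key, values) for key, values in grouped]

for key, count in counts:
    print(f"{key}\t{count}")
```

The intermediate `grouped` list corresponds to the `end <1, 1>` style records in the figures, and `counts` to the final output written to HDFS.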
MapReduce
※                            Java
mapred.pl
    #!/usr/bin/env perl
    use strict;
    use warnings;

    package MapReduce;

    # map function: emit "<word>\t1" for every word in a line
    sub map {
      my $text = shift;
      my @words = split /\s/, $text;
      foreach my $word (@words) {
        print $word, "\t", 1, "\n";
      }
    }

    # reduce function: sum the values collected for one key
    sub reduce {
      my ($key, @values) = @_;
      my $cnt = 0;
      foreach my $value (@values) {
        $cnt += $value;
      }
      print $key, "\t", $cnt, "\n";
    }

    package main;   # minimal MapReduce framework
    my $phase = shift;
    if ($phase eq 'map') { # map phase
      while (my $line = <STDIN>) {
        chomp $line; # pass each input line to map
        MapReduce::map($line);
      }
    } elsif ($phase eq 'reduce') { # reduce phase
      my ($prev_key, @values);
      while (my $line = <STDIN>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line;
        # "defined" rather than truthiness, so that a key of "0" is handled
        if (!defined $prev_key || $key eq $prev_key) {
          push @values, $value;
        } else { # key changed: reduce the previous key's values
          MapReduce::reduce($prev_key, @values);
          @values = ($value);
        }
        $prev_key = $key;
      } # end of input: reduce the last key's values
      MapReduce::reduce($prev_key, @values) if defined $prev_key;
    }
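For readers more comfortable with Python, the same mini framework might look roughly like this. It is a sketch of the idea behind mapred.pl rather than a line-by-line port: `run_map`, `run_reduce` and `pipeline` are names invented here, and the built-in `sorted` stands in for the external `sort` command.

```python
from itertools import groupby

def run_map(lines):
    """Map phase: yield a "word<TAB>1" record for every word of every line."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def run_reduce(sorted_records):
    """Reduce phase: sum the values of consecutive records sharing a key.

    Like mapred.pl, this relies on the input already being sorted by key.
    """
    parsed = (record.rstrip("\n").split("\t") for record in sorted_records)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"

def pipeline(lines):
    """Equivalent of `map | sort | reduce`, run inside one process."""
    return list(run_reduce(sorted(run_map(lines))))

result = pipeline(["the end of money is", "the end of love"])
for record in result:
    print(record)
```

`itertools.groupby` only merges adjacent records, which is exactly why the sort step between map and reduce is required.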
MapReduce
    $ cat text.txt | ./mapred.pl map | sort | ./mapred.pl reduce

    [Figure: text.txt → mapper → reducer]
    text.txt:
        the end of money is
        the end of love
MapReduce
    $ cat text.txt | ./mapred.pl map | sort | ./mapred.pl reduce

    sub map {
      my $text = shift;
      my @words = split /\s/, $text;
      foreach my $word (@words) {
        print $word, "\t", 1, "\n";
      }
    }

    [Figure, Map phase: "./mapred.pl map" applies map to each input line.]
    map output: the 1, end 1, of 1, money 1, is 1, the 1, end 1, of 1, love 1
MapReduce
    $ cat text.txt | ./mapred.pl map | sort | ./mapred.pl reduce

    [Figure, Shuffle phase: "sort" plays the role of copy & sort.]
    before sort: the 1, end 1, of 1, money 1, is 1, the 1, end 1, of 1, love 1
    after sort:  end 1, end 1, is 1, love 1, money 1, of 1, of 1, the 1, the 1
MapReduce
    $ cat text.txt | ./mapred.pl map | sort | ./mapred.pl reduce

    sub reduce {
      my ($key, @values) = @_;
      my $cnt = 0;
      foreach my $value (@values) {
        $cnt += $value;
      }
      print $key, "\t", $cnt, "\n";
    }

    [Figure, Reduce phase: "./mapred.pl reduce" calls reduce once per key.]
    reduce input:  end <1, 1>, is <1>, love <1>, money <1>, of <1, 1>, the <1, 1>
    reduce output: end 2, is 1, love 1, money 1, of 2, the 2
MapReduce
MapReduce
• Split
• Map
• Combine
• Shuffle
• Reduce
Split
• HDFS               mapper

• HDFS             64MB 128MB


• mapper                        HDFS
                                       PC
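As a rough illustration of the Split step, the sketch below estimates how many splits (and therefore map tasks) a file produces. It assumes that the 64MB and 128MB figures on the slide are HDFS block sizes and that each block becomes one split; real Hadoop input formats apply additional rules, and `num_splits` is a hypothetical helper, not a Hadoop API.

```python
import math

MB = 1024 * 1024

def num_splits(file_size_bytes, block_size_bytes=64 * MB):
    """Estimate the number of input splits as ceil(file size / block size)."""
    return math.ceil(file_size_bytes / block_size_bytes)

one_tb = 1024 ** 4
splits_64 = num_splits(one_tb)              # 64 MB blocks
splits_128 = num_splits(one_tb, 128 * MB)   # 128 MB blocks
print(splits_64, splits_128)
```

Under these assumptions a larger block size halves the number of map tasks for the same input.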
Map
•   map

•
          HDFS
Combine
• Map                     reducer

    WordCount   Map



•

•
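Most of the text on this slide did not survive extraction, so the following sketch assumes it describes the usual role of a combiner: aggregating map output locally so that fewer records have to be sent to the reducer. The function names and the sample line are hypothetical.

```python
from collections import Counter

def map_with_combiner(line):
    """Run map on one line, then combine the (word, 1) pairs locally."""
    combined = Counter()
    for word in line.split():
        combined[word] += 1          # the combine step: a local, partial reduce
    return sorted(combined.items())

def reduce_counts(partials):
    """Sum the partial counts coming from all mappers."""
    total = Counter()
    for word, count in partials:
        total[word] += count
    return dict(total)

line = "the end of the end"          # hypothetical input with repeated words
partials = map_with_combiner(line)
print(partials)
```

Without the combine step this mapper would emit five records; with it, only three leave the mapper. This works for WordCount because addition is associative and commutative, so partial sums can be merged in any order.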
Shuffle
     • Map                                      Combine                                               reducer

                           reducer
              shuffle                 sort

    [Figure: shuffle with two reducers. Each mapper partitions its Map output by hash(key) % 2 and sorts every partition.]
    Map output of one mapper: the 1, end 1, of 1, money 1, is 1, the 1, end 1, of 1, love 1
    partition 0 (hash(the) % 2 = 0, hash(end) % 2 = 0, hash(is) % 2 = 0):
        after sort: end 1, end 1, is 1, the 1, the 1
    partition 1 (hash(of) % 2 = 1, hash(money) % 2 = 1, hash(love) % 2 = 1):
        after sort: love 1, money 1, of 1, of 1
    [Each partition is copied to its reducer, which sort-merges it with the partitions copied from the other mappers (keys such as hoge and fuga in the figure).]
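The partitioning rule in the figure, hash(key) % 2, can be sketched as follows. This is an illustration under assumptions: `zlib.crc32` is used only because it is deterministic across runs, so the partition each word lands in will not necessarily match the figure, and Hadoop's default partitioner derives the value from the key's Java hash code instead.

```python
import zlib

def partition(key, num_reducers):
    """Pick a reducer index for a key: hash(key) % number of reducers."""
    # zlib.crc32 is a stand-in hash chosen for determinism (an assumption).
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def shuffle(pairs, num_reducers):
    """Partition (key, value) pairs and sort each partition by key."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[partition(key, num_reducers)].append((key, value))
    return [sorted(part) for part in partitions]

words = "the end of money is the end of love".split()
parts = shuffle([(word, 1) for word in words], 2)
for index, part in enumerate(parts):
    print(index, part)
```

Whatever hash function is used, the property that matters is that every occurrence of a key goes to the same reducer, so each reducer sees all values for the keys it owns.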
Reduce
• shuffle               reducer


•             reduce

•                       HDFS
MapReduce
MapReduce
•

    ‣   Word Count
    ‣   Grep
    ‣                etc.

•
MapReduce
•   MapReduce
       mapper → reducer → mapper → reducer
        HDFS                          MapReduce

•   WordCount
                        MapReduce
     MapReduce
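The mapper → reducer → mapper → reducer chain can be sketched by feeding the output of one job into the next. The surrounding slide text is largely lost, so the second job here (grouping words by their count) is a hypothetical follow-up chosen for illustration, and `run_job` is an in-memory stand-in for a real job whose intermediate output would be stored on HDFS.

```python
from itertools import groupby

def run_job(records, mapper, reducer):
    """Run one in-memory MapReduce job: map, sort by key, reduce per key."""
    mapped = sorted(pair for record in records for pair in mapper(record))
    output = []
    for key, group in groupby(mapped, key=lambda kv: kv[0]):
        output.extend(reducer(key, [value for _, value in group]))
    return output

# Job 1: WordCount.
def count_mapper(line):
    return [(word, 1) for word in line.split()]

def count_reducer(word, ones):
    return [(word, sum(ones))]

# Job 2 (hypothetical follow-up): group words by how often they occur.
def invert_mapper(word_count):
    word, count = word_count
    return [(count, word)]

def invert_reducer(count, words):
    return [(count, sorted(words))]

lines = ["the end of money is", "the end of love"]
word_counts = run_job(lines, count_mapper, count_reducer)
by_count = run_job(word_counts, invert_mapper, invert_reducer)
print(by_count)
```

Each stage only needs the previous stage's key/value output, which is what makes such chains possible.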
MapReduce: Hadoop Streaming

•           Java               map            reduce

    Perl, Python, Ruby, JavaScript etc.

• Java MapReduce                                       map
               Hadoop Streaming               mapper

                                               map           combine
      ”                ”

     Hadoop Streaming             WordCount    map
          #!/usr/bin/env perl
          use strict;
          use warnings;

          while (my $line = <STDIN>) {
              chomp $line;  # drop the newline so it does not stick to the last word
              my @words = split / /, $line;
              foreach my $word (@words) {
                  print $word . "\t" . 1 . "\n";
              }
          }
                                          http://hapyrus.com/
                                          cf. http://www.slideshare.net/fujibee/tokyo-webmining12-8349942
MapReduce: DSL

• Pig
  ‣     Yahoo!              SQL            ”                  ”
  ‣
                                                                           http://pig.apache.org/

  ‣            MapReduce
• Hive
  ‣     Facebook                 SQL
  ‣                                                                        http://hive.apache.org/

  ‣                        SQL                                       Pig

• Cascading
  ‣     Pig                 Java                        API
  ‣     Java



                                 http://www.cascading.org/1.2/userguide/html/ch10.html
• MapReduce

•

•
• Java
•       SlideShare
    •      Map Reduce http://www.slideshare.net/doryokujin/map-reduce-8349406
    •      Hadoop http://www.slideshare.net/pfi/hadoop-2525724
    •      Hadoop for programmer http://www.slideshare.net/shiumachi/hadoop-for-programmer-5202246

•       Web
    •      MapReduce - naoya http://d.hatena.ne.jp/naoya/20080511/1210506301
    •      Hadoop http://www.atmarkit.co.jp/fjava/index/index_hadoop_tm.html
    •      Hadoop hBase 1/2 CodeZine http://codezine.jp/article/detail/2448
•
    •                 ( ),           (   ),              ( ),           (   ),             (     ),
                (     ), Hadoop               ,       , 2011
    •      Tom White ( ),             (   ),          (     ), Hadoop,                     ,
           2010
    •      Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large
           Clusters, 6th OSDI, 2004

Hadoop MapReduce Word Count Example

  • 2. • Takeshi Arabiki ‣ ‣ Twitter: @a_bicky ‣ : id:a_bicky • R • http://d.hatena.ne.jp/a_bicky/
  • 3. • MapReduce • MapReduce • MapReduce • MapReduce • •
  • 5. MapReduce • TB PB Facebook 20TB • ” ” ” ” ‣ ‣ ‣ etc. ↑ MPI orz MapReduce
  • 6. MapReduce • Google • map reduce >>> map(lambda x: x ** 2, range(1, 6)) map [1, 4, 9, 16, 25] >>> reduce(lambda a, b: a + b, range(1, 6)) reduce 15 • OK • KVS MapReduce Big Table Google File System
  • 7. Hadoop • Google • MapReduce Hadoop MapReduce Google Hadoop KVS KVS Hadoop MapReduce Big Table HBase MapReduce Hadoop Distributed File System Google File System (HDFS) Google Hadoop
  • 8. Hadoop MapReduce JobTracker JobClient assign map task assign reduce task HDFS HDFS mapper copy & sort reducer mapper reducer mapper Map Shuffle Reduce phase phase phase
  • 10. WordCount JobTracker JobClient HDFS the end of money is the end of love
  • 11. WordCount JobTracker JobClient assign map task assign reduce task HDFS the end of love mapper the end of money is reducer the end of love the end of money is mapper
  • 12. WordCount JobTracker JobClient the 1 end 1 of 1 money 1 HDFS is 1 mapper the end of money is the 1 end 1 reducer the end of love of 1 love 1 mapper Map phase
  • 13. WordCount JobTracker JobClient the 1 end 1 end 1 of 1 end 1 money 1 is 1 is 1 love 1 HDFS money 1 of 1 mapper copy & sort of 1 the 1 the 1 the end of money is the 1 end 1 reducer the end of love of 1 love 1 mapper Shuffle phase
  • 14. WordCount JobTracker JobClient end <1, 1> HDFS is <1> love <1> mapper money <1> of <1, 1> the <1, 1> the end of money is reducer the end of love mapper
  • 15. WordCount JobTracker JobClient end <1, 1> HDFS is <1> HDFS love <1> mapper money <1> of <1, 1> the <1, 1> end 2 is 1 the end of money is love 1 reducer money 1 the end of love of 2 the 2 mapper Reduce phase
  • 16. MapReduce ※ Java mapred.pl 1 #!/usr/bin/env perl 23 package main; # MapReduce Framework 2 use strict; 24 my $phase = shift; 3 use warnings; 25 if ($phase eq 'map') { # map phase 4 26 while (my $line = <STDIN>) { 5 package MapReduce; 27 chomp $line; # map 6 sub map { map 28 MapReduce::map($line); 7 my $text = shift; 29 } 8 my @words = split /s/, $text; 30 } elsif ($phase eq 'reduce') { # reduce phase 9 foreach my $word (@words) { 31 my ($prev_key, @values); 10 print $word, "t", 1, "n"; 32 while (my $line = <STDIN>) { 11 } 33 chomp $line; 12 } # 34 my ($key, $value) = split /t/, $line; 13 35 if (!$prev_key || $key eq $prev_key) { 14 sub reduce { reduce 36 push @values, $value; 15 my ($key, @values) = @_; 37 } else { # ( ) reduce 16 my $cnt = 0; 38 MapReduce::reduce($prev_key, @values); 17 foreach my $value (@values) { 39 @values = ($value); 18 $cnt += $value; 40 } 19 } 41 $prev_key = $key; 20 print $key, "t", $cnt, "n"; 42 } # ( ) reduce 21 } 43 MapReduce::reduce($prev_key, @values); 22 44 }
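  • 17. MapReduce as a shell pipeline:

        $ cat text.txt | ./mapred.pl map | sort | ./mapred.pl reduce

    text.txt contains the two lines "the end of money is" and "the end of love". In the pipeline, ./mapred.pl map plays the mapper and ./mapred.pl reduce plays the reducer.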
  • 18. MapReduce as a shell pipeline, Map phase: ./mapred.pl map calls sub map on every line of text.txt.

        sub map {
            my $text = shift;
            my @words = split ' ', $text;
            foreach my $word (@words) {
                print $word, "\t", 1, "\n";
            }
        }

    Output of the map step: the 1, end 1, of 1, money 1, is 1, the 1, end 1, of 1, love 1.
  • 19. MapReduce as a shell pipeline, Shuffle phase: sort stands in for copy & sort. It turns the map output (the 1, end 1, of 1, money 1, is 1, the 1, end 1, of 1, love 1) into records ordered by key: end 1, end 1, is 1, love 1, money 1, of 1, of 1, the 1, the 1.
  • 20. MapReduce as a shell pipeline, Reduce phase: ./mapred.pl reduce groups the sorted records by key (end <1, 1>, is <1>, love <1>, money <1>, of <1, 1>, the <1, 1>) and calls sub reduce once per key.

        sub reduce {
            my ($key, @values) = @_;
            my $cnt = 0;
            foreach my $value (@values) {
                $cnt += $value;
            }
            print $key, "\t", $cnt, "\n";
        }

    Output of the reduce step: end 2, is 1, love 1, money 1, of 2, the 2.
  • 22. MapReduce • Split • Map • Combine • Shuffle • Reduce
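  • 23. Split • The input data on HDFS is divided into splits, one per mapper. • A split normally corresponds to one HDFS block (64 MB or 128 MB). • Where possible, a mapper runs on the PC that already holds its HDFS block.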
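  • 24. Map • Applies the map function to each record of the split. • The input is read from HDFS; the intermediate map output is written to the mapper's local disk rather than to HDFS.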
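  • 25. Combine • Pre-aggregates the Map output on the mapper side before it is sent to a reducer, which reduces the amount of data transferred. In WordCount, the counts of the same word within one Map output can be summed in advance.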
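A rough sketch of what a combiner does for WordCount, in Python. It assumes, unlike the two-split example on the earlier slides, that a single mapper receives the whole sentence, so that some words repeat within one mapper's output; `map_words` and `combine` are hypothetical names for this illustration.

```python
from collections import Counter


def map_words(line):
    """Map: emit a (word, 1) pair for every word."""
    return [(word, 1) for word in line.split()]


def combine(pairs):
    """Combine: sum the counts per word within one mapper's output."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return sorted(totals.items())


if __name__ == "__main__":
    mapped = map_words("the end of money is the end of love")
    combined = combine(mapped)
    print(len(mapped), "records leave the mapper without a combiner")
    print(len(combined), "records leave the mapper with a combiner")
    print(combined)
```

The combiner works here because addition is associative and commutative: the reducer gets partial sums instead of single 1s and still arrives at the same totals.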
  • 26. Shuffle • Passes the Map (or Combine) output to the reducers. • Each mapper partitions its records by hash(key) % (number of reducers), here hash(key) % 2, and sorts each partition. • Each reducer copies its partition from every mapper and merges the copies in key order (sort & merge). • In the diagram, hash(the) % 2 = hash(end) % 2 = hash(is) % 2 = 0, so the, end and is go to the first reducer; hash(of) % 2 = hash(money) % 2 = hash(love) % 2 = 1, so of, money and love go to the second reducer. Records from another mapper (keys hoge and fuga) are merged in the same way.
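The partitioning rule hash(key) % 2 from the Shuffle slide can be tried out with the following Python sketch. `zlib.crc32` is used only as a stand-in for a stable hash function (Python's built-in `hash` for strings changes between runs), so the reducer each word lands on will not necessarily match the diagram; `partition` and `shuffle` are names invented for this example.

```python
import zlib


def partition(key, num_reducers):
    """Pick the reducer for a key with a stable hash: hash(key) % num_reducers."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapper_outputs, num_reducers):
    """Route every (key, value) pair to its reducer, then sort each reducer's input."""
    reducer_inputs = [[] for _ in range(num_reducers)]
    for output in mapper_outputs:
        for key, value in output:
            reducer_inputs[partition(key, num_reducers)].append((key, value))
    return [sorted(pairs) for pairs in reducer_inputs]


if __name__ == "__main__":
    mapper_outputs = [
        [(word, 1) for word in "the end of money is".split()],
        [(word, 1) for word in "the end of love".split()],
    ]
    for index, pairs in enumerate(shuffle(mapper_outputs, 2)):
        print("reducer", index, pairs)
```

Whatever hash function is used, the property that matters is that all records with the same key reach the same reducer, so each reducer can compute complete totals for its keys.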
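  • 27. Reduce • Each reducer receives the shuffled records, grouped by key. • It applies the reduce function to each key and its list of values. • The results are written to HDFS.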
  • 29. MapReduce • Typical jobs: ‣ Word Count ‣ Grep ‣ etc.
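  • 30. MapReduce • Chaining MapReduce jobs: mapper → reducer → mapper → reducer, with the data passed through HDFS between one MapReduce job and the next. • Example: a WordCount MapReduce job followed by a second MapReduce job that processes its output.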
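  • 31. MapReduce: Hadoop Streaming • Lets you write map and reduce in languages other than Java: Perl, Python, Ruby, JavaScript, etc. • Unlike the map of a Java MapReduce job, a Hadoop Streaming script receives a mapper's whole input on standard input, so one script can do the map and also "combine"-like work. • A WordCount map for Hadoop Streaming:

        #!/usr/bin/env perl
        use strict;
        use warnings;

        while (my $line = <STDIN>) {
            chomp $line;                     # drop the newline so it does not stick to the last word
            my @words = split ' ', $line;    # split on runs of whitespace
            foreach my $word (@words) {
                print $word . "\t" . 1 . "\n";
            }
        }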
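The slide shows only the map side. A matching reduce script might look like the Python sketch below. It assumes what the mapred.pl example also relies on: the reducer reads "word<TAB>count" lines on standard input, already sorted by word. The function name `reduce_stream` and the file names in the usage note are hypothetical.

```python
import sys


def reduce_stream(lines):
    """Yield (word, total) pairs from 'word<TAB>count' lines sorted by word."""
    current_key = None
    total = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        key, value = line.split("\t", 1)
        if current_key is not None and key != current_key:
            # The key changed, so the previous word is complete.
            yield current_key, total
            total = 0
        current_key = key
        total += int(value)
    if current_key is not None:
        yield current_key, total  # emit the last word


if __name__ == "__main__":
    for word, count in reduce_stream(sys.stdin):
        print(f"{word}\t{count}")
```

Saved as, say, reducer.py next to the Perl mapper saved as mapper.pl, the pair can be checked locally with the same kind of pipeline as before: cat text.txt | ./mapper.pl | sort | python3 reducer.py.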
  • 32. MapReduce: Hadoop Streaming (same bullets as slide 31) • http://hapyrus.com/ cf. http://www.slideshare.net/fujibee/tokyo-webmining12-8349942
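  • 33. MapReduce: DSL
      • Pig
          ‣ Developed at Yahoo!; an SQL-"like" language
          ‣ http://pig.apache.org/
          ‣ Scripts are converted into MapReduce jobs
      • Hive
          ‣ Developed at Facebook; an SQL-like language
          ‣ http://hive.apache.org/
          ‣ Closer to SQL than Pig
      • Cascading
          ‣ A Java API for Pig-like processing
          ‣ Java: http://www.cascading.org/1.2/userguide/html/ch10.html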
  • 37. References
      • SlideShare
          ‣ Map Reduce: http://www.slideshare.net/doryokujin/map-reduce-8349406
          ‣ Hadoop: http://www.slideshare.net/pfi/hadoop-2525724
          ‣ Hadoop for programmer: http://www.slideshare.net/shiumachi/hadoop-for-programmer-5202246
      • Web
          ‣ MapReduce - naoya: http://d.hatena.ne.jp/naoya/20080511/1210506301
          ‣ Hadoop: http://www.atmarkit.co.jp/fjava/index/index_hadoop_tm.html
          ‣ Hadoop, hBase 1/2, CodeZine: http://codezine.jp/article/detail/2448
      • Books and papers
          ‣ Hadoop [...], 2011
          ‣ Tom White, Hadoop, 2010
          ‣ Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, 6th OSDI, 2004