Hadoop

Map/Reduce

Hadoop

Hadoop(Map/Reduce)
✓Open-source implementation of Google's Map/Reduce and GFS
✓Written in Java
 - Apache project
 - http://hadoop.apache.org/
✓

✓
✓

- HDD
-
✓
✓


✓
Hadoop

Map/Reduce

Hadoop

Hadoop(Map/Reduce)
✓

✓Map   Reduce
✓
- grep
-
-
✓
-

-
-
PV UU
30+2
5+1
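The Map and Reduce phases can be tried out in plain Python without a cluster. What follows is a minimal single-process sketch, not Hadoop's implementation: `run_mapreduce`, `grep_mapper`, `count_reducer`, and the `PATTERNS` list are hypothetical names chosen here, and the sort-then-group step only stands in for the shuffle. The example counts lines matching each pattern, in the spirit of the grep use case.

```python
import re
from itertools import groupby
from operator import itemgetter


def run_mapreduce(records, mapper, reducer):
    """Run mapper over every record, group emitted pairs by key, reduce each group."""
    # Map phase: each record may emit zero or more (key, value) pairs.
    pairs = [pair for record in records for pair in mapper(record)]
    # Shuffle phase (simulated): sort by key so equal keys become adjacent.
    pairs.sort(key=itemgetter(0))
    # Reduce phase: one reducer call per distinct key.
    return [
        reducer(key, [value for _, value in group])
        for key, group in groupby(pairs, key=itemgetter(0))
    ]


PATTERNS = ["error", "warn"]  # hypothetical search terms


def grep_mapper(line):
    """Emit (pattern, 1) for every pattern found in the line (case-insensitive)."""
    for pattern in PATTERNS:
        if re.search(pattern, line, re.IGNORECASE):
            yield (pattern, 1)


def count_reducer(key, values):
    """Sum the counts collected for one key."""
    return (key, sum(values))


if __name__ == "__main__":
    log_lines = [
        "INFO job started",
        "WARN disk almost full",
        "ERROR task failed",
        "error: retrying after WARN",
    ]
    print(run_mapreduce(log_lines, grep_mapper, count_reducer))
```

On a real cluster the mapper calls run in parallel on different nodes and the framework performs the grouping; only the mapper and reducer logic is written by hand.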
Hadoop

Map/Reduce

Hadoop

Hadoop(Map/Reduce)
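•JobTracker and TaskTracker: run Map/Reduce
•NameNode and DataNode: provide HDFS
•JobTracker and NameNode: master daemons
•SecondaryNameNode: periodically checkpoints the NameNode's metadata

•TaskTracker and DataNode: worker daemons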

•JobTracker/NameNode
•TaskTracker/DataNode
Hadoop

Map/Reduce

Hadoop

Hadoop(Map/Reduce)
✓
-   MapTask
-   ReduceTask
-   JobClient: submits the Job to the JobTracker
-   HDFS
-   Map/Reduce
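import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class WordCount {
    // Map: split each input line into words and emit (word, 1).
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                output.collect(word, ONE);
            }
        }
    }

    // Reduce: sum the counts collected for each word.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // args[0]: input path, args[1]: output path (must not exist yet).
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Submit the job to the JobTracker and wait for it to finish.
        JobClient.runJob(conf);
    }
}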
✓Hadoop Streaming
✓LibHDFS
✓Hadoop Pipes
✓Amazon Elastic MapReduce
✓Ships with Hadoop as hadoop-streaming.jar
✓                 Map/Reduce


 - Map/Reduce can be written in C, Perl, Ruby, Python, etc.
 -      Map/Reduce
✓Map:cat / Reduce:wc
✓HDFS
$ hadoop dfs -copyFromLocal /usr/local/hadoop/*.txt /dfs/test/input/
$ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar -input /dfs/test/input/ -output /dfs/test/output -mapper cat -reducer "wc -l"

09/09/26 17:00:29 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
09/09/26 17:00:30 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
09/09/26 17:00:30 INFO mapred.FileInputFormat: Total input paths to process : 4
09/09/26 17:00:31 INFO mapred.FileInputFormat: Total input paths to process : 4
09/09/26 17:00:33 INFO streaming.StreamJob: map 0% reduce 0%
09/09/26 17:00:42 INFO streaming.StreamJob: map 100% reduce 0%
09/09/26 17:00:44 INFO streaming.StreamJob: map 100% reduce 100%
09/09/26 17:00:44 INFO streaming.StreamJob: Output: /dfs/test/output

$ hadoop dfs -cat /dfs/test/output/*
   8842
✓python
   http://www.michael-noll.com/wiki/Writing_An_Hadoop_MapReduce_Program_In_Python


$ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar -input /dfs/test/input/ -output /dfs/test/output -mapper "python map.py" -reducer "python reduce.py"

09/09/26 17:29:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
09/09/26 17:29:26 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
09/09/26 17:29:26 INFO mapred.FileInputFormat: Total input paths to process : 4
09/09/26 17:29:26 INFO mapred.FileInputFormat: Total input paths to process : 4
09/09/26 17:29:30 INFO streaming.StreamJob: map 0% reduce 0%
09/09/26 17:29:33 INFO streaming.StreamJob: map 100% reduce 0%
09/09/26 17:29:35 INFO streaming.StreamJob: map 100% reduce 100%
09/09/26 17:29:35 INFO streaming.StreamJob: Output: /dfs/test/output

$ hadoop dfs -cat /dfs/test/output/*
via 1942
the 1476
to    1394
in    819
a     816
cutting)   740
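The slides do not reproduce `map.py` and `reduce.py`. A pair of Streaming scripts producing this kind of word count might look roughly like the sketch below, assuming whitespace tokenization and tab-separated `word<TAB>count` lines. The function names and the single-file layout are choices made here for illustration only. The reducer relies on its input arriving sorted by key, which Streaming guarantees between the map and reduce steps.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit 'word<TAB>1' for every whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum counts per word; input must already be sorted by word."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    # Local dry run: sorted() stands in for the shuffle/sort between phases.
    sample = ["to be or not to be"]
    for out in reduce_lines(sorted(map_lines(sample))):
        print(out)
```

In an actual Streaming job, `map.py` would iterate over `map_lines(sys.stdin)` and print each result, and `reduce.py` would do the same with `reduce_lines(sys.stdin)`.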
✓C API for HDFS
http://wiki.apache.org/hadoop/LibHDFS
✓API for using HDFS and Map/Reduce from C/C++
http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/pipes/package-summary.html
✓Hosted MapReduce service running on Amazon EC2
 http://aws.amazon.com/elasticmapreduce/
Hadoop

Map/Reduce

Hadoop

Hadoop(Map/Reduce)
✓
- NameNode/TaskTracker
-
✓HDFS
- HDFS    DataNode
✓JMX              metrics
 - Hadoop

 - Hadoop Java                    jmxremote
 - http://www.cloudera.com/blog/2009/03/12/hadoop-metrics/

✓metrics
 - DFS / MapReduce / JVM / RPC
 - Map/Reduce                         Task


  (Keyword Tracker
✓JobTracker
 -              (http://jobtracker:50030/jobtracker.jsp)

 - Map/Reduce
✓      wiki
 - http://wiki.apache.org/hadoop/FAQ
✓Yahoo
 - http://www.docstoc.com/docs/3766688/Hadoop-Map-Reduce-Tuning-and-Debugging-Arun-C-Murthy-acmurthy
✓Maximum number of Map and Reduce tasks run simultaneously per TaskTracker
 - (configured in hadoop-site.xml)


     mapred.tasktracker.reduce.tasks.maximum


 -       TaskTracker
 - TaskTracker
     4   8GB
✓Map→Reduce


 -   io.sort.mb
 -   io.sort.factor
 -   io.sort.record.percent
 -   io.sort.spill.percent
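How these sort parameters interact can be estimated with a little arithmetic. The sketch below is a rough model, not Hadoop code: `sort_buffer_layout` is a hypothetical helper, the defaults (100 MB, 0.05, 0.80) are those of the 0.20-era configuration and should be checked against your version, and the 16 bytes of bookkeeping per record is an assumption about the map-side buffer of that era. `io.sort.factor`, the number of spill files merged at once, is not modeled.

```python
def sort_buffer_layout(io_sort_mb=100, record_percent=0.05, spill_percent=0.80,
                       bytes_per_record_meta=16):
    """Estimate how the map-side sort buffer is divided and when it spills.

    bytes_per_record_meta=16 is an assumption about per-record bookkeeping.
    """
    total_bytes = io_sort_mb * 1024 * 1024
    # Share of the buffer reserved for record metadata (io.sort.record.percent).
    meta_bytes = int(total_bytes * record_percent)
    meta_bytes -= meta_bytes % bytes_per_record_meta
    # The remainder holds the serialized keys and values.
    data_bytes = total_bytes - meta_bytes
    max_records = meta_bytes // bytes_per_record_meta
    return {
        "data_bytes": data_bytes,
        "max_records": max_records,
        # A spill starts once either region passes io.sort.spill.percent.
        "spill_at_data_bytes": int(data_bytes * spill_percent),
        "spill_at_records": int(max_records * spill_percent),
    }


if __name__ == "__main__":
    for name, value in sort_buffer_layout().items():
        print(name, value)
```

Under this model, jobs with many small records hit the record limit before the data limit, which is the case where raising io.sort.record.percent helps reduce the number of spills.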
✓Reduce


 - mapred.reduce.parallel.copies
✓Map

- mapred.compress.map.output (true)
✓Map→Reduce
                         HDFS


 - fs.inmemory.size.mb
✓Reduce              HDFS


✓        HDFS
 - org.apache.hadoop.mapred.lib.NullOutputFormat
✓Reduce
✓     Reduce


 -

 -
✓MRUnit
 - MapTask/ReduceTask
 - cloudera               Hadoop
 - http://www.cloudera.com/hadoop-mrunit

✓JMock
 -          Mock
 - http://www.jmock.org/

✓Hadoop

 -                                         (´ ω `)
✓Hudson Hadoop
 - zero conf Hudson               Hadoop


 - Hudson                Hadoop


 - http://d.hatena.ne.jp/kkawa/20090315/p1
 - http://weblogs.java.net/blog/kohsuke/archive/2009/03/instantly_turni.html
✓

✓Hadoop Streaming          Java


✓Let's Try Hadoop Programming!
Let's Talk a Little About Hadoop (working title)

Más contenido relacionado

La actualidad más candente

MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLabMapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14Jeremy Walsh
 
GoodFit: Multi-Resource Packing of Tasks with Dependencies
GoodFit: Multi-Resource Packing of Tasks with DependenciesGoodFit: Multi-Resource Packing of Tasks with Dependencies
GoodFit: Multi-Resource Packing of Tasks with DependenciesDataWorks Summit/Hadoop Summit
 
Database High Availability Using SHADOW Systems
Database High Availability Using SHADOW SystemsDatabase High Availability Using SHADOW Systems
Database High Availability Using SHADOW SystemsJaemyung Kim
 
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing ModelMongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing ModelTakahiro Inoue
 
Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebookragho
 
Data Exploration with Apache Drill: Day 2
Data Exploration with Apache Drill: Day 2Data Exploration with Apache Drill: Day 2
Data Exploration with Apache Drill: Day 2Charles Givre
 
02 Hadoop deployment and configuration
02 Hadoop deployment and configuration02 Hadoop deployment and configuration
02 Hadoop deployment and configurationSubhas Kumar Ghosh
 
Hive data migration (export/import)
Hive data migration (export/import)Hive data migration (export/import)
Hive data migration (export/import)Bopyo Hong
 
Jan 2013 HUG: Dist cpv2 for hug 20130116
Jan 2013 HUG: Dist cpv2 for hug 20130116Jan 2013 HUG: Dist cpv2 for hug 20130116
Jan 2013 HUG: Dist cpv2 for hug 20130116Yahoo Developer Network
 
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...Michael Stack
 
Hadoop 20111117
Hadoop 20111117Hadoop 20111117
Hadoop 20111117exsuns
 
Hadoop 20111215
Hadoop 20111215Hadoop 20111215
Hadoop 20111215exsuns
 
Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem GetInData
 
TP2 Big Data HBase
TP2 Big Data HBaseTP2 Big Data HBase
TP2 Big Data HBaseAmal Abid
 

La actualidad más candente (20)

Prashant de-ny-project-s1
Prashant de-ny-project-s1Prashant de-ny-project-s1
Prashant de-ny-project-s1
 
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLabMapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
MapReduce - Basics | Big Data Hadoop Spark Tutorial | CloudxLab
 
Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14Introduction to HBase - Phoenix HUG 5/14
Introduction to HBase - Phoenix HUG 5/14
 
GoodFit: Multi-Resource Packing of Tasks with Dependencies
GoodFit: Multi-Resource Packing of Tasks with DependenciesGoodFit: Multi-Resource Packing of Tasks with Dependencies
GoodFit: Multi-Resource Packing of Tasks with Dependencies
 
Failing gracefully
Failing gracefullyFailing gracefully
Failing gracefully
 
Database High Availability Using SHADOW Systems
Database High Availability Using SHADOW SystemsDatabase High Availability Using SHADOW Systems
Database High Availability Using SHADOW Systems
 
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing ModelMongoDB & Hadoop: Flexible Hourly Batch Processing Model
MongoDB & Hadoop: Flexible Hourly Batch Processing Model
 
Pig workshop
Pig workshopPig workshop
Pig workshop
 
Hive User Meeting August 2009 Facebook
Hive User Meeting August 2009 FacebookHive User Meeting August 2009 Facebook
Hive User Meeting August 2009 Facebook
 
Data Exploration with Apache Drill: Day 2
Data Exploration with Apache Drill: Day 2Data Exploration with Apache Drill: Day 2
Data Exploration with Apache Drill: Day 2
 
02 Hadoop deployment and configuration
02 Hadoop deployment and configuration02 Hadoop deployment and configuration
02 Hadoop deployment and configuration
 
Hive data migration (export/import)
Hive data migration (export/import)Hive data migration (export/import)
Hive data migration (export/import)
 
Jan 2013 HUG: Dist cpv2 for hug 20130116
Jan 2013 HUG: Dist cpv2 for hug 20130116Jan 2013 HUG: Dist cpv2 for hug 20130116
Jan 2013 HUG: Dist cpv2 for hug 20130116
 
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
hbaseconasia2019 Spatio temporal Data Management based on Ali-HBase Ganos and...
 
Hadoop 20111117
Hadoop 20111117Hadoop 20111117
Hadoop 20111117
 
Hadoop 20111215
Hadoop 20111215Hadoop 20111215
Hadoop 20111215
 
Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem Introduction to Hadoop Ecosystem
Introduction to Hadoop Ecosystem
 
MapReduce@DirectI
MapReduce@DirectIMapReduce@DirectI
MapReduce@DirectI
 
Gur1009
Gur1009Gur1009
Gur1009
 
TP2 Big Data HBase
TP2 Big Data HBaseTP2 Big Data HBase
TP2 Big Data HBase
 

Similar a ちょっとHadoopについて語ってみるか(仮題)

Hadoop - Lessons Learned
Hadoop - Lessons LearnedHadoop - Lessons Learned
Hadoop - Lessons Learnedtcurdt
 
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainApache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainYahoo Developer Network
 
20141111 파이썬으로 Hadoop MR프로그래밍
20141111 파이썬으로 Hadoop MR프로그래밍20141111 파이썬으로 Hadoop MR프로그래밍
20141111 파이썬으로 Hadoop MR프로그래밍Tae Young Lee
 
Clogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overviewClogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overviewMadhur Nawandar
 
How to develop Big Data Pipelines for Hadoop, by Costin Leau
How to develop Big Data Pipelines for Hadoop, by Costin LeauHow to develop Big Data Pipelines for Hadoop, by Costin Leau
How to develop Big Data Pipelines for Hadoop, by Costin LeauCodemotion
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slidesryancox
 
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...Yahoo Developer Network
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapakapa rohit
 
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...CloudxLab
 
Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache SparkIndicThreads
 
Big Data Analysis With RHadoop
Big Data Analysis With RHadoopBig Data Analysis With RHadoop
Big Data Analysis With RHadoopDavid Chiu
 
AWS Hadoop and PIG and overview
AWS Hadoop and PIG and overviewAWS Hadoop and PIG and overview
AWS Hadoop and PIG and overviewDan Morrill
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterEdureka!
 
Cascading Through Hadoop for the Boulder JUG
Cascading Through Hadoop for the Boulder JUGCascading Through Hadoop for the Boulder JUG
Cascading Through Hadoop for the Boulder JUGMatthew McCullough
 
Getting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduceGetting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduceobdit
 
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...spinningmatt
 
Endofday: A Container Workflow Engine for Scalable, Reproducible Computation
Endofday: A Container Workflow Engine for Scalable, Reproducible ComputationEndofday: A Container Workflow Engine for Scalable, Reproducible Computation
Endofday: A Container Workflow Engine for Scalable, Reproducible ComputationEnis Afgan
 

Similar a ちょっとHadoopについて語ってみるか(仮題) (20)

Hadoop - Lessons Learned
Hadoop - Lessons LearnedHadoop - Lessons Learned
Hadoop - Lessons Learned
 
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit JainApache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
Apache Hadoop India Summit 2011 talk "Hive Evolution" by Namit Jain
 
20141111 파이썬으로 Hadoop MR프로그래밍
20141111 파이썬으로 Hadoop MR프로그래밍20141111 파이썬으로 Hadoop MR프로그래밍
20141111 파이썬으로 Hadoop MR프로그래밍
 
Clogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overviewClogeny Hadoop ecosystem - an overview
Clogeny Hadoop ecosystem - an overview
 
Bd class 2 complete
Bd class 2 completeBd class 2 complete
Bd class 2 complete
 
How to develop Big Data Pipelines for Hadoop, by Costin Leau
How to develop Big Data Pipelines for Hadoop, by Costin LeauHow to develop Big Data Pipelines for Hadoop, by Costin Leau
How to develop Big Data Pipelines for Hadoop, by Costin Leau
 
Amazon elastic map reduce
Amazon elastic map reduceAmazon elastic map reduce
Amazon elastic map reduce
 
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter SlidesJuly 2010 Triangle Hadoop Users Group - Chad Vawter Slides
July 2010 Triangle Hadoop Users Group - Chad Vawter Slides
 
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
Apache Hadoop India Summit 2011 talk "Hadoop Map-Reduce Programming & Best Pr...
 
Hadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapaHadoop Interview Questions and Answers by rohit kapa
Hadoop Interview Questions and Answers by rohit kapa
 
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...
Introduction to MapReduce - Hadoop Streaming | Big Data Hadoop Spark Tutorial...
 
Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache Spark
 
Big Data Analysis With RHadoop
Big Data Analysis With RHadoopBig Data Analysis With RHadoop
Big Data Analysis With RHadoop
 
AWS Hadoop and PIG and overview
AWS Hadoop and PIG and overviewAWS Hadoop and PIG and overview
AWS Hadoop and PIG and overview
 
Learn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node ClusterLearn to setup a Hadoop Multi Node Cluster
Learn to setup a Hadoop Multi Node Cluster
 
Cascading Through Hadoop for the Boulder JUG
Cascading Through Hadoop for the Boulder JUGCascading Through Hadoop for the Boulder JUG
Cascading Through Hadoop for the Boulder JUG
 
Getting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduceGetting started with Hadoop, Hive, and Elastic MapReduce
Getting started with Hadoop, Hive, and Elastic MapReduce
 
06 pig-01-intro
06 pig-01-intro06 pig-01-intro
06 pig-01-intro
 
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
Boston Apache Spark User Group (the Spahk group) - Introduction to Spark - 15...
 
Endofday: A Container Workflow Engine for Scalable, Reproducible Computation
Endofday: A Container Workflow Engine for Scalable, Reproducible ComputationEndofday: A Container Workflow Engine for Scalable, Reproducible Computation
Endofday: A Container Workflow Engine for Scalable, Reproducible Computation
 

Más de moai kids

中国最新ニュースアプリ事情
中国最新ニュースアプリ事情中国最新ニュースアプリ事情
中国最新ニュースアプリ事情moai kids
 
FluentdとRedshiftの素敵な関係
FluentdとRedshiftの素敵な関係FluentdとRedshiftの素敵な関係
FluentdとRedshiftの素敵な関係moai kids
 
Twitterのsnowflakeについて
TwitterのsnowflakeについてTwitterのsnowflakeについて
Twitterのsnowflakeについてmoai kids
 
Programming Hive Reading #4
Programming Hive Reading #4Programming Hive Reading #4
Programming Hive Reading #4moai kids
 
Programming Hive Reading #3
Programming Hive Reading #3Programming Hive Reading #3
Programming Hive Reading #3moai kids
 
"Programming Hive" Reading #1
"Programming Hive" Reading #1"Programming Hive" Reading #1
"Programming Hive" Reading #1moai kids
 
Casual Compression on MongoDB
Casual Compression on MongoDBCasual Compression on MongoDB
Casual Compression on MongoDBmoai kids
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDBmoai kids
 
Hadoop Conference Japan 2011 Fallに行ってきました
Hadoop Conference Japan 2011 Fallに行ってきましたHadoop Conference Japan 2011 Fallに行ってきました
Hadoop Conference Japan 2011 Fallに行ってきましたmoai kids
 
HBase本輪読会資料(11章)
HBase本輪読会資料(11章)HBase本輪読会資料(11章)
HBase本輪読会資料(11章)moai kids
 
snappyについて
snappyについてsnappyについて
snappyについてmoai kids
 
第四回月次セミナー(公開版)
第四回月次セミナー(公開版)第四回月次セミナー(公開版)
第四回月次セミナー(公開版)moai kids
 
第三回月次セミナー(公開版)
第三回月次セミナー(公開版)第三回月次セミナー(公開版)
第三回月次セミナー(公開版)moai kids
 
Pythonで自然言語処理
Pythonで自然言語処理Pythonで自然言語処理
Pythonで自然言語処理moai kids
 
HandlerSocket plugin Client for Javaとそれを用いたベンチマーク
HandlerSocket plugin Client for Javaとそれを用いたベンチマークHandlerSocket plugin Client for Javaとそれを用いたベンチマーク
HandlerSocket plugin Client for Javaとそれを用いたベンチマークmoai kids
 
Yammer試用レポート(公開版)
Yammer試用レポート(公開版)Yammer試用レポート(公開版)
Yammer試用レポート(公開版)moai kids
 
掲示板時間軸コーパスを用いたワードトレンド解析(公開版)
掲示板時間軸コーパスを用いたワードトレンド解析(公開版)掲示板時間軸コーパスを用いたワードトレンド解析(公開版)
掲示板時間軸コーパスを用いたワードトレンド解析(公開版)moai kids
 
中国と私(仮題)
中国と私(仮題)中国と私(仮題)
中国と私(仮題)moai kids
 
不自然言語処理コンテストLT資料
不自然言語処理コンテストLT資料不自然言語処理コンテストLT資料
不自然言語処理コンテストLT資料moai kids
 
n-gramコーパスを用いた類義語自動獲得手法について
n-gramコーパスを用いた類義語自動獲得手法についてn-gramコーパスを用いた類義語自動獲得手法について
n-gramコーパスを用いた類義語自動獲得手法についてmoai kids
 

Más de moai kids (20)

中国最新ニュースアプリ事情
中国最新ニュースアプリ事情中国最新ニュースアプリ事情
中国最新ニュースアプリ事情
 
FluentdとRedshiftの素敵な関係
FluentdとRedshiftの素敵な関係FluentdとRedshiftの素敵な関係
FluentdとRedshiftの素敵な関係
 
Twitterのsnowflakeについて
TwitterのsnowflakeについてTwitterのsnowflakeについて
Twitterのsnowflakeについて
 
Programming Hive Reading #4
Programming Hive Reading #4Programming Hive Reading #4
Programming Hive Reading #4
 
Programming Hive Reading #3
Programming Hive Reading #3Programming Hive Reading #3
Programming Hive Reading #3
 
"Programming Hive" Reading #1
"Programming Hive" Reading #1"Programming Hive" Reading #1
"Programming Hive" Reading #1
 
Casual Compression on MongoDB
Casual Compression on MongoDBCasual Compression on MongoDB
Casual Compression on MongoDB
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Hadoop Conference Japan 2011 Fallに行ってきました
Hadoop Conference Japan 2011 Fallに行ってきましたHadoop Conference Japan 2011 Fallに行ってきました
Hadoop Conference Japan 2011 Fallに行ってきました
 
HBase本輪読会資料(11章)
HBase本輪読会資料(11章)HBase本輪読会資料(11章)
HBase本輪読会資料(11章)
 
snappyについて
snappyについてsnappyについて
snappyについて
 
第四回月次セミナー(公開版)
第四回月次セミナー(公開版)第四回月次セミナー(公開版)
第四回月次セミナー(公開版)
 
第三回月次セミナー(公開版)
第三回月次セミナー(公開版)第三回月次セミナー(公開版)
第三回月次セミナー(公開版)
 
Pythonで自然言語処理
Pythonで自然言語処理Pythonで自然言語処理
Pythonで自然言語処理
 
HandlerSocket plugin Client for Javaとそれを用いたベンチマーク
HandlerSocket plugin Client for Javaとそれを用いたベンチマークHandlerSocket plugin Client for Javaとそれを用いたベンチマーク
HandlerSocket plugin Client for Javaとそれを用いたベンチマーク
 
Yammer試用レポート(公開版)
Yammer試用レポート(公開版)Yammer試用レポート(公開版)
Yammer試用レポート(公開版)
 
掲示板時間軸コーパスを用いたワードトレンド解析(公開版)
掲示板時間軸コーパスを用いたワードトレンド解析(公開版)掲示板時間軸コーパスを用いたワードトレンド解析(公開版)
掲示板時間軸コーパスを用いたワードトレンド解析(公開版)
 
中国と私(仮題)
中国と私(仮題)中国と私(仮題)
中国と私(仮題)
 
不自然言語処理コンテストLT資料
不自然言語処理コンテストLT資料不自然言語処理コンテストLT資料
不自然言語処理コンテストLT資料
 
n-gramコーパスを用いた類義語自動獲得手法について
n-gramコーパスを用いた類義語自動獲得手法についてn-gramコーパスを用いた類義語自動獲得手法について
n-gramコーパスを用いた類義語自動獲得手法について
 

Último

A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 

Último (20)

A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 

Let's Talk a Bit About Hadoop (Working Title)

  • 5. ✓Google Map/Reduce GFS ✓Java - Apache - http://hadoop.apache.org/
  • 12. ✓ ✓Map Reduce
  • 22. 5+1
  • 26. •JobTracker NameNode •SecondaryNameNode NameNode •TaskTracker DataNode •JobTracker/NameNode •TaskTracker/DataNode
  • 34. ✓ - MapTask - ReduceTask - JobClient JobTracker Job - HDFS - Map/Reduce
  • 35.
    public class WordCount {
        public static class Map extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {
            // Map
        }

        public static class Reduce extends MapReduceBase
                implements Reducer<Text, IntWritable, Text, IntWritable> {
            // Reduce
        }

        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(WordCount.class);
            conf.setJobName("wordcount");

            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            conf.setMapperClass(Map.class);
            conf.setReducerClass(Reduce.class);

            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputFormat(TextOutputFormat.class);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            JobClient.runJob(conf);
        }
    }
  • 38. ✓Hadoop hadoop-streaming.jar ✓ Map/Reduce - C Perl Ruby Python Map/Reduce - Map/Reduce
  • 39. ✓Map:cat / Reduce:wc ✓HDFS
    $ hadoop dfs -copyFromLocal /usr/local/hadoop/*.txt /dfs/test/input/
    $ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar -input /dfs/test/input/ -output /dfs/test/output -mapper cat -reducer "wc -l"
    09/09/26 17:00:29 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    09/09/26 17:00:30 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
    09/09/26 17:00:30 INFO mapred.FileInputFormat: Total input paths to process : 4
    09/09/26 17:00:31 INFO mapred.FileInputFormat: Total input paths to process : 4
    09/09/26 17:00:33 INFO streaming.StreamJob: map 0% reduce 0%
    09/09/26 17:00:42 INFO streaming.StreamJob: map 100% reduce 0%
    09/09/26 17:00:44 INFO streaming.StreamJob: map 100% reduce 100%
    09/09/26 17:00:44 INFO streaming.StreamJob: Output: /dfs/test/output
    $ hadoop dfs -cat /dfs/test/output/*
    8842
  • 40. ✓python http://www.michael-noll.com/wiki/Writing_An_Hadoop_MapReduce_Program_In_Python
    $ hadoop jar /usr/local/hadoop/contrib/streaming/hadoop-0.20.1-streaming.jar -input /dfs/test/input/ -output /dfs/test/output -mapper "python map.py" -reducer "python reduce.py"
    09/09/26 17:29:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
    09/09/26 17:29:26 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
    09/09/26 17:29:26 INFO mapred.FileInputFormat: Total input paths to process : 4
    09/09/26 17:29:26 INFO mapred.FileInputFormat: Total input paths to process : 4
    09/09/26 17:29:30 INFO streaming.StreamJob: map 0% reduce 0%
    09/09/26 17:29:33 INFO streaming.StreamJob: map 100% reduce 0%
    09/09/26 17:29:35 INFO streaming.StreamJob: map 100% reduce 100%
    09/09/26 17:29:35 INFO streaming.StreamJob: Output: /dfs/test/output
    $ hadoop dfs -cat /dfs/test/output/*
    via 1942
    the 1476
    to 1394
    in 819
    a 816
    cutting) 740
  • 42. ✓C C++ HDFS Map/Reduce API http://hadoop.apache.org/common/docs/r0.20.1/api/org/apache/hadoop/mapred/pipes/package-summary.html
  • 43. ✓Amazon EC2 MapReduce http://aws.amazon.com/elasticmapreduce/
  • 46. ✓JMX metrics - Hadoop - Hadoop Java jmxremote - http://www.cloudera.com/blog/2009/03/12/hadoop-metrics/ ✓metrics - DFS / MapReduce / JVM / RPC - Map/Reduce Task (Keyword Tracker
  • 47. ✓JobTracker - (http://jobtracker:50030/jobtracker.jsp) - Map/Reduce
  • 48. wiki - http://wiki.apache.org/hadoop/FAQ ✓Yahoo - http://www.docstoc.com/docs/3766688/Hadoop-Map-Reduce-Tuning-and-Debugging-Arun-C-Murthy-acmurthy
  • 49. ✓TaskTracker Map Reduce - hadoop-site.xml) mapred.tasktracker.reduce.tasks.maximum - TaskTracker - TaskTracker 4 8GB
  • 50. ✓Map→Reduce - io.sort.mb - io.sort.factor - io.sort.record.percent - io.sort.spill.percent
  • 53. ✓Map→Reduce HDFS - fs.inmemory.size.mb
  • 54. ✓Reduce HDFS ✓ HDFS - org.apache.hadoop.mapred.lib.NullOutputFormat
  • 55. ✓Reduce ✓ Reduce - -
  • 56. ✓MRUnit - MapTask/ReduceTask - cloudera Hadoop - http://www.cloudera.com/hadoop-mrunit ✓JMock - Mock - http://www.jmock.org/ ✓Hadoop - (´ ω `)
  • 57. ✓Hudson Hadoop - zero conf Hudson Hadoop - Hudson Hadoop - http://d.hatena.ne.jp/kkawa/20090315/p1 - http://weblogs.java.net/blog/kohsuke/archive/2009/03/instantly_turni.html
  • 59. ✓ ✓Hadoop Streaming Java ✓Let's Try Hadoop Programming!
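
Slide 40 runs a streaming job with `-mapper "python map.py"` and `-reducer "python reduce.py"`, but the transcript does not include the scripts themselves. The following is a minimal word-count sketch of what such a pair could look like, not the scripts used in the talk. It assumes the default streaming convention of one tab-separated key/value record per line, and it relies on Hadoop sorting mapper output by key before the reducer reads it. The single-file layout, the file name `wordcount.py`, and the names `map_lines`, `reduce_lines`, and `run_locally` are all hypothetical.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one "word<TAB>1" record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer: sum the counts of each run of identical keys.

    Assumes the input is already sorted by key, which Hadoop Streaming
    does between the map and reduce phases.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Simulate map -> sort -> reduce in-process, without a cluster."""
    return list(reduce_lines(sorted(map_lines(lines))))


def main(argv, stdin, stdout):
    # Hypothetical CLI: "wordcount.py map" or "wordcount.py reduce".
    stages = {"map": map_lines, "reduce": reduce_lines}
    if len(argv) == 2 and argv[1] in stages:
        for record in stages[argv[1]](stdin):
            stdout.write(record + "\n")


if __name__ == "__main__":
    main(sys.argv, sys.stdin, sys.stdout)
```

Under these assumptions the script would be passed to the streaming jar as `-mapper "python wordcount.py map"` and `-reducer "python wordcount.py reduce"`. Because both stages only read lines and write lines, `run_locally` can exercise the same logic on a small list of strings before the job is submitted.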