MapReduce by JRuby and DSL
Hadoop Papyrus

2010/8/28
JRubyKaigi 2010
藤川幸一 FUJIKAWA Koichi @fujibee
What’s Hadoop?
• A framework for parallel, distributed processing of BIG data
• An OSS clone of Google MapReduce
• Made for processing data beyond the terabyte scale
  – Reading 400 TB (web-scale data) from a standard HDD at 50 MB/s
    would take over 2,000 hours
  – So a distributed file system and a parallel processing framework
    are needed!
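A quick back-of-the-envelope check of that figure, sketched in Ruby (assuming 1 TB = 10^6 MB and a sustained 50 MB/s read speed):

# Back-of-the-envelope check of the claim above (assumes 1 TB = 10^6 MB).
total_mb       = 400 * 1_000_000           # 400 TB of web-scale data, in MB
hdd_mb_per_sec = 50.0                      # sequential read speed of one HDD
hours          = total_mb / hdd_mb_per_sec / 3600
puts hours.round                           # => 2222, i.e. "over 2,000 hours"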
Hadoop Papyrus
• My own OSS project
  – Hosted on GitHub: http://github.com/fujibee/hadoop-papyrus
• A framework for describing and running Hadoop jobs with a (J)Ruby DSL
  – Hadoop jobs are normally written in Java
  – A job that takes a very complex procedure in Java needs only a few
    lines of Ruby!
• Supported by the IPA MITOH 2009 project (Japanese government funding)
• Can be run from a Hudson (CI tool) plug-in
Step 1
Not Java: we can write the job in Ruby!
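In the original deck this slide shows the Ruby version of the job. As an illustration only (this is plain Ruby, not the Hadoop Papyrus API, which appears on the comparison slide below), the word-count map and reduce steps could be sketched like this:

# Illustrative plain-Ruby sketch of the map/reduce idea, not the Papyrus API.
lines = ["hello hadoop", "hello jruby"]

# map: turn each line into (word, 1) pairs
pairs = lines.flat_map { |line| line.split.map { |word| [word, 1] } }

# reduce: sum the counts for each word
counts = pairs.group_by { |word, _| word }
              .map { |word, vs| [word, vs.inject(0) { |sum, (_, n)| sum + n }] }

p counts  # => [["hello", 2], ["hadoop", 1], ["jruby", 1]]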
Step 2
A simple job description written as a DSL in Ruby
[Slide diagram: Map, Reduce, and Job Description parts make up the Log Analysis DSL; the full ten-line script appears on the comparison slide below]
Step 3
Set up the Hadoop server environment easily with Hudson
In Java, about 70 lines are needed. Hadoop Papyrus needs only 10!

The Java version:

package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper extends
      Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer extends
      Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
        Context context) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args)
        .getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The Hadoop Papyrus DSL (log analysis):

dsl 'LogAnalysis'

from 'test/in'
to 'test/out'

pattern /\[\[([^|\]:]+)[^\]:]*\]\]/

column_name :link

topic "link num", :label => 'n' do
  count_uniq column[:link]
end
Hadoop Papyrus Details
• A Ruby script is invoked via JRuby inside the Map/Reduce processes
  running on Java
Hadoop Papyrus Details (cont’d)
• In addition, we can write a DSL script for whatever we want to process
  (log analysis, etc.). Papyrus chooses the appropriate behavior for each
  phase (Map, Reduce, or job initialization), so only one script is needed.
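The slide does not spell out how one script serves every phase, so the following is a conceptual sketch only and does not use the real Hadoop Papyrus API: a single callable whose behavior is selected by whichever phase the framework is currently executing, reusing the link-extraction pattern from the slide above.

# Conceptual sketch only -- NOT the real Hadoop Papyrus API.
# One script, different behaviour per phase (job setup, map, reduce),
# selected by the framework at runtime.
script = lambda do |phase, *args|
  case phase
  when :setup  then { input: 'test/in', output: 'test/out' }
  when :map    then args[0].scan(/\[\[([^|\]:]+)[^\]:]*\]\]/).map { |m| [m[0], 1] }
  when :reduce then [args[0], args[1].inject(0, :+)]
  end
end

p script.call(:map, "see [[SomePage]] and [[SomePage|label]]")
# => [["SomePage", 1], ["SomePage", 1]]
p script.call(:reduce, "SomePage", [1, 1])
# => ["SomePage", 2]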
ありがとうございました! Thank you!

Twitter ID: @fujibee