SlideShare una empresa de Scribd logo
1 de 21
Descargar para leer sin conexión
Design of a DSL by Ruby
 for heavy computations
over map-reduce clusters


     the 37th Grace seminar
         16th June, 2010

          Koichi Fujikawa
     Cirius Technologies, Inc.
Today's Agenda
 Background
 Problem
 Approach
 My Project
 Conclusion
Background
         Where are we in the world?
We Live in the "Big Data" era
 World-wide web page data (Text-only) is expected
 400TB (at one point).
   Some web service company (like Google,
   Yahoo, etc) have to process these data for
   their business, but..
 General HDD can read data in 50MB/sec. This
 means we can take 2000 hours (approx. 100
 days) to read the total web data(400TB) by one
 machine.
 We need the parallel processing / file system.
MapReduce
 MapReduce is one of the parallel skeletons
 Became popular by Google's paper(2004)
 MapReduce has two phases
    Map phase: transform key and value to
    another (key and) value
    Reduce phase: aggregate and calculate
    values by one key
 Each record process by map phase first and
 then by reduce phase
Hadoop
 Hadoop is open source clone of Google
 MapReduce hosted by Apache Foundation
 Big web service provider(Yahoo, Facebook,
 etc) contribute this project actively.
 Large development and user community all
 over the world (including Japan)
    Hadoop conference Japan 2009
    Hadoop source code reading events
Problem
          What issues do we face?
Programming Model
 General programmers, engineers are not
 familiar with this "MapReduce" model, so it is
 too difficult to try and use
    Especially to separate Map and Reduce
 No Effective way of the "pattern of the
 MapRecuce programming" because this
 technology is not mature for the engineers.
 We have to find this individually. It is very
 difficult and time-consuming.
Programming Language
 Hadoop is written in Java language, so the
 programmers need to write Map and Reduce
 procedure in Java.
   Java is strong typed and compile language.
   Some web service engineer don't like these
   language.
 No problem if the code is fixed and
 completed, but I wonder it is suitable for ad-
 hoc prototyping and easy querying.
   MapReduce jobs depend on what users want to
   get, so flexibility is important, I think.
Approach
           How do we resolve it?
Hide complexity of MapReduce
 I found the description for MapReduce could
 be simpler in some specific case (e.g. log
 analysis).
 In this case (but almost all of Hadoop usage is
 now log analysis), it would be nice if
 programmers can write the description without
 taking care of MapReduce!
DSL approach by Ruby
 For this description, I created DSL for each
 specific usage.
   Log analysis DSL is a reference
   implementation which I prepared.
 As DSL runtime environment for Hadoop, I
 chose Ruby and JRuby, which is Ruby
 runtime working on JVM.
   Ruby is very flexible and reusable object-
   oriented language, so very easy to create
   DSL processor.
My project
             What do I do?
Hadoop Papyrus
 DSL framework for Hadoop by JRuby
   We can write log analysis code by
   only several line.
 Open source (Apache Licence) same as
 Hadoop
   Hosted by github
 Distributed by common Ruby archive site
   RubyGems.org
 Supported by IPA mitoh 2009
DEMO
Conclusion
             What is archiving now?
On the way to big challenge
 We need parallel processing method to
 handle massive web-scale data.
 MapReduce and Hadoop is one of good tools,
 but..
   Difficult to describe Map and Reduce
   Irritated to write Java for someone :-)
 Hadoop Papyrus is providing the key!
   Ruby-based DSL framework for Hadoop
   You can write Map and Reduce at once
Questions?
 Thank you very much!
  Twitter ID: @fujibee

Más contenido relacionado

La actualidad más candente

Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Nathan Bijnens
 
[AI04] Scaling Machine Learning to Big Data Using SparkML and SparkR
[AI04] Scaling Machine Learning to Big Data Using SparkML and SparkR[AI04] Scaling Machine Learning to Big Data Using SparkML and SparkR
[AI04] Scaling Machine Learning to Big Data Using SparkML and SparkRde:code 2017
 
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)Gruter
 
Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)Uwe Printz
 
Onyx data processing the clojure way
Onyx   data processing  the clojure wayOnyx   data processing  the clojure way
Onyx data processing the clojure wayBahadir Cambel
 
7 key recipes for data engineering
7 key recipes for data engineering7 key recipes for data engineering
7 key recipes for data engineeringunivalence
 
Scoobi - Scala for Startups
Scoobi - Scala for StartupsScoobi - Scala for Startups
Scoobi - Scala for Startupsbmlever
 
Data Science Stack with MongoDB and RStudio
Data Science Stack with MongoDB and RStudioData Science Stack with MongoDB and RStudio
Data Science Stack with MongoDB and RStudioWinston Chen
 
Java Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBJava Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBTobias Trelle
 
power-assert, mechanism and philosophy
power-assert, mechanism and philosophypower-assert, mechanism and philosophy
power-assert, mechanism and philosophyTakuto Wada
 
Introduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data WarehouseIntroduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data WarehouseJihoon Son
 
MongoDB + Java + Spring Data
MongoDB + Java + Spring DataMongoDB + Java + Spring Data
MongoDB + Java + Spring DataAnton Sulzhenko
 
Scalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/CascadingScalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/Cascadingjohnynek
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and MonoidsHugo Gävert
 
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQRealtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQRick Copeland
 
Huangjing renren
Huangjing renrenHuangjing renren
Huangjing renrend0nn9n
 
Presto Overfview
Presto OverfviewPresto Overfview
Presto OverfviewMiguel Ping
 

La actualidad más candente (20)

Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!
 
[AI04] Scaling Machine Learning to Big Data Using SparkML and SparkR
[AI04] Scaling Machine Learning to Big Data Using SparkML and SparkR[AI04] Scaling Machine Learning to Big Data Using SparkML and SparkR
[AI04] Scaling Machine Learning to Big Data Using SparkML and SparkR
 
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
 
Redis
RedisRedis
Redis
 
Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)
 
Onyx data processing the clojure way
Onyx   data processing  the clojure wayOnyx   data processing  the clojure way
Onyx data processing the clojure way
 
7 key recipes for data engineering
7 key recipes for data engineering7 key recipes for data engineering
7 key recipes for data engineering
 
Scalding for Hadoop
Scalding for HadoopScalding for Hadoop
Scalding for Hadoop
 
Scoobi - Scala for Startups
Scoobi - Scala for StartupsScoobi - Scala for Startups
Scoobi - Scala for Startups
 
Data Science Stack with MongoDB and RStudio
Data Science Stack with MongoDB and RStudioData Science Stack with MongoDB and RStudio
Data Science Stack with MongoDB and RStudio
 
Java Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBJava Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDB
 
power-assert, mechanism and philosophy
power-assert, mechanism and philosophypower-assert, mechanism and philosophy
power-assert, mechanism and philosophy
 
Introduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data WarehouseIntroduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data Warehouse
 
Hadoop
HadoopHadoop
Hadoop
 
MongoDB + Java + Spring Data
MongoDB + Java + Spring DataMongoDB + Java + Spring Data
MongoDB + Java + Spring Data
 
Scalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/CascadingScalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/Cascading
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and Monoids
 
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQRealtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
 
Huangjing renren
Huangjing renrenHuangjing renren
Huangjing renren
 
Presto Overfview
Presto OverfviewPresto Overfview
Presto Overfview
 

Destacado

Hadoop Conf Japan 2009 After Party LT - Hadoop Ruby DSL
Hadoop Conf Japan 2009 After Party LT - Hadoop Ruby DSLHadoop Conf Japan 2009 After Party LT - Hadoop Ruby DSL
Hadoop Conf Japan 2009 After Party LT - Hadoop Ruby DSLKoichi Fujikawa
 
Cloud computing competition by Hapyrus
Cloud computing competition by HapyrusCloud computing competition by Hapyrus
Cloud computing competition by HapyrusKoichi Fujikawa
 
Technology Plan For Stevenson Ms Table
Technology Plan For Stevenson Ms TableTechnology Plan For Stevenson Ms Table
Technology Plan For Stevenson Ms TableDana Luterman
 
GUIAS DE NADAL
GUIAS DE NADALGUIAS DE NADAL
GUIAS DE NADALboello
 
クラウド時代の並列分散処理技術
クラウド時代の並列分散処理技術クラウド時代の並列分散処理技術
クラウド時代の並列分散処理技術Koichi Fujikawa
 
Tokyo Webmining #12 Hapyrus
Tokyo Webmining #12 HapyrusTokyo Webmining #12 Hapyrus
Tokyo Webmining #12 HapyrusKoichi Fujikawa
 
Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan
Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan
Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan Koichi Fujikawa
 

Destacado (9)

Hadoop Conf Japan 2009 After Party LT - Hadoop Ruby DSL
Hadoop Conf Japan 2009 After Party LT - Hadoop Ruby DSLHadoop Conf Japan 2009 After Party LT - Hadoop Ruby DSL
Hadoop Conf Japan 2009 After Party LT - Hadoop Ruby DSL
 
Cloud computing competition by Hapyrus
Cloud computing competition by HapyrusCloud computing competition by Hapyrus
Cloud computing competition by Hapyrus
 
Technology Plan For Stevenson Ms Table
Technology Plan For Stevenson Ms TableTechnology Plan For Stevenson Ms Table
Technology Plan For Stevenson Ms Table
 
Rakuten tech conf
Rakuten tech confRakuten tech conf
Rakuten tech conf
 
GUIAS DE NADAL
GUIAS DE NADALGUIAS DE NADAL
GUIAS DE NADAL
 
Trends WCM 2010
Trends WCM 2010Trends WCM 2010
Trends WCM 2010
 
クラウド時代の並列分散処理技術
クラウド時代の並列分散処理技術クラウド時代の並列分散処理技術
クラウド時代の並列分散処理技術
 
Tokyo Webmining #12 Hapyrus
Tokyo Webmining #12 HapyrusTokyo Webmining #12 Hapyrus
Tokyo Webmining #12 Hapyrus
 
Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan
Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan
Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan
 

Similar a Design of a_dsl_by_ruby_for_heavy_computations

Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHitendra Kumar
 
Architecting and productionising data science applications at scale
Architecting and productionising data science applications at scaleArchitecting and productionising data science applications at scale
Architecting and productionising data science applications at scalesamthemonad
 
Learning How to Learn Hadoop
Learning How to Learn HadoopLearning How to Learn Hadoop
Learning How to Learn HadoopSilicon Halton
 
Hadoop vs spark
Hadoop vs sparkHadoop vs spark
Hadoop vs sparkamarkayam
 
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Chris Baglieri
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1Thanh Nguyen
 
Hadoop And Big Data - My Presentation To Selective Audience
Hadoop And Big Data - My Presentation To Selective AudienceHadoop And Big Data - My Presentation To Selective Audience
Hadoop And Big Data - My Presentation To Selective AudienceChandra Sekhar
 
Hadoop Interview Question and Answers
Hadoop  Interview Question and AnswersHadoop  Interview Question and Answers
Hadoop Interview Question and Answerstechieguy85
 
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...Big Data Montreal
 
Gluecon 2014 - Bringing Node.js to the JVM
Gluecon 2014 - Bringing Node.js to the JVMGluecon 2014 - Bringing Node.js to the JVM
Gluecon 2014 - Bringing Node.js to the JVMJeremy Whitlock
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talksyhadoop
 
Learn about SPARK tool and it's componemts
Learn about SPARK tool and it's componemtsLearn about SPARK tool and it's componemts
Learn about SPARK tool and it's componemtssiddharth30121
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo pptPhil Young
 
Understanding hadoop
Understanding hadoopUnderstanding hadoop
Understanding hadoopRexRamos9
 

Similar a Design of a_dsl_by_ruby_for_heavy_computations (20)

Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
spark_v1_2
spark_v1_2spark_v1_2
spark_v1_2
 
Architecting and productionising data science applications at scale
Architecting and productionising data science applications at scaleArchitecting and productionising data science applications at scale
Architecting and productionising data science applications at scale
 
Learning How to Learn Hadoop
Learning How to Learn HadoopLearning How to Learn Hadoop
Learning How to Learn Hadoop
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar Report
 
Hadoop vs spark
Hadoop vs sparkHadoop vs spark
Hadoop vs spark
 
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1
 
Hadoop And Big Data - My Presentation To Selective Audience
Hadoop And Big Data - My Presentation To Selective AudienceHadoop And Big Data - My Presentation To Selective Audience
Hadoop And Big Data - My Presentation To Selective Audience
 
Hadoop Interview Question and Answers
Hadoop  Interview Question and AnswersHadoop  Interview Question and Answers
Hadoop Interview Question and Answers
 
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
 
Blue Ruby SDN Webinar
Blue Ruby SDN WebinarBlue Ruby SDN Webinar
Blue Ruby SDN Webinar
 
Gluecon 2014 - Bringing Node.js to the JVM
Gluecon 2014 - Bringing Node.js to the JVMGluecon 2014 - Bringing Node.js to the JVM
Gluecon 2014 - Bringing Node.js to the JVM
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talks
 
Unit 4 lecture2
Unit 4 lecture2Unit 4 lecture2
Unit 4 lecture2
 
B04 06 0918
B04 06 0918B04 06 0918
B04 06 0918
 
Handling not so big data
Handling not so big dataHandling not so big data
Handling not so big data
 
Learn about SPARK tool and it's componemts
Learn about SPARK tool and it's componemtsLearn about SPARK tool and it's componemts
Learn about SPARK tool and it's componemts
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo ppt
 
Understanding hadoop
Understanding hadoopUnderstanding hadoop
Understanding hadoop
 

Último

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGSujit Pal
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 

Último (20)

Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 

Design of a_dsl_by_ruby_for_heavy_computations

  • 1. Design of a DSL by Ruby for heavy computations over map-reduce clusters the 37th Grace seminar 16th June, 2010 Koichi Fujikawa Cirius Technologies, Inc.
  • 2. Today's Agenda Background Problem Approach My Project Conclusion
  • 3. Background Where are we in the world?
  • 4. We Live in the "Big Data" era World-wide web page data (Text-only) is expected 400TB (at one point). Some web service company (like Google, Yahoo, etc) have to process these data for their business, but.. General HDD can read data in 50MB/sec. This means we can take 2000 hours (approx. 100 days) to read the total web data(400TB) by one machine. We need the parallel processing / file system.
  • 5. MapReduce MapReduce is one of the parallel skeletons Became popular by Google's paper(2004) MapReduce has two phases Map phase: transform key and value to another (key and) value Reduce phase: aggregate and calculate values by one key Each record process by map phase first and then by reduce phase
  • 6.
  • 7. Hadoop Hadoop is open source clone of Google MapReduce hosted by Apache Foundation Big web service provider(Yahoo, Facebook, etc) contribute this project actively. Large development and user community all over the world (including Japan) Hadoop conference Japan 2009 Hadoop source code reading events
  • 8. Problem What issues do we face?
  • 9. Programming Model General programmers, engineers are not familiar with this "MapReduce" model, so it is too difficult to try and use Especially to separate Map and Reduce No Effective way of the "pattern of the MapRecuce programming" because this technology is not mature for the engineers. We have to find this individually. It is very difficult and time-consuming.
  • 10. Programming Language Hadoop is written in Java language, so the programmers need to write Map and Reduce procedure in Java. Java is strong typed and compile language. Some web service engineer don't like these language. No problem if the code is fixed and completed, but I wonder it is suitable for ad- hoc prototyping and easy querying. MapReduce jobs depend on what users want to get, so flexibility is important, I think.
  • 11. Approach How do we resolve it?
  • 12. Hide complexity of MapReduce I found the description for MapReduce could be simpler in some specific case (e.g. log analysis). In this case (but almost all of Hadoop usage is now log analysis), it would be nice if programmers can write the description without taking care of MapReduce!
  • 13. DSL approach by Ruby For this description, I created DSL for each specific usage. Log analysis DSL is a reference implementation which I prepared. As DSL runtime environment for Hadoop, I chose Ruby and JRuby, which is Ruby runtime working on JVM. Ruby is very flexible and reusable object- oriented language, so very easy to create DSL processor.
  • 14. My project What do I do?
  • 15. Hadoop Papyrus DSL framework for Hadoop by JRuby We can write log analysis code by only several line. Open source (Apache Licence) same as Hadoop Hosted by github Distributed by common Ruby archive site RubyGems.org Supported by IPA mitoh 2009
  • 16.
  • 17.
  • 18. DEMO
  • 19. Conclusion What is archiving now?
  • 20. On the way to big challenge We need parallel processing method to handle massive web-scale data. MapReduce and Hadoop is one of good tools, but.. Difficult to describe Map and Reduce Irritated to write Java for someone :-) Hadoop Papyrus is providing the key! Ruby-based DSL framework for Hadoop You can write Map and Reduce at once
  • 21. Questions? Thank you very much! Twitter ID: @fujibee