SlideShare una empresa de Scribd logo
1 de 21
Descargar para leer sin conexión
Design of a DSL by Ruby
 for heavy computations
over map-reduce clusters


     the 37th Grace seminar
         16th June, 2010

          Koichi Fujikawa
     Cirius Technologies, Inc.
Today's Agenda
 Background
 Problem
 Approach
 My Project
 Conclusion
Background
         Where are we in the world?
We Live in the "Big Data" era
 World-wide web page data (Text-only) is expected
 400TB (at one point).
   Some web service company (like Google,
   Yahoo, etc) have to process these data for
   their business, but..
 General HDD can read data in 50MB/sec. This
 means we can take 2000 hours (approx. 100
 days) to read the total web data(400TB) by one
 machine.
 We need the parallel processing / file system.
MapReduce
 MapReduce is one of the parallel skeletons
 Became popular by Google's paper(2004)
 MapReduce has two phases
    Map phase: transform key and value to
    another (key and) value
    Reduce phase: aggregate and calculate
    values by one key
 Each record process by map phase first and
 then by reduce phase
Hadoop
 Hadoop is open source clone of Google
 MapReduce hosted by Apache Foundation
 Big web service provider(Yahoo, Facebook,
 etc) contribute this project actively.
 Large development and user community all
 over the world (including Japan)
    Hadoop conference Japan 2009
    Hadoop source code reading events
Problem
          What issues do we face?
Programming Model
 General programmers, engineers are not
 familiar with this "MapReduce" model, so it is
 too difficult to try and use
    Especially to separate Map and Reduce
 No Effective way of the "pattern of the
 MapRecuce programming" because this
 technology is not mature for the engineers.
 We have to find this individually. It is very
 difficult and time-consuming.
Programming Language
 Hadoop is written in Java language, so the
 programmers need to write Map and Reduce
 procedure in Java.
   Java is strong typed and compile language.
   Some web service engineer don't like these
   language.
 No problem if the code is fixed and
 completed, but I wonder it is suitable for ad-
 hoc prototyping and easy querying.
   MapReduce jobs depend on what users want to
   get, so flexibility is important, I think.
Approach
           How do we resolve it?
Hide complexity of MapReduce
 I found the description for MapReduce could
 be simpler in some specific case (e.g. log
 analysis).
 In this case (but almost all of Hadoop usage is
 now log analysis), it would be nice if
 programmers can write the description without
 taking care of MapReduce!
DSL approach by Ruby
 For this description, I created DSL for each
 specific usage.
   Log analysis DSL is a reference
   implementation which I prepared.
 As DSL runtime environment for Hadoop, I
 chose Ruby and JRuby, which is Ruby
 runtime working on JVM.
   Ruby is very flexible and reusable object-
   oriented language, so very easy to create
   DSL processor.
My project
             What do I do?
Hadoop Papyrus
 DSL framework for Hadoop by JRuby
   We can write log analysis code by
   only several line.
 Open source (Apache Licence) same as
 Hadoop
   Hosted by github
 Distributed by common Ruby archive site
   RubyGems.org
 Supported by IPA mitoh 2009
DEMO
Conclusion
             What is archiving now?
On the way to big challenge
 We need parallel processing method to
 handle massive web-scale data.
 MapReduce and Hadoop is one of good tools,
 but..
   Difficult to describe Map and Reduce
   Irritated to write Java for someone :-)
 Hadoop Papyrus is providing the key!
   Ruby-based DSL framework for Hadoop
   You can write Map and Reduce at once
Questions?
 Thank you very much!
  Twitter ID: @fujibee

Más contenido relacionado

La actualidad más candente

Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Nathan Bijnens
 
[AI04] Scaling Machine Learning to Big Data Using SparkML and SparkR
[AI04] Scaling Machine Learning to Big Data Using SparkML and SparkR[AI04] Scaling Machine Learning to Big Data Using SparkML and SparkR
[AI04] Scaling Machine Learning to Big Data Using SparkML and SparkRde:code 2017
 
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)Gruter
 
Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)Uwe Printz
 
Onyx data processing the clojure way
Onyx   data processing  the clojure wayOnyx   data processing  the clojure way
Onyx data processing the clojure wayBahadir Cambel
 
7 key recipes for data engineering
7 key recipes for data engineering7 key recipes for data engineering
7 key recipes for data engineeringunivalence
 
Scoobi - Scala for Startups
Scoobi - Scala for StartupsScoobi - Scala for Startups
Scoobi - Scala for Startupsbmlever
 
Data Science Stack with MongoDB and RStudio
Data Science Stack with MongoDB and RStudioData Science Stack with MongoDB and RStudio
Data Science Stack with MongoDB and RStudioWinston Chen
 
Java Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBJava Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBTobias Trelle
 
power-assert, mechanism and philosophy
power-assert, mechanism and philosophypower-assert, mechanism and philosophy
power-assert, mechanism and philosophyTakuto Wada
 
Introduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data WarehouseIntroduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data WarehouseJihoon Son
 
MongoDB + Java + Spring Data
MongoDB + Java + Spring DataMongoDB + Java + Spring Data
MongoDB + Java + Spring DataAnton Sulzhenko
 
Scalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/CascadingScalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/Cascadingjohnynek
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and MonoidsHugo Gävert
 
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQRealtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQRick Copeland
 
Huangjing renren
Huangjing renrenHuangjing renren
Huangjing renrend0nn9n
 
Presto Overfview
Presto OverfviewPresto Overfview
Presto OverfviewMiguel Ping
 

La actualidad más candente (20)

Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!Hadoop Pig: MapReduce the easy way!
Hadoop Pig: MapReduce the easy way!
 
[AI04] Scaling Machine Learning to Big Data Using SparkML and SparkR
[AI04] Scaling Machine Learning to Big Data Using SparkML and SparkR[AI04] Scaling Machine Learning to Big Data Using SparkML and SparkR
[AI04] Scaling Machine Learning to Big Data Using SparkML and SparkR
 
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
Gruter_TECHDAY_2014_04_TajoCloudHandsOn (in Korean)
 
Redis
RedisRedis
Redis
 
Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)Introduction to the Hadoop Ecosystem (codemotion Edition)
Introduction to the Hadoop Ecosystem (codemotion Edition)
 
Onyx data processing the clojure way
Onyx   data processing  the clojure wayOnyx   data processing  the clojure way
Onyx data processing the clojure way
 
7 key recipes for data engineering
7 key recipes for data engineering7 key recipes for data engineering
7 key recipes for data engineering
 
Scalding for Hadoop
Scalding for HadoopScalding for Hadoop
Scalding for Hadoop
 
Scoobi - Scala for Startups
Scoobi - Scala for StartupsScoobi - Scala for Startups
Scoobi - Scala for Startups
 
Data Science Stack with MongoDB and RStudio
Data Science Stack with MongoDB and RStudioData Science Stack with MongoDB and RStudio
Data Science Stack with MongoDB and RStudio
 
Java Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDBJava Persistence Frameworks for MongoDB
Java Persistence Frameworks for MongoDB
 
power-assert, mechanism and philosophy
power-assert, mechanism and philosophypower-assert, mechanism and philosophy
power-assert, mechanism and philosophy
 
Introduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data WarehouseIntroduction to Apache Tajo: Future of Data Warehouse
Introduction to Apache Tajo: Future of Data Warehouse
 
Hadoop
HadoopHadoop
Hadoop
 
MongoDB + Java + Spring Data
MongoDB + Java + Spring DataMongoDB + Java + Spring Data
MongoDB + Java + Spring Data
 
Scalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/CascadingScalding: Twitter's Scala DSL for Hadoop/Cascading
Scalding: Twitter's Scala DSL for Hadoop/Cascading
 
Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and Monoids
 
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQRealtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
Realtime Analytics Using MongoDB, Python, Gevent, and ZeroMQ
 
Huangjing renren
Huangjing renrenHuangjing renren
Huangjing renren
 
Presto Overfview
Presto OverfviewPresto Overfview
Presto Overfview
 

Destacado

Hadoop Conf Japan 2009 After Party LT - Hadoop Ruby DSL
Hadoop Conf Japan 2009 After Party LT - Hadoop Ruby DSLHadoop Conf Japan 2009 After Party LT - Hadoop Ruby DSL
Hadoop Conf Japan 2009 After Party LT - Hadoop Ruby DSLKoichi Fujikawa
 
Cloud computing competition by Hapyrus
Cloud computing competition by HapyrusCloud computing competition by Hapyrus
Cloud computing competition by HapyrusKoichi Fujikawa
 
Technology Plan For Stevenson Ms Table
Technology Plan For Stevenson Ms TableTechnology Plan For Stevenson Ms Table
Technology Plan For Stevenson Ms TableDana Luterman
 
GUIAS DE NADAL
GUIAS DE NADALGUIAS DE NADAL
GUIAS DE NADALboello
 
クラウド時代の並列分散処理技術
クラウド時代の並列分散処理技術クラウド時代の並列分散処理技術
クラウド時代の並列分散処理技術Koichi Fujikawa
 
Tokyo Webmining #12 Hapyrus
Tokyo Webmining #12 HapyrusTokyo Webmining #12 Hapyrus
Tokyo Webmining #12 HapyrusKoichi Fujikawa
 
Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan
Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan
Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan Koichi Fujikawa
 

Destacado (9)

Hadoop Conf Japan 2009 After Party LT - Hadoop Ruby DSL
Hadoop Conf Japan 2009 After Party LT - Hadoop Ruby DSLHadoop Conf Japan 2009 After Party LT - Hadoop Ruby DSL
Hadoop Conf Japan 2009 After Party LT - Hadoop Ruby DSL
 
Cloud computing competition by Hapyrus
Cloud computing competition by HapyrusCloud computing competition by Hapyrus
Cloud computing competition by Hapyrus
 
Technology Plan For Stevenson Ms Table
Technology Plan For Stevenson Ms TableTechnology Plan For Stevenson Ms Table
Technology Plan For Stevenson Ms Table
 
Rakuten tech conf
Rakuten tech confRakuten tech conf
Rakuten tech conf
 
GUIAS DE NADAL
GUIAS DE NADALGUIAS DE NADAL
GUIAS DE NADAL
 
Trends WCM 2010
Trends WCM 2010Trends WCM 2010
Trends WCM 2010
 
クラウド時代の並列分散処理技術
クラウド時代の並列分散処理技術クラウド時代の並列分散処理技術
クラウド時代の並列分散処理技術
 
Tokyo Webmining #12 Hapyrus
Tokyo Webmining #12 HapyrusTokyo Webmining #12 Hapyrus
Tokyo Webmining #12 Hapyrus
 
Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan
Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan
Amazon Redshiftの開発者がこれだけは知っておきたい10のTIPS / 第18回 AWS User Group - Japan
 

Similar a Design of a_dsl_by_ruby_for_heavy_computations

Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHitendra Kumar
 
Architecting and productionising data science applications at scale
Architecting and productionising data science applications at scaleArchitecting and productionising data science applications at scale
Architecting and productionising data science applications at scalesamthemonad
 
Learning How to Learn Hadoop
Learning How to Learn HadoopLearning How to Learn Hadoop
Learning How to Learn HadoopSilicon Halton
 
Hadoop vs spark
Hadoop vs sparkHadoop vs spark
Hadoop vs sparkamarkayam
 
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Chris Baglieri
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1Thanh Nguyen
 
Hadoop And Big Data - My Presentation To Selective Audience
Hadoop And Big Data - My Presentation To Selective AudienceHadoop And Big Data - My Presentation To Selective Audience
Hadoop And Big Data - My Presentation To Selective AudienceChandra Sekhar
 
Hadoop Interview Question and Answers
Hadoop  Interview Question and AnswersHadoop  Interview Question and Answers
Hadoop Interview Question and Answerstechieguy85
 
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...Big Data Montreal
 
Gluecon 2014 - Bringing Node.js to the JVM
Gluecon 2014 - Bringing Node.js to the JVMGluecon 2014 - Bringing Node.js to the JVM
Gluecon 2014 - Bringing Node.js to the JVMJeremy Whitlock
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talksyhadoop
 
Learn about SPARK tool and it's componemts
Learn about SPARK tool and it's componemtsLearn about SPARK tool and it's componemts
Learn about SPARK tool and it's componemtssiddharth30121
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo pptPhil Young
 
Understanding hadoop
Understanding hadoopUnderstanding hadoop
Understanding hadoopRexRamos9
 

Similar a Design of a_dsl_by_ruby_for_heavy_computations (20)

Hadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log ProcessingHadoop a Natural Choice for Data Intensive Log Processing
Hadoop a Natural Choice for Data Intensive Log Processing
 
spark_v1_2
spark_v1_2spark_v1_2
spark_v1_2
 
Architecting and productionising data science applications at scale
Architecting and productionising data science applications at scaleArchitecting and productionising data science applications at scale
Architecting and productionising data science applications at scale
 
Learning How to Learn Hadoop
Learning How to Learn HadoopLearning How to Learn Hadoop
Learning How to Learn Hadoop
 
Hadoop Seminar Report
Hadoop Seminar ReportHadoop Seminar Report
Hadoop Seminar Report
 
Hadoop vs spark
Hadoop vs sparkHadoop vs spark
Hadoop vs spark
 
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
 
Overview of big data & hadoop v1
Overview of big data & hadoop   v1Overview of big data & hadoop   v1
Overview of big data & hadoop v1
 
Hadoop And Big Data - My Presentation To Selective Audience
Hadoop And Big Data - My Presentation To Selective AudienceHadoop And Big Data - My Presentation To Selective Audience
Hadoop And Big Data - My Presentation To Selective Audience
 
Hadoop Interview Question and Answers
Hadoop  Interview Question and AnswersHadoop  Interview Question and Answers
Hadoop Interview Question and Answers
 
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
BDM37: Hadoop in production – the war stories by Nikolaï Grigoriev, Principal...
 
Blue Ruby SDN Webinar
Blue Ruby SDN WebinarBlue Ruby SDN Webinar
Blue Ruby SDN Webinar
 
Gluecon 2014 - Bringing Node.js to the JVM
Gluecon 2014 - Bringing Node.js to the JVMGluecon 2014 - Bringing Node.js to the JVM
Gluecon 2014 - Bringing Node.js to the JVM
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talks
 
Unit 4 lecture2
Unit 4 lecture2Unit 4 lecture2
Unit 4 lecture2
 
B04 06 0918
B04 06 0918B04 06 0918
B04 06 0918
 
Handling not so big data
Handling not so big dataHandling not so big data
Handling not so big data
 
Learn about SPARK tool and it's componemts
Learn about SPARK tool and it's componemtsLearn about SPARK tool and it's componemts
Learn about SPARK tool and it's componemts
 
Hadoop demo ppt
Hadoop demo pptHadoop demo ppt
Hadoop demo ppt
 
Understanding hadoop
Understanding hadoopUnderstanding hadoop
Understanding hadoop
 

Último

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 

Último (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 

Design of a_dsl_by_ruby_for_heavy_computations

  • 1. Design of a DSL by Ruby for heavy computations over map-reduce clusters the 37th Grace seminar 16th June, 2010 Koichi Fujikawa Cirius Technologies, Inc.
  • 2. Today's Agenda Background Problem Approach My Project Conclusion
  • 3. Background Where are we in the world?
  • 4. We Live in the "Big Data" era World-wide web page data (Text-only) is expected 400TB (at one point). Some web service company (like Google, Yahoo, etc) have to process these data for their business, but.. General HDD can read data in 50MB/sec. This means we can take 2000 hours (approx. 100 days) to read the total web data(400TB) by one machine. We need the parallel processing / file system.
  • 5. MapReduce MapReduce is one of the parallel skeletons Became popular by Google's paper(2004) MapReduce has two phases Map phase: transform key and value to another (key and) value Reduce phase: aggregate and calculate values by one key Each record process by map phase first and then by reduce phase
  • 6.
  • 7. Hadoop Hadoop is open source clone of Google MapReduce hosted by Apache Foundation Big web service provider(Yahoo, Facebook, etc) contribute this project actively. Large development and user community all over the world (including Japan) Hadoop conference Japan 2009 Hadoop source code reading events
  • 8. Problem What issues do we face?
  • 9. Programming Model General programmers, engineers are not familiar with this "MapReduce" model, so it is too difficult to try and use Especially to separate Map and Reduce No Effective way of the "pattern of the MapRecuce programming" because this technology is not mature for the engineers. We have to find this individually. It is very difficult and time-consuming.
  • 10. Programming Language Hadoop is written in Java language, so the programmers need to write Map and Reduce procedure in Java. Java is strong typed and compile language. Some web service engineer don't like these language. No problem if the code is fixed and completed, but I wonder it is suitable for ad- hoc prototyping and easy querying. MapReduce jobs depend on what users want to get, so flexibility is important, I think.
  • 11. Approach How do we resolve it?
  • 12. Hide complexity of MapReduce I found the description for MapReduce could be simpler in some specific case (e.g. log analysis). In this case (but almost all of Hadoop usage is now log analysis), it would be nice if programmers can write the description without taking care of MapReduce!
  • 13. DSL approach by Ruby For this description, I created DSL for each specific usage. Log analysis DSL is a reference implementation which I prepared. As DSL runtime environment for Hadoop, I chose Ruby and JRuby, which is Ruby runtime working on JVM. Ruby is very flexible and reusable object- oriented language, so very easy to create DSL processor.
  • 14. My project What do I do?
  • 15. Hadoop Papyrus DSL framework for Hadoop by JRuby We can write log analysis code by only several line. Open source (Apache Licence) same as Hadoop Hosted by github Distributed by common Ruby archive site RubyGems.org Supported by IPA mitoh 2009
  • 16.
  • 17.
  • 18. DEMO
  • 19. Conclusion What is archiving now?
  • 20. On the way to big challenge We need parallel processing method to handle massive web-scale data. MapReduce and Hadoop is one of good tools, but.. Difficult to describe Map and Reduce Irritated to write Java for someone :-) Hadoop Papyrus is providing the key! Ruby-based DSL framework for Hadoop You can write Map and Reduce at once
  • 21. Questions? Thank you very much! Twitter ID: @fujibee