An Introduction to
MapReduce with MongoDB
        Russell Smith
/usr/bin/whoami

•   Russell Smith

•   Consultant for UKD1 Limited

•   I specialise in helping companies going through rapid growth:

•   Code, architecture, infrastructure, devops, sysops, capacity planning, etc

•   <3 Gearman, MongoDB, Neo4j, MySQL, Riak, Kohana, PHP, Debian, Puppet, etc...
What is MongoDB

•   A scalable, high-performance, open source, document-oriented
    database.

•   Stores JSON-like documents

•   Indexable on any attribute (like MySQL)

•   Built-in MapReduce
Requirements

•   A running MongoDB server
    http://www.mongodb.org/downloads


•   Basic knowledge of MongoDB

•   Basic Javascript
What is Map Reduce

•   Allows aggregating data in parallel

•   Some built-in aggregation functions exist: distinct, count

•   If you need anything more, either query the data and aggregate it yourself, or use MapReduce
How does it work?
•   You write two functions

•   You write them in Javascript (currently)
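•   Map function:
    Called once per document; emits zero or more key/value pairs with emit(key, value)

•   Reduce function:
    Called with a key and an array of the values emitted for it. MongoDB may call it more than once per key (and not at all for a key with a single value), so it must return a value of the same shape as the emitted values

•   Optional finalize function, run once per key after reduction, for finishing off the reduced data (e.g. rounding)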
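The call order above can be mimicked in a few lines of plain Javascript, which is handy for trying out map and reduce functions without a server. This is a minimal sketch that runs under Node, not MongoDB's implementation: `runMapReduce` is a hypothetical helper, it calls reduce at most once per key (MongoDB may call it several times), and the sample documents are made up.

```javascript
// Hypothetical in-memory driver: mimics how MongoDB calls map, reduce and
// finalize, not how MongoDB implements them (no re-reduce, no parallelism).
function runMapReduce(docs, map, reduce, finalize) {
    var groups = {}; // key -> array of values emitted for that key

    // Inside map, emit() is a global and `this` is the current document
    globalThis.emit = function (key, value) {
        if (!Object.prototype.hasOwnProperty.call(groups, key)) {
            groups[key] = [];
        }
        groups[key].push(value);
    };
    try {
        docs.forEach(function (doc) { map.call(doc); });
    } finally {
        delete globalThis.emit;
    }

    return Object.keys(groups).map(function (key) {
        var values = groups[key];
        // Like MongoDB, skip reduce when a key has only one value
        var value = values.length === 1 ? values[0] : reduce(key, values);
        if (finalize) {
            value = finalize(key, value);
        }
        return { _id: key, value: value };
    });
}

// Applications per state, assuming one application per document
var m = function () {
    emit(this.LCA_CASE_EMPLOYER_STATE, 1);
};
var r = function (k, v_arr) {
    var total = 0;
    for (var i = 0; i < v_arr.length; i++) {
        total += v_arr[i];
    }
    return total;
};

// Hypothetical sample documents
var docs = [
    { LCA_CASE_EMPLOYER_STATE: 'TX' },
    { LCA_CASE_EMPLOYER_STATE: 'TX' },
    { LCA_CASE_EMPLOYER_STATE: 'IA' }
];

var result = runMapReduce(docs, m, r);
console.log(result); // [ { _id: 'TX', value: 2 }, { _id: 'IA', value: 1 } ]
```

Because m and r are ordinary functions, the same ones can be passed to db.collection.mapReduce() once they behave as expected locally.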
Some example data

•   I downloaded the H1B data (US temporary work VISA applications)
    http://www.flcdatacenter.com/CaseH1B.aspx


•   Imported the CSV data using the mongoimport command

•   Total imported documents ~335k
What do the documents look like?
•   LCA_CASE_EMPLOYER_STATE

•   STATUS

•   LCA_CASE_SUBMIT / Decision_Date

•   LCA_CASE_WAGE_RATE_FROM

    {
        "_id" : ObjectId("4db7c981e243a6e23725570f"),
        "LCA_CASE_NUMBER" : "I-200-09132-243675",
        "STATUS" : "CERTIFIED",
        "LCA_CASE_SUBMIT" : "7/14/2010 9:06:36",
        "VISA_CLASS" : "H-1B",
        "LCA_CASE_EMPLOYMENT_START_DATE" : "12/15/2010 0:00:00",
        "LCA_CASE_EMPLOYMENT_END_DATE" : "12/15/2013 0:00:00",
        "LCA_CASE_EMPLOYER_NAME" : "BRITISH SCHOOL OF AMERICA, LLC",
        "LCA_CASE_EMPLOYER_ADDRESS" : "4211 WATONGA BLVD.",
        "LCA_CASE_EMPLOYER_CITY" : "HOUSTON",
        "LCA_CASE_EMPLOYER_STATE" : "TX",
        "LCA_CASE_EMPLOYER_POSTAL_CODE" : 77092,
        "LCA_CASE_SOC_CODE" : "25-2022.00",
        "LCA_CASE_SOC_NAME" : "Middle School Teachers, Except Special and Vocatio",
        "LCA_CASE_JOB_TITLE" : "MIDDLE SCHOOL TEACHER/IB COORDINATOR",
        "LCA_CASE_WAGE_RATE_FROM" : 51577.63,
        "LCA_CASE_WAGE_RATE_UNIT" : "Year",
        "FULL_TIME_POS" : "Y",
        "TOTAL_WORKERS" : 1,
        "LCA_CASE_WORKLOC1_CITY" : "HOUSTON",
        "LCA_CASE_WORKLOC1_STATE" : "TX",
        "PW_1" : 47827,
        "PW_UNIT_1" : "Year",
        "PW_SOURCE_1" : "OES",
        "OTHER_WAGE_SOURCE_1" : "OFLC ONLINE DATA CENTER",
        "YR_SOURCE_PUB_1" : 2010,
        "LCA_CASE_NAICS_CODE" : 611110,
        "Decision_Date" : "7/20/2010 0:00:00"
    }
What we can do with the data?

•   Work out the:

•   Applications per state

•   Applications by status per state

•   Average time from submission to decision, by status
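The third question needs a duration before anything can be averaged, and the dates are stored as strings. A minimal sketch, assuming every date uses the `M/D/YYYY H:MM:SS` format seen in the sample document; `parseLcaDate` and `daysToDecision` are hypothetical helpers. To use them in MongoDB the code would have to be inlined into the map function, which could then emit the duration keyed by STATUS.

```javascript
// Parse a date in the dataset's "M/D/YYYY H:MM:SS" format into epoch milliseconds.
// Treated as UTC so daylight-saving changes don't skew differences.
// Returns null when the string doesn't match the expected format.
function parseLcaDate(s) {
    var parts = /^(\d{1,2})\/(\d{1,2})\/(\d{4}) (\d{1,2}):(\d{2}):(\d{2})/.exec(s);
    if (!parts) return null;
    return Date.UTC(+parts[3], +parts[1] - 1, +parts[2],
                    +parts[4], +parts[5], +parts[6]);
}

// Days (possibly fractional) from submission to decision,
// or null if either date can't be parsed
function daysToDecision(doc) {
    var submitted = parseLcaDate(doc.LCA_CASE_SUBMIT);
    var decided = parseLcaDate(doc.Decision_Date);
    if (submitted === null || decided === null) return null;
    return (decided - submitted) / (24 * 60 * 60 * 1000);
}

// Dates taken from the sample document
var days = daysToDecision({
    LCA_CASE_SUBMIT: '7/14/2010 9:06:36',
    Decision_Date: '7/20/2010 0:00:00'
});
console.log(days.toFixed(2)); // 5.62
```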
Applications by State


•   Key will be LCA_CASE_EMPLOYER_STATE

•   Assume (wrongly) one person per document
Map


•   this refers to the current document

•   Emit a value of 1, as we are assuming a single H1B application per document

    m = function () {
        emit(this.LCA_CASE_EMPLOYER_STATE, 1);
    }
Reduce


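•   Return the sum of the values; as each emitted value is 1, that is the number of applications

•   Don't just return v_arr.length: MongoDB may pass reduce's own earlier results back into it (re-reduce), and counting those would undercount

    r = function (k, v_arr) {
        var total = 0;
        for (var i = 0; i < v_arr.length; i++) {
            total += v_arr[i];
        }
        return total;
    }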
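Why counting array elements is fragile: the sketch below runs a length-based reduce and a summing reduce through a simulated re-reduce, where some of a key's values are reduced first and that partial result is passed back into reduce with the rest. The split point is arbitrary (MongoDB decides when to re-reduce), and `countReduce`, `sumReduce` and `reReduce` are illustrative names.

```javascript
// Reduce that returns the number of values it was given
var countReduce = function (k, v_arr) {
    return v_arr.length;
};

// Reduce that adds the values up
var sumReduce = function (k, v_arr) {
    var total = 0;
    for (var i = 0; i < v_arr.length; i++) {
        total += v_arr[i];
    }
    return total;
};

// Simulated re-reduce: reduce the first three values, then reduce that
// partial result together with the remaining values
function reReduce(reduce, key, values) {
    var partial = reduce(key, values.slice(0, 3));
    return reduce(key, [partial].concat(values.slice(3)));
}

// Five applications for one state, each emitted as 1
var emitted = [1, 1, 1, 1, 1];

var byCount = reReduce(countReduce, 'TX', emitted); // 3: the partial 3 counts as one element
var bySum = reReduce(sumReduce, 'TX', emitted);     // 5: correct
console.log(byCount, bySum); // 3 5
```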
Executing


•   This will execute the map/reduce

•   Output goes to a collection named workers_by_state

    db.text2010.mapReduce(m, r,
        {out: 'workers_by_state', keeptemp: true, verbose: true})
Result

    { "_id" : "NEW YORK", "value" : 512 }
    { "_id" : "IOWA", "value" : 15 }
    { "_id" : "KANSAS", "value" : 54 }
    ...
A more complex Map!

•   The last example assumed one worker per document... which is wrong.

•   We now emit a numeric value per state: the document's TOTAL_WORKERS

    m = function () {
        emit(this.LCA_CASE_EMPLOYER_STATE, this.TOTAL_WORKERS);
    }
Reduce
•   As the array can now contain values other than 1, we have to iterate over it and add them up

•   This is standard Javascript

    r = function (k, v_arr) {
        var total = 0;
        var len = v_arr.length;
        for (var i = 0; i < len; i++)
        {
            total = total + v_arr[i];
        }
        return total;
    }
VISA Class by Application Status by Average Wage

•   Assumptions:

•   People work ~40 hour weeks

•   Weekly wages are paid every week, rather than only the weeks worked

•   'Select Pay Range' seems to be the default option... Any unit not listed below hits the default case and emits 0

    m = function () {
        var k = this.VISA_CLASS + ' ' + this.STATUS;

        switch (this.LCA_CASE_WAGE_RATE_UNIT)
        {
            case 'Year':
                emit(k, this.LCA_CASE_WAGE_RATE_FROM);
                break;

            case 'Month':
                emit(k, this.LCA_CASE_WAGE_RATE_FROM * 12);
                break;

            case 'Bi-Weekly':
                emit(k, this.LCA_CASE_WAGE_RATE_FROM * 26);
                break;

            case 'Week':
                emit(k, this.LCA_CASE_WAGE_RATE_FROM * 52);
                break;

            case 'Hour':
                emit(k, this.LCA_CASE_WAGE_RATE_FROM * 40 * 52);
                break;

            default:
                emit(k, 0);
        }
    }
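The switch is really a table of multipliers, so the annualisation arithmetic can be checked outside MongoDB; an hourly rate, for example, becomes rate * 40 * 52 = rate * 2080 a year. This is a hypothetical refactoring (`annualWage` is an illustrative name), not the author's code, and it would have to be inlined into the map function to run server-side.

```javascript
// Annualise a wage using the same multipliers as the map function's switch.
// Units not in the table (e.g. 'Select Pay Range') return 0, like the default case.
function annualWage(rate, unit) {
    var multipliers = {
        'Year': 1,
        'Month': 12,
        'Bi-Weekly': 26,
        'Week': 52,
        'Hour': 40 * 52 // ~40 hour weeks, paid every week of the year
    };
    return Object.prototype.hasOwnProperty.call(multipliers, unit)
        ? rate * multipliers[unit]
        : 0;
}

console.log(annualWage(25, 'Hour'));   // 52000
console.log(annualWage(1000, 'Month')); // 12000
```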
Reduce
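•   Work out the average for each key

•   Add each of the elements up

•   Average them

•   Caveat: this is only correct if reduce runs once per key. MongoDB may re-reduce, and an average of averages is not the overall average; summing and counting in reduce, then dividing in a finalize function, avoids this

    r = function (k, v_arr) {
        var tot = 0;
        var len = v_arr.length;
        for (var i = 0; i < len; i++)
        {
            tot += v_arr[i];
        }
        return tot / len;
    }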
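Why averaging inside reduce is fragile: this sketch feeds the averaging reduce three hypothetical wages for one key, once in a single call and once as a re-reduce (two values first, then that average together with the third). MongoDB decides whether and how to split a key's values, so the split here is only illustrative.

```javascript
// Reduce that averages its input values
var avgReduce = function (k, v_arr) {
    var tot = 0;
    var len = v_arr.length;
    for (var i = 0; i < len; i++) {
        tot += v_arr[i];
    }
    return tot / len;
};

var key = 'H-1B CERTIFIED';
var wages = [40000, 50000, 90000]; // hypothetical annual wages for one key

// Single call: (40000 + 50000 + 90000) / 3 = 60000
var once = avgReduce(key, wages);

// Re-reduce: avg(40000, 50000) = 45000, then avg(45000, 90000) = 67500
var twice = avgReduce(key, [avgReduce(key, wages.slice(0, 2)), wages[2]]);

console.log(once, twice); // 60000 67500
```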
Finalize

•   A finalize function may be run after reduction.

•   Called once per key, after reduction has finished

•   The finalize function takes a key and a value, and returns a finalized
    value.
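Finalize is a natural home for an average. A minimal sketch, assuming the wage map is changed to emit `{ total: wage, count: 1 }` instead of a bare number (for example `emit(k, { total: this.LCA_CASE_WAGE_RATE_FROM, count: 1 })` in the 'Year' case); the field names `total` and `count` are illustrative. Reduce only adds, which is safe to repeat, and finalize divides once per key. In the shell, `f` would be passed in the options as `finalize: f`; here the functions are called directly with hypothetical values.

```javascript
// Reduce: values are { total, count } objects and the result has the same
// shape, so MongoDB can safely feed it back into reduce (re-reduce)
var r = function (k, v_arr) {
    var result = { total: 0, count: 0 };
    for (var i = 0; i < v_arr.length; i++) {
        result.total += v_arr[i].total;
        result.count += v_arr[i].count;
    }
    return result;
};

// Finalize: runs once per key after reduction, turning the sums into an average
var f = function (k, v) {
    return v.count > 0 ? v.total / v.count : 0;
};

var key = 'H-1B CERTIFIED';
// What the modified map would emit for three hypothetical wages
var emitted = [
    { total: 40000, count: 1 },
    { total: 50000, count: 1 },
    { total: 90000, count: 1 }
];

var once = f(key, r(key, emitted));
var partial = r(key, emitted.slice(0, 2));
var twice = f(key, r(key, [partial, emitted[2]]));
console.log(once, twice); // 60000 60000
```

Because reduce returns the same shape it receives, a key with a single emitted value (for which reduce is skipped) still reaches finalize as `{ total, count }`.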
Options

•   Persist the output (out)

•   Filtering input documents (query)

•   Sorting input documents (sort)

•   Javascript scope (scope) - allows you to pass in extra variables (cannot be
    changed at runtime?)
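A sketch of an options document using those settings. The option names (`out`, `query`, `sort`, `scope`, `verbose`) are `mapReduce()` options; the filter value and the `defaultWorkers` scope variable are illustrative assumptions, and the object form of `out` needs a server recent enough to support output modes. Running it requires a MongoDB server, so the call itself is shown only as a comment.

```javascript
// Options for db.collection.mapReduce(map, reduce, options).
// The filter value and the defaultWorkers variable are illustrative.
var options = {
    // Persist the output, replacing the collection's contents on each run
    out: { replace: 'workers_by_state' },

    // Filter the input documents with an ordinary query document
    query: { STATUS: 'CERTIFIED' },

    // Sort the input documents; the sort key must be indexed, and sorting
    // on the emit key can reduce the number of reduce calls
    sort: { LCA_CASE_EMPLOYER_STATE: 1 },

    // Variables visible as globals inside map, reduce and finalize, e.g.
    //   function () { emit(this.LCA_CASE_EMPLOYER_STATE, this.TOTAL_WORKERS || defaultWorkers); }
    scope: { defaultWorkers: 1 },

    verbose: true
};

// In the mongo shell, with m and r defined as on the earlier slides:
//   db.text2010.mapReduce(m, r, options);
console.log(Object.keys(options).join(', ')); // out, query, sort, scope, verbose
```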
Current limitations / Watch for

•   Single threaded per node (which sucks)
    https://jira.mongodb.org/browse/SERVER-463


•   Language is restricted to Javascript (which sucks)
    https://jira.mongodb.org/browse/SERVER-699


•   Does not use secondaries in replica sets

•   From 1.7.3 on, you can reduce into an existing collection
...


•   Doesn't allow creation of full documents; output is always { _id, value } (which can be a pain for
    permanent MapReduce output collections if using libraries)
    https://jira.mongodb.org/browse/SERVER-2517


•   Slow; ~20-30x slower than Hadoop with 1.8
    https://jira.mongodb.org/browse/SERVER-3055
Using MongoDB with Hadoop

•   https://github.com/mongodb/mongo-hadoop

•   Open source

•   Requires knowledge of Java

•   Working Input and Output adapters for MongoDB are provided

•   Alpha quality from what I can tell
The future
1.9 / 2.0

•   V8 is replacing SpiderMonkey

•   Recent Hadoop provider

•   Sharded output collections

•   Improved yielding (concurrency)
> 2.0

•   Multi-threaded

•   Alternative languages
    https://jira.mongodb.org/browse/SERVER-699


•   ~2.2 native aggregation framework

•   JS-only mode is faster for lighter jobs
    https://jira.mongodb.org/browse/SERVER-2976
Further reading
•   I’ve only touched on the details, but this should be enough to get you
    interested / started with MongoDB MapReduce. Some of what I’ve left out:

•   Finalize functions - http://bit.ly/gEfKOr

•   Some more examples - http://bit.ly/ig1Yfj

Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 

Último (20)

Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 

An Introduction to Map/Reduce with MongoDB

  • 1. An Introduction to MapReduce with MongoDB Russell Smith
  • 2. /usr/bin/whoami • Russell Smith • Consultant for UKD1 Limited • I specialise in helping companies going through rapid growth: • Code, architecture, infrastructure, devops, sysops, capacity planning, etc. • <3 Gearman, MongoDB, Neo4j, MySQL, Riak, Kohana, PHP, Debian, Puppet, etc...
  • 3. What is MongoDB? • A scalable, high-performance, open source, document-oriented database • Stores JSON-like documents • Indexable on any attribute (like MySQL) • Built-in MapReduce
  • 4. Requirements • A running MongoDB server http://www.mongodb.org/downloads • Basic knowledge of MongoDB • Basic JavaScript
  • 5. What is MapReduce? • Allows aggregating data in parallel • Some built-in aggregation functions exist: distinct, count • If you need to do anything more, either query and process the results yourself, or use MapReduce
  • 6. How does it work? • You write two functions • You write them in JavaScript (currently) • Map function: called once per document; it calls emit(key, value) to output a key and a value • Reduce function: called with an emitted key and the array of values emitted for it (MongoDB may call it more than once for the same key, and skips it for keys with only one value) • Optional finalize function for post-processing each reduced result
  • 7. Some example data • I downloaded the H-1B data (US temporary work visas) http://www.flcdatacenter.com/CaseH1B.aspx • Imported the CSV data using the mongoimport command • Total imported documents: ~335k
  • 8. What do the documents look like? • Fields of interest: LCA_CASE_EMPLOYER_STATE, STATUS, LCA_CASE_SUBMIT / Decision_Date, LCA_CASE_WAGE_RATE_FROM
        {
            "_id" : ObjectId("4db7c981e243a6e23725570f"),
            "LCA_CASE_NUMBER" : "I-200-09132-243675",
            "STATUS" : "CERTIFIED",
            "LCA_CASE_SUBMIT" : "7/14/2010 9:06:36",
            "VISA_CLASS" : "H-1B",
            "LCA_CASE_EMPLOYMENT_START_DATE" : "12/15/2010 0:00:00",
            "LCA_CASE_EMPLOYMENT_END_DATE" : "12/15/2013 0:00:00",
            "LCA_CASE_EMPLOYER_NAME" : "BRITISH SCHOOL OF AMERICA, LLC",
            "LCA_CASE_EMPLOYER_ADDRESS" : "4211 WATONGA BLVD.",
            "LCA_CASE_EMPLOYER_CITY" : "HOUSTON",
            "LCA_CASE_EMPLOYER_STATE" : "TX",
            "LCA_CASE_EMPLOYER_POSTAL_CODE" : 77092,
            "LCA_CASE_SOC_CODE" : "25-2022.00",
            "LCA_CASE_SOC_NAME" : "Middle School Teachers, Except Special and Vocatio",
            "LCA_CASE_JOB_TITLE" : "MIDDLE SCHOOL TEACHER/IB COORDINATOR",
            "LCA_CASE_WAGE_RATE_FROM" : 51577.63,
            "LCA_CASE_WAGE_RATE_UNIT" : "Year",
            "FULL_TIME_POS" : "Y",
            "TOTAL_WORKERS" : 1,
            "LCA_CASE_WORKLOC1_CITY" : "HOUSTON",
            "LCA_CASE_WORKLOC1_STATE" : "TX",
            "PW_1" : 47827,
            "PW_UNIT_1" : "Year",
            "PW_SOURCE_1" : "OES",
            "OTHER_WAGE_SOURCE_1" : "OFLC ONLINE DATA CENTER",
            "YR_SOURCE_PUB_1" : 2010,
            "LCA_CASE_NAICS_CODE" : 611110,
            "Decision_Date" : "7/20/2010 0:00:00r"
        }
  • 9. What can we do with the data? • Work out: • Applications per state • Applications by status per state • Average time from submission to decision, by status
  • 10. Applications by State • Key will be LCA_CASE_EMPLOYER_STATE • Assume (wrongly) one person per document
  • 11. Map • this refers to the current document • Emit a value of 1, as we are assuming a single H-1B application per document
        m = function () {
            emit(this.LCA_CASE_EMPLOYER_STATE, 1);
        }
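  • 12. Reduce • Return a value: the length of the array • This works as long as each value in the array is 1 (caveat: MongoDB may re-reduce a key, passing in earlier counts instead of 1s, so summing the values as on slide 16 is safer)
        r = function (k, v_arr) {
            return v_arr.length;
        }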
  • 13. Executing • This will execute the map/reduce • Output goes to a collection named workers_by_state
        db.text2010.mapReduce(m, r, {out: 'workers_by_state', keeptemp: true, verbose: true})
  • 15. A more complex Map! • The last example assumed one worker per document... which is wrong. • We now emit the number of workers (TOTAL_WORKERS) per state
        m = function () {
            emit(this.LCA_CASE_EMPLOYER_STATE, this.TOTAL_WORKERS);
        }
  • 16. Reduce • As the array now contains values other than 1, we have to iterate over it and add them up • This is standard JavaScript
        r = function (k, v_arr) {
            var total = 0;
            var len = v_arr.length;
            for (var i = 0; i < len; i++) {
                total = total + v_arr[i];
            }
            return total;
        }
  • 17. Average Wage by VISA Class and Application Status • Assumptions: • People work ~40-hour weeks • Weekly wages are paid every week, rather than only the weeks worked • 'Select Pay Range' seems to be the default option...
        m = function () {
            var k = this.VISA_CLASS + ' ' + this.STATUS;
            switch (this.LCA_CASE_WAGE_RATE_UNIT) {
                case 'Year':
                    emit(k, this.LCA_CASE_WAGE_RATE_FROM);
                    break;
                case 'Month':
                    emit(k, this.LCA_CASE_WAGE_RATE_FROM * 12);
                    break;
                case 'Bi-Weekly':
                    emit(k, this.LCA_CASE_WAGE_RATE_FROM * 26);
                    break;
                case 'Week':
                    emit(k, this.LCA_CASE_WAGE_RATE_FROM * 52);
                    break;
                case 'Hour':
                    emit(k, this.LCA_CASE_WAGE_RATE_FROM * 40 * 52);
                    break;
                default:
                    emit(k, 0);
            }
        }
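  • 18. Reduce • Work out the average for each key • Add each of the elements up • Average them • Caveat: MongoDB may call reduce more than once for the same key, feeding earlier results back in, so averaging inside reduce can give wrong answers; summing and counting in reduce, then dividing in a finalize function, avoids this
        r = function (k, v_arr) {
            var tot = 0;
            var len = v_arr.length;
            for (var i = 0; i < len; i++) {
                tot += v_arr[i];
            }
            return tot / len;
        }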
  • 19. Finalize • A finalize function may be run after reduction • Called once per key, after all reduction for that key • The finalize function takes a key and a value, and returns a finalized value
  • 20. Options • Persist the output • Filtering input documents • Sorting input documents • JavaScript scope: allows you to pass in extra variables (cannot be changed at runtime?)
  • 21. Current limitations / Watch for • Single threaded per node (which sucks) https://jira.mongodb.org/browse/SERVER-463 • Language is restricted to JavaScript (which sucks) https://jira.mongodb.org/browse/SERVER-699 • Does not use secondaries in replica sets • From 1.7.3 on, you can reduce into an existing collection
  • 22. ... • Doesn't allow creation of full documents: output is always { _id, value } (which can be a pain for permanent MR collections if using libraries) https://jira.mongodb.org/browse/SERVER-2517 • Slow: ~20-30x slower than Hadoop with 1.8 https://jira.mongodb.org/browse/SERVER-3055
  • 23. Using MongoDB with Hadoop • https://github.com/mongodb/mongo-hadoop • Open source • Requires knowledge of Java • Working Input and Output adapters for MongoDB are provided • Alpha quality from what I can tell
  • 25. 1.9 / 2.0 • V8 is replacing SpiderMonkey • Recent Hadoop provider • Sharded output collections • Improved yielding (concurrency)
  • 26. > 2.0 • Multi-threaded • Alternative languages https://jira.mongodb.org/browse/SERVER-699 • ~2.2 native aggregation framework • JS-only mode is faster for lighter jobs https://jira.mongodb.org/browse/SERVER-2976
  • 27. Further reading • I’ve only touched on the details, but this should be enough to get you interested in, or started with, MongoDB MapReduce. Some of the things not covered: • Finalize functions - http://bit.ly/gEfKOr • Some more examples - http://bit.ly/ig1Yfj
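To make the flow on slide 6 concrete without a running server, here is a minimal sketch that mimics the map, group-by-key, reduce and finalize steps in plain Node.js. The helper name runMapReduce is hypothetical and this is not how MongoDB implements it; it only follows the contract described in the talk (map sees the document as this and calls a global emit(), reduce gets a key plus an array of values).

```javascript
// Hypothetical helper: mimics the map -> group by key -> reduce -> finalize contract in
// plain Node.js. It is not MongoDB's implementation, only an illustration of it.
function runMapReduce(docs, map, reduce, finalize) {
  const groups = new Map(); // emitted key -> array of emitted values

  // Map functions call a global emit(), as in MongoDB; install one for the duration of the run.
  const previousEmit = globalThis.emit;
  globalThis.emit = function (key, value) {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  try {
    // Inside map, `this` is the current document.
    for (const doc of docs) map.call(doc);
  } finally {
    globalThis.emit = previousEmit;
  }

  const results = [];
  for (const [key, values] of groups) {
    // MongoDB does not call reduce for a key with a single value; mirror that here.
    let value = values.length === 1 ? values[0] : reduce(key, values);
    if (finalize) value = finalize(key, value);
    results.push({ _id: key, value: value });
  }
  return results;
}

// Usage with slide 11's map and a summing reduce:
const sample = [
  { LCA_CASE_EMPLOYER_STATE: "TX" },
  { LCA_CASE_EMPLOYER_STATE: "TX" },
  { LCA_CASE_EMPLOYER_STATE: "CA" },
];
const byState = runMapReduce(
  sample,
  function () { emit(this.LCA_CASE_EMPLOYER_STATE, 1); },
  function (k, v_arr) { return v_arr.reduce(function (a, b) { return a + b; }, 0); }
);
// byState holds { _id: "TX", value: 2 } and { _id: "CA", value: 1 }
```

The results use the same { _id, value } shape that MongoDB writes to its output collection, which makes it easy to compare a local experiment with what lands in workers_by_state.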
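Slide 9 lists "average time from submission to decision, by status" as a goal, but the dates in slide 8's sample document are strings such as "7/14/2010 9:06:36". A sketch of the parsing such a map function would need is below. The function names are hypothetical, the dates are assumed to be UTC (the data does not say), and trailing junk such as the "r" at the end of the sample Decision_Date is ignored.

```javascript
// Hypothetical helper: parse "M/D/YYYY H:MM:SS" strings as seen in LCA_CASE_SUBMIT and
// Decision_Date. Treats them as UTC (an assumption) and ignores any trailing characters.
function parseLcaDate(s) {
  const m = /^(\d{1,2})\/(\d{1,2})\/(\d{4}) (\d{1,2}):(\d{2}):(\d{2})/.exec(s);
  if (!m) return null;
  const [, month, day, year, h, min, sec] = m.map(Number);
  return new Date(Date.UTC(year, month - 1, day, h, min, sec));
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Days from submission to decision for one document; null if either date is unparseable.
function daysToDecision(doc) {
  const submitted = parseLcaDate(doc.LCA_CASE_SUBMIT);
  const decided = parseLcaDate(doc.Decision_Date);
  if (!submitted || !decided) return null;
  return (decided - submitted) / MS_PER_DAY;
}
```

Inside MongoDB, the helpers would have to be defined within the map function itself, which could then emit(this.STATUS, days) and be paired with an averaging reduce.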
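Slide 12's reduce counts values with v_arr.length. MongoDB's documentation notes that reduce can be invoked more than once for the same key, with a previous reduce result passed back in as one of the values. The plain-JS sketch below (no server needed; the split into two batches is a made-up example) shows why counting breaks in that case while summing, as on slide 16, does not.

```javascript
// Slide 12's reduce: count the values.
function reduceByLength(k, v_arr) {
  return v_arr.length;
}

// Slide 16's style of reduce: add the values up.
function reduceBySum(k, v_arr) {
  var total = 0;
  for (var i = 0; i < v_arr.length; i++) {
    total += v_arr[i];
  }
  return total;
}

// Simulated re-reduce: five "TX" documents each emitted 1, but the values were
// reduced in two batches first (3 values and 2 values), then reduced again.
const batchA = [1, 1, 1];
const batchB = [1, 1];

const byLength = reduceByLength("TX", [reduceByLength("TX", batchA), reduceByLength("TX", batchB)]);
const bySum = reduceBySum("TX", [reduceBySum("TX", batchA), reduceBySum("TX", batchB)]);
// byLength is 2 (it counts the batches); bySum is 5 (the real number of documents).
```

The general rule this illustrates: reduce should return a value of the same kind as the emitted values, so that feeding its output back in gives the same answer.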
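The switch statement on slide 17 can also be written as a lookup table, which keeps the assumptions (40-hour weeks, 52 paid weeks) in one place and is easy to test outside MongoDB. This is a sketch; the names ANNUAL_MULTIPLIER and annualWage are hypothetical, and the multipliers are copied from the slide.

```javascript
// Multipliers from slide 17, converting a wage into a yearly figure.
// Assumptions carried over from the slide: ~40-hour weeks, paid for all 52 weeks.
const ANNUAL_MULTIPLIER = new Map([
  ["Year", 1],
  ["Month", 12],
  ["Bi-Weekly", 26],
  ["Week", 52],
  ["Hour", 40 * 52],
]);

// Returns the annualised wage, or 0 for units not in the table (for example
// 'Select Pay Range'), matching the slide's default branch.
function annualWage(amount, unit) {
  const multiplier = ANNUAL_MULTIPLIER.get(unit);
  return multiplier === undefined ? 0 : amount * multiplier;
}
```

In a MongoDB map function the table and helper would need to be defined inside map, which could then call emit(k, annualWage(this.LCA_CASE_WAGE_RATE_FROM, this.LCA_CASE_WAGE_RATE_UNIT)). Note that emitting 0 for unknown units pulls the averages down; skipping the emit for those documents is an alternative.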
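Slides 18 and 19 fit together: a safer way to compute the average wage is to have map emit { sum, count } pairs, have reduce add those up, and leave the division to finalize, which runs once per key. The sketch below shows the reduce and finalize functions as plain JavaScript (names and the output collection are hypothetical), plus a two-stage reduction to check the result.

```javascript
// Map side (a comment, because emit() only exists inside MongoDB):
//   emit(k, { sum: annualisedWage, count: 1 });

// Reduce: combine partial { sum, count } objects. The output has the same shape as the
// inputs, so it stays correct if MongoDB feeds it back into another reduce call.
function reduceAvg(k, values) {
  var result = { sum: 0, count: 0 };
  for (var i = 0; i < values.length; i++) {
    result.sum += values[i].sum;
    result.count += values[i].count;
  }
  return result;
}

// Finalize: turn the totals for one key into an average.
function finalizeAvg(k, reduced) {
  return reduced.count === 0 ? 0 : reduced.sum / reduced.count;
}

// Two-stage reduction of the wages 10, 20 and 60 for one key:
const partial = reduceAvg("H-1B CERTIFIED", [{ sum: 10, count: 1 }, { sum: 20, count: 1 }]);
const twoStage = finalizeAvg("H-1B CERTIFIED", reduceAvg("H-1B CERTIFIED", [partial, { sum: 60, count: 1 }]));
// twoStage is 30, the true mean; averaging the averages would give (15 + 60) / 2 = 37.5.

// In the mongo shell (collection name hypothetical):
//   db.text2010.mapReduce(m, reduceAvg, { out: "wage_by_visa_status", finalize: finalizeAvg })
```

Because keys with a single emitted value skip reduce entirely, finalize then receives the raw { sum, count: 1 } pair, which it also handles.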
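The options on slide 20 map onto fields of the mapReduce options document: out persists the result, query filters the input, sort orders it, and scope passes extra global variables to map, reduce and finalize. A sketch of such a document is below; the collection name, filter and the scope variable hoursPerWeek are hypothetical examples.

```javascript
// Options document for the mongo shell's mapReduce helper (values are illustrative).
const options = {
  out: "wage_by_visa_status",            // persist the output to a named collection
  query: { STATUS: "CERTIFIED" },        // only feed matching documents to map
  sort: { LCA_CASE_EMPLOYER_STATE: 1 },  // sort input documents; the sort key must be indexed
  scope: { hoursPerWeek: 40 },           // exposed as global variables inside map/reduce/finalize
};

// With that scope, a map function could use hoursPerWeek directly, e.g. for hourly wages:
//   emit(k, this.LCA_CASE_WAGE_RATE_FROM * hoursPerWeek * 52);
// In the mongo shell:
//   db.text2010.mapReduce(m, r, options)
```

Sorting on the same field as the emit key is mainly an optimisation: documents with the same key arrive together, so fewer reduce calls are needed.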
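Slide 22 mentions that map/reduce cannot create full documents: every output document has the shape { _id: key, value: reducedValue }. One workaround is to reshape the results in application code before handing them to libraries that expect ordinary documents. The helper below is a hypothetical sketch; the choice of field name for the key is up to the caller.

```javascript
// Hypothetical helper: turn { _id, value } map/reduce output into flat records.
// Object values are merged into the record; scalar values are kept under "value".
function flattenMapReduceOutput(docs, keyField) {
  return docs.map(function (doc) {
    const flat = { [keyField]: doc._id };
    if (doc.value !== null && typeof doc.value === "object") {
      Object.assign(flat, doc.value);
    } else {
      flat.value = doc.value;
    }
    return flat;
  });
}

// Usage: reduced { sum, count } pairs and plain counts, keyed by state.
const flat = flattenMapReduceOutput(
  [{ _id: "TX", value: { sum: 90, count: 3 } }, { _id: "CA", value: 5 }],
  "state"
);
// flat[0] is { state: "TX", sum: 90, count: 3 }; flat[1] is { state: "CA", value: 5 }
```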
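Slide 26 mentions the native aggregation framework expected around 2.2. For comparison, the workers-per-state job from slides 15 and 16 can be expressed as an aggregation pipeline, which is just an array of stage objects. This is a sketch: $group and $sum are aggregation operators, while $out (writing the result to a collection) needs a later server than the one in this talk (2.6 or newer).

```javascript
// Workers per state (slides 15-16) as an aggregation pipeline instead of map/reduce.
const workersByStatePipeline = [
  {
    $group: {
      _id: "$LCA_CASE_EMPLOYER_STATE",     // group key, like the map's emit key
      workers: { $sum: "$TOTAL_WORKERS" }, // like slide 16's summing reduce
    },
  },
  { $out: "workers_by_state" },            // write results to a collection (MongoDB 2.6+)
];

// In the mongo shell:
//   db.text2010.aggregate(workersByStatePipeline)
```

Unlike map/reduce output, the resulting documents have named fields ({ _id, workers }) rather than a single value field.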
