SlideShare una empresa de Scribd logo
1 de 29
Shafaq Abdullah
Principal, Zenprise
   MongoDB features and Architecture
   Business Intelligence: Overview
   Model of Data using SQL vs NoSQL
   Concept of MapReduce
   Real-world use-case of Business Analytics
   Scalability
   Conclusions
   Key-value
   Column
   Document-based
   Graph
   Schema less data model
   Adhoc Query
   Scalability
   High Availability
   Speed
consistency    Availability




         Partition
        Tolerance
http://www.mongodb.org
Business + smart information =
                Business Intelligence




  Consists of
   querying,
reporting, and
 analytics for
  businesses
                                    Enable
                                  business to
                                  make smart
                                  decision to
                                   execute
   Modelization of Data in SQL
       A 1-many relation of node (id, value) with
    other nodes related by two different relations


      Node                     Relation

       id                      id_node1
       value
                               id_node2
NoSQL Modelization mapped on Relational
Database Modelization



  Node                   Relation
                         c          c
   id                   _id
   value
                        value 1
                        value2
   Using Complex Type Attributes to Model data



                      Nodes

                 _id
                 valued
                 relations []
                                value 1
                                value 2
                                 …
                                value n
   No join operation required

   Nesting of documents, resulting in
    instantaneous access to retrieve nodes/

   Supporting agile method of programming

   Schema flexible adaptive to changing
    business needs
   MapReduce
    ◦ Programming model for managing large amount of
      data in a parallel fashion

    ◦ Map : Processing of a data list to create key/value
      pairs

    ◦ Reduce: Porcess above pair to create new
      aggregated key/value pairs
map(k1, v1) = list(k2,v2)

 reduce(k1, list(v2)) = list(v3)



List : (a; 2) (a; 4)(b; 4)(b; 2)(a;1)(c;5)

Map: (a;[2, 4, 1]), (b;[4,2]), (c,[5])

Reduce: (a;7), (b;6),(c;5)
   Problem


The task is to find the 2 closest
cities in each country

Given that earth as 2D plane

The distance b/w P1 (x1,y1) and
P2 (x2,y2) is computed as
Square-Root of { (x1-x2)2 + (y1-
y2)2 }
Computations   Groups
create view city_dist as
select c1.CountryID,
       c1.CityId, c1.City,
        c2.CityId as CityId2, c2.City as City2,    dist(c1,c2) as
Dist from cities c1 inner join cities c2
where c1.CountryID = c2.CountryID
and c1.CityId < c2.CityId
       /* Calculate distance between 2 cities only once */
select city_dist.* from
( select CountryID, min(Dist) as MinDist from
city_dist where Dist > 0 /* Avoid cities which
share Latitude & Longitude */ group by
CountryID ) a inner join city_dist on
a.CountryID = city_dist.CountryID and
a.MinDist = city_dist.Dist;
Map   Reduce   Finalize
function MapCode() {
emit(this.CountryID, {
     "data":
     [ { "city": this.City, "lat": this.Latitude, "lon":
   this.Longitude } ]
});
 }
function ReduceCode(key, values) {
 var reduced = {"data":[]};
 for (var i in values) {
      var inter = values[i];
             for (var j in inter.data) {

     reduced.data.push(inter.data[j]);
     }
 } return reduced;
function Finalize(key, reduced) {
   if (reduced.data.length == 1) { return { "message" : "This Country contains only 1 City" };
   }
      var min_dist = 999999999999;
   var city1 = { "name": "" };
   var city2 = { "name": "" };
      var c1; var c2; var d;
   for (var i in reduced.data) {
           for (var j in reduced.data) {
                         if (i>=j) continue;
                         c1 = reduced.data[i];
                         c2 = reduced.data[j];
                         d = Dist(c1,c2)
                         if (d < min_dist && d > 0) {
                                      min_dist = d; city1 = c1; city2 = c2;
                         }
           }
      }
   return {"city1": city1.name, "city2": city2.name, "dist": min_dist};
 }
   Shard!
   For write intensive, increase number of
    shards

   For read intensive, increase number of
    replica-sets within shards

   Best Read performance : Data in Shard
    breadth in memory
   MongoDB has application well-suited for BI

   Schemaless Model allows high-performance

   Replication with Sharding allows Scaling out
   http://www.mongodb.org/
   http://www.jaspersoft.com/
   R. Cattell. Scalable SQL and NoSQL Data
    Stores.http://www.cattell.net/datastores/Dat
    astores.pdf
   C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. R.
    Bradski, A. Y.Ng, and K. Olukotun. Map-
    reduce for machine learning on multicore. In
    NIPS, pages 281–288, 2006.
   http://www.mongovue.com/2010/11/03/yet
    -another-mongodb-map-reduce-tutorial/

Más contenido relacionado

Similar a MongoDB MapReduce Business Intelligence

Map reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreadingMap reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreadingcoolmirza143
 
Map reduce
Map reduceMap reduce
Map reducexydii
 
MongoDB Distilled
MongoDB DistilledMongoDB Distilled
MongoDB Distilledb0ris_1
 
Understanding Connected Data through Visualization
Understanding Connected Data through VisualizationUnderstanding Connected Data through Visualization
Understanding Connected Data through VisualizationSebastian Müller
 
Spark & Cassandra at DataStax Meetup on Jan 29, 2015
Spark & Cassandra at DataStax Meetup on Jan 29, 2015 Spark & Cassandra at DataStax Meetup on Jan 29, 2015
Spark & Cassandra at DataStax Meetup on Jan 29, 2015 Sameer Farooqui
 
Scala meetup - Intro to spark
Scala meetup - Intro to sparkScala meetup - Intro to spark
Scala meetup - Intro to sparkJavier Arrieta
 
Mapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large ClustersMapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large ClustersAbhishek Singh
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkDatabricks
 
Parallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAParallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAprithan
 
Boston Spark Meetup event Slides Update
Boston Spark Meetup event Slides UpdateBoston Spark Meetup event Slides Update
Boston Spark Meetup event Slides Updatevithakur
 
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ..."MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...Adrian Florea
 
UMLtoGraphDB: Mapping Conceptual Schemas to Graph Databases
UMLtoGraphDB: Mapping Conceptual Schemas to Graph DatabasesUMLtoGraphDB: Mapping Conceptual Schemas to Graph Databases
UMLtoGraphDB: Mapping Conceptual Schemas to Graph DatabasesGwendal Daniel
 
MongoDB Live Hacking
MongoDB Live HackingMongoDB Live Hacking
MongoDB Live HackingTobias Trelle
 
“ Implimentation of SD Processor Based On CRDC Algorithm ”
“ Implimentation of SD Processor Based On CRDC Algorithm ”“ Implimentation of SD Processor Based On CRDC Algorithm ”
“ Implimentation of SD Processor Based On CRDC Algorithm ”inventionjournals
 
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)Spark Summit
 
High-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingHigh-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingNesreen K. Ahmed
 
IRJET- Performance Analysis of RSA Algorithm with CUDA Parallel Computing
IRJET- Performance Analysis of RSA Algorithm with CUDA Parallel ComputingIRJET- Performance Analysis of RSA Algorithm with CUDA Parallel Computing
IRJET- Performance Analysis of RSA Algorithm with CUDA Parallel ComputingIRJET Journal
 
2015 02-09 - NoSQL Vorlesung Mosbach
2015 02-09 - NoSQL Vorlesung Mosbach2015 02-09 - NoSQL Vorlesung Mosbach
2015 02-09 - NoSQL Vorlesung MosbachJohannes Hoppe
 

Similar a MongoDB MapReduce Business Intelligence (20)

Map reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreadingMap reduceoriginalpaper mandatoryreading
Map reduceoriginalpaper mandatoryreading
 
Map reduce
Map reduceMap reduce
Map reduce
 
MongoDB Distilled
MongoDB DistilledMongoDB Distilled
MongoDB Distilled
 
Understanding Connected Data through Visualization
Understanding Connected Data through VisualizationUnderstanding Connected Data through Visualization
Understanding Connected Data through Visualization
 
Spark & Cassandra at DataStax Meetup on Jan 29, 2015
Spark & Cassandra at DataStax Meetup on Jan 29, 2015 Spark & Cassandra at DataStax Meetup on Jan 29, 2015
Spark & Cassandra at DataStax Meetup on Jan 29, 2015
 
Scala meetup - Intro to spark
Scala meetup - Intro to sparkScala meetup - Intro to spark
Scala meetup - Intro to spark
 
Mapreduce Osdi04
Mapreduce Osdi04Mapreduce Osdi04
Mapreduce Osdi04
 
Mapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large ClustersMapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large Clusters
 
Tuning and Debugging in Apache Spark
Tuning and Debugging in Apache SparkTuning and Debugging in Apache Spark
Tuning and Debugging in Apache Spark
 
Parallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAParallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDA
 
Boston Spark Meetup event Slides Update
Boston Spark Meetup event Slides UpdateBoston Spark Meetup event Slides Update
Boston Spark Meetup event Slides Update
 
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ..."MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
"MapReduce: Simplified Data Processing on Large Clusters" Paper Presentation ...
 
UMLtoGraphDB: Mapping Conceptual Schemas to Graph Databases
UMLtoGraphDB: Mapping Conceptual Schemas to Graph DatabasesUMLtoGraphDB: Mapping Conceptual Schemas to Graph Databases
UMLtoGraphDB: Mapping Conceptual Schemas to Graph Databases
 
Big data
Big dataBig data
Big data
 
MongoDB Live Hacking
MongoDB Live HackingMongoDB Live Hacking
MongoDB Live Hacking
 
“ Implimentation of SD Processor Based On CRDC Algorithm ”
“ Implimentation of SD Processor Based On CRDC Algorithm ”“ Implimentation of SD Processor Based On CRDC Algorithm ”
“ Implimentation of SD Processor Based On CRDC Algorithm ”
 
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
 
High-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and ModelingHigh-Performance Graph Analysis and Modeling
High-Performance Graph Analysis and Modeling
 
IRJET- Performance Analysis of RSA Algorithm with CUDA Parallel Computing
IRJET- Performance Analysis of RSA Algorithm with CUDA Parallel ComputingIRJET- Performance Analysis of RSA Algorithm with CUDA Parallel Computing
IRJET- Performance Analysis of RSA Algorithm with CUDA Parallel Computing
 
2015 02-09 - NoSQL Vorlesung Mosbach
2015 02-09 - NoSQL Vorlesung Mosbach2015 02-09 - NoSQL Vorlesung Mosbach
2015 02-09 - NoSQL Vorlesung Mosbach
 

Último

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????blackmambaettijean
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 

Último (20)

"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
What is Artificial Intelligence?????????
What is Artificial Intelligence?????????What is Artificial Intelligence?????????
What is Artificial Intelligence?????????
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 

MongoDB MapReduce Business Intelligence

  • 2. MongoDB features and Architecture  Business Intelligence: Overview  Model of Data using SQL vs NoSQL  Concept of MapReduce  Real-world use-case of Business Analytics  Scalability  Conclusions
  • 3. Key-value  Column  Document-based  Graph
  • 4. Schema less data model  Adhoc Query  Scalability  High Availability  Speed
  • 5. consistency Availability Partition Tolerance
  • 7. Business + smart information = Business Intelligence Consists of querying, reporting, and analytics for businesses Enable business to make smart decision to execute
  • 8. Modelization of Data in SQL A 1-many relation of node (id, value) with other nodes related by two different relations Node Relation id id_node1 value id_node2
  • 9. NoSQL Modelization mapped on Relational Database Modelization Node Relation c c id _id value value 1 value2
  • 10. Using Complex Type Attributes to Model data Nodes _id valued relations [] value 1 value 2 … value n
  • 11. No join operation required  Nesting of documents, resulting in instantaneous access to retrieve nodes/  Supporting agile method of programming  Schema flexible adaptive to changing business needs
  • 12. MapReduce ◦ Programming model for managing large amount of data in a parallel fashion ◦ Map : Processing of a data list to create key/value pairs ◦ Reduce: Porcess above pair to create new aggregated key/value pairs
  • 13. map(k1, v1) = list(k2,v2) reduce(k1, list(v2)) = list(v3) List : (a; 2) (a; 4)(b; 4)(b; 2)(a;1)(c;5) Map: (a;[2, 4, 1]), (b;[4,2]), (c,[5]) Reduce: (a;7), (b;6),(c;5)
  • 14.
  • 15. Problem The task is to find the 2 closest cities in each country Given that earth as 2D plane The distance b/w P1 (x1,y1) and P2 (x2,y2) is computed as Square-Root of { (x1-x2)2 + (y1- y2)2 }
  • 16.
  • 17. Computations Groups
  • 18. create view city_dist as select c1.CountryID, c1.CityId, c1.City, c2.CityId as CityId2, c2.City as City2, dist(c1,c2) as Dist from cities c1 inner join cities c2 where c1.CountryID = c2.CountryID and c1.CityId < c2.CityId /* Calculate distance between 2 cities only once */
  • 19. select city_dist.* from ( select CountryID, min(Dist) as MinDist from city_dist where Dist > 0 /* Avoid cities which share Latitude & Longitude */ group by CountryID ) a inner join city_dist on a.CountryID = city_dist.CountryID and a.MinDist = city_dist.Dist;
  • 20.
  • 21. Map Reduce Finalize
  • 22. function MapCode() { emit(this.CountryID, { "data": [ { "city": this.City, "lat": this.Latitude, "lon": this.Longitude } ] }); }
  • 23.
  • 24. function ReduceCode(key, values) { var reduced = {"data":[]}; for (var i in values) { var inter = values[i]; for (var j in inter.data) { reduced.data.push(inter.data[j]); } } return reduced;
  • 25. function Finalize(key, reduced) { if (reduced.data.length == 1) { return { "message" : "This Country contains only 1 City" }; } var min_dist = 999999999999; var city1 = { "name": "" }; var city2 = { "name": "" }; var c1; var c2; var d; for (var i in reduced.data) { for (var j in reduced.data) { if (i>=j) continue; c1 = reduced.data[i]; c2 = reduced.data[j]; d = Dist(c1,c2) if (d < min_dist && d > 0) { min_dist = d; city1 = c1; city2 = c2; } } } return {"city1": city1.name, "city2": city2.name, "dist": min_dist}; }
  • 26. Shard!  For write intensive, increase number of shards  For read intensive, increase number of replica-sets within shards  Best Read performance : Data in Shard breadth in memory
  • 27.
  • 28. MongoDB has application well-suited for BI  Schemaless Model allows high-performance  Replication with Sharding allows Scaling out
  • 29. http://www.mongodb.org/  http://www.jaspersoft.com/  R. Cattell. Scalable SQL and NoSQL Data Stores.http://www.cattell.net/datastores/Dat astores.pdf  C.-T. Chu, S. K. Kim, Y.-A. Lin, Y. Yu, G. R. Bradski, A. Y.Ng, and K. Olukotun. Map- reduce for machine learning on multicore. In NIPS, pages 281–288, 2006.  http://www.mongovue.com/2010/11/03/yet -another-mongodb-map-reduce-tutorial/