MongoDB’s New Aggregation Framework

         Tyler Brock
2.1 available now

      (unstable)
Map Reduce

          Map/Reduce is a big hammer


• Used to perform complex analytics tasks on
  massive amounts of data

• Users are currently using it for aggregation…
  • totaling, averaging, etc
Problem
   • It should be easier to do simple aggregations
   • Shouldn’t need to write JavaScript
   • Avoid the overhead of JavaScript engine
New Aggregation Framework
 • Declarative
   • No JavaScript required
 • C++ implementation
   • Higher performance than JavaScript
 • Expression evaluation
   • Return computed values
 • Framework: we can add new operations easily
Pipeline
  • Series of operations
  • Members of a collection are passed through a
   pipeline to produce a result
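
The pipeline idea can be modeled in plain JavaScript without a database: each stage is a function from an array of documents to an array of documents, and the output of one stage is the input of the next. This is only a conceptual sketch; `runPipeline`, `matchAuthor` and `limitTo` are hypothetical names, not part of any MongoDB API, and the real framework is implemented in C++.

```javascript
// Conceptual model only: a stage is a function (docs) => docs.
// runPipeline, matchAuthor and limitTo are illustrative names, not MongoDB APIs.
function runPipeline(docs, stages) {
  // Feed the output of each stage into the next one, in order.
  return stages.reduce((current, stage) => stage(current), docs);
}

// Stage factories: each returns a function that transforms a list of documents.
const matchAuthor = (author) => (docs) => docs.filter((d) => d.author === author);
const limitTo = (n) => (docs) => docs.slice(0, n);

// Made-up sample data.
const articles = [
  { title: "a", author: "bob" },
  { title: "b", author: "jane" },
  { title: "c", author: "bob" },
];

const result = runPipeline(articles, [matchAuthor("bob"), limitTo(1)]);
console.log(result);
```

Because every stage has the same shape, stages can be reordered or added without changing `runPipeline`.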
The Aggregation Command
• Takes two arguments
 • Aggregate -- name of collection
 • Pipeline -- array of pipeline operators
      db.runCommand(
        {
          aggregate : "article",
          pipeline : [ { $op1 }, { $op2 }, ... ]
        }
      );
Aggregation helper

      db.article.aggregate(
         { $pipeline_op1 },
         { $pipeline_op2 },
         { $pipeline_op3 },
         { $pipeline_op4 },
         ...
      );
Pipeline Operators

    Old Faves        New Hotness
     •   $match       • $project
     •   $sort        • $unwind
     •   $limit       • $group
     •   $skip
$match
  • Uses a query predicate (like .find({…})) as a filter

           {
                title : "this is my title" ,
                author : "bob" ,
                posted : new Date(1079895594000) ,
                pageViews : 5 ,
                tags : [ "fun" , "good" , "fun" ]
           }

           { $match : { author : "bob" } }

           { $match : { pageViews : { $gt : 50, $lte : 90 } } }
$sort
• Sorts input documents
• Requires sort key -- specified like index keys



          { $sort : { name : 1, age: -1 } }
$limit
• Limits the number of JSON documents

               { $limit : 5 }


$skip
• Skips a number of JSON documents


               { $skip : 5 }
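
The effect of `$sort`, `$skip` and `$limit` on a list of documents can be sketched in plain JavaScript. The helpers `sortBy`, `skip` and `limit` are hypothetical names used only for illustration, and the sample data is made up; the sketch assumes every sort key is present and holds values that compare with `<` and `>`.

```javascript
// Plain-JavaScript model of
//   { $sort : { name : 1, age : -1 } }, { $skip : 1 }, { $limit : 2 }
// sortBy, skip and limit are illustrative helpers, not MongoDB APIs.
function sortBy(docs, spec) {
  const keys = Object.entries(spec); // e.g. [["name", 1], ["age", -1]]
  return [...docs].sort((a, b) => {
    for (const [field, direction] of keys) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0; // equal on every sort key
  });
}
const skip = (docs, n) => docs.slice(n);
const limit = (docs, n) => docs.slice(0, n);

const people = [
  { name: "bob", age: 30 },
  { name: "amy", age: 25 },
  { name: "bob", age: 41 },
  { name: "cat", age: 19 },
];

const sorted = sortBy(people, { name: 1, age: -1 });
const page = limit(skip(sorted, 1), 2);
console.log(page);
```

The order of the stages matters: skipping before limiting yields a page of results, while limiting first and then skipping leaves fewer documents.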
$project
• Project can reshape a document
  • add, remove, rename, move
• Similar to .find()’s field selection syntax
  • But much more powerful
• Can generate computed values
$project (include and exclude fields)
{ $project : {
   title : 1 ,         /* include this field, if it exists */
   author : 1 ,        /* include this field, if it exists */
   "comments.author" : 1
   }
}

{ $project : {
   title : 0 ,          /* exclude this field */
   author : 0 ,         /* exclude this field */
   }
}
$project (computed fields)


{ $project : {
      title : 1, /* include this field if it exists */
      doctoredPageViews : { $add: ["$pageViews", 10] }
   }
}
Computed Fields
• Prefix expression language
  • Add two fields
    • { $add : [ "$field1", "$field2" ] }
  • Provide a value for a missing field
    • { $ifNull : [ "$field1", "$field2" ] }
  • Nesting
    • { $add : [ "$field1", { $ifNull : [ "$field2", "$field3" ] } ] }
  • Date field extraction
    • Get year, month, day, hour, etc, from Date
  • Date arithmetic
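
To make the prefix expression style concrete, here is a toy evaluator in plain JavaScript. It is a sketch under stated assumptions, not how the server evaluates expressions: it supports only `$add` and `$ifNull`, treats `"$name"` as a reference to a top-level field, and the function name `evaluate` is hypothetical.

```javascript
// Toy evaluator for prefix expressions such as { $add : [ "$a", 10 ] }.
// Assumptions: only $add and $ifNull are supported, and "$name" refers to a
// top-level field. evaluate is an illustrative name, not a MongoDB API.
function evaluate(expr, doc) {
  if (typeof expr === "string" && expr.startsWith("$")) {
    return doc[expr.slice(1)]; // field reference; undefined when missing
  }
  if (expr !== null && typeof expr === "object" && !Array.isArray(expr)) {
    const [operator] = Object.keys(expr);
    // Evaluate the arguments first, so expressions can nest to any depth.
    const args = expr[operator].map((arg) => evaluate(arg, doc));
    if (operator === "$add") return args.reduce((sum, value) => sum + value, 0);
    if (operator === "$ifNull") return args[0] == null ? args[1] : args[0];
    throw new Error("unsupported operator: " + operator);
  }
  return expr; // literal constant
}

const doc = { pageViews: 5, bonus: 2 };
const simple = evaluate({ $add: ["$pageViews", 10] }, doc);
const nested = evaluate({ $add: ["$pageViews", { $ifNull: ["$missing", "$bonus"] }] }, doc);
console.log(simple, nested); // 15 7
```

The operator comes first and its arguments follow in an array, which is what lets one expression appear as an argument of another.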
$project (rename and pull fields up)


{ $project : {
      title : 1 ,
      page_views : "$pageViews" , /* rename this field */
      upgrade : "$other.foo"     /* move to top level */
   }
}
$project (push fields down)

{ $project : {
     title : 1 ,
     stats : {
          pv : "$pageViews",   /* rename this from the top-level */
     }
   }
}
$unwind
• Produces one output document for each element of an array field;
 in each output document the array is replaced by that single element
    {
          title : "this is my title" ,
          author : "bob" ,
          posted : new Date(1079895594000) ,
          pageViews : 5 ,
          tags : [ "fun" , "good" , "awesome" ] ,
          comments : [
              { author :"joe" , text : "this is cool" } ,
              { author :"sam" , text : "this is bad" }
          ],
          other : { foo : 5 }
    }
{
     ...
     tags : "fun"
     ...
},
{
     ...
     tags : "good"
     ...
},
{
     ...
     tags : "awesome"
     ...
}
$unwind

   db.article.aggregate(
      { $project : {
          author : 1 , /* include this field */
          title : 1 , /* include this field */
          tags : 1 /* include this field */
      }},
      { $unwind : "$tags" }
   );
{
    "result" : [
         {
                   "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
                   "title" : "this is my title",
                   "author" : "bob",
                   "tags" : "fun"
          },
          {
                   "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
                   "title" : "this is my title",
                   "author" : "bob",
                   "tags" : "good"
          },
          {
                   "_id" : ObjectId("4e6e4ef557b77501a49233f6"),
                   "title" : "this is my title",
                   "author" : "bob",
                   "tags" : "fun"
          }
    ],
    "ok" : 1
}
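
What `$unwind` does to a single document can be sketched in a few lines of plain JavaScript. The sketch assumes the field exists and holds an array; missing fields and non-array values are not modeled. `unwind` is a hypothetical helper name, not a MongoDB API.

```javascript
// Plain-JavaScript model of { $unwind : "$tags" }.
// Assumption: every input document has the field and it holds an array;
// missing fields and non-array values are not modeled here.
function unwind(docs, field) {
  return docs.flatMap((doc) =>
    // One copy of the document per array element, with the array replaced
    // by that single element. All other fields are carried over unchanged.
    doc[field].map((element) => ({ ...doc, [field]: element }))
  );
}

const article = {
  title: "this is my title",
  author: "bob",
  tags: ["fun", "good", "awesome"],
};

const unwound = unwind([article], "tags");
console.log(unwound.map((d) => d.tags)); // [ 'fun', 'good', 'awesome' ]
```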
Grouping
• $group aggregation expressions
  • Total of column values: $sum
  • Average of column values: $avg
  • Collect column values in an array: $push

      { $group : {
           _id: "$author",
           fieldname: { $aggfunc: "$field" }
         }
      }
$group example

 db.article.aggregate(
    { $group : {
        _id : "$author",
        viewsPerAuthor : { $sum : "$pageViews" }
    }}
 );
{
   "result" : [
         {
              "_id" : "jane",
              "viewsPerAuthor" : 6
         },
         {
              "_id" : "dave",
              "viewsPerAuthor" : 7
         },
         {
              "_id" : "bob",
              "viewsPerAuthor" : 5
         }
   ],
   "ok" : 1
}
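
The grouping step can be modeled in plain JavaScript as a single pass that keeps one running total per distinct key. This is a sketch, not the server's implementation; `groupSum` is a hypothetical helper, and the input documents are made up so that the totals match the result above.

```javascript
// Plain-JavaScript model of
//   { $group : { _id : "$author", viewsPerAuthor : { $sum : "$pageViews" } } }
// groupSum is an illustrative helper, not a MongoDB API.
function groupSum(docs, keyField, sumField, outputName) {
  const totals = new Map(); // a Map keeps keys in first-seen order
  for (const doc of docs) {
    const key = doc[keyField];
    totals.set(key, (totals.get(key) || 0) + doc[sumField]);
  }
  return [...totals].map(([key, total]) => ({ _id: key, [outputName]: total }));
}

// Hypothetical input documents.
const articles = [
  { author: "jane", pageViews: 2 },
  { author: "dave", pageViews: 7 },
  { author: "jane", pageViews: 4 },
  { author: "bob", pageViews: 5 },
];

const grouped = groupSum(articles, "author", "pageViews", "viewsPerAuthor");
console.log(grouped);
```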
Group Aggregation Functions

    $min           $addToSet
    $avg           $first
    $push          $last
    $sum           $max
Pulling it all together


{
    title : "this is my title" ,
    author : "bob" ,
    posted : new Date(1079895594000) ,
    pageViews : 5 ,
    tags : [ "fun" , "good" , "fun" ]
}

{
    tag : "fun" ,
    authors : [ ..., ..., ... ]
},
{
    tag : "good" ,
    authors : [ ..., ..., ... ]
}
db.article.aggregate(
   { $project : {
       author : 1,
       tags : 1,
   }},
   { $unwind : "$tags" },
   { $group : {
       _id : "$tags",
       authors : { $addToSet : "$author" }
   }}
);
"result" : [
      {
           "_id" : { "tags" : "cool" },
           "authors" : [ "jane", "dave" ]
      },
      {
           "_id" : { "tags" : "fun" },
           "authors" : [ "dave", "bob" ]
      },
      {
           "_id" : { "tags" : "good" },
           "authors" : [ "bob" ]
      },
      {
           "_id" : { "tags" : "awful" },
           "authors" : [ "jane" ]
      }
   ]
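
The combined effect of `$unwind` followed by `$group` with `$addToSet` can be sketched in plain JavaScript. `authorsByTag` is a hypothetical name and the input data is made up. Unlike the result above, the sketch returns `_id` as the bare tag value rather than a wrapped document.

```javascript
// Plain-JavaScript model of $unwind on tags followed by
// $group with _id "$tags" and { $addToSet : "$author" }.
// authorsByTag is an illustrative name; the input data is made up.
function authorsByTag(docs) {
  const groups = new Map();
  for (const doc of docs) {
    for (const tag of doc.tags) {            // the $unwind step
      if (!groups.has(tag)) groups.set(tag, new Set());
      groups.get(tag).add(doc.author);       // the $addToSet step: no duplicates
    }
  }
  return [...groups].map(([tag, authors]) => ({ _id: tag, authors: [...authors] }));
}

const articles = [
  { author: "bob", tags: ["fun", "good", "fun"] },
  { author: "dave", tags: ["fun", "cool"] },
  { author: "jane", tags: ["cool", "awful"] },
];

const tagGroups = authorsByTag(articles);
console.log(tagGroups);
```

Note that "bob" appears only once under "fun" even though his article carries that tag twice; with `$push` instead of `$addToSet` he would appear twice.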
Usage Tips
• Use $match in a pipeline as early as possible
  • The query optimizer can then be used to choose
   an index and avoid scanning the entire collection
Driver Support
• Initial version is a command
  • For any language, build a JSON database object,
   and execute the command
    • { aggregate : <collection>, pipeline : [ ] }
  • Beware of command result size limit
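
A minimal sketch of building that command document in JavaScript follows. It only constructs and serializes the document; how it is sent to the server depends on the driver and is not shown. `buildAggregateCommand` is a hypothetical helper name, not a driver API.

```javascript
// Builds the { aggregate : <collection>, pipeline : [ ] } command document.
// Sending it to a server is left to the driver.
// buildAggregateCommand is an illustrative name, not a driver API.
function buildAggregateCommand(collection, stages) {
  if (!Array.isArray(stages)) {
    throw new TypeError("pipeline must be an array of stages");
  }
  return { aggregate: collection, pipeline: stages };
}

const command = buildAggregateCommand("article", [
  { $match: { author: "bob" } },
  { $group: { _id: "$author", viewsPerAuthor: { $sum: "$pageViews" } } },
]);

console.log(JSON.stringify(command));
```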
Sharding support
• Initial release will support sharding
• Mongos analyzes pipeline, and forwards
 operations up to first $group or $sort to
 shards; combines shard server results and
 continues
Common SQL
• Distinct
  • aggregate({ $group: { _id: "$author" }})
• Count
  • aggregate({ $group: { _id: null, count: { $sum: 1 } } })
• Sum
  • aggregate({ $group: { _id: null, total: { $sum: "$price" } } })


Editor's notes

  2. Well, why do we need a new aggregation framework?
  3. But...
  5. It works by creating a pipeline.
  6. The way you create this pipeline is through the aggregation command.
  10. $match should be placed as early in the aggregation pipeline as possible. This minimizes the number of documents after it, thereby minimizing later processing. Placing a $match at the very beginning of a pipeline will enable it to take advantage of indexes in exactly the same way as a regular query (find()/findOne()).
  14. _id is included by default in inclusion mode. The user can specify _id : 0, but no other fields can be excluded.
  15. Respects ordering.
  18. Doctored page views. The BSON specification specifies that field order matters, and is to be preserved. A projection will honor that, and fields will be output in the same order as they are input, regardless of the order of any inclusion or exclusion specifications. When new computed fields are added via a projection, these always follow all fields from the original source, and will appear in the order they appear in the projection specification.
  22. $unwind is most useful when combined with $group or $filter. The effects of an unwind can be undone with the $push $group aggregation function. If the target field does not exist within an input document, the document is passed through unchanged. If the target field within an input document is not an array, an error is generated. If the target field within an input document is an empty array ("[]"), then the document is passed through unchanged.
  24. _id can be a dotted field path reference (prefixed with a dollar sign, '$'), a braced document expression containing multiple fields (an order-preserving concatenated key), or a single constant. Using a constant will create a single bucket, and can be used to count documents, or to add all the values for a field in all the documents in a collection. If you need the output to have a different name, it can easily be renamed using a simple $project after the $group.