MongoDB Aggregation Framework

MongoDB’s New Aggregation Framework

Tyler Brock

2.1 available now (unstable)

Map Reduce

Map/Reduce is a big hammer

• Used to perform complex analytics tasks on massive amounts of data
• Users are currently using it for aggregation…
• totaling, averaging, etc.

Problem
• It should be easier to do simple aggregations
• Shouldn’t need to write JavaScript
• Avoid the overhead of JavaScript engine

New Aggregation Framework
• Declarative
• No JavaScript required
• C++ implementation
• Higher performance than JavaScript
• Expression evaluation
• Return computed values
• Framework: we can add new operations easily

Pipeline
• Series of operations
• Members of a collection are passed through a
pipeline to produce a result
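
A rough way to picture the pipeline idea in plain JavaScript (this is a model, not MongoDB code): each stage is a function from an array of documents to an array of documents, and the pipeline feeds each stage's output into the next. The names `runPipeline`, `keepPopular` and `firstOne` are hypothetical and exist only for this sketch.

```javascript
// Plain-JavaScript model of a pipeline; this is NOT MongoDB code.
// Each stage is a hypothetical function: array of documents in, array out.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

const articles = [
  { author: "bob", pageViews: 5 },
  { author: "jane", pageViews: 6 },
  { author: "dave", pageViews: 7 },
];

// Two illustrative stages: keep a subset, then truncate.
const keepPopular = (docs) => docs.filter((d) => d.pageViews > 5);
const firstOne = (docs) => docs.slice(0, 1);

const pipelineResult = runPipeline(articles, [keepPopular, firstOne]);
console.log(pipelineResult);
```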

The Aggregation Command
• Takes two arguments
• aggregate -- name of the collection
• pipeline -- array of pipeline operators
db.runCommand(
{
aggregate : "article",
pipeline : [ { $op1 }, { $op2 }, ... ]
}
);

Aggregation helper

db.article.aggregate(
{ $pipeline_op1 },
{ $pipeline_op2 },
{ $pipeline_op3 },
{ $pipeline_op4 },
...
);
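
The helper reads like shorthand for the command form. A minimal sketch of that relationship, assuming the helper simply collects its arguments into the `pipeline` array; `buildAggregateCommand` is a hypothetical function for illustration, not a shell or driver API.

```javascript
// Hypothetical helper, not part of any driver: builds the command document
// that the slides pass to db.runCommand().
function buildAggregateCommand(collectionName, ...stages) {
  return { aggregate: collectionName, pipeline: stages };
}

const command = buildAggregateCommand(
  "article",
  { $match: { author: "bob" } },
  { $limit: 5 }
);
console.log(JSON.stringify(command));
```

The resulting object has the same shape as the `{ aggregate : ..., pipeline : [ ... ] }` document used with `db.runCommand`.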

Pipeline Operators

• Old Faves: $match, $sort, $limit, $skip
• New Hotness: $project, $unwind, $group

$match
• Uses a query predicate (like .find({…})) as a filter

{
title : "this is my title" ,
author : "bob" ,
posted : new Date(1079895594000) ,
pageViews : 5 ,
tags : [ "fun" , "good" , "fun" ]
}

{ $match : { author : "bob" } }

{ $match : { pgv : { $gt : 50, $lte : 90 } } }
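
To make the filtering behaviour concrete, here is a plain-JavaScript stand-in for the two predicates above. It is a sketch under heavy simplification: `matches` and `match` are hypothetical names, only equality, `$gt` and `$lte` on top-level fields are handled, and the sample documents are invented.

```javascript
// Simplified stand-in for a query predicate; handles only equality,
// $gt and $lte on top-level fields. Real query matching does much more.
function matches(doc, predicate) {
  return Object.entries(predicate).every(([field, cond]) => {
    const value = doc[field];
    if (cond !== null && typeof cond === "object") {
      return Object.entries(cond).every(([op, bound]) => {
        if (op === "$gt") return value > bound;
        if (op === "$lte") return value <= bound;
        throw new Error("unsupported operator in this sketch: " + op);
      });
    }
    return value === cond;
  });
}

const match = (docs, predicate) => docs.filter((d) => matches(d, predicate));

// Invented sample documents.
const docs = [
  { author: "bob", pgv: 60 },
  { author: "bob", pgv: 95 },
  { author: "jane", pgv: 90 },
];

const byBob = match(docs, { author: "bob" });
const midRange = match(docs, { pgv: { $gt: 50, $lte: 90 } });
console.log(byBob.length, midRange.length);
```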

$sort
• Sorts input documents
• Requires sort key -- specified like index keys

{ $sort : { name : 1, age: -1 } }
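
A sort key like the one above can be read as "order by name ascending, then by age descending". A minimal plain-JavaScript sketch of that reading; `sortBy` is a hypothetical helper, the data is invented, and only simple comparable values are handled.

```javascript
// Sketch: turn a sort key such as { name: 1, age: -1 } into a comparator.
function sortBy(docs, sortKey) {
  const fields = Object.entries(sortKey); // key order = priority order
  return [...docs].sort((a, b) => {
    for (const [field, direction] of fields) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0; // equal on every sort field
  });
}

// Invented sample documents.
const people = [
  { name: "bob", age: 30 },
  { name: "amy", age: 25 },
  { name: "bob", age: 41 },
];

const sorted = sortBy(people, { name: 1, age: -1 });
console.log(sorted);
```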

$limit
• Limits the number of documents passed on to the next operator

{ $limit : 5 }

$skip
• Skips the given number of documents and passes on the rest

{ $skip : 5 }
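
$skip and $limit are often combined for paging, and their order matters. A small plain-JavaScript sketch of the idea; `limit` and `skip` here are hypothetical stand-ins built on `Array.prototype.slice`, and the data is invented.

```javascript
// Stand-ins for { $limit : n } and { $skip : n } over an in-memory array.
const limit = (docs, n) => docs.slice(0, n);
const skip = (docs, n) => docs.slice(n);

// Twelve invented documents: { n: 1 } ... { n: 12 }.
const items = Array.from({ length: 12 }, (_, i) => ({ n: i + 1 }));

// Page 2 with a page size of 5: skip the first 5, then keep 5.
const pageTwo = limit(skip(items, 5), 5);

// Reversing the order keeps 5 documents and then skips all of them.
const reversed = skip(limit(items, 5), 5);

console.log(pageTwo.map((d) => d.n), reversed.length);
```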

$project
• Project can reshape a document
• add, remove, rename, move
• Similar to .find()’s field selection syntax
• But much more powerful
• Can generate computed values

$project (include and exclude fields)
{ $project : {
title : 1 , /* include this field, if it exists */
author : 1 , /* include this field, if it exists */
"comments.author" : 1
}
}

{ $project : {
title : 0 , /* exclude this field */
author : 0 , /* exclude this field */
}
}

$project (computed fields)

{ $project : {
title : 1, /* include this field if it exists */
doctoredPageViews : { $add: ["$pageViews", 10] }
}
}

Computed Fields
• Prefix expression language
• Add two fields
• $add:["$field1", "$field2"]
• Provide a value for a missing field
• $ifNull:["$field1", "$field2"]
• Nesting
• $add:["$field1", { $ifNull:["$field2", "$field3"] }]
• Date field extraction
• Get year, month, day, hour, etc, from Date
• Date arithmetic

$project (rename and pull fields up)

{ $project : {
title : 1 ,
page_views : "$pageViews" , /* rename this field */
upgrade : "$other.foo" /* move to top level */
}
}

$project (push fields down)

{ $project : {
title : 1 ,
stats : {
pv : "$pageViews", /* rename this from the top-level */
}
}
}
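
The rename, pull-up and push-down cases can be modelled together in plain JavaScript. This is a simplified sketch, not the server's $project: `resolvePath` and `project` are hypothetical helpers, arrays and exclusion are not handled, and the default `_id` handling is left out.

```javascript
// Resolve a "$a.b.c" reference against a document (sketch; no array handling).
function resolvePath(doc, ref) {
  return ref
    .slice(1) // drop the leading "$"
    .split(".")
    .reduce((value, key) => (value == null ? undefined : value[key]), doc);
}

// Simplified $project: 1 keeps a field, "$path" copies a value,
// and a nested object builds a sub-document.
function project(doc, spec) {
  const out = {};
  for (const [name, rule] of Object.entries(spec)) {
    if (rule === 1) {
      if (name in doc) out[name] = doc[name];
    } else if (typeof rule === "string") {
      out[name] = resolvePath(doc, rule);
    } else {
      out[name] = project(doc, rule);
    }
  }
  return out;
}

const article = { title: "this is my title", pageViews: 5, other: { foo: 5 } };

const reshaped = project(article, {
  title: 1,
  page_views: "$pageViews", // rename
  upgrade: "$other.foo", // pull up to the top level
  stats: { pv: "$pageViews" }, // push down into a sub-document
});
console.log(JSON.stringify(reshaped));
```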

$unwind
• Produces one output document for each element of an array; in each output document the array field holds a single element
{
author : "bob" ,
posted : new Date(1079895594000) ,
pageViews : 5 ,
tags : [ "fun" , "good" , "awesome" ] ,
comments : [
{ author :"joe" , text : "this is cool" } ,
{ author :"sam" , text : "this is bad" }
],
other : { foo : 5 }
}

{
...
tags : "fun"
...
},
{
...
tags : "good"
...
},
{
...
tags : "awesome"
...
}

$unwind

db.article.aggregate(
{ $project : {
author : 1 , /* include this field */
title : 1 , /* include this field */
tags : 1 /* include this field */
}},
{ $unwind : "$tags" }
);

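{
"result" : [
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "fun"
},
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "good"
},
{
"_id" : ObjectId("4e6e4ef557b77501a49233f6"),
"title" : "this is my title",
"author" : "bob",
"tags" : "fun"
}
],
"ok" : 1
}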
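
A plain-JavaScript sketch of what $unwind does to each document: copy the document once per array element and replace the array with that element. `unwind` is a hypothetical helper that handles only a top-level array field; a document without the array yields no output in this sketch.

```javascript
// Sketch of $unwind on a top-level array field.
function unwind(docs, ref) {
  const field = ref.slice(1); // "$tags" -> "tags"
  return docs.flatMap((doc) =>
    (doc[field] || []).map((element) => ({ ...doc, [field]: element }))
  );
}

const posts = [
  { title: "this is my title", author: "bob", tags: ["fun", "good", "fun"] },
];

const unwound = unwind(posts, "$tags");
console.log(unwound);
```

Every output document keeps the other fields of its source document, which is why the author repeats in the result above.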

Grouping
• $group aggregation expressions
• Total of column values: $sum
• Average of column values: $avg
• Collect column values in an array: $push

{ $group : {
_id: "$author",
fieldname: { $aggfunc: "$field" }
}
}

$group example

db.article.aggregate(
{ $group : {
_id : "$author",
viewsPerAuthor : { $sum : "$pageViews" }
}}
);

{
"result" : [
{
"_id" : "jane",
"viewsPerAuthor" : 6
},
{
"_id" : "dave",
"viewsPerAuthor" : ...
},
{
"_id" : "bob",
"viewsPerAuthor" : ...
}
],
"ok" : 1
}
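
Grouping with $sum amounts to keeping a running total per key. A minimal plain-JavaScript sketch, assuming top-level fields only; `groupSum` is a hypothetical helper and the sample page views are invented, so they are not the data behind the result above.

```javascript
// Sketch of { $group : { _id : "$<keyField>", <outField> : { $sum : "$<sumField>" } } }.
function groupSum(docs, keyField, outField, sumField) {
  const totals = new Map();
  for (const doc of docs) {
    const key = doc[keyField];
    totals.set(key, (totals.get(key) || 0) + doc[sumField]);
  }
  return [...totals].map(([key, total]) => ({ _id: key, [outField]: total }));
}

// Invented sample documents.
const views = [
  { author: "bob", pageViews: 5 },
  { author: "jane", pageViews: 6 },
  { author: "bob", pageViews: 7 },
];

const perAuthor = groupSum(views, "author", "viewsPerAuthor", "pageViews");
console.log(perAuthor);
```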

Group Aggregation Functions

• $min, $max
• $avg, $sum
• $push, $addToSet
• $first, $last

Pulling it all together

{
author : "bob" ,
posted : new Date(1079895594000) ,
pageViews : 5 ,
tags : [ "fun" , "good" , "fun" ]
}

{
tag : "fun" ,
authors: [ ..., ..., ... ]
},
{
tag: "good" ,
authors: [ ..., ..., ... ]
}

db.article.aggregate(
{ $project : {
author : 1,
tags : 1
}},
{ $unwind : "$tags" },
{ $group : {
_id : "$tags",
authors : { $addToSet : "$author" }
}}
);

"result" : [

{
"_id" : "cool",
"authors" : [ "jane","dave" ]
},
{
"_id" : "fun",
"authors" : [ "dave", "bob" ]
},
{
"_id" : "good",
"authors" : [ "bob" ]
},
{
"_id" : "awful",
"authors" : [ "jane" ]
}

]
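
The three stages can be traced in plain JavaScript to see why each tag ends up with a de-duplicated author list. This is a sketch only: `tagsToAuthors` is a hypothetical function, the sample documents are invented, and the order of groups and of authors within a group should not be relied on.

```javascript
// Sketch of $project -> $unwind -> $group with $addToSet, in one pass.
function tagsToAuthors(docs) {
  const authorsByTag = new Map();
  for (const { author, tags } of docs) { // $project: keep author and tags
    for (const tag of tags) { // $unwind: one step per tag
      if (!authorsByTag.has(tag)) authorsByTag.set(tag, new Set());
      authorsByTag.get(tag).add(author); // $addToSet: no duplicates
    }
  }
  return [...authorsByTag].map(([tag, authors]) => ({
    _id: tag,
    authors: [...authors],
  }));
}

// Invented sample documents.
const sample = [
  { author: "bob", tags: ["fun", "good", "fun"] },
  { author: "dave", tags: ["fun", "cool"] },
  { author: "jane", tags: ["cool", "awful"] },
];

const authorsPerTag = tagsToAuthors(sample);
console.log(JSON.stringify(authorsPerTag));
```

Note that bob appears once under "fun" even though his document lists that tag twice.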

Usage Tips
• Use $match in a pipeline as early as possible
• The query optimizer can then be used to choose
an index and avoid scanning the entire collection

Driver Support
• Initial version is a command
• For any language, build a JSON database object,
and execute the command
• { aggregate : <collection>, pipeline : [ ] }
• Beware of the command result size limit: the whole result comes back as a single document

Sharding support
• Initial release will support sharding
• Mongos analyzes pipeline, and forwards
operations up to first $group or $sort to
shards; combines shard server results and
continues
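
One way to picture the split described above, as a plain-JavaScript sketch. It assumes "up to" means the stages before the first $group or $sort go to the shards and everything from that stage onward runs after the shard results are combined; mongos may well handle the boundary stage differently. `splitForShards` is a hypothetical function.

```javascript
// Sketch: split a pipeline at the first $group or $sort stage.
// Assumption: the boundary stage itself stays on the merging side.
function splitForShards(pipeline) {
  const isBoundary = (stage) => "$group" in stage || "$sort" in stage;
  const index = pipeline.findIndex(isBoundary);
  if (index === -1) return { shardPart: pipeline, mergePart: [] };
  return {
    shardPart: pipeline.slice(0, index),
    mergePart: pipeline.slice(index),
  };
}

const examplePipeline = [
  { $match: { author: "bob" } },
  { $project: { author: 1, tags: 1 } },
  { $group: { _id: "$author", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
];

const { shardPart, mergePart } = splitForShards(examplePipeline);
console.log(shardPart.length, mergePart.length);
```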

Common SQL
• Distinct
• aggregate({ $group: { _id: "$author" }})
• Count
• aggregate({ $group: { _id: null, count: { $sum: 1 }}})
• Sum
• aggregate({ $group: { _id: null, total: { $sum: "$price" }}})
