2. What problem are we solving?
• Map/Reduce can be used for aggregation…
• Currently being used for totaling, averaging, etc.
• Map/Reduce is a big hammer
• Simpler tasks should be easier
• Shouldn’t need to write JavaScript
• Avoid the overhead of JavaScript engine
• We’re seeing requests for help in handling
complex documents
• Select only matching subdocuments or arrays
3. How will we solve the problem?
• Our new aggregation framework
• Declarative framework
• No JavaScript required
• Describe a chain of operations to apply
• Expression evaluation
• Return computed values
• Framework: we can add new operations easily
• C++ implementation
• Higher performance than JavaScript
4. Aggregation - Pipelines
• Aggregation requests specify a pipeline
• A pipeline is a series of operations
• Conceptually, the members of a collection
are passed through a pipeline to produce a
result
• Similar to a command-line pipe
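As a rough mental model only (not how the server implements it), a pipeline can be pictured as a list of functions applied in order to a stream of documents. The sample data and the helper names `matchStage`, `limitStage`, and `runPipeline` are hypothetical.

```javascript
// Hypothetical sample collection, represented as a plain array.
const articles = [
  { title: "a", author: "bob", pageViews: 5 },
  { title: "b", author: "ann", pageViews: 12 },
  { title: "c", author: "bob", pageViews: 7 },
];

// Each stage maps an array of documents to a new array of documents.
const matchStage = (predicate) => (docs) => docs.filter(predicate);
const limitStage = (n) => (docs) => docs.slice(0, n);

// Feed the output of each stage into the next one, like a shell pipe.
function runPipeline(docs, stages) {
  return stages.reduce((stream, stage) => stage(stream), docs);
}

const result = runPipeline(articles, [
  matchStage((doc) => doc.author === "bob"),
  limitStage(1),
]);
console.log(result);
```

The order of the stages matters: limiting before matching would keep only the first document and then filter it.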
5. Pipeline Operations
• $match
• Uses a query predicate (like .find({…})) as a filter
• $project
• Uses a sample document to determine the shape
of the result (similar to the optional second argument of .find())
• This can include computed values
• $unwind
• Hands out array elements one at a time
• $group
• Aggregates items into buckets defined by a key
6. Pipeline Operations (continued)
• $sort
• Sort documents
• $limit
• Only allow the specified number of documents to
pass
• $skip
• Skip over the specified number of documents
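A sketch of how the operations listed so far might be combined into one pipeline. The field names (`author`, `tags`) and values are assumptions made for illustration; the array is plain JavaScript and would be passed as the `pipeline` of an aggregate command.

```javascript
// Hypothetical pipeline: count the tags used by one author.
const pipeline = [
  { $match: { author: "bob" } },                      // filter, like .find({...})
  { $project: { author: 1, tags: 1 } },               // keep only the fields we need
  { $unwind: "$tags" },                               // one document per array element
  { $group: { _id: "$tags", count: { $sum: 1 } } },   // bucket by tag and count
  { $sort: { count: -1 } },                           // most used tags first
  { $skip: 5 },                                       // skip the first five results
  { $limit: 10 },                                     // pass at most ten documents
];

// Each stage is a document with a single operator key.
const stageNames = pipeline.map((stage) => Object.keys(stage)[0]);
console.log(stageNames.join(" | "));
```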
7. Projections
• $project can reshape results
• Include or exclude fields
• Computed fields
• Arithmetic expressions, including built-in functions
• Pull fields from nested documents to the top
• Push fields from the top down into new virtual
documents
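The reshaping options above could look like the following $project stage. This is a sketch under assumed field names (`title`, `pageViews`, `feedViews`, `address.city`); none of them come from the slides.

```javascript
// Hypothetical $project stage showing each kind of reshaping.
const projectStage = {
  $project: {
    _id: 0,                                              // exclude a field
    title: 1,                                            // include a field
    totalViews: { $add: ["$pageViews", "$feedViews"] },  // computed field
    city: "$address.city",                               // pull a nested field to the top
    stats: { views: "$pageViews" },                      // push a top-level field into a new subdocument
  },
};

const outputFields = Object.keys(projectStage.$project);
console.log(outputFields);
```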
8. Unwinding
• $unwind can “stream” arrays
• Array values are doled out one at a time in the
context of their surrounding documents
• Makes it possible to filter out elements before
returning
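To make the "streaming" idea concrete, here is a plain-JavaScript imitation of what a stage such as `{ $unwind: "$tags" }` produces. The `unwind` helper and the sample document are hypothetical, and skipping documents without an array is a simplification of this sketch.

```javascript
// Emit one copy of the surrounding document per array element.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const values = doc[field];
    if (!Array.isArray(values)) continue; // simplification: skip non-array values
    for (const value of values) {
      out.push({ ...doc, [field]: value }); // same document, single element
    }
  }
  return out;
}

const posts = [{ title: "intro", tags: ["db", "nosql"] }];
const unwound = unwind(posts, "tags");
console.log(unwound);
```

After unwinding, a later $match can drop individual elements, which is how elements are filtered out before returning.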
9. Grouping
• $group aggregation expressions
• Define a grouping key as the _id of the result
• Total grouped column values: $sum
• Average grouped column values: $avg
• Collect grouped column values in an array or set:
$push, $addToSet
• Other functions
• $min, $max, $first, $last
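A sketch of a $group stage, followed by a plain-JavaScript imitation of the buckets it describes. The field names and the `groupByAuthor` helper are assumptions for illustration, not part of the framework.

```javascript
// Hypothetical $group stage: one result document per author.
const groupStage = {
  $group: {
    _id: "$author",                        // grouping key
    totalViews: { $sum: "$pageViews" },    // total
    avgViews: { $avg: "$pageViews" },      // average
    titles: { $push: "$title" },           // collect values in an array
  },
};

// Plain-JavaScript imitation of the same grouping.
function groupByAuthor(docs) {
  const buckets = new Map();
  for (const doc of docs) {
    if (!buckets.has(doc.author)) {
      buckets.set(doc.author, { sum: 0, count: 0, titles: [] });
    }
    const bucket = buckets.get(doc.author);
    bucket.sum += doc.pageViews;
    bucket.count += 1;
    bucket.titles.push(doc.title);
  }
  return [...buckets.entries()].map(([author, b]) => ({
    _id: author,
    totalViews: b.sum,
    avgViews: b.sum / b.count,
    titles: b.titles,
  }));
}

const grouped = groupByAuthor([
  { title: "a", author: "bob", pageViews: 5 },
  { title: "b", author: "ann", pageViews: 12 },
  { title: "c", author: "bob", pageViews: 7 },
]);
console.log(grouped);
```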
10. Sorting
• $sort can sort documents
• Sort specifications are the same as today, e.g.,
$sort:{ key1: 1, key2: -1, …}
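The meaning of a sort specification (1 for ascending, -1 for descending, keys compared in order) can be imitated in plain JavaScript. The `compareBySpec` helper and the sample documents are hypothetical.

```javascript
// Build a comparator from a sort specification such as { key1: 1, key2: -1 }.
function compareBySpec(spec) {
  const keys = Object.entries(spec);
  return (a, b) => {
    for (const [key, direction] of keys) {
      if (a[key] < b[key]) return -direction;
      if (a[key] > b[key]) return direction;
    }
    return 0; // equal on every key
  };
}

const docs = [
  { key1: 2, key2: 1 },
  { key1: 1, key2: 1 },
  { key1: 1, key2: 3 },
];
const sorted = [...docs].sort(compareBySpec({ key1: 1, key2: -1 }));
console.log(sorted);
```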
11. Computed Expressions
• Available in $project operations
• Prefix expression language
• Add two fields: $add:["$field1", "$field2"]
• Provide a value for a missing field:
$ifNull:["$field1", "$field2"]
• Nesting: $add:["$field1", {$ifNull:["$field2",
"$field3"]}]
• Other functions…
• And we can easily add more as required
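To show how a prefix expression with a nested operator evaluates, here is a toy evaluator for just $add and $ifNull, with the nested operator written as its own document. The `evaluate` function is a hypothetical sketch, not the framework's implementation, and it handles only top-level field names.

```javascript
// Toy evaluator for a tiny subset of the expression language.
function evaluate(expr, doc) {
  // "$name" refers to a field of the current document.
  if (typeof expr === "string" && expr.startsWith("$")) {
    const value = doc[expr.slice(1)];
    return value === undefined ? null : value;
  }
  // { $op: [args...] } applies an operator to evaluated arguments.
  if (expr !== null && typeof expr === "object" && !Array.isArray(expr)) {
    const op = Object.keys(expr)[0];
    const args = expr[op].map((arg) => evaluate(arg, doc));
    if (op === "$add") return args.reduce((sum, v) => sum + v, 0);
    if (op === "$ifNull") return args[0] !== null ? args[0] : args[1];
    throw new Error("unsupported operator: " + op);
  }
  return expr; // literal value
}

// field2 is missing, so $ifNull falls back to field3.
const doc = { field1: 2, field3: 40 };
const value = evaluate(
  { $add: ["$field1", { $ifNull: ["$field2", "$field3"] }] },
  doc
);
console.log(value); // 42
```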
12. Computed Expressions (continued)
• String functions
• $toUpper, $toLower, $substr
• Date field extraction
• Get year, month, day, hour, etc., from an ISODate
• Date arithmetic
• Null value substitution (like MySQL ifnull(),
Oracle nvl())
• Ternary conditional
• Return one of two values based on a predicate
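A sketch of how several of these functions might appear together in one $project stage. The field names (`name`, `created`, `nickname`, `pageViews`) and the threshold are assumptions for illustration.

```javascript
// Hypothetical $project stage using string, date, null-substitution,
// and conditional expressions.
const projectStage = {
  $project: {
    upperName: { $toUpper: "$name" },               // string function
    prefix: { $substr: ["$name", 0, 3] },           // first three characters
    yearCreated: { $year: "$created" },             // date field extraction
    nickname: { $ifNull: ["$nickname", "$name"] },  // null value substitution
    popularity: {                                   // ternary conditional
      $cond: [{ $gt: ["$pageViews", 100] }, "popular", "normal"],
    },
  },
};

const computedFields = Object.keys(projectStage.$project);
console.log(computedFields);
```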
14. Usage Tips
• Use $match in a pipeline as early as possible
• The query optimizer can then choose to scan an
index and avoid scanning the entire collection
• Use $sort in a pipeline as early as possible
• The query optimizer can then be used to choose
an index to scan instead of sorting the result
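As an illustration of these tips, the pipeline below puts $match and $sort ahead of the reshaping stages. The field names and the `countLeadingIndexStages` helper are hypothetical; the helper only counts stages and says nothing about what the optimizer actually does.

```javascript
// Hypothetical pipeline with the index-friendly stages first.
const pipeline = [
  { $match: { author: "bob" } },   // early filter: an index on author could be used
  { $sort: { posted: -1 } },       // early sort: an index on posted could be used
  { $project: { title: 1, posted: 1 } },
  { $limit: 10 },
];

// Count how many stages at the front are $match or $sort.
function countLeadingIndexStages(stages) {
  let count = 0;
  for (const stage of stages) {
    const op = Object.keys(stage)[0];
    if (op !== "$match" && op !== "$sort") break;
    count += 1;
  }
  return count;
}

const leading = countLeadingIndexStages(pipeline);
console.log(leading); // 2
```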
15. Driver Support
• Initial version is a command
• In any language, build the command as a JSON-style
document and execute it
• In the shell: db.runCommand({ aggregate :
<collection-name>, pipeline : […] });
• Beware of the command result size limit
• The result comes back as a single document, and the
document size limit is 16MB
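A minimal sketch of building the command document before handing it to a driver or the shell. The collection name `articles`, the pipeline contents, and the `buildAggregateCommand` helper are assumptions for illustration.

```javascript
// Build the command document: the collection name plus an array of stages.
function buildAggregateCommand(collectionName, pipeline) {
  return { aggregate: collectionName, pipeline: pipeline };
}

const command = buildAggregateCommand("articles", [
  { $match: { author: "bob" } },
  { $limit: 5 },
]);
console.log(JSON.stringify(command));

// In the mongo shell this document would be executed with:
//   db.runCommand(command)
```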
16. Sharding support
• Initial release will support sharding
• mongos analyzes the pipeline and forwards the
operations up to the first $group or $sort to the
shards; it then combines the shard server results
and returns them
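The split described above can be pictured with a small sketch. The `splitForShards` helper is hypothetical, and keeping the boundary $group or $sort in the merge part is an assumption of this sketch; the slide does not say on which side the boundary stage itself runs.

```javascript
// Split a pipeline at the first $group or $sort stage.
function splitForShards(pipeline) {
  const boundary = pipeline.findIndex((stage) => {
    const op = Object.keys(stage)[0];
    return op === "$group" || op === "$sort";
  });
  if (boundary === -1) {
    return { shardPart: pipeline, mergePart: [] }; // nothing to merge specially
  }
  return {
    shardPart: pipeline.slice(0, boundary), // run on every shard
    mergePart: pipeline.slice(boundary),    // run on the combined results (assumption)
  };
}

const parts = splitForShards([
  { $match: { author: "bob" } },
  { $unwind: "$tags" },
  { $group: { _id: "$tags", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
]);
const shardOps = parts.shardPart.map((stage) => Object.keys(stage)[0]);
const mergeOps = parts.mergePart.map((stage) => Object.keys(stage)[0]);
console.log(shardOps, mergeOps);
```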
17. When is this being released?
• In final development now
• Adding an explain facility
• Expect to see this in the near future
18. Future Plans
• More optimizations
• $out pipeline operation
• Saves the document stream to a collection
• Similar to the Map/Reduce out option, but with sharded output
• Functions like a tee, so that intermediate results
can be saved