Slide deck for my presentation at MongoSF 2012 in May: .

Published in: Technology, Business


  2. What problem are we solving?
     • Map/Reduce can be used for aggregation…
       • Currently being used for totaling, averaging, etc
     • Map/Reduce is a big hammer
       • Simpler tasks should be easier
       • Shouldn’t need to write JavaScript
       • Avoid the overhead of JavaScript engine
     • We’re seeing requests for help in handling complex documents
       • Select only matching subdocuments or arrays
  3. How will we solve the problem?
     • Our new aggregation framework
       • Declarative framework
       • No JavaScript required
       • Describe a chain of operations to apply
       • Expression evaluation
       • Return computed values
       • Framework: we can add new operations easily
       • C++ implementation
       • Higher performance than JavaScript
  4. Aggregation - Pipelines
     • Aggregation requests specify a pipeline
     • A pipeline is a series of operations
     • Conceptually, the members of a collection are passed through a pipeline to produce a result
       • Similar to a command-line pipe
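As a rough analogy for the pipe idea on this slide, and not MongoDB code, the sketch below passes an in-memory array of documents through a chain of plain JavaScript functions. The names `runPipeline`, `docs`, and `stages`, as well as the sample data, are hypothetical and exist only for illustration.

```javascript
// Hypothetical in-memory analogy: each stage is a function that takes an
// array of documents and returns a new array for the next stage.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

// Made-up sample documents.
const docs = [
  { name: "a", qty: 5 },
  { name: "b", qty: 12 },
  { name: "c", qty: 20 },
];

const stages = [
  (ds) => ds.filter((d) => d.qty > 10),      // plays the role of a filtering stage
  (ds) => ds.map((d) => ({ name: d.name })), // plays the role of a reshaping stage
];

const result = runPipeline(docs, stages);
console.log(result); // the two documents with qty > 10, reduced to their name field
```

Each function only sees the output of the one before it, which is the property the command-line pipe comparison is pointing at.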
  5. Pipeline Operations
     • $match
       • Uses a query predicate (like .find({…})) as a filter
     • $project
       • Uses a sample document to determine the shape of the result (similar to .find()’s optional argument)
       • This can include computed values
     • $unwind
       • Hands out array elements one at a time
     • $group
       • Aggregates items into buckets defined by a key
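The four operations on this slide could be chained roughly as follows. This is a minimal sketch that only builds the pipeline document; the field names (`author`, `tags`) and the value `"dave"` are assumptions made up for the example, not taken from the deck.

```javascript
// Sketch of a pipeline that counts posts per tag for one author.
// Field names and values are hypothetical.
const pipeline = [
  { $match: { author: "dave" } },                   // filter, like .find({ author: "dave" })
  { $project: { author: 1, tags: 1 } },             // keep only the fields needed downstream
  { $unwind: "$tags" },                             // one document per array element
  { $group: { _id: "$tags", posts: { $sum: 1 } } }, // bucket by tag and count
];

// List the operations in the order the documents would flow through them.
const stageNames = pipeline.map((stage) => Object.keys(stage)[0]);
console.log(stageNames.join(" -> "));
```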
  6. Pipeline Operations (continued)
     • $sort
       • Sort documents
     • $limit
       • Only allow the specified number of documents to pass
     • $skip
       • Skip over the specified number of documents
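One common use of these three operations together is paging through sorted results. The sketch below shows such a pipeline next to a plain JavaScript equivalent over an in-memory array; the `score` field and the sample values are assumptions for illustration only.

```javascript
// Pipeline sketch: sort by score descending, skip the first page, take the next two.
const pipeline = [
  { $sort: { score: -1 } },
  { $skip: 2 },
  { $limit: 2 },
];

// In-memory equivalent over made-up data, to show what each step does.
const docs = [10, 50, 30, 40, 20].map((score) => ({ score }));
const skip = pipeline[1].$skip;
const limit = pipeline[2].$limit;
const page = [...docs]
  .sort((a, b) => b.score - a.score) // $sort: { score: -1 }
  .slice(skip, skip + limit);        // $skip followed by $limit

console.log(page); // the documents with scores 30 and 20
```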
  7. Projections
     • $project can reshape results
       • Include or exclude fields
       • Computed fields
       • Arithmetic expressions, including built-in functions
       • Pull fields from nested documents to the top
       • Push fields from the top down into new virtual documents
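A single `$project` stage covering each of the reshaping options listed here might look like the sketch below. All field names (`title`, `price`, `tax`, `address.city`, `pageViews`, `likes`) are hypothetical.

```javascript
// Sketch of a $project stage; every field name here is made up.
const projectStage = {
  $project: {
    _id: 0,                                          // exclude a field
    title: 1,                                        // include a field
    total: { $add: ["$price", "$tax"] },             // computed field (arithmetic)
    city: "$address.city",                           // pull a nested field to the top
    stats: { views: "$pageViews", likes: "$likes" }, // push top-level fields into a new sub-document
  },
};

console.log(Object.keys(projectStage.$project));
```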
  8. Unwinding
     • $unwind can “stream” arrays
       • Array values are doled out one at a time in the context of their surrounding documents
       • Makes it possible to filter out elements before returning
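To make "in the context of their surrounding documents" concrete, here is a small in-memory imitation of what a stage such as `{ $unwind: "$tags" }` does. The `unwind` helper and the sample document are hypothetical, and the sketch ignores edge cases the server handles.

```javascript
// Hypothetical helper imitating $unwind on an in-memory array of documents:
// each array element yields a copy of the parent document with the array
// field replaced by that single element.
function unwind(docs, field) {
  return docs.flatMap((doc) => {
    const values = Array.isArray(doc[field]) ? doc[field] : [];
    return values.map((value) => ({ ...doc, [field]: value }));
  });
}

const posts = [{ _id: 1, title: "post", tags: ["a", "b"] }];
const unwound = unwind(posts, "tags");
console.log(unwound); // two documents, both with _id 1, one per tag
```

Once the elements are separate documents, a later `$match` can drop the ones that are not wanted, which is the filtering use the slide mentions.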
  9. Grouping
     • $group aggregation expressions
       • Define a grouping key as the _id of the result
       • Total grouped column values: $sum
       • Average grouped column values: $avg
       • Collect grouped column values in an array or set: $push, $addToSet
       • Other functions
       • $min, $max, $first, $last
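The sketch below pairs a `$group` stage document with a plain JavaScript loop that computes the same total and average over made-up data. The field names (`author`, `views`, `tag`) and the sample documents are assumptions for illustration.

```javascript
// Sketch of a $group stage: one result document per author.
const groupStage = {
  $group: {
    _id: "$author",                 // grouping key
    totalViews: { $sum: "$views" }, // total per group
    avgViews: { $avg: "$views" },   // average per group
    tags: { $addToSet: "$tag" },    // distinct values per group
  },
};

// In-memory analogy for $sum and $avg, over hypothetical documents.
const docs = [
  { author: "ann", views: 10 },
  { author: "bob", views: 4 },
  { author: "ann", views: 30 },
];
const buckets = new Map();
for (const d of docs) {
  const b = buckets.get(d.author) || { _id: d.author, totalViews: 0, count: 0 };
  b.totalViews += d.views;
  b.count += 1;
  buckets.set(d.author, b);
}
const grouped = [...buckets.values()].map((b) => ({
  _id: b._id,
  totalViews: b.totalViews,
  avgViews: b.totalViews / b.count,
}));
console.log(grouped);
```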
  10. Sorting
      • $sort can sort documents
        • Sort specifications are the same as today, e.g., $sort:{ key1: 1, key2: -1, …}
  11. Computed Expressions
      • Available in $project operations
      • Prefix expression language
        • Add two fields: $add:["$field1", "$field2"]
        • Provide a value for a missing field: $ifNull:["$field1", "$field2"]
        • Nesting: $add:["$field1", { $ifNull:["$field2", "$field3"] }]
        • Other functions…
        • And we can easily add more as required
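To show how a prefix expression nests, the toy evaluator below handles just `$add` and `$ifNull` over an in-memory document. It is a hypothetical illustration, not the server's implementation: it only reads top-level fields and does not reproduce MongoDB's handling of nulls inside `$add`.

```javascript
// Toy evaluator for a tiny subset of the prefix expression language.
// "$name" strings are field references, { $op: [args] } objects are
// operator calls, and anything else is treated as a literal.
function evaluate(expr, doc) {
  if (typeof expr === "string" && expr.startsWith("$")) {
    const value = doc[expr.slice(1)]; // top-level fields only
    return value === undefined ? null : value;
  }
  if (expr !== null && typeof expr === "object" && !Array.isArray(expr)) {
    const [op] = Object.keys(expr);
    const args = expr[op].map((arg) => evaluate(arg, doc)); // evaluate nested expressions first
    if (op === "$add") return args.reduce((sum, n) => sum + n, 0);
    if (op === "$ifNull") return args[0] !== null ? args[0] : args[1];
    throw new Error("unsupported operator: " + op);
  }
  return expr;
}

// Hypothetical document with no "discount" field.
const order = { price: 10, shipping: 2 };
const expr = { $add: ["$price", { $ifNull: ["$discount", 0] }, "$shipping"] };
const total = evaluate(expr, order);
console.log(total); // 12
```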
  12. Computed Expressions (continued)
      • String functions
        • toUpper, toLower, substr
      • Date field extraction
        • Get year, month, day, hour, etc, from ISODate
      • Date arithmetic
      • Null value substitution (like MySQL ifnull(), Oracle nvl())
      • Ternary conditional
        • Return one of two values based on a predicate
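A `$project` stage using the kinds of functions listed on this slide could be sketched as below. The field names (`name`, `nickname`, `created`, `qty`) and the threshold are assumptions, and operator availability may differ between MongoDB versions.

```javascript
// Sketch of a $project stage using string, date, null-substitution, and
// conditional expressions. All field names are hypothetical.
const projectStage = {
  $project: {
    nameUpper: { $toUpper: "$name" },                 // string function
    prefix: { $substr: ["$name", 0, 2] },             // first two characters
    year: { $year: "$created" },                      // date field extraction
    month: { $month: "$created" },
    displayName: { $ifNull: ["$nickname", "$name"] }, // null value substitution
    size: { $cond: [{ $gt: ["$qty", 100] }, "bulk", "single"] }, // ternary conditional
  },
};

console.log(JSON.stringify(projectStage, null, 2));
```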
  13. Demo
      Demo files are at
  14. Usage Tips
      • Use $match in a pipeline as early as possible
        • The query optimizer can then choose to scan an index and avoid scanning the entire collection
      • Use $sort in a pipeline as early as possible
        • The query optimizer can then be used to choose an index to scan instead of sorting the result
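Following these tips, a pipeline would put its filter and sort ahead of any reshaping, as in the sketch below. The field names and the assumption that an index exists on `status` or `created` are hypothetical; whether an index is actually used depends on the server and the data.

```javascript
// Sketch of a pipeline ordered per the tips: filter first, then sort,
// then everything else. Field names are made up.
const pipeline = [
  { $match: { status: "published" } }, // early filter: can be satisfied from an index
  { $sort: { created: -1 } },          // early sort: can follow an index instead of sorting in memory
  { $limit: 10 },
  { $project: { title: 1, created: 1 } },
];

const order = pipeline.map((stage) => Object.keys(stage)[0]);
console.log(order.join(" -> "));
```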
  15. Driver Support
      • Initial version is a command
        • For any language, build a JSON database object, and execute the command
        • In the shell: db.runCommand({ aggregate : <collection-name>, pipeline : {…} });
        • Beware of command result size limit
        • Document size limit is 16MB
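A filled-in version of the command document might look like the sketch below. The collection name `articles` and the field names are hypothetical. The `pipeline` value is written as an array of stage documents, which is the form released MongoDB versions accept. The guard lets the same snippet run in the mongo shell, where `db` is defined, or in plain JavaScript, where it only prints the command.

```javascript
// Sketch of an aggregate command document; names are made up.
const command = {
  aggregate: "articles", // collection name
  pipeline: [
    { $match: { author: "dave" } },
    { $group: { _id: "$author", posts: { $sum: 1 } } },
  ],
};

if (typeof db !== "undefined") {
  // Inside the mongo shell: send the command to the server.
  printjson(db.runCommand(command));
} else {
  // Outside the shell: just show the document a driver would send.
  console.log(JSON.stringify(command));
}
```

Because the whole result comes back in one command reply, the document size limit from the slide applies to it; an early `$match` or a `$limit` helps keep the reply small.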
  16. Sharding support
      • Initial release will support sharding
      • Mongos analyzes pipeline, and forwards operations up to $group or $sort to shards; combines shard server results and returns them
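One way to picture the split described here is the sketch below. The helper `splitForShards` is hypothetical and is not mongos source code. It reads "up to" as exclusive, sending everything before the first `$group` or `$sort` to the shards and keeping the rest for the merge step, which is an assumption about the slide's wording.

```javascript
// Hypothetical helper: split a pipeline at the first $group or $sort.
// Stages before that point could run on each shard independently; the
// remainder needs the combined results.
function splitForShards(pipeline) {
  const isMergePoint = (stage) => "$group" in stage || "$sort" in stage;
  const index = pipeline.findIndex(isMergePoint);
  if (index === -1) {
    return { shardPart: pipeline, mergePart: [] };
  }
  return {
    shardPart: pipeline.slice(0, index),
    mergePart: pipeline.slice(index),
  };
}

// Made-up pipeline for illustration.
const split = splitForShards([
  { $match: { status: "published" } },
  { $project: { author: 1 } },
  { $group: { _id: "$author", posts: { $sum: 1 } } },
  { $sort: { posts: -1 } },
]);
console.log(split.shardPart.length, split.mergePart.length); // 2 2
```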
  17. When is this being released?
      • In final development now
        • Adding an explain facility
      • Expect to see this in the near future
  18. Future Plans
      • More optimizations
      • $out pipeline operation
        • Saves the document stream to a collection
        • Similar to M/R $out, but with sharded output
        • Functions like a tee, so that intermediate results can be saved