Introduction to MongoDB and Hadoop


  1. #MongoDB
     Introduction to MongoDB & MongoDB + Hadoop
     Steve Francia, Chief Evangelist, 10gen
  2. What is MongoDB
  3. MongoDB is a ___________ database
     • Document
     • Open source
     • High performance
     • Horizontally scalable
     • Full featured
  4. Document Database
     • Not for .PDF & .DOC files
     • A document is essentially an associative array
     • Document == JSON object
     • Document == PHP Array
     • Document == Python Dict
     • Document == Ruby Hash
     • etc.
  5. Open Source
     • MongoDB is an open source project
     • On GitHub
     • Licensed under the AGPL
     • Started & sponsored by 10gen
     • Commercial licenses available
     • Contributions welcome
  6. High Performance
     • Written in C++
     • Extensive use of memory-mapped files, i.e. read-through, write-through memory caching
     • Runs nearly everywhere
     • Data serialized as BSON (fast parsing)
     • Full support for primary & secondary indexes
     • Document model = less work
  7. Horizontally Scalable
  8. Full Featured
     • Ad hoc queries
     • Real-time aggregation
     • Rich query capabilities
     • Traditionally consistent
     • Geospatial features
     • Support for most programming languages
     • Flexible schema
  9. Database Landscape
  10. http://www.mongodb.org/downloads
  11. Mongo Shell
  12. Document Database
  13. Terminology (RDBMS ➜ MongoDB)
      • Table, View ➜ Collection
      • Row ➜ Document
      • Index ➜ Index
      • Join ➜ Embedded Document
      • Foreign Key ➜ Reference
      • Partition ➜ Shard
  14. Typical (relational) ERD
  15. MongoDB ERD
  16. Working with MongoDB
  17. Creating an author
          > db.author.insert({
              first_name: 'j.r.r.',
              last_name: 'tolkien',
              bio: 'J.R.R. Tolkien (1892.1973), beloved throughout the world as the creator of The Hobbit and The Lord of the Rings, was a professor of Anglo-Saxon at Oxford, a fellow of Pembroke College, and a fellow of Merton College until his retirement in 1959. His chief interest was the linguistic aspects of the early English written tradition, but even as he studied these classics he was creating a set of his own.'
            })
  18. Querying for our author
          > db.author.findOne( { last_name : 'tolkien' } )
          {
              "_id" : ObjectId("507ffbb1d94ccab2da652597"),
              "first_name" : "j.r.r.",
              "last_name" : "tolkien",
              "bio" : "J.R.R. Tolkien (1892.1973), beloved throughout the world as the creator of The Hobbit and The Lord of the Rings, was a professor of Anglo-Saxon at Oxford, a fellow of Pembroke College, and a fellow of Merton College until his retirement in 1959. His chief interest was the linguistic aspects of the early English written tradition, but even as he studied these classics he was creating a set of his own."
          }
  19. Creating a Book
          > db.books.insert({
              title: 'fellowship of the ring, the',
              author: ObjectId("507ffbb1d94ccab2da652597"),
              language: 'english',
              genre: ['fantasy', 'adventure'],
              publication: {
                  name: 'george allen & unwin',
                  location: 'London',
                  date: new Date('21 July 1954'),
              }
            })
      http://society6.com/PastaSoup/The-Fellowship-of-the-Ring-ZZc_Print/
  20. Multiple values per key
          > db.books.findOne({language: 'english'}, {genre: 1})
          {
              "_id" : ObjectId("50804391d94ccab2da652598"),
              "genre" : [ "fantasy", "adventure" ]
          }
  21. Querying for key with multiple values
          > db.books.findOne({genre: 'fantasy'}, {title: 1})
          {
              "_id" : ObjectId("50804391d94ccab2da652598"),
              "title" : "fellowship of the ring, the"
          }
      Query a key the same way whether it holds a single value or multiple values.
  22. Nested Values
          > db.books.findOne({}, {publication: 1})
          {
              "_id" : ObjectId("50804ec7d94ccab2da65259a"),
              "publication" : {
                  "name" : "george allen & unwin",
                  "location" : "London",
                  "date" : ISODate("1954-07-21T04:00:00Z")
              }
          }
  23. Reach into nested values using dot notation
          > db.books.findOne( {'publication.date' : { $lt : new Date('21 June 1960')} } )
          {
              "_id" : ObjectId("50804391d94ccab2da652598"),
              "title" : "fellowship of the ring, the",
              "author" : ObjectId("507ffbb1d94ccab2da652597"),
              "language" : "english",
              "genre" : [ "fantasy", "adventure" ],
              "publication" : {
                  "name" : "george allen & unwin",
                  "location" : "London",
                  "date" : ISODate("1954-07-21T04:00:00Z")
              }
          }
  24. Update books
          > db.books.update(
              {"_id" : ObjectId("50804391d94ccab2da652598")},
              { $set : { isbn: '0547928211', pages: 432 } }
            )
      True agile development: simply change how you work with the data and the database follows.
  25. The Updated Book record
          > db.books.findOne({"_id" : ObjectId("50804391d94ccab2da652598")})
          {
              "_id" : ObjectId("50804391d94ccab2da652598"),
              "author" : ObjectId("507ffbb1d94ccab2da652597"),
              "genre" : [ "fantasy", "adventure" ],
              "isbn" : "0547928211",
              "language" : "english",
              "pages" : 432,
              "publication" : {
                  "name" : "george allen & unwin",
                  "location" : "London",
                  "date" : ISODate("1954-07-21T04:00:00Z")
              },
              "title" : "fellowship of the ring, the"
          }
  26. Creating indexes
          > db.books.ensureIndex({title: 1})
          > db.books.ensureIndex({genre : 1})
          > db.books.ensureIndex({'publication.date': -1})
  27. Finding author by book
          > book = db.books.findOne( {"title" : "return of the king, the"})
          > db.author.findOne({_id: book.author})
          {
              "_id" : ObjectId("507ffbb1d94ccab2da652597"),
              "first_name" : "j.r.r.",
              "last_name" : "tolkien",
              "bio" : "J.R.R. Tolkien (1892.1973), beloved throughout the world as the creator of The Hobbit and The Lord of the Rings, was a professor of Anglo-Saxon at Oxford, a fellow of Pembroke College, and a fellow of Merton College until his retirement in 1959. His chief interest was the linguistic aspects of the early English written tradition, but even as he studied these classics he was creating a set of his own."
          }
  28. The Big Data Story
  29. Is actually two stories
  30. Doers & Tellers talking about different things
      http://www.slideshare.net/siliconangle/trendconnect-big-data-report-september
  31. Tellers
  32. Doers
  33. Doers talk a lot more about actual solutions
  34. They know it's a two-sided story
      • Storage
      • Processing
  35. Takeaways
      • MongoDB and Hadoop
      • MongoDB for storage & operations
      • Hadoop for processing & analytics
  36. MongoDB & Data Processing
  37. Applications have complex needs
      • MongoDB is an ideal operational database
      • MongoDB is ideal for BIG data
      • Not a data processing engine, but provides processing functionality
  38. Many options for Processing Data
      • Process in MongoDB using Map Reduce
      • Process in MongoDB using Aggregation Framework
      • Process outside MongoDB (using Hadoop)
  39. MongoDB Map Reduce
  40. MongoDB Map Reduce
      • MongoDB map reduce is quite capable... but with limits
        - JavaScript is not the best language for processing map reduce
        - JavaScript is limited in external data processing libraries
        - Adds load to the data store
  41. MongoDB Aggregation
      • Most uses of MongoDB Map Reduce were for aggregation
      • Aggregation Framework is optimized for aggregate queries
      • Real-time aggregation similar to SQL GROUP BY
  42. MongoDB & Hadoop
  43. DEMO
      • Install Hadoop MongoDB Plugin
      • Import tweets from Twitter
      • Write mapper
      • Write reducer
      • Call myself a data scientist
  44. Installing mongo-hadoop
      https://gist.github.com/1887726
          hadoop_version='0.23'
          hadoop_path="/usr/local/Cellar/hadoop/$hadoop_version.0/libexec/lib"
          git clone git://github.com/mongodb/mongo-hadoop.git
          cd mongo-hadoop
          # Point the build at the chosen Hadoop version (BSD/macOS sed syntax)
          sed -i '' "s/default/$hadoop_version/g" build.sbt
          cd streaming
          ./build.sh
  45. Grokking Twitter
          curl https://stream.twitter.com/1/statuses/sample.json -u<login>:<password> | mongoimport -d test -c live
      ... let it run for about 2 hours
  46. DEMO 1
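  47. Map Hashtags in Java
          import java.io.IOException;
          import org.apache.hadoop.io.IntWritable;
          import org.apache.hadoop.io.Text;
          import org.apache.hadoop.mapreduce.Mapper;
          import org.bson.BSONObject;
          import org.bson.types.BasicBSONList;

          // Emits (hashtag, 1) for every hashtag found in a tweet document.
          public class TwitterMapper extends Mapper<Object, BSONObject, Text, IntWritable> {
              @Override
              public void map( final Object pKey, final BSONObject pValue, final Context pContext )
                      throws IOException, InterruptedException {
                  // Skip tweets that carry no entities or no hashtags.
                  BSONObject entities = (BSONObject)pValue.get("entities");
                  if(entities == null) return;
                  BasicBSONList hashtags = (BasicBSONList)entities.get("hashtags");
                  if(hashtags == null) return;
                  for(Object o : hashtags){
                      String tag = (String)((BSONObject)o).get("text");
                      pContext.write( new Text( tag ), new IntWritable( 1 ) );
                  }
              }
          }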
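  48. Reduce hashtags in Java
          import java.io.IOException;
          import org.apache.hadoop.io.IntWritable;
          import org.apache.hadoop.io.Text;
          import org.apache.hadoop.mapreduce.Reducer;

          // Sums the counts emitted by the mapper for each hashtag.
          public class TwitterReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
              @Override
              public void reduce( final Text pKey, final Iterable<IntWritable> pValues, final Context pContext )
                      throws IOException, InterruptedException {
                  int count = 0;
                  for ( final IntWritable value : pValues ){
                      count += value.get();
                  }
                  pContext.write( pKey, new IntWritable( count ) );
              }
          }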
  49. All together
          #!/bin/bash
          # Arrays and declare are bash features, so run this with bash rather than plain sh.
          export HADOOP_HOME="/Users/mike/hadoop/hadoop-1.0.4"
          declare -a job_args
          cd ..
          job_args=("jar" "examples/twitter/target/twitter-example_*.jar")
          job_args=(${job_args[@]} "com.mongodb.hadoop.examples.twitter.TwitterConfig")
          job_args=(${job_args[@]} "-D" "mongo.job.verbose=true")
          job_args=(${job_args[@]} "-D" "mongo.job.background=false")
          job_args=(${job_args[@]} "-D" "mongo.input.key=")
          # Read tweets from test.live and write counts to test.twit_hashtags
          job_args=(${job_args[@]} "-D" "mongo.input.uri=mongodb://localhost:27017/test.live")
          job_args=(${job_args[@]} "-D" "mongo.output.uri=mongodb://localhost:27017/test.twit_hashtags")
          job_args=(${job_args[@]} "-D" "mongo.input.query=")
          job_args=(${job_args[@]} "-D" "mongo.job.mapper=com.mongodb.hadoop.examples.twitter.TwitterMapper")
          job_args=(${job_args[@]} "-D" "mongo.job.reducer=com.mongodb.hadoop.examples.twitter.TwitterReducer")
          job_args=(${job_args[@]} "-D" "mongo.job.input.format=com.mongodb.hadoop.MongoInputFormat")
          job_args=(${job_args[@]} "-D" "mongo.job.output.format=com.mongodb.hadoop.MongoOutputFormat")
          job_args=(${job_args[@]} "-D" "mongo.job.output.key=org.apache.hadoop.io.Text")
          job_args=(${job_args[@]} "-D" "mongo.job.output.value=org.apache.hadoop.io.IntWritable")
          job_args=(${job_args[@]} "-D" "mongo.job.mapper.output.key=org.apache.hadoop.io.Text")
          job_args=(${job_args[@]} "-D" "mongo.job.mapper.output.value=org.apache.hadoop.io.IntWritable")
          # The reducer doubles as the combiner
          job_args=(${job_args[@]} "-D" "mongo.job.combiner=com.mongodb.hadoop.examples.twitter.TwitterReducer")
          job_args=(${job_args[@]} "-D" "mongo.job.partitioner=")
          job_args=(${job_args[@]} "-D" "mongo.job.sort_comparator=")
          #echo "${job_args[@]}"
          $HADOOP_HOME/bin/hadoop "${job_args[@]}" "$1"
  50. Popular Hash Tags
          > db.twit_hashtags.find().sort( {'count' : -1 })
          { "_id" : "YouKnowYoureInLoveIf", "count" : 287 }
          { "_id" : "teamfollowback", "count" : 200 }
          { "_id" : "RT", "count" : 150 }
          { "_id" : "Arsenal", "count" : 148 }
          { "_id" : "milars", "count" : 145 }
          { "_id" : "sanremo", "count" : 145 }
          { "_id" : "LoseMyNumberIf", "count" : 139 }
          { "_id" : "RelationshipsShould", "count" : 137 }
          { "_id" : "oomf", "count" : 117 }
          { "_id" : "TeamFollowBack", "count" : 105 }
          { "_id" : "WhyDoPeopleThink", "count" : 102 }
          { "_id" : "np", "count" : 100 }
  51. DEMO 2
  52. Aggregation in Mongo 2.2
          > db.live.aggregate(
              { $unwind : "$entities.hashtags" },
              { $match : { "entities.hashtags.text" : { $exists : true } } },
              { $group : { _id : "$entities.hashtags.text", count : { $sum : 1 } } },
              { $sort : { count : -1 } },
              { $limit : 10 }
            )
  53. Popular Hash Tags
          > db.twit_hashtags.aggregate(a)
          {
              "result" : [
                  { "_id" : "YouKnowYoureInLoveIf", "count" : 287 },
                  { "_id" : "teamfollowback", "count" : 200 },
                  { "_id" : "RT", "count" : 150 },
                  { "_id" : "Arsenal", "count" : 148 },
                  { "_id" : "milars", "count" : 145 },
                  { "_id" : "sanremo", "count" : 145 },
                  { "_id" : "LoseMyNumberIf", "count" : 139 },
                  { "_id" : "RelationshipsShould", "count" : 137 }
              ],
              "ok" : 1
          }
  54. #MongoDB
      Questions?
      Steve Francia, Chief Evangelist, 10gen
      @spf13
      Spf13.com
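
The hashtag count in the two demos follows one pattern: emit one pair per hashtag, then sum per key. The sketch below restates that pattern in plain JavaScript so it can be read without a Hadoop or MongoDB installation. It is an illustration only, not the mongo-hadoop API: `mapTweet`, `reduceCounts`, `countHashtags` and the sample tweets are hypothetical names and data, and the tweet shape (`entities.hashtags[].text`) is assumed from the mapper slide.

```javascript
// Plain-JavaScript sketch of the hashtag count from the demo slides.
// No Hadoop or MongoDB is involved; the sample tweets below are made up.

// Map step: emit a [tag, 1] pair for every hashtag in a tweet,
// skipping tweets without entities or hashtags (as the Java mapper does).
function mapTweet(tweet) {
  const entities = tweet.entities;
  if (!entities || !Array.isArray(entities.hashtags)) return [];
  return entities.hashtags.map((h) => [h.text, 1]);
}

// Reduce step: sum the emitted values for one key (as the Java reducer does).
function reduceCounts(values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Driver: group emitted pairs by key, reduce each group,
// then sort by count descending and keep the top `limit` entries.
function countHashtags(tweets, limit = 10) {
  const groups = new Map();
  for (const tweet of tweets) {
    for (const [tag, one] of mapTweet(tweet)) {
      if (!groups.has(tag)) groups.set(tag, []);
      groups.get(tag).push(one);
    }
  }
  return [...groups.entries()]
    .map(([tag, values]) => ({ _id: tag, count: reduceCounts(values) }))
    .sort((a, b) => b.count - a.count)
    .slice(0, limit);
}

// Hypothetical sample data shaped like the Twitter stream documents.
const sampleTweets = [
  { entities: { hashtags: [{ text: "Arsenal" }, { text: "RT" }] } },
  { entities: { hashtags: [{ text: "Arsenal" }] } },
  { entities: { hashtags: [] } },
  { text: "no entities field" },
  { entities: { hashtags: [{ text: "arsenal" }] } },
];

const topTags = countHashtags(sampleTweets);
console.log(topTags);
```

Keys are compared exactly, so `Arsenal` and `arsenal` are counted separately here, which matches the slides, where `teamfollowback` and `TeamFollowBack` appear as distinct entries.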

Editor's notes

  • AGPL – GNU Affero General Public License
  • * Big endian and ARM not supported.
  • Kristine to update this graphic at some point
  • Powerful message here. Finally a database that enables rapid & agile development.
  • Creating a book here. A few things to make note of.
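
The "rapid & agile development" note refers to the update slide, where `$set` adds `isbn` and `pages` to an existing book without any schema change. A minimal sketch of that merge behaviour on a plain object may make the idea concrete. It is not MongoDB or driver code: `applySet` is a hypothetical helper that handles only top-level fields and dotted paths through plain nested objects, and the "Boston" value is made up for illustration.

```javascript
// Sketch of $set-style merging on a plain object; not MongoDB driver code.
// applySet is a hypothetical helper: it returns a copy of the document with
// each listed field set, creating nested objects for dotted paths such as
// "publication.location". Arrays and other edge cases are not handled.
function applySet(doc, fields) {
  const updated = { ...doc };
  for (const [path, value] of Object.entries(fields)) {
    const keys = path.split(".");
    let target = updated;
    for (const key of keys.slice(0, -1)) {
      // Copy (or create) each intermediate object so the input stays untouched.
      const child = target[key];
      const isPlainObject =
        child !== null && typeof child === "object" && !Array.isArray(child);
      target[key] = isPlainObject ? { ...child } : {};
      target = target[key];
    }
    target[keys[keys.length - 1]] = value;
  }
  return updated;
}

// A book shaped like the one in the slides (author and date omitted).
const book = {
  title: "fellowship of the ring, the",
  language: "english",
  genre: ["fantasy", "adventure"],
  publication: { name: "george allen & unwin", location: "London" },
};

// New fields appear on the document without declaring them anywhere first.
const updatedBook = applySet(book, { isbn: "0547928211", pages: 432 });
console.log(updatedBook);

// Dotted paths reach into nested values; "Boston" is an invented value.
const relocated = applySet(book, { "publication.location": "Boston" });
console.log(relocated.publication);
```

The helper returns a new object instead of mutating its input, which keeps the sketch easy to reason about; a real update changes the stored document in place.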
