An unprecedented amount of data is being created and made accessible. This presentation shows how to use the new NoSQL technologies to make sense of all this data.
3. @spf13
AKA
Steve Francia
15+ years building the internet
Father, husband, skateboarder
Chief Solutions Architect @
responsible for drivers, integrations, web & docs
4. Company behind MongoDB
Offices in NYC, Palo Alto, London & Dublin
100+ employees
Support, consulting, training
Management: Google/DoubleClick, Oracle, Apple, NetApp, MarkLogic
Well Funded: Sequoia, Union Square, Flybridge
6. 2000
Google Inc
Today announced it has released the largest search engine on the Internet.
Google’s new index, comprising more than 1 billion URLs
7. 2008
Our indexing system for processing links indicates that we now count 1 trillion unique URLs (and the number of individual web pages out there is growing by several billion pages per day).
8. Data Growth
[Chart: millions of URLs per year, 2000 to 2008. Data labels, in ascending order: 1, 4, 10, 24, 55, 120, 250, 500, 1,000]
13. Key-Value Stores
A mapping from a key to a value
The store doesn't know anything about the key or value
The store doesn't know anything about the insides of the value
Operations:
•Set, get, or delete a key-value pair
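The operations above can be sketched in a few lines. This is a minimal illustration, assuming an in-memory Python dict stands in for the store; the class name `KeyValueStore` and its methods are hypothetical and do not correspond to any particular product's API. The point is that the store files opaque bytes under a key and never looks inside them.

```python
from typing import Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store (hypothetical); values are opaque bytes."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def set(self, key: str, value: bytes) -> None:
        # The store never inspects the value; it only files it under the key.
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Returns None when the key is absent.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Returns True if a pair was removed, False if the key was missing.
        return self._data.pop(key, None) is not None


store = KeyValueStore()
store.set("user:1", b'{"name": "steve"}')
value = store.get("user:1")  # the caller, not the store, decodes the bytes
```

Anything richer than lookup by key (for example "find users named steve") has to be built by the application, which is the limitation the other store types address.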
14. Column-Oriented Stores
Like a relational store, but flipped around: all data for a column is kept together
An index provides a means to get a column value for a record
Operations:
•Get, insert, delete records; update fields
Streaming column data in and out of Hadoop
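A rough sketch of the "flipped around" layout, assuming a toy in-memory model: each column is one Python list, and an index maps a record id to its position in every list. The class `ColumnStore` and its method names are hypothetical; real column stores add compression, on-disk layout, and deletes, none of which is shown here.

```python
from typing import Any, Dict, List


class ColumnStore:
    """Toy column-oriented store (hypothetical): one list per column."""

    def __init__(self, columns: List[str]) -> None:
        self._columns: Dict[str, List[Any]] = {name: [] for name in columns}
        self._index: Dict[str, int] = {}  # record id -> slot in every column

    def insert(self, record_id: str, record: Dict[str, Any]) -> None:
        if record_id in self._index:
            raise KeyError(f"record {record_id!r} already exists")
        self._index[record_id] = len(self._index)
        for name, values in self._columns.items():
            # Missing fields are stored as None so all columns stay aligned.
            values.append(record.get(name))

    def get_value(self, record_id: str, column: str) -> Any:
        # The index turns a record id into a position within the column.
        return self._columns[column][self._index[record_id]]

    def update(self, record_id: str, column: str, value: Any) -> None:
        self._columns[column][self._index[record_id]] = value

    def scan(self, column: str) -> List[Any]:
        # All values of one column sit together, so a scan reads one list.
        return list(self._columns[column])


people = ColumnStore(["name", "age"])
people.insert("r1", {"name": "ann", "age": 30})
people.insert("r2", {"name": "bob", "age": 25})
ages = people.scan("age")
```

Reading a whole column touches a single contiguous list, which is why this layout suits bulk streaming of column data.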
15. Graph Databases
Stores vertex-to-vertex edges
Operations:
•Getting and setting edges
•Sometimes possible to annotate vertices or edges
Query languages support finding paths between vertices, subject to various constraints
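As an illustration of edges, annotations, and constrained path finding, here is a minimal sketch using a breadth-first search over an in-memory adjacency map. The `Graph` class, its method names, and the `kind` annotation are hypothetical; real graph databases expose this through their own query languages.

```python
from collections import deque
from typing import Any, Callable, Dict, List, Optional


class Graph:
    """Toy directed graph (hypothetical) with annotated edges."""

    def __init__(self) -> None:
        self._edges: Dict[str, Dict[str, Dict[str, Any]]] = {}

    def set_edge(self, src: str, dst: str, **annotations: Any) -> None:
        self._edges.setdefault(src, {})[dst] = annotations

    def get_edge(self, src: str, dst: str) -> Optional[Dict[str, Any]]:
        return self._edges.get(src, {}).get(dst)

    def find_path(
        self,
        start: str,
        goal: str,
        allow: Callable[[Dict[str, Any]], bool] = lambda annotations: True,
    ) -> Optional[List[str]]:
        # Breadth-first search that follows only edges accepted by `allow`.
        queue = deque([[start]])
        seen = {start}
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                return path
            for nxt, annotations in self._edges.get(path[-1], {}).items():
                if nxt not in seen and allow(annotations):
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None


g = Graph()
g.set_edge("a", "b", kind="follows")
g.set_edge("b", "c", kind="follows")
g.set_edge("a", "c", kind="blocks")
any_path = g.find_path("a", "c")
follows_only = g.find_path("a", "c", allow=lambda e: e.get("kind") == "follows")
```

The constraint changes the answer: without it the direct edge is used, with it the search has to route through the intermediate vertex.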
16. Document Stores
The store is a container for documents
Documents are made up of named fields (think object/array/dict/hash...)
Can query on any document field(s)
Operations:
•Insert and delete documents
•Update fields within documents
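To make the contrast with key-value stores concrete, here is a minimal sketch of a document store, assuming documents are plain Python dicts held in memory. The class `DocumentStore` and its methods are hypothetical and are not MongoDB's API; matching is by simple equality on top-level fields only.

```python
import itertools
from typing import Any, Dict, List


class DocumentStore:
    """Toy in-memory document store (hypothetical)."""

    def __init__(self) -> None:
        self._docs: Dict[int, Dict[str, Any]] = {}
        self._ids = itertools.count(1)

    def insert(self, doc: Dict[str, Any]) -> int:
        doc_id = next(self._ids)
        self._docs[doc_id] = dict(doc)  # shallow copy of the caller's dict
        return doc_id

    def delete(self, doc_id: int) -> bool:
        return self._docs.pop(doc_id, None) is not None

    def update(self, doc_id: int, fields: Dict[str, Any]) -> None:
        # Changes only the named fields; the rest of the document is kept.
        self._docs[doc_id].update(fields)

    def find(self, **criteria: Any) -> List[Dict[str, Any]]:
        # Unlike a key-value store, the store looks inside each document.
        return [
            doc
            for doc in self._docs.values()
            if all(doc.get(k) == v for k, v in criteria.items())
        ]


posts = DocumentStore()
first = posts.insert({"author": "steve", "text": "About MongoDB..."})
second = posts.insert({"author": "ann", "text": "Hello"})
posts.update(first, {"text": "About MongoDB, revised"})
by_steve = posts.find(author="steve")
```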
17. MySQL
Data Model    Columns            Key:Value           Columns        Documents            Relational
Consistency   Eventual / Quorum  Eventual / Quorum   Strong         Strong               Strong
Availability  Multi-Master       Multi-Master        Single Master  Single Master        Single Master
Partitioning  Hash               Hash                Range          Range or Hash        N/A
Query         Thrift, CQL        Native Drivers (6)  REST, Thrift   Native Drivers (12)  SQL
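The two partitioning schemes named in the comparison can be sketched as follows. This is a generic illustration, not how any of the compared systems implement it; the function names and the choice of MD5 as the hash are assumptions made for the example.

```python
import bisect
import hashlib
from typing import List


def hash_partition(key: str, num_partitions: int) -> int:
    """Spread keys evenly: neighbouring keys usually land on different partitions."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def range_partition(key: str, split_points: List[str]) -> int:
    """Keep neighbouring keys together.

    `split_points` must be sorted. Keys below the first split point go to
    partition 0; a key equal to a split point goes to the partition on its right.
    """
    return bisect.bisect_right(split_points, key)


splits = ["h", "p"]  # three partitions: [..h), [h..p), [p..]
placements = [range_partition(k, splits) for k in ["apple", "hat", "zebra"]]
```

Hash partitioning balances load but scatters range scans across partitions; range partitioning keeps scans local at the risk of hot spots on popular key ranges.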
20. What do we want in an ideal world?
•Horizontal scaling
•cloud compatible
•works with standard servers
•Fast
•Development is easy
•Features
•The Right Data Model
•Schema Agility
21. MongoDB philosophy
Keep functionality when we can (key/value stores are great, but we need more)
Non-relational (no joins) makes scaling horizontally practical
Document data models are good
Database technology should run anywhere: virtualized, cloud, metal, etc.
22. Under the hood
Written in C++
Runs nearly everywhere
Data serialized to BSON
Extensive use of memory-mapped files, i.e. read-through, write-through memory caching.
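For readers unfamiliar with memory-mapped files, the idea can be shown with Python's standard `mmap` module. This is only an illustration of the operating-system feature, not MongoDB's C++ implementation; the file name and contents are made up for the example.

```python
import mmap
import os
import tempfile

# Create a small file to map; its size fixes the size of the mapping.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(b"hello world")

with open(path, "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mapped:
        first = bytes(mapped[:5])  # read through: served from the page cache
        mapped[:5] = b"HELLO"      # write through: modifies the mapped pages
        mapped.flush()             # ask the OS to write dirty pages to disk

with open(path, "rb") as f:
    on_disk = f.read()
```

The program treats the file as ordinary memory and leaves caching and write-back to the operating system.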
24. “MongoDB has the best features of key/value stores, document databases and relational databases in one.”
John Nunemaker
25. Relational made normalized data look like this
Category
• Name
• Url
User
• Name
• Email Address
Article
• Name
• Slug
• Publish date
• Text
Tag
• Name
• Url
Comment
• Comment
• Date
• Author
26. Document databases make normalized data look like this
User
• Name
• Email Address
Article
• Name
• Slug
• Publish date
• Text
• Author
Comment[]
• Comment
• Date
• Author
Tag[]
• Value
Category[]
• Value
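The document shape above might look like this as a Python dict. All field values are invented sample data, and the snake_case field names are an assumption; the slide only names the fields. Comments, tags, and categories are embedded in the article rather than stored in separate tables.

```python
from typing import Any, Dict

# Users stay in their own collection; the article refers to its author by name.
user: Dict[str, Any] = {"name": "steve", "email_address": "steve@example.com"}

article: Dict[str, Any] = {
    "name": "About MongoDB",
    "slug": "about-mongodb",
    "publish_date": "2012-01-01",  # sample value
    "text": "About MongoDB...",
    "author": user["name"],
    # Embedded arrays replace the Comment, Tag and Category join tables.
    "comments": [
        {"comment": "Nice post", "date": "2012-01-02", "author": "ann"},
        {"comment": "Thanks", "date": "2012-01-03", "author": "steve"},
    ],
    "tags": ["tech", "database"],
    "categories": ["nosql"],
}

# One lookup of the article already carries everything needed to render it.
comment_authors = [c["author"] for c in article["comments"]]
```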
34. CMS / Blog
Needs:
• Business needed modern data store for rapid development and scale
Solution:
• Use PHP & MongoDB
Results:
• Real time statistics
• All data, images, etc. stored together: easy access, easy deployment, easy high availability
• No need for complex migrations
• Enabled very rapid development and growth
35. Photo Meta-Data
Problem:
• Business needed more flexibility than Oracle could deliver
Solution:
• Use MongoDB instead of Oracle
Results:
• Developed application in one sprint cycle
• 500% cost reduction compared to Oracle
• 900% performance improvement compared to Oracle
36. Customer Analytics
Problem:
• Deal with massive data volume across all customer sites
Solution:
• Use MongoDB to replace Google Analytics / Omniture options
Results:
• Less than one week to build prototype and prove business case
• Rapid deployment of new features
37. Archiving
Why MongoDB:
• Existing application built on MySQL
• Lots of friction with RDBMS based archive storage
• Needed more scalable archive storage backend
Solution:
• Keep MySQL for active data (100 million)
• MongoDB for archive (2+ billion)
Results:
• No more ALTER TABLE statements taking over 2 months to run
• Sharding enabled horizontal scale
• Very happily looking at other places to use MongoDB
38. Online Dictionary
Problem:
• MySQL could not scale to handle their 5B+ documents
Solution:
• Switched from MySQL to MongoDB
Results:
• Massive simplification of code base
• Eliminated need for external caching system
• 20x performance improvement over MySQL
39. E-commerce
Problem:
• Multi-vertical E-commerce impossible to model (efficiently) in RDBMS
Solution:
• Switched from MySQL to MongoDB
Results:
• Massive simplification of code base
• Rapid builds, halving time to market (and cost)
• Eliminated need for external caching system
• 50x+ performance improvement over MySQL
40. Tons more
MongoDB casts a wide net
People keep coming up with new and brilliant ways to use it
48. MongoDB
High Performance
Easy Development
{ author : "steve",
  date : new Date(),
  text : "About MongoDB...",
  tags : ["tech", "database"]}
Horizontally Scalable
49. http://spf13.com
http://github.com/s
@spf13
Question
download at mongodb.org
We’re hiring!! Contact us at jobs@10gen.com
By reducing the transactional semantics the db provides, one can still solve an interesting set of problems where performance is very important, and horizontal scaling then becomes easier.
One site is generating nearly as many URLs as the entire internet 6 years ago.