NoSQL databases and managing big data

Talking about
What is BIG Data
NoSQL
MongoDB
Future of BIG Data

@spf13

AKA
Steve Francia
15+ years building
the internet

Father, husband,
skateboarder

Chief Solutions Architect @ 10gen,
responsible for drivers,
integrations, web & docs

Company behind MongoDB
Offices in NYC, Palo Alto, London & Dublin
100+ employees
Support, consulting, training
Management: Google/DoubleClick, Oracle, Apple, NetApp, MarkLogic

Well Funded: Sequoia, Union Square, Flybridge

2000
Google Inc
Today announced it has released
the largest search engine on the
Internet.

Google’s new index, comprising
more than 1 billion URLs

2008
Our indexing system for processing
links indicates that
we now count 1 trillion unique URLs

(and the number of individual web
pages out there is growing by
several billion pages per day).

[Chart: Data Growth, in millions of URLs, 2000 to 2008. Data labels in order: 1, 4, 10, 24, 55, 120, 250, 500, 1,000]

An unprecedented
amount of data is
being created and is
accessible

What good is it if
we can’t utilize this
data?

What is NoSQL?

Key/Value, Column, Graph, Document

Key-Value Stores
A mapping from a key to a value
The store doesn't know anything about the insides of the key or value
Operations:
• Set, get, or delete a key-value pair
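The three operations above can be sketched in a few lines. This is a minimal in-memory illustration, not how any particular product is built; the class name `KeyValueStore` and the key `"user:42"` are made up for the example. Note that the store never looks inside the value: the caller serializes and parses it.

```javascript
// Minimal in-memory key-value store (illustrative only).
// Keys and values are opaque to the store: it only maps one to the other.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  set(key, value) {
    this.data.set(key, value);
  }
  get(key) {
    return this.data.get(key); // undefined when the key is absent
  }
  delete(key) {
    return this.data.delete(key); // true if a pair was removed
  }
}

const store = new KeyValueStore();

// The caller decides how the value is encoded; here it is a JSON string.
store.set("user:42", JSON.stringify({ name: "Ada" }));

// To use any field, the caller must fetch the whole value and decode it.
const user = JSON.parse(store.get("user:42"));
```

Because the value is opaque, there is no way to ask the store for "all users named Ada" without reading every value.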

Column-Oriented Stores
Like a relational store, but flipped around: all data for a column is kept together
An index provides a means to get a column value for a record
Operations:
• Get, insert, and delete records; update fields
• Stream column data in and out of Hadoop
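A small sketch of the "flipped around" layout, assuming every record has the same fields. The sample rows and the helper names `toColumns` and `getValue` are hypothetical; real column stores add compression, column families, and on-disk formats that are not shown here.

```javascript
// Row-oriented layout: each record's fields are stored together.
const rows = [
  { id: 1, city: "New York", zip: "10011" },
  { id: 2, city: "London", zip: "EC2A" },
  { id: 3, city: "New York", zip: "10012" },
];

// Column-oriented layout: all values of one column are stored together.
function toColumns(records) {
  const columns = {};
  records.forEach((record, i) => {
    for (const [field, value] of Object.entries(record)) {
      if (!columns[field]) columns[field] = new Array(records.length).fill(null);
      columns[field][i] = value;
    }
  });
  return columns;
}

const columns = toColumns(rows);

// The array position acts as the index that ties a column value to its record.
function getValue(cols, recordIndex, field) {
  return cols[field][recordIndex];
}

// Scanning a single column touches only that column's array.
const newYorkCount = columns.city.filter((c) => c === "New York").length;
```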

Graph Databases
Stores vertex-to-vertex edges
Operations:
• Getting and setting edges
• Sometimes possible to annotate vertices or edges
Query languages support finding paths between vertices, subject to various constraints
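As a rough illustration of edges, annotations, and constrained path finding, here is a tiny in-memory graph with a breadth-first search. The class, the vertex names, and the `type` annotation are all invented for the example; graph databases expose this through their own query languages rather than code like this.

```javascript
// Tiny directed graph: vertex -> Map(neighbor -> edge annotation).
class Graph {
  constructor() {
    this.edges = new Map();
  }
  setEdge(from, to, annotation = {}) {
    if (!this.edges.has(from)) this.edges.set(from, new Map());
    this.edges.get(from).set(to, annotation);
  }
  getEdge(from, to) {
    const out = this.edges.get(from);
    return out ? out.get(to) : undefined;
  }
  // Breadth-first search for a shortest path, following only edges
  // whose annotation satisfies the `allow` predicate (the constraint).
  findPath(start, goal, allow = () => true) {
    const previous = new Map([[start, null]]);
    const queue = [start];
    while (queue.length > 0) {
      const vertex = queue.shift();
      if (vertex === goal) {
        const path = [];
        for (let v = goal; v !== null; v = previous.get(v)) path.unshift(v);
        return path;
      }
      for (const [next, annotation] of this.edges.get(vertex) || []) {
        if (!previous.has(next) && allow(annotation)) {
          previous.set(next, vertex);
          queue.push(next);
        }
      }
    }
    return null; // no path satisfies the constraint
  }
}

const g = new Graph();
g.setEdge("alice", "bob", { type: "follows" });
g.setEdge("bob", "carol", { type: "follows" });
g.setEdge("alice", "carol", { type: "blocks" });

const anyPath = g.findPath("alice", "carol");
const followsPath = g.findPath("alice", "carol", (e) => e.type === "follows");
```

Without a constraint the direct edge wins; restricting the search to `follows` edges forces the route through `bob`.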

Document Stores
The store is a container for documents
Documents are made up of named fields (think object/array/dict/hash...)
Can query on any document field(s)
Operations:
• Insert and delete documents
• Update fields within documents
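The difference from a key-value store is that the store can see inside the documents. A minimal sketch, assuming simple equality matching on top-level fields only; `DocumentStore` and its method names are hypothetical and are not MongoDB's API.

```javascript
// Minimal in-memory document store (illustrative only).
class DocumentStore {
  constructor() {
    this.docs = [];
    this.nextId = 1;
  }
  insert(doc) {
    const stored = { _id: this.nextId++, ...doc };
    this.docs.push(stored);
    return stored._id;
  }
  // Query on any field(s): every field in `criteria` must be equal.
  find(criteria) {
    return this.docs.filter((doc) =>
      Object.entries(criteria).every(([field, value]) => doc[field] === value)
    );
  }
  // Update fields within matching documents; returns how many matched.
  update(criteria, changes) {
    const matched = this.find(criteria);
    matched.forEach((doc) => Object.assign(doc, changes));
    return matched.length;
  }
  // Delete matching documents; returns how many were removed.
  remove(criteria) {
    const doomed = new Set(this.find(criteria));
    const before = this.docs.length;
    this.docs = this.docs.filter((doc) => !doomed.has(doc));
    return before - this.docs.length;
  }
}

const docStore = new DocumentStore();
docStore.insert({ name: "10gen HQ", city: "New York" });
docStore.insert({ name: "Branch", city: "London" });

const updated = docStore.update({ city: "London" }, { name: "London Office" });
const londonDocs = docStore.find({ city: "London" });
const removed = docStore.remove({ city: "New York" });
```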

Five data stores compared, MySQL in the last column
Data Model:   Columns | Key:Value | Columns | Documents | Relational
Consistency:  Eventual / Quorum | Eventual / Quorum | Strong | Strong | Strong
Availability: Multi-Master | Multi-Master | Single Master | Single Master | Single Master
Partitioning: Hash | Hash | Range | Range or Hash | N/A
Query:        Thrift, CQL | Native Drivers (6) | Rest, Thrift | Native Drivers (12) | SQL

What do we want in
an ideal world?
• Horizontal scaling
• Cloud compatible
• Works with standard servers
• Fast
• Development is easy
• Features
• The right data model
• Schema agility

MongoDB philosophy
Keep functionality when we can (key/value
stores are great, but we need more)
Non-relational (no joins) makes scaling
horizontally practical
Document data models are good
Database technology should run anywhere:
virtualized, cloud, bare metal, etc.

Under the hood
Written in C++
Runs nearly everywhere
Data serialized to BSON
Extensive use of memory-mapped files,
i.e. read-through, write-through
memory caching
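To give a feel for what "serialized to BSON" means, here is a deliberately tiny encoder that handles only flat documents whose values are all strings. The function name `encodeStringDocument` is made up; real drivers ship a full BSON library covering every type, so treat this purely as a sketch of the byte layout (a little-endian length header, typed elements, and a terminating zero byte). It assumes Node.js for `Buffer`.

```javascript
// Encodes a flat document whose values are all strings into BSON bytes.
// Sketch only: no other BSON types, no nesting, no validation.
function encodeStringDocument(doc) {
  const parts = [];
  for (const [key, value] of Object.entries(doc)) {
    const valueBytes = Buffer.from(value, "utf8");
    const length = Buffer.alloc(4);
    length.writeInt32LE(valueBytes.length + 1); // string length counts the trailing NUL
    parts.push(
      Buffer.from([0x02]),             // element type 0x02 = UTF-8 string
      Buffer.from(key + "\0", "utf8"), // field name as a NUL-terminated string
      length,
      valueBytes,
      Buffer.from([0x00])              // NUL terminating the string value
    );
  }
  const body = Buffer.concat(parts);
  const header = Buffer.alloc(4);
  header.writeInt32LE(4 + body.length + 1); // total size: header + elements + terminator
  return Buffer.concat([header, body, Buffer.from([0x00])]);
}

const bytes = encodeStringDocument({ hello: "world" });
console.log(bytes.toString("hex"));
```

The length prefixes are what let a reader skip over fields it does not need without parsing them.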

Database Landscape
[Chart: Scalability & Performance plotted against Depth of Functionality, with Memcached, MongoDB and RDBMS placed on it]

"MongoDB has the best features of key/value stores, document databases and relational databases in one."
John Nunemaker

Relational made normalized
data look like this
Category
• Name
• Url

Article
• Name
• Slug
• Publish date
• Text

User
• Name
• Email Address

Tag
• Name
• Url

Comment
• Comment
• Date
• Author
Document databases make
normalized data look like this
Article
• Name
• Slug
• Publish date
• Text
• Author

Comment[]
• Comment
• Date
• Author

Tag[]
• Value

Category[]
• Value

User
• Name
• Email Address
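As a concrete reading of the document layout above, an article might look like the object below. All values are invented for illustration, and storing the author as a plain user name is an assumption about how Article and User are linked; the field names follow the slide.

```javascript
// One article document: comments, tags and categories are embedded arrays
// instead of rows in separate tables. All values here are made up.
const article = {
  name: "Hello MongoDB",
  slug: "hello-mongodb",
  publishDate: new Date("2012-01-15"),
  text: "...",
  author: "steve", // assumed link to a User document, kept separately
  comments: [
    { comment: "Nice post", date: new Date("2012-01-16"), author: "fred" },
  ],
  tags: ["tech", "database"],
  categories: ["news"],
};

// Users stay in their own collection.
const author = { name: "steve", emailAddress: "steve@example.com" };

// One read of the article brings back everything needed to render the page.
const commentAuthors = article.comments.map((c) => c.author);
```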

Start with an object
(or array, hash, dict, etc.)

place1 = {
    name : "10gen HQ",
    address : "578 Broadway 7th Floor",
    city : "New York",
    zip : "10011",
    tags : [ "business", "awesome" ]
}

Inserting the record
Initial Data Load

> db.places.insert(place1)

Querying
{
    name : "10gen HQ",
    address : "578 Broadway 7th Floor",
    city : "New York",
    zip : "10011",
    tags : [ "business", "awesome" ]
}

> db.places.findOne({ zip: "10011", tags: "awesome" })

> db.places.find({ tags: "business" })
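Both queries match on `tags` even though `tags` is an array: an equality condition on an array field matches when the array contains the value. The plain-JavaScript sketch below imitates just that behaviour so it can run without a server. The `matches`, `findOne` and `find` helpers and the second place are hypothetical; the real query language also supports operators, nested fields and indexes, none of which are modelled here.

```javascript
// Returns true when every field in the query matches the document.
// A scalar query value matches an array field if the array contains it.
function matches(doc, query) {
  return Object.entries(query).every(([field, expected]) => {
    const actual = doc[field];
    return Array.isArray(actual) ? actual.includes(expected) : actual === expected;
  });
}

const places = [
  {
    name: "10gen HQ",
    address: "578 Broadway 7th Floor",
    city: "New York",
    zip: "10011",
    tags: ["business", "awesome"],
  },
  // Hypothetical second place, added only to show the difference between the queries.
  { name: "Corner Cafe", city: "New York", zip: "10012", tags: ["business"] },
];

// First matching document, or null when nothing matches.
const findOne = (query) => places.find((doc) => matches(doc, query)) || null;
// All matching documents.
const find = (query) => places.filter((doc) => matches(doc, query));

const awesomePlace = findOne({ zip: "10011", tags: "awesome" });
const businessPlaces = find({ tags: "business" });
```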

Nested Documents
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  name : "10gen HQ",
  address : "578 Broadway 7th Floor",
  city : "New York",
  zip : "10011",
  tags : [ "business", "awesome" ],
  tips : [{
      author : "Fred",
      date : "Sat Apr 25 2010 20:51:03",
      text : "Best Place Ever!"
  }]
}

Updating
> db.places.update(
    { name : "10gen HQ" },
    { $push :
        { tips :
            { author : "nosh",
              date : new Date("2011-06-26"),  // stored as a date value
              text : "Office hours are great!"
            }
        }
    }
  )
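The effect of `$push` on the matched document can be imitated in plain JavaScript: append the value to the named array, creating the array if the field is missing. `applyPush` is a hypothetical helper written for this illustration; the real operator runs inside the server, atomically on a single document.

```javascript
// Imitates what a $push specification does to one document (sketch only).
function applyPush(doc, pushSpec) {
  for (const [field, value] of Object.entries(pushSpec)) {
    if (doc[field] === undefined) doc[field] = []; // missing field becomes an array
    if (!Array.isArray(doc[field])) {
      throw new Error(`Cannot push to non-array field: ${field}`);
    }
    doc[field].push(value);
  }
  return doc;
}

const place = {
  name: "10gen HQ",
  tips: [{ author: "Fred", text: "Best Place Ever!" }],
};

applyPush(place, {
  tips: {
    author: "nosh",
    date: new Date("2011-06-26"),
    text: "Office hours are great!",
  },
});
```

The new tip lands inside the existing document, so no separate tips table or join is involved.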

CMS / Blog
Needs:
• Business needed a modern data store for rapid development and scale

Solution:
• Use PHP & MongoDB

Results:
• Real-time statistics
• All data, images, etc. stored together: easy access, easy deployment, easy high availability
• No need for complex migrations
• Enabled very rapid development and growth

Photo Meta-Data
Problem:
• Business needed more flexibility than Oracle could deliver

Solution:
• Use MongoDB instead of Oracle

Results:
• Developed application in one sprint cycle
• 500% cost reduction compared to Oracle
• 900% performance improvement compared to Oracle

Customer Analytics
Problem:
• Deal with massive data volume across all customer sites

Solution:
• Use MongoDB to replace Google Analytics / Omniture options

Results:
• Less than one week to build prototype and prove business case
• Rapid deployment of new features

Archiving
Why MongoDB:
• Existing application built on MySQL
• Lots of friction with RDBMS based archive storage
• Needed more scalable archive storage backend
Solution:
• Keep MySQL for active data (100 million)
• MongoDB for archive (2+ billion)
Results:
• No more ALTER TABLE statements taking over 2 months to run
• Sharding enabled horizontal scale
• Very happily looking at other places to use MongoDB

Online Dictionary
Problem:
• MySQL could not scale to handle their 5B+ documents

Solution:
• Switched from MySQL to MongoDB

Results:
• Massive simplification of code base
• Eliminated need for external caching system
• 20x performance improvement over MySQL

E-commerce
Problem:
• Multi-vertical e-commerce impossible to model (efficiently) in an RDBMS

Solution:
• Switched from MySQL to MongoDB

Results:
• Massive simplification of code base
• Faster builds, halving time to market (and cost)
• Eliminated need for external caching system
• 50x+ performance improvement over MySQL

Tons more
MongoDB casts a wide net

people keep coming up with
new and brilliant ways to use it

In Good Company

and 1000s more

What is BIG?
BIG today is
normal tomorrow

[Chart: Data Growth, in millions of URLs, 2000 to 2011. Data labels in order: 1, 4, 10, 24, 55, 120, 250, 500, 1,000, 2,150, 4,400, 9,000]

2012
Generating over
250 million
tweets per day

MongoDB enables
us to scale with
the redefinition
of BIG.

MongoDB
High Performance
Easy Development
Horizontally Scalable
{ author : "steve",
  date : new Date(),
  text : "About MongoDB...",
  tags : ["tech", "database"] }

http://spf13.com
http://github.com/s
@spf13

Questions?
Download at mongodb.org
We’re hiring!! Contact us at jobs@10gen.com
