3. Here is a “simple” SQL Model
mysql> select * from book;
+----+----------------------------------------------------------+
| id | title                                                    |
+----+----------------------------------------------------------+
|  1 | The Demon-Haunted World: Science as a Candle in the Dark |
|  2 | Cosmos                                                   |
|  3 | Programming in Scala                                     |
+----+----------------------------------------------------------+
3 rows in set (0.00 sec)
mysql> select * from bookauthor;
+---------+-----------+
| book_id | author_id |
+---------+-----------+
|       1 |         1 |
|       2 |         1 |
|       3 |         2 |
|       3 |         3 |
|       3 |         4 |
+---------+-----------+
5 rows in set (0.00 sec)
mysql> select * from author;
+----+-----------+------------+-------------+-------------+---------------+
| id | last_name | first_name | middle_name | nationality | year_of_birth |
+----+-----------+------------+-------------+-------------+---------------+
|  1 | Sagan     | Carl       | Edward      | NULL        |          1934 |
|  2 | Odersky   | Martin     | NULL        | DE          |          1958 |
|  3 | Spoon     | Lex        | NULL        | NULL        |          NULL |
|  4 | Venners   | Bill       | NULL        | NULL        |          NULL |
+----+-----------+------------+-------------+-------------+---------------+
4 rows in set (0.00 sec)
4. The Same Data in MongoDB
{
    "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
    "title" : "Programming in Scala",
    "author" : [
        {
            "first_name" : "Martin",
            "last_name" : "Odersky",
            "nationality" : "DE",
            "year_of_birth" : 1958
        },
        {
            "first_name" : "Lex",
            "last_name" : "Spoon"
        },
        {
            "first_name" : "Bill",
            "last_name" : "Venners"
        }
    ]
}
5. Cursors
$gt, $lt, $gte, $lte, $ne, $all, $in, $nin, $or,
$not, $mod, $size, $exists, $type, $elemMatch
> var c = db.test.find({x: 20}).skip(20).limit(10)
> c.next()
> c.next()
...
Client/server exchange:
query                 →  first N results + cursor id
getMore w/ cursor id  →  next N results + cursor id (or 0 when exhausted)
...
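The exchange above can be sketched as a client-side loop. This is a minimal simulation in plain JavaScript, not the actual wire protocol; the in-memory `queryServer` object and batch size are assumptions for illustration only:

```javascript
// Minimal simulation of the cursor protocol: the first query returns a
// batch plus a cursor id; getMore returns further batches until the id is 0.
function queryServer(data, batchSize) {
  let pos = 0;
  return {
    // query: first N results + cursor id (0 means exhausted)
    query() {
      const batch = data.slice(pos, pos + batchSize);
      pos += batch.length;
      return { batch, cursorId: pos < data.length ? 1 : 0 };
    },
    // getMore w/ cursor id: next N results + cursor id or 0
    getMore() {
      return this.query();
    }
  };
}

function fetchAll(data, batchSize) {
  const server = queryServer(data, batchSize);
  const results = [];
  let { batch, cursorId } = server.query();
  results.push(...batch);
  while (cursorId !== 0) {
    ({ batch, cursorId } = server.getMore());
    results.push(...batch);
  }
  return results;
}
```

The point of the loop is that the client never holds more than one batch in flight; the driver repeats `getMore` transparently when you call `c.next()` past the end of the current batch.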
6. Creating Indexes
An index on _id is automatic.
For additional indexes, use ensureIndex:
db.blogs.ensureIndex({author: 1})
1 = ascending
-1 = descending
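Conceptually, the direction flag just fixes the sort order of the index entries. A simplified sketch of that idea (an in-memory simulation, not MongoDB's B-tree implementation; `buildIndex` is a hypothetical helper):

```javascript
// Simplified model of an index: an ordered list of documents by one field.
// direction 1 sorts ascending, -1 descending, mirroring ensureIndex({f: 1}).
function buildIndex(docs, field, direction) {
  return docs
    .slice() // leave the original collection order untouched
    .sort((a, b) =>
      (a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0) * direction
    );
}
```

For a single-field index the direction rarely matters (the server can walk the index either way); it matters for compound indexes, where the combination of directions must match your sort.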
23. Disk configurations
Single Disk
~200 seeks / second
RAID 0 (3 stripes)
~200 seeks / second per stripe
RAID 10 (3 mirrored stripes)
~400 seeks / second per mirrored pair
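The aggregate numbers behind the slide work out as simple multiplication. A back-of-the-envelope sketch (the ~200 seeks/second per spinning disk comes from the slide; the parallelism assumptions are spelled out in the comments):

```javascript
// Back-of-the-envelope aggregate seek rates for the three layouts.
const seeksPerDisk = 200; // standard enterprise spinning disk

// One disk: one head, ~200 seeks/second.
const singleDisk = seeksPerDisk;

// RAID 0 with 3 stripes: reads spread across 3 independent disks.
const raid0 = 3 * seeksPerDisk; // 600

// RAID 10 with 3 mirrored stripes: each stripe has 2 copies, and a read
// can be served by either mirror, so reads scale to 6 spindles.
const raid10 = 3 * 2 * seeksPerDisk; // 1200
```

Writes do not scale the same way under RAID 10, since every update must land on both mirrors of its stripe.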
24. Basic Tips
• Favor high-memory instances over adding CPU cores
• Use 64-bit instances
• Use the XFS or EXT4 file system
• Use EBS volumes in RAID: RAID 0 or 10 for the data volume, RAID 1 for configdb
25. Basic Installation Steps
1. Create your EC2 instance
2. Attach EBS storage
3. Make an EXT4 file system
$ sudo mkfs -t ext4 /dev/[connection to volume]
4. Make a data directory
$ sudo mkdir -p /data/db
5. Mount the volume
$ sudo mount /dev/sdf /data/db
6. Install MongoDB
$ curl http://[mongodb download site] > m.tgz
$ tar xzf m.tgz
7. Start mongod
$ ./mongod
26. Types of outage
• Planned
  • Hardware upgrade
  • O/S or file-system tuning
  • Relocation of data to new file-system / storage
  • Software upgrade
• Unplanned
  • Hardware failure
  • Data center failure
  • Region outage
  • Human error
  • Application corruption
28. How MongoDB Replication works
[Diagram: three-member replica set; one member is PRIMARY]
• Election establishes the PRIMARY
• Data replication from PRIMARY to SECONDARY
29. How MongoDB Replication works
[Diagram: PRIMARY down; remaining members negotiate a new master]
• PRIMARY may fail
• Automatic election of a new PRIMARY if a majority exists
30. How MongoDB Replication works
[Diagram: failed member down; a remaining member is the new PRIMARY]
• New PRIMARY elected
• Replica set re-established
31. How MongoDB Replication works
[Diagram: failed member rejoins in RECOVERING state]
• Automatic recovery
32. How MongoDB Replication works
[Diagram: all three members healthy; replication restored]
• Replica set re-established
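The election behavior in the slides above hinges on one rule: a new PRIMARY can only be elected while a strict majority of the set's members is reachable. A minimal sketch of that rule (not MongoDB's actual election code):

```javascript
// A replica set of n members can elect a PRIMARY only if the reachable
// members form a strict majority (more than half of the configured set).
function canElectPrimary(totalMembers, reachableMembers) {
  return reachableMembers > totalMembers / 2;
}
```

With three members, losing one still leaves a majority (`canElectPrimary(3, 2)` is true), so failover is automatic. A two-member set split by a network failure leaves one node on each side (`canElectPrimary(2, 1)` is false), which is why the two-node set on slide 33 goes read-only.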
33. Replica Set 0
• Two nodes?
• A network failure can split the nodes; with no majority on either side, the whole system goes read-only
34. Replica Set 1
• Single datacenter
• Single switch & power
• Points of failure:
  • Power
  • Network
  • Datacenter
  • Two-node failure
• Automatic recovery from a single-node crash
35. Replica Set 3
[Diagram: members spread across AZ:1, AZ:2, AZ:3]
• Single datacenter
• Multiple power/network zones
• Points of failure:
  • Datacenter
  • Two-node failure
• Automatic recovery from a single-node crash
37. Replica Set 4
• Multi datacenter
• DR node for safety
• Can't do a durable multi-data-center write safely, since there is only 1 node in the distant DC
38. Replica Set 5
• Three data centers
• Can survive full data center loss
• Can do w = { dc : 2 } to guarantee a write in 2 data centers
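In the mongo shell, this kind of data-center guarantee is configured through member tags plus a getLastErrorModes entry in the replica set settings; the hostnames and tag values below are hypothetical, sketched for a three-DC set like the one on this slide:

```javascript
// Hypothetical 3-DC replica set: tag each member with its data center,
// then define a "dc" error mode requiring acks from 2 distinct dc values.
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "dc1-node:27017", tags: { dc: "dc1" } },
    { _id: 1, host: "dc2-node:27017", tags: { dc: "dc2" } },
    { _id: 2, host: "dc3-node:27017", tags: { dc: "dc3" } }
  ],
  settings: {
    // "dc" mode: wait until the write exists in 2 distinct data centers
    getLastErrorModes: { dc: { dc: 2 } }
  }
});

// A write then waits on that named mode:
db.runCommand({ getLastError: 1, w: "dc" });
```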
42. Sharding Across AZs
• Each shard is made up of a replica set
• Each replica set is distributed across availability zones for HA and data protection
[Diagram: shard members spread across AZ:1, AZ:2, AZ:3]
53. download at mongodb.org
We’re Hiring!
Chris Harris
Email : charris@10gen.com
Twitter : cj_harris5
conferences, appearances, and meetups
http://www.10gen.com/events
Editor's notes
The things I’m going to talk about are completely inter-related and intertwined. There will be talks that go into much greater detail on these topics. Armed with the information you gather, and confident in the skills your team has practiced, you should be able to spot long-term problems well before it’s too late and handle the emergencies that are sure to arise.
Add a second process just to illustrate what happens when you have more than one process contending for RAM.
Since we’re talking about data stores, specifically MongoDB, before you do anything else at all, you need to understand your data.
How big is your data set in total?
How big is your working set? That is, the size of the data and indexes that need to fit in RAM.
Reads vs. writes? (example and use case)
Long tail or random access? (example)
Armed with this knowledge, you can accommodate massive growth spurts without excessive over-provisioning.
Random access: take a user database. Long tail: a Twitter feed.
You need to be ready for 1MM users, how do I size my … Use collection.stats to extrapolate.
Using standard enterprise spinning disks you can get about 200 seeks / second. So you want to be thinking about how you can increase your seeks / second.
Here, if you can imagine that you’re not pulling all your data from a single partition, you can actually increase your throughput by spreading the load across multiple stripes. So in this case you gain potentially three times the speed.
What we typically recommend is to run RAID 10 in production, which adds a mirror volume for each stripe. We’ve found that this configuration works out well for most use cases. You get the benefit of increased redundancy and parallelization, despite the cost of writing each update to two volumes.