3. Here is a “simple” SQL Model
mysql> select * from book;
+----+----------------------------------------------------------+
| id | title                                                    |
+----+----------------------------------------------------------+
|  1 | The Demon-Haunted World: Science as a Candle in the Dark |
|  2 | Cosmos                                                   |
|  3 | Programming in Scala                                     |
+----+----------------------------------------------------------+
3 rows in set (0.00 sec)
mysql> select * from bookauthor;
+---------+-----------+
| book_id | author_id |
+---------+-----------+
|       1 |         1 |
|       2 |         1 |
|       3 |         2 |
|       3 |         3 |
|       3 |         4 |
+---------+-----------+
5 rows in set (0.00 sec)
mysql> select * from author;
+----+-----------+------------+-------------+-------------+---------------+
| id | last_name | first_name | middle_name | nationality | year_of_birth |
+----+-----------+------------+-------------+-------------+---------------+
|  1 | Sagan     | Carl       | Edward      | NULL        |          1934 |
|  2 | Odersky   | Martin     | NULL        | DE          |          1958 |
|  3 | Spoon     | Lex        | NULL        | NULL        |          NULL |
|  4 | Venners   | Bill       | NULL        | NULL        |          NULL |
+----+-----------+------------+-------------+-------------+---------------+
4 rows in set (0.00 sec)
4. The Same Data in MongoDB
{
    "_id" : ObjectId("4dfa6baa9c65dae09a4bbda5"),
    "title" : "Programming in Scala",
    "author" : [
        {
            "first_name" : "Martin",
            "last_name" : "Odersky",
            "nationality" : "DE",
            "year_of_birth" : 1958
        },
        {
            "first_name" : "Lex",
            "last_name" : "Spoon"
        },
        {
            "first_name" : "Bill",
            "last_name" : "Venners"
        }
    ]
}
5. Cursors
$gt, $lt, $gte, $lte, $ne, $all, $in, $nin, $or,
$not, $mod, $size, $exists, $type, $elemMatch
> var c = db.test.find({x: 20}).skip(20).limit(10)
> c.next()
> c.next()
...
Client/server exchange:
query                 →  first N results + cursor id
getMore w/ cursor id  →  next N results + cursor id (or 0 when exhausted)
...
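The exchange above can be sketched as a client-side loop. This is a minimal simulation in plain JavaScript, not the actual wire protocol; the in-memory `queryServer` object and batch size are assumptions for illustration only:

```javascript
// Minimal simulation of the cursor protocol: the first query returns a
// batch plus a cursor id; getMore returns further batches until the id is 0.
function queryServer(data, batchSize) {
  let pos = 0;
  return {
    // query: first N results + cursor id (0 means exhausted)
    query() {
      const batch = data.slice(pos, pos + batchSize);
      pos += batch.length;
      return { batch, cursorId: pos < data.length ? 1 : 0 };
    },
    // getMore w/ cursor id: next N results + cursor id or 0
    getMore() {
      return this.query();
    }
  };
}

function fetchAll(data, batchSize) {
  const server = queryServer(data, batchSize);
  const results = [];
  let { batch, cursorId } = server.query();
  results.push(...batch);
  while (cursorId !== 0) {
    ({ batch, cursorId } = server.getMore());
    results.push(...batch);
  }
  return results;
}
```

The point of the loop is that the client never holds more than one batch in flight; the driver repeats `getMore` transparently when you call `c.next()` past the end of the current batch.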
6. Creating Indexes
An index on _id is automatic.
For additional indexes, use ensureIndex:
db.blogs.ensureIndex({author: 1})
1 = ascending
-1 = descending
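Conceptually, the direction flag just fixes the sort order of the index entries. A simplified sketch of that idea (an in-memory simulation, not MongoDB's B-tree implementation; `buildIndex` is a hypothetical helper):

```javascript
// Simplified model of an index: an ordered list of documents by one field.
// direction 1 sorts ascending, -1 descending, mirroring ensureIndex({f: 1}).
function buildIndex(docs, field, direction) {
  return docs
    .slice() // leave the original collection order untouched
    .sort((a, b) =>
      (a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0) * direction
    );
}
```

For a single-field index the direction rarely matters (the server can walk the index either way); it matters for compound indexes, where the combination of directions must match your sort.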
23. Disk configurations
Single Disk
~200 seeks / second
RAID 0 (3 stripes)
~200 seeks / second per stripe
RAID 10 (3 mirrored stripes)
~400 seeks / second per mirrored pair
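The aggregate numbers behind the slide work out as simple multiplication. A back-of-the-envelope sketch (the ~200 seeks/second per spinning disk comes from the slide; the parallelism assumptions are spelled out in the comments):

```javascript
// Back-of-the-envelope aggregate seek rates for the three layouts.
const seeksPerDisk = 200; // standard enterprise spinning disk

// One disk: one head, ~200 seeks/second.
const singleDisk = seeksPerDisk;

// RAID 0 with 3 stripes: reads spread across 3 independent disks.
const raid0 = 3 * seeksPerDisk; // 600

// RAID 10 with 3 mirrored stripes: each stripe has 2 copies, and a read
// can be served by either mirror, so reads scale to 6 spindles.
const raid10 = 3 * 2 * seeksPerDisk; // 1200
```

Writes do not scale the same way under RAID 10, since every update must land on both mirrors of its stripe.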
24. Basic Tips
• Favor high-memory instances over adding CPU cores
• Use 64-bit instances
• Use the XFS or EXT4 file system
• Use EBS volumes in RAID: RAID 0 or 10 for the data volume, RAID 1 for configdb
25. Basic Installation Steps
1. Create your EC2 instance
2. Attach EBS storage
3. Make an EXT4 file system
$ sudo mkfs -t ext4 /dev/[connection to volume]
4. Make a data directory
$ sudo mkdir -p /data/db
5. Mount the volume
$ sudo mount /dev/sdf /data/db
6. Install MongoDB
$ curl http://[mongodb download site] > m.tgz
$ tar xzf m.tgz
7. Start mongod
$ ./mongod
26. Types of outage
• Planned
  • Hardware upgrade
  • O/S or file-system tuning
  • Relocation of data to new file-system / storage
  • Software upgrade
• Unplanned
  • Hardware failure
  • Data center failure
  • Region outage
  • Human error
  • Application corruption
28. How MongoDB Replication works
[Diagram: three-member replica set; one member is PRIMARY]
• Election establishes the PRIMARY
• Data replication from PRIMARY to SECONDARY
29. How MongoDB Replication works
[Diagram: PRIMARY down; remaining members negotiate a new master]
• PRIMARY may fail
• Automatic election of a new PRIMARY if a majority exists
30. How MongoDB Replication works
[Diagram: failed member down; a remaining member is the new PRIMARY]
• New PRIMARY elected
• Replica set re-established
31. How MongoDB Replication works
[Diagram: failed member rejoins in RECOVERING state]
• Automatic recovery
32. How MongoDB Replication works
[Diagram: all three members healthy; replication restored]
• Replica set re-established
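The election behavior in the slides above hinges on one rule: a new PRIMARY can only be elected while a strict majority of the set's members is reachable. A minimal sketch of that rule (not MongoDB's actual election code):

```javascript
// A replica set of n members can elect a PRIMARY only if the reachable
// members form a strict majority (more than half of the configured set).
function canElectPrimary(totalMembers, reachableMembers) {
  return reachableMembers > totalMembers / 2;
}
```

With three members, losing one still leaves a majority (`canElectPrimary(3, 2)` is true), so failover is automatic. A two-member set split by a network failure leaves one node on each side (`canElectPrimary(2, 1)` is false), which is why the two-node set on slide 33 goes read-only.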
33. Replica Set 0
• Two nodes?
• A network failure can split the nodes; with no majority on either side, the whole system goes read-only
34. Replica Set 1
• Single datacenter
• Single switch & power
• Points of failure:
  • Power
  • Network
  • Datacenter
  • Two-node failure
• Automatic recovery from a single-node crash
35. Replica Set 3
[Diagram: members spread across AZ:1, AZ:2, AZ:3]
• Single datacenter
• Multiple power/network zones
• Points of failure:
  • Datacenter
  • Two-node failure
• Automatic recovery from a single-node crash
37. Replica Set 4
• Multi datacenter
• DR node for safety
• Can't do a durable multi-data-center write safely, since there is only 1 node in the distant DC
38. Replica Set 5
• Three data centers
• Can survive full data center loss
• Can do w = { dc : 2 } to guarantee a write in 2 data centers
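In the mongo shell, this kind of data-center guarantee is configured through member tags plus a getLastErrorModes entry in the replica set settings; the hostnames and tag values below are hypothetical, sketched for a three-DC set like the one on this slide:

```javascript
// Hypothetical 3-DC replica set: tag each member with its data center,
// then define a "dc" error mode requiring acks from 2 distinct dc values.
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "dc1-node:27017", tags: { dc: "dc1" } },
    { _id: 1, host: "dc2-node:27017", tags: { dc: "dc2" } },
    { _id: 2, host: "dc3-node:27017", tags: { dc: "dc3" } }
  ],
  settings: {
    // "dc" mode: wait until the write exists in 2 distinct data centers
    getLastErrorModes: { dc: { dc: 2 } }
  }
});

// A write then waits on that named mode:
db.runCommand({ getLastError: 1, w: "dc" });
```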
42. Sharding Across AZs
• Each shard is made up of a replica set
• Each replica set is distributed across availability zones for HA and data protection
[Diagram: shard members spread across AZ:1, AZ:2, AZ:3]
53. download at mongodb.org
We’re Hiring!
Chris Harris
Email : charris@10gen.com
Twitter : cj_harris5
conferences, appearances, and meetups
http://www.10gen.com/events
Editor's notes
The things I’m going to talk about are completely inter-related and intertwined. There will be talks that go into much greater detail on these topics. Armed with the information you gather, and confident in the skills your team has practiced, you should be able to spot long-term problems well before it’s too late and handle the emergencies that are sure to arise.
Add a second process just to illustrate what happens when you have more than one process contending for RAM.
Since we’re talking about data stores, specifically MongoDB, before you do anything else at all, you need to understand your data.
How big is your data set in total?
How big is your working set? That is, the size of the data and indexes that need to fit in RAM.
Reads vs. writes? (example and use case)
Long tail or random access? (example)
Armed with this knowledge, you can accommodate massive growth spurts without excessive over-provisioning.
Random access: take a user database. Long tail: a Twitter feed.
You need to be ready for 1MM users, how do I size my … Use collection.stats to extrapolate.
Using standard enterprise spinning disks you can get about 200 seeks / second. So you want to be thinking about how you can increase your seeks / second.
Here, if you can imagine that you’re not pulling all your data from a single partition, you can actually increase your throughput by spreading the load across multiple stripes. So in this case you gain potentially three times the speed.
What we typically recommend is to run RAID 10 in production, which adds a mirror volume for each stripe. We’ve found that this configuration works out well for most use cases. You get the benefit of increased redundancy and parallelization, despite the cost of writing each update to two volumes.