3. Starting Point
Requirements for replacement:
• Store data
• High performance
o worldwide access
o multiple physical locations
o read heavy application
• Simple to use
o No DBA...
• MI
o real-time reporting without affecting production
4. • Availability and Partition Tolerance
o from CAP
o by default Consistency not Availability...
• Document-Orientated
o called from OOPHP
• Simple querying using JSON
o benefit over CouchDB which needs MapReduce
• Simple multi-node setup
• MongoLab.com
o web interface for prototyping
So why MongoDB?
5. Read Preference
Defaults to primary (strongly consistent)
Other options allow eventual consistency
primaryPreferred
secondary
secondaryPreferred
7. What are the other members?
One Delayed member provides a rolling
backup by maintaining a dataset, delay is
limited by the size of the oplog
One Hidden member provides access for MI
(and also used for dumping in this scenario)
9. Automatic Failover
Had experience of this in production. It's magic.
Remaining replica set members will vote and
elect a new primary, which will start taking
writes. (PHP extension issue)
Replica set members that have been out of the
cluster will recover automatically on rejoining
o If you've gone past the oplog window then it's a manual
process
10. Sharding, and why we didn't
• Sharding divides the data set and
distributes the data over multiple servers, or
shards
• Necessary when
o data exceeds max for single instance (based on disk
space)
o the Working Set exceeds available RAM
o single instance can’t deal with writes (would have to
be a lot)
• We had none of these and Sharding adds
complexity
12. Sharding Availability
Each Shard would need a Replica Set
containing at least 3 members to achieve the
same level of availability as our non-sharded
architecture.
In addition if one or two of the Config Servers
are unavailable chunk migration and chunk
splitting are suspended
13. DataBase Design
• Schemaless means that the structure and
the content of the “record” are held for each
“row”
o long key names add up
o we used single chars and a lookup table
• Collections
o Can think of them as “buckets” rather than tables
o Contain similar records by design but it’s not
enforced
14. Management Information
• Different conceptually in NoSQL databases
due to the way they are queried, returning
JSON
• Traditional working through a query tool is
less intuitive (although MongoVUE is quite
good)
• This lead us to replace our SQL script ->
Excel -> Graph process (yuck)
• MapReduce, powerful but a steep learning
curve