1. Replication and High Availability
MongoDB Athens User Group
Athens, Greece 16/1/2013
Alex Giamas
alexandergiamas@yahoo.com
2. History
● Oracle shop
● Non-existent OLAP
● Queries against the live DB
(c) Alex Giamas, Persado Inc, All rights reserved
3. Initial investigation
● CouchDB
● Riak
● HBase
● Cassandra
● MongoDB
● Voldemort
4. History
● Missed all the fun while being in the States
● Hadoop / HBase / llama / Pig
● Sharded MySQL
● Voldemort
● Huge RAC deployments
● Following MongoDB since 1.4 (Replica sets? Nah... Sharding? Alpha...)
5. Reporting and Analytics
● Settled on MongoDB
● Document oriented
● No clue about schema at the time
● No clue about what we were going to do with client data
6. Prototyping, version 0.5
● MT, MO, User collections
● Synchronous MapReduce for reporting
● One DB to rule them all
7. Show stoppers
● Single server deployment
● Global write lock
● MR in real time
8. Results
● Demo Christmas Eve 2010
9. Results
● Demo Christmas Eve 2010
● Slow....
10. Reporting, version 1.0
● Spring Batch for async computations
● Quartz scheduler firing every 3 minutes
● Separate nodes for OLTP and OLAP DBs
● Custom cloneCollection()
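A minimal sketch of what a custom cloneCollection() with business logic might look like, assuming plain Python dicts stand in for the OLTP (source) and OLAP (target) collections. `clone_collection`, `add_delivered` and the `delivered` field are hypothetical names for illustration, not the code used at Persado; a Quartz job firing every 3 minutes would call something like this with the documents changed since its previous run.

```python
from typing import Callable, Dict, Iterable, Optional

Doc = Dict[str, object]


def clone_collection(source: Iterable[Doc],
                     target: Dict[object, Doc],
                     merge: Callable[[Doc, Optional[Doc]], Doc]) -> int:
    """Upsert every source doc into target (keyed by _id), letting `merge`
    decide how an incoming doc combines with the doc already there."""
    written = 0
    for doc in source:
        key = doc["_id"]
        target[key] = merge(doc, target.get(key))
        written += 1
    return written


def add_delivered(new: Doc, old: Optional[Doc]) -> Doc:
    """Hypothetical business rule: accumulate a 'delivered' counter.
    Assumes `source` only yields deltas produced since the previous run."""
    merged = dict(new)
    if old is not None:
        merged["delivered"] = int(old.get("delivered", 0)) + int(new.get("delivered", 0))
    return merged


# Dicts stand in for the OLAP (target) and OLTP (source) collections.
olap: Dict[object, Doc] = {"c1": {"_id": "c1", "delivered": 5}}
oltp_deltas = [{"_id": "c1", "delivered": 2}, {"_id": "c2", "delivered": 7}]
clone_collection(oltp_deltas, olap, add_delivered)
```

The merge callback is where the logic the built-in command lacked would live: the copy loop stays generic while each target collection gets its own update rule.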
11. Real world kicks in
● Everything designed for online integration
● Huge client coming in offering offline integration
● Ride the cloud wagon!
12. Reporting Version 2 (the real world)
● Files coming in via FTP containing all sorts of time inconsistencies
● No longer a linear timeline of events, more like a soup of results
13. MongoDB on EC2
● 2 replica sets, each with 2 data nodes + an arbiter
● Arbiters crossed with respect to the replica sets
● Third node could be in a different availability zone
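One possible reading of the crossed-arbiter layout, sketched as replica set config documents of the shape `rs.initiate()` accepts. Host names, ports, the helper names and the assumption that each set's arbiter runs on the other set's first data host are all hypothetical.

```python
from typing import Dict, List

DATA_PORT = 27017      # assumed port for the data-bearing mongod on each host
ARBITER_PORT = 27018   # assumed port for the extra arbiter process on a host


def replica_set_config(name: str, data_hosts: List[str], arbiter_host: str) -> Dict:
    """Config document in the shape rs.initiate() accepts: data members plus one arbiter."""
    members = [{"_id": i, "host": f"{h}:{DATA_PORT}"} for i, h in enumerate(data_hosts)]
    members.append({"_id": len(data_hosts),
                    "host": f"{arbiter_host}:{ARBITER_PORT}",
                    "arbiterOnly": True})
    return {"_id": name, "members": members}


def crossed_configs(set_a: str, hosts_a: List[str],
                    set_b: str, hosts_b: List[str]) -> Dict[str, Dict]:
    """Each set's arbiter lives on a data host of the *other* set, so losing one
    machine never costs a set both a data vote and its arbiter vote."""
    return {
        set_a: replica_set_config(set_a, hosts_a, arbiter_host=hosts_b[0]),
        set_b: replica_set_config(set_b, hosts_b, arbiter_host=hosts_a[0]),
    }


configs = crossed_configs("rsA", ["a1", "a2"], "rsB", ["b1", "b2"])
```

Had rsA's arbiter run on a1 instead, losing that single machine would take 2 of rsA's 3 votes and leave it without a primary; crossed, any one machine costs each set at most one vote.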
17. Replica Sets Configuration
● Can afford 1 failure with fully functional cluster
● Can afford 2 failures with partially functional cluster**
** Terms and conditions may apply
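The failure counts above follow from majority voting. A small sketch, assuming every member carries one vote and that "partially functional" means no primary but secondaries still serving reads; it compares 3 data nodes with the 2 data nodes + arbiter layout from the EC2 slide. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Iterable

SEVERITY = ["full", "read-only", "down"]


def cluster_state(members: Dict[str, str], failed: Iterable[str]) -> str:
    """members maps name -> 'data' or 'arbiter'; every member carries one vote.
    A primary needs a strict majority of all votes plus a surviving data node."""
    down = set(failed)
    alive = [kind for name, kind in members.items() if name not in down]
    if "data" not in alive:
        return "down"
    if 2 * len(alive) > len(members):
        return "full"
    # No majority: no primary, but surviving secondaries can still serve reads
    # to clients that allow secondary reads.
    return "read-only"


def worst_case(members: Dict[str, str], k: int) -> str:
    """Worst state over every possible combination of k failed members."""
    states = (cluster_state(members, combo) for combo in combinations(members, k))
    return max(states, key=SEVERITY.index)


three_data = {"n1": "data", "n2": "data", "n3": "data"}
two_plus_arbiter = {"n1": "data", "n2": "data", "arb": "arbiter"}
```

Both layouts survive any single failure, but only the 3 data node set stays readable after any two; with an arbiter, losing both data nodes takes the set down, which is where the terms and conditions kick in.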
18. Replica Sets Configuration
● Rolling upgrades without DB downtime
● Schemaless, document-oriented model offers great flexibility at the application level
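Rolling upgrades rely on the usual replica set order: upgrade the non-primary members one at a time, step the primary down with `rs.stepDown()`, then upgrade the former primary. The planner below is a hypothetical helper that only produces that ordering; it does not talk to MongoDB.

```python
from typing import Dict, List


def rolling_upgrade_plan(states: Dict[str, str]) -> List[str]:
    """states maps member -> 'PRIMARY', 'SECONDARY' or 'ARBITER'.
    Returns ordered steps: every non-primary first, then step the primary
    down so another member takes over, then upgrade the old primary last.
    Between steps, wait for each upgraded member to rejoin the set."""
    primaries = [m for m, s in states.items() if s == "PRIMARY"]
    if len(primaries) != 1:
        raise ValueError("expected exactly one primary")
    primary = primaries[0]
    steps = [f"upgrade {m}" for m in sorted(states) if m != primary]
    steps.append(f"rs.stepDown() on {primary}")
    steps.append(f"upgrade {primary}")
    return steps


plan = rolling_upgrade_plan({"n2": "SECONDARY", "n1": "PRIMARY", "arb": "ARBITER"})
```

The only downtime the application sees is the election after the step-down, which the driver rides out like any other failover.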
19. Replica Sets Configuration
● Unix level tweaks:
– raise ulimit
– raise tcp timeout
– noatime, nodiratime mount options
– XFS or ext4
– LVM for snapshotting
● Mongo level tips:
– Use journaling. USE JOURNALING
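A small audit sketch for some of the Unix-level tips, assuming Linux: it checks mount options given in `/proc/mounts` format and reads the open-files soft limit via Python's `resource` module. The helper names are hypothetical.

```python
import resource
from typing import List, Optional


def find_mount(mounts_text: str, path: str) -> Optional[List[str]]:
    """Return the /proc/mounts entry whose mount point is the longest prefix of path."""
    best: Optional[List[str]] = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mount_point = fields[1]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) > len(best[1]):
                best = fields
    return best


def mount_issues(mounts_text: str, dbpath: str) -> List[str]:
    """Compare the filesystem holding dbpath against the slide's tips."""
    fields = find_mount(mounts_text, dbpath)
    if fields is None:
        return ["no mount found for " + dbpath]
    fstype, options = fields[2], set(fields[3].split(","))
    issues = []
    if fstype not in ("xfs", "ext4"):
        issues.append(f"filesystem is {fstype}, not xfs/ext4")
    # On Linux, noatime also implies nodiratime.
    if "noatime" not in options:
        issues.append("missing noatime")
    return issues


def nofile_soft_limit() -> int:
    """Current soft limit on open files for this process (what `ulimit -n` shows)."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft
```

On a live box, pass the contents of `/proc/mounts` and your dbpath (for example the assumed `/data/db`) to `mount_issues`, and compare `nofile_soft_limit()` against the limit you raised.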
20. Replica Sets Configuration
● EC2-specific tips:
– Can and will steal CPU time back; plan for it
– Can get flaky at times...
– Design around EBS
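To plan for stolen time you first have to see it. A sketch, assuming Linux, that computes the steal share from two `/proc/stat` snapshots (the `steal` column counts time the hypervisor spent running other guests while this one wanted the CPU); the function names are hypothetical.

```python
import time
from typing import List


def cpu_fields(stat_text: str) -> List[int]:
    """Numbers from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return [int(x) for x in parts[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def steal_percent(before: str, after: str) -> float:
    """Share of CPU time stolen by the hypervisor between two /proc/stat snapshots.
    Columns: user nice system idle iowait irq softirq steal [guest guest_nice];
    guest time is already included in user, so only the first 8 are summed."""
    a, b = cpu_fields(before)[:8], cpu_fields(after)[:8]
    delta = [y - x for x, y in zip(a, b)]
    total = sum(delta)
    return 100.0 * delta[7] / total if total else 0.0


def measure_steal(interval_s: float = 1.0) -> float:
    """Sample /proc/stat twice (Linux only) and return the steal percentage."""
    with open("/proc/stat") as f:
        before = f.read()
    time.sleep(interval_s)
    with open("/proc/stat") as f:
        after = f.read()
    return steal_percent(before, after)
```

A steadily high steal percentage on a MongoDB node is a hint to move to a larger instance type rather than tune the database.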
21. Replica Sets Configuration
● EC2 storage:
– Local storage: ephemeral
– EBS storage: persistent, but without strong durability guarantees
– S3 storage: more durable, but slower
22. Replica Sets Configuration
● Settled on EBS storage
● Nightly backups, 30-day retention window
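The 30-day window can be enforced with a simple date check. A sketch, assuming one backup per night identified by its date; how snapshots are actually listed and deleted (for example EBS snapshots) is left out, and the helper name is hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, List


def backups_to_prune(backup_days: Iterable[date], today: date,
                     window_days: int = 30) -> List[date]:
    """Dates of nightly backups that fall outside the retention window.
    The window keeps `window_days` nights, today included."""
    oldest_kept = today - timedelta(days=window_days - 1)
    return sorted(d for d in backup_days if d < oldest_kept)


nights = [date(2013, 1, 1) + timedelta(days=i) for i in range(31)]
stale = backups_to_prune(nights, today=date(2013, 1, 31))
```

Deciding from dates alone keeps the pruning job idempotent: running it twice on the same night deletes nothing extra.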
23. Reporting Version 3
● Aggregation Framework effort led by Chris Westin
– Simpler way to perform MapReduce-style jobs without all the pain of JavaScript
– Integrates cleanly with our business logic
● Initial design on sharding
– More on that next...
24. Reporting Version 3
● Aggregation Framework for both storing and retrieving aggregate data
– New collection for double-checking results against MR
● Faster, simpler, and most of the time fits our problem domain
● Worked better in dev than production versions ;)
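Double-checking Aggregation Framework output against MapReduce can be illustrated in memory. This sketch assumes hypothetical `status` and `campaign` fields; the `PIPELINE` list has the shape MongoDB's `aggregate` expects, but `run_pipeline` is a toy evaluator for just `$match` (equality) and `$group` with `$sum`, not MongoDB itself.

```python
from collections import defaultdict
from typing import Any, Dict, List

Doc = Dict[str, Any]

# Same shape as a pipeline passed to aggregate; field names are hypothetical.
PIPELINE: List[Doc] = [
    {"$match": {"status": "delivered"}},
    {"$group": {"_id": "$campaign", "delivered": {"$sum": 1}}},
]


def run_pipeline(docs: List[Doc], pipeline: List[Doc]) -> List[Doc]:
    """Toy in-memory evaluator for equality $match and $group with $sum only."""
    for stage in pipeline:
        if "$match" in stage:
            cond = stage["$match"]
            docs = [d for d in docs if all(d.get(k) == v for k, v in cond.items())]
        elif "$group" in stage:
            spec = stage["$group"]
            key_field = spec["_id"].lstrip("$")
            groups: Dict[Any, Doc] = {}
            for d in docs:
                key = d.get(key_field)
                row = groups.setdefault(key, {"_id": key})
                for name, acc in spec.items():
                    if name == "_id":
                        continue
                    value = acc["$sum"]
                    if isinstance(value, str) and value.startswith("$"):
                        value = d.get(value[1:], 0)  # $sum over a field path
                    row[name] = row.get(name, 0) + value
            docs = list(groups.values())
        else:
            raise NotImplementedError(f"stage not supported: {stage}")
    return docs


def map_reduce(docs: List[Doc]) -> List[Doc]:
    """The equivalent MapReduce job, written out as explicit map and reduce phases."""
    emitted: Dict[Any, List[int]] = defaultdict(list)
    for d in docs:  # map: emit(campaign, 1) for delivered messages
        if d.get("status") == "delivered":
            emitted[d.get("campaign")].append(1)
    return [{"_id": k, "delivered": sum(v)} for k, v in emitted.items()]  # reduce


def same_results(a: List[Doc], b: List[Doc]) -> bool:
    """Order-insensitive comparison, like diffing the two result collections."""
    def key(r: Doc) -> str:
        return repr(r["_id"])
    return sorted(a, key=key) == sorted(b, key=key)
```

In production the two sides would be the aggregate output and the MR output collection; the comparison itself stays this simple.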
25. Reporting Version 3
● More fine-grained write semantics
– WriteConcern.SAFE for most write queries
– WriteConcern.REPLICAS_SAFE for non-idempotent queries that are costly to recompute
● Do you feel lucky, punk?
– ReactiveMongo
● Asynchronous & Non-Blocking Scala Driver for MongoDB
– Brings the best of WriteConcern.SAFE and WriteConcern.NORMAL
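For reference, the legacy Java driver constants on this slide correspond to plain `w` values. The policy function is a hypothetical encoding of the rule above, not a driver API.

```python
from typing import Dict

# w values behind the legacy Java driver constants named on the slide.
LEGACY_WRITE_CONCERNS: Dict[str, Dict[str, int]] = {
    "NORMAL": {"w": 0},         # unacknowledged: fire and forget
    "SAFE": {"w": 1},           # acknowledged by the primary
    "REPLICAS_SAFE": {"w": 2},  # acknowledged by the primary and at least one secondary
}


def choose_write_concern(idempotent: bool, costly_to_recompute: bool) -> str:
    """Hypothetical policy following the slide: pay for replica acknowledgement
    only when a lost write could not simply be recomputed or replayed."""
    if not idempotent and costly_to_recompute:
        return "REPLICAS_SAFE"
    return "SAFE"
```

In a 2 data nodes + arbiter set, `w: 2` needs both data nodes up, since arbiters hold no data and never acknowledge writes; adding a `wtimeout` (in milliseconds) keeps such writes from blocking indefinitely while one node is down.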
26. Replication and High Availability Take aways
● Use delayed members
● Size your oplog
● Use write concern and read preference to balance between providing fresh data and overloading servers
● Failover happens automagically, but not instantaneously
● Think through your security model
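Sizing the oplog and adding a delayed member go together: the delay must fit inside the oplog window. A sketch using the default from the speaker notes, max(5%, 1G); the write rate, host name and helper names are hypothetical, and `slaveDelay` is the member field name of that MongoDB era.

```python
from typing import Dict, Union

GB = 1024 ** 3


def default_oplog_bytes(free_disk_bytes: int) -> int:
    """Default oplog size as given in the speaker notes: max(5% of free disk, 1 GB)."""
    return max(free_disk_bytes * 5 // 100, GB)


def oplog_window_hours(oplog_bytes: int, oplog_bytes_per_hour: int) -> float:
    """How many hours of writes the capped oplog holds at a steady write rate."""
    return oplog_bytes / oplog_bytes_per_hour


def delayed_member(member_id: int, host: str, delay_s: int) -> Dict[str, Union[int, str, bool]]:
    """Replica set member entry for a delayed member."""
    return {
        "_id": member_id,
        "host": host,
        "priority": 0,          # a delayed member must never become primary
        "hidden": True,         # keep clients from reading stale data off it
        "slaveDelay": delay_s,  # renamed secondaryDelaySecs in MongoDB 5.0
    }


def delay_fits_oplog(delay_s: int, window_hours: float) -> bool:
    """The delay must stay within the oplog window, or the member cannot keep replicating."""
    return delay_s <= window_hours * 3600
```

For example, a 5 GB oplog filling at 0.5 GB per hour gives a 10-hour window: a 1-hour delayed member fits, a 12-hour one does not.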
27. Replication and High Availability Take aways
● More importantly: think about who has access to your systems
– No commit, no rollback
● Prepare people for change
– Educate non-engineers
– Use Morphia
28. Replication and High Availability Take aways
● Audit – audit – audit
– Closely monitor your MongoDB servers for potential bottlenecks
● mms.10gen.com is a great tool for this
– Github is your friend:
● https://github.com/mongolab/dex
29. Q&A
Ask me anything...
or drop me a line:
alexandergiamas@yahoo.com
alexandros.giamas@persado.com
Speaker notes
Single-server deployment was really fast for OLTP, but OLAP queries would lock up the whole server.
Falling asleep during the demo at the dashboard view...
- 3 minutes was chosen arbitrarily
- Avoid the write lock; give MR the computational resources it needs without affecting OLTP
- cloneCollection() didn't support many of the features we needed back then, e.g. custom business logic to clone and update the target doc
Fought my way through the third node
With 3 data nodes, you can still have a read-only cluster. With 2 data nodes and 1 arbiter, you can only afford to lose one data node + the arbiter (obviously).
(Actually 0% unplanned downtime in 1+ year of operating the cluster.) Caveat: indexing dynamic fields is not possible; plan ahead if you will need to index these fields.
LVM for snapshotting: you may or may not need fsync+lock.
Because of low I/O, especially on small/medium machines. Use LVM with RAID 0.
Mention the whole $isoDate mess that was dropped between dev and production versions. Explain odd vs. even numbered versioning.
Delayed members can save you from screwups such as db.dropDatabase() when you think you are in the default 'test' DB but are in fact in the production DB. The oplog defaults to 5% of free disk space with a 1 GB floor, i.e. max(5%, 1G); you may need less or more than that. Failover takes 20-30 secs with the Java driver.