When deploying a MongoDB instance, there are a couple of important decisions to be made. A key one is whether the instance is going to be a Replica-set or a Sharded Cluster. It's not uncommon to start with a Replica-set, as it's easier to deploy and simpler to operate. For some workloads, though, the Replica-set may not be the best option, mainly performance-wise, and migrating to a Sharded cluster is the only way forward.
This presentation reviews the challenges of migrating from a Replica-set instance to a Sharded cluster. We will demonstrate real-world issues that users have encountered when migrating from a Replica-set to a Sharded cluster, and we will list best practices for the migration and the changes that may be required when moving to a Sharded cluster.
Migrate from a MongoDB replica-set to a sharded cluster
1.
Migrate from a
MongoDB Replica-Set
to a Sharded Cluster
Percona Live Online
May 2020
Antonios Giannopoulos
Database Administrator
Jason Terpko
Database Administrator
2. About us
20 years of DB experience combined
30 years in the IT industry combined
Antonios Giannopoulos
linkedin.com/in/antonis
Jason Terpko
linkedin.com/in/jterpko
Members of Rackspace
since 2014
Regular Speakers
Percona Live & Europe
Jason loves MongoDB
Antonios loves to hate MongoDB
3. Agenda
o Definitions
o Reasons to migrate
o Prepare for the migration
o Migration
o Welcome to the Sharding era
o Scaling
o Q&A
4. Replica-Set
A replica set in MongoDB is a group of mongod processes that maintain the same data set.
5. Sharded Cluster
A sharded cluster in MongoDB is a group of replica sets that are accessible through one or many mongos processes.
6. Why migrate?
He who has a why to live can bear almost any how
-Friedrich Nietzsche
7. Scalability – Replica Sets
Replica sets support vertical scaling.
[Chart: throughput with 32G vs 64G of RAM]
• Only the Primary can serve writes
• More secondaries can scale reads
• The increase in performance is not linear
• May start hitting kernel hard limits
• May start hitting storage engine hard limits
• May start hitting hardware limitations
8. Scalability – Sharded Clusters
Sharded clusters support horizontal scaling.
• Add as many shards/mongos as you need for scaling
• Shards can be the same or different sizes
• Scales both reads and writes
• Shard key–targeted operations see a close-to-linear performance increase
11. Hot/Cold partition architecture
The implementation is very similar to geo-distributed clusters: the storage tier takes the place of the region, and the shard key(s) must be prefixed with it.
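A minimal zone-sharding sketch of this idea; the shard names, the namespace, and the storage/created fields are illustrative assumptions, not the presenters' exact setup:

// rs0 runs on fast storage, rs1 on cheap storage (hypothetical shards)
sh.addShardToZone("rs0", "hot");
sh.addShardToZone("rs1", "cold");
// the shard key is prefixed with the storage-tier field
sh.shardCollection("production.events", { storage: 1, created: 1 });
// pin each tier's key range to its zone
sh.updateZoneKeyRange("production.events",
  { storage: "hot", created: MinKey }, { storage: "hot", created: MaxKey }, "hot");
sh.updateZoneKeyRange("production.events",
  { storage: "cold", created: MinKey }, { storage: "cold", created: MaxKey }, "cold");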
13. Manageability & Costs
Certain administrative actions run faster on a sharded cluster. Some examples:
o Building an index
o Performing an initial sync
o Backup/Restore
Even if sharded clusters are more complex, they can reduce costs:
o It's cheaper to have lots of small servers than a few large ones
o 1 x (T2 Double XL) vs 3 x (T2 Medium)
o Mongos and config servers can run on cheap hardware
14. Prepare for the migration
By failing to prepare, you are preparing to fail
― Benjamin Franklin
15. Prepare the additional component – Config servers
§ Requires at least three mongod processes (same version as your RS)
§ Arbiters and delayed members are not allowed
§ Doesn't have high storage/IOPS/CPU/RAM requirements
§ For HA, use a different server/VM/container for each process
§ Stores two databases:
Ø Admin: Authentication & Authorization
Ø Config: Sharded cluster metadata
Note: We demonstrate a basic config. Your organization's config may differ.
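A minimal initiation sketch for the config server replica set, assuming each mongod was started with the configsvr cluster role and replSetName "configRS"; hostnames and ports are illustrative:

// run once, connected to one of the config server mongod processes
rs.initiate({
  _id: "configRS",
  configsvr: true,
  members: [
    { _id: 0, host: "cfg-0.mongod.local:27019" },
    { _id: 1, host: "cfg-1.mongod.local:27019" },
    { _id: 2, host: "cfg-2.mongod.local:27019" }
  ]
});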
16. Prepare the additional component – mongos
§ Requires at least one mongos process (same version as your RS)
§ For HA purposes deploy at least two mongos – 3 recommended
§ Multiple mongos can run on the same HW
§ In truth, the right number depends on the application
§ You can scale in/out based on your needs
§ Doesn't store any data locally
§ HW Specs:
Ø Disk: A volume for logging
Ø RAM: Depends on the number of mongos/type of operations
Ø CPU: Depends on the number of mongos/type of operations
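A minimal startup sketch for one mongos, pointing it at the config server replica set above; hostnames, the port, and the log path are illustrative assumptions:

# mongos holds no data; it only needs the config server connection string
mongos --configdb configRS/cfg-0.mongod.local:27019,cfg-1.mongod.local:27019,cfg-2.mongod.local:27019 \
       --port 50000 --logpath /var/log/mongodb/mongos.log --fork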
18. Congratulations, it's a shell!
All this trouble for two empty databases.
Not completely empty, though.
19. Configure Networking
Intra-cluster communication
Open the required firewall rules (an iptables sketch follows below).
An isolation layer (VCN/Subnet) between mongos and mongod is a good practice.
Application access
Post-transition, applications must connect to the mongos tier ONLY.
Exceptions: Monitoring & Backup agents and oplog readers*
*replace oplog access with change streams
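A minimal iptables sketch; the subnet and the port range (matching the shard ports used later in this deck) are assumptions:

# allow MongoDB traffic only from the cluster subnet (hypothetical values)
iptables -A INPUT -p tcp -s 10.0.0.0/24 --dport 30000:32000 -j ACCEPT
iptables -A INPUT -p tcp --dport 30000:32000 -j DROP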
20. Authentication & Authorization
Authentication will be handled by the mongos tier.
All users/roles from the replica-set must be copied to the mongos:
§ Create the users/roles from scratch
§ mongoimport/export the users/roles
§ Use a script to copy the users/roles (a sketch follows below)
Spoiler alert: at a later stage, application users must be dropped from the PRIMARY.
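A starting point for such a copy script, as a minimal sketch; note that password hashes cannot be replayed through createUser, so passwords must be reset (or the credential documents copied by other means):

// run against the replica set: list each user and its roles
db.getSiblingDB("admin").system.users.find().forEach(function (u) {
  print(u.user + "@" + u.db + " -> " + JSON.stringify(u.roles));
});
// then, connected to a mongos, recreate each one, e.g.:
// db.getSiblingDB("admin").createUser({ user: "myuser", pwd: "<new password>", roles: [ /* from above */ ] })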
21. Connection String
The connection string on the driver must be changed and tested.
Connection string options: https://bit.ly/2zKkgS0
22. Replica-set changes
On your replica-set mongods you must add the following section (the sharding role) to the config file:

sharding:
  clusterRole: shardsvr

Next, a rolling restart is necessary:
1) Restart the Secondaries one at a time
2) Step down the Primary
3) Restart the ex-Primary node
If you skip this step, the sh.addShard command (covered in the next section) will fail.
23. Don't leave unfinished items
Make sure you haven't left any upgrade half-finished:
- authSchemaUpgrade (pre-4.0 clusters)
You want the authSchema version to match your MongoDB version; an outdated version (e.g. 3) requires running:
- db.adminCommand({authSchemaUpgrade: 1});
- setFeatureCompatibilityVersion
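A minimal sketch for checking both values before the migration:

// current auth schema version (5 corresponds to the SCRAM era)
db.getSiblingDB("admin").system.version.find({ _id: "authSchema" });
// current feature compatibility version
db.adminCommand({ getParameter: 1, featureCompatibilityVersion: 1 });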
24. Middleware preparation
Make sure the shell is properly monitored:
- Basic OS-level checks
- MongoDB basic checks
- Prometheus monitoring
Plan to add the shell tier to the backup policy:
- Mongos: Keep a copy of the config file (repo/deployment script).
- Config servers: Data directory (balancer must be stopped) and config file.
25. Inspect your code
First blood: Always "sacrifice" DEV/QA/UAT instances.
4.0 specific: Transactions(?)
Inspect your codebase for incompatibilities.
For 4.0+ clusters and developers that follow best practices:
- In sharded environments, $where does not permit references to the db object from within the $where function. This is uncommon even in un-sharded collections.
- The geoSearch command is not supported in sharded environments.
Pre-4.0:
- The group command does not work with sharding. (Deprecated)
- db.eval() is incompatible with sharded collections. (Deprecated)
- The $isolated update modifier does not work in sharded environments. (Deprecated/Removed)
- $snapshot queries do not work in sharded environments. (Deprecated)
27. Let's Add The Shard
sh.addShard("rs0/rs0-0.mongod.local:30000");
28. What did "addShard" do?
• Expected Result:
{"shardAdded" : "rs0", "ok" : 1}
• Populates config.shards:
{ "_id" : "rs0", "host" : "rs0/rs0-0.mongod.local:30000,rs0-1.mongod.local:31000,rs0-2.mongod.local:32000", "state" : 1 }
• Initiates a Replica Set Monitor:
NETWORK [conn11] Starting new replica set monitor for rs0/rs0-0.mongod.local:30000,rs0-1.mongod.local:31000,rs0-2.mongod.local:32000
• Populates config.databases:
{ "_id" : "production", "primary" : "rs0", "partitioned" : false, "version" : { "uuid" : UUID("331d084e-ccfe-4072-a2fa-dad614a6b18f"), "lastMod" : 1 } }
• Sets up and shards config.system.sessions:
Refresh for collection config.system.sessions to version 1|0||5eb70c73880167c73b042a5d took 3 ms
• Time to Test
29. Updating Your URI
Replica Set:
from pymongo import MongoClient
connection = MongoClient(
    'mongodb://myuser:mypass@rs0-0.mongod.local:30000,rs0-1.mongod.local:31000/production?replicaSet=rs0'
)
Sharded Cluster:
from pymongo import MongoClient
connection = MongoClient(
    'mongodb://myuser:mypass@mongos-0.mongos.local:50000,mongos-1.mongos.local:51000/production'
)
32.
Testing Resiliency
Planned Maintenance
• Graceful Process Termination
• Election (Step Down)
• Eventually
• Removing Members
• Removing Shards
Unplanned Maintenance
• Non-Graceful Process Termination
• OOM
• Segmentation Fault or Assertion
• Election due to loss of primary
34. Welcome to the Sharding era.
So What Now?
Do one thing every day that scares you
– Eleanor Roosevelt
35.
Post-Flight
Backups
• Repeatable Deploys
• Data Balance (Timing)
• Oplog Length
• Consistent State
Upgrading and Downgrading
• Order Matters
• Confirm per Major Version
• Feature Compatibility Version
Monitoring Metrics
• Mongos Layer
36. Monitoring Write Operations
Replica Set, Potential Legacy Code:
from mongotriggers import MongoTrigger
…
triggers = MongoTrigger(client)
triggers.register_insert_trigger(update_last_login, 'app', 'sessions')
triggers.tail_oplog()
Sharded Cluster with Change Streams:
import datetime
from pymongo import MongoClient

# assumes: db = MongoClient(<uri>)['app']
with db.sessions.watch([{'$match': {'operationType': 'insert'}},
                        {'$addFields': {'fullDocument.LastLogin': datetime.datetime.utcnow()}}]) as stream:
    for doc in stream:
        resume_token = doc.get("_id")
        update_last_login(doc)
37.
Why Change Streams?
• Target Flexibility
• Resumable
• Data Manipulation
• Supported Feature
• Changes with Topology
• Single Authentication and Authorization Source
• Transaction Compliant
39. Understand Your Workload
Profiling
Profiling will help you identify your workload.
Enable statement profiling at level 2 (collects profiling data for all database operations).
To collect a representative sample you might need to increase the profiler size (*size is in bytes):

db.getSiblingDB(<database>).setProfilingLevel(2);
// to resize the profiler: disable it, recreate system.profile, re-enable
db.getSiblingDB(<database>).setProfilingLevel(0);
db.getSiblingDB(<database>).system.profile.drop();
db.getSiblingDB(<database>).createCollection( "system.profile", { capped: true, size: <size> } );
db.getSiblingDB(<database>).setProfilingLevel(2);
40. Shard Key Candidates
Profiling
Using the data you have collected, create the following report per collection:

Collection <Collection Name> - Profiling period <Start time>, <End time> - Total statements: <num>
Number of Inserts: <num>
Number of Queries: <num>
Query patterns: {pattern1}: <num>, {pattern2}: <num>, {pattern3}: <num>
Number of Updates: <num>
Update patterns: {pattern1}: <num>, {pattern2}: <num>, {pattern3}: <num>
Number of Removes: <num>
Remove patterns: {pattern1}: <num>, {pattern2}: <num>, {pattern3}: <num>
Number of FindAndModify: <num>
FindAndModify patterns: {pattern1}: <num>, {pattern2}: <num>, {pattern3}: <num>
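A minimal sketch for pulling the per-operation counts out of the profiler; the grouping uses the standard system.profile fields op and ns, while the database and collection names are placeholders:

// count operations by type for one collection
db.getSiblingDB("<database>").system.profile.aggregate([
  { $match: { ns: "<database>.<collection>" } },
  { $group: { _id: "$op", count: { $sum: 1 } } },
  { $sort: { count: -1 } }
]);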
41. Considerations for Candidates
In favor:
• High Cardinality
• Unique Key
• Targeted Operations (Read and Write)
• High Percentage of Operations
Against:
• Potential Hotspots (tiny documents combined with mediocre cardinality)
• Monotonically Increasing Fields
• Low cardinality*
• Data Pruning
• Multiple Unique Indexes
• Null Values
• Modifications to Key(s)**
• findAndModify or update not using the key
42. Reverting a Shard Key Choice
Evaluate Your State and Choose a Path:
• Single Shard and Able To Revert to Replica Set
• Workload Can Tolerate Extended Downtime
• Workload Tolerates Brief or No Downtime
mongodump -h mongos-1.mongos.local:51000 --authenticationDatabase admin -u <user> -p <pass> -d production -c users
mongorestore -h mongos-1.mongos.local:51000 --authenticationDatabase admin -u ubuntu -p ubuntu -d production -c users --drop dump/production/users.bson

Verify the metadata cleanup in the config database:
db.locks.find({"_id" : "production.users"})
db.collections.find({"_id" : "production.users"})
db.chunks.find({"ns" : "production.users"})
*combined with mongos restarts and rs.stepDown()
43. Scaling
When you scale a mountain, you have to leave your ego at home.
― Anthony T. Hincks
44. Adding shards
Add a new shard, start the Balancer, and chunk migrations will begin (a sketch follows below).
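A minimal sketch of that sequence; the new shard's replica set name and host are assumptions:

// add the new shard and let the balancer move chunks onto it
sh.addShard("rs1/rs1-0.mongod.local:30000");
sh.startBalancer();
sh.status();  // watch the chunk counts converge across shards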
47. Parallel migrations
Add one new shard – no parallelism: 1 migration at a time
Add two new shards – parallelism: 2 migrations at a time
Each shard can participate in at most one migration at a time, so:
Number of parallel migrations = min(existing shards, new shards)
48. Balancing Considerations
Balancing adds overhead. Minimize the impact by considering:
- A balancing window
- _secondaryThrottle
- _waitForDelete
(a configuration sketch follows below)
Be aware that documents in transit may be visible on the secondaries (Read Concern "available").
Documents on the source shard may also remain visible until the RangeDeleter runs.
Any interference during migration may lead to orphaned documents.
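A minimal sketch for the three knobs above, run against a mongos; the window times are assumptions:

// balancer settings live in the config database
db.getSiblingDB("config").settings.update(
  { _id: "balancer" },
  { $set: {
      activeWindow: { start: "22:00", stop: "06:00" },  // balance off-peak only
      _secondaryThrottle: true,   // wait for secondaries to replicate during migration
      _waitForDelete: true        // finish the delete phase before the next migration
  } },
  { upsert: true }
);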
49. Unsharded collections
Unsharded collections are located on each database's Primary Shard.
Use the movePrimary command to distribute the primaries.
Requires write downtime to guarantee consistency.
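A minimal movePrimary sketch, run against a mongos; the database and shard names are assumptions:

// move the primary shard for the "production" database to shard rs1
db.adminCommand({ movePrimary: "production", to: "rs1" });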