The document discusses MongoDB Management Service (MMS) automation. MMS automation allows users to automatically create, manage, and upgrade MongoDB systems of any size and configuration without application downtime. Key points:
- MMS automation can automatically create MongoDB systems, manage them by adding capacity or resizing oplogs, and upgrade deployments without downtime.
- MMS now integrates with AWS, allowing users to provision MongoDB servers directly from MMS.
- The talk will demonstrate MMS automation and provisioning on AWS through a series of demos.
- MMS automation works by having agents on each machine communicate with MMS and MongoDB to inspect the cluster, adjust it based on the desired state provided
3. MMS Automation: What Does It Do?
● Create MongoDB systems of any size, and any configuration
● Manage MongoDB systems, such as adding capacity or resizing the oplog, with no application downtime
● Upgrade a deployment, with no application downtime
… all from the comfort of your web browser.
4. MMS Automation…. And MMS Provisioning Too
For extra fun, we’ve also added an Amazon AWS integration which allows you to provision the servers on which your MongoDB processes will run, directly from MMS.
● Optional component: you can provision your base servers any way you like; doing it via MMS Provisioning is just one way
● Future plans to integrate OpenStack, VMware, etc.
9. Upgrading a Cluster Manually
1. Run db.upgradeCheckAllDBs() to check data set for compatibility
2. Resolve any incompatibilities in your deployment
3. Upgrade authentication model
4. Download MongoDB binaries for 2.6
5. Disable the balancer
6. Upgrade the cluster’s metadata
7. Wait for each mongos to exit on completion
8. Upgrade each mongos process, one at a time
9. Upgrade each config server, one at a time, with the first one upgraded last
10. Upgrade each shard, one at a time:
a. Upgrade secondaries, one at a time:
i. Shut down the mongod
ii. Replace the 2.4 binary with the 2.6 binary
iii. Restart mongod
iv. Wait for member to recover to SECONDARY state
b. Step down the primary
c. Wait for another member to be elected to PRIMARY
d. Upgrade previous primary
11. Turn the balancer back on
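The per-shard portion of the list above (step 10) is mostly an ordering problem: secondaries first, then the primary after it has stepped down. A minimal sketch of that ordering follows. It is not MMS code; the `Member` type and the `rolling_upgrade_steps` function are hypothetical names, and the function only produces a list of step descriptions without touching a real cluster.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Member:
    """Hypothetical description of one replica set member."""
    host: str
    is_primary: bool


def _upgrade_one(host: str, old: str, new: str) -> List[str]:
    """Steps for upgrading a single mongod that is not currently primary."""
    return [
        f"{host}: shut down mongod",
        f"{host}: replace {old} binary with {new} binary",
        f"{host}: restart mongod",
        f"{host}: wait for SECONDARY state",
    ]


def rolling_upgrade_steps(members: List[Member], old: str = "2.4",
                          new: str = "2.6") -> List[str]:
    """Return the ordered steps for one replica set, secondaries first."""
    primaries = [m for m in members if m.is_primary]
    if len(primaries) != 1:
        raise ValueError("expected exactly one primary")
    primary = primaries[0]

    steps: List[str] = []
    # Upgrade secondaries one at a time so the set keeps a primary throughout.
    for m in members:
        if not m.is_primary:
            steps += _upgrade_one(m.host, old, new)
    # Only then move the primary role away and upgrade the former primary.
    steps.append(f"{primary.host}: step down")
    steps.append("wait for another member to be elected PRIMARY")
    steps += _upgrade_one(primary.host, old, new)
    return steps
```

For a three-member set this yields 14 steps per shard, which is why the number of manual steps grows with the size of the cluster.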
12. Automation ≈ Self-Driving Car
● Tell it where you want to go.
o It goes there
● If you steer manually?
o It reroutes and goes toward your destination
● Bridge out?
o It reroutes and goes toward your destination
● Want to go somewhere else?
o It reroutes and goes toward your new destination
13. Architecture Introduction
● One agent per machine
● Agent talks to:
o MMS (instructions)
o Mongo (sense, control)
● No agent-to-agent communication
● Single executable
[Diagram: MMS and two machines, each running MongoD, MongoS, a Config Server, and an Automation Agent]
16. What Automation Knows
1. What you want
a. from MMS
2. What’s actually on the cluster
a. from inspecting the cluster
3. How to do various things
a. Like “start Mongod” or “init replica set”
b. from MongoDB documentation
c. preconditions: what has to be true before.
d. postconditions: what should be true after.
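One way to picture item 3 is as a small record per action: a name, a precondition, and a postcondition. The sketch below is illustrative only. The `Action` type, the `State` dictionary, and the `effect` stand-in are assumptions made for this example, not the agent's real data model.

```python
from dataclasses import dataclass
from typing import Callable, Dict

State = Dict[str, bool]  # hypothetical facts about one process


@dataclass(frozen=True)
class Action:
    name: str
    precondition: Callable[[State], bool]   # what has to be true before
    effect: Callable[[State], State]        # stand-in for really doing it
    postcondition: Callable[[State], bool]  # what should be true after


# Hypothetical encoding of "start mongod": needs a binary, must not be running.
START_MONGOD = Action(
    name="start mongod",
    precondition=lambda s: s.get("binary_installed", False)
    and not s.get("running", False),
    effect=lambda s: {**s, "running": True},
    postcondition=lambda s: s.get("running", False),
)


def apply(action: Action, state: State) -> State:
    """Run one action, refusing to proceed if either condition fails."""
    if not action.precondition(state):
        raise RuntimeError(f"precondition failed for {action.name!r}")
    new_state = action.effect(state)
    if not action.postcondition(new_state):
        raise RuntimeError(f"postcondition failed for {action.name!r}")
    return new_state
```

Applying `START_MONGOD` to a state where the process is already running raises, because the precondition no longer holds.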
17. Planning
● Look at what you want vs. what you have
● If they’re the same, yay!
o Check again in 30 seconds
● Else, make a plan to fix what you have
o Basic robotics-style planning problem
o precondition/postcondition make this work.
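The planning step can be sketched as a search over states, where a move is usable only when its precondition holds and its postcondition tells the planner what the next state looks like. This is a toy breadth-first planner under stated assumptions: the three moves, the fact names, and `make_plan` are invented for illustration and say nothing about how the real planner is implemented.

```python
from collections import deque
from typing import FrozenSet, List, NamedTuple, Optional


class Move(NamedTuple):
    name: str
    requires: FrozenSet[str]  # precondition: facts that must already be true
    adds: FrozenSet[str]      # postcondition: facts that become true


# Hypothetical moves for bringing up a one-member replica set.
MOVES = [
    Move("download binary", frozenset(), frozenset({"binary"})),
    Move("start mongod", frozenset({"binary"}), frozenset({"running"})),
    Move("init replica set", frozenset({"running"}), frozenset({"replset"})),
]


def make_plan(have: FrozenSet[str], want: FrozenSet[str]) -> Optional[List[str]]:
    """Breadth-first search from `have` to any state that contains `want`."""
    if want <= have:
        return []  # same already: nothing to do, check again later
    queue = deque([(have, [])])
    seen = {have}
    while queue:
        state, plan = queue.popleft()
        for move in MOVES:
            if not move.requires <= state:
                continue  # precondition not met in this state
            nxt = state | move.adds
            if nxt in seen:
                continue
            if want <= nxt:
                return plan + [move.name]
            seen.add(nxt)
            queue.append((nxt, plan + [move.name]))
    return None  # no sequence of known moves reaches the goal


plan = make_plan(frozenset(), frozenset({"replset"}))
# plan == ["download binary", "start mongod", "init replica set"]
```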
18. Executing Plans
● Follow the plan, one move at a time.
● Each move has a bunch of actions
● Check expected vs. reality for each action
o Check preconditions before doing an action
o Check postconditions afterwards
o If they’re different, go make a new plan!
● How could they be different?
o machine crash
o killing/modifying a mongo instance by hand
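The execute, check, and replan cycle described above can be simulated in a few lines. The sketch below is a toy model, not the agent's implementation: the `ACTIONS` table, the trivial `make_plan`, and the `faults` argument (which simulates a crash or a manual change undoing a fact) are all assumptions made for the example.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical actions: name -> (facts required before, facts expected after).
ACTIONS: Dict[str, Tuple[Set[str], Set[str]]] = {
    "download binary": (set(), {"binary"}),
    "start mongod": ({"binary"}, {"running"}),
    "init replica set": ({"running"}, {"replset"}),
}


def make_plan(have: Set[str]) -> List[str]:
    """Stand-in planner: every action, in order, whose result is still missing."""
    return [name for name, (_, after) in ACTIONS.items() if not after <= have]


def converge(cluster: Set[str], want: Set[str],
             faults: Dict[str, Set[str]], max_rounds: int = 5) -> List[str]:
    """Drive `cluster` (mutated in place) toward `want`; return a log.

    `faults` maps an action name to facts that vanish right after that action
    runs for the first time, simulating a crash or a manual modification.
    """
    log: List[str] = []
    faults = dict(faults)  # copy so each fault fires only once
    for _ in range(max_rounds):
        if want <= cluster:
            break
        for name in make_plan(cluster):
            before, after = ACTIONS[name]
            if not before <= cluster:            # check precondition
                log.append(f"replan before {name}")
                break
            cluster |= after                     # perform the action
            cluster -= faults.pop(name, set())   # reality may diverge
            if not after <= cluster:             # check postcondition
                log.append(f"replan after {name}")
                break
            log.append(f"done {name}")
    log.append("converged" if want <= cluster else "gave up")
    return log
```

With a fault injected after "start mongod", the first plan is abandoned at the postcondition check and a second, shorter plan finishes the job; the caller never has to say how to recover.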
19. Automation
● Instructions from you
● Information from your cluster
● Expertise about how to run Mongo
● Patience to get all the details right
21. What is an oplog?
● A capped collection that stores an ordered
history of logical writes to a MongoDB database
● Enables replication
22. Why resize the oplog?
1. You may want to increase its size to accommodate a high write rate and/or high replication lag on secondaries.
2. You may want to decrease its size to save on disk space.
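A rough way to reason about both cases is the amount of history the oplog can hold: its size divided by the rate at which writes fill it. The sketch below is back-of-the-envelope arithmetic with made-up numbers; the function name is hypothetical, and it assumes a steady write rate, which real workloads rarely have.

```python
def oplog_window_hours(oplog_size_gb: float, write_rate_gb_per_hour: float) -> float:
    """Approximate hours of history the oplog holds at a steady write rate."""
    if write_rate_gb_per_hour <= 0:
        raise ValueError("write rate must be positive")
    return oplog_size_gb / write_rate_gb_per_hour


# Illustrative numbers only: a 10 GB oplog filling at 2 GB/hour.
print(oplog_window_hours(10, 2))  # 5.0
# Doubling the oplog doubles the window a lagging secondary has to catch up.
print(oplog_window_hours(20, 2))  # 10.0
```

If a secondary falls further behind than this window, the entries it needs have already been overwritten, which is the case a larger oplog guards against.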
24. How to resize an oplog
1. Shut down a replica set member
2. Start the member in standalone mode, listening on a different port than it usually does
3. Back up the last oplog entry
4. Drop the oplog
5. Recreate the oplog with its new size
6. Insert the saved oplog entry
7. Stop the member
8. Start the member with normal options
9. Repeat Steps 1 - 8 for each of the remaining replica set members
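Steps 3 to 6 are the part that touches the data, and they can be sketched as one function. This is a hedged illustration of the procedure on this slide, not a supported tool: `recreate_oplog` is a hypothetical name, `local_db` is assumed to behave like a pymongo `Database` handle for the `local` database, and the member is assumed to be running standalone already (steps 1 and 2).

```python
from typing import Any, Dict


def recreate_oplog(local_db: Any, new_size_bytes: int) -> Dict[str, Any]:
    """Save the newest oplog entry, drop the oplog, recreate it, re-insert.

    `local_db` is assumed to expose the pymongo-style calls used below:
    `db[name]`, `find_one(sort=...)`, `drop()`, `create_collection(...)`,
    and `insert_one(...)`.
    """
    if new_size_bytes <= 0:
        raise ValueError("new_size_bytes must be positive")
    oplog = local_db["oplog.rs"]
    # Step 3: the newest entry is the last document in natural order.
    last = oplog.find_one(sort=[("$natural", -1)])
    if last is None:
        raise RuntimeError("oplog is empty; nothing to carry over")
    # Step 4: drop the old oplog.
    oplog.drop()
    # Step 5: recreate it as a capped collection with the new size.
    local_db.create_collection("oplog.rs", capped=True, size=new_size_bytes)
    # Step 6: re-insert the saved entry so replication can resume from it.
    local_db["oplog.rs"].insert_one(last)
    return last
```

The function deliberately leaves stopping and restarting the member (steps 1, 2, 7, and 8) to the caller, since those depend on how the process is supervised.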
27. MongoDB Management Service Engineering Team, MongoDB
Cailin Nelson, Cadran Cowansage, Louisa Berger, Bard Bloom, Tim Olsen
#MongoDBWorld
Questions
Editor's notes
Hello, my name is Cailin and I am the VP Engineering for the MongoDB Management Service (or MMS).
Many of you have probably seen or used MMS in the past, but for those of you that have not, let’s start with a quick review of what MMS is about.
MMS is the MongoDB Management Service - a web-based management application that makes it easy and reliable to run MongoDB at scale.
There are three components: Monitoring, Backup and now Automation. MMS can be used in the Cloud (mms.mongodb.com) or installed in your data center.
Automation is a big addition to MMS, and we have a great team working on it. For today’s talk we thought it would be fun to introduce you not only to the product, but also to the engineers that made it happen. As this talk progresses, I’ll be introducing them to you along the way.
Many MongoDB operations at scale require following a fairly detailed series of steps, and the number of steps scales linearly with the number of nodes in your cluster. I’m thinking about operations like deploying a new sharded cluster, upgrading a sharded cluster, or resizing the oplog for all nodes in a replica set.
Some of these are easier to script (or manage with Chef/Puppet) than others. For example, deploying a new sharded cluster is relatively fun and easy to script. We like to joke that every MongoDB engineer has written their own “bring up a sharded cluster on EC2 script”.... twice.
However, other operations such as upgrading a sharded cluster are much more difficult to script. There are many steps along the way that require you to wait for another process to achieve some particular state, and writing a script or Chef/Puppet config that handles the waiting and state checking is relatively challenging.
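To make the waiting problem concrete, here is a minimal sketch of the kind of helper a hand-written script ends up needing: poll a condition until it holds or a deadline passes. The `wait_for` name and its parameters are hypothetical; the `sleep` and `clock` arguments exist only so the helper can be exercised without real delays.

```python
import time
from typing import Callable


def wait_for(check: Callable[[], bool], timeout_s: float,
             interval_s: float = 1.0,
             sleep: Callable[[float], None] = time.sleep,
             clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll `check` until it returns True or `timeout_s` elapses.

    Returns True if the condition was met, False on timeout. `check` is
    always evaluated at least once, even with a zero timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

A script would wrap a call such as "is this member SECONDARY yet?" in `check`. Deciding what to do when the helper returns False is where scripting gets hard.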
The other difficulty in scripting something like a sharded cluster upgrade is handling things that go wrong. What if one node in one shard of your 10 shard cluster is down at the time you wish to upgrade? Do you stop the whole process and bring it back up, then try again? Do you skip it for now? If you skip it, how do you ensure that you remember to come back and upgrade it later?
MMS Automation is designed to handle all these complex interactions and failure modes, therefore removing a lot of the work in operating MongoDB. What you are about to see is dramatically different from the way things are today.
Now it’s time to dig into the specifics. This talk is organized into a series of demos. In order to ensure we get through everything we’ve planned, we’re going to ask you to hold your questions to the end!
For our first demo, I’d like to introduce Cadran Cowansage, who built our integration with Amazon AWS. Cadran joined MongoDB as a veteran of several NYC startups - and has built the component of this system which is optimized for getting startups off the ground and onto MongoDB faster. Cadran will demonstrate using MMS Automation with MMS Provisioning to build a sharded cluster on AWS entirely via the MMS API.
Thank you, Cadran.
For this section of our talk, I’d like to introduce Louisa Berger. Louisa joined MongoDB two years ago, and has been working on MMS Automation ever since. Louisa has watched this project grow from a prototype to its current state on the cusp of release. Louisa is going to continue from where Cadran left off: now that you have a cluster up and running, how do you manage its lifecycle from here?
Steps to upgrade:
- Run db.upgradeCheckAllDBs() to check data set for compatibility
- Resolve any incompatibilities in your deployment
- Upgrade authentication model:
  - ensure that at least one user exists in the admin db
  - upgrade client libs before any mongod or mongos instances
  - upgrade all MongoDB processes before upgrading the auth model
- Download MongoDB binaries for 2.6
- For each RS :
- For a cluster:
  1. disable the balancer
  2. upgrade the cluster’s metadata
  3. wait for each mongos to exit on completion of --upgrade
  4. upgrade all mongos processes
  5. upgrade each config server, one at a time
  6. upgrade each shard, one at a time:
    - upgrade secondaries, one at a time:
      - shut down the mongod
      - replace the 2.4 binary with the 2.6 binary
      - restart mongod
      - wait for member to recover to SECONDARY state
    - step down the primary
    - wait for another member to be elected to PRIMARY
    - shut down previous primary
    - replace the 2.4 binary with the 2.6 binary
    - restart mongod
    - wait for member to recover to SECONDARY state
  7. turn the balancer back on
Now that we’ve had a quick taste of what MMS Automation can do, we’re going to pause for a peek under the hood and talk a little bit about how this works. For this section of our presentation, I’d like to introduce Bard Bloom. Bard joined MongoDB to work on MMS Automation after 18 years at IBM Research. Bard’s work has focused on the central planning machinery that composes the brains of MMS Automation.
Personal note -- how cool it is to be doing AI for MongoDB!
For our last demo, I’d like to introduce Tim Olsen. Tim joined MongoDB and this project about a year ago. Tim previously worked at some of the internet’s largest content providers, including Akamai and Limewire. I like to think of Tim as our team specialist in “Things That Go Wrong In The Real World”.
But, since nothing could possibly go wrong in a live demo, Tim is instead going to demonstrate a feature that shows how MMS Automation makes real life with MongoDB quite a bit easier.
One of the more amusing features of running MMS is that we use MMS… to run MMS. That is, we use MMS to Monitor the MongoDBs in which your MMS data is stored. We use MMS to Backup these MongoDBs. And now, we use Automation to manage these MongoDBs.
The MMS system is composed of many replica sets, and one large sharded cluster which processes roughly 3 billion updates per day. We try to keep MMS on the latest and greatest versions of MongoDB, but today I’ve purposefully let one replica set fall a version behind so that we could have a brief demo of how we run MongoDB… at MongoDB.
And that concludes our demonstration of MMS Automation. At this point, we’ll take questions.