This document summarizes Eric Holscher's talk on safely deploying software on the cutting edge. The talk discusses Urban Airship's deployment process, including using Git for version control, maintaining a QA environment, designing services as composable pieces, and automating deployments using tools like Fabric. It also covers verifying deployments by routing traffic to new instances and checking metrics to identify issues before notifying customers.
5. Process
• Deploy out of git
• Standard git-based production/master branch model
• Releases on the production branch are tagged with a timestamp
• deploy-2011-08-03_14-37-38
• Feature branches
• http://nvie.com/posts/a-successful-git-branching-model/
Wednesday, September 7, 2011
6. Features
• Easily allows you to hot-fix production
• Keep a stable master
• Run CI on the master branch or long-lived feature branches
7. Services
• Everything that we deploy is conceptualized as a service
• Services all live in /mnt/services/<slug> (thanks, EC2)
• A service is an instance of a repository on a machine
• A repository might have multiple services
• e.g. the Airship repository is deployed as the “celery” and “web” services
• This maps really well onto Chef cookbooks
8. QA Environment
• Run all of your master branches
• Allow you to get a copy of what will become production
• Catch errors before they are seen by customers
• Spawn new ones for long-lived feature branches
• Run `host web-0` and figure out which environment you are in based on the IP
10. Jump machine
• Have a standard place for all deployments to happen
• Log all commands run
11. No External Services
• Chishop (a self-hosted PyPI server)
• No external server required to deploy code
• All branches are checked out on an admin server
12. Services look the same
• Python
• Java
• “Unix”
13. Composable
• Small pieces that you can build into better things
• Useful when trying to do something you didn’t plan for
15. Environment
• Where code lands on the remote machine
• Mimics a chroot
• Uses virtualenv & supervisord
• Owned by the service-user
• Managed by Chef
19. SCRIPT_DIR=$(dirname "$0")
SERVICE_DIR=$(cd "$SCRIPT_DIR" && cd ../ && pwd)
cd "$SERVICE_DIR"
# Start supervisord if it is not answering; otherwise start all of its processes.
if ! supervisorctl pid > /dev/null 2>&1; then
    echo "Supervisord not running, starting."
    supervisord
else
    echo "Supervisord running, starting all processes."
    supervisorctl start all
fi
cd - > /dev/null 2>&1
20. Bin scripts
• All of the process-level binscripts wrap supervisord
• bin/start -> supervisorctl start all
• bin/start foo -> supervisorctl start foo
• bin/stop -> supervisorctl stop all
• bin/stop shutdown -> supervisorctl shutdown
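The mapping above can be sketched in a few lines of Python. This is only an illustration of the translation rule; the function name `supervisorctl_command` is hypothetical, and the real bin scripts in the talk are shell wrappers.

```python
def supervisorctl_command(script, args=()):
    """Translate a bin script call into the supervisorctl argv it wraps."""
    args = list(args)
    if script == "stop" and args == ["shutdown"]:
        # bin/stop shutdown -> supervisorctl shutdown
        return ["supervisorctl", "shutdown"]
    if script not in ("start", "stop"):
        raise ValueError("unsupported bin script: %s" % script)
    # No process name means "all", matching bin/start and bin/stop.
    return ["supervisorctl", script] + (args or ["all"])


if __name__ == "__main__":
    print(supervisorctl_command("start", ["foo"]))
```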
22. Init.d
• All services share a common init.d script
• This init.d script calls into the service’s bin/
• /etc/init.d/airship start -> /mnt/services/airship/bin/start
23. SERVICE_USER='<%= @service %>'
SERVICE_NAME='<%= @service %>'
SERVICE_PATH=/mnt/services/$SERVICE_NAME
set -e
RET_CODE=0
# Each action runs the service's own bin script as the service user.
case "$1" in
    start)
        sudo su - "$SERVICE_USER" -c "$SERVICE_PATH/bin/start"
        RET_CODE=$?
        ;;
    stop)
        sudo su - "$SERVICE_USER" -c "$SERVICE_PATH/bin/stop"
        RET_CODE=$?
        ;;
    restart)
        sudo su - "$SERVICE_USER" -c "$SERVICE_PATH/bin/restart"
        RET_CODE=$?
        ;;
    status)
        sudo su - "$SERVICE_USER" -c "$SERVICE_PATH/bin/status"
        RET_CODE=$?
        ;;
    *)
        echo "$SERVICE_NAME service usage: $0 {start|stop|restart|status}"
        ;;
esac
exit $RET_CODE
27. Pull
• Update the code from the source repository
• Defaults to the “production” branch
• def pull(repo=None, ref='origin/production')
• Can pass in a specific revision/branch/tag/hashish
• local('git reset --hard %s' % ref, capture=False)
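A minimal sketch of the pull step without Fabric, using only the standard library. The `git reset --hard` call and the default ref come from the slide; the preceding `git fetch origin`, the `repo_dir` parameter, and the helper name `pull_commands` are assumptions added for illustration.

```python
import subprocess


def pull_commands(ref="origin/production"):
    """Return the git commands a pull step might run, in order."""
    return [
        ["git", "fetch", "origin"],       # assumption: refresh remote refs first
        ["git", "reset", "--hard", ref],  # from the slide
    ]


def pull(repo_dir, ref="origin/production", run=subprocess.check_call):
    """Run the pull commands inside repo_dir; `run` is injectable for testing."""
    for cmd in pull_commands(ref):
        run(cmd, cwd=repo_dir)
```

Passing a tag such as `deploy-2011-08-03_14-37-38` as `ref` would reset the checkout to that release instead of the branch tip.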
28. Build
• Could be called “prepare”
• Do local-specific things to get repo into a ready state
• Mostly used for compiling in java-land
• Useful in Python for running pre-install tasks
29. Tag
• Set a tag for the deploy in the git repo
• If the current commit already has a tag, use that instead
• git tag --contains HEAD
• deploy-2011-08-03_14-37-38
• strftime('%Y-%m-%d_%H-%M-%S')
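The tagging rule can be sketched as follows. The `strftime` format is the one on the slide; the function names and the choice to reuse only tags that start with `deploy-` are assumptions. `existing_tags` stands in for the output of `git tag --contains HEAD`.

```python
from datetime import datetime


def deploy_tag(now):
    """Build a tag name such as deploy-2011-08-03_14-37-38."""
    return "deploy-" + now.strftime("%Y-%m-%d_%H-%M-%S")


def tag_for_deploy(existing_tags, now):
    """Reuse a deploy tag already on the commit, otherwise mint a new one."""
    deploy_tags = sorted(t for t in existing_tags if t.startswith("deploy-"))
    return deploy_tags[-1] if deploy_tags else deploy_tag(now)


if __name__ == "__main__":
    print(tag_for_deploy([], datetime.now()))
```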
30. Sync
• Move the code from the local to the remote box
• Uses rsync to put it into the remote service directory
• Also places a copy of the synced code on the admin box
31. Install
• Make the code the active path for code on the machine
• This is generally installing code into a virtualenv
• Updating the “current” symlink in the service directory
• Symlink Django settings file based on environment
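One way the “current” symlink update might look, as a sketch. The talk only says the symlink is updated; building a temporary link and renaming it over the old one is an assumption, as are the function name `activate_release` and the idea of a separate directory per release.

```python
import os


def activate_release(service_dir, release_dir):
    """Point <service_dir>/current at release_dir, replacing any old link."""
    current = os.path.join(service_dir, "current")
    tmp = current + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)  # clear a leftover link from an interrupted run
    os.symlink(release_dir, tmp)
    # Renaming the temporary link over the old one swaps it in a single step on POSIX.
    os.replace(tmp, current)
    return current
```

Because the old link is replaced rather than deleted and recreated, there is no moment at which `current` is missing.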
32. Rollback
• When you break things, you need to undo quickly
• Reset the repository to the previous deployed tag
• git tag | grep deploy | sort -nr | head -2 | tail -1
• Deploy that
• Very few moving pieces
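The tag-selection pipeline above picks the second-newest deploy tag. A Python sketch of the same selection, assuming the tag list has already been read from `git tag` (the function name is hypothetical):

```python
def previous_deploy_tag(tags):
    """Return the deploy tag just before the most recent one."""
    # Zero-padded timestamps sort chronologically as plain strings.
    deploy_tags = sorted((t for t in tags if "deploy" in t), reverse=True)
    if len(deploy_tags) < 2:
        raise ValueError("no earlier deploy tag to roll back to")
    return deploy_tags[1]


if __name__ == "__main__":
    print(previous_deploy_tag([
        "deploy-2011-08-02_09-30-00",
        "deploy-2011-08-03_14-37-38",
    ]))
```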
33. Start/Stop/Reload
• Allow you to bounce services as part of deployment
• Allow reload for services that support it
34. CLI UI
• Have nice wrapper commands that do common tasks
• deploy host:web-0 full_deploy:airship
➡ pull, build, tag, sync, install
• deploy host:web-1 deploy:airship
➡ tag, sync, install
• deploy host:web-2 sync:airship
➡ sync
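The wrapper commands compose the smaller steps. A sketch of that expansion, with the step lists taken from the slide; the `STEPS` table and the `expand` helper are illustrative names, not the talk's actual Fabric tasks.

```python
# Wrapper command -> the primitive steps it runs, in order (from the slide).
STEPS = {
    "full_deploy": ["pull", "build", "tag", "sync", "install"],
    "deploy": ["tag", "sync", "install"],
    "sync": ["sync"],
}


def expand(task):
    """Expand 'full_deploy:airship' into its per-step tasks."""
    name, _, service = task.partition(":")
    return ["%s:%s" % (step, service) for step in STEPS[name]]


if __name__ == "__main__":
    print(expand("deploy:airship"))
```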
36. #!/bin/bash
# Log each deploy invocation, then hand the arguments to Fabric.
cd ~/airdeploy
DATE=$(date +%Y_%-m_%-d-%H-%M-%S)
echo "deploy" "$@" > "logs/$DATE.log"
fab "$@"
cd - > /dev/null 2>&1
37. Meta-commands
• Hard-code the correct deployment behavior
• “Make easy things easy, and wrong things hard”
• Knows what machine each service is deployed to
• deploy airship
➡ deploy pull:airship
➡ deploy type:web deploy:airship
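A meta-command can be sketched as a lookup from service to machine types followed by the expansion shown above. The `SERVICE_TYPES` table is a hypothetical stand-in for wherever the real tool stores that knowledge.

```python
# Hypothetical mapping: which machine types each service is deployed to.
SERVICE_TYPES = {"airship": ["web"]}


def meta_deploy(service, service_types=SERVICE_TYPES):
    """Expand 'deploy <service>' into the underlying deploy commands."""
    commands = ["deploy pull:%s" % service]
    for machine_type in service_types[service]:
        commands.append("deploy type:%s deploy:%s" % (machine_type, service))
    return commands


if __name__ == "__main__":
    for command in meta_deploy("airship"):
        print(command)
```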
39. Magicifying
• Now that we have a solid base, we can automate on top
• When you do a meta deploy, it should be a “smart deploy”
40. Workflow
• Deploy to one web server, preferably with one worker
• Restart it
• Run it against heuristics to determine if it’s broken
• If it’s broken, rollback, otherwise continue on
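The workflow reads naturally as a small control loop. In this sketch the four callables (`deploy`, `restart`, `looks_broken`, `rollback`) are hypothetical hooks supplied by the caller; the talk does not show the actual implementation.

```python
def smart_deploy(hosts, deploy, restart, looks_broken, rollback):
    """Canary the first host; roll out to the rest only if it looks healthy."""
    canary, rest = hosts[0], hosts[1:]
    deploy(canary)
    restart(canary)
    if looks_broken(canary):
        rollback(canary)
        return False
    for host in rest:
        deploy(host)
        restart(host)
    return True
```

Returning a boolean lets the calling command report whether the rollout finished or stopped at the canary.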
41. Heuristics
• Any 500s
• Ratio of 200s to non-200s
• Ratio of 500s to 200s
• Requests per second
• Response time
• $$$ (Business metrics)
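A sketch of how such heuristics might be checked against one metrics sample. The dictionary keys and every threshold here are assumptions chosen for illustration; business metrics are left out because they are specific to each product.

```python
def broken_reasons(metrics, max_non_200_ratio=0.05, min_rps=1.0, max_response_ms=500.0):
    """Return the reasons a canary looks broken; an empty list means healthy."""
    reasons = []
    total = metrics["requests"]
    ok = metrics["status_200"]
    if metrics["status_500"] > 0:
        reasons.append("saw 500s")
    if total > 0 and (total - ok) / total > max_non_200_ratio:
        reasons.append("too many non-200 responses")
    if metrics["requests_per_second"] < min_rps:
        reasons.append("request rate too low")
    if metrics["response_ms"] > max_response_ms:
        reasons.append("responses too slow")
    return reasons
```

Returning the list of reasons rather than a bare boolean makes it easier to log why a deploy was rolled back.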
42. How it works
• Tell load balancer to take machine out of pool
• /take_me_out_of_the_lb -> 200
• Start your code with 1 worker and a different port
• supervisorctl start canary
• Expose metrics from your services over JSON
• Make sure your load balancer weights it appropriately
• Poll your metrics for X time before considering it functional
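The final polling step might look like the sketch below. `fetch` is a hypothetical callable that returns the service's JSON metrics as text (for example, the body of an HTTP response), and `is_healthy` is whatever check you apply to the decoded sample; the clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import json
import time


def poll_metrics(fetch, is_healthy, duration, interval=1.0,
                 clock=time.monotonic, sleep=time.sleep):
    """Poll JSON metrics for `duration` seconds; stop early on a bad sample."""
    deadline = clock() + duration
    while True:
        metrics = json.loads(fetch())
        if not is_healthy(metrics):
            return False
        if clock() >= deadline:
            return True
        sleep(interval)
```

A `True` result means every sample taken during the window passed the check, which is the point at which the canary can be considered functional.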
43. Thanks
• Alex Kritikos
• Erik Onnen
• Schmichael
44. Questions?
• Eric Holscher
• Urban Airship (Hiring and whatnot)
• eric@ericholscher.com