At Netflix, we continue to improve our continuous delivery process. We thrive in a hybrid environment where every developer is able to deploy code, and with that freedom comes the responsibility for ensuring that our customers are not negatively impacted. We have built a set of open source tools toward a continuous delivery solution. In this presentation from QConSF 2013, you will learn about our tool chain so that you can determine which tools make sense in your environment.
4. Teams Deploy Their Own Code
Run What You Wrote
Rapid Innovation
Rapid Detection
Rapid Response
= Freedom + Responsibility
http://www.slideshare.net/garethbowles/self-servicebuilddeploymentagile2013
5. BUILD
Jenkins Job DSL
Configuration as Code
Groovy Script
Scripts go in Version Control
http://www.slideshare.net/quidryan/configuration-as-code
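Configuration as code means job definitions are generated by a script that lives in version control, rather than edited by hand in the Jenkins UI. The sketch below shows that idea in Python. It is not the Jenkins Job DSL (which is Groovy), and the `Service` descriptor, the job fields, and the build command are all hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical descriptor a team commits alongside its code."""
    name: str
    repo: str
    branch: str = "master"


def build_job(service: Service) -> dict:
    """Derive one build-job definition from a service descriptor."""
    # Hypothetical job schema; the real Jenkins Job DSL uses Groovy closures.
    return {
        "name": f"{service.name}-build",
        "scm": {"git": service.repo, "branch": service.branch},
        "steps": ["./gradlew clean build"],
        "triggers": {"scm_poll": "H/5 * * * *"},
    }


def render_jobs(services: list[Service]) -> str:
    """Render all job definitions as stable, diff-friendly JSON."""
    jobs = sorted((build_job(s) for s in services), key=lambda j: j["name"])
    return json.dumps(jobs, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(render_jobs([Service("api", "git@example.com:api.git")]))
```

Sorting the jobs and the keys keeps the generated output identical across runs, so a review of the script's changes shows only real configuration changes.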
7. BAKE
Aminator
• Create AMI from Base AMI
• Image contains service and everything needed to run it
• Unit of Deployment for Test and Prod
• Abstracts Cloud Details
http://techblog.netflix.com/2013/03/ami-creation-with-aminator.html
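Baking makes the image, not a code checkout, the thing that moves between environments. A minimal sketch, assuming a hypothetical `bake` step that combines a base AMI ID with a service package: the image ID here is just a hash of the inputs for illustration (Aminator registers a real AMI with EC2), and the hypothetical `promote` only records which image an environment runs.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    base_ami: str
    package: str   # the service plus everything it needs to run
    image_id: str


def bake(base_ami: str, package: str) -> Image:
    """Combine a base image with a service package into a new, immutable image.

    Illustration only: the ID is a hash of the inputs, whereas a real bake
    registers a new AMI with the cloud provider and receives its ID.
    """
    digest = hashlib.sha256(f"{base_ami}|{package}".encode()).hexdigest()[:8]
    return Image(base_ami, package, f"ami-{digest}")


def promote(image: Image, deployments: dict[str, str], env: str) -> None:
    """Record that `env` now runs `image`; nothing is rebuilt per environment."""
    deployments[env] = image.image_id


if __name__ == "__main__":
    img = bake("ami-base", "helloworld-1.0.noarch.rpm")
    envs = {}
    for env in ("test", "prod"):
        promote(img, envs, env)
    print(envs)
```

Because test and prod receive the same image ID, what was tested is exactly what ships.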
10. CANARY ANALYSIS
Test, Int, Prod
Choose where to deploy
Run canary analysis
Scale up new instances
Turn on traffic to new ASG
Turn off traffic to old ASG
Wait … analyze … continue
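The "analyze … continue" step comes down to comparing the canary against the existing baseline and deciding whether to proceed. The following is a deliberately simplified sketch, not Netflix's canary analysis. It assumes every metric is "lower is better" (errors, latency), counts a metric as passing when the canary's mean is at most `tolerance` above the baseline's, and continues only if the passing fraction reaches a hypothetical `threshold`.

```python
from __future__ import annotations

from statistics import mean


def canary_score(baseline: dict[str, list[float]],
                 canary: dict[str, list[float]],
                 tolerance: float = 0.10) -> float:
    """Fraction of metrics where the canary is no worse than baseline + tolerance.

    Assumes every metric is "lower is better" (error rate, latency), so only
    an increase over the baseline mean counts against the canary.
    """
    passed = 0
    for metric, base_values in baseline.items():
        limit = mean(base_values) * (1 + tolerance)
        if mean(canary[metric]) <= limit:
            passed += 1
    return passed / len(baseline)


def decide(score: float, threshold: float = 0.95) -> str:
    """Hypothetical gate: continue the rollout only if nearly every metric passed."""
    return "continue" if score >= threshold else "roll back"
```

A production system would also account for noise and for metrics where higher is worse or better in different directions; this sketch only shows the shape of the decision.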
12. GLISTEN
Extending Asgard’s Workflow
Automated Red/Black Push
Test, Int, Prod stacks
Run canary/analysis
Scale up new instances
Turn on traffic to new ASG
Run more tests
Turn off traffic to old ASG
Wait … analyze … continue
http://techblog.netflix.com/2013/09/glisten-groovy-way-to-use-amazons.html
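A red/black push never modifies the running cluster. Instead, it adds a new ASG beside the old one and moves traffic over. The sketch below models that sequence in memory, assuming hypothetical `Asg`/`Cluster` types, a `smoke_test` callback, and an `app-vNNN` naming pattern chosen for illustration. It does not call Asgard, Glisten, or AWS.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Asg:
    name: str
    image_id: str
    capacity: int = 0
    in_service: bool = False  # True when the load balancer sends it traffic


@dataclass
class Cluster:
    asgs: list[Asg] = field(default_factory=list)

    def live(self) -> list[Asg]:
        return [a for a in self.asgs if a.in_service]


def red_black_push(cluster: Cluster, image_id: str,
                   smoke_test: Callable[[Asg], bool]) -> Asg:
    """Add a new ASG beside the live one instead of replacing it in place.

    Returns the ASG that is serving traffic when the push finishes.
    """
    old = cluster.live()[-1]
    new = Asg(f"app-v{len(cluster.asgs):03d}", image_id)
    cluster.asgs.append(new)
    new.capacity = old.capacity      # scale up new instances
    new.in_service = True            # turn on traffic to new ASG
    if not smoke_test(new):          # run more tests
        new.in_service = False       # old ASG never stopped serving
        return old
    old.in_service = False           # turn off traffic to old ASG
    # Old instances stay up, so rollback is only a traffic change.
    return new


def rollback(bad: Asg, previous: Asg) -> None:
    """Send traffic back to the previous ASG and take the new one out."""
    previous.in_service = True
    bad.in_service = False
```

Leaving the old ASG at full capacity costs extra instances for a while, but it is what makes rolling back quick and easy.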
17. Multi-Region Consistency
Build Tooling to:
Schedule Deployments
Prefer off peak
Choose next available region automatically
Provide high visibility per region
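Choosing the next region off peak is a small scheduling problem. Here is a sketch, assuming hypothetical peak windows expressed in UTC hours (real peaks depend on each region's traffic). For every pending region it computes how many hours remain until an off-peak hour, then picks the region that can go soonest.

```python
from __future__ import annotations

# Hypothetical peak windows per region, as UTC hours [start, end).
PEAK_UTC = {
    "us-east-1": (0, 4),
    "us-west-2": (2, 7),
    "eu-west-1": (18, 22),
}


def is_peak(region: str, hour_utc: int) -> bool:
    start, end = PEAK_UTC[region]
    if start <= end:
        return start <= hour_utc < end
    return hour_utc >= start or hour_utc < end  # window wraps past midnight


def hours_until_off_peak(region: str, hour_utc: int) -> int:
    """Hours to wait (0-23) until `region` is outside its peak window."""
    for wait in range(24):
        if not is_peak(region, (hour_utc + wait) % 24):
            return wait
    raise ValueError(f"{region} is never off peak")


def next_deployment(pending: list[str], hour_utc: int) -> tuple[str, int]:
    """Pick the pending region that can be deployed soonest off peak.

    Returns (region, hours_to_wait); ties keep the order of `pending`.
    """
    if not pending:
        raise ValueError("no pending regions")
    return min(((r, hours_until_off_peak(r, hour_utc)) for r in pending),
               key=lambda pair: pair[1])
```

Calling `next_deployment` again after each region finishes walks the deployment through all regions while avoiding their peaks.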
18. Send in the Conformity Monkey
Have deployments diverged?
Balance regional consistency with regional isolation
Provide meaningful thresholds
Build best practices into tooling and reporting
http://techblog.netflix.com/2013/05/conformity-monkey-keeping-yourcloud.html
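One way to give drift a meaningful threshold is a grace period. Regions may legitimately differ while a build rolls out region by region, so only divergence that outlasts the grace period is reported. A minimal sketch, assuming integer build numbers that increase with each release and a hypothetical two-day grace period. This is not Conformity Monkey's rule API.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def drift_report(deployed: dict[str, tuple[int, datetime]],
                 now: datetime,
                 grace: timedelta = timedelta(days=2)) -> list[str]:
    """Regions still behind the newest build once that build is older than `grace`.

    `deployed` maps region -> (build number, when that build reached the region).
    """
    newest = max(build for build, _ in deployed.values())
    first_seen = min(when for build, when in deployed.values() if build == newest)
    if now - first_seen <= grace:
        return []  # a staged rollout is still in progress; don't nag
    return sorted(region for region, (build, _) in deployed.items()
                  if build != newest)
```

The grace period is the knob that balances regional consistency against regional isolation: a longer window tolerates staged rollouts, and a shorter one flags drift sooner.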
20. Key Elements for Netflix
Value Self-service
Test Everywhere
Build Awareness of Multiple Regions
Avoid peak times
Roll back quickly and easily
Be Cloud Native
21. Put NetflixOSS to Work for You
Netflix Platform
AMINATOR
And 30+ more projects at http://netflix.github.io/
22. Keep the Conversation Going
Continuous Delivery Open Space
Ballroom B/C (here!)
1:35-2:25, immediately following lunch
Overview: Build, Bake, and Deploy. Testing. Monkeys: resilient to behaviors inherent in the cloud. Leave with an understanding of the tools that we’ve built and open sourced, and how you might be able to modify, augment, or create tools of your own.
Innovate quickly: think outside of the box about deploy solutions. Keep the promise of availability. Encourage best practices: recommendations, not limitations.
Deploy to production. Balance innovation with risk. Self-service is scalable. Don’t fix build configs; deploy. (Gareth Bowles, Agile 2013)
Teams have unique flows. Let developers write code. Justin Ryan: Jenkins Job DSL plugin, Java Posse Roundup. The foundation for our build configurations at Netflix.
Amazon Machine Images (AMIs). Aminate: a source component is combined with another component to make something new.
Base AMI: common to all of our microservices. Deploy the same image to test, prod, and all regions. Other cloud platforms.
Self-service. Groovy app. Red/Black. Go through an example.
Don’t replace the cluster; spin up a new one. Canary / ACA (automated canary analysis): find a problem or continue. Cloud native: use the cloud.
Scale up. Leave the old cluster. Run through peak? The developer knows best.
Groovy library that sits on top of SWF (Amazon Simple Workflow Service). Clay McCoy. Start with a G. Allow the flow to shine. Activity: the element of reuse for our deployments. Builds on lessons from manual red/black deployments.
Stop now? Complete the picture with runtime resiliency. Automate all the things.
Danger. Chaos ensues. Instances disappear. Latency happens. Litter. Find problems and build resiliency. Introduce a few of the monkeys. More ideas; we need staff to build them! Look at vulnerabilities.
Should we push to everywhere at once?
Multiple regions. Errors sometimes make it to production; limit the impact. Cost: innovation and speed. Drift. Increased cognitive load.
Scheduled deployments: the button push signifies the scheduling, not necessarily the actual push. Providing visibility of what is deployed where, tied back to a Jenkins build, reduces that cognitive load.
We can do better. Look across regions for drift. Don’t nag; use meaningful thresholds. Ask the monkeys to help us test our runtime. Balance regional consistency with regional isolation.
Pay only for the instances we need. Don’t bother developers with what automation can do better.
Full circle: code check-in to monkeys. Balanced priorities.
Can you use any of these elements? Share cloud infrastructure. Solve our business problems!