Se ha denunciado esta presentación.
Se está descargando tu SlideShare. ×

Continuous Deployment: The Dirty Details

Continuous Deployment: The Dirty Details

Descargar para leer sin conexión

Presented at ALM Summit 3 in Redmond, WA. January 2013.

Like what you've read? We're frequently hiring for a variety of engineering roles at Etsy. If you're interested, drop me a line or send me your resume: mike@etsy.com.

http://www.etsy.com/careers

Presented at ALM Summit 3 in Redmond, WA. January 2013.

Like what you've read? We're frequently hiring for a variety of engineering roles at Etsy. If you're interested, drop me a line or send me your resume: mike@etsy.com.

http://www.etsy.com/careers

Más Contenido Relacionado

Libros relacionados

Gratis con una prueba de 30 días de Scribd

Ver todo

Audiolibros relacionados

Gratis con una prueba de 30 días de Scribd

Ver todo

Continuous Deployment: The Dirty Details

  1. 1. CONTINUOUS DEPLOYMENT The Dirty Details Mike Brittain ENGINEERING DIRECTOR @mikebrittain mike@etsy.com
  2. 2. CONTINUOUS DEPLOYMENT The Dirty Details “OK, sounds cool. But I have some questions...”
  3. 3. CD 100- & 200-levels - CI environment for automated tests - Committing to trunk - Branching in code - Config flags (a.k.a. feature flags) - DevOps mentality - Metrics and alerting - Automated deploy scripts credit: photobookgirl (flickr)
  4. 4. CD 100- & 200-levels - CI environment for automated tests - Committing to trunk - Branching in code - Config flags (a.k.a. feature flags) - DevOps mentality CD 300 level - Metrics and alerting - Deploys vs. releases - Automated deploy scripts - Decoupled systems, schema changes - How we work: Arch. & Process - Integration and Operations credit: photobookgirl (flickr)
  5. 5. www. .com
  6. 6. GROSS MERCHANDISE SALES http://www.etsy.com/blog/news/2013/notes-from-chad-2012-year-in-review/
  7. 7. DECEMBER 2012 1.5 Billion page views $117 Million of goods sold 6 Million items sold Items by anjaysdesigns, betwixxt, OneStarLeatherGoods, mediumcontrol, TheDesignPallet http://www.etsy.com/blog/news/2013/etsy-statistics-december-2012-weather-report/
  8. 8. 175+ Committers, everyone deploys credit: martin_heigan (flickr)
  9. 9. DEPLOYMENTS PER DAY Very end of 2009 Today
  10. 10. Continuous delivery is a pattern language in growing use in software development to improve the process of software delivery. Techniques such as automated testing, continuous integration, and continuous deployment allow software to be developed to a high standard and easily packaged and deployed to test environments, resulting in the ability to rapidly, reliably and repeatedly push out enhancements and bug fixes to customers at low risk and with minimal manual overhead. ~wikipedia credit: Stewart, redgen (flickr)
  11. 11. Architecture Stack Linux, Apache, MySQL, PHP Memcache, Gearman, Postgresql, Solr, Java, Traffic Server, Hadoop, HBase Git, Jenkins credit: Saire Elizabeth (flickr)
  12. 12. Then Now 2009 2010-today Just before we started using CD
  13. 13. Then Now 6-14 hours 15 mins “Deployment Army” 1 person Special event and Part of everyday highly orchestrated workflow
  14. 14. Then Now Blocked for Blocked for 6-14 hours. 15 minutes. 6+ hours to 15 minutes to redeploy redeploy
  15. 15. Then Now Release branch, Mainline, database schemas, minimal linking data transforms, and building, packaging, rsync, rolling restarts, site up cache purging, scheduled downtime
  16. 16. 1st day Put your face on Etsy.com.
  17. 17. 2nd day Complete tax, insurance, and benefits forms. credit: ktpupp (flickr)
  18. 18. WARNING
  19. 19. GROSS MERCHANDISE SALES
  20. 20. Continuous Deployment Small, frequent changes. Constantly integrating into production. 30+ deploys per day.
  21. 21. “Wow... 30 deploys a day. How do you build features so quickly?”
  22. 22. Software Deploy ≠ Product Launch
  23. 23. Deploys frequently gated by config flags (“dark” releases)
  24. 24. $cfg[‘new_search’] = array('enabled' => 'off'); $cfg[‘sign_in’] = array('enabled' => 'on'); $cfg[‘checkout’] = array('enabled' => 'on'); $cfg[‘homepage’] = array('enabled' => 'on');
  25. 25. $cfg[‘new_search’] = array('enabled' => 'off');
  26. 26. $cfg[‘new_search’] = array('enabled' => 'off'); // Meanwhile... # old and boring search $results = do_grep();
  27. 27. $cfg[‘new_search’] = array('enabled' => 'off'); // Meanwhile... if ($cfg[‘new_search’] == ‘on’) { # New and fancy search $results = do_solr(); } else { # old and boring search $results = do_grep(); }
  28. 28. $cfg[‘new_search’] = array('enabled' => 'on'); // or... $cfg[‘new_search’] = array('enabled' => 'staff'); // or... $cfg[‘new_search’] = array('enabled' => '1%'); // or... $cfg[‘new_search’] = array('enabled' => 'users', 'user_list' => 'mike,john,kellan');
  29. 29. Validate in production, hidden from public.
  30. 30. What’s in a deploy? Small incremental changes to the application New classes, methods, controllers Graphics, stylesheets, templates Copy/content changes Turning flags on, off, or % ramp up
  31. 31. Low MTTR (response times) Latent bugs and security holes Traffic management, load shedding Adding and removing infrastructure Tweaking config flags or releasing patches.
  32. 32. http://www.flickr.com/photos/flyforfun/2694158656/
  33. 33. Config flags Operator Metrics http://www.flickr.com/photos/flyforfun/2694158656/
  34. 34. Favorites “on”
  35. 35. Favorites “off”
  36. 36. Many deploys eventually lead to a product launch.
  37. 37. “How do you continuously deploy database schema changes?”
  38. 38. Code deploys ~15-20 minutes Schema changes
  39. 39. Code deploys ~15-20 minutes Schema changes THURSDAYS!
  40. 40. Our web application is largely monolithic. Etsy.com, Support & Back-office tools, Developer API, Gearman (async work)
  41. 41. Our web application is largely monolithic. Etsy.com, Support & Back-office tools, Developer API, Gearman (async work) PHP, Apache, Memcache
  42. 42. External “services” are not deployed with the main application. e.g. Databases, Search, Photo storage, Payments
  43. 43. External “services” are not deployed with the main application. e.g. Databases, Search, Photo storage, Payments MYSQL PCI PROXY CACHE, (schema changes) (controlled access) FILERS, AMAZON S3 SOLR, JVM (specialized infra.) (rolling restarts)
  44. 44. For every config flag, there are two states we can support — present and future.
  45. 45. For every config flag, there are two states we can support — present and future. ... or past and present.
  46. 46. “Non-Breaking Expansions” Expose new version in a service interface; support multiple versions in the consumer.
  47. 47. Example: Changing a Database Schema Merging “users” and “users_prefs”
  48. 48. C RULE OF THUMB: Prefer ADDs over ALTERs (non-breaking expansion)
  49. 49. 1. Write to both versions 2. Backfill historical data 3. Read from new version 4. Cut-off writes to old version
  50. 50. 0. Add new version to schema 1. Write to both versions 2. Backfill historical data 3. Read from new version 4. Cut-off writes to old version
  51. 51. 0. Add new version to schema Schema change to add prefs columns to “users” table. “write_prefs_to_user_prefs_table” => “on” “write_prefs_to_users_table” => “off” “read_prefs_from_users_table” => “off”
  52. 52. 1. Write to both versions Write code for writing prefs to the “users” table. “write_prefs_to_user_prefs_table” => “on” “write_prefs_to_users_table” => “on” “read_prefs_from_users_table” => “off”
  53. 53. 2. Backfill historical data Offline process to sync existing data from “user_prefs” to new columns in “users”
  54. 54. 3. Read from new version Data validation tests. Ensure consistency both internally and in production. “write_prefs_to_user_prefs_table” => “on” “write_prefs_to_users_table” => “on” “read_prefs_from_users_table” => “staff”
  55. 55. 3. Read from new version Data validation tests. Ensure consistency both internally and in production. “write_prefs_to_user_prefs_table” => “on” “write_prefs_to_users_table” => “on” “read_prefs_from_users_table” => “1%”
  56. 56. 3. Read from new version Data validation tests. Ensure consistency both internally and in production. “write_prefs_to_user_prefs_table” => “on” “write_prefs_to_users_table” => “on” “read_prefs_from_users_table” => “5%”
  57. 57. 3. Read from new version Data validation tests. Ensure consistency both internally and in production. “write_prefs_to_user_prefs_table” => “on” “write_prefs_to_users_table” => “on” “read_prefs_from_users_table” => “on” (“on” == “100%”)
  58. 58. 4. Cut-off writes to old version After running on the new table for a significant amount of time, we can cut off writes to the old table. “write_prefs_to_user_prefs_table” => “off” “write_prefs_to_users_table” => “on” “read_prefs_from_users_table” => “on”
  59. 59. “Branch by Astraction” Controller Controller Users Model (Abstraction) “users” (old) “user_prefs” “users” old schema new schema http://paulhammant.com/blog/branch_by_abstraction.html http://continuousdelivery.com/2011/05/make-large-scale-changes-incrementally-with-branch-by-abstraction/
  60. 60. “The Migration 4-Step” 1. Write to both versions 2. Backfill historical data 3. Read from new version 4. Cut-off writes to old version
  61. 61. “When do you clean up all of those config flags?
  62. 62. We might remove config flags for the old version when... It is no longer valid for the business. It is no longer stable, maintained, or trusted. It has poor performance characteristics. The code is a mess, or difficult to read. We can afford to spend time on it.
  63. 63. Promote “dev flags” to “feature flags”
  64. 64. // Feature flag $cfg[‘mobilized_pages’] = array('enabled' => 'on'); // Dev flags $cfg[‘mobile_templates_seller_tools’] = array('enabled' => 'on'); $cfg[‘mobile_templates_account_tools’] = array('enabled' => 'on'); $cfg[‘mobile_templates_member_profile’] = array('enabled' => 'on'); $cfg[‘mobile_templates_search’] = array('enabled' => 'off'); $cfg[‘mobile_templates_activity_feed’] = array('enabled' => 'off'); ... if ($cfg[‘mobilized_pages’] == ‘on’ && $cfg[‘mobile_templates_search’] == ‘on’) { // ... // ... }
  65. 65. // Feature flags $cfg[‘search’] = array('enabled' => 'on'); $cfg[‘developer_api’] = array('enabled' => 'on'); $cfg[‘seller_tools’] = array('enabled' => 'on'); $cfg[‘the_entire_web_site’] = array('enabled' => 'on');
  66. 66. // Feature flags $cfg[‘search’] = array('enabled' => 'on'); $cfg[‘developer_api’] = array('enabled' => 'on'); $cfg[‘seller_tools’] = array('enabled' => 'on'); $cfg[‘the_entire_web_site’] = array('enabled' => 'on'); $cfg[‘the_entire_web_site_no_really_i_mean_it’] = array('enabled' => 'on');
  67. 67. Architecture and Process
  68. 68. Deploying is cheap.
  69. 69. Releasing is cheap.
  70. 70. Some philosophies on product development...
  71. 71. Gathering data should be cheap, too. staff, opt-in prototypes, 1%
  72. 72. Treat first iterations as experiments.
  73. 73. Get into code as quickly as possible.
  74. 74. “Where a new system concept or new technology is used, one has to build a system to throw away, for even the best planning is not so omniscient as to get it right the first time. Hence plan to throw one away; you will, anyhow.” ~ Fred Brooks, The Mythical Man-Month
  75. 75. Architecture largely doesn’t matter.
  76. 76. Kill things that don’t work.
  77. 77. Your assumptions will be wrong once you’ve scaled 10x.
  78. 78. “We don’t optimize for being right. We optimize for quickly detecting when we’re wrong.” ~Kellan Elliott-McCrea, CTO
  79. 79. Become really good at changing your architecture.
  80. 80. Invest time in architecture by the 2nd or 3rd iteration.
  81. 81. Integration and Operations
  82. 82. WARNING REMEMBER THIS?
  83. 83. Continuous Deployment Small, frequent changes. Constantly integrating into production. 30 deploys per day.
  84. 84. Code review before commit
  85. 85. Automated tests before deploy
  86. 86. Why Integrate with Production?
  87. 87. Dev ≠ Prod
  88. 88. Verify frequently and in small batches.
  89. 89. Integrating with production is a test in itself. We do this frequently and in small batches.
  90. 90. More database servers in prod. Bigger database hardware in prod. More web servers. Various replication schemes. Different versions of server and OS software. Schema changes applied at different times. Physical hardware in prod. More data in prod. Legacy data (7 years of odd user states). More traffic in prod. Wait, I mean MUCH more traffic in prod. Fewer elves. Faster disks (SSDs) in prod.
  91. 91. Using a MySQL database in dev for an application that will be running on Oracle in production: Priceless
  92. 92. Verify frequently and in small batches.
  93. 93. Dev ⇾ QA ⇾ Staging ⇾ Prod
  94. 94. Dev ⇾ QA ⇾ Staging ⇾ Prod
  95. 95. Dev ⇾ Pre-Prod ⇾ Prod
  96. 96. Test and integrate where you’ll see value.
  97. 97. Config flags (again) off, on, staff, opt-in prototypes, user list, 0-100%
  98. 98. Config flags (again) off, on, staff, opt-in prototypes, user list, 0-100% “canary pools”
  99. 99. Automated tests after deploy
  100. 100. Real-time metrics and dashboards Network & Servers, Application, Business
  101. 101. SERVER METRICS Apache requests/sec, Busy processes, CPU utilization, Script exec time (med. & 95th) APPLICATION METRICS Logins, Registrations, Checkouts, Listings created, Forum posts Time and event correlated.
  102. 102. Real humans reporting trouble!
  103. 103. “Theoretical” vs. “Practical” Engineering
  104. 104. Config flags Operator Metrics http://www.flickr.com/photos/flyforfun/2694158656/
  105. 105. Managing risk during Holiday Shopping season Thanksgiving, “Black Friday,” “Cyber Monday” ➔ Christmas (~30 days) Code Freeze?
  106. 106. DEPLOYMENTS PER DAY “Code Slush”
  107. 107. Tighten your feedback cycles Integrate with production and validate early in cycle. Use tools that allow you to detect issues early. Optimize for quick response times. Applied to both feature development and operability.
  108. 108. Thank you ... and questions? These slides will be available later today at http://mikebrittain.com/talks Mike Brittain ENGINEERING DIRECTOR @mikebrittain mike@etsy.com

×