Shitlist-driven development
and other tricks for working on large codebases
FLORIAN WEINGARTEN
flo@shopify.com
@fw1729
“Programmers at work maintaining a Ruby on Rails application”
(Classic Programmer Paintings)
The Shopify Monolith
• >400k shops (multi-tenant architecture).
• 20k-40k RPS (80k RPS peak).
• ~800 contributors (developers, designers, …).
• Everybody can merge to master and deploy to production.
• 40-50 deploys (50-100 PRs) shipped to production per day.
MONOLITH AT SCALE
PRODUCTIVITY PROBLEM 1:
DEPLOYS BECOME A BOTTLENECK
Deploy bottleneck: Speed
• More people => more PRs => more deploys or bigger deploys.
• Small deploys: Fewer changes at once is safer, easier to debug, etc.
• Observation: If you want small and often, you need fast.
• Shopify: 40-50 deploys/day, that’s ~6 per (business) hour. If deploys become slower than ~10min, they become a productivity problem for us.
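The ~10 min figure follows from simple arithmetic. A minimal Python sketch, assuming an 8-hour business day and deploys that run strictly one after another (neither is stated on the slide):

```python
# Assumption: an 8-hour business day and strictly sequential deploys;
# neither is stated on the slide.
BUSINESS_HOURS_PER_DAY = 8


def deploy_budget_minutes(deploys_per_day: int, hours: int = BUSINESS_HOURS_PER_DAY) -> float:
    """Minutes available per deploy if deploys never overlap."""
    return hours * 60 / deploys_per_day


for deploys in (40, 50):
    per_hour = deploys / BUSINESS_HOURS_PER_DAY
    budget = deploy_budget_minutes(deploys)
    print(f"{deploys} deploys/day = {per_hour:.2f}/hour, {budget:.1f} min per deploy")
```

Under these assumptions, 40 to 50 deploys a day leave roughly 10 to 12 minutes per deploy, which is where the slide's threshold comes from.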
Deploy bottleneck: Speed
• Parallel CI builds.
• Build containers in advance and quickly.
• Avoid booting application multiple times during container builds.
• Deploy to many servers in parallel.
• Reduce application boot time.
• Reduce application shutdown time (e.g. Unicorn timeout, …).
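As an illustration of the "deploy to many servers in parallel" point, here is a minimal Python sketch using a thread pool. The server names and the `deploy_to` placeholder are hypothetical; the talk does not describe how Shopify's tooling does this.

```python
from concurrent.futures import ThreadPoolExecutor


def deploy_to(server: str, revision: str) -> str:
    # Hypothetical placeholder for the real work (pull container, restart workers, ...).
    return f"{server}: now running {revision}"


def deploy_in_parallel(servers: list[str], revision: str, max_workers: int = 16) -> list[str]:
    """Deploy to all servers concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda server: deploy_to(server, revision), servers))


results = deploy_in_parallel(["web-1", "web-2", "web-3"], "abc123")
for line in results:
    print(line)
```

With sequential deploys the total time grows with the number of servers; with a pool it is bounded by the slowest server (plus queueing once there are more servers than workers).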
Deploy bottleneck: Humans
• Asking ops team to deploy doesn’t scale.
• Asking people to decide when it is a good time to deploy doesn’t scale.
• Asking everyone to pay attention to master CI doesn’t scale.
• Asking everyone to pay attention to errors during a deploy doesn’t scale.
• Asking developers to deploy themselves doesn’t scale.
• Humans don’t scale. Automate!
Automatic deploy when CI is passing
Automatic range lock for reverts
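One way to read these two labels, sketched in Python. The data model and the locking rule are assumptions for illustration, not a description of Shopify's tooling: the deployer picks the newest commit on master whose CI passed, and when a revert lands, every commit from the reverted one up to (but not including) the revert is locked, so a known-bad intermediate state is never deployed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    sha: str
    ci_passed: bool
    reverts: Optional[str] = None  # sha of the commit this one reverts, if any
    locked: bool = False


def lock_reverted_ranges(history: list[Commit]) -> None:
    """Lock every commit from a reverted commit up to (excluding) its revert."""
    index = {commit.sha: i for i, commit in enumerate(history)}
    for i, commit in enumerate(history):
        if commit.reverts in index:
            for bad in history[index[commit.reverts]:i]:
                bad.locked = True


def next_deploy(history: list[Commit]) -> Optional[Commit]:
    """Newest commit (history is oldest first) that is green and unlocked."""
    lock_reverted_ranges(history)
    for commit in reversed(history):
        if commit.ci_passed and not commit.locked:
            return commit
    return None


history = [
    Commit("a1", ci_passed=True),
    Commit("b2", ci_passed=True),                 # later found to be bad
    Commit("c3", ci_passed=True),                 # still contains b2
    Commit("d4", ci_passed=False, reverts="b2"),  # the revert; CI not green yet
]
print(next_deploy(history).sha)
```

While the revert's CI is not green, the only deployable commit is the last one before the bad range; once it passes, the revert itself becomes the next deploy.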
MONOLITH AT SCALE
PRODUCTIVITY PROBLEM 2:
TOO MANY COOKS IN THE KITCHEN
Yay, everything is fixed!
In the meantime …
Someone “unfixed” it
Someone added new shit
Too many cooks in the kitchen!
Have to fix everything at once now :-(
Idea: Can we raise only for B but not for C?
Still shitlisted.
Fixed. Can’t be accidentally “unfixed”.
Only B is allowed to do it wrong.
No new shit can be introduced.
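The idea on these slides can be sketched as follows. This is an illustrative Python sketch, not the code from the talk; `Shitlist`, `DeprecatedBehaviour`, `call_legacy_api`, and the caller names `A`, `B`, `C` are hypothetical. Existing offenders are listed explicitly, and anyone not on the list gets an exception.

```python
class DeprecatedBehaviour(Exception):
    """Raised when a caller that is not on the shitlist uses deprecated code."""


class Shitlist:
    def __init__(self, name: str, allowed: set[str]):
        self.name = name
        self.allowed = set(allowed)  # existing offenders, to be fixed one by one

    def check(self, caller: str) -> None:
        if caller not in self.allowed:
            raise DeprecatedBehaviour(
                f"{caller} is not on the shitlist for {self.name}; use the replacement instead."
            )


# A has already been fixed and removed from the list; only B remains.
legacy_api = Shitlist("legacy_api", allowed={"B"})


def call_legacy_api(caller: str) -> str:
    legacy_api.check(caller)
    return f"{caller} used the legacy API"


print(call_legacy_api("B"))    # still shitlisted, so still allowed
for caller in ("A", "C"):      # A was fixed, C is new code
    try:
        call_legacy_api(caller)
    except DeprecatedBehaviour as error:
        print(error)
```

The `allowed` set doubles as the to-do list: each fix removes one entry, fixed callers cannot regress, and new callers cannot start.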
Problems:
- Not always possible to change the API.
- Sometimes you want different “granularity”.
Granularity is now at the web request and job level
All jobs and all requests are now “registering” themselves so the shitlist
can verify which codepaths are allowed to call the deprecated code.
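A sketch of what this registration could look like, assuming a per-request (or per-job) context that records which entry point is running. The names `current_entry_point`, `running`, `http_request_in_transaction`, and the controller and job names are all hypothetical; Python's `contextvars` stands in for whatever request-local state the framework provides.

```python
import contextvars
from contextlib import contextmanager

# Which web request or background job is currently executing.
current_entry_point = contextvars.ContextVar("current_entry_point", default=None)

# Entry points still allowed to hit the deprecated codepath.
SHITLIST = {"OrdersController#create", "ImportProductsJob"}


class DeprecatedBehaviour(Exception):
    pass


@contextmanager
def running(entry_point: str):
    """Called by the framework layer around every request and job."""
    token = current_entry_point.set(entry_point)
    try:
        yield
    finally:
        current_entry_point.reset(token)


def http_request_in_transaction() -> str:
    """The deprecated codepath, buried deep in shared code."""
    entry_point = current_entry_point.get()
    if entry_point not in SHITLIST:
        raise DeprecatedBehaviour(
            f"{entry_point} may not make HTTP requests inside a transaction"
        )
    return "ok"


with running("ImportProductsJob"):
    print(http_request_in_transaction())  # shitlisted job, still allowed

with running("CheckoutController#update"):
    try:
        http_request_in_transaction()
    except DeprecatedBehaviour as error:
        print(error)
```

The deprecated code needs no API change here: it asks the context who is running instead of requiring every caller in between to pass that information down.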
Shitlists
• Great for changing very “broad” behaviour.
• Great for breaking down a huge task into many small chunks.
• Great for generating “To-Do lists”.
• Great for “educating” a large team about how you want them to write code and enforcing the new behaviour.
Shitlist error messages
• Bad error message: “Someone decided that the thing that worked yesterday is now wrong. Good luck fixing it yourself.”
• Good error message: “Your code tried to make an HTTP request within a MySQL database transaction. This has been deprecated since it can negatively impact database performance. Using after_commit instead of after_save is often a good fix. If you need more help, please come see us in Slack in the #database-team channel.”
MONOLITH AT SCALE
PRODUCTIVITY PROBLEM 3:
UNRELIABLE TESTS
Unlikely problems become likely at scale
• Unreliable test: On the same version of the code, the test
sometimes passes and sometimes fails.
• Shopify: About 750 CI runs per day, ~10 min and ~70k tests each.
• If only a single one of those 70k tests is unreliable and fails 1% of the
time, we lose over 1 hour of productivity per day.
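The arithmetic behind that estimate, as a short Python sketch. It assumes each failed run costs one full ~10 min rerun, and (for more than one unreliable test) that failures are independent; both are simplifications not stated on the slide.

```python
CI_RUNS_PER_DAY = 750
MINUTES_PER_RUN = 10


def lost_minutes_per_day(unreliable_tests: int, failure_rate: float) -> float:
    """Expected minutes of CI reruns per day caused by unreliable tests."""
    # A run fails if at least one unreliable test fails (independence assumed).
    p_run_fails = 1 - (1 - failure_rate) ** unreliable_tests
    return CI_RUNS_PER_DAY * p_run_fails * MINUTES_PER_RUN


for count in (1, 10):
    minutes = lost_minutes_per_day(unreliable_tests=count, failure_rate=0.01)
    print(f"{count} unreliable test(s): about {minutes:.0f} min/day")
```

A single test failing 1% of the time costs about 75 minutes a day, the slide's "over 1 hour"; ten such tests would already cost close to 12 hours.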
Types of unreliable tests
• Flaky test: time-dependent, load-dependent, …
• Leaky test: order-dependent (test B fails if test A ran first)
Automatic test grind
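The slide gives no detail on the grind. A common reading, assumed here, is to rerun a suspect test many times on the same code and count failures. The helper names in this Python sketch are hypothetical.

```python
import itertools
from typing import Callable


def grind(test: Callable[[], None], runs: int = 200) -> float:
    """Run `test` repeatedly on unchanged code; return the observed failure rate."""
    failures = 0
    for _ in range(runs):
        try:
            test()
        except AssertionError:
            failures += 1
    return failures / runs


call_counter = itertools.count(1)


def fails_every_tenth_run() -> None:
    # Deterministic stand-in for a time- or load-dependent test.
    assert next(call_counter) % 10 != 0


def always_passes() -> None:
    assert 1 + 1 == 2


print(f"flaky: {grind(fails_every_tenth_run):.1%}, stable: {grind(always_passes):.1%}")
```

Any failure rate above zero on unchanged code marks the test as unreliable by the definition given earlier.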
Automatic leaky test "bisect"
• Take list of all tests that ran before the failing test.
• Binary search through list of candidates.
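The two steps above can be sketched as a binary search. This Python sketch assumes exactly one earlier test leaks state and that the failure reproduces deterministically; `fails_after` is a hypothetical callback that runs the given tests followed by the failing test and reports whether it failed.

```python
from typing import Callable, Sequence


def bisect_leaky_test(
    candidates: Sequence[str],
    fails_after: Callable[[Sequence[str]], bool],
) -> str:
    """Find the single test whose earlier execution makes the victim fail."""
    suspects = list(candidates)
    while len(suspects) > 1:
        half = suspects[: len(suspects) // 2]
        # If the victim fails after running only this half, the culprit is in it.
        suspects = half if fails_after(half) else suspects[len(half):]
    return suspects[0]


# Demo: "test_sets_global_config" leaks state that breaks the failing test.
ran_before = [f"test_{i}" for i in range(1000)]
ran_before[617] = "test_sets_global_config"
runs = 0


def fails_after(prefix: Sequence[str]) -> bool:
    global runs
    runs += 1
    return "test_sets_global_config" in prefix


print(bisect_leaky_test(ran_before, fails_after), "found in", runs, "runs")
```

Each step halves the candidate list, so 1,000 candidates need at most 10 reruns instead of up to 1,000.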
TL;DR
SUMMARY AND KEY TAKEAWAYS
Summary: Monolith productivity at scale
• Productivity problem 1: Deploys.
• Solution: Often and small. Make them fast and automate everything.
• Productivity problem 2: Too many cooks in the kitchen.
• Solution: Shitlist-driven development.
• Productivity problem 3: Unreliable tests.
• Solution: Tracking and alerting. Bisect and grind. Automation.
Thanks! Questions?
FLORIAN WEINGARTEN
flo@shopify.com
@fw1729

Shitlist-driven development and other tricks for working on large codebases

  • 1. Shitlist-driven development and othertricks forworking on large codebases FLOR IAN WE INGARTEN flo@shopify.com @fw1729
  • 2.
  • 3. 3
  • 4. 4 “Programmers at work maintaining a Ruby on Rails application” (Classic Programmer Paintings)
  • 5. 5 • >400k shops (multi-tenant architecture). • 20k-40k RPS (80k RPS peak). • ~800 contributors (developers, designers, …) • Everybody can merge to master and deploy to production. • 40-50 deploys (50-100 PRs) shipped to production per day. The Shopify Monolith
  • 6. 6 MONOLITH AT SCALE PRODU CTIVI TY PROBLEM 1: DEPLOYS BECOME A BOT TLENECK
  • 8. 7 • More people => more PRs => more deploys or bigger deploys. Deploy bottleneck: Speed
  • 9. 7 • More people => more PRs => more deploys or bigger deploys. • Small deploys: Fewer changes at once is safer, easier to debug, etc. Deploy bottleneck: Speed
  • 10. 7 • More people => more PRs => more deploys or bigger deploys. • Small deploys: Fewer changes at once is safer, easier to debug, etc. • Observation: If you want small and often, you need fast. Deploy bottleneck: Speed
  • 11. 7 • More people => more PRs => more deploys or bigger deploys. • Small deploys: Fewer changes at once is safer, easier to debug, etc. • Observation: If you want small and often, you need fast. • Shopify: 40-50 deploys/day, that’s ~6 per (business) hour. If deploys become slower than ~10min, they become a productivity problem for us. Deploy bottleneck: Speed
  • 18. Deploy bottleneck: Speed
      • Parallel CI builds.
      • Build containers in advance and quickly.
      • Avoid booting the application multiple times during container builds.
      • Deploy to many servers in parallel.
      • Reduce application boot time.
      • Reduce application shutdown time (e.g. Unicorn timeout, …).
  • 25. Deploy bottleneck: Humans
      • Asking the ops team to deploy doesn’t scale.
      • Asking people to decide when a good time to deploy is doesn’t scale.
      • Asking everyone to pay attention to master CI doesn’t scale.
      • Asking everyone to pay attention to errors during a deploy doesn’t scale.
      • Asking developers to deploy themselves doesn’t scale.
      • Humans don’t scale. Automate!
  • 40. MONOLITH AT SCALE. PRODUCTIVITY PROBLEM 2: TOO MANY COOKS IN THE KITCHEN
  • 47. In the meantime … Someone “unfixed” it. Someone added new shit. Too many cooks in the kitchen!
  • 51. Have to fix everything at once now :-( Idea: Can we raise only for B but not for C?
  • 58. Fixed. Can’t be accidentally “unfixed”. Still shitlisted. Only B is allowed to do it wrong. No new shit can be introduced.
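The code slides for this part did not survive in the transcript, so the following is only a minimal sketch of the idea, not the speaker's implementation. It assumes the deprecated method's API can be changed so that callers identify themselves; `A`, `B`, and `C` follow the slide text, while `SHITLIST`, `DeprecatedCallError`, and `deprecated_behaviour` are hypothetical names.

```ruby
# Hypothetical sketch: names other than A, B, and C are illustrative only.
class DeprecatedCallError < StandardError; end

class A
  # Existing offenders that have not been fixed yet.
  # Entries should only ever be removed, never added.
  SHITLIST = ["B"].freeze

  # The API change: every caller has to pass itself in.
  def self.deprecated_behaviour(caller_object)
    name = caller_object.class.name
    unless SHITLIST.include?(name)
      raise DeprecatedCallError, "#{name} must not call A.deprecated_behaviour"
    end
    :legacy_result
  end
end

class B
  def run
    A.deprecated_behaviour(self) # still allowed: B is shitlisted
  end
end

class C
  def run
    A.deprecated_behaviour(self) # raises: C is not on the list
  end
end
```

With this in place `B.new.run` keeps working, while `C.new.run` raises `DeprecatedCallError`. Once B is fixed, its entry is deleted from the list and the old behaviour cannot creep back in.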
  • 59. Problems:
      • Not always possible to change the API.
      • Sometimes you want different “granularity”.
  • 62. Granularity is now at the web request and job level. All jobs and all requests are now “registering” themselves so the shitlist can verify which codepaths are allowed to call the deprecated code.
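One way the “registering” step could look is sketched below. This is an assumption about the mechanism, not code from the talk: each request or job wraps its work in `with_context`, and the deprecated codepath calls `check!`. The module layout, method names, and list entries are all hypothetical.

```ruby
# Hypothetical sketch; module, method, and entry names are assumptions.
module Shitlist
  class Error < StandardError; end

  # Which requests and jobs may still use each deprecated behaviour.
  ALLOWED = {
    http_in_transaction: ["OrdersController#create", "SyncInventoryJob"].freeze,
  }.freeze

  # Called by the glue code around every web request and background job.
  def self.with_context(name)
    previous = Thread.current[:shitlist_context]
    Thread.current[:shitlist_context] = name
    yield
  ensure
    Thread.current[:shitlist_context] = previous
  end

  # Called from inside the deprecated codepath.
  def self.check!(behaviour)
    context = Thread.current[:shitlist_context]
    return if ALLOWED.fetch(behaviour).include?(context)

    raise Error, "#{context.inspect} is not allowed to use #{behaviour}"
  end
end
```

Inside `Shitlist.with_context("SyncInventoryJob") { ... }` a call to `Shitlist.check!(:http_in_transaction)` passes; inside a context that is not listed it raises. The deprecated method's own signature does not change, which is the point of moving the granularity to the request and job level.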
  • 63. Shitlists
      • Great for changing very “broad” behaviour.
      • Great for breaking down a huge task into many small chunks.
      • Great for generating “To-Do lists”.
      • Great for “educating” a large team about how you want them to write code and enforcing the new behaviour.
  • 64. Shitlist error messages
      • Bad error message: “Someone decided that the thing that worked yesterday is now wrong. Good luck fixing it yourself.”
      • Good error message: “Your code tried to make an HTTP request within a MySQL database transaction. This has been deprecated since it can negatively impact database performance. Using after_commit instead of after_save is often a good fix. If you need more help, please come see us in Slack in the #database-team channel.”
  • 65. MONOLITH AT SCALE. PRODUCTIVITY PROBLEM 3: UNRELIABLE TESTS
  • 66. Unlikely problems become likely at scale
      • Unreliable test: On the same version of the code, the test sometimes passes and sometimes fails.
      • Shopify: About 750 CI runs per day, ~10 min and ~70k tests each.
      • If only a single one of those 70k tests is unreliable and fails 1% of the time, we lose over 1 hour of productivity per day.
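The “over 1 hour” figure follows from the numbers on the slide if each spurious failure costs one full CI re-run. That cost model is an assumption made here to reproduce the estimate; the slide does not spell it out.

```ruby
runs_per_day    = 750   # CI runs per day (from the slide)
failure_rate    = 0.01  # the one unreliable test fails 1% of the time
minutes_per_run = 10    # duration of one CI run (from the slide)

# Assumption: every spurious failure wastes one full re-run.
expected_failed_runs = runs_per_day * failure_rate
lost_minutes_per_day = expected_failed_runs * minutes_per_run

puts lost_minutes_per_day.round(1) # roughly 75 minutes
```

About 7.5 runs fail per day for no real reason, which adds up to roughly 75 minutes of wasted CI time, before counting the time people spend investigating.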
  • 67. Types of unreliable tests
      • Flaky test: time-dependent, load-dependent, …
      • Leaky test: order-dependent (test B fails if test A ran first).
  • 70. Automatic leaky test “bisect”
      • Take list of all tests that ran before the failing test.
      • Binary search through list of candidates.
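The two steps above can be sketched as a plain binary search. This is a minimal illustration, not the tooling from the talk. It assumes exactly one earlier test causes the failure and that the failure reproduces deterministically; `find_leaky_test` and the block it takes are hypothetical, with the block standing in for “run these tests, then the failing test, and report whether it failed”.

```ruby
# Hypothetical sketch. Assumes a single culprit and a deterministic failure.
# The block receives a subset of the candidates and must return true if the
# failing test still fails after running that subset first.
def find_leaky_test(candidates, &fails_after)
  raise ArgumentError, "no candidates" if candidates.empty?

  until candidates.size == 1
    first_half  = candidates.first(candidates.size / 2)
    second_half = candidates.drop(candidates.size / 2)
    # If the first half alone reproduces the failure, the culprit is in it;
    # otherwise, under the single-culprit assumption, it is in the second half.
    candidates = fails_after.call(first_half) ? first_half : second_half
  end

  candidates.first
end
```

With n candidate tests this needs about log2(n) re-runs, so even a list of tens of thousands of tests is narrowed down in well under twenty runs.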
  • 71. TL;DR: SUMMARY AND KEY TAKEAWAYS
  • 72. Summary: Monolith productivity at scale
      • Productivity problem 1: Deploys.
          • Solution: Often and small. Make them fast and automate everything.
      • Productivity problem 2: Too many cooks in the kitchen.
          • Solution: Shitlist-driven development.
      • Productivity problem 3: Unreliable tests.
          • Solution: Tracking and alerting. Bisect and grind. Automation.
  • 73. Thanks! Questions? Florian Weingarten, flo@shopify.com, @fw1729