From box product to cloud cadence: The VSTS story
Buck Hodges
Director of Software Engineering
Visual Studio Team Services
Team Foundation Server (TFS)
Visual Studio Team Services (VSTS)
3 weeks
Team Foundation Server (TFS)
Visual Studio Team Services (VSTS)
Single master branch, multiple release branches
Shared Platform Services (SPS)
North Central
TFS SU1
North Central
TFS SU0
West Central
TFS SU7
Australia
Hosted Build
Pool
Hosted Build
Pool
Today: Micro Services
TFS
Work Item Tracking
Version Control
Build
Test Case Management
Service
Hooks
Release
Management
Search Code Lens
Extension
Management
Hosted Build
Pool
Cloud Load
Test
VSTS
Blobstore
Feeds
Packaging
SPS
Identity
Account
Commerce
Licensing
Moving to Containerized Services
Online upgrades
No such thing as ‘partial automation’
Set-Options “-p 0”
Features to be revealed at a big event in November 2013
We turned features on globally just before the keynote
It didn’t go well.
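One way to avoid switching a feature on globally right before an event is to expose it to a growing share of accounts and watch monitoring between steps. The following is a minimal Python sketch of percentage-based flag evaluation under that assumption, not the VSTS implementation; the `FeatureFlag` class, the flag name, and the account IDs are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FeatureFlag:
    """Hypothetical flag that is on for a percentage of accounts."""
    name: str
    rollout_percent: int = 0  # 0 = off for everyone, 100 = on globally

    def is_enabled(self, account_id: str) -> bool:
        # Hash flag name + account so each account gets a stable bucket in 0..99.
        digest = hashlib.sha256(f"{self.name}:{account_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 100
        return bucket < self.rollout_percent


def enabled_accounts(flag, accounts):
    """Return the accounts that currently see the feature."""
    return [a for a in accounts if flag.is_enabled(a)]


if __name__ == "__main__":
    flag = FeatureFlag("new-dashboard")          # hypothetical flag name
    accounts = [f"account-{i}" for i in range(1000)]
    for percent in (0, 10, 50, 100):             # widen the audience step by step
        flag.rollout_percent = percent
        print(percent, len(enabled_accounts(flag, accounts)))
```

Because each account's bucket is stable, an account that sees the feature at 10% still sees it at 50%, so widening the rollout never takes the feature away from anyone.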
Customer Intelligence
Business Intelligence
Operational Intelligence
Dashboards DevOps Debug Experiments
Volume
~7 TB average per day
and growing!
Alerts
Activity
Logging
Traces
Customer
Intelligence
Synthetic
KPI
Metrics
Job
History
Perf
Counters
Network
Platform
Gather everything
SLA
Mindset shift from on-premises to the cloud
Test at the lowest level possible
Fast and reliable
Product is designed for testability
Test code is product code
Pull Requests for code reviews
Build required by policy
Unit tests run before merge
Autocomplete makes it convenient
Master stays high quality
Thank You!
Buck Hodges
@tfsbuck
Learn more about our evolving DevOps journey
https://aka.ms/DevOps
Sprint 1: August 2010
Sprint 135: May 2018
Team Rooms: August 2013
1ES: Spring 2014
On-call Duty: October 2013
Combined Engineering: November 2014
Test Conversion Completed: April 2017
Service Online: April 23, 2011
Service Preview: June 2012
TFS SU1
North Central
Online upgrades
On call rotation
Gather data for root cause & mitigate for customers
Every action recorded
Create & track Repair Items to prevent recurrence and improve detection time
Test at the lowest level possible
Fast and reliable
Product is designed for testability
Test code is product code
End to end tests can run in production
Over 22 hours for nightly run and 2 days for the full run
Only ~60% of P0 runs passed 100%; Each run had many failures
Took days to sift through failures before deployment could start
L0 – requires only built binaries, no dependencies
L1 – adds ability to use SQL and file system
Run L0 & L1 in the pull request builds
L2 – test a service via REST APIs
L3 – full environment to test end to end
TRA tests – Legacy functional tests
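The level scheme above can be expressed as a simple filter that decides which tests a pull request build runs. This is a minimal Python sketch, assuming each test carries a level tag; the `Level` enum, `select_for_pull_request`, and the test names are hypothetical and not part of the VSTS tooling.

```python
from enum import IntEnum


class Level(IntEnum):
    L0 = 0  # requires only built binaries, no dependencies
    L1 = 1  # may also use SQL and the file system
    L2 = 2  # tests a service via its REST APIs
    L3 = 3  # needs a full environment, end to end


# Per the slide, pull request builds run L0 and L1 only.
PR_MAX_LEVEL = Level.L1


def select_for_pull_request(tests):
    """Given {test name: Level}, return the sorted names that run in a PR build."""
    return sorted(name for name, level in tests.items() if level <= PR_MAX_LEVEL)


if __name__ == "__main__":
    catalog = {  # hypothetical test names
        "parse_work_item_query": Level.L0,
        "save_work_item_to_sql": Level.L1,
        "create_build_via_rest": Level.L2,
        "full_release_flow": Level.L3,
    }
    print(select_for_pull_request(catalog))
```

Keeping the cut-off in one constant makes the policy easy to see: anything that needs a deployed service or a full environment stays out of the pull request path.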
A strategy adopted by our teams to provide focus, and assist with an interrupt culture.
• The team self-organizes each sprint into two distinct sub-teams: Features and Shield
• Rotates each sprint
Team of 10 Engineers
Shield Team: Deals with all live-site issues and interruptions
Feature Team: Works on committed features (new work)
• Conference bridge created
• DRIs brought into the call
• Communication externally and internally
• Pursue multiple theories
• Gather data for root cause & mitigate
• Record changes
• Rotate people during long-running LSIs
Repair work items are logged in VSTS but linked into the post mortem for traceability
Time-to metrics are key KPIs that are reviewed for improvements
Each Feature Team has goals for closing repair items
If we can’t prevent failure – can we limit the impact?
https://github.com/Netflix/Hystrix/wiki
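Limiting the impact of a failing dependency is commonly done with a circuit breaker, the pattern Hystrix popularized. The following is a minimal Python sketch of that pattern, not Hystrix's API and not the VSTS implementation; the `CircuitBreaker` class, its parameters, and the thresholds are illustrative assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is rejected without running."""


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock                # injectable so tests can control time
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                # Fail fast instead of piling more load on a struggling dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Otherwise fall through: allow one trial call (half-open state).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers can catch `CircuitOpenError` and return a degraded response; once the dependency recovers, the first successful trial call closes the circuit again.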
Day 1
Ring 0 Binaries; delay 1 hour
Ring 0 Servicing; delay 2 hours
Ring 1 Binaries; delay 1 hour
Ring 1 Servicing; delay 2 hours
Ring 2 Binaries; delay 1 hour
Ring 2 Servicing
Day 2
Ring 3 Binaries; delay 1 hour
Ring 3 Servicing; delay 3 hours
Ring 4 Binaries; delay 1 hour
Ring 4 Servicing
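The ring schedule above can be read as a gated loop: deploy a step, check health, wait, and move outward only if the check passes. The following is a minimal Python sketch under those assumptions, not the actual deployment tooling; `roll_out` and its callbacks are hypothetical, checking health after every step is an assumption, and the final delay of 0 is assumed because the slide shows no delay after Ring 2 Servicing.

```python
# (ring, phase, delay in hours before the next step), transcribed from Day 1.
DAY_1 = [
    ("Ring 0", "Binaries", 1),
    ("Ring 0", "Servicing", 2),
    ("Ring 1", "Binaries", 1),
    ("Ring 1", "Servicing", 2),
    ("Ring 2", "Binaries", 1),
    ("Ring 2", "Servicing", 0),  # assumed: no delay shown at the end of Day 1
]


def roll_out(steps, deploy, is_healthy, wait):
    """Run steps in order; stop at the first step whose health check fails.

    deploy(ring, phase), is_healthy(ring) and wait(hours) are supplied by the
    caller. Returns the (ring, phase) pairs that completed and passed.
    """
    completed = []
    for ring, phase, delay_hours in steps:
        deploy(ring, phase)
        if not is_healthy(ring):
            break  # keep the problem inside the current ring
        completed.append((ring, phase))
        wait(delay_hours)
    return completed


if __name__ == "__main__":
    done = roll_out(
        DAY_1,
        deploy=lambda ring, phase: print(f"deploying {phase} to {ring}"),
        is_healthy=lambda ring: True,
        wait=lambda hours: print(f"waiting {hours} h"),
    )
    print(done)
```

Passing `wait` in as a callback keeps the sketch testable; a real rollout would block or schedule the next step rather than print.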
PR to Merge is 30 mins
600 PR builds per day
~60,000 tests in each build
175 pushes to master
Merge to CI Build is 22 mins
120 builds per day
2,864 projects (C# and C++)
10 GB Build Drop
Merge to SelfTest is 58 mins
6 SelfTest suites triggered in parallel
518 tests executed in <8 mins
Merge to SelfHost is 120 mins
4 SelfHost suites triggered in parallel
3260 tests executed in < 75 mins
Why move to containers?
Agility for teams while keeping COGS under control
Faster deployments
Get test results faster
Improve quality of service by simpler auto-scaling
Same for production and engineering environments
Global DevOps Bootcamp 2018 Keynote


Editor's notes

  1. Welcome to the Global DevOps Bootcamp
  2. Hello, I’m Buck Hodges, director of software engineering for Visual Studio Team Services, and today I’m going to talk to you about our journey from a box product to a cloud cadence.
  3. TFS and VSTS provide Git, agile planning, build automation, and more. TFS ships on-premises, and VSTS is the equivalent running as a service on Azure.
  4. As of October 2017: single repo; 430 people pushed to the repo in the last 30 days; ~40 feature teams; code base is 90+% the same; teams work in master; no nightly build; over 3,000 projects (doubled in the last 3 years); pull requests build and run unit test validation.
  5. Single tenant = every account sign up created a new database in Azure North Central (Chicago)…got to 11,000 DBs
  6. Blast radius; scale limit (soft); geos/sovereignty. VMs: PaaS web and worker roles, moving to containers. App tiers: serve web UI and web service endpoints. Job agents: background processing such as scheduled builds, cleanup, and commit processing. DB: only metadata in SQL Azure, multi-tenant. Blob: file data in Azure Storage. Collections are accounts. SU1 was originally the only scale unit, and there is no incremental rollout when there’s only one! Then came SPS (March 2013), then SU0, then more scale units in the US and around the world. Now have SPS SU0 (February 2017). Organized in deployment rings; a health check runs after each ring is deployed. Today we have four rings, with outer rings having multiple scale units in them. Each service has scale units organized in rings.
  7. Micro services: Search, RM, Package, etc. We require all services to operate the same way: consistent deployments, consistent framework.
  8. Dark launch – decouples marketing and engineering
  9. Turn new features on completely at least 24 hours ahead of an event. Turn on incrementally. Monitor. Use feature flags for back-end changes.
  10. We only had TFS SU1 and SPS at the time: https://blogs.msdn.microsoft.com/bharry/2013/11/25/a-rough-patch/
  11. Story about the initial release: Twitter was a better monitor than what we had when we first announced the service at //Build in 2011. ~60 GB per day in 2015.
  12. Test at the lowest level possible Fast and reliable Product is designed for testability Test code is product code
  13. We started with an on-prem product, TFS 2010. We refactored it in production. Shifted to
  14. Thanks for being here. If you would like to learn more about our DevOps journey, go to aka.ms/devops. Have a lot of fun today learning DevOps. Now over to Marcel…
  15. We adopted Scrum during the summer of 2010 after we had completed TFS 2010. TFS 2010 was the first release that supported load balanced application tiers and collections. TFS had a consistent SQL component access layer for quite a while.
  16. VMs: Azure PaaS web and worker roles. App tiers: serve web UI and web service endpoints. Job agents: background processing such as scheduled builds, cleanup, and commit processing. DB: only metadata in SQL Azure. Blob: file data in Azure Storage. Collections are accounts. SU1 was originally the only scale unit, and there is no incremental rollout when there’s only one!
  17. In deployment mode, every sproc call takes a reader lock on the schema so that the schema can’t be changed during the call.
  18. Single tenant = every account sign up created a new database in Azure North Central (Chicago)…got to 11,000 DBs. Added multi-tenancy in February 2012.
  19. Stats as of September 2017. Other Azure services: DocumentDB, DataLake, Service Fabric, Elastic Search clusters, VMSS (Virtual Machine Scale Sets), AFD, CDN, Azure Traffic Manager, Azure Active Directory services, IaaS. Services = 31: AEX, ALMSearch, artifact, AX, blobstore, clt, CodeLens, Coss, csstool, dataimport, devtestlabs, DrService, entreq, extmgmt, feeds, gov, kalypso, market, MMS, mps, msdnadmin, MySubscriptions, OrgSearch, pe, pkgs, Portal, ReleaseManagement, sh, sps, spsext, tfs.
  20. Key takeaway: 5 whys; define improvements for both code and process; visibility to ensure learnings are applied. Outcomes: improve how you respond (TTx); stop it from happening again.
  21. Limit the impact of a problem. Degrade gracefully. Once the problem is over, the service should self-recover quickly. References: “Release It!: Design and Deploy Production-Ready Software” by Michael T. Nygard; Netflix Hystrix, “Making the Netflix API More Resilient” by Ben Schmaus, https://github.com/Netflix/Hystrix/wiki
  22. Binary deployments are faster: 3 minutes vs. 30 minutes