Global DevOps Bootcamp 2018 Keynote

From box product to cloud cadence: The VSTS story
Buck Hodges
Director of Software Engineering
Visual Studio Team Services

Team Foundation Server (TFS)
Visual Studio Team Services (VSTS)

3 weeks
Team Foundation Server (TFS)
Visual Studio Team Services (VSTS)
Single master branch, multiple release branches

Diagram: Shared Platform Services (SPS) in North Central; TFS scale units SU1 (North Central), SU0 (West Central), and SU7 (Australia); Hosted Build Pools
Today: Microservices
Diagram of VSTS services: TFS (Work Item Tracking, Version Control, Build, Test Case Management), Service Hooks, Release Management, Search, Code Lens, Extension Management, Hosted Build Pool, Cloud Load Test, Blobstore, Feeds, Packaging, and SPS (Identity, Account, Commerce, Licensing)
Moving to Containerized Services

No such thing as ‘partial automation’
Set-Options “-p 0”

Features to be revealed at a big event in November 2013
We turned features on globally just before the keynote
It didn’t go well.

Customer Intelligence, Business Intelligence, Operational Intelligence
Dashboards, DevOps, Debug, Experiments
Volume: ~7 TB average per day, and growing!
Data: Alerts, Activity Logging, Traces, Customer Intelligence, Synthetic, KPI, Metrics, Job History, Perf Counters, Network, Platform
Gather everything
SLA
Mindset shift from on-premises to the cloud

Test at the lowest level possible
Fast and reliable
Product is designed for testability
Test code is product code

Pull requests for code reviews
Build required by policy
Unit tests run before merge
Autocomplete makes it convenient
Master stays high quality
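A minimal sketch of how the merge gate above behaves, assuming a hypothetical in-memory model (`PullRequest`, `policies_met`, and `on_policy_update` are illustrative names, not the VSTS REST API): every policy must be satisfied, and when the author has set autocomplete, the merge happens as soon as the last policy passes, with no further action from the author.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    # Hypothetical model of a PR's policy state; not the VSTS REST API.
    reviewers_approved: bool
    build_succeeded: bool    # "Build required by policy"
    unit_tests_passed: bool  # "Unit tests run before merge"
    autocomplete: bool       # author opted in to merge automatically
    merged: bool = False


def policies_met(pr: PullRequest) -> bool:
    """True only when every required policy has passed."""
    return pr.reviewers_approved and pr.build_succeeded and pr.unit_tests_passed


def on_policy_update(pr: PullRequest) -> None:
    """Hypothetical hook called whenever a review or build result changes.

    With autocomplete set, the PR merges as soon as the last policy passes.
    """
    if not pr.merged and pr.autocomplete and policies_met(pr):
        pr.merged = True
```

For example, a PR created with approval and autocomplete but no build result yet stays open; once the PR build and its unit tests succeed, the next `on_policy_update` call merges it. Without autocomplete, a fully green PR still waits for the author.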

Thank You!
Buck Hodges
@tfsbuck
Learn more about our evolving DevOps journey
https://aka.ms/DevOps

Sprint 1: August 2010
Service Online: April 23, 2011
Service Preview: June 2012
Team Rooms: August 2013
On-call Duty: October 2013
1ES: Spring 2014
Combined Engineering: November 2014
Test Conversion Completed: April 2017
Sprint 135: May 2018
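Sprint 1 in August 2010 and Sprint 135 in May 2018 are consistent with an unbroken 3-week cadence: 134 sprints × 3 weeks = 402 weeks, roughly 7.7 years. The sketch below makes that arithmetic concrete. The exact start date of Sprint 1 is an assumption (the timeline only says "August 2010"), and it assumes sprints run back to back with no gaps.

```python
from datetime import date

SPRINT_LENGTH_DAYS = 21            # 3-week cadence from the slides
SPRINT_1_START = date(2010, 8, 2)  # hypothetical: the slides give only "August 2010"


def sprint_number(day: date) -> int:
    """1-based number of the sprint containing `day`.

    Assumes back-to-back 3-week sprints starting on SPRINT_1_START.
    """
    return (day - SPRINT_1_START).days // SPRINT_LENGTH_DAYS + 1
```

Under these assumptions, `sprint_number(date(2018, 5, 1))` lands in Sprint 135, matching the timeline.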

On-call rotation
Gather data for root cause & mitigate for customers
Every action recorded
Create & track repair items to prevent recurrence and improve detection time

Test at the lowest level possible
Fast and reliable
Product is designed for testability
Test code is product code
End to end tests can run in production

Over 22 hours for nightly run and 2 days for the full run
Only ~60% of P0 runs passed 100%; each run had many failures
Took days to sift through failures before deployment could start

L0 – requires only built binaries, no dependencies
L1 – adds ability to use SQL and file system
Run L0 & L1 in the pull request builds
L2 – test a service via REST APIs
L3 – full environment to test end to end
TRA tests – Legacy functional tests
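A minimal sketch of how the test levels above could drive test selection, assuming each test is tagged with its level. `TestLevel`, `PR_BUILD_LEVELS`, and `select_for_pr_build` are hypothetical names, and representing legacy TRA tests with a level of `None` is an assumption made for illustration. Only L0 and L1 tests, which need nothing beyond built binaries plus SQL and the file system, run in pull request builds.

```python
from enum import IntEnum


class TestLevel(IntEnum):
    L0 = 0  # requires only built binaries, no dependencies
    L1 = 1  # adds SQL and the file system
    L2 = 2  # tests a service via REST APIs
    L3 = 3  # full environment, end to end


# Levels run in pull request builds, per the slide.
PR_BUILD_LEVELS = {TestLevel.L0, TestLevel.L1}


def select_for_pr_build(tests):
    """Return names of tests that should run in a PR build.

    `tests` is an iterable of (name, level) pairs; legacy TRA tests are
    given level None (an assumption) and are never selected.
    """
    return [name for name, level in tests if level in PR_BUILD_LEVELS]
```

Keeping the selection rule as data (`PR_BUILD_LEVELS`) rather than scattering it through build scripts makes it easy to see which levels gate a merge.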

A strategy adopted by our teams to provide focus and to cope with an interrupt-driven culture.
• Each sprint, the team self-organizes into two distinct sub-teams: Features and Shield
• Membership of the sub-teams rotates each sprint
Team of 10 engineers
Shield Team: deals with all live-site issues and interruptions
Feature Team: works on committed features (new work)
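A minimal sketch of one way the per-sprint rotation could be scheduled. The slides do not say how many of the 10 engineers form the Shield team or how they are chosen; the round-robin scheme and the `shield_size` parameter here are assumptions, and `split_team` is a hypothetical helper. The shield window slides forward by `shield_size` each sprint, so every engineer eventually takes a shield turn.

```python
def split_team(engineers, sprint, shield_size):
    """Return (shield, features) for a 0-based sprint index.

    Hypothetical round-robin rotation; assumes engineer names are unique.
    """
    n = len(engineers)
    if not 0 < shield_size < n:
        raise ValueError("shield_size must leave both sub-teams non-empty")
    start = (sprint * shield_size) % n
    shield = [engineers[(start + i) % n] for i in range(shield_size)]
    features = [e for e in engineers if e not in shield]
    return shield, features
```

With a team of 10 and a hypothetical shield size of 2, the rotation repeats every 5 sprints.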

• Conference bridge created
• DRIs (directly responsible individuals) brought into the call
• Communication, both external and internal
• Pursue multiple theories
• Gather data for root cause & mitigate
• Record changes
• Rotate people during long-running live-site incidents (LSIs)

Repair work items are logged in VSTS and linked to the postmortem for traceability
Time-to metrics (such as time to detect and time to mitigate) are key KPIs, reviewed for improvement
Each feature team has goals for closing repair items
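A minimal sketch of computing time-to KPIs from incident timestamps. The slides do not define the metrics precisely; here both time to detect and time to mitigate are assumed to be measured from the start of customer impact, and `Incident`, `time_to_detect`, `time_to_mitigate`, and `mean_minutes` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class Incident:
    impact_start: datetime  # when customers were first affected
    detected: datetime      # when an alert or report reached on-call
    mitigated: datetime     # when customer impact ended


def time_to_detect(i: Incident):
    return i.detected - i.impact_start


def time_to_mitigate(i: Incident):
    # Assumption: measured from impact start, not from detection.
    return i.mitigated - i.impact_start


def mean_minutes(incidents, metric) -> float:
    """Average of a time-to metric across incidents, in minutes."""
    return mean(metric(i).total_seconds() / 60 for i in incidents)
```

Tracking these averages sprint over sprint shows whether repair items (for example, better alerting) are actually shortening detection.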

If we can’t prevent failure – can we limit the impact?
https://github.com/Netflix/Hystrix/wiki
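The Hystrix wiki linked above describes the circuit breaker pattern for limiting the impact of a failing dependency. Below is a minimal Python sketch of that pattern, not Hystrix's own (Java) API and not the VSTS implementation; `CircuitBreaker`, `failure_threshold`, and `reset_timeout` are illustrative names. After enough consecutive failures the breaker opens and calls fail fast to a fallback; once the timeout elapses, one trial call is let through (half-open) and its result decides whether the breaker closes or reopens. It is not thread-safe.

```python
import time


class CircuitBreaker:
    """Minimal closed/open/half-open breaker in the spirit of Hystrix."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout  # seconds to stay open
        self.clock = clock                  # injectable for testing
        self.failures = 0
        self.opened_at = None               # None means the breaker is closed

    def call(self, fn, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # open: fail fast without calling fn
            # Timeout elapsed: half-open, let this one trial call through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or reopen after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None  # success closes the breaker
        return result
```

Failing fast keeps a struggling dependency from tying up threads in its callers, which is how the pattern limits the blast radius of a failure.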

Day 1: Ring 0 binaries → 1-hour delay → Ring 0 servicing → 2-hour delay → Ring 1 binaries → 1-hour delay → Ring 1 servicing → 2-hour delay → Ring 2 binaries → 1-hour delay → Ring 2 servicing
Day 2: Ring 3 binaries → 1-hour delay → Ring 3 servicing → 3-hour delay → Ring 4 binaries → 1-hour delay → Ring 4 servicing
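A minimal sketch that turns the ring schedule above into start offsets for each step. The delays come from the slide; how long each binaries or servicing deployment itself takes is not stated, so `deploy_hours` is a hypothetical parameter (zero by default), and `Step`, `DAY_1`, `DAY_2`, and `start_offsets` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    wait_after_hours: float  # delay before the next step, from the slide


DAY_1 = [
    Step("Ring 0 binaries", 1), Step("Ring 0 servicing", 2),
    Step("Ring 1 binaries", 1), Step("Ring 1 servicing", 2),
    Step("Ring 2 binaries", 1), Step("Ring 2 servicing", 0),
]
DAY_2 = [
    Step("Ring 3 binaries", 1), Step("Ring 3 servicing", 3),
    Step("Ring 4 binaries", 1), Step("Ring 4 servicing", 0),
]


def start_offsets(steps, deploy_hours=0.0):
    """Map each step name to its start time, in hours after the day's first step.

    deploy_hours is a hypothetical, uniform duration for every deployment step.
    """
    offsets = {}
    t = 0.0
    for step in steps:
        offsets[step.name] = t
        t += deploy_hours + step.wait_after_hours
    return offsets
```

Ignoring deployment time, Ring 2 servicing starts 7 hours after Ring 0 binaries on Day 1, and Ring 4 servicing starts 5 hours after Ring 3 binaries on Day 2; the delays give each ring time to surface problems before the next, larger ring is touched.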

PR to merge: 30 mins
600 PR builds per day
~60,000 tests in each build
175 pushes to master
Merge to CI build: 22 mins
120 builds per day
2,864 projects (C# and C++)
10 GB build drop
Merge to SelfTest: 58 mins
6 SelfTest suites triggered in parallel
518 tests executed in < 8 mins
Merge to SelfHost: 120 mins
4 SelfHost suites triggered in parallel
3,260 tests executed in < 75 mins
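A short worked calculation from the pipeline numbers above. Because the per-build test count is approximate ("~60,000"), the daily total is an estimate. The end-to-end figure assumes each "Merge to X" time is measured from the merge itself, so it adds to the 30-minute PR-to-merge time; the names below are illustrative.

```python
PR_BUILDS_PER_DAY = 600
TESTS_PER_PR_BUILD = 60_000  # "~60,000", so the product is approximate
PR_TO_MERGE_MIN = 30
MERGE_TO_STAGE_MIN = {"CI build": 22, "SelfTest": 58, "SelfHost": 120}

# Approximate test executions per day in PR builds alone.
daily_pr_test_runs = PR_BUILDS_PER_DAY * TESTS_PER_PR_BUILD


def minutes_from_pr(stage: str) -> int:
    """Minutes from PR to the given stage.

    Assumes each "Merge to X" figure is measured from the merge.
    """
    return PR_TO_MERGE_MIN + MERGE_TO_STAGE_MIN[stage]
```

That works out to roughly 36 million test executions per day in PR builds, and about 150 minutes (2.5 hours) from a pull request to a change running in SelfHost.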

Why move to containers?
Agility for teams while keeping COGS (cost of goods sold) under control
Faster deployments
Get test results faster
Improve quality of service through simpler auto-scaling
Same for production and engineering environments
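The slides do not say which autoscaler VSTS uses. As one widely documented example of "simpler auto-scaling" for containers, the Kubernetes Horizontal Pod Autoscaler computes desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below implements just that rule with min/max clamping; it omits the HPA's tolerance band and stabilization window, and `desired_replicas` is a hypothetical helper.

```python
import math


def desired_replicas(current, current_metric, target_metric,
                     min_replicas=1, max_replicas=100):
    """Proportional scaling rule in the style of the Kubernetes HPA.

    Scales the replica count by how far the observed metric (e.g. average
    CPU %) is from its target, then clamps to [min_replicas, max_replicas].
    """
    desired = math.ceil(current * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas averaging 90% CPU against a 60% target scale to 6, while 3 replicas at 50% against the same target stay at 3.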
