How Small Teams Get Ready for SRE (public version)

Presented by
How Small Teams Get Ready for
Site Reliability Engineering (SRE)
Setyo Legowo
Facebook Developer Circles – Bandung
October 1st, 2017

Sources
SREcon17 Asia/Australia: How Could Small Teams Get Ready for SRE
Zehua Liu, Zendesk Singapore
Source: https://www.usenix.org/sites/default/files/srecon_europe_wide.png

What is SRE?
Source: https://landing.google.com/sre/interview/ben-treynor.html

Key Points of SRE
• Hire only coders
• Have an SLA for your service
• Measure and report performance against the SLA
• Use error budgets and gate launches on them
• Common staffing pool for SRE and Dev
• Excess Ops work overflows to the Dev team
• Cap SRE operational load at 50%
• Share 5% of Ops work with the Dev team
• A maximum of 2 events per on-call shift per person is all that is possible
• Minimum group size of 8 people (8 people in 1 location, or 6 in each of 2 locations)
• Postmortem for every event
• Postmortems are blameless and focus on process and technology, not people
Ben Treynor
VP Engineering at Google
Image Source:
https://www.usenix.org/sites/default/files/conference-files/ben_treynor_300.png
Source: https://www.usenix.org/conference/srecon14/technical-sessions/presentation/keys-sre
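As a rough illustration of "use error budgets and gate launches on them", the sketch below computes how much of an availability budget is left and blocks launches once it is spent. It is a minimal sketch, not Google's or UrbanIndo's implementation: it assumes the SLO is a success-ratio target (for example 99.9%) measured over one window, and the names error_budget_remaining and can_launch are hypothetical.

```python
def error_budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Return the fraction of the error budget still unspent (may go negative).

    Assumption: the SLO is an availability target such as 0.999, measured
    as successful requests / total requests over one window.
    """
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be between 0 and 1 (exclusive)")
    if total_requests <= 0:
        return 1.0  # no traffic yet, so nothing has been spent
    allowed_failures = (1.0 - slo) * total_requests  # the budget, in requests
    return 1.0 - failed_requests / allowed_failures


def can_launch(slo: float, total_requests: int, failed_requests: int) -> bool:
    """Gate launches: allow new releases only while some budget remains."""
    return error_budget_remaining(slo, total_requests, failed_requests) > 0.0


# A 99.9% SLO over 1,000,000 requests allows about 1,000 failed requests.
print(can_launch(0.999, 1_000_000, 400))    # about 60% of the budget left
print(can_launch(0.999, 1_000_000, 1_200))  # budget exhausted
```

A negative remaining budget means the service has already missed its SLO for the window, so launches stay blocked until the window rolls over or reliability work brings failures down.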

UrbanIndo’s
Deployment Transformation Journey
How did UrbanIndo change its service deployment?

The Problem
• Small Teams?
• Small company
• Small engineering team
• The case:
• A small team in a big company
• ~100 employees
• ~10 Engineers
• 6 Software Engineers
• 1 DevOps – Infrastructure Engineer
• 3 Mobile Application Engineers

Growth Problem
• Total visitors grow gradually each month, along with new features
• Issues in productivity and site reliability
• Onboarding new hires
• Slower deployment time
• More incidents
• No clear SLA
Source: https://c1.staticflickr.com/6/5260/5519749611_a95070b507.jpg

Do we have any solutions?
• Started a series of engineering initiatives
• Adopted SCRUM instead of FDD
• Automated test
• Simple deployment
• Easy-to-use development environment
• …
Source: http://www.doncio.navy.mil/uploads/0803IXR47425.jpg

Dedicated Engineering Resources
• SCRUM Development – Past
• CTO-led feature development
• Toil tasks fixed only when encountered
• SCRUM Development – Now
• Hired more engineers
• Worked to eliminate technical debt
• No feature development for the operations team
• Developed tools that support developers

Simple Deployment
• Production Deployment – Past
• Manual: ssh in, then copy and paste scripts
• Prone to human error
• Only a few engineers could do it
• Could not scale to new engineers or more frequent deployments
• Production Deployment – New
• Jenkins → Travis → Jenkins
• DevOps team installs the deployment script on new apps
• Deployment ownership for engineers
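The move away from "ssh and copy and paste scripts" can be sketched as a script that runs the same ordered steps on every host. This is a hypothetical example, not UrbanIndo's actual deployment script: the host names, the /srv/app path, the systemd unit name and the commands in DEPLOY_STEPS are all assumptions, and dry_run defaults to True so nothing is executed unless asked.

```python
import shlex
import subprocess
from typing import List

# Hypothetical deploy steps; a real script would use the app's own commands.
DEPLOY_STEPS: List[str] = [
    "cd /srv/app && git fetch --tags",
    "cd /srv/app && git checkout {version}",
    "sudo systemctl restart app",
]


def deploy(hosts: List[str], version: str, dry_run: bool = True) -> List[List[str]]:
    """Run the same deploy steps on every host over ssh, in order.

    Returns the commands it ran (or would run, when dry_run is True).
    check=True makes subprocess.run raise on the first failing step, so a
    broken host stops the rollout instead of being silently skipped.
    """
    executed: List[List[str]] = []
    for host in hosts:
        for step in DEPLOY_STEPS:
            # Quote the version so it cannot inject extra shell commands.
            cmd = ["ssh", host, step.format(version=shlex.quote(version))]
            executed.append(cmd)
            if not dry_run:
                subprocess.run(cmd, check=True)
    return executed


# Dry run: print what would be executed on two hypothetical hosts.
for cmd in deploy(["web-1", "web-2"], "v1.4.2"):
    print(" ".join(cmd))
```

Because every engineer runs the same script, a new hire deploys exactly the way a senior engineer does, which is the point of replacing manual copy and paste.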

Easy-to-use Development Environment
• Setup development environment – Past
• Had ~30 setup steps
• Non-uniform application versions, even when engineers installed the same apps
• Hard for new engineers
• Setup development environment – Now
• Spent one quarter Dockerizing the dev and test environments
• Current development/deployment pipeline:
• Develop locally → Test in Docker → Deploy to Staging → Test on Staging → Deploy to Production
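The pipeline above can be sketched as an ordered list of stages where each stage must succeed before the next one runs, which is also how automated tests act as a gate before production. This is a minimal sketch under assumptions: local development happens before the pipeline starts, and the stage actions are hypothetical placeholders rather than real Docker, Jenkins or Travis calls.

```python
from typing import Callable, List, Tuple

# Each stage is (name, action). An action returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully, so a
    failure in "Test on staging" means "Deploy to production" never runs.
    """
    completed: List[str] = []
    for name, action in stages:
        if not action():
            print(f"Stage failed: {name}; stopping pipeline")
            break
        completed.append(name)
    return completed


# Hypothetical placeholder actions; real ones would build the Docker image,
# run the test suite, call the deployment script, and so on.
def ok() -> bool:
    return True


def staging_tests_fail() -> bool:
    return False


stages: List[Stage] = [
    ("Test in Docker", ok),
    ("Deploy to staging", ok),
    ("Test on staging", staging_tests_fail),
    ("Deploy to production", ok),
]
print(run_pipeline(stages))
```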

Automated Test
• Automated Test – Past
• No automated tests
• Manual testing directly by the product owner
• Automated Test – Now
• Automated unit and acceptance tests in Docker
• Manual testing by QA
• Test coverage reports saved in reliable storage
• Automated tests inserted into each deployment step

Miscellaneous Initiatives
• Change velocity: several deployments each day
• Deploy to staging/production in minutes
• Build useful monitoring dashboards
• And alert notifications
• Rotate monitoring shifts
• Establish a postmortem culture
• Report every incident as a postmortem
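Rotating monitoring shifts can be sketched as a round-robin schedule, combined with a check against Treynor's guideline of at most 2 events per on-call shift. The one-week shift length, the engineer names and the event counts are assumptions for illustration only.

```python
from datetime import date, timedelta
from itertools import cycle
from typing import Dict, List, Tuple

# Guideline from the "Key Points of SRE" slide: at most 2 events per shift.
MAX_EVENTS_PER_SHIFT = 2


def build_rotation(engineers: List[str], start: date, weeks: int) -> List[Tuple[date, str]]:
    """Assign one-week monitoring shifts to engineers in round-robin order."""
    people = cycle(engineers)
    return [(start + timedelta(weeks=i), next(people)) for i in range(weeks)]


def overloaded_shifts(events_per_shift: Dict[date, int]) -> List[date]:
    """Return the start dates of shifts whose event count exceeds the cap."""
    return sorted(d for d, n in events_per_shift.items() if n > MAX_EVENTS_PER_SHIFT)


# Hypothetical team and incident counts.
rotation = build_rotation(["alice", "bob", "carol"], date(2017, 10, 2), weeks=4)
for shift_start, person in rotation:
    print(shift_start.isoformat(), person)

events = {date(2017, 10, 2): 1, date(2017, 10, 9): 3}
print(overloaded_shifts(events))  # shifts over the cap
```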

Do these initiatives meet all the requirements of SRE?
• Yes, but …
• You do not have to do SRE like Go*gle
• Adjust to your needs and issues as you grow, and SRE will come to you
• You don’t even need an SRE team!
• Focus on how to deliver reliable services

Unfulfilled Goals
• When we become a bigger company
• Data center operations
• On-premises devices
• Reliability checklist
• SLA → SLI, SLO
• Incident management
• Good for reporting
Source: https://commons.wikimedia.org/wiki/File:Pilgrims_on_the_Way_of_St.James_near_Saint-Martin-des-Champs.JPG
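The goal of moving from an SLA to SLIs and SLOs separates three things: the SLA is the promise made to users, an SLI is a measured indicator, and an SLO is the internal target for that indicator. A minimal sketch, assuming a latency SLI (the share of requests faster than a threshold) with a hypothetical 300 ms threshold, a 95% target and made-up sample latencies:

```python
from typing import Iterable


def latency_sli(latencies_ms: Iterable[float], threshold_ms: float) -> float:
    """SLI: fraction of requests served faster than threshold_ms."""
    values = list(latencies_ms)
    if not values:
        return 1.0  # assumption: an empty window counts as meeting the target
    good = sum(1 for v in values if v < threshold_ms)
    return good / len(values)


def meets_slo(sli: float, slo: float) -> bool:
    """SLO: the target value the SLI must reach over the window."""
    return sli >= slo


# Hypothetical request latencies for one window, in milliseconds.
samples = [120, 180, 250, 90, 410, 160, 300, 140, 200, 110]
sli = latency_sli(samples, threshold_ms=300)
print(f"SLI = {sli:.0%}, meets 95% SLO: {meets_slo(sli, 0.95)}")
```

Keeping the SLO stricter than any external SLA gives the team a warning margin before a contractual promise is broken.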

Why SRE?
Why not DevOps?

How is it different from DevOps?
Image source: https://commons.wikimedia.org/wiki/File:Devops-toolchain.svg


Watch & Reading List
• How Could Small Teams Get Ready for SRE, by Zehua Liu
https://www.usenix.org/conference/srecon17asia/program/presentation/liu
• Key Points of SRE, by Ben Treynor
https://www.usenix.org/conference/srecon14/technical-sessions/presentation/keys-sre
• https://landing.google.com/sre/interview/ben-treynor.html
• USENIX YouTube Channel, https://www.youtube.com/channel/UC4-GrpQBx6WCGwmwozP744Q
• Site Reliability Engineering: How Google Runs Production Systems, edited by Betsy Beyer, Chris Jones, Jennifer Petoff, and Niall Richard Murphy
• The DevOps Handbook: How to Create World-Class Agility, Reliability, & Security in Technology Organizations, by Gene Kim, Jez Humble, Patrick Debois, and John Willis
• Linux Foundation Events YouTube Channel, https://www.youtube.com/channel/UCthvmTSlmIcMH93LIJNe-2w

Thank You
Setyo Legowo
• Software Engineer at UrbanIndo
• Office e-mail address: setyo@urbanindo.com
• Personal e-mail address: setyolegowo94@gmail.com
• LinkedIn: https://www.linkedin.com/in/setyolegowo/
