SlideShare una empresa de Scribd logo
1 de 28
Descargar para leer sin conexión
SRE in startup
Zonky 17.1.2017
Ladislav Prskavec, Apiary
ladislav@apiary.io
@abtris
1
What is SRE?
2
"What happens when a software engineer is tasked with what used to
be called operations."
» Ben Treynor Sloss, Vice President, Google Engineering,
founder of Google SRE
3
"Our work is like being part of the world's most intense pit crew. We
change the tires of a race car as it's going 100 mph."
» Andrew Widdowson, Site Reliability Engineer, Mountain View
4
In general, an SRE team is responsible for:
» availability
» latency
» performance
» efficiency
» change management
» monitoring
» emergency response
» capacity planning
5
6
If the team agrees on a 99.9% SLA,
that gives them an error budget of
0.1%.
7
8
Rule
If service is in SLA, launch away
- clearly DEV team is doing a good job
If service is not within SLA, launch freeze
- Until you earn back enough error budget
9
Error budget
» removes SRE - DEV conflict
» DEV teams make self-police
10
Common staffing pool
» one more SRE = one less Dev
11
SRE hires only coders
» they get bored easily
» speak same language as Dev
12
50% cap on ops work
» if you succeed works scales with traffic
» coding reduce work / traffic ratio
13
Keep Dev in rotation
» 5% ops handled by devs
14
Speaking of Dev and Ops work
» excess operations load (tickets, oncall, etc.)
15
SRE portability
» no requirement to stick with project or SRE
16
Outages
» minimalize impact
» prevent recurrence
17
Minimalize damage
» no NOC
» good diagnostic information
» practice, practice, practice
18
Prevent recurrence
1. Handle event
2. Write post-mortems
3. Reset
19
Post-mortems philosophy
» blameless, focus on process and technology
» create timeline
» get all facts
» create bugs for all followup work
20
How are specific SRE
in startup?
21
1:10
22
Horizontal team
23
SaaS oriented
24
Oncall culture
25
It's cool work
26
SRE book
27
"May the Queries Flow,
And the Pagers Remain Silent"
SRE Benediction
28

Más contenido relacionado

La actualidad más candente

Service Level Terminology : SLA ,SLO & SLI
Service Level Terminology : SLA ,SLO & SLIService Level Terminology : SLA ,SLO & SLI
Service Level Terminology : SLA ,SLO & SLI
Knoldus Inc.
 

La actualidad más candente (20)

Site reliability engineering - Lightning Talk
Site reliability engineering - Lightning TalkSite reliability engineering - Lightning Talk
Site reliability engineering - Lightning Talk
 
SRE 101
SRE 101SRE 101
SRE 101
 
SRE vs DevOps
SRE vs DevOpsSRE vs DevOps
SRE vs DevOps
 
Site (Service) Reliability Engineering
Site (Service) Reliability EngineeringSite (Service) Reliability Engineering
Site (Service) Reliability Engineering
 
Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...
Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...
Implementing SRE practices: SLI/SLO deep dive - David Blank-Edelman - DevOpsD...
 
Getting started with Site Reliability Engineering (SRE)
Getting started with Site Reliability Engineering (SRE)Getting started with Site Reliability Engineering (SRE)
Getting started with Site Reliability Engineering (SRE)
 
DevOps Torino Meetup - SRE Concepts
DevOps Torino Meetup - SRE ConceptsDevOps Torino Meetup - SRE Concepts
DevOps Torino Meetup - SRE Concepts
 
SRE-iously! Reliability!
SRE-iously! Reliability!SRE-iously! Reliability!
SRE-iously! Reliability!
 
SRE Demystified - 05 - Toil Elimination
SRE Demystified - 05 - Toil EliminationSRE Demystified - 05 - Toil Elimination
SRE Demystified - 05 - Toil Elimination
 
The Next Wave of Reliability Engineering
The Next Wave of Reliability EngineeringThe Next Wave of Reliability Engineering
The Next Wave of Reliability Engineering
 
SRE (service reliability engineer) on big DevOps platform running on the clou...
SRE (service reliability engineer) on big DevOps platform running on the clou...SRE (service reliability engineer) on big DevOps platform running on the clou...
SRE (service reliability engineer) on big DevOps platform running on the clou...
 
Building an SRE Organization @ Squarespace
Building an SRE Organization @ SquarespaceBuilding an SRE Organization @ Squarespace
Building an SRE Organization @ Squarespace
 
SRE 101 (Site Reliability Engineering)
SRE 101 (Site Reliability Engineering)SRE 101 (Site Reliability Engineering)
SRE 101 (Site Reliability Engineering)
 
Kks sre book_ch1,2
Kks sre book_ch1,2Kks sre book_ch1,2
Kks sre book_ch1,2
 
Overview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practicesOverview of Site Reliability Engineering (SRE) & best practices
Overview of Site Reliability Engineering (SRE) & best practices
 
How to SRE when you have no SRE
How to SRE when you have no SREHow to SRE when you have no SRE
How to SRE when you have no SRE
 
SRE Demystified - 01 - SLO SLI and SLA
SRE Demystified - 01 - SLO SLI and SLASRE Demystified - 01 - SLO SLI and SLA
SRE Demystified - 01 - SLO SLI and SLA
 
SRE From Scratch
SRE From ScratchSRE From Scratch
SRE From Scratch
 
SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...
SRE-iously: Defining the Principles, Habits, and Practices of Site Reliabilit...
 
Service Level Terminology : SLA ,SLO & SLI
Service Level Terminology : SLA ,SLO & SLIService Level Terminology : SLA ,SLO & SLI
Service Level Terminology : SLA ,SLO & SLI
 

Destacado

Destacado (20)

The Social Requirements Engineering (SRE) Approach to Developing a Large-scal...
The Social Requirements Engineering (SRE) Approach to Developing a Large-scal...The Social Requirements Engineering (SRE) Approach to Developing a Large-scal...
The Social Requirements Engineering (SRE) Approach to Developing a Large-scal...
 
The ROLE SRE Approach - Getting more concrete
The ROLE SRE Approach - Getting more concreteThe ROLE SRE Approach - Getting more concrete
The ROLE SRE Approach - Getting more concrete
 
Sre con16 tier 1 metamorphosis
Sre con16   tier 1 metamorphosisSre con16   tier 1 metamorphosis
Sre con16 tier 1 metamorphosis
 
Using the Splunk Java SDK
Using the Splunk Java SDKUsing the Splunk Java SDK
Using the Splunk Java SDK
 
SouthBay SRE Meetup Jan 2016
SouthBay SRE Meetup Jan 2016SouthBay SRE Meetup Jan 2016
SouthBay SRE Meetup Jan 2016
 
Scio Saa S Readiness Evaluation Sre V1.0
Scio Saa S Readiness Evaluation Sre V1.0Scio Saa S Readiness Evaluation Sre V1.0
Scio Saa S Readiness Evaluation Sre V1.0
 
Stephen McHenry - Chanecellor of Site Reliability Engineering, Google
Stephen McHenry - Chanecellor of Site Reliability Engineering, GoogleStephen McHenry - Chanecellor of Site Reliability Engineering, Google
Stephen McHenry - Chanecellor of Site Reliability Engineering, Google
 
Works of site reliability engineer
Works of site reliability engineerWorks of site reliability engineer
Works of site reliability engineer
 
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
Cloud Native, Microservices and SRE/Chaos Engineering: The new Rules of The G...
 
Linux Server Hardening - Steps by Steps
Linux Server Hardening - Steps by StepsLinux Server Hardening - Steps by Steps
Linux Server Hardening - Steps by Steps
 
TXLF: Automated Deployment of OpenStack with Chef
TXLF: Automated Deployment of OpenStack with ChefTXLF: Automated Deployment of OpenStack with Chef
TXLF: Automated Deployment of OpenStack with Chef
 
Software Reliability Engineering
Software Reliability EngineeringSoftware Reliability Engineering
Software Reliability Engineering
 
DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel ...
DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel ...DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel ...
DOES16 San Francisco - David Blank-Edelman - Lessons Learned from a Parallel ...
 
You got a couple Microservices, now what? - Adding SRE to DevOps
You got a couple Microservices, now what?  - Adding SRE to DevOpsYou got a couple Microservices, now what?  - Adding SRE to DevOps
You got a couple Microservices, now what? - Adding SRE to DevOps
 
SRE Tools
SRE ToolsSRE Tools
SRE Tools
 
Load balancing in the SRE way
Load balancing in the SRE wayLoad balancing in the SRE way
Load balancing in the SRE way
 
SRE - drupal day aveiro 2016
SRE - drupal day aveiro 2016SRE - drupal day aveiro 2016
SRE - drupal day aveiro 2016
 
Site Reliability Engineering Helps Google Conquer The World
Site Reliability Engineering Helps Google Conquer The WorldSite Reliability Engineering Helps Google Conquer The World
Site Reliability Engineering Helps Google Conquer The World
 
DevOps and Chef
DevOps and ChefDevOps and Chef
DevOps and Chef
 
Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...
Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...
Docker Azure Friday OSS March 2017 - Developing and deploying Java & Linux on...
 

Similar a SRE in Startup

Site-Reliability-Engineering-v2[6241].pdf
Site-Reliability-Engineering-v2[6241].pdfSite-Reliability-Engineering-v2[6241].pdf
Site-Reliability-Engineering-v2[6241].pdf
DeepakGupta747774
 
Dan Cornell - The Real Cost of Software Remediation
Dan Cornell  - The Real Cost of Software RemediationDan Cornell  - The Real Cost of Software Remediation
Dan Cornell - The Real Cost of Software Remediation
Source Conference
 
On designing and deploying internet scale services
On designing and deploying internet scale servicesOn designing and deploying internet scale services
On designing and deploying internet scale services
billowqiu
 

Similar a SRE in Startup (20)

Site-Reliability-Engineering-v2[6241].pdf
Site-Reliability-Engineering-v2[6241].pdfSite-Reliability-Engineering-v2[6241].pdf
Site-Reliability-Engineering-v2[6241].pdf
 
SRE in Enterprise - Local Journey DevopsDays Galway
SRE in Enterprise - Local Journey  DevopsDays GalwaySRE in Enterprise - Local Journey  DevopsDays Galway
SRE in Enterprise - Local Journey DevopsDays Galway
 
Dan Cornell - The Real Cost of Software Remediation
Dan Cornell  - The Real Cost of Software RemediationDan Cornell  - The Real Cost of Software Remediation
Dan Cornell - The Real Cost of Software Remediation
 
Real Cost of Software Remediation
Real Cost of Software RemediationReal Cost of Software Remediation
Real Cost of Software Remediation
 
Test Automation on Large Agile Projects: It's Not a Cakewalk
Test Automation on Large Agile Projects: It's Not a CakewalkTest Automation on Large Agile Projects: It's Not a Cakewalk
Test Automation on Large Agile Projects: It's Not a Cakewalk
 
NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...
NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...
NUS-ISS Learning Day 2019-Site Reliability Engineering – The Modern Method fo...
 
Those Other SLOs.pdf
Those Other SLOs.pdfThose Other SLOs.pdf
Those Other SLOs.pdf
 
5 Habits of High-Velocity Teams Using Kubernetes
5 Habits of High-Velocity Teams Using Kubernetes5 Habits of High-Velocity Teams Using Kubernetes
5 Habits of High-Velocity Teams Using Kubernetes
 
Björn Rabenstein - About SRE – and how (not) to apply it - Codemotion Berlin ...
Björn Rabenstein - About SRE – and how (not) to apply it - Codemotion Berlin ...Björn Rabenstein - About SRE – and how (not) to apply it - Codemotion Berlin ...
Björn Rabenstein - About SRE – and how (not) to apply it - Codemotion Berlin ...
 
Björn Rabenstein - About SRE and how (not) to apply it - Codemotion Berlin 2018
Björn Rabenstein - About SRE and how (not) to apply it - Codemotion Berlin 2018Björn Rabenstein - About SRE and how (not) to apply it - Codemotion Berlin 2018
Björn Rabenstein - About SRE and how (not) to apply it - Codemotion Berlin 2018
 
On designing and deploying internet scale services
On designing and deploying internet scale servicesOn designing and deploying internet scale services
On designing and deploying internet scale services
 
The Stream Process™ for Defining Projects
The Stream Process™ for Defining ProjectsThe Stream Process™ for Defining Projects
The Stream Process™ for Defining Projects
 
SRE in Apiary
SRE in ApiarySRE in Apiary
SRE in Apiary
 
DevOpsDays Galway 2019 - SRE at Genesys
DevOpsDays Galway 2019 - SRE at GenesysDevOpsDays Galway 2019 - SRE at Genesys
DevOpsDays Galway 2019 - SRE at Genesys
 
Beyond the Scrum Team: Delivering "Done" at Scale
Beyond the Scrum Team: Delivering "Done" at ScaleBeyond the Scrum Team: Delivering "Done" at Scale
Beyond the Scrum Team: Delivering "Done" at Scale
 
Combining Speed of Delivery and Quality in Complex Systems
Combining Speed of Delivery and Quality in Complex SystemsCombining Speed of Delivery and Quality in Complex Systems
Combining Speed of Delivery and Quality in Complex Systems
 
OSMC 2023 | IGNITE: Metrics, Margins, Mutiny – How to make your SREs (not) ru...
OSMC 2023 | IGNITE: Metrics, Margins, Mutiny – How to make your SREs (not) ru...OSMC 2023 | IGNITE: Metrics, Margins, Mutiny – How to make your SREs (not) ru...
OSMC 2023 | IGNITE: Metrics, Margins, Mutiny – How to make your SREs (not) ru...
 
Extreme programming
Extreme programmingExtreme programming
Extreme programming
 
DOES14 - Scott Prugh - CSG - DevOps and Lean in Legacy Environments
DOES14 - Scott Prugh - CSG - DevOps and Lean in Legacy EnvironmentsDOES14 - Scott Prugh - CSG - DevOps and Lean in Legacy Environments
DOES14 - Scott Prugh - CSG - DevOps and Lean in Legacy Environments
 
Software Engineering at RightScale
Software Engineering at RightScaleSoftware Engineering at RightScale
Software Engineering at RightScale
 

Más de Ladislav Prskavec

Más de Ladislav Prskavec (20)

Modern Web Architecture<br>based on JS, API and Markup
Modern Web Architecture<br>based on JS, API and MarkupModern Web Architecture<br>based on JS, API and Markup
Modern Web Architecture<br>based on JS, API and Markup
 
How you can kill Wordpress!
How you can kill Wordpress!How you can kill Wordpress!
How you can kill Wordpress!
 
CI and CD
CI and CDCI and CD
CI and CD
 
Datascript: Serverless Architetecture
Datascript: Serverless ArchitetectureDatascript: Serverless Architetecture
Datascript: Serverless Architetecture
 
Serverless Architecture
Serverless ArchitectureServerless Architecture
Serverless Architecture
 
CI and CD
CI and CDCI and CD
CI and CD
 
PragueJS meetups 30th anniversary
PragueJS meetups 30th anniversaryPragueJS meetups 30th anniversary
PragueJS meetups 30th anniversary
 
How to easy deploy app into any cloud
How to easy deploy app into any cloudHow to easy deploy app into any cloud
How to easy deploy app into any cloud
 
Docker - modern platform for developement and operations
Docker - modern platform for developement and operationsDocker - modern platform for developement and operations
Docker - modern platform for developement and operations
 
GDGSCL - Docker a jeho provoz v Heroku a AWS
GDGSCL - Docker a jeho provoz v Heroku a AWSGDGSCL - Docker a jeho provoz v Heroku a AWS
GDGSCL - Docker a jeho provoz v Heroku a AWS
 
AWS Elastic Container Service
AWS Elastic Container ServiceAWS Elastic Container Service
AWS Elastic Container Service
 
Comparison nodejs frameworks using Polls API
Comparison nodejs frameworks using Polls APIComparison nodejs frameworks using Polls API
Comparison nodejs frameworks using Polls API
 
Docker Elastic Beanstalk
Docker Elastic BeanstalkDocker Elastic Beanstalk
Docker Elastic Beanstalk
 
Docker včera, dnes a zítra
Docker včera, dnes a zítraDocker včera, dnes a zítra
Docker včera, dnes a zítra
 
Tessel is a microcontroller that runs JavaScript.
Tessel is a microcontroller that runs JavaScript.Tessel is a microcontroller that runs JavaScript.
Tessel is a microcontroller that runs JavaScript.
 
Docker.io
Docker.ioDocker.io
Docker.io
 
Docker.io
Docker.ioDocker.io
Docker.io
 
AngularJS
AngularJSAngularJS
AngularJS
 
Firebase and AngularJS
Firebase and AngularJSFirebase and AngularJS
Firebase and AngularJS
 
AngularJS at PyVo
AngularJS at PyVoAngularJS at PyVo
AngularJS at PyVo
 

Último

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Último (20)

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 

SRE in Startup