SlideShare una empresa de Scribd logo
1 de 54
Descargar para leer sin conexión
Production Engineering (PE)
There Is No Spoon
Ran Leibman
Production Engineer
Agenda
1. How Production Engineering was formed in Facebook
2. What we do in Onavo
3. How we moved to the PE model in Onavo
4. Q & A
Facebook - Pre PE Days
SRO - Site Reliability Operations
1. Keep the site up 24/7
2. Follow the sun
3. Capacity plans
Why SRO was not enough ?
What are the alternatives ?
NOC
The Production Engineering Model
1. PEs are embedded within the software engineering teams
2. Taking part in meetings
3. Involved in roadmap plans
4. Reviewing diffs
5. Oncall - Software & Production Engineers
Onavo - Adopting The PE Model
» Protect user traffic using IPsec
» Protect against malicious sites
» Compress user traffic
» Control data leakage
Save, Measure & Protect your mobile data
a bit of context
Onavo
1. Founded at 2010
2. Classic Startup Dev & Ops teams
1. Dev - writes code
2. Ops - keeps the infra up & running
3. Acquired by Facebook at 2013
Making The Change - Step By Step
Step 1 - Go Sit Close/Next With The Developers
Step 2 - Get The Colleagues Onboard
Step 3 - Get Your Tooling Ready
you don’t want that “Confused Travolta” moment …
Have Good (short) Documentation
» Document your alerts
» Links to dashboards
» Links to third party software docs
» Runbooks - how to debug in prod
» log files, how to restart the service, getting stack traces & metrics …
» Links to config management
Dev Friendly Systems
avoid the graph porn …
Simple And Indicative Dashboards
1. Match the product KPIs
2. Strong signal
3. Intuitive titles
4. Easy to spot anomalies
5. Easy to find correlations
Step 4 - Review Your Alerts
rm -rf /all/false/alarms*
Refactor Your Alerts as Needed
» The first challenge is to make sure alerts are handled
» To make it possible every alert should be
» Indicate a real problem
» Clear to understand - Informative
» Impactful
» Actionable
Step 5 - Train The Team - Get Them Ready
learning is easy - remembering is hard
Train The Team
» Wiki / Doc based
» makes it easier to remember
» Hands-on Hands-on Hands-on
» Pre create task pool (even if low impact)
» Give oncall use cases & examples
» Reusable
Step 6 - Oncall + Hand Holding
make yourself available and adjust as you go
Shared Oncall
» Short oncall cycles, 1-2 days
» Increase the period each cycle
» Oncall Summaries
» Do oncall as well - set an example
» Preemptively check status with

the current oncall
√ Step 1 - Go Sit Close/Next With The Developers
The Steps
√ Step 2 - Get The Colleagues Onboard
√ Step 1 - Go Sit Close/Next With The Developers
The Steps
√ Step 3 - Get Your Tooling Ready
√ Step 2 - Get The Colleagues Onboard
√ Step 1 - Go Sit Close/Next With The Developers
The Steps
√ Step 4 - Review Your Alerts
√ Step 3 - Get Your Tooling Ready
√ Step 2 - Get The Colleagues Onboard
√ Step 1 - Go Sit Close/Next With The Developers
The Steps
√ Step 5 - Train The Team
√ Step 4 - Review Your Alerts
√ Step 3 - Get Your Tooling Ready
√ Step 2 - Get The Colleagues Onboard
√ Step 1 - Go Sit Close/Next With The Developers
The Steps
√ Step 6 - Oncall + Hand Holding
√ Step 5 - Train The Team
√ Step 4 - Review Your Alerts
√ Step 3 - Get Your Tooling Ready
√ Step 2 - Get The Colleagues Onboard
√ Step 1 - Go Sit Close/Next With The Developers
The Steps
Questions?
Ran Leibman
Production Engineer
There is No Spoon - Ran Leibman, Facebook - DevOpsDays Tel Aviv 2016

Más contenido relacionado

Destacado

Production testing through monitoring
Production testing through monitoringProduction testing through monitoring
Production testing through monitoringLeon Fayer
 
SREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREsSREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREsBrendan Gregg
 
Microservices Workshop All Topics Deck 2016
Microservices Workshop All Topics Deck 2016Microservices Workshop All Topics Deck 2016
Microservices Workshop All Topics Deck 2016Adrian Cockcroft
 
Scaling Pinterest's Monitoring
Scaling Pinterest's MonitoringScaling Pinterest's Monitoring
Scaling Pinterest's MonitoringBrian Overstreet
 
Bruce Lawson: Progressive Web Apps: the future of Apps
Bruce Lawson: Progressive Web Apps: the future of AppsBruce Lawson: Progressive Web Apps: the future of Apps
Bruce Lawson: Progressive Web Apps: the future of Appsbrucelawson
 
Performance and Scalability Testing with Python and Multi-Mechanize
Performance and Scalability Testing with Python and Multi-MechanizePerformance and Scalability Testing with Python and Multi-Mechanize
Performance and Scalability Testing with Python and Multi-Mechanizecoreygoldberg
 
Continuous Delivery: Making DevOps Awesome
Continuous Delivery: Making DevOps AwesomeContinuous Delivery: Making DevOps Awesome
Continuous Delivery: Making DevOps AwesomeNicole Forsgren
 

Destacado (8)

Production testing through monitoring
Production testing through monitoringProduction testing through monitoring
Production testing through monitoring
 
SREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREsSREcon 2016 Performance Checklists for SREs
SREcon 2016 Performance Checklists for SREs
 
Microservices Workshop All Topics Deck 2016
Microservices Workshop All Topics Deck 2016Microservices Workshop All Topics Deck 2016
Microservices Workshop All Topics Deck 2016
 
Scaling Pinterest's Monitoring
Scaling Pinterest's MonitoringScaling Pinterest's Monitoring
Scaling Pinterest's Monitoring
 
Bruce Lawson: Progressive Web Apps: the future of Apps
Bruce Lawson: Progressive Web Apps: the future of AppsBruce Lawson: Progressive Web Apps: the future of Apps
Bruce Lawson: Progressive Web Apps: the future of Apps
 
How to Speak "Manager"
How to Speak "Manager"How to Speak "Manager"
How to Speak "Manager"
 
Performance and Scalability Testing with Python and Multi-Mechanize
Performance and Scalability Testing with Python and Multi-MechanizePerformance and Scalability Testing with Python and Multi-Mechanize
Performance and Scalability Testing with Python and Multi-Mechanize
 
Continuous Delivery: Making DevOps Awesome
Continuous Delivery: Making DevOps AwesomeContinuous Delivery: Making DevOps Awesome
Continuous Delivery: Making DevOps Awesome
 

Más de DevOpsDays Tel Aviv

YOUR OPEN SOURCE PROJECT IS LIKE A STARTUP, TREAT IT LIKE ONE, EYAR ZILBERMAN...
YOUR OPEN SOURCE PROJECT IS LIKE A STARTUP, TREAT IT LIKE ONE, EYAR ZILBERMAN...YOUR OPEN SOURCE PROJECT IS LIKE A STARTUP, TREAT IT LIKE ONE, EYAR ZILBERMAN...
YOUR OPEN SOURCE PROJECT IS LIKE A STARTUP, TREAT IT LIKE ONE, EYAR ZILBERMAN...DevOpsDays Tel Aviv
 
GRAPHQL TO THE RES(T)CUE, ELLA SHARAKANSKI, Salto
GRAPHQL TO THE RES(T)CUE, ELLA SHARAKANSKI, SaltoGRAPHQL TO THE RES(T)CUE, ELLA SHARAKANSKI, Salto
GRAPHQL TO THE RES(T)CUE, ELLA SHARAKANSKI, SaltoDevOpsDays Tel Aviv
 
MICROSERVICES ABOVE THE CLOUD - DESIGNING THE INTERNATIONAL SPACE STATION FOR...
MICROSERVICES ABOVE THE CLOUD - DESIGNING THE INTERNATIONAL SPACE STATION FOR...MICROSERVICES ABOVE THE CLOUD - DESIGNING THE INTERNATIONAL SPACE STATION FOR...
MICROSERVICES ABOVE THE CLOUD - DESIGNING THE INTERNATIONAL SPACE STATION FOR...DevOpsDays Tel Aviv
 
THE (IR)RATIONAL INCIDENT RESPONSE: HOW PSYCHOLOGICAL BIASES AFFECT INCIDENT ...
THE (IR)RATIONAL INCIDENT RESPONSE: HOW PSYCHOLOGICAL BIASES AFFECT INCIDENT ...THE (IR)RATIONAL INCIDENT RESPONSE: HOW PSYCHOLOGICAL BIASES AFFECT INCIDENT ...
THE (IR)RATIONAL INCIDENT RESPONSE: HOW PSYCHOLOGICAL BIASES AFFECT INCIDENT ...DevOpsDays Tel Aviv
 
PRINCIPLES OF OBSERVABILITY // DANIEL MAHER, DataDog
PRINCIPLES OF OBSERVABILITY // DANIEL MAHER, DataDogPRINCIPLES OF OBSERVABILITY // DANIEL MAHER, DataDog
PRINCIPLES OF OBSERVABILITY // DANIEL MAHER, DataDogDevOpsDays Tel Aviv
 
NUDGE AND SLUDGE: DRIVING SECURITY WITH DESIGN // J. WOLFGANG GOERLICH, Duo S...
NUDGE AND SLUDGE: DRIVING SECURITY WITH DESIGN // J. WOLFGANG GOERLICH, Duo S...NUDGE AND SLUDGE: DRIVING SECURITY WITH DESIGN // J. WOLFGANG GOERLICH, Duo S...
NUDGE AND SLUDGE: DRIVING SECURITY WITH DESIGN // J. WOLFGANG GOERLICH, Duo S...DevOpsDays Tel Aviv
 
(Ignite) TAKE A HIKE: PREVENTING BATTERY CORROSION - LEAH VOGEL, CHEGG
(Ignite) TAKE A HIKE: PREVENTING BATTERY CORROSION - LEAH VOGEL, CHEGG(Ignite) TAKE A HIKE: PREVENTING BATTERY CORROSION - LEAH VOGEL, CHEGG
(Ignite) TAKE A HIKE: PREVENTING BATTERY CORROSION - LEAH VOGEL, CHEGGDevOpsDays Tel Aviv
 
BUILDING A DR PLAN FOR YOUR CLOUD INFRASTRUCTURE FROM THE GROUND UP, MOSHE BE...
BUILDING A DR PLAN FOR YOUR CLOUD INFRASTRUCTURE FROM THE GROUND UP, MOSHE BE...BUILDING A DR PLAN FOR YOUR CLOUD INFRASTRUCTURE FROM THE GROUND UP, MOSHE BE...
BUILDING A DR PLAN FOR YOUR CLOUD INFRASTRUCTURE FROM THE GROUND UP, MOSHE BE...DevOpsDays Tel Aviv
 
THE THREE DISCIPLINES OF CI/CD SECURITY, DANIEL KRIVELEVICH, Cider Security
THE THREE DISCIPLINES OF CI/CD SECURITY, DANIEL KRIVELEVICH, Cider SecurityTHE THREE DISCIPLINES OF CI/CD SECURITY, DANIEL KRIVELEVICH, Cider Security
THE THREE DISCIPLINES OF CI/CD SECURITY, DANIEL KRIVELEVICH, Cider SecurityDevOpsDays Tel Aviv
 
THE PLEASURES OF ON-PREM, TOMER GABEL
THE PLEASURES OF ON-PREM, TOMER GABELTHE PLEASURES OF ON-PREM, TOMER GABEL
THE PLEASURES OF ON-PREM, TOMER GABELDevOpsDays Tel Aviv
 
CONFIGURATION MANAGEMENT IN THE CLOUD NATIVE ERA, SHAHAR MINTZ, EggPack
CONFIGURATION MANAGEMENT IN THE CLOUD NATIVE ERA, SHAHAR MINTZ, EggPackCONFIGURATION MANAGEMENT IN THE CLOUD NATIVE ERA, SHAHAR MINTZ, EggPack
CONFIGURATION MANAGEMENT IN THE CLOUD NATIVE ERA, SHAHAR MINTZ, EggPackDevOpsDays Tel Aviv
 
SOLVING THE DEVOPS CRISIS, ONE PERSON AT A TIME, CHRISTINA BABITSKI, Develeap
SOLVING THE DEVOPS CRISIS, ONE PERSON AT A TIME, CHRISTINA BABITSKI, DeveleapSOLVING THE DEVOPS CRISIS, ONE PERSON AT A TIME, CHRISTINA BABITSKI, Develeap
SOLVING THE DEVOPS CRISIS, ONE PERSON AT A TIME, CHRISTINA BABITSKI, DeveleapDevOpsDays Tel Aviv
 
OPTIMIZING PERFORMANCE USING CONTINUOUS PRODUCTION PROFILING ,YONATAN GOLDSCH...
OPTIMIZING PERFORMANCE USING CONTINUOUS PRODUCTION PROFILING ,YONATAN GOLDSCH...OPTIMIZING PERFORMANCE USING CONTINUOUS PRODUCTION PROFILING ,YONATAN GOLDSCH...
OPTIMIZING PERFORMANCE USING CONTINUOUS PRODUCTION PROFILING ,YONATAN GOLDSCH...DevOpsDays Tel Aviv
 
HOW TO SCALE YOUR ONCALL OPERATION, AND SURVIVE TO TELL, ANTON DRUKH
HOW TO SCALE YOUR ONCALL OPERATION, AND SURVIVE TO TELL, ANTON DRUKHHOW TO SCALE YOUR ONCALL OPERATION, AND SURVIVE TO TELL, ANTON DRUKH
HOW TO SCALE YOUR ONCALL OPERATION, AND SURVIVE TO TELL, ANTON DRUKHDevOpsDays Tel Aviv
 
HOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearB
HOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearBHOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearB
HOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearBDevOpsDays Tel Aviv
 
FLYING BLIND - ACCESSIBILITY IN MONITORING, FEU MOUREK, Icinga
FLYING BLIND - ACCESSIBILITY IN MONITORING, FEU MOUREK, IcingaFLYING BLIND - ACCESSIBILITY IN MONITORING, FEU MOUREK, Icinga
FLYING BLIND - ACCESSIBILITY IN MONITORING, FEU MOUREK, IcingaDevOpsDays Tel Aviv
 
(Ignite) WHAT'S BURNING THROUGH YOUR CLOUD BILL - GIL BAHAT, CIDER SECURITY
(Ignite) WHAT'S BURNING THROUGH YOUR CLOUD BILL - GIL BAHAT, CIDER SECURITY(Ignite) WHAT'S BURNING THROUGH YOUR CLOUD BILL - GIL BAHAT, CIDER SECURITY
(Ignite) WHAT'S BURNING THROUGH YOUR CLOUD BILL - GIL BAHAT, CIDER SECURITYDevOpsDays Tel Aviv
 
SLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.io
SLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.ioSLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.io
SLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.ioDevOpsDays Tel Aviv
 
ONBOARDING IN LOCKDOWN, HILA FOX, Augury
ONBOARDING IN LOCKDOWN, HILA FOX, AuguryONBOARDING IN LOCKDOWN, HILA FOX, Augury
ONBOARDING IN LOCKDOWN, HILA FOX, AuguryDevOpsDays Tel Aviv
 
DON'T PANIC: GETTING YOUR INFRASTRUCTURE DRIFT UNDER CONTROL, ERAN BIBI, Firefly
DON'T PANIC: GETTING YOUR INFRASTRUCTURE DRIFT UNDER CONTROL, ERAN BIBI, FireflyDON'T PANIC: GETTING YOUR INFRASTRUCTURE DRIFT UNDER CONTROL, ERAN BIBI, Firefly
DON'T PANIC: GETTING YOUR INFRASTRUCTURE DRIFT UNDER CONTROL, ERAN BIBI, FireflyDevOpsDays Tel Aviv
 

Más de DevOpsDays Tel Aviv (20)

YOUR OPEN SOURCE PROJECT IS LIKE A STARTUP, TREAT IT LIKE ONE, EYAR ZILBERMAN...
YOUR OPEN SOURCE PROJECT IS LIKE A STARTUP, TREAT IT LIKE ONE, EYAR ZILBERMAN...YOUR OPEN SOURCE PROJECT IS LIKE A STARTUP, TREAT IT LIKE ONE, EYAR ZILBERMAN...
YOUR OPEN SOURCE PROJECT IS LIKE A STARTUP, TREAT IT LIKE ONE, EYAR ZILBERMAN...
 
GRAPHQL TO THE RES(T)CUE, ELLA SHARAKANSKI, Salto
GRAPHQL TO THE RES(T)CUE, ELLA SHARAKANSKI, SaltoGRAPHQL TO THE RES(T)CUE, ELLA SHARAKANSKI, Salto
GRAPHQL TO THE RES(T)CUE, ELLA SHARAKANSKI, Salto
 
MICROSERVICES ABOVE THE CLOUD - DESIGNING THE INTERNATIONAL SPACE STATION FOR...
MICROSERVICES ABOVE THE CLOUD - DESIGNING THE INTERNATIONAL SPACE STATION FOR...MICROSERVICES ABOVE THE CLOUD - DESIGNING THE INTERNATIONAL SPACE STATION FOR...
MICROSERVICES ABOVE THE CLOUD - DESIGNING THE INTERNATIONAL SPACE STATION FOR...
 
THE (IR)RATIONAL INCIDENT RESPONSE: HOW PSYCHOLOGICAL BIASES AFFECT INCIDENT ...
THE (IR)RATIONAL INCIDENT RESPONSE: HOW PSYCHOLOGICAL BIASES AFFECT INCIDENT ...THE (IR)RATIONAL INCIDENT RESPONSE: HOW PSYCHOLOGICAL BIASES AFFECT INCIDENT ...
THE (IR)RATIONAL INCIDENT RESPONSE: HOW PSYCHOLOGICAL BIASES AFFECT INCIDENT ...
 
PRINCIPLES OF OBSERVABILITY // DANIEL MAHER, DataDog
PRINCIPLES OF OBSERVABILITY // DANIEL MAHER, DataDogPRINCIPLES OF OBSERVABILITY // DANIEL MAHER, DataDog
PRINCIPLES OF OBSERVABILITY // DANIEL MAHER, DataDog
 
NUDGE AND SLUDGE: DRIVING SECURITY WITH DESIGN // J. WOLFGANG GOERLICH, Duo S...
NUDGE AND SLUDGE: DRIVING SECURITY WITH DESIGN // J. WOLFGANG GOERLICH, Duo S...NUDGE AND SLUDGE: DRIVING SECURITY WITH DESIGN // J. WOLFGANG GOERLICH, Duo S...
NUDGE AND SLUDGE: DRIVING SECURITY WITH DESIGN // J. WOLFGANG GOERLICH, Duo S...
 
(Ignite) TAKE A HIKE: PREVENTING BATTERY CORROSION - LEAH VOGEL, CHEGG
(Ignite) TAKE A HIKE: PREVENTING BATTERY CORROSION - LEAH VOGEL, CHEGG(Ignite) TAKE A HIKE: PREVENTING BATTERY CORROSION - LEAH VOGEL, CHEGG
(Ignite) TAKE A HIKE: PREVENTING BATTERY CORROSION - LEAH VOGEL, CHEGG
 
BUILDING A DR PLAN FOR YOUR CLOUD INFRASTRUCTURE FROM THE GROUND UP, MOSHE BE...
BUILDING A DR PLAN FOR YOUR CLOUD INFRASTRUCTURE FROM THE GROUND UP, MOSHE BE...BUILDING A DR PLAN FOR YOUR CLOUD INFRASTRUCTURE FROM THE GROUND UP, MOSHE BE...
BUILDING A DR PLAN FOR YOUR CLOUD INFRASTRUCTURE FROM THE GROUND UP, MOSHE BE...
 
THE THREE DISCIPLINES OF CI/CD SECURITY, DANIEL KRIVELEVICH, Cider Security
THE THREE DISCIPLINES OF CI/CD SECURITY, DANIEL KRIVELEVICH, Cider SecurityTHE THREE DISCIPLINES OF CI/CD SECURITY, DANIEL KRIVELEVICH, Cider Security
THE THREE DISCIPLINES OF CI/CD SECURITY, DANIEL KRIVELEVICH, Cider Security
 
THE PLEASURES OF ON-PREM, TOMER GABEL
THE PLEASURES OF ON-PREM, TOMER GABELTHE PLEASURES OF ON-PREM, TOMER GABEL
THE PLEASURES OF ON-PREM, TOMER GABEL
 
CONFIGURATION MANAGEMENT IN THE CLOUD NATIVE ERA, SHAHAR MINTZ, EggPack
CONFIGURATION MANAGEMENT IN THE CLOUD NATIVE ERA, SHAHAR MINTZ, EggPackCONFIGURATION MANAGEMENT IN THE CLOUD NATIVE ERA, SHAHAR MINTZ, EggPack
CONFIGURATION MANAGEMENT IN THE CLOUD NATIVE ERA, SHAHAR MINTZ, EggPack
 
SOLVING THE DEVOPS CRISIS, ONE PERSON AT A TIME, CHRISTINA BABITSKI, Develeap
SOLVING THE DEVOPS CRISIS, ONE PERSON AT A TIME, CHRISTINA BABITSKI, DeveleapSOLVING THE DEVOPS CRISIS, ONE PERSON AT A TIME, CHRISTINA BABITSKI, Develeap
SOLVING THE DEVOPS CRISIS, ONE PERSON AT A TIME, CHRISTINA BABITSKI, Develeap
 
OPTIMIZING PERFORMANCE USING CONTINUOUS PRODUCTION PROFILING ,YONATAN GOLDSCH...
OPTIMIZING PERFORMANCE USING CONTINUOUS PRODUCTION PROFILING ,YONATAN GOLDSCH...OPTIMIZING PERFORMANCE USING CONTINUOUS PRODUCTION PROFILING ,YONATAN GOLDSCH...
OPTIMIZING PERFORMANCE USING CONTINUOUS PRODUCTION PROFILING ,YONATAN GOLDSCH...
 
HOW TO SCALE YOUR ONCALL OPERATION, AND SURVIVE TO TELL, ANTON DRUKH
HOW TO SCALE YOUR ONCALL OPERATION, AND SURVIVE TO TELL, ANTON DRUKHHOW TO SCALE YOUR ONCALL OPERATION, AND SURVIVE TO TELL, ANTON DRUKH
HOW TO SCALE YOUR ONCALL OPERATION, AND SURVIVE TO TELL, ANTON DRUKH
 
HOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearB
HOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearBHOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearB
HOW TO OPTIMIZE NON-CODING TIME, ORI KEREN, LinearB
 
FLYING BLIND - ACCESSIBILITY IN MONITORING, FEU MOUREK, Icinga
FLYING BLIND - ACCESSIBILITY IN MONITORING, FEU MOUREK, IcingaFLYING BLIND - ACCESSIBILITY IN MONITORING, FEU MOUREK, Icinga
FLYING BLIND - ACCESSIBILITY IN MONITORING, FEU MOUREK, Icinga
 
(Ignite) WHAT'S BURNING THROUGH YOUR CLOUD BILL - GIL BAHAT, CIDER SECURITY
(Ignite) WHAT'S BURNING THROUGH YOUR CLOUD BILL - GIL BAHAT, CIDER SECURITY(Ignite) WHAT'S BURNING THROUGH YOUR CLOUD BILL - GIL BAHAT, CIDER SECURITY
(Ignite) WHAT'S BURNING THROUGH YOUR CLOUD BILL - GIL BAHAT, CIDER SECURITY
 
SLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.io
SLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.ioSLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.io
SLO DRIVEN DEVELOPMENT, ALON NATIV, Tomorrow.io
 
ONBOARDING IN LOCKDOWN, HILA FOX, Augury
ONBOARDING IN LOCKDOWN, HILA FOX, AuguryONBOARDING IN LOCKDOWN, HILA FOX, Augury
ONBOARDING IN LOCKDOWN, HILA FOX, Augury
 
DON'T PANIC: GETTING YOUR INFRASTRUCTURE DRIFT UNDER CONTROL, ERAN BIBI, Firefly
DON'T PANIC: GETTING YOUR INFRASTRUCTURE DRIFT UNDER CONTROL, ERAN BIBI, FireflyDON'T PANIC: GETTING YOUR INFRASTRUCTURE DRIFT UNDER CONTROL, ERAN BIBI, Firefly
DON'T PANIC: GETTING YOUR INFRASTRUCTURE DRIFT UNDER CONTROL, ERAN BIBI, Firefly
 

Último

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 

Último (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

There is No Spoon - Ran Leibman, Facebook - DevOpsDays Tel Aviv 2016

  • 1. Production Engineering (PE) There Is No Spoon Ran Leibman Production Engineer
  • 2. Agenda 1. How Production Engineering was formed in Facebook 2. What we do in Onavo 3. How we moved to the PE model in Onavo 4. Q & A
  • 3. Facebook - Pre PE Days
  • 4. SRO - Site Reliability Operations 1. Keep the site up 24/7 2. Follow the sun 3. Capacity plans
  • 5. Why SRO was not enough ?
  • 6.
  • 7.
  • 8.
  • 9. What are the alternatives ?
  • 10. NOC
  • 11.
  • 12. The Production Engineering Model 1. PEs are embedded within the software engineering teams 2. Taking part in meetings 3. Involved in roadmap plans 4. Reviewing diffs 5. Oncall - Software & Production Engineers
  • 13. Onavo - Adopting The PE Model
  • 14. » Protect user traffic using IPsec » Protect against malicious sites » Compress user traffic » Control data leakage Save, Measure & Protect your mobile data
  • 15. a bit of context Onavo 1. Founded at 2010 2. Classic Startup Dev & Ops teams 1. Dev - writes code 2. Ops - keeps the infra up & running 3. Acquired by Facebook at 2013
  • 16.
  • 17. Making The Change - Step By Step
  • 18. Step 1 - Go Sit Close/Next With The Developers
  • 19. Step 2 - Get The Colleagues Onboard
  • 20.
  • 21.
  • 22.
  • 23.
  • 24.
  • 25.
  • 26.
  • 27.
  • 28.
  • 29.
  • 30. Step 3 - Get Your Tooling Ready
  • 31.
  • 32. you don’t want that “Confused Travolta” moment … Have Good (short) Documentation » Document your alerts » Links to dashboards » Links to third party software docs » Runbooks - how to debug in prod » log files, how to restart the service, getting stack traces & metrics … » Links to config management
  • 34. avoid the graph porn … Simple And Indicative Dashboards 1. Match the product KPIs 2. Strong signal 3. Intuitive titles 4. Easy to spot anomalies 5. Easy to find correlations
  • 35. Step 4 - Review Your Alerts
  • 36. rm -rf /all/false/alarms* Refactor Your Alerts as Needed » The first challenge is to make sure alerts are handled » To make it possible every alert should be » Indicate a real problem » Clear to understand - Informative » Impactful » Actionable
  • 37. Step 5 - Train The Team - Get Them Ready
  • 38. learning is easy - remembering is hard Train The Team » Wiki / Doc based » makes it easier to remember » Hands-on Hands-on Hands-on » Pre create task pool (even if low impact) » Give oncall use cases & examples » Reusable
  • 39. Step 6 - Oncall + Hand Holding
  • 40. make yourself available and adjust as you go Shared Oncall » Short oncall cycles, 1-2 days » Increase the period each cycle » Oncall Summaries » Do oncall as well - set an example » Preemptively check status with
 the current oncall
  • 41. √ Step 1 - Go Sit Close/Next With The Developers The Steps
  • 42. √ Step 2 - Get The Colleagues Onboard √ Step 1 - Go Sit Close/Next With The Developers The Steps
  • 43. √ Step 3 - Get Your Tooling Ready √ Step 2 - Get The Colleagues Onboard √ Step 1 - Go Sit Close/Next With The Developers The Steps
  • 44. √ Step 4 - Review Your Alerts √ Step 3 - Get Your Tooling Ready √ Step 2 - Get The Colleagues Onboard √ Step 1 - Go Sit Close/Next With The Developers The Steps
  • 45. √ Step 5 - Train The Team √ Step 4 - Review Your Alerts √ Step 3 - Get Your Tooling Ready √ Step 2 - Get The Colleagues Onboard √ Step 1 - Go Sit Close/Next With The Developers The Steps
  • 46. √ Step 6 - Oncall + Hand Holding √ Step 5 - Train The Team √ Step 4 - Review Your Alerts √ Step 3 - Get Your Tooling Ready √ Step 2 - Get The Colleagues Onboard √ Step 1 - Go Sit Close/Next With The Developers The Steps
  • 47.
  • 48.
  • 49.
  • 50.
  • 51.
  • 52.