Resilience and Compliance at Speed and Scale
ISACA SV Spring Conference
Jason Chan
chan@netflix.com
linkedin.com/in/jasonbchan
@chanjbs
About Me
 Engineering Director @ Netflix:
 Security: product, app, ops, IR, fraud/abuse
 Previously:
 Led infosec team @ VMware
 Consultant - @stake, iSEC Partners
About Netflix
Common Approaches to Resilience
Common Controls to Promote Resilience
 Architectural committees
 Change approval boards
 Centralized deployments
 Vendor-specific, component-level HA
 Standards and checklists
Architectural committees
 Designed to standardize on design patterns, vendors, etc.
 Problems for Netflix:
 Freedom and Responsibility Culture
 Highly aligned and loosely coupled
 Innovation cycles
Change approval boards
 Designed to control and de-risk change
 Focus on artifacts, test and rollback plans
 Problems for Netflix:
 Freedom and Responsibility Culture
 Highly aligned and loosely coupled
 Innovation cycles
Centralized deployments
 Separate Ops team deploys at a pre-ordained time (e.g. weekly, monthly)
 Problems for Netflix:
 Freedom and Responsibility Culture
 Highly aligned and loosely coupled
 Innovation cycles
Vendor-specific, component-level HA
 High reliance on vendor solutions to provide HA and resilience
 Problems for Netflix:
 Traditional data-center-oriented systems do not translate well to the cloud
 Heavy use of open source
Standards and checklists
 Designed for repeatable execution
 Problems for Netflix:
 Reliance on humans
Approaches to Resilience @ Netflix
What does the business value?
 Customer experience
 Innovation and agility
 In other words:
 Stability and availability for customer experience
 Rapid development and change to continually improve product and outpace competition
 Not that different from anyone else
Overall Approach
 Understand and solve for relevant failure modes
 Rely on automation and tools, not humans or committees
 Do not assume that planned controls will work
 Provide train tracks and guardrails, but invite deviation
Goals of Simian Army
“Each system has to be able to succeed, no matter what, even all on its own. We’re designing each distributed system to expect and tolerate failure from other systems on which it depends.”
http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html
Systems fail
Chaos Monkey
 “By frequently causing failures, we force our services to be built in a way that is more resilient.”
 Terminates cluster nodes during business hours
 Rejects “If it ain’t broke, don’t fix it”
 Goals:
 Simulate random hardware failures, human error at small scale
 Identify weaknesses
 No service impact
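As a rough illustration of the selection logic described on this slide (not Netflix's actual implementation), the sketch below picks at most one node per cluster to terminate, and only during business hours. The cluster names, the weekday 9:00 to 15:00 window, and the rule of skipping single-node clusters are assumptions made for the example.

```python
import random
from datetime import datetime


def in_business_hours(now: datetime, start_hour: int = 9, end_hour: int = 15) -> bool:
    """True on weekdays between start_hour (inclusive) and end_hour (exclusive).

    The window itself is an assumption; the slide only says "business hours".
    """
    return now.weekday() < 5 and start_hour <= now.hour < end_hour


def pick_victims(clusters: dict, now: datetime, rng: random.Random,
                 probability: float = 1.0) -> dict:
    """Return {cluster_name: instance_id} for the nodes to terminate.

    Assumption: clusters with fewer than two nodes are skipped, so that a
    single termination cannot take a whole service down.
    """
    if not in_business_hours(now):
        return {}
    victims = {}
    for name, instances in sorted(clusters.items()):
        if len(instances) < 2:
            continue
        if rng.random() < probability:
            victims[name] = rng.choice(instances)
    return victims
```

Passing the clock and the random generator in as arguments keeps the sketch testable; a real tool would then call the cloud provider's terminate API for each chosen instance.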
Lots of systems fail
Chaos Gorilla
 Chaos Monkey’s bigger brother
 Standard deployment pattern is to distribute load/systems/data across three data centers (AZs)
 What happens if one is lost?
 Goals:
 Simulate data center loss, hardware/service failures at larger scale
 Identify weaknesses, dependencies, etc.
 Minimal service impact
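The "what happens if one is lost?" question can be checked on paper before any zone is actually taken down. The sketch below is a hypothetical capacity check, assuming each zone's instance count and the number of instances needed to serve peak load are known; the function names and numbers are illustrative only.

```python
def capacity_after_zone_loss(instances_by_zone: dict, lost_zone: str) -> int:
    """Instances still running if every instance in lost_zone disappears."""
    return sum(count for zone, count in instances_by_zone.items() if zone != lost_zone)


def zones_that_break_service(instances_by_zone: dict, required_instances: int) -> list:
    """Zones whose loss would leave fewer than required_instances running."""
    return [
        zone
        for zone in sorted(instances_by_zone)
        if capacity_after_zone_loss(instances_by_zone, zone) < required_instances
    ]
```

An evenly spread cluster of 4 + 4 + 4 instances that needs 8 survives the loss of any zone, while a 6 + 3 + 3 layout does not survive losing the largest one.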
What about larger catastrophes?
Chaos Kong
 Simulate an entire region (US west coast, US east coast) failing
 For example – hurricane, large winter storm, earthquake, etc.
 Goals:
 Exercise end-to-end large-scale failover (routing, DNS, scaling up)
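One small piece of a regional failover is redistributing traffic away from the failed region. The sketch below assumes traffic is steered by per-region weights (for example, weighted DNS records) and shows only the arithmetic; the region labels and the weighting scheme are hypothetical, and scaling up the surviving regions is out of scope.

```python
def evacuate_region(weights: dict, failed_region: str) -> dict:
    """Drop failed_region and rescale the remaining weights to sum to 1.0."""
    remaining = {r: w for r, w in weights.items() if r != failed_region}
    total = sum(remaining.values())
    if total <= 0:
        raise ValueError("no healthy region left to receive traffic")
    return {region: weight / total for region, weight in remaining.items()}
```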
The sick and wounded
Latency Monkey
 Distributed systems have many upstream/downstream connections
 How fault-tolerant are systems to dependency failure/slowdown?
 Goals:
 Simulate latencies and error codes, see how a service responds
 Survivable services regardless of dependencies
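A minimal sketch of the idea, assuming the dependency is reached through a plain function call: a wrapper adds delay and occasional errors, and the calling service is expected to degrade to a fallback. `InjectedFailure`, `with_faults`, and `recommendations` are names invented for this example, not part of any Netflix tool.

```python
import random
from typing import Callable, List


class InjectedFailure(Exception):
    """Stands in for an error code returned by a dependency."""


def with_faults(call: Callable[[str], List[str]], rng: random.Random,
                delay_s: float, error_rate: float,
                sleep: Callable[[float], None]) -> Callable[[str], List[str]]:
    """Wrap a dependency call so it is slowed down and sometimes fails."""
    def wrapped(user_id: str) -> List[str]:
        sleep(delay_s)                    # injected latency
        if rng.random() < error_rate:     # injected error
            raise InjectedFailure("injected dependency error")
        return call(user_id)
    return wrapped


def recommendations(fetch: Callable[[str], List[str]], user_id: str,
                    fallback: List[str]) -> List[str]:
    """Caller under test: serve a fallback instead of failing the request."""
    try:
        return fetch(user_id)
    except Exception:
        return fallback
```

In a real experiment `sleep` would be `time.sleep`; passing it in lets a test record the injected delays without actually waiting.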
Outliers and rebels
Conformity Monkey
 Without architecture review, how do you ensure designs leverage known successful patterns?
 Conformity Monkey provides automated analysis for pattern adherence
 Goals:
 Evaluate deployment modes (data center distribution)
 Evaluate health checks, discoverability, versions of key libraries
 Help ensure service has best chance of successful operation
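The kinds of checks listed on this slide can be expressed as simple rules over a cluster description. The sketch below is illustrative only: the `Cluster` fields, the three-zone minimum, and the `platform-client` library with its minimum version are assumptions, not actual Conformity Monkey rules.

```python
from dataclasses import dataclass, field

MIN_ZONES = 3  # assumed target, matching a three-AZ deployment pattern
MIN_LIBRARY_VERSIONS = {"platform-client": (2, 4, 0)}  # hypothetical library


@dataclass
class Cluster:
    name: str
    zones: set
    has_health_check: bool
    registered_in_discovery: bool
    library_versions: dict = field(default_factory=dict)


def parse_version(text: str) -> tuple:
    """Turn '2.4.1' into (2, 4, 1) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def check_conformity(cluster: Cluster) -> list:
    """Return a list of human-readable findings; empty means conformant."""
    findings = []
    if len(cluster.zones) < MIN_ZONES:
        findings.append(f"{cluster.name}: runs in fewer than {MIN_ZONES} zones")
    if not cluster.has_health_check:
        findings.append(f"{cluster.name}: no health check configured")
    if not cluster.registered_in_discovery:
        findings.append(f"{cluster.name}: not registered in service discovery")
    for library, minimum in MIN_LIBRARY_VERSIONS.items():
        installed = cluster.library_versions.get(library)
        if installed is not None and parse_version(installed) < minimum:
            findings.append(f"{cluster.name}: {library} {installed} is below minimum")
    return findings
```

Reporting findings rather than blocking a deployment fits the "guardrails, but invite deviation" approach described earlier in the deck.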
Cruft, junk, and clutter
Janitor Monkey
 Clutter accumulates, in the form of:
 Complexity
 Vulnerabilities
 Cost
 Janitor identifies unused resources and reaps them to save money and reduce exposure
 Goals:
 Automated hygiene
 More freedom for engineers to innovate and move fast
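A hedged sketch of automated hygiene: resources that look unused are first marked, then reaped only if they still look unused after a grace period. The `Resource` fields, the 30-day "unused" threshold, and the 3-day grace period are assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Resource:
    resource_id: str
    attached: bool                 # e.g. a volume attached to an instance
    last_used: datetime
    marked_at: Optional[datetime] = None


def janitor_pass(resources: list, now: datetime,
                 unused_after: timedelta = timedelta(days=30),
                 grace: timedelta = timedelta(days=3)) -> list:
    """Mark newly unused resources; return IDs whose grace period has expired."""
    to_reap = []
    for res in resources:
        unused = not res.attached and now - res.last_used >= unused_after
        if not unused:
            res.marked_at = None           # back in use: clear any earlier mark
        elif res.marked_at is None:
            res.marked_at = now            # first sighting: mark, do not delete yet
        elif now - res.marked_at >= grace:
            to_reap.append(res.resource_id)
    return to_reap
```

The grace period gives an owner time to object before anything is deleted, which keeps the cleanup automatic without making it surprising.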
Non-Simian Approaches
 Org model
 Engineers write, deploy, support code
 Culture
 De-centralized with as few processes and rules as possible
 Lots of local autonomy
 “If you’re not failing, you’re not trying hard enough”
 Peer pressure
 Productive and transparent incident reviews
Software Deployment for Compliance-Sensitive Apps
Control Objectives for Software Deployments
Visibility and transparency
 Who did what, when?
 What was the scope of the change or deployment?
 Was it reviewed?
 Was it tested?
 Was it approved?
Typically attempted via:
 Restricted access/SoD
 CMDBs
 Change management processes
 Test results
 Change windows
Large and Dynamic Systems Need a Different Approach
 No operations organization
 No acceptable windows for downtime
 Thousands of deployments and changes per day
Control Objectives Haven’t Changed
Visibility and transparency
 Who did what, when?
 What was the scope of the change or deployment?
 Was it reviewed?
 Was it tested?
 Was it approved?
System-wide view on changes
 Access to changes by app, region, environment, etc.
 Lookback in time as needed
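To make the "who did what, when" view concrete, the sketch below models a change log as a list of immutable records that can be filtered by app, region, or environment over a time window. The `ChangeEvent` fields and `find_changes` helper are hypothetical; the slides do not describe the actual data model.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChangeEvent:
    when: datetime
    who: str
    app: str
    region: str
    environment: str
    description: str


def find_changes(events: list, since: datetime, until: datetime, **filters) -> list:
    """Events inside [since, until] whose fields equal every given filter.

    Example filters: app="api", region="us-east", environment="prod".
    """
    matches = [
        event for event in events
        if since <= event.when <= until
        and all(getattr(event, name) == value for name, value in filters.items())
    ]
    return sorted(matches, key=lambda event: event.when)
```

Because every change is recorded as it happens, the audit questions are answered by a query rather than by a change ticket filled in by hand.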
Changes, via email
When?
By who?
What changed?
Integrated awareness
 Chat integration lets engineers easily access info
Automated testing
 1000+ tests to compare proposed vs. existing
 Automated scoring and deployment decision
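One plausible way to turn many proposed-vs-existing comparisons into a single decision is to score the share of metrics that stay within a tolerance of the existing version. This is a simplified sketch, not the scoring method used at Netflix: it assumes every metric is "lower is better", and the 10% tolerance and 95-point threshold are made-up values.

```python
def canary_score(existing: dict, proposed: dict, tolerance: float = 0.10) -> float:
    """Percentage of metrics where proposed is at most (1 + tolerance) x existing.

    Assumption: all metrics are lower-is-better (error rate, latency, CPU).
    """
    if not existing:
        raise ValueError("no metrics to compare")
    passed = sum(
        1 for name, baseline in existing.items()
        if proposed[name] <= baseline * (1 + tolerance)
    )
    return 100.0 * passed / len(existing)


def deployment_decision(score: float, threshold: float = 95.0) -> str:
    """Promote the proposed version only if its score meets the threshold."""
    return "promote" if score >= threshold else "rollback"
```

With three metrics and one regression beyond tolerance, the score is about 66.7 and the decision is to roll back.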
Complete view of deployment lifecycle
 Jenkins (CI) job
 App name
 Currently running clusters by region/environment
 Cluster ID
 Deployment details
 AMI version
 SCM commit
 Modified files
 Source diffs
 Link to relevant JIRA(s)
Takeaway
 Control objectives have not changed, but advantages of new technologies and operational models dictate updated approaches
Netflix References
 http://netflix.github.com
 http://techblog.netflix.com
 http://slideshare.net/netflix
Questions?
chan@netflix.com

08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 

Resilience and Compliance at Speed and Scale

  • 1. Resilience and Compliance at Speed and Scale
      • ISACA SV Spring Conference
      • Jason Chan, chan@netflix.com, linkedin.com/in/jasonbchan, @chanjbs
  • 2. About Me
      • Engineering Director @ Netflix:
          • Security: product, app, ops, IR, fraud/abuse
      • Previously:
          • Led infosec team @ VMware
          • Consultant - @stake, iSEC Partners
  • 5. Common Controls to Promote Resilience
      • Architectural committees
      • Change approval boards
      • Centralized deployments
      • Vendor-specific, component-level HA
      • Standards and checklists
      • Designed to standardize on design patterns, vendors, etc.
      • Problems for Netflix:
          • Freedom and Responsibility Culture
          • Highly aligned and loosely coupled
          • Innovation cycles
  • 6. Common Controls to Promote Resilience
      • Architectural committees
      • Change approval boards
      • Centralized deployments
      • Vendor-specific, component-level HA
      • Standards and checklists
      • Designed to control and de-risk change
      • Focus on artifacts, test and rollback plans
      • Problems for Netflix:
          • Freedom and Responsibility Culture
          • Highly aligned and loosely coupled
          • Innovation cycles
  • 7. Common Controls to Promote Resilience
      • Architectural committees
      • Change approval boards
      • Centralized deployments
      • Vendor-specific, component-level HA
      • Standards and checklists
      • Separate Ops team deploys at a pre-ordained time (e.g. weekly, monthly)
      • Problems for Netflix:
          • Freedom and Responsibility Culture
          • Highly aligned and loosely coupled
          • Innovation cycles
  • 8. Common Controls to Promote Resilience
      • Architectural committees
      • Change approval boards
      • Centralized deployments
      • Vendor-specific, component-level HA
      • Standards and checklists
      • High reliance on vendor solutions to provide HA and resilience
      • Problems for Netflix:
          • Traditional data-center-oriented systems do not translate well to the cloud
          • Heavy use of open source
  • 9. Common Controls to Promote Resilience
      • Architectural committees
      • Change approval boards
      • Centralized deployments
      • Vendor-specific, component-level HA
      • Standards and checklists
      • Designed for repeatable execution
      • Problems for Netflix:
          • Reliance on humans
  • 11. What does the business value?
      • Customer experience
      • Innovation and agility
      • In other words:
          • Stability and availability for customer experience
          • Rapid development and change to continually improve product and outpace competition
      • Not that different from anyone else
  • 12. Overall Approach
      • Understand and solve for relevant failure modes
      • Rely on automation and tools, not humans or committees
      • Make no assumptions that planned controls will work
      • Provide train tracks and guardrails, but invite deviation
  • 14. Goals of Simian Army
      • “Each system has to be able to succeed, no matter what, even all on its own. We’re designing each distributed system to expect and tolerate failure from other systems on which it depends.”
      • Source: http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html
  • 17. Chaos Monkey
      • “By frequently causing failures, we force our services to be built in a way that is more resilient.”
      • Terminates cluster nodes during business hours
      • Rejects “If it ain’t broke, don’t fix it”
      • Goals:
          • Simulate random hardware failures, human error at small scale
          • Identify weaknesses
          • No service impact
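The selection step behind "terminates cluster nodes during business hours" could be sketched roughly as follows. This is a minimal illustration, not Netflix's implementation: the function names, the 09:00 to 17:00 weekday window, and the instance IDs are all assumptions, and the sketch only picks a victim rather than calling any cloud API.

```python
import random
from datetime import datetime


def in_business_hours(now, start_hour=9, end_hour=17):
    """True on weekdays between start_hour (inclusive) and end_hour (exclusive).

    The window itself is an assumption; the slide only says "business hours".
    """
    return now.weekday() < 5 and start_hour <= now.hour < end_hour


def pick_victim(instances, now, rng=random):
    """Return one instance ID to terminate, or None if nothing should be killed."""
    if not instances or not in_business_hours(now):
        return None
    return rng.choice(sorted(instances))


# Hypothetical cluster; a real tool would list instances from the cloud provider.
cluster = ["i-aaa", "i-bbb", "i-ccc"]
print(pick_victim(cluster, datetime(2013, 4, 2, 10, 30)))  # a Tuesday morning
```

Keeping the time check separate from the random choice makes it easy to test the schedule without terminating anything.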
  • 20. Chaos Gorilla
      • Chaos Monkey’s bigger brother
      • Standard deployment pattern is to distribute load/systems/data across three data centers (AZs)
      • What happens if one is lost?
      • Goals:
          • Simulate data center loss, hardware/service failures at larger scale
          • Identify weaknesses, dependencies, etc.
          • Minimal service impact
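The question "what happens if one is lost?" can be explored on paper before any real exercise. The sketch below is a simplified capacity check under assumed numbers: the zone names, capacities, and the `survives_zone_loss` helper are hypothetical and only model raw capacity, not dependencies or data placement.

```python
def survives_zone_loss(capacity_by_zone, peak_load):
    """For each zone, report whether the remaining zones could carry peak_load
    if that single zone disappeared."""
    total = sum(capacity_by_zone.values())
    return {
        zone: (total - capacity) >= peak_load
        for zone, capacity in capacity_by_zone.items()
    }


# Hypothetical three-zone deployment, capacities in arbitrary request units.
zones = {"zone-a": 60, "zone-b": 30, "zone-c": 30}
for zone, ok in survives_zone_loss(zones, peak_load=70).items():
    print(zone, "loss survivable" if ok else "loss NOT survivable")
```

An uneven spread like the one above shows why distributing load evenly across zones matters: losing the largest zone is the case that fails.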
  • 21. What about larger catastrophes?
  • 23. Chaos Kong
      • Simulate an entire region (US west coast, US east coast) failing
      • For example: hurricane, large winter storm, earthquake, etc.
      • Goals:
          • Exercise end-to-end large-scale failover (routing, DNS, scaling up)
  • 24. The sick and wounded
  • 26. Latency Monkey
      • Distributed systems have many upstream/downstream connections
      • How fault-tolerant are systems to dependency failure/slowdown?
      • Goals:
          • Simulate latencies and error codes, see how a service responds
          • Survivable services regardless of dependencies
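One way to picture "simulate latencies and error codes, see how a service responds" is a wrapper that degrades a dependency call, plus a caller that falls back instead of failing. This is a hedged sketch: `with_faults`, `InjectedFault`, and `get_recommendations` are hypothetical names, and a real caller would also handle genuine network errors and timeouts.

```python
import random
import time


class InjectedFault(Exception):
    """Raised in place of a real dependency error (illustrative only)."""


def with_faults(call, delay_s=0.0, error_rate=0.0, rng=None, sleep=time.sleep):
    """Wrap a dependency call so it is slowed down and sometimes fails."""
    rng = rng or random.Random()

    def wrapped(*args, **kwargs):
        if delay_s > 0:
            sleep(delay_s)  # simulated latency
        if rng.random() < error_rate:
            raise InjectedFault("simulated dependency failure")
        return call(*args, **kwargs)

    return wrapped


def get_recommendations(fetch, user_id, fallback=()):
    """Caller under test: degrade to a fallback list rather than erroring out."""
    try:
        return fetch(user_id)
    except InjectedFault:
        return list(fallback)


# Hypothetical dependency that always fails once faults are injected.
flaky = with_faults(lambda user_id: ["personalized"], error_rate=1.0)
print(get_recommendations(flaky, user_id=42, fallback=["popular"]))
```

The `sleep` and `rng` parameters are injectable so the behavior can be exercised in tests without real delays or randomness.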
  • 29. Conformity Monkey
      • Without architecture review, how do you ensure designs leverage known successful patterns?
      • Conformity Monkey provides automated analysis for pattern adherence
      • Goals:
          • Evaluate deployment modes (data center distribution)
          • Evaluate health checks, discoverability, versions of key libraries
          • Help ensure service has best chance of successful operation
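Automated pattern checks of the kind listed above might look like a small set of rules run against a description of each cluster. The sketch below assumes a hypothetical dictionary layout (`instances`, `health_check_url`, `registered_in_discovery`, `libraries`) and hypothetical thresholds; it is not the actual Conformity Monkey rule set.

```python
def check_conformity(cluster, min_zones=3, min_library_versions=None):
    """Return a list of human-readable rule violations for one cluster."""
    min_library_versions = min_library_versions or {}
    problems = []

    # Deployment mode: instances should be spread across enough zones.
    zones = {instance["zone"] for instance in cluster.get("instances", [])}
    if len(zones) < min_zones:
        problems.append(f"spread across {len(zones)} zone(s), expected at least {min_zones}")

    # Health checks and discoverability.
    if not cluster.get("health_check_url"):
        problems.append("no health check configured")
    if not cluster.get("registered_in_discovery", False):
        problems.append("not registered in service discovery")

    # Versions of key libraries, compared as tuples such as (2, 1).
    for name, minimum in min_library_versions.items():
        found = cluster.get("libraries", {}).get(name)
        if found is None or tuple(found) < tuple(minimum):
            problems.append(f"library {name} is missing or below the minimum version")

    return problems


# Hypothetical cluster that violates every rule.
bad = {"instances": [{"zone": "zone-a"}, {"zone": "zone-a"}]}
for problem in check_conformity(bad, min_library_versions={"http-client": (2, 0)}):
    print("-", problem)
```

Returning findings rather than blocking the deployment fits the "guardrails, but invite deviation" approach from earlier in the deck.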
  • 30. Cruft, junk, and clutter
  • 32. Janitor Monkey
      • Clutter accumulates, in the form of:
          • Complexity
          • Vulnerabilities
          • Cost
      • Janitor identifies unused resources and reaps them to save money and reduce exposure
      • Goals:
          • Automated hygiene
          • More freedom for engineers to innovate and move fast
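The "identify unused resources" step can be illustrated with a simple idle-time rule. Everything in this sketch is an assumption for illustration: the resource fields (`id`, `attached`, `last_used`), the 30-day threshold, and the `find_cleanup_candidates` name. It only lists candidates and does not delete anything.

```python
from datetime import datetime, timedelta


def find_cleanup_candidates(resources, now, max_idle=timedelta(days=30)):
    """Return IDs of resources that are unattached and idle for longer than max_idle."""
    return [
        resource["id"]
        for resource in resources
        if not resource["attached"] and now - resource["last_used"] > max_idle
    ]


# Hypothetical inventory of storage volumes.
inventory = [
    {"id": "vol-1", "attached": False, "last_used": datetime(2013, 1, 1)},
    {"id": "vol-2", "attached": True, "last_used": datetime(2013, 1, 1)},
    {"id": "vol-3", "attached": False, "last_used": datetime(2013, 3, 25)},
]
print(find_cleanup_candidates(inventory, now=datetime(2013, 4, 1)))
```

Separating "find candidates" from "reap" leaves room for a notification or grace period before anything is removed.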
  • 33. Non-Simian Approaches
      • Org model
      • Engineers write, deploy, support code
      • Culture
      • De-centralized with as few processes and rules as possible
      • Lots of local autonomy
      • “If you’re not failing, you’re not trying hard enough”
      • Peer pressure
      • Productive and transparent incident reviews
  • 34. Software Deployment for Compliance-Sensitive Apps
  • 35. Control Objectives for Software Deployments
      • Visibility and transparency
          • Who did what, when?
          • What was the scope of the change or deployment?
          • Was it reviewed?
          • Was it tested?
          • Was it approved?
      • Typically attempted via:
          • Restricted access/SoD
          • CMDBs
          • Change management processes
          • Test results
          • Change windows
  • 36. Large and Dynamic Systems Need a Different Approach
      • No operations organization
      • No acceptable windows for downtime
      • Thousands of deployments and changes per day
  • 37. Control Objectives Haven’t Changed
      • Visibility and transparency
          • Who did what, when?
          • What was the scope of the change or deployment?
          • Was it reviewed?
          • Was it tested?
          • Was it approved?
  • 39. Access to changes by app, region, environment, etc. Lookback in time as needed
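A change log that answers "who did what, when?" and supports filtering by app, region, and environment could be modeled as below. This is only a sketch under stated assumptions: the `ChangeEvent` fields and the `find_changes` helper are hypothetical and do not describe the tool shown on the slide.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ChangeEvent:
    """One immutable record of a change (field names are illustrative)."""
    when: datetime
    who: str
    app: str
    region: str
    environment: str
    description: str


def find_changes(events, since, until, **filters):
    """Events within [since, until] whose fields match every filter, oldest first."""
    matches = [
        event
        for event in events
        if since <= event.when <= until
        and all(getattr(event, field) == value for field, value in filters.items())
    ]
    return sorted(matches, key=lambda event: event.when)


# Hypothetical log entries.
log = [
    ChangeEvent(datetime(2013, 4, 2, 9, 0), "alice", "api", "us-east-1", "prod", "deploy build 101"),
    ChangeEvent(datetime(2013, 4, 2, 11, 0), "bob", "api", "us-west-2", "prod", "resize cluster"),
    ChangeEvent(datetime(2013, 4, 1, 15, 0), "carol", "api", "us-east-1", "prod", "deploy build 100"),
]
for event in find_changes(log, datetime(2013, 4, 1), datetime(2013, 4, 3), app="api", region="us-east-1"):
    print(event.when, event.who, event.description)
```

The `since` and `until` bounds give the "lookback in time" view, while the keyword filters narrow the result by any recorded field.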
  • 46. 1000+ tests to compare proposed vs. existing; automated scoring and deployment decision
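Comparing a proposed build against the existing one and turning the result into a score and a decision might be sketched like this. The metric names, the 10% tolerance, the 95-point threshold, and the assumption that lower values are better for every metric are all illustrative choices, not the scoring used in the tool on the slide.

```python
def canary_score(baseline, canary, tolerance=0.10):
    """Percentage of shared metrics where the canary is no more than
    `tolerance` worse than the baseline (lower values are treated as better)."""
    shared = sorted(set(baseline) & set(canary))
    if not shared:
        return 0.0
    passed = sum(
        1 for name in shared if canary[name] <= baseline[name] * (1 + tolerance)
    )
    return 100.0 * passed / len(shared)


def deployment_decision(score, threshold=95.0):
    """Map a score to an automated decision."""
    return "promote" if score >= threshold else "rollback"


# Hypothetical measurements from the existing and proposed clusters.
existing = {"error_rate": 0.010, "p99_latency_ms": 200, "cpu_percent": 50}
proposed = {"error_rate": 0.0105, "p99_latency_ms": 260, "cpu_percent": 52}
score = canary_score(existing, proposed)
print(round(score, 1), deployment_decision(score))
```

With many comparisons feeding one score, the promote-or-rollback decision can be made without a human approval step while still leaving a record of why.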
  • 47. Complete view of deployment lifecycle
  • 48. Jenkins (CI) job; app name; currently running clusters by region/environment
  • 52. Takeaway
      • Control objectives have not changed, but advantages of new technologies and operational models dictate updated approaches
  • 53. Netflix References
      • http://netflix.github.com
      • http://techblog.netflix.com
      • http://slideshare.net/netflix