Resilience Planning and how the
empire strikes back
Bhakti Mehta
@bhakti_mehta
Introduction
• Senior Software Engineer at Blue Jeans
Network
• Worked at Sun Microsystems/Oracle for 13
years
• Committer to numerous open source projects
including GlassFish Application Server
My recent book
Previous book
Blue Jeans Network
• Video conferencing in the cloud
• Customers in all segments
• Millions of users
• Interoperable
• Video sharing, Content sharing
• Mobile friendly
• Solutions for large scale events
What you will learn
• Blue Jeans architecture
• Challenges at scale
• Lessons learned, tips and practices to prevent
cascading failures
• Resilience planning at various stages
• Real world examples
Top level architecture
[Diagram. Labels: Customer A, Customer B, Internet, SIP / H.323, HTTP / HTTPS, Connector Node, Media Node, Proxy layer, Web Server, Middleware services, Cache, Service discovery, Messaging, DB]
Micro services architecture
Path to Micro services
• Advantages
– Simplicity
– Isolation of problems
– Scale up and scale down
– Easy deployment
– Clear separation of concerns
– Heterogeneity and polyglotism
Microservices
• Disadvantages
– Not a free lunch!
– Distributed systems prone to failures
– Eventual consistency
– More effort in deployment and release management
– Testing is harder when services evolve independently
(regression tests, etc.)
Resilient system
• Processes transactions, even when there are
transient impulses, persistent stresses
• Functions even when there are component
failures disrupting normal processing
• Accepts failures will happen
• Designs for crumple zones
Kinds of failures
• Challenges at scale
• Integration point failures
– Network errors
– Semantic errors.
– Slow responses
– Outright hang
– GC issues
Anticipate failures at scale
• Anticipate growth
• Design for next order of magnitude
• Design for 10x, plan to rewrite for 100x
Resiliency planning Stage 1
• When developing code
– Avoiding Cascading failures
• Circuit breaker
• Timeouts
• Retry
• Bulkhead
• Cache optimizations
– Avoid malicious clients
• Rate limiting
Resiliency planning Stage 2
• Planning for dealing with failures before
deploy
– load test
– a/b test
– longevity
Resiliency planning Stage 3
• Watching out for failures after deploy
– health check
– metrics
Cascading failures
• Caused by chain reactions
• For example
– One node in a load-balanced group fails
– The others need to pick up its work
– Eventually performance can degenerate
Cascading failures with aggregation
Cascading failure with aggregation
Timeouts
• Clients may prefer a response
– failure
– success
– job queued for later
All aggregation requests to microservices should
have reasonable timeouts set
Types of Timeouts
• Connection timeout
– Maximum time to wait for a connection to be
established before an error is raised
• Socket timeout
– Max time of inactivity between two packets once
connection is established
Timeouts pattern
• Timeouts + Retries go together
• Transient failures can be remedied with fast
retries
• However, network problems can last for a
while, so immediate retries are likely to keep failing
Timeouts in code
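In JAX-RS (Jersey client)
import javax.ws.rs.client.Client;
import javax.ws.rs.client.ClientBuilder;
import org.glassfish.jersey.client.ClientProperties;

Client client = ClientBuilder.newClient();
client.property(ClientProperties.CONNECT_TIMEOUT, 5000); // milliseconds
client.property(ClientProperties.READ_TIMEOUT, 5000);    // milliseconds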
Retry pattern
• Retry for failures in case of network failures,
timeouts or server errors
• Helps with transient network errors such as
dropped connections or server failover
Retry pattern
• If one of the services is slow or malfunctioning
and other services keep retrying then the
problem becomes worse
• Solution
– Exponential backoff
– Circuit breaker pattern
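Exponential backoff can be sketched in a few lines. This is a minimal illustration, not code from the talk: the class name, the doubling schedule, and the random jitter are assumptions chosen for the example.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;

class RetryWithBackoff {
    // Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
    static long backoffMillis(int attempt, long baseMs, long maxMs) {
        long delay = baseMs << Math.min(attempt, 20); // bound the shift so small bases cannot overflow
        return Math.min(delay, maxMs);
    }

    // Runs `task` up to maxAttempts times, sleeping between failed attempts.
    // Rethrows the last failure if every attempt fails.
    static <T> T call(Supplier<T> task, int maxAttempts, long baseMs, long maxMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt == maxAttempts - 1) {
                    break; // no point sleeping after the final attempt
                }
                long delay = backoffMillis(attempt, baseMs, maxMs);
                // Jitter spreads out clients that failed at the same moment.
                long jitter = ThreadLocalRandom.current().nextLong(delay / 2 + 1);
                sleep(delay + jitter);
            }
        }
        throw last;
    }

    private static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during backoff", e);
        }
    }
}
```

With a base of 100 ms and a cap of 5 s, the waits before jitter are 100, 200, 400, 800 ms and so on until the cap is reached.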
Circuit breaker pattern
A circuit breaker is an electrical device in an
electrical panel that monitors and controls the current
(in amperes) flowing through a circuit
Circuit breaker pattern
• Safety device
• If a power surge occurs in the electrical wiring,
the breaker will trip.
• Flips from “On” to “Off” and shuts electrical
power from that breaker
Circuit breaker
• Netflix Hystrix follows circuit breaker pattern
• If a service’s error rate exceeds a threshold it
will trip the circuit breaker and block the
requests for a specific period of time
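The trip-and-block behaviour can be sketched without any library. This is not Hystrix's implementation: as a simplifying assumption it trips on consecutive failures rather than an error rate, and the class name, constructor parameters, and injected clock are hypothetical.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

class SimpleCircuitBreaker {
    private final int failureThreshold; // consecutive failures that trip the breaker
    private final long openMillis;      // how long requests stay blocked once tripped
    private final LongSupplier clock;   // injected so the behaviour can be tested
    private int consecutiveFailures = 0;
    private long openedAt = -1;         // -1 means the circuit is closed

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized boolean allowsRequest() {
        if (openedAt < 0) {
            return true;
        }
        // After the cool-down, trial requests are let through again.
        return clock.getAsLong() - openedAt >= openMillis;
    }

    synchronized void recordSuccess() {
        consecutiveFailures = 0;
        openedAt = -1;
    }

    synchronized void recordFailure() {
        consecutiveFailures++;
        if (consecutiveFailures >= failureThreshold) {
            openedAt = clock.getAsLong(); // trip, or re-trip after a failed trial
        }
    }

    // Runs the action unless the circuit is open; falls back on failure or when blocked.
    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (!allowsRequest()) {
            return fallback.get();
        }
        try {
            T result = action.get();
            recordSuccess();
            return result;
        } catch (RuntimeException e) {
            recordFailure();
            return fallback.get();
        }
    }
}
```

While the circuit is open the failing service receives no traffic at all, which gives it room to recover instead of being hit by retries.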
Bulkhead
• Avoiding chain reactions by isolating failures
• Helps prevent cascading failures
Bulkhead
• An example of bulkhead could be isolating the
database dependencies per service
• Similarly other infrastructure components can
be isolated such as cache infrastructure
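The slides describe bulkheads at the infrastructure level. The same isolation idea can also be applied inside a process by capping concurrent calls per dependency; the sketch below shows that variant, with a hypothetical class name and a semaphore as the assumed mechanism.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

class Bulkhead {
    private final Semaphore permits;

    Bulkhead(int maxConcurrentCalls) {
        this.permits = new Semaphore(maxConcurrentCalls);
    }

    // Runs the action if a permit is free; otherwise fails fast with `rejected`
    // instead of queueing behind a slow dependency.
    <T> T call(Supplier<T> action, Supplier<T> rejected) {
        if (!permits.tryAcquire()) {
            return rejected.get();
        }
        try {
            return action.get();
        } finally {
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Giving each downstream dependency its own `Bulkhead` instance means a hang in one of them can exhaust only its own permits, not every request thread.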
Rate Limiting
• Restricting the number of requests that can be
made by a client
• Client can be identified based on the access
token used
• Additionally clients can be identified based on
IP address
Rate Limiting
• With JAX-RS Rate limiting can be implemented
as a filter
• This filter can check the access count for a
client and if within limit accept the request
• Otherwise reject it with HTTP 429 (Too Many Requests)
• Code at https://github.com/bhakti-mehta/samples/tree/master/ratelimiting
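The counting logic such a filter needs can be sketched on its own. This is not the code from the repository above: it assumes a fixed-window counter keyed by a client id (for example the access token), and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

class FixedWindowRateLimiter {
    static final int OK = 200;
    static final int TOO_MANY_REQUESTS = 429;

    private final int limit;          // requests allowed per window
    private final long windowMillis;  // window length
    private final LongSupplier clock; // injected so the behaviour can be tested
    // clientId -> {window start time, requests counted in that window}
    private final Map<String, long[]> windows = new HashMap<>();

    FixedWindowRateLimiter(int limit, long windowMillis, LongSupplier clock) {
        this.limit = limit;
        this.windowMillis = windowMillis;
        this.clock = clock;
    }

    // Returns true and counts the request if the client is within its limit.
    synchronized boolean tryAcquire(String clientId) {
        long now = clock.getAsLong();
        long[] window = windows.get(clientId);
        if (window == null || now - window[0] >= windowMillis) {
            window = new long[] {now, 0}; // start a fresh window
            windows.put(clientId, window);
        }
        if (window[1] >= limit) {
            return false;
        }
        window[1]++;
        return true;
    }

    // The status a filter would send back for this request.
    int statusFor(String clientId) {
        return tryAcquire(clientId) ? OK : TOO_MANY_REQUESTS;
    }
}
```

In a JAX-RS request filter, `statusFor` would decide whether to let the request continue or abort it with 429.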
Cache optimizations
• Stores response information related to
requests in a temporary storage for a specific
period of time
• Ensures that server is not burdened
processing those requests in future when
responses can be fulfilled from the cache
Cache optimizations
Getting from first level cache
Getting from second
level cache
Getting from the DB
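The lookup order in the diagram (first level cache, then second level cache, then the DB) can be sketched as below. The class is hypothetical, plain maps stand in for both cache tiers, a function stands in for the database query, and entry expiry is left out to keep the example short.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class TwoLevelCache<K, V> {
    private final Map<K, V> firstLevel = new ConcurrentHashMap<>(); // e.g. in-process
    private final Map<K, V> secondLevel;   // stand-in for a shared cache
    private final Function<K, V> database; // stand-in for the DB query
    // Where the last lookup was answered from; for illustration only, not thread-safe.
    String lastSource = "";

    TwoLevelCache(Map<K, V> secondLevel, Function<K, V> database) {
        this.secondLevel = secondLevel;
        this.database = database;
    }

    V get(K key) {
        V value = firstLevel.get(key);
        if (value != null) {
            lastSource = "L1";
            return value;
        }
        value = secondLevel.get(key);
        if (value != null) {
            lastSource = "L2";
            firstLevel.put(key, value); // promote so the next read is local
            return value;
        }
        value = database.apply(key);
        lastSource = "DB";
        if (value != null) {
            secondLevel.put(key, value);
            firstLevel.put(key, value);
        }
        return value;
    }
}
```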
Dealing with latencies in response
• Have a timeout for the aggregation service
• Dispatch requests in parallel and collect
responses
• Associate a priority with all the responses
collected
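A minimal sketch of the dispatch-and-collect step, assuming each downstream call is a `Callable<String>` and that one overall deadline applies to the whole batch. The class name is hypothetical. Results come back in request order, so the caller can encode priority by ordering the list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class Aggregator {
    // Runs all calls in parallel and returns whatever finished successfully
    // within timeoutMs, in the same order as the input list.
    static List<String> aggregate(List<Callable<String>> calls, long timeoutMs) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, calls.size()));
        try {
            // invokeAll returns when all tasks are done or the timeout expires;
            // tasks that did not finish in time are cancelled.
            List<Future<String>> futures = pool.invokeAll(calls, timeoutMs, TimeUnit.MILLISECONDS);
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                if (future.isCancelled()) {
                    continue; // timed out
                }
                try {
                    results.add(future.get());
                } catch (ExecutionException e) {
                    // this downstream call failed; keep the partial result set
                }
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new ArrayList<>();
        } finally {
            pool.shutdownNow();
        }
    }
}
```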
Handling partial failures best practices
• One service calls another which can be slow or
unavailable
• Never block indefinitely waiting for the service
• Try to return partial results
• Provide a caching layer and return cached
data
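One way to combine "never block indefinitely" with "return cached data" is to bound each remote call with a timeout and fall back to the last good value. The sketch below assumes a simple in-memory map as the caching layer; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class CachedFallback {
    private final Map<String, String> lastGood = new ConcurrentHashMap<>();
    // Daemon threads so a hung remote call cannot keep the JVM alive.
    private final ExecutorService pool = Executors.newCachedThreadPool(runnable -> {
        Thread thread = new Thread(runnable);
        thread.setDaemon(true);
        return thread;
    });

    // Returns the fresh value if the remote call finishes in time; otherwise
    // the last value seen for this key, or null if there is none.
    String fetch(String key, Callable<String> remote, long timeoutMs) {
        Future<String> future = pool.submit(remote);
        try {
            String value = future.get(timeoutMs, TimeUnit.MILLISECONDS);
            if (value != null) {
                lastGood.put(key, value);
            }
            return value;
        } catch (TimeoutException | ExecutionException e) {
            future.cancel(true); // stop waiting on the slow or failed call
            return lastGood.get(key);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            future.cancel(true);
            return lastGood.get(key);
        }
    }
}
```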
Asynchronous Patterns
• Pattern to deal with long running jobs
• Some resources may take longer time to
provide results
• The client does not need to wait for the response
Reactive programming model
• Use reactive programming such as
CompletableFuture in Java 8, ListenableFuture
• Rx Java
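A small sketch of the `CompletableFuture` style from Java 8: two independent results are combined without blocking a thread, and a failure in either is turned into a fallback value. The class, method, and message text are made up for the example.

```java
import java.util.concurrent.CompletableFuture;

class ReactiveCompose {
    // Combines two asynchronous results once both are available.
    // If either future fails, the combined future completes with a fallback string.
    static CompletableFuture<String> summary(CompletableFuture<String> user,
                                             CompletableFuture<Integer> unread) {
        return user
            .thenCombine(unread, (name, count) -> name + " has " + count + " unread")
            .exceptionally(error -> "unavailable");
    }
}
```

A caller would typically pass futures created with `CompletableFuture.supplyAsync(...)` and attach further stages rather than calling `join()`.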
Asynchronous API
• Reactive patterns
• Message Passing
– Akka actor model
• Message queues
– Communication between services via shared
message queues
– Websockets
Logging
• Complex distributed systems introduce many
points of failure
• Logging helps link events/transactions between
various components that make an application or
a business service
• ELK stack
• Splunk, syslog
• Loggly
• LogEntries
Logging best practices
• Include detailed, consistent pattern across
service logs
• Obfuscate sensitive data
• Identify caller or initiator as part of logs
• Do not log payloads by default
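Two of these practices, a consistent line format that identifies the caller and obfuscation of sensitive values, can be sketched as follows. The field names (`requestId`, `caller`, `event`) and the list of sensitive keys are assumptions for illustration.

```java
import java.util.regex.Pattern;

class LogSanitizer {
    // Matches key=value pairs whose key suggests a secret (case-insensitive).
    private static final Pattern SECRET =
        Pattern.compile("(?i)(password|token|secret)=([^&\\s]+)");

    // Replaces the value of any sensitive key with a fixed mask.
    static String mask(String message) {
        return SECRET.matcher(message).replaceAll("$1=****");
    }

    // One consistent format for every service, so lines can be correlated.
    static String format(String requestId, String caller, String event) {
        return "requestId=" + requestId + " caller=" + caller + " event=" + mask(event);
    }
}
```

Passing the same `requestId` through every service call is what lets a log aggregator stitch one transaction together across components.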
Best practices when designing APIs for
mobile clients
– Avoid chattiness
– Use aggregator pattern
Resilience planning Stage 2
• Before deploy
– Load testing
– Longevity testing
– Capacity planning
Load testing
• Ensure that you test for load on APIs
– JMeter
• Plan for longevity testing
Capacity Planning
• Anticipate growth
• Design for handling exponential growth
Resilience planning Stage 3
• After deploy
– Health check
– Metrics
– Phased rollout of features
Health Check
• Memory
• CPU
• Threads
• Error rate
• If any of the checks exceed a threshold send
alert
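The threshold logic can be sketched as below. The threshold values are supplied by the caller and the class name is hypothetical; heap usage and thread count are sampled from the JVM, while CPU and error rate would come from whatever metrics source is in use.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class HealthCheck {
    private final double maxHeapRatio;
    private final int maxThreads;
    private final double maxErrorRate;

    HealthCheck(double maxHeapRatio, int maxThreads, double maxErrorRate) {
        this.maxHeapRatio = maxHeapRatio;
        this.maxThreads = maxThreads;
        this.maxErrorRate = maxErrorRate;
    }

    // Returns the names of the checks that exceeded their threshold.
    List<String> alerts(double heapRatio, int threads, double errorRate) {
        List<String> exceeded = new ArrayList<>();
        if (heapRatio > maxHeapRatio) exceeded.add("memory");
        if (threads > maxThreads) exceeded.add("threads");
        if (errorRate > maxErrorRate) exceeded.add("errorRate");
        return exceeded;
    }

    // Fraction of the maximum heap currently in use by this JVM.
    static double currentHeapRatio() {
        Runtime runtime = Runtime.getRuntime();
        return (double) (runtime.totalMemory() - runtime.freeMemory()) / runtime.maxMemory();
    }

    // Number of live threads in this JVM.
    static int currentThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }
}
```

A non-empty list from `alerts` is the point at which the monitoring system would send an alert.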
Monitoring
[Diagram. Labels: Monitoring server, Production Environment, Checks, Alerts, Email]
Monitoring Stack
• Application: log aggregation framework
• OS / application code: New Relic (Java, Python)
• Network, server: Collectd / Graphite
• Health checks: Icinga
Metrics
• Response times, throughput
– Identify slow running DB queries
• GC rate and pause duration
– Garbage collection can cause slow responses
• Monitor unusual activity
• Third party library metrics
– For example Couchbase hits
– atop
Metrics
• Load average
• Uptime
• Log sizes
Rollout of new features
• Phasing rollout of new features
• Have a way to turn features off if not behaving
as expected
• Alerts and more alerts!
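A phased rollout with an off switch can be sketched as a percentage per feature. This assumes users are bucketed by a hash of the feature name and user id, so a given user gets a stable answer; the class and feature names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class FeatureRollout {
    // feature name -> percentage of users (0..100) who see it
    private final Map<String, Integer> percentages = new ConcurrentHashMap<>();

    // Setting 0 acts as the kill switch; 100 means fully rolled out.
    void setPercent(String feature, int percent) {
        percentages.put(feature, Math.max(0, Math.min(100, percent)));
    }

    // Unknown features are off. The same user always lands in the same bucket.
    boolean isEnabled(String feature, String userId) {
        int percent = percentages.getOrDefault(feature, 0);
        int bucket = Math.floorMod((feature + ":" + userId).hashCode(), 100);
        return bucket < percent;
    }
}
```

Raising the percentage in steps while watching the alerts gives a chance to turn the feature off before most users are affected.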
Real world examples
• Netflix's Simian Army induces failures of
services and even datacenters during the
working day to test both the application's
resilience and monitoring.
• Latency Monkey to simulate slow running
requests
• Wiremock to mock services
• Saboteur to create deliberate network
mayhem
Takeaway
• Inevitability of failures
– Expect systems will fail
– Failure prevention
References
• https://commons.wikimedia.org/wiki/File:Bulkhead_PSF.png
• https://en.wikipedia.org/wiki/Circuit_breaker#/media/File:Four_1_pole_circuit_breakers_fitted_in_a_meter_box.jpg
• https://www.flickr.com/photos/skynoir/ Beer in hand: skynoir/Flickr/Creative Commons License
Questions
• Twitter: @bhakti_mehta
• Email: bhakti@bluejeans.com


Resilience planning and how the empire strikes back

  • 1. Resilience Planning and how the empire strikes back Bhakti Mehta @bhakti_mehta
  • 2. Introduction • Senior Software Engineer at Blue Jeans Network • Worked at Sun Microsystems/Oracle for 13 years • Committer to numerous open source projects including GlassFish Application Server
  • 6. Blue Jeans Network • Video conferencing in the cloud • Customers in all segments • Millions of users • Interoperable • Video sharing, Content sharing • Mobile friendly • Solutions for large scale events
  • 7. What you will learn • Blue Jeans architecture • Challenges at scale • Lessons learned, tips and practices to prevent cascading failures • Resilience planning at various stages • Real world examples
  • 8. Customer B Top level architecture INTERNET Customer A SIP, H.323 HTTP / HTTPS MediaNode Web Server Middleware services Cache Servicediscovery Messaging DB Proxy layer Connector Node
  • 10. Path to Micro services • Advantages – Simplicity – Isolation of problems – Scale up and scale down – Easy deployment – Clear separation of concerns – Heterogeneity and polyglotism
  • 11. Microservices • Disadvantages – Not a free lunch! – Distributed systems prone to failures – Eventual consistency – More effort in terms of deployments, release managements – Challenges in testing the various services evolving independently, regression tests etc
  • 12. Resilient system • Processes transactions, even when there are transient impulses, persistent stresses • Functions even when there are component failures disrupting normal processing • Accepts failures will happen • Designs for crumple zones
  • 13. Kinds of failures • Challenges at scale • Integration point failures – Network errors – Semantic errors. – Slow responses – Outright hang – GC issues
  • 14.
  • 15.
  • 16. Anticipate failures at scale • Anticipate growth • Design for next order of magnitude • Design for 10x plan to rewrite for 100x
  • 17. Resiliency planning Stage 1 • When developing code – Avoiding Cascading failures • Circuit breaker • Timeouts • Retry • Bulkhead • Cache optimizations – Avoid malicious clients • Rate limiting
  • 18. Resiliency planning Stage 2 • Planning for dealing with failures before deploy – load test – a/b test – longevity
  • 19. Resiliency planning Stage 3 • Watching out for failures after deploy – health check – metrics
  • 20.
  • 21. Cascading failures Caused by Chain reactions For example One node in a load balance group fails Others need to pick up work Eventually performance can degenerate
  • 22. Cascading failures with aggregation
  • 23. Cascading failure with aggregation
  • 24.
  • 25. Timeouts • Clients may prefer a response – failure – success – job queued for later All aggregation requests to microservices should have reasonable timeouts set
  • 26. Types of Timeouts • Connection timeout – Max time before connection can be established or Error • Socket timeout – Max time of inactivity between two packets once connection is established
  • 27. Timeouts pattern • Timeouts + Retries go together • Transient failures can be remedied with fast retries • However problems in network can last for a while so probability of retries failing
  • 28. Timeouts in code In JAX-RS Client client = ClientBuilder.newClient(); client.property(ClientProperties.CONNECT_TIMEOUT, 5000); client.property(ClientProperties.READ_TIMEOUT, 5000)
  • 29. Retry pattern • Retry for failures in case of network failures, timeouts or server errors • Helps transient network errors such as dropped connections or server fail over
  • 30. Retry pattern • If one of the services is slow or malfunctioning and other services keep retrying then the problem becomes worse • Solution – Exponential backoff – Circuit breaker pattern
  • 31. Circuit breaker pattern Circuit breaker A circuit breaker is an electrical device used in an electrical panel that monitors and controls the amount of amperes (amps) being sent through
  • 32. Circuit breaker pattern • Safety device • If a power surge occurs in the electrical wiring, the breaker will trip. • Flips from “On” to “Off” and shuts electrical power from that breaker
  • 33. Circuit breaker • Netflix Hystrix follows circuit breaker pattern • If a service’s error rate exceeds a threshold it will trip the circuit breaker and block the requests for a specific period of time
  • 35. Bulkhead • Avoiding chain reactions by isolating failures • Helps prevent cascading failures
  • 36. Bulkhead • An example of bulkhead could be isolating the database dependencies per service • Similarly other infrastructure components can be isolated such as cache infrastructure
  • 37. Rate Limiting • Restricting the number of requests that can be made by a client • Client can be identified based on the access token used • Additionally clients can be identified based on IP address
  • 38. Rate Limiting • With JAX-RS Rate limiting can be implemented as a filter • This filter can check the access count for a client and if within limit accept the request • Else throw a 429 Error • Code at https://github.com/bhakti- mehta/samples/tree/master/ratelimiting
  • 39. Cache optimizations • Stores response information related to requests in a temporary storage for a specific period of time • Ensures that server is not burdened processing those requests in future when responses can be fulfilled from the cache
  • 40. Cache optimizations Getting from first level cache Getting from second level cache Getting from the DB
  • 41. Dealing with latencies in response • Have a timeout for the aggregation service • Dispatch requests in parallel and collect responses • Associate a priority with all the responses collected
  • 42. Handling partial failures best practices • One service calls another which can be slow or unavailable • Never block indefinitely waiting for the service • Try to return partial results • Provide a caching layer and return cached data
  • 43. Asynchronous Patterns • Pattern to deal with long running jobs • Some resources may take longer time to provide results • Not needing client to wait for the response
  • 44. Reactive programming model • Use reactive constructs such as CompletableFuture (Java 8) or ListenableFuture • RxJava
  • 45. Asynchronous API • Reactive patterns • Message Passing – Akka actor model • Message queues – Communication between services via shared message queues – Websockets
  • 46. Logging • Complex distributed systems introduce many points of failure • Logging helps link events/transactions between various components that make an application or a business service • ELK stack • Splunk, syslog • Loggly • LogEntries
  • 47. Logging best practices • Use a detailed, consistent pattern across service logs • Obfuscate sensitive data • Identify caller or initiator as part of logs • Do not log payloads by default
  • 48. Best practices when designing APIs for mobile clients – Avoid chattiness – Use aggregator pattern
  • 49. Resilience planning Stage 2 • Before deploy – Load testing – Longevity testing – Capacity planning
  • 50. Load testing • Ensure that you test for load on APIs – JMeter • Plan for longevity testing
  • 51. Capacity Planning • Anticipate growth • Design for handling exponential growth
  • 52. Resilience planning Stage 3 • After deploy – Health check – Metrics – Phased rollout of features
  • 54. Health Check • Memory • CPU • Threads • Error rate • If any check exceeds a threshold, send an alert
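A threshold-based health check of this kind can be sketched with the JVM's standard management beans. The `HealthCheck` class, the metric names, and any threshold values are assumptions; an error rate would have to be supplied by the application, and the returned messages would be handed to whatever alerting system is in use.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Health check sketch: compare sampled readings against thresholds (hypothetical class and metric names). */
class HealthCheck {

    /** Samples a few JVM and OS readings using standard management beans. */
    static Map<String, Double> sampleJvm() {
        Runtime rt = Runtime.getRuntime();
        Map<String, Double> readings = new LinkedHashMap<>();
        readings.put("heapUsedRatio", (double) (rt.totalMemory() - rt.freeMemory()) / rt.maxMemory());
        readings.put("threads", (double) ManagementFactory.getThreadMXBean().getThreadCount());
        // Negative when the platform cannot report a load average.
        readings.put("loadAverage", ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage());
        return readings;
    }

    /** Returns one alert message per reading that exceeds its threshold. */
    static List<String> alerts(Map<String, Double> readings, Map<String, Double> thresholds) {
        List<String> alerts = new ArrayList<>();
        for (Map.Entry<String, Double> t : thresholds.entrySet()) {
            Double value = readings.get(t.getKey());
            if (value != null && value > t.getValue()) {
                alerts.add(t.getKey() + " at " + value + " exceeds threshold " + t.getValue());
            }
        }
        return alerts;
    }
}
```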
  • 57. Monitoring Stack • Log aggregation framework: application • New Relic (Java, Python): OS / application code • Collectd / Graphite: network, server • Icinga: health checks
  • 58. Metrics • Response times, throughput – Identify slow running DB queries • GC rate and pause duration – Garbage collection can cause slow responses • Monitor unusual activity • Third party library metrics – For example Couchbase hits – atop
  • 59. Metrics • Load average • Uptime • Log sizes
  • 60. Rollout of new features • Phasing rollout of new features • Have a way to turn features off if not behaving as expected • Alerts and more alerts!
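Phased rollout with an off switch is commonly done with feature toggles. The sketch below shows one way this might look: each feature has a rollout percentage, and users are placed into stable buckets by hash. The `FeatureToggles` class and its method names are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Feature toggle sketch with percentage-based phased rollout (hypothetical class). */
class FeatureToggles {
    private final Map<String, Integer> rolloutPercent = new ConcurrentHashMap<>();

    /** Sets the share of users (0 to 100) who should see the feature. */
    void setRollout(String feature, int percent) {
        rolloutPercent.put(feature, Math.max(0, Math.min(100, percent)));
    }

    /** Kill switch: turn the feature off for everyone. */
    void disable(String feature) {
        setRollout(feature, 0);
    }

    /** A given user always lands in the same bucket, so the experience is stable between requests. */
    boolean isEnabled(String feature, String userId) {
        int percent = rolloutPercent.getOrDefault(feature, 0);   // unknown features are off
        int bucket = Math.floorMod((feature + ":" + userId).hashCode(), 100);
        return bucket < percent;
    }
}
```

Raising the percentage in steps (for example 5, 25, 100) while watching the alerts gives a phased rollout, and `disable` turns the feature off without a redeploy.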
  • 61. Real world examples • Netflix's Simian Army induces failures of services and even datacenters during the working day to test both the application's resilience and monitoring. • Latency Monkey to simulate slow running requests • Wiremock to mock services • Saboteur to create deliberate network mayhem
  • 62. Takeaway • Inevitability of failures – Expect systems will fail – Failure prevention
  • 65. Questions • Twitter: @bhakti_mehta • Email: bhakti@bluejeans.com

Editor's notes

  1. A little bit on my background before we begin. I am currently a senior software engineer at BJN. I worked at Sun Microsystems for 10 years and at Oracle for 3 years. I'm a committer on numerous open source projects, the most notable of which is GlassFish.
  2. What's up with the name? If you think my name was sassy, my employer is better than me. They want video conferencing to be as ubiquitous and widely used as a favorite pair of jeans. We do video conferencing in the cloud. As this picture suggests, we support various devices, room systems, mobile offerings and desktop options too. Customers are in the space of client services, education, entertainment, IT, healthcare and legal. It is a cloud-based service for content sharing, video sharing and collaboration. Besides conference room video systems, most companies have desktop software like Microsoft Lync or Cisco Jabber for internal chat and video. Employees also bring mobile devices to work. Blue Jeans enables all of these devices and services to connect to the same video meeting for simple, any-device collaboration. Simple scheduling, including integration with Microsoft Outlook and Google Calendar. Click to join meetings from email invitation. Intuitive in-meeting controls to mute/unmute, share/view content, change layouts, view participants.
  3. I find your lack of faith disturbing
  4. Apache HttpClient and other network clients implement some stability features out of the box. For instance, the client might execute retries internally under some circumstances. This strategy helps to handle transient network errors such as dropped connections or server failovers. Retrying will not help in the case of permanent errors, however. In that case, retrying wastes resources and time on both the client and server side.
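The distinction between transient and permanent errors can be made explicit in a small retry helper. The sketch below is not Apache HttpClient's internal retry logic; the `Retry` class is hypothetical, and treating `IOException` as the marker for a transient error is an assumption made for illustration.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Bounded retry sketch: retry only errors assumed to be transient (hypothetical class). */
class Retry {
    static <T> T withRetry(Callable<T> call, int maxAttempts, long backoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                // Assumption: IOException signals a transient network problem worth retrying.
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt);   // simple linear backoff
                }
            }
            // Any other exception propagates immediately: it is treated as permanent.
        }
        throw last;   // all attempts failed with transient errors
    }
}
```

Capping the number of attempts and backing off between them keeps retries from adding load to a server that is already failing.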