TrafficShift: Avoiding Disasters at Scale
Jeff Weiner
Chief Executive Officer
Michael Kehoe
Staff SRE
Anil Mallapur
Sr SRE
Today’s agenda
1 Introductions
2 Evolution of the Infrastructure
3 Planning for Disaster
4 LinkedIn Traffic-Tier
5 TrafficShift
6 Load Testing
7 Q&A
Key Takeaways
• Design infrastructure to facilitate disaster
recovery
• Test regularly
• Automate everything
Introductions
World’s largest professional network
• 500+M members: largest global network of professionals
• 200+ countries: serving users worldwide
Who are we?
PRODUCTION-SRE TEAM AT LINKEDIN
• Assist in restoring stability to services
during site-critical issues
• Develop applications to improve MTTD
and MTTR
• Provide direction and guidelines for site
monitoring
• Build tools for efficient site-issue
detection, correlation & troubleshooting
Terminologies
• Fabric/Colo: Data center with full application stack deployed
• PoP/Edge: Entry point to LinkedIn network (TCP/SSL termination)
• Load Test: Planned stress testing of data centers
Evolution of the Infrastructure
Timeline years: 2003, 2010, 2011, 2013, 2014, 2017
Stages, in order:
• Active & Passive
• Active & Active
• Multi-colo 3-way Active & Active
• Multi-colo n-way Active & Active
2017: 4 data centers, 13 PoPs, 1000+ services
Planning for Disaster
Why care about disasters?
What are disasters?
• Service degradation
• Infrastructure issues
• Human error
• Data center on fire
One Solution for all Disasters
• TrafficShift – Reroute user traffic to
different datacenters without any user
interruption.
LinkedIn Traffic-Tier
Diagram: Border Router → IPVS → ATS → ATS → Frontend, spanning the EDGE and FABRIC tiers, with Stickyrouting alongside
LinkedIn Traffic-Tier
ATS
EDGE FABRIC
DC1
DC2
DC1 in Cookie
Got DC2 as primary fabric
Gets primary
fabric for user
Stickyrouting
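The cookie and primary-fabric labels on this slide suggest a lookup flow that can be sketched roughly as follows. This is one plausible reading of the diagram, not LinkedIn's actual ATS logic: the function name `choose_fabric`, the `lookup_primary` callback and the idea that a cookie value short-circuits the lookup are all assumptions made for illustration.

```python
from typing import Callable, Optional


def choose_fabric(cookie_fabric: Optional[str],
                  lookup_primary: Callable[[], str]) -> str:
    """Pick the fabric an edge proxy forwards a request to (hypothetical sketch)."""
    # Assumed fast path: the request already carries a fabric in its cookie.
    if cookie_fabric is not None:
        return cookie_fabric
    # Otherwise ask a Stickyrouting-style service for the user's primary fabric.
    return lookup_primary()


# A request with "DC1" in its cookie stays on DC1; a request without a
# cookie gets whatever the lookup returns (here a stub answering "DC2").
with_cookie = choose_fabric("DC1", lambda: "DC2")
without_cookie = choose_fabric(None, lambda: "DC2")
```

Keeping the lookup behind a callback means the routing decision can be exercised without a real Stickyrouting backend.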
LinkedIn Traffic-Tier
Fabric
Buckets
Diagram: buckets numbered 1, 2, 3 ... 10 through 91, 92, 93 ... 100
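A bucket scheme like the one pictured can be sketched as below. The slide only shows bucket numbers running from 1 to 100; the hashing step, the SHA-256 choice, the names `bucket_for` and `fabric_for`, and the even DC1/DC2 split are assumptions for illustration, not details given in the talk.

```python
import hashlib

NUM_BUCKETS = 100  # assumption: the diagram shows buckets numbered 1 to 100


def bucket_for(member_id: str) -> int:
    """Map a member ID to a stable bucket in the range 1..NUM_BUCKETS."""
    digest = hashlib.sha256(member_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS + 1


# Hypothetical bucket-to-fabric table: first half on DC1, second half on DC2.
BUCKET_TO_FABRIC = {
    b: ("DC1" if b <= NUM_BUCKETS // 2 else "DC2")
    for b in range(1, NUM_BUCKETS + 1)
}


def fabric_for(member_id: str) -> str:
    """Resolve a member to a fabric through their bucket."""
    return BUCKET_TO_FABRIC[bucket_for(member_id)]
```

Grouping users into a fixed number of buckets means traffic can be moved one bucket at a time by editing the table, without touching per-user state.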
How does Stickyrouting assign users to a fabric?
• Capacity of a datacenter
• Geographic distance to users
• Hadoop
Advantages of Stickyrouting
• Lower latency
• Store data where needed
• Control over capacity
TrafficShift
Site Traffic and Disaster Recovery
Diagram: the EDGE distributes load across DC1, DC2, DC3 and DC4 (distributed-load labels shown: 30%, 50%, 50%, 10%)
• Traffic stops being served to offline fabrics when we mark buckets offline.
• Traffic is shifted to online fabrics as ATS redirects those users to their secondary fabric.
(Fabrics labeled in this diagram: DC1, DC4)
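The offline-bucket behavior described on this slide can be sketched as a small routing function. The `User` record, the `(fabric, bucket)` representation of offline buckets and the `fail_out` helper are hypothetical; the talk only states that marking buckets offline sends affected users to their secondary fabric.

```python
from dataclasses import dataclass
from typing import Set, Tuple

OfflineBuckets = Set[Tuple[str, int]]  # (fabric, bucket) pairs marked offline


@dataclass(frozen=True)
class User:
    bucket: int
    primary: str
    secondary: str


def route(user: User, offline: OfflineBuckets) -> str:
    """Return the fabric that should serve this user right now."""
    # If the user's bucket is offline in their primary fabric, redirect them.
    if (user.primary, user.bucket) in offline:
        return user.secondary
    return user.primary


def fail_out(fabric: str, num_buckets: int = 100) -> OfflineBuckets:
    """Mark every bucket of one fabric offline (a full fail-out)."""
    return {(fabric, b) for b in range(1, num_buckets + 1)}


user = User(bucket=7, primary="DC1", secondary="DC4")
normal = route(user, set())             # served by the primary fabric
shifted = route(user, fail_out("DC1"))  # served by the secondary fabric
```

Marking only some buckets offline gives a partial shift, which is what makes gradual traffic moves possible with the same mechanism.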
When to TrafficShift
• Impact mitigation
• Planned maintenance
• Stress test
TrafficShift Architecture
Diagram components:
• Web application
• Salt master
• Stickyrouting Service
• Couchbase
• Backend Worker Processes
• Fabric buckets
Load Testing
What is Load Testing?
• 3x a week
• Peak hour traffic
• Fixed SLA
Load Testing
Diagram: fabrics DC1, DC2 and DC3; traffic percentage: 60%
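A load test that raises one fabric's traffic share while holding an SLA might be driven by a loop like the one below. This is a sketch under stated assumptions: the step-wise ramp, the rollback on SLA breach and the `set_traffic_pct` and `sla_ok` callbacks are hypothetical and not taken from the talk.

```python
from typing import Callable


def ramp_load_test(start_pct: int,
                   target_pct: int,
                   step_pct: int,
                   set_traffic_pct: Callable[[int], None],
                   sla_ok: Callable[[], bool]) -> int:
    """Raise a fabric's traffic share in steps; return the last share that held the SLA."""
    if step_pct <= 0:
        raise ValueError("step_pct must be positive")
    pct = start_pct
    while pct < target_pct:
        nxt = min(pct + step_pct, target_pct)
        set_traffic_pct(nxt)
        if not sla_ok():
            # SLA breached: return the fabric to its starting share and stop.
            set_traffic_pct(start_pct)
            return pct
        pct = nxt
    return pct


# Example with stub callbacks: ramp from 30% to 60% in 10-point steps.
applied = []
reached = ramp_load_test(30, 60, 10, applied.append, lambda: True)
```

Returning the last share that held the SLA gives a direct input to capacity planning, even when the test stops early.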
Benefits of Load Testing
• Capacity planning
• Stress test
• Identify bugs
• Confidence
Big Red Button
• Kill-switch for a datacenter
• Failout of a datacenter & PoP in minutes
• Minimal user impact
Key Takeaways
• Design infrastructure to facilitate disaster
recovery
• Stress test regularly to avoid surprises
• Automate everything to reduce time to
mitigate impact
Q & A
Velocity San Jose 2017: Traffic shifts: Avoiding disasters at scale
