Resilient Kafka: How DNS Traffic Management and Client Wrappers Ensure Availability

Vanessa Vuibert
Staff Production Engineer
@V3_XD
Scale
862 Kafka brokers
14 Kafka clusters
14M messages per sec
9 GCP regions
Traffic management use cases
• Maintenance
• Incidents
• Regionalize traffic
Kubernetes (K8s) out of the box
🔓open source
Kafka broker
K8s out of the box
dig +short service.namespace.svc.cluster.local
IP0
IP1
IP2
K8s out of the box
bootstrap.servers=service.namespace.svc.cluster.local:9092
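The bootstrap hostname on this slide resolves to several broker IPs, as the earlier dig output shows. A minimal Python sketch of that lookup, using only the standard library; the helper name `resolve_all` is an assumption for illustration and is not from the talk:

```python
import socket


def resolve_all(host: str, port: int) -> list[str]:
    """Return the sorted, de-duplicated IP addresses behind a hostname."""
    # getaddrinfo returns one entry per address record (and per socket type);
    # restricting to SOCK_STREAM avoids duplicates for TCP-only use.
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})
```

Inside the cluster, `resolve_all("service.namespace.svc.cluster.local", 9092)` would be expected to return one address per ready broker pod, which is what a Kafka client sees when it bootstraps.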
K8s out of the box
dig +short pod2.service.namespace.svc.cluster.local
IP2
K8s out of the box
advertised.listeners=pod2.service.namespace.svc.cluster.local:9092
K8s StatefulSet: probes
• Readiness
• Startup
• Liveness
dig +short service.namespace.svc.cluster.local
IP0
IP2
K8s readiness probe
dig +short service.namespace.svc.cluster.local
IP0
IP2
IP3
K8s readiness probe
not ready
publishNotReadyAddresses: true
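The readiness slides appear to contrast DNS answers that omit a not-ready pod with answers that include it once `publishNotReadyAddresses: true` is set. A hedged sketch of a headless Service manifest with that field, built as a Python dict and printed as JSON (which Kubernetes also accepts); the selector label and port name are assumptions, not taken from the talk:

```python
import json


def headless_service(name: str, namespace: str, port: int = 9092) -> dict:
    """Build a headless Service manifest that also publishes not-ready pods."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "clusterIP": "None",  # headless: DNS returns pod IPs directly
            "publishNotReadyAddresses": True,  # keep not-ready pods in DNS
            "selector": {"app": name},  # hypothetical label
            "ports": [{"name": "kafka", "port": port}],  # hypothetical port name
        },
    }


manifest = headless_service("service", "namespace")
print(json.dumps(manifest, indent=2))
```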
Regional pairs
External traffic: load balancers
External traffic: load balancers
bootstrap.servers
External traffic: load balancers
advertised.listeners
• Issues scaling
• Manual broker DNS records
• Limited traffic control
Built automation with K8s controllers.
Stateful buddy: load balancers
🔒closed source
Name buddy: DNS records
🔒closed source
Kafka access buddy: endpoints
🔒closed source
Kafka Access Buddy: consumer
Kafka Access Buddy: producer failover
east
“Let me failover real quick.”
- Elasticsearch on call
Faster failovers with a DNS traffic manager.
DNS traffic manager
🔒closed source
DNS traffic manager: normal
dig +short us-east1.somedomain.com
US-East1-IP
DNS traffic manager: failover
dig +short us-east1.somedomain.com
US-Central1-IP
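The two dig outputs above show the same record name answering with a different region's IP during a failover. A toy in-memory model of that behaviour in Python; the real traffic manager is closed source, so the class, its methods, and the `us-central1.somedomain.com` record name are all assumptions made for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class TrafficManager:
    """Toy model of a DNS traffic manager (not the real implementation)."""

    home_ips: dict[str, str]  # record name -> IP of its own region
    overrides: dict[str, str] = field(default_factory=dict)  # name -> target name

    def fail_over(self, name: str, target: str) -> None:
        """Make `name` answer with the IP of `target` until failed back."""
        if name not in self.home_ips or target not in self.home_ips:
            raise KeyError("unknown record name")
        self.overrides[name] = target

    def fail_back(self, name: str) -> None:
        """Restore the normal answer for `name`."""
        self.overrides.pop(name, None)

    def resolve(self, name: str) -> str:
        """Return the IP currently served for `name`."""
        return self.home_ips[self.overrides.get(name, name)]


tm = TrafficManager({
    "us-east1.somedomain.com": "US-East1-IP",
    "us-central1.somedomain.com": "US-Central1-IP",  # hypothetical record
})
print(tm.resolve("us-east1.somedomain.com"))  # US-East1-IP
tm.fail_over("us-east1.somedomain.com", "us-central1.somedomain.com")
print(tm.resolve("us-east1.somedomain.com"))  # US-Central1-IP
```

Because clients keep using the same hostname, a failover in this model needs no client configuration change, only a fresh DNS lookup.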
“DNS trickery.”
- A Kafka client
Failover time savings
Used to take 40 minutes
Now only takes 1 minute
Incident during flash sale
Failover during flash sale
US Central1 -> US East1
Reduced toil with client wrappers.
Client wrappers
• Failover reconnection
• Everything needed for connection
• Ruby, Go, and Python
K8s Deployment template: bootstrap.servers
K8s Deployment template: client ID
K8s Deployment template
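The client wrapper slides describe a library that supplies everything needed for a connection and reconnects after a failover, with bootstrap.servers and the client ID coming from the K8s Deployment template. A minimal Python sketch of that idea; the environment variable names, the default client ID, and the `connect` callable are hypothetical, and the actual wrappers are not shown in the talk:

```python
import os
import time


def build_client_config(env=None) -> dict:
    """Read connection settings injected by the Deployment template."""
    env = os.environ if env is None else env
    return {
        "bootstrap.servers": env["KAFKA_BOOTSTRAP_SERVERS"],  # hypothetical name
        "client.id": env.get("KAFKA_CLIENT_ID", "unknown-client"),  # hypothetical
    }


def with_reconnect(connect, config, attempts=3, delay_s=1.0, sleep=time.sleep):
    """Call connect(config), retrying on ConnectionError.

    Each retry gives the client a chance to resolve the bootstrap hostname
    again, so a DNS failover can be picked up without redeploying.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return connect(config)
        except ConnectionError as err:
            last_error = err
            if attempt < attempts - 1:
                sleep(delay_s)  # wait before the next attempt
    raise last_error
```

In use, `connect` would be whatever constructs the real producer or consumer for the chosen Kafka library; keeping it as a parameter leaves the sketch independent of any particular client.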
Improved availability with local consumers.
Local consumers
• More availability
• Reduced latency
• Reduced storage costs
• Reduced network costs
Aggregate consumer
Local consumers
Local consumers: DNS records
Latency, 99th percentile
Aggregate: 500 ms
Regional: 20 ms
Connect directly through private IPs.
Public to private traffic
• More secure
• Reduced network costs
• Fetch from closest replica: KIP-392
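KIP-392, mentioned in the list above, lets a consumer fetch from the closest replica when brokers and clients declare their location. A hedged Python sketch of the settings involved; the property names are the standard Kafka ones, while the helper functions and the zone value in the usage are assumptions for illustration:

```python
# Replica selector shipped with Apache Kafka for KIP-392 follower fetching.
RACK_AWARE_SELECTOR = "org.apache.kafka.common.replica.RackAwareReplicaSelector"


def broker_overrides(zone: str) -> dict:
    """Broker-side properties: declare the broker's rack and enable the selector."""
    return {
        "broker.rack": zone,
        "replica.selector.class": RACK_AWARE_SELECTOR,
    }


def consumer_overrides(zone: str) -> dict:
    """Consumer-side property: declare where the client runs."""
    return {"client.rack": zone}


# Hypothetical zone name; a consumer and a broker with matching values
# allow the consumer to read from that broker's replica.
print(broker_overrides("us-east1-b"))
print(consumer_overrides("us-east1-b"))
```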
Traffic manager: pod IPs
Network cost reduction
Bill reduction: -6%
Network share of bill: 29%
Failover: pod IPs
• GKE 1.24 -> 1.25 incident
• Apply firewall rules
• LB more secure for public traffic
Single stop shop with Multi-Cluster Services (MCS).
MCS endpoints
🔒closed source
Traffic sources
Regional pairs: uneven distribution
Regionalize traffic: Kafka access buddy
east
Regionalize traffic: MCS
MCS time savings
Minutes to regionalize traffic: 40, then 1 after migration
Minutes to deploy: 18, then 13 after migration
Resilient Kafka: How DNS Traffic Management and Client Wrappers Ensure Availability
• Resiliency: DNS traffic management
• Toil: client wrappers
• Availability: local consumption
Thanks!
@V3_XD