Balance Kafka Cluster with Zero Data Movement with Haochen Li & Yaodong Yang

Balance Kafka Cluster with Zero Data Movement
Yaodong Yang (Apple), Haochen Li (Apple)
May 2023
NOT A CONTRIBUTION
Kafka Cluster Load Balancing
• Benefits
• High Performance
• Cost Efficiency
• Determining Factors
• Kafka Partition Placement
• Kafka Partition Access Pattern
• Challenges
• Kafka Partitions are Heterogeneous
• Storage Retention Requirement
• Produce & Consume Traffic Pattern
Current Solution
• Continuously rebalance Kafka cluster based on Load Metrics
• collect the load metrics from Kafka
• generate the cluster load model
• compute the optimization proposal
• execute the proposal
• Overhead
• data movement between different brokers
• negative impact on producers and consumers
• long time to finish (hours or even days)
• infrastructure cost
Data Ingestion Use Case
• Workload Pattern
• Data events are randomly assigned to partitions of the Kafka topic
• All partitions of a topic are consumed evenly
• Kafka producers and consumers don’t have a strict requirement for the Kafka Partition Count
• Kafka Partitions from the same topic
• Same data volumes produced, consumed and retained
Kafka Partition Replica Placement
• Partition Replica Placement Strategy
• Partition Count
• scale_number: number of Leader Replicas per broker for a topic
• partition_count = scale_number * broker_count
• Partition Replica Placement
• For every Kafka Topic, the number of Replicas on each broker should be the same.
• For every Kafka Topic, the number of Leader Replicas on each broker should be the same.
• Same load on individual Kafka Brokers.
• Same hardware utilization on individual Kafka brokers
• CPU
• Storage Volume
• Network
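The placement rules on this slide can be sketched in a few lines of Python. This is only an illustration: the ring (round-robin) placement, the function name `ring_assignment`, and the broker IDs are assumptions, not the authors' algorithm, and rack awareness is ignored. It relies on the Kafka convention that the first replica in each list is the preferred leader.

```python
from collections import Counter

def ring_assignment(brokers, scale_number, replication_factor):
    """Return {partition_id: [preferred_leader, follower, ...]} for one topic."""
    n = len(brokers)
    if replication_factor > n:
        raise ValueError("replication_factor cannot exceed the broker count")
    partition_count = scale_number * n  # formula from the slide
    assignment = {}
    for p in range(partition_count):
        # Replica r of partition p lands r steps further around the broker ring.
        assignment[p] = [brokers[(p + r) % n] for r in range(replication_factor)]
    return assignment

# Hypothetical cluster: 4 brokers, 2 leader replicas per broker, 3 replicas per partition.
assignment = ring_assignment(brokers=[1, 2, 3, 4], scale_number=2, replication_factor=3)

# Check both placement rules: equal leader count and equal replica count per broker.
leader_counts = Counter(replica_list[0] for replica_list in assignment.values())
replica_counts = Counter(b for replica_list in assignment.values() for b in replica_list)
print(leader_counts)
print(replica_counts)
```

Any placement that gives every broker the same leader and replica counts for the topic satisfies the rules; the ring is simply the easiest one to verify by hand.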
Scenarios
• New Topic Creation
• Generate the Replica Assignment for the new topic
• Create the topic in the Kafka cluster with the above Replica Assignment
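One way to hand a precomputed assignment to Kafka is the `--replica-assignment` option of `kafka-topics.sh`, which takes partitions separated by commas and broker IDs separated by colons. The talk's Topic Operator may do this differently, so the sketch below is an assumption about tooling. The topic name `events`, the bootstrap address, and the sample assignment are hypothetical.

```python
def to_replica_assignment_arg(assignment):
    """Format {partition_id: [broker, ...]} as 'b:b:b,b:b:b,...' in partition order."""
    ordered = [assignment[p] for p in range(len(assignment))]
    return ",".join(":".join(str(b) for b in replica_list) for replica_list in ordered)

# Hypothetical assignment for a 3-broker cluster with scale_number = 1.
assignment = {0: [1, 2, 3], 1: [2, 3, 1], 2: [3, 1, 2]}
arg = to_replica_assignment_arg(assignment)

# --partitions and --replication-factor are omitted because the assignment implies both.
print("kafka-topics.sh --bootstrap-server localhost:9092 --create "
      f"--topic events --replica-assignment {arg}")
```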
Scenarios
• Increase Partition Count: scale_number increase
• Generate the Replica Assignment for the new partitions
• Create partitions in the Kafka cluster with the above Replica Assignment
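A scale_number increase can be sketched as appending partitions while leaving existing ones untouched, which is why no data moves. The code assumes the existing topic was already placed with a ring layout and that its partition count is a multiple of the broker count. The function name and sample values are hypothetical.

```python
def assignment_for_scale_increase(existing, brokers, new_scale_number, replication_factor):
    """Return assignments for the added partitions only; existing ones are not changed."""
    n = len(brokers)
    old_count = len(existing)
    new_count = new_scale_number * n
    if new_count <= old_count:
        raise ValueError("new_scale_number must increase the partition count")
    added = {}
    for p in range(old_count, new_count):
        # Continue the same ring pattern so each broker gains the same number of replicas.
        added[p] = [brokers[(p + r) % n] for r in range(replication_factor)]
    return added

# Hypothetical topic: 3 brokers, scale_number 1 -> 2.
existing = {0: [1, 2, 3], 1: [2, 3, 1], 2: [3, 1, 2]}
added = assignment_for_scale_increase(existing, brokers=[1, 2, 3],
                                      new_scale_number=2, replication_factor=3)
print(added)
```

In the Java Admin client, for instance, `NewPartitions.increaseTo(totalCount, newAssignments)` accepts assignments for just the new partitions. Adding partitions changes which partition a given key maps to, so this fits the ingestion workload described earlier, where events are assigned to partitions randomly.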
Scenarios
• Add more brokers
• Generate the Replica Assignment for partitions in new brokers
• Create partitions in the Kafka cluster with the above Replica Assignment
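The slide does not say how replicas of the new partitions are laid out. One reading is that they live entirely on the new brokers, so the old brokers are not touched. Under that assumption, the number of added brokers must be at least the replication factor, and each new broker ends up with the same leader and replica counts as the old ones. The names and values below are hypothetical.

```python
from collections import Counter

def assignment_for_new_brokers(existing_partition_count, new_brokers,
                               scale_number, replication_factor):
    """Place scale_number * len(new_brokers) new partitions on the new brokers only."""
    m = len(new_brokers)
    if m < replication_factor:
        raise ValueError("need at least replication_factor new brokers for this layout")
    added = {}
    for i in range(scale_number * m):
        partition_id = existing_partition_count + i  # IDs continue after existing ones
        added[partition_id] = [new_brokers[(i + r) % m] for r in range(replication_factor)]
    return added

# Hypothetical topic: brokers 1-3 hold 6 partitions (scale_number = 2); brokers 4-6 join.
added = assignment_for_new_brokers(existing_partition_count=6, new_brokers=[4, 5, 6],
                                   scale_number=2, replication_factor=3)
leader_counts = Counter(replica_list[0] for replica_list in added.values())
replica_counts = Counter(b for replica_list in added.values() for b in replica_list)
print(added)
```

After the change the topic has 12 partitions on 6 brokers, so partition_count = scale_number * broker_count still holds.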
Scenarios
• Ingestion traffic volume and retention changes
• no impact on the load balance of Kafka Cluster
• Remove some brokers
• data movement is unavoidable
• avoid it with cluster migration if possible
• Cluster Migration & Merge
• rebalance the cluster:
• partition reassignment
• scale_number increase
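The first point on this slide, that traffic and retention changes do not affect balance, follows from the placement rules. If every broker holds the same number of replicas of each topic, and partitions of one topic carry the same load, then every broker's total is the same sum whatever the per-topic load is. The toy model below illustrates this with hypothetical topic names and load units.

```python
def broker_load(replicas_per_broker, per_partition_load):
    """Sum per-broker load; both arguments are keyed by topic name."""
    totals = {}
    for topic, counts in replicas_per_broker.items():
        for broker, count in counts.items():
            totals[broker] = totals.get(broker, 0.0) + count * per_partition_load[topic]
    return totals

# Each broker holds the same replica count per topic, as the placement rules require.
placement = {
    "clicks": {1: 6, 2: 6, 3: 6},
    "logs": {1: 3, 2: 3, 3: 3},
}

before = broker_load(placement, {"clicks": 10.0, "logs": 2.0})
after = broker_load(placement, {"clicks": 40.0, "logs": 0.5})  # traffic shifts per topic

print(before)  # every broker carries the same total
print(after)   # totals change, but are still equal across brokers
```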
Implementation
• Current
• Implemented as a Topic Operator
• Deployed in production
• Plan
• Open a KIP in Apache Kafka Project
• Contribute back to upstream
Take Away
• Partition Placement Strategy can greatly improve the Load Balance of Kafka Clusters
Thank you!