Perfectly Balanced,
as All Streams Should Be
John Roesler
vvcephei@apache.org
https://s.apache.org/perfectly-balanced-streams
Problem 1: Cluster workload becomes skewed after adding nodes
Problem 2: Long restoration/rebalance pauses after adding nodes
builder
.stream("sentences")
.flatMapValues(whitespaceSplitter)
.groupBy((k, v) -> v)
.count()
.toStream()
.to("word-counts");
sentences -> split -> repartition -> count -> word-counts
“the quick brown fox”
“it was the best of times”
“it was the worst of times”
best brown fox it it of of quick the the the times times was was worst
1 best, 1 brown, 1 fox, 2 it, 2 of, 1 quick, 3 the, 2 times, 2 was, 1 worst
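The example above can be reproduced without a Kafka cluster. The following is a minimal sketch in plain Java that mimics the split, group-by-word, and count steps on an in-memory list; the class and method names are hypothetical and it does not use the Kafka Streams API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCountSketch {
    // Mimics flatMapValues(split) -> groupBy(word) -> count() on an in-memory list.
    // This is an illustration only; it is not how Kafka Streams executes the topology.
    static Map<String, Long> countWords(List<String> sentences) {
        Map<String, Long> counts = new TreeMap<>();
        for (String sentence : sentences) {
            for (String word : sentence.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1L, Long::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sentences = Arrays.asList(
            "the quick brown fox",
            "it was the best of times",
            "it was the worst of times");
        // Prints one "count word" pair per line, sorted by word.
        countWords(sentences).forEach((word, count) -> System.out.println(count + " " + word));
    }
}
```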
Subtopology 0
Task 0_0: sentences-0 -> split -> repartition
Task 0_1: sentences-1 -> split -> repartition
Subtopology 1
Task 1_0: repartition-0 -> count -> word-counts
Task 1_1: repartition-1 -> count -> word-counts
Task 1_2: repartition-2 -> count -> word-counts
Task 1_2 Task 1_0 Task 1_1
Active Stateless
Active Stateful
Standby (Stateful)
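Task names such as Task 1_2 follow the Kafka Streams pattern `<subtopology>_<partition>`. As a rough sketch, the set of task ids can be enumerated from the number of input partitions per subtopology; the class and method names below are hypothetical, and the partition counts (2 and 3) are taken from the diagram.

```java
import java.util.ArrayList;
import java.util.List;

class TaskIdSketch {
    // partitionsPerSubtopology[i] is the number of input partitions of subtopology i.
    static List<String> taskIds(int[] partitionsPerSubtopology) {
        List<String> ids = new ArrayList<>();
        for (int sub = 0; sub < partitionsPerSubtopology.length; sub++) {
            for (int partition = 0; partition < partitionsPerSubtopology[sub]; partition++) {
                ids.add(sub + "_" + partition);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Two "sentences" partitions and three repartition partitions.
        System.out.println(taskIds(new int[] {2, 3})); // prints [0_0, 0_1, 1_0, 1_1, 1_2]
    }
}
```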
Pop Quiz: Is it balanced?
Task 0_0 Task 0_1
Task 1_0
Task 1_1
Task 1_2
Task 1_2
Task 1_0
Task 1_1
Host 1 Host 2
Balance Checklist
● Balance the overall number of tasks
Task 0_0 Task 0_1
Task 1_0
Task 1_1
Task 1_2
Task 1_2
Task 1_0
Task 1_1
Host 1 Host 2
Balance Checklist
● Balance the overall number of tasks
● Balance the active tasks
Task 0_0
Task 0_1
Task 1_0
Task 1_1
Task 1_2
Host 1 Host 2
Task 0_2
Balance Checklist
● Balance the overall number of tasks
● Balance the active tasks
● Balance stateful tasks
Task 0_0 Task 0_1
Task 1_0 Task 1_1
Task 1_2 Task 1_2
Task 1_0 Task 1_1
Host 1 Host 2
Assignment Checklist
● Balance the overall number of tasks
● Balance the active tasks
● Balance stateful tasks
● For each task, assign the standby to a different host than the active
Task 1_0
Task 1_1
Task 1_2
Host 1 Host 2
Task 0_0
Task 0_1
Task 0_1
Assignment Checklist
● Balance the overall number of tasks
● Balance the active tasks
● Balance stateful tasks
● For each task, assign the standby to a different host than the active
● Balance partitions for each task across nodes
Task 0_0 Task 0_1
Task 1_0
Task 1_1
Task 1_2 Task 1_2
Task 1_0
Task 1_1
Host 1 Host 2
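The checklist can be turned into a small checker. This is only a sketch under stated assumptions: "balanced" is taken to mean that per-host counts differ by at most one, the last bullet is interpreted as spreading each subtopology's active tasks across hosts, and the `Host` model and all names are hypothetical. It is not the algorithm Kafka Streams itself uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.ToIntFunction;

class AssignmentChecklist {
    // Hypothetical model of one host: task ids it runs as active and as standby.
    static final class Host {
        final List<String> active;
        final List<String> standby;
        Host(List<String> active, List<String> standby) {
            this.active = active;
            this.standby = standby;
        }
    }

    // Assumption: "balanced" means per-host counts differ by at most one.
    static boolean balanced(Map<String, Host> hosts, ToIntFunction<Host> count) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (Host h : hosts.values()) {
            int n = count.applyAsInt(h);
            min = Math.min(min, n);
            max = Math.max(max, n);
        }
        return hosts.isEmpty() || max - min <= 1;
    }

    static int countMatching(List<String> tasks, Set<String> wanted) {
        int n = 0;
        for (String t : tasks) if (wanted.contains(t)) n++;
        return n;
    }

    static int countInSubtopology(List<String> tasks, String subtopology) {
        int n = 0;
        for (String t : tasks) if (t.startsWith(subtopology + "_")) n++;
        return n;
    }

    // Returns one entry per violated checklist item; an empty list means all checks pass.
    static List<String> violations(Map<String, Host> hosts, Set<String> statefulTasks) {
        List<String> out = new ArrayList<>();
        if (!balanced(hosts, h -> h.active.size() + h.standby.size())) out.add("overall task count");
        if (!balanced(hosts, h -> h.active.size())) out.add("active tasks");
        // Standbys are counted as stateful work alongside stateful actives.
        if (!balanced(hosts, h -> countMatching(h.active, statefulTasks) + h.standby.size())) out.add("stateful tasks");
        Set<String> subtopologies = new TreeSet<>();
        for (Host h : hosts.values()) {
            for (String t : h.standby) {
                if (h.active.contains(t)) out.add("standby of " + t + " shares a host with its active");
            }
            for (String t : h.active) subtopologies.add(t.substring(0, t.indexOf('_')));
        }
        for (String s : subtopologies) {
            if (!balanced(hosts, h -> countInSubtopology(h.active, s))) out.add("subtopology " + s + " spread");
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stateful = new HashSet<>(Arrays.asList("1_0", "1_1", "1_2"));
        Map<String, Host> hosts = new LinkedHashMap<>();
        hosts.put("Host 1", new Host(Arrays.asList("0_0", "1_0", "1_2"), Arrays.asList("1_1")));
        hosts.put("Host 2", new Host(Arrays.asList("0_1", "1_1"), Arrays.asList("1_0", "1_2")));
        System.out.println(violations(hosts, stateful));
    }
}
```

The example assignment in `main` is made up for illustration and is not read off the slide.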
Problem 1: Cluster workload becomes skewed after adding nodes
Problem 2: Long restoration/rebalance pauses after adding nodes
Task 0_0 Task 0_1
Task 1_0
Task 1_1
Task 1_2 Task 1_2
Task 1_0
Task 1_1
Host 1 Host 2
Problem 1: Cluster workload becomes skewed after adding nodes
Host 3
Task 0_0
Task 0_1
Task 1_0
Task 1_1
Task 1_2 Task 1_2
Task 1_0
Task 1_1
Host 1 Host 2
Problem 1: Cluster workload becomes skewed after adding nodes
Host 3
Task 0_0 Task 0_1
Task 1_0
Task 1_1
Task 1_2 Task 1_2
Task 1_0
Task 1_1
Host 1 Host 2
Problem 2: Long restoration/rebalance pauses after adding nodes
Host 3
Task 0_0 Task 0_1
Task 1_0
Task 1_1
Task 1_2
Task 1_2
Task 1_0 Task 1_1
Host 1 Host 2
Problem 2: Long restoration/rebalance pauses after adding nodes
Host 3
Task 0_0 Task 0_1
Task 1_0
Task 1_1
Task 1_2 Task 1_2
Task 1_0
Task 1_1
Host 1 Host 2
Solution: Warm up the new host before moving stateful tasks
Host 3
Task 0_0 Task 0_1
Task 1_0
Task 1_1
Task 1_2 Task 1_2
Task 1_0
Task 1_1
Host 1 Host 2
Solution: Warm up the new host before moving stateful tasks
Host 3
Task 1_2
Task 1_0
(probing rebalance)
Task 0_0 Task 0_1
Task 1_0
Task 1_1
Task 1_2 Task 1_2
Task 1_0
Task 1_1
Host 1 Host 2
Solution: Warm up the new host before moving stateful tasks
Host 3
Task 1_2
Task 1_0
Task 0_0 Task 0_1
Task 1_0
Task 1_1
Task 1_2
Task 1_2
Task 1_1
Host 1 Host 2
Solution: Warm up the new host before moving stateful tasks
Host 3
Task 1_0
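The warm-up idea can be summarized as a per-task decision. The sketch below is a simplified model, not the actual Kafka Streams assignor: it assumes a task may move to a host once that host's lag for the task is within an acceptable recovery lag, and that the number of concurrent warm-up replicas is capped. All names are hypothetical.

```java
class WarmupSketch {
    enum Decision { MOVE_ACTIVE_NOW, WARM_UP_FIRST, WAIT }

    // lagOnTargetHost: how many changelog records the target host is behind for this task.
    // Assumption: a host counts as "caught up" when its lag is at most acceptableRecoveryLag.
    static Decision decide(long lagOnTargetHost, long acceptableRecoveryLag,
                           int warmupsInFlight, int maxWarmupReplicas) {
        if (lagOnTargetHost <= acceptableRecoveryLag) {
            return Decision.MOVE_ACTIVE_NOW;   // state is close enough; moving causes little pause
        }
        if (warmupsInFlight < maxWarmupReplicas) {
            return Decision.WARM_UP_FIRST;     // restore in the background, re-check on a probing rebalance
        }
        return Decision.WAIT;                  // warm-up budget exhausted; try again later
    }

    public static void main(String[] args) {
        System.out.println(decide(5_000_000L, 10_000L, 0, 2)); // prints WARM_UP_FIRST
    }
}
```

In this model the active task stays on its original host while the new host restores, which is what keeps the restoration pause off the processing path.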
Task 0_0 Task 0_1
Task 1_0
Task 1_1
Task 1_2
Task 1_2
Task 1_1
Host 1 Host 2
What if we lose a node?
Host 3
Task 1_0
Task 0_0 Task 0_1
Task 1_0
Task 1_1
Task 1_2
Task 1_2
Task 1_1
Host 1 Host 2
What if we lose a node?
Host 3
Task 1_0
Task 0_0 Task 0_1
Task 1_0
Task 1_1
Task 1_2
Task 1_2
Host 1 Host 2
What if we lose a node?
Host 3
Task 1_0
Task 0_0 Task 0_1
Task 1_0
Task 1_1
Task 1_2
Task 1_2 Task 1_1
Host 1 Host 2
What if we lose a node?
Host 3
Task 1_0
Later on, Host 2 comes back
Task 0_0 Task 0_1
Task 1_0
Task 1_1
Task 1_2
Task 1_2 Task 1_1
Host 1 Host 2
What if we lose a node? (Recovery)
Host 3
Task 1_0
Task 0_0 Task 0_1
Task 1_0
Task 1_1
Task 1_2
Task 1_2 Task 1_1
Host 1 Host 2
What if we lose a node? (Recovery)
Host 3
Task 1_0
Task 1_2
Task 1_1
(probing rebalance)
Task 0_0 Task 0_1
Task 1_0
Task 1_1
Task 1_2
Task 1_2 Task 1_1
Host 1 Host 2
What if we lose a node? (Recovery)
Host 3
Task 1_0
Task 1_2
Task 1_1
Task 0_0 Task 0_1
Task 1_0 Task 1_2
Task 1_1
Task 1_1
Host 1 Host 2
What if we lose a node? (Recovery)
Host 3
Task 1_0
Task 1_2
Configs to care about
num.standby.replicas
acceptable.recovery.lag
probing.rebalance.interval.ms
max.warmup.replicas
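A minimal sketch of setting these four options through a plain `java.util.Properties` object, assuming the Kafka Streams property keys `num.standby.replicas`, `acceptable.recovery.lag`, `probing.rebalance.interval.ms`, and `max.warmup.replicas`. The values are illustrative only, not recommendations, and the class name is hypothetical.

```java
import java.util.Properties;

class RebalanceConfigSketch {
    static Properties rebalanceProps() {
        Properties props = new Properties();
        // Keep one standby copy of each stateful task (illustrative value).
        props.setProperty("num.standby.replicas", "1");
        // A host within this many records of the changelog end counts as caught up.
        props.setProperty("acceptable.recovery.lag", "10000");
        // How often to re-check whether warm-up replicas have caught up.
        props.setProperty("probing.rebalance.interval.ms", "600000");
        // Upper bound on extra replicas used for warming up at one time.
        props.setProperty("max.warmup.replicas", "2");
        return props;
    }

    public static void main(String[] args) {
        rebalanceProps().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

These properties would be merged with the application's other settings before constructing the streams instance.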
Other tips
Register a StateRestoreListener to monitor progress:
KafkaStreams#setGlobalStateRestoreListener
onRestoreStart(
TopicPartition topicPartition,
String storeName,
long startingOffset,
long endingOffset
);
onBatchRestored(
TopicPartition topicPartition,
String storeName,
long batchEndOffset,
long numRestored
);
onRestoreEnd(TopicPartition topicPartition, String storeName, long totalRestored);
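One way to use these callbacks is to track a restore percentage per store. The sketch below is self-contained and deliberately does not depend on the Kafka client library: its methods mirror the callback parameters listed above, but a plain `String` stands in for `TopicPartition`, and the class name and `percentRestored` helper are hypothetical. In a real application a `StateRestoreListener` implementation would forward its callbacks to something like this.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class RestoreProgressSketch {
    private static final class Progress {
        final long startingOffset;
        final long endingOffset;
        volatile long restoredTo;
        Progress(long startingOffset, long endingOffset) {
            this.startingOffset = startingOffset;
            this.endingOffset = endingOffset;
            this.restoredTo = startingOffset;
        }
    }

    private final Map<String, Progress> progress = new ConcurrentHashMap<>();

    private static String key(String topicPartition, String storeName) {
        return storeName + "@" + topicPartition;
    }

    void onRestoreStart(String topicPartition, String storeName, long startingOffset, long endingOffset) {
        progress.put(key(topicPartition, storeName), new Progress(startingOffset, endingOffset));
    }

    void onBatchRestored(String topicPartition, String storeName, long batchEndOffset, long numRestored) {
        Progress p = progress.get(key(topicPartition, storeName));
        if (p != null) p.restoredTo = batchEndOffset;
    }

    void onRestoreEnd(String topicPartition, String storeName, long totalRestored) {
        Progress p = progress.get(key(topicPartition, storeName));
        if (p != null) p.restoredTo = p.endingOffset;
    }

    // Percentage of the offset range restored so far; 0 if the store is unknown.
    double percentRestored(String topicPartition, String storeName) {
        Progress p = progress.get(key(topicPartition, storeName));
        if (p == null) return 0.0;
        long total = p.endingOffset - p.startingOffset;
        if (total <= 0) return 100.0;
        return 100.0 * (p.restoredTo - p.startingOffset) / total;
    }

    public static void main(String[] args) {
        RestoreProgressSketch tracker = new RestoreProgressSketch();
        tracker.onRestoreStart("changelog-0", "counts", 0L, 1000L);
        tracker.onBatchRestored("changelog-0", "counts", 250L, 250L);
        System.out.println(tracker.percentRestored("changelog-0", "counts")); // prints 25.0
    }
}
```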
Best log messages to watch out for
INFO Decided on assignment: {...} with followup probing rebalance
INFO Scheduled a followup probing rebalance for ... ms.
INFO Finished unstable assignment of tasks, a followup probing rebalance will be triggered.
INFO Decided on assignment: {...} with no followup probing rebalance
INFO Finished stable assignment of tasks, no followup rebalances required.
Kafka-Summit.org
ATTEND
COMMUNITY DISCOUNT: 25% OFF
Use the discount code KSA20Meetup at kafka-summit.org/
SPEAK
Submit a proposal to speak in Austin
Deadline 17 May 2020
Apply at kafka-summit.org/
John Roesler
vvcephei@apache.org
confluentcommunity.slack.com
kafka.apache.org/contact
https://s.apache.org/perfectly-balanced-streams
Questions?

Kafka Streams: Perfectly Balanced as all things should be