Implementing Exactly-once Delivery and Escaping Kafka Rebalance Storms with Yulia Antonovsky

A Bridge over Troubled Water - Implementing Exactly-Once Semantics and Escaping Kafka Rebalance Storms
Antonovsky Yulia
© 2023 Akamai
About Me
Senior Software Engineer II at Akamai Technologies since 2020
Big Data Engineering experience since 2016
Started career as student intern at SAP Labs Israel in 2007
yulia-antonovsky
Agenda
➢ Introduction
➢ CSI Ingest architecture
➢ Managing Kafka Transactions
➢ Avoid Kafka endless rebalancing
➢ Q&A
About Akamai Technologies
Akamai Technologies is the largest content delivery network (CDN) services
provider in the world that also offers cloud and security services.
In numbers:
● 350K servers across the world
● 8B requests per day
● ~ 30% of the global internet traffic
We power and protect life online
About CSI Group (Cloud Security Intelligence)
Our team is responsible for the ongoing development and maintenance of a platform
designed to collect, analyze, and distill high-quality security intelligence information. We
handle about 10 GB/s of traffic and process approximately 150 billion raw data
events per day.
CSI Cluster
CSI Ingest Architecture
Drill Down
Standard iteration flow:
1. Consume Kafka messages
2. Read files from Blob
3. Process the data
4. Write results to Blob
5. Produce Kafka messages
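The five steps can be sketched as a single function. This is only an illustration, not the team's implementation: it assumes each consumed message carries the path of a blob to read, and every name in it (`run_iteration`, `blob_store`, `write_result`) is hypothetical. The Kafka and Blob clients are replaced by in-memory stand-ins so the flow can be run as-is.

```python
from typing import Callable, Dict, List


def run_iteration(
    consume: Callable[[], List[str]],
    read_blob: Callable[[str], bytes],
    process: Callable[[bytes], bytes],
    write_blob: Callable[[bytes], str],
    produce: Callable[[str], None],
) -> int:
    """Run one consume -> read -> process -> write -> produce pass."""
    handled = 0
    for input_path in consume():          # 1. consume Kafka messages (assumed: blob paths)
        raw = read_blob(input_path)       # 2. read the referenced file from Blob
        result = process(raw)             # 3. process the data
        output_path = write_blob(result)  # 4. write the results to Blob
        produce(output_path)              # 5. produce a Kafka message for the result
        handled += 1
    return handled


# In-memory stand-ins (hypothetical) that exercise the flow end to end.
blob_store: Dict[str, bytes] = {"in/1": b"abc", "in/2": b"de"}
produced: List[str] = []


def write_result(data: bytes) -> str:
    path = f"out/{len(blob_store)}"
    blob_store[path] = data
    return path


handled = run_iteration(
    consume=lambda: ["in/1", "in/2"],
    read_blob=blob_store.__getitem__,
    process=bytes.upper,
    write_blob=write_result,
    produce=produced.append,
)
```

Steps 4 and 5 are where a crash or a pod being scaled in can leave results written but not announced, or announced twice.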
Guardians of the Data
Just like the Guardians of the
Galaxy protect the universe, we
are dedicated to protecting the
accuracy of our customers' data
How can we prevent data loss or duplication when application pods are
continuously scaled in and out to handle data traffic?
Managing Kafka Transactions
● We actively manage partition offsets to ensure that we consume data from Kafka exactly
once.
● We rely on the idempotent writes supported by Kafka's Transactional API to prevent duplicate
data, even in the event of failures or retries.
● We leverage Kafka's Transactional API to write data to multiple Kafka topics, ensuring that all
writes either succeed or fail together.
KafkaTransactionManager
● To simplify the use of Kafka transactions across all our applications, we developed a
component called KafkaTransactionManager.
● It supports seamless processing of transactional data across one or more source and target
topics.
● The component handles the entire process from message consumption to committing or
aborting Kafka transactions.
KafkaTransactionManager API
● kafkaTransactionManager.beginTransaction(): starts a new transaction and resets offsets.
● kafkaTransactionManager.consumeRecords(pollTimeout): executes one poll on the subscribed Kafka topics, returns the consumed messages, and updates offsets if needed. It can be called multiple times within the same transaction to retrieve additional messages.
● kafkaTransactionManager.produceRecord(topics, key, value): produces a record to one or more target topics. It can be called multiple times within the same transaction to send additional messages.
● kafkaTransactionManager.commitTransaction(): finalizes the current transaction, sends the updated consumed offsets to the consumer group, and commits both consumed and produced messages on all topics. If a failure occurs, abortTransaction() must be called to ensure that the transaction is rolled back.
● kafkaTransactionManager.abortTransaction(): closes the current transaction and resets consumed offsets by calling seek() for all assigned TopicPartitions on the Kafka consumer client. If the abort itself fails, the Kafka producer client is closed and a new one is created.
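The call sequence described above can be sketched as follows. KafkaTransactionManager is an internal component whose code is not shown in the deck, so `InMemoryTransactionManager` is a hypothetical stand-in that only mimics the five calls in memory. The batch size of two records and the `run_transaction` helper are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple


class InMemoryTransactionManager:
    """Hypothetical stand-in that mimics the five calls listed above."""

    def __init__(self, source: List[str]) -> None:
        self.source = source            # records waiting to be consumed
        self.committed_offset = 0       # offset known to the consumer group
        self.position = 0               # read position inside the open transaction
        self.pending: List[Tuple[str, str, str]] = []  # uncommitted (topic, key, value)
        self.committed: Dict[str, List[Tuple[str, str]]] = {}

    def beginTransaction(self) -> None:
        self.position = self.committed_offset
        self.pending = []

    def consumeRecords(self, poll_timeout: float) -> List[str]:
        # poll_timeout is ignored here; a real poll would block up to that long.
        batch = self.source[self.position:self.position + 2]
        self.position += len(batch)
        return batch

    def produceRecord(self, topics: List[str], key: str, value: str) -> None:
        for topic in topics:
            self.pending.append((topic, key, value))

    def commitTransaction(self) -> None:
        # Produced records and consumed offsets become visible together.
        for topic, key, value in self.pending:
            self.committed.setdefault(topic, []).append((key, value))
        self.committed_offset = self.position
        self.pending = []

    def abortTransaction(self) -> None:
        # Comparable to seek(): rewind to the last committed offset.
        self.position = self.committed_offset
        self.pending = []


def run_transaction(manager, process: Callable[[str], str], targets: List[str]) -> bool:
    """One iteration: commit on success, abort on any failure."""
    manager.beginTransaction()
    try:
        for record in manager.consumeRecords(1.0):
            manager.produceRecord(targets, record, process(record))
        manager.commitTransaction()
        return True
    except Exception:
        manager.abortTransaction()
        return False
```

After an aborted iteration, the next call to `run_transaction` reads the same records again, and nothing from the failed attempt is visible on the target topics.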
Kafka Clients’ “Transactional” Configurations
● Kafka consumer client configurations:
○ enable.auto.commit = false
○ isolation.level = read_committed
● Kafka producer client configurations:
○ transactional.id = randomUUID()
○ transaction.timeout.ms - depends on application
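As a minimal sketch, the settings above can be collected into two property maps. The helper name, the `group.id` value, and the 60-second timeout are illustrative assumptions. The property names are the ones listed on the slide, plus `group.id`, which any consumer group needs.

```python
import uuid
from typing import Dict, Tuple


def transactional_client_configs(
    group_id: str, transaction_timeout_ms: int
) -> Tuple[Dict[str, object], Dict[str, object]]:
    """Build consumer and producer property maps for transactional use."""
    consumer = {
        "group.id": group_id,
        "enable.auto.commit": False,          # offsets are committed by the transaction
        "isolation.level": "read_committed",  # skip records from aborted transactions
    }
    producer = {
        "transactional.id": str(uuid.uuid4()),             # random id, as on the slide
        "transaction.timeout.ms": transaction_timeout_ms,  # depends on the application
    }
    return consumer, producer


# Illustrative values only.
consumer_config, producer_config = transactional_client_configs("csi-ingest", 60_000)
```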
Avoid Kafka endless rebalancing
Within a consumer group, Kafka changes the ownership of a partition from one consumer to
another on certain events. The process of changing partition ownership across the
consumers is called a rebalance.
What Triggers a Rebalance?
● The topic partition or partition replica count changes
● Consumer group properties are changed
● Consumer joins or leaves a group
Why can it rebalance forever?
● Networking issues
● System complexity
● Inappropriate configurations
● Scale up/down, k8s moves pods
● Application/pod restarts
● Not all pods start synchronously
Kafka “Rebalance” Configurations
All of the related configurations are Kafka consumer configurations.
session.timeout.ms: Specifies the maximum time that the consumer coordinator will wait for
a heartbeat signal from a consumer before removing it from the group.
heartbeat.interval.ms: This configuration specifies the expected time between heartbeats sent to
the consumer coordinator.
max.poll.interval.ms: This setting determines the maximum delay between invocations of poll()
when using consumer group management.
group.instance.id: A unique identifier provided by the end user for the consumer instance. When it is set, the consumer is treated as a static member of the group.
partition.assignment.strategy: A list of class names or class types, ordered by preference, of
supported partition assignment strategies that the client will use to distribute partition ownership.
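A small sanity check can make the relationship between these settings concrete. The sketch below is an assumption-laden illustration: the helper name, the example values, and the `group.instance.id` value are made up. The one-third ratio follows the guidance in the Kafka consumer configuration documentation that heartbeat.interval.ms should typically be no higher than a third of session.timeout.ms.

```python
from typing import Dict, List


def check_rebalance_timeouts(config: Dict[str, object], slowest_iteration_ms: int) -> List[str]:
    """Return a list of likely misconfigurations; an empty list means none found."""
    problems: List[str] = []
    session = int(config["session.timeout.ms"])
    heartbeat = int(config["heartbeat.interval.ms"])
    max_poll = int(config["max.poll.interval.ms"])
    if heartbeat * 3 > session:
        problems.append("heartbeat.interval.ms should be at most 1/3 of session.timeout.ms")
    if slowest_iteration_ms >= max_poll:
        problems.append("max.poll.interval.ms must exceed the slowest gap between poll() calls")
    return problems


# Illustrative values only; tune them per application.
consumer_config = {
    "session.timeout.ms": 45_000,
    "heartbeat.interval.ms": 3_000,
    "max.poll.interval.ms": 300_000,
    "group.instance.id": "ingest-pod-0",  # hypothetical stable id per pod
    "partition.assignment.strategy":
        "org.apache.kafka.clients.consumer.CooperativeStickyAssignor",
}

ok = check_rebalance_timeouts(consumer_config, slowest_iteration_ms=120_000)
too_slow = check_rebalance_timeouts(consumer_config, slowest_iteration_ms=400_000)
```

An iteration that takes longer than max.poll.interval.ms causes the consumer to leave the group, which triggers a rebalance.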
Partition Assignment Strategy
CooperativeStickyAssignor - Follows the same StickyAssignor logic, but allows for cooperative
rebalancing. Available since version 2.4.
RangeAssignor - Assigns partitions on a per-topic basis, where each consumer is assigned a
contiguous range of partitions.
RoundRobinAssignor - Assigns partitions to consumers in a round-robin fashion.
StickyAssignor - Guarantees an assignment that is maximally balanced while preserving as
many existing partition assignments as possible.
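To make the difference between the range and round-robin descriptions concrete, here is a simplified simulation. It is a sketch, not the Kafka implementation: it assumes every consumer subscribes to every topic, and the function names are made up for illustration.

```python
from typing import Dict, List, Tuple

TopicPartition = Tuple[str, int]


def range_assign(consumers: List[str], topics: Dict[str, int]) -> Dict[str, List[TopicPartition]]:
    """Per topic, give each consumer a contiguous range of partitions."""
    members = sorted(consumers)
    assignment: Dict[str, List[TopicPartition]] = {m: [] for m in members}
    for topic in sorted(topics):
        per_member, extra = divmod(topics[topic], len(members))
        start = 0
        for i, member in enumerate(members):
            count = per_member + (1 if i < extra else 0)  # first members take the remainder
            assignment[member] += [(topic, p) for p in range(start, start + count)]
            start += count
    return assignment


def round_robin_assign(consumers: List[str], topics: Dict[str, int]) -> Dict[str, List[TopicPartition]]:
    """Lay out all partitions of all topics and deal them out one by one."""
    members = sorted(consumers)
    assignment: Dict[str, List[TopicPartition]] = {m: [] for m in members}
    all_partitions = [(t, p) for t in sorted(topics) for p in range(topics[t])]
    for i, tp in enumerate(all_partitions):
        assignment[members[i % len(members)]].append(tp)
    return assignment


topics = {"a": 3, "b": 3}  # topic name -> partition count
by_range = range_assign(["c1", "c2"], topics)
by_round_robin = round_robin_assign(["c1", "c2"], topics)
```

With two consumers and two three-partition topics, the range strategy gives the first consumer four partitions and the second two, while round-robin gives three each. Neither tries to preserve the previous assignment, which is what the sticky variants add.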
Kafka Rebalance Listener
ConsumerPartitionAssignor is a high-level interface that allows you to implement your own
custom partition assignment strategy.
ConsumerRebalanceListener is a low-level interface that allows you to receive notifications before
and after the partition assignment.
● A rebalance listener can't prevent rebalancing, but it can minimize its impact
● Its callbacks can only be triggered during polling
● In transactional iterations, it can save processing costs
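One possible reading of "save processing costs" is that a listener can flag the current iteration for an early abort as soon as one of its partitions is revoked, instead of finishing work that can no longer be committed. The sketch below illustrates that idea only. `IterationGuard` is hypothetical, and its two callbacks are Python analogues of the `onPartitionsRevoked` and `onPartitionsAssigned` methods of the Java interface.

```python
from typing import Iterable, Set, Tuple

TopicPartition = Tuple[str, int]


class IterationGuard:
    """Hypothetical listener that tracks ownership and flags doomed iterations."""

    def __init__(self) -> None:
        self.owned: Set[TopicPartition] = set()
        self.in_flight: Set[TopicPartition] = set()
        self.must_abort = False

    def start_iteration(self, partitions: Iterable[TopicPartition]) -> None:
        # Remember which partitions the open transaction has read from.
        self.in_flight = set(partitions)
        self.must_abort = False

    def on_partitions_revoked(self, partitions: Iterable[TopicPartition]) -> None:
        revoked = set(partitions)
        self.owned -= revoked
        if revoked & self.in_flight:
            # The open transaction touched a partition this consumer no longer owns.
            self.must_abort = True

    def on_partitions_assigned(self, partitions: Iterable[TopicPartition]) -> None:
        self.owned |= set(partitions)


guard = IterationGuard()
guard.on_partitions_assigned([("events", 0), ("events", 1)])
guard.start_iteration([("events", 0)])
guard.on_partitions_revoked([("events", 1)])   # unrelated partition: keep going
unaffected = guard.must_abort
guard.on_partitions_revoked([("events", 0)])   # in-flight partition: abort early
```

The application would check `must_abort` between processing steps and call its abort path as soon as the flag is set.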
Summary
★ Manage consumed offsets manually when using Kafka's transactional API.
★ Disable auto commit, use read committed mode on consumer client config,
and add transactional.id to producer config.
★ Use ConsumerRebalanceListener to minimize the impact of Kafka rebalance.
★ Configure appropriate timeouts on consumer client and define
group.instance.id, when possible, to skip Kafka rebalances.
★ Choose a partition assignment strategy carefully, and experiment with
different strategies to determine the best fit.
Q&A
Thank you :)
Feel free to reach out to me: yulia-antonovsky
Zero to Cloud Hero: Crafting a Private Cloud from Scratch with XCP-ng, Xen Or...
ShapeBlue198 vistas
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue por ShapeBlue
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlueElevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue
Elevating Privacy and Security in CloudStack - Boris Stoyanov - ShapeBlue
ShapeBlue222 vistas
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas... por Bernd Ruecker
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
iSAQB Software Architecture Gathering 2023: How Process Orchestration Increas...
Bernd Ruecker54 vistas
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue por ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlueWhat’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
What’s New in CloudStack 4.19 - Abhishek Kumar - ShapeBlue
ShapeBlue263 vistas
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ... por ShapeBlue
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...
Backup and Disaster Recovery with CloudStack and StorPool - Workshop - Venko ...
ShapeBlue184 vistas
CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&T por ShapeBlue
CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&TCloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&T
CloudStack and GitOps at Enterprise Scale - Alex Dometrius, Rene Glover - AT&T
ShapeBlue152 vistas
Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ... por ShapeBlue
Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ...Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ...
Live Demo Showcase: Unveiling Dell PowerFlex’s IaaS Capabilities with Apache ...
ShapeBlue126 vistas
Business Analyst Series 2023 - Week 4 Session 8 por DianaGray10
Business Analyst Series 2023 -  Week 4 Session 8Business Analyst Series 2023 -  Week 4 Session 8
Business Analyst Series 2023 - Week 4 Session 8
DianaGray10123 vistas

Implementing Exactly-once Delivery and Escaping Kafka Rebalance Storms with Yulia Antonovsky

  • 1. A Bridge over Troubled Water - Implementing Exactly-Once Semantics and Escaping Kafka Rebalance Storms Antonovsky Yulia
  • 4. About Me: Senior Software Engineer II at Akamai Technologies since 2020. Big Data Engineering experience since 2016. Started my career as a student intern at SAP Labs Israel in 2007. yulia-antonovsky
  • 5. Agenda ➢ Introduction ➢ CSI Ingest architecture ➢ Managing Kafka Transactions ➢ Avoid Kafka endless rebalancing ➢ Q&A
  • 6. About Akamai Technologies: Akamai Technologies is the largest content delivery network (CDN) services provider in the world and also offers cloud and security services. In numbers: ● 350K servers across the world ● 8B requests per day ● ~30% of global internet traffic. We power and protect life online.
  • 7. About CSI Group (Cloud Security Intelligence): Our team is responsible for the ongoing development and maintenance of a platform designed to collect, analyze, and distill high-quality security intelligence information. We handle about 10 GB/s of traffic, processing approximately 150 billion raw data events per day. (Figure: CSI Cluster)
  • 8. CSI Ingest Architecture
  • 9. Drill Down. Standard iteration flow: 1. Consume Kafka messages 2. Read files from Blob 3. Process the data 4. Write results to Blob 5. Produce Kafka messages
  • 10. Guardians of the Data: Just like the Guardians of the Galaxy protect the universe, we are dedicated to protecting the accuracy of our customers' data. How can we prevent data loss or duplication when application pods are continuously scaled in and out to handle data traffic?
  • 11. Managing Kafka Transactions ● We actively manage partition offsets to ensure that we consume data from Kafka exactly once. ● We rely on the Kafka Transactional API's support for idempotent writes to prevent duplicate data even in the event of failures or retries. ● We leverage Kafka's Transactional API to write data to multiple Kafka topics, ensuring that all writes either succeed or fail together.
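As a rough illustration of what "actively managing partition offsets" can mean, here is a minimal sketch. The `OffsetTracker` class and its method names are hypothetical and not taken from the talk. It relies only on Kafka's convention that the committed offset is the offset of the next record to read, that is, the last consumed offset plus one. Starting from 0 when nothing has been committed is a simplification made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not from the talk): tracks offsets consumed inside one transaction. */
final class OffsetTracker {
    // Positions stored by the last successful transaction, keyed by "topic-partition".
    private final Map<String, Long> committed = new HashMap<>();
    // Positions reached inside the current, still open transaction.
    private final Map<String, Long> pending = new HashMap<>();

    /** Remember a consumed record; the position to commit is the offset of the NEXT record. */
    void recordConsumed(String topic, int partition, long offset) {
        pending.merge(topic + "-" + partition, offset + 1, Long::max);
    }

    /** On commit, the pending positions become the committed positions. */
    void commit() {
        committed.putAll(pending);
        pending.clear();
    }

    /** On abort, drop pending progress so the same records are read again. */
    void abort() {
        pending.clear();
    }

    /** Position to seek to after an abort; 0 here if nothing was committed (a simplification). */
    long nextOffset(String topic, int partition) {
        return committed.getOrDefault(topic + "-" + partition, 0L);
    }
}
```

Keeping pending and committed positions apart is what lets an aborted iteration re-read exactly the records it had consumed, rather than skipping or duplicating them.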
  • 12. KafkaTransactionManager ● To simplify the use of Kafka transactions across all our applications, we developed a component called KafkaTransactionManager. ● It supports seamless processing of transactional data across one or more source and target topics. ● The component handles the entire process from message consumption to committing or aborting Kafka transactions.
  • 14. KafkaTransactionManager API
      ○ kafkaTransactionManager.beginTransaction(): starts a new transaction and resets offsets.
      ○ kafkaTransactionManager.consumeRecords(pollTimeout): executes one poll from the subscribed Kafka topics, returns the consumed messages, and updates offsets if needed. It can be called multiple times during the same transaction to retrieve additional messages.
      ○ kafkaTransactionManager.produceRecord(topics, key, value): produces a record on one or more target topics. It can be called multiple times within the same transaction to send additional messages.
      ○ kafkaTransactionManager.commitTransaction(): finalizes the current transaction, sends the updated consumed offsets to the consumer group, and commits both consumed and produced messages on all topics. If a failure occurs, abortTransaction() must be called to ensure that the transaction is rolled back.
      ○ kafkaTransactionManager.abortTransaction(): closes the current transaction and resets consumed offsets by executing the seek API for all assigned TopicPartitions on the Kafka consumer client. If aborting the transaction fails, the Kafka producer client is closed and a new one is created.
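The call sequence described on this slide could be used roughly as follows. This is only a sketch under stated assumptions: the slide gives the method names, but the parameter and return types in the `TransactionManager` interface are guesses, and `InMemoryTransactionManager` is a stand-in invented here so the loop runs without a broker. It is not the real component.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;

/** Assumed shape of the API on the slide; parameter and return types are guesses. */
interface TransactionManager {
    void beginTransaction();
    List<String> consumeRecords(Duration pollTimeout);
    void produceRecord(List<String> topics, String key, String value);
    void commitTransaction();
    void abortTransaction();
}

/** In-memory stand-in so the loop below runs without a broker; not the real component. */
final class InMemoryTransactionManager implements TransactionManager {
    private final List<String> source;                       // records waiting to be consumed
    final List<String> committedOutput = new ArrayList<>();  // visible only after a commit
    private final List<String> pendingOutput = new ArrayList<>();
    private int committedPosition = 0;  // position stored by the last successful commit
    private int position = 0;           // read position inside the open transaction

    InMemoryTransactionManager(List<String> source) { this.source = source; }

    public void beginTransaction() { position = committedPosition; pendingOutput.clear(); }

    public List<String> consumeRecords(Duration pollTimeout) {
        List<String> batch = new ArrayList<>(source.subList(position, source.size()));
        position = source.size();
        return batch;
    }

    public void produceRecord(List<String> topics, String key, String value) {
        for (String topic : topics) pendingOutput.add(topic + ":" + key + "=" + value);
    }

    public void commitTransaction() {
        committedOutput.addAll(pendingOutput);  // outputs and position become visible together
        committedPosition = position;
        pendingOutput.clear();
    }

    public void abortTransaction() { position = committedPosition; pendingOutput.clear(); }
}

final class IngestIteration {
    /** One consume-process-produce iteration; returns true if the transaction committed. */
    static boolean runOnce(TransactionManager tm, List<String> targetTopics) {
        tm.beginTransaction();
        try {
            for (String record : tm.consumeRecords(Duration.ofSeconds(1))) {
                if (record.isEmpty()) throw new IllegalStateException("cannot process empty record");
                tm.produceRecord(targetTopics, record, record.toUpperCase());
            }
            tm.commitTransaction();
            return true;
        } catch (RuntimeException e) {
            tm.abortTransaction();  // roll back: nothing is published and the position is reset
            return false;
        }
    }
}
```

The point of the try/catch is the rule stated on the slide: any failure between beginTransaction() and commitTransaction() ends in abortTransaction(), so a failed iteration publishes nothing and re-reads the same input on the next attempt.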
  • 15. Kafka Clients’ “Transactional” Configurations ● Kafka consumer client configurations: ○ enable.auto.commit = false ○ isolation.level = read_committed ● Kafka producer client configurations: ○ transactional.id = randomUUID() ○ transaction.timeout.ms: depends on the application
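Expressed as client properties, the settings on this slide might look like the sketch below. The configuration keys are standard Kafka client settings. The class name, the `bootstrapServers` and `groupId` parameters, and the timeout value passed in are placeholders chosen for illustration. Serializer and deserializer settings, which real clients also need, are omitted.

```java
import java.util.Properties;
import java.util.UUID;

/** Illustrative only: builds the "transactional" client settings listed on the slide. */
final class TransactionalClientConfigs {
    /** Consumer settings; bootstrapServers and groupId are placeholders. */
    static Properties consumerProps(String bootstrapServers, String groupId) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", bootstrapServers);
        p.setProperty("group.id", groupId);
        p.setProperty("enable.auto.commit", "false");        // offsets are committed by the transaction
        p.setProperty("isolation.level", "read_committed");  // do not read records of aborted transactions
        return p;
    }

    /** Producer settings; the timeout is application-specific, as the slide notes. */
    static Properties producerProps(String bootstrapServers, int transactionTimeoutMs) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", bootstrapServers);
        p.setProperty("transactional.id", UUID.randomUUID().toString());
        p.setProperty("transaction.timeout.ms", Integer.toString(transactionTimeoutMs));
        return p;
    }
}
```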
  • 16. Avoid Kafka endless rebalancing: Within a consumer group, Kafka changes the ownership of a partition from one consumer to another at certain events. The process of changing partition ownership across consumers is called a rebalance.
  • 17. What Triggers a Rebalancing? ● The topic partition or partition replica count changes ● Consumer group properties are changed ● A consumer joins or leaves a group. Why can it rebalance forever? ● Networking issues ● System complexity ● Inappropriate configurations ● Scale up/down, k8s moves pods ● Application/pod restarts ● Not all pods start synchronously
  • 18. Kafka “Rebalance” Configurations. All of the related configurations are Kafka consumer configurations.
      ○ session.timeout.ms: the maximum time that the consumer coordinator will wait for a heartbeat signal from a consumer before removing it from the group.
      ○ heartbeat.interval.ms: the expected time between heartbeats sent to the consumer coordinator.
      ○ max.poll.interval.ms: the maximum delay between invocations of poll() when using consumer group management.
      ○ group.instance.id: a unique identifier provided by the end user for the consumer instance. When it is set, the consumer is treated as a static member of the group.
      ○ partition.assignment.strategy: a list of class names or class types, ordered by preference, of supported partition assignment strategies that the client will use to distribute partition ownership.
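A hedged sketch of how these consumer settings could be combined follows. The keys and the assignor class name are standard Kafka ones. The class name, the `podName` parameter (a hypothetical stable identifier, for example a pod name that survives restarts), and the specific values are assumptions for illustration and are not the values used in the talk.

```java
import java.util.Properties;

/** Illustrative only: consumer settings that influence how often a group rebalances. */
final class RebalanceTuning {
    static Properties consumerProps(String podName, int sessionTimeoutMs, int maxPollIntervalMs) {
        Properties p = new Properties();
        // How long the coordinator waits for heartbeats before removing the consumer.
        p.setProperty("session.timeout.ms", Integer.toString(sessionTimeoutMs));
        // Kafka's documentation suggests a heartbeat interval of at most a third of the session timeout.
        p.setProperty("heartbeat.interval.ms", Integer.toString(sessionTimeoutMs / 3));
        // Should be longer than the slowest expected gap between two poll() calls.
        p.setProperty("max.poll.interval.ms", Integer.toString(maxPollIntervalMs));
        // Static membership: a restart under the same id within the session timeout avoids a rebalance.
        p.setProperty("group.instance.id", podName);
        p.setProperty("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        return p;
    }
}
```

For example, `RebalanceTuning.consumerProps("ingest-0", 45000, 300000)` yields a heartbeat interval of 15000 ms. Whether those numbers suit a given application depends on how long one processing iteration takes.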
  • 19. Partition Assignment Strategy
      ○ RangeAssignor: assigns partitions on a per-topic basis, where each consumer is assigned a contiguous range of partitions.
      ○ RoundRobinAssignor: assigns partitions to consumers in a round-robin fashion.
      ○ StickyAssignor: guarantees an assignment that is maximally balanced while preserving as many existing partition assignments as possible.
      ○ CooperativeStickyAssignor: follows the same StickyAssignor logic, but allows for cooperative rebalancing. Available since Kafka 2.4.
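To make the difference between the range and round-robin descriptions concrete, here is a simplified model of the two ideas. It is not Kafka's implementation: it assumes every consumer subscribes to every topic, every topic has the same number of partitions, and the consumer and topic lists are already sorted. The class and method names are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of two assignment ideas; not Kafka's actual assignor code. */
final class AssignmentDemo {
    private static Map<String, List<String>> emptyAssignment(List<String> consumers) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String c : consumers) result.put(c, new ArrayList<>());
        return result;
    }

    /** Range: per topic, each consumer gets a contiguous block; the first consumers take the remainder. */
    static Map<String, List<String>> range(List<String> consumers, List<String> topics, int partitions) {
        Map<String, List<String>> result = emptyAssignment(consumers);
        int base = partitions / consumers.size();
        int extra = partitions % consumers.size();
        for (String topic : topics) {
            int next = 0;
            for (int i = 0; i < consumers.size(); i++) {
                int count = base + (i < extra ? 1 : 0);
                for (int k = 0; k < count; k++) {
                    result.get(consumers.get(i)).add(topic + "-" + next++);
                }
            }
        }
        return result;
    }

    /** Round robin: the partitions of all topics are dealt out one at a time. */
    static Map<String, List<String>> roundRobin(List<String> consumers, List<String> topics, int partitions) {
        Map<String, List<String>> result = emptyAssignment(consumers);
        int i = 0;
        for (String topic : topics) {
            for (int p = 0; p < partitions; p++) {
                result.get(consumers.get(i++ % consumers.size())).add(topic + "-" + p);
            }
        }
        return result;
    }
}
```

With two consumers and two topics of three partitions each, the range model gives the first consumer four partitions and the second two, because the remainder goes to the first consumer once per topic. The round-robin model gives three partitions to each.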
  • 20. Kafka Rebalance Listener. ConsumerPartitionAssignor is a high-level interface that allows you to implement your own custom partition assignment strategy. ConsumerRebalanceListener is a low-level interface that allows you to receive notifications before and after the partition assignment. ● A rebalance listener can't prevent rebalancing but can minimize its impact ● Its callbacks can only be triggered during polling ● In transactional iterations, it can save processing costs
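One way a listener could save processing costs in a transactional iteration is to flag the open transaction as stale as soon as partitions are revoked, so the loop can abort early instead of finishing work it may not be able to commit. The sketch below is an assumption about how that might look, not the talk's code. To stay runnable without the Kafka client library, it uses a local `RebalanceCallbacks` interface that mirrors the two main callbacks of Kafka's ConsumerRebalanceListener and represents partitions as plain strings rather than TopicPartition objects.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Local stand-in mirroring two callbacks of Kafka's ConsumerRebalanceListener (partitions as strings). */
interface RebalanceCallbacks {
    void onPartitionsRevoked(Collection<String> partitions);
    void onPartitionsAssigned(Collection<String> partitions);
}

/** Hypothetical listener: marks the open transaction as stale when owned partitions are revoked. */
final class AbortOnRevokeListener implements RebalanceCallbacks {
    private final Set<String> owned = new HashSet<>();
    private boolean abortRequested = false;

    @Override
    public void onPartitionsRevoked(Collection<String> partitions) {
        // Work based on records from revoked partitions may no longer be committable by this consumer.
        if (owned.removeAll(partitions)) abortRequested = true;
    }

    @Override
    public void onPartitionsAssigned(Collection<String> partitions) {
        owned.addAll(partitions);
    }

    /** Meant to be checked by the processing loop after each poll; clears the flag once read. */
    boolean shouldAbortAndReset() {
        boolean result = abortRequested;
        abortRequested = false;
        return result;
    }
}
```

Because the callbacks run only during polling, on the polling thread, the flag needs no synchronization in this sketch. The loop would check `shouldAbortAndReset()` right after each poll and abort the transaction if it returns true.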
  • 21. Summary ★ Manage consumed offsets manually when using Kafka's transactional API. ★ Disable auto commit, use read committed mode on consumer client config, and add transactional.id to producer config. ★ Use ConsumerRebalanceListener to minimize the impact of Kafka rebalance. ★ Configure appropriate timeouts on consumer client and define group.instance.id, when possible, to skip Kafka rebalances. ★ Choose a partition assignment strategy carefully, and experiment with different strategies to determine the best fit.
  • 22. Q&A. Thank you :) Feel free to reach out to me: yulia-antonovsky