Exactly-once Stream Processing
Matthias J. Sax, Software Engineer
Apache Kafka committer and PMC member
matthias@confluent.io | @MatthiasJSax
Exactly-once: Delivery vs Semantics
Exactly-once Delivery
• Academic distributed system problem:
• Can we send a message and ensure it’s delivered to the receiver exactly once?
• Two Generals’ Problem (https://en.wikipedia.org/wiki/Byzantine_fault)
• Provably not possible!
Delivery != Semantics
Take input record, process it, update result, and record progress.
No Error. No Problem.
What is Exactly-once Semantics About?
What happens if something goes wrong?
Error during read, processing, write, or record progress.
We retry!
But is it safe?
What is Exactly-once Semantics About?
Are retries safe? With exactly-once, yes!
Exactly-once is about masking errors via safe retries.
The result of an exactly-once retry,
is semantically the same as if no error had occurred.
What is Exactly-once Semantics About?
Common Misconceptions
Kafka as an intermediate
• Pattern: Produce -> Kafka -> Consume
• No exactly-once semantics:
• Upstream write-only producer!
There is no* Write-only Exactly-once!
(*) Write-only exactly-once is possible for idempotent updates (but Kafka is append-only…)
Common Misconceptions
Kafka as an intermediate
• Pattern: Produce -> Kafka -> Consume
• No exactly-once semantics:
• Upstream write-only producer!
• Downstream read-only consumer!
There is NO Read-only Exactly-once!
Common Misconceptions
Kafka as an intermediate
• Pattern: Produce -> Kafka -> Consume
• No exactly-once semantics.
Kafka for processing
• Pattern: Consume -> Process -> Produce
• Built-in exactly-once via Kafka Streams (or DIY).
• Also possible with external source/target system!
Let’s Break it Down
Steps in a Processing Pipeline
• Read input:
• Does not modify state; re-reading is always safe.
• Process data:
• Stateless re-processing (filter, map, etc.) is always safe.
• Stateful re-processing: need to roll back state before we can retry.
• Update result:
• Need to “retract” (partial) results.
• Or: rely on idempotent updates. (There are dragons!)
• Record progress:
• Modifies state in the source system (or does it?)
Exactly-once == At-least-once + Idempotency
It depends…
Idempotent Updates (Internal State)?
Stateful processing
Stateful processing is usually a “read and modify” pattern, e.g., increase a counter.
• It’s context sensitive!
First attempt: Cnt: 73 -> (73+1) -> Cnt: 74
Retry: Cnt: 74 -> (74+1) -> Cnt: 75
Retry: ☹ (the counter is incremented twice)
Idempotent Updates? Maybe…
Stateful processing
Stateful processing is usually a “read and modify” pattern, e.g., increase a counter.
• It’s context sensitive!
• Idempotency requires context agnostic state modifications, e.g., set a new address.
First attempt: City: LA -> Set “NY” -> City: NY
Retry: City: NY -> Set “NY” -> City: NY
Retry: ☺
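The two retry cases above can be sketched in plain Java. This is a minimal illustration using in-memory maps as a stand-in for state; the class and method names are hypothetical and not part of Kafka.

```java
import java.util.HashMap;
import java.util.Map;

// Toy key-value state; not a Kafka Streams state store.
class IdempotencyDemo {
    // Context-sensitive update: the result depends on the current value.
    static void increment(Map<String, Integer> state, String key) {
        state.merge(key, 1, Integer::sum);
    }

    // Context-agnostic update: the result is the same whatever was stored before.
    static void setCity(Map<String, String> state, String key, String city) {
        state.put(key, city);
    }

    public static void main(String[] args) {
        Map<String, Integer> counters = new HashMap<>();
        counters.put("cnt", 73);
        increment(counters, "cnt"); // first attempt: 73 -> 74
        increment(counters, "cnt"); // blind retry:   74 -> 75
        System.out.println("cnt after retry = " + counters.get("cnt"));

        Map<String, String> cities = new HashMap<>();
        cities.put("city", "LA");
        setCity(cities, "city", "NY"); // first attempt: LA -> NY
        setCity(cities, "city", "NY"); // blind retry:   NY -> NY
        System.out.println("city after retry = " + cities.get("city"));
    }
}
```

Only the second update can be retried blindly, because it never reads the value it overwrites.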
Idempotent Updates (External State)
The issue of time travel…
Updates: City: LA -> Set “NY” -> City: NY -> Set “BO” -> City: BO
Reads over time: LA, then NY, then BO
Idempotent Updates (External State)
Retrying a sequence of updates:
Retry: City: BO -> Set “NY” -> City: NY -> Set “BO” -> City: BO
Reads over time: BO ☺, then NY ☹, then BO ☺
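A small sketch of why replaying a sequence of idempotent updates is still visible to readers. It assumes a single external cell and a reader that polls after every write; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ReplayDemo {
    // Applies each update in order and records what a concurrent reader
    // would observe after every single write.
    static List<String> applyAndObserve(String[] cell, List<String> updates) {
        List<String> observed = new ArrayList<>();
        for (String value : updates) {
            cell[0] = value;       // idempotent "set" on external state
            observed.add(cell[0]); // a reader polls between the writes
        }
        return observed;
    }

    public static void main(String[] args) {
        String[] city = {"LA"};
        List<String> updates = List.of("NY", "BO");
        System.out.println("first run: " + applyAndObserve(city, updates));
        // Assume progress was not recorded, so the whole sequence is retried.
        System.out.println("retry:     " + applyAndObserve(city, updates));
        System.out.println("final:     " + city[0]);
    }
}
```

The final value is correct after the retry, but in between a reader sees the older value again, which is the time-travel effect.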
Idempotency is not enough.
All State Changes must be Atomic!
All State Changes must be Atomic
What is “state”?
• Internal processing state.
• External state, i.e., result state.
• External state, i.e., source progress.
Transactions to the rescue!
Do we want to (can we) do a cross-system distributed transaction?
Good news: we don’t have to…
Exactly-Once with Kafka and External Systems
Example: Downstream target RDBMS
• Atomic write via ACID transaction: State, Result, Offsets
• (Async) offset update (not part of the transaction)
Exactly-Once with Kafka and External Systems
Example: Downstream target RDBMS
• Stored in the target RDBMS: State, Result, Offsets
• Reset offsets and retry
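The pattern can be sketched with an in-memory stand-in for the target RDBMS. This is a sketch under stated assumptions: `TargetDb` and `processBatch` are hypothetical names, and the ACID transaction is modeled by working on a copy and swapping it in only on success.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a target RDBMS: the result and the source offset
// are committed together or not at all.
class AtomicSinkDemo {
    static class TargetDb {
        private final Map<String, Integer> counts = new HashMap<>();
        private long committedOffset = 0; // next input offset to process

        long committedOffset() { return committedOffset; }
        int count(String key) { return counts.getOrDefault(key, 0); }

        // Models one ACID transaction over result and offsets.
        void processBatch(List<String> input, int batchSize, boolean crashBeforeCommit) {
            Map<String, Integer> working = new HashMap<>(counts);
            long offset = committedOffset;
            long end = Math.min(input.size(), offset + batchSize);
            for (; offset < end; offset++) {
                working.merge(input.get((int) offset), 1, Integer::sum);
            }
            if (crashBeforeCommit) {
                throw new IllegalStateException("simulated crash before commit");
            }
            counts.clear();
            counts.putAll(working);
            committedOffset = offset;
        }
    }

    public static void main(String[] args) {
        List<String> input = List.of("a", "b", "a", "a");
        TargetDb db = new TargetDb();
        db.processBatch(input, 2, false);    // commits offsets 0..1
        try {
            db.processBatch(input, 2, true); // fails: nothing is committed
        } catch (IllegalStateException e) {
            // Restart: resume from the offset stored in the target system.
        }
        db.processBatch(input, 2, false);    // safe retry from offset 2
        System.out.println("a=" + db.count("a") + " b=" + db.count("b")
                + " offset=" + db.committedOffset());
    }
}
```

Because the offset lives next to the result, the retry starts exactly where the last successful commit ended, and the non-idempotent count is still correct.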
Kafka Connect (Part 1)
Exactly-once Sink
• Has “nothing” to do with Kafka:
• Kafka provides source system progress tracking via offsets.
• Connect provides an API to fetch start offsets from the target system.
• Depends on target system properties / features.
• Each individual connector must implement it.
How does Kafka Tackle Exactly-once?
Kafka Transactions
Multi-partition/multi-topic atomic write:
[Diagram: records written atomically across partitions t1-p0, t1-p1, t2-p0, t2-p1, and t2-p2.]
How does Kafka Tackle Exactly-once?
Kafka Transactions
Multi-partition/multi-topic atomic write:
producer.beginTransaction();
// state updates (changelogs + result)
producer.send(…);
producer.send(…);
…
producer.commitTransaction(); // or .abortTransaction()
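To make the all-or-nothing behavior concrete, here is a toy model of a multi-partition atomic write. It is not the Kafka client and does not reflect how brokers implement transactions; it only mimics the call sequence above and what a read-committed reader would be allowed to see. The class name and partition labels are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a transactional log spanning several partitions.
class ToyTransactionDemo {
    private final Map<String, List<String>> committed = new LinkedHashMap<>();
    private final Map<String, List<String>> pending = new LinkedHashMap<>();
    private boolean open = false;

    void beginTransaction() {
        if (open) throw new IllegalStateException("transaction already open");
        open = true;
    }

    void send(String partition, String record) {
        if (!open) throw new IllegalStateException("no open transaction");
        pending.computeIfAbsent(partition, p -> new ArrayList<>()).add(record);
    }

    // Makes all pending records visible at once, across all partitions.
    void commitTransaction() {
        if (!open) throw new IllegalStateException("no open transaction");
        pending.forEach((p, records) ->
                committed.computeIfAbsent(p, k -> new ArrayList<>()).addAll(records));
        pending.clear();
        open = false;
    }

    // Drops all pending records; readers never see them.
    void abortTransaction() {
        pending.clear();
        open = false;
    }

    // What a read-committed reader would see for one partition.
    List<String> readCommitted(String partition) {
        return List.copyOf(committed.getOrDefault(partition, List.of()));
    }

    public static void main(String[] args) {
        ToyTransactionDemo log = new ToyTransactionDemo();
        log.beginTransaction();
        log.send("t1-p0", "changelog-update");
        log.send("t2-p1", "result");
        System.out.println("before commit: " + log.readCommitted("t2-p1"));
        log.commitTransaction();
        System.out.println("after commit:  " + log.readCommitted("t2-p1"));
    }
}
```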
Exactly-Once with Kafka
Kafka as Sink
Requirement: ability to track source system progress.
result
state (via changelogs)
source progress (via custom metadata topic)
Kafka Connect (Part 2)
Exactly-once Source
• “Exactly-once, Again: Adding EOS Support for Kafka Connect Source Connectors”
• Tomorrow: 2pm
• Chris Egerton, Aiven
• KIP-618 (Apache Kafka 3.3):
• https://cwiki.apache.org/confluence/display/KAFKA/KIP-618%3A+Exactly-Once+Support+for+Source+Connectors
Kafka Streams
Kafka Transactions
Atomic read-process-write pattern:
Kafka Streams
__consumer_offsets
changelogs
result
Kafka Transactions
Multi-partition/multi-topic atomic write:
Kafka Streams
Kafka Transactions
Multi-partition/multi-topic atomic write:
producer.beginTransaction();
// state updates (changelogs + result)
producer.send(…);
producer.send(…);
…
producer.addOffsetsToTransaction(…);
producer.commitTransaction(); // or .abortTransaction()
Kafka Streams
Single vs Multi-cluster
Kafka Streams (currently) only works against a single broker cluster:
• Does not really matter. We still rely on the brokers as target system.
• Need source offsets but commit them via the producer.
• Single broker cluster only avoids “dual” commit of source offsets.
Supporting cross-cluster EOS with Kafka Streams is possible:
• Add custom metadata topic to target cluster.
• Replace addOffsetsToTransaction() with send().
• Fetch consumer offset manually from metadata topic.
• Issues:
• EOS v2 implementation (producer per thread) not possible.
• Limited to single target cluster.
The Big Challenge
Error Handling in a (Distributed) Application
Kafka transactions allow fencing “zombie” producers.
Any EOS target system needs to support something similar (or rely on idempotency if possible).
Kafka Connect Sink Connectors:
• Idempotency or sink system fencing required—Connect framework cannot help at all.
Kafka Connect Source Connectors:
• Relies on producer fencing.
• Uses a producer per task (similar to Kafka Streams’ EOS v1 implementation).
Kafka Streams:
• Relies on producer fencing (EOS v1) or consumer fencing (EOS v2).
• EOS v2 implementation (producer per thread) relies on consumer/producer integration inside the same broker cluster.
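Producer fencing can be pictured as an epoch check. The sketch below is a simplified model, assuming the coordinator keeps one epoch per transactional id and bumps it whenever a producer with that id initializes; `Coordinator`, `initProducer`, and `accept` are hypothetical names, not Kafka APIs.

```java
import java.util.HashMap;
import java.util.Map;

class FencingDemo {
    // Stand-in for the broker-side bookkeeping.
    static class Coordinator {
        private final Map<String, Integer> latestEpoch = new HashMap<>();

        // A (re)starting producer registers and receives a newer epoch.
        int initProducer(String transactionalId) {
            return latestEpoch.merge(transactionalId, 1, Integer::sum);
        }

        // Writes carrying an older epoch are rejected: that producer is a zombie.
        boolean accept(String transactionalId, int epoch) {
            return latestEpoch.getOrDefault(transactionalId, 0) == epoch;
        }
    }

    public static void main(String[] args) {
        Coordinator coordinator = new Coordinator();
        int zombieEpoch = coordinator.initProducer("task-0");
        // The original instance stalls; a replacement starts with the same id.
        int freshEpoch = coordinator.initProducer("task-0");
        System.out.println("zombie accepted: " + coordinator.accept("task-0", zombieEpoch));
        System.out.println("fresh accepted:  " + coordinator.accept("task-0", freshEpoch));
    }
}
```

A target system without a comparable check cannot tell the stalled instance from its replacement, which is why the sink side has to provide fencing or idempotency itself.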
What to do in Practice?
Publishing with producer-only app?
The important thing is to figure out where to resume on restart:
• Is there any “source progress” information you can store?
• You need to add a consumer to your app!
• On app restart:
• Initialize producer to fence potential zombies and to force any pending transaction to complete.
• Use consumer (in read-committed mode) to inspect the target cluster’s data.
Reading with consumer-only app?
• If there is no target data system, only idempotency can help.
• With no target data system, everything is basically a side-effect.
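For the producer-only restart procedure above, the client settings involved might look as follows. This is a configuration sketch only: the broker address and the transactional id are placeholders, and the helper names are hypothetical. The keys `transactional.id`, `isolation.level`, and `enable.auto.commit` are standard Kafka client configs.

```java
import java.util.Properties;

class RestartConfigDemo {
    // Producer side: a transactional id that stays the same across restarts,
    // so that initializing the producer can fence an older instance.
    static Properties producerConfig(String transactionalId) {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        p.setProperty("transactional.id", transactionalId);
        return p;
    }

    // Consumer side: used only to inspect what was committed to the target topics.
    static Properties inspectionConsumerConfig() {
        Properties p = new Properties();
        p.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        p.setProperty("isolation.level", "read_committed");
        p.setProperty("enable.auto.commit", "false");
        return p;
    }

    public static void main(String[] args) {
        System.out.println(producerConfig("publisher-1"));
        System.out.println(inspectionConsumerConfig());
    }
}
```

With the Java client, the fencing step corresponds to calling `initTransactions()` on a producer built from the first config before reading the target topics with the second.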
Exactly-once Key Takeaways
(A) no producer-only EOS
(B) no consumer-only EOS
(C) read-process-write pattern
(1) need ability to track source system read progress
(2) require target system atomic write (plus fencing)
(3) source system progress is recorded in target system
Kafka built-in support via transactions + Zero coding with Kafka Streams
✅
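As an illustration of the "zero coding" point: in Kafka Streams, exactly-once is switched on through configuration rather than application code. A minimal sketch, assuming Kafka Streams 3.0 or later (where the value is `exactly_once_v2`); the application id and broker address are placeholders and the helper name is hypothetical.

```java
import java.util.Properties;

class StreamsEosConfigDemo {
    static Properties eosConfig(String applicationId, String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("application.id", applicationId);
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Default is at-least-once; this enables the transactional path.
        props.setProperty("processing.guarantee", "exactly_once_v2");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(eosConfig("my-streams-app", "localhost:9092"));
    }
}
```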
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 

Exactly-once Stream Processing Done Right with Matthias J Sax

  • 1. Exactly-once Stream Processing
      Matthias J. Sax, Software Engineer
      Apache Kafka committer and PMC member
      matthias@confluent.io | @MatthiasJSax
  • 2. Exactly-once: Delivery vs Semantics
      Exactly-once Delivery
      • Academic distributed system problem:
        • Can we send a message and ensure it’s delivered to the receiver exactly once?
        • Two Generals’ Problem (https://en.wikipedia.org/wiki/Byzantine_fault)
        • Provably impossible!
      Delivery != Semantics
  • 3. What is Exactly-once Semantics About?
      Take input record, process it, update result, and record progress.
      No Error. No Problem.
  • 4. What is Exactly-once Semantics About?
      What happens if something goes wrong?
      Error during read, processing, write, or recording progress.
      We retry! But is it safe?
  • 5. What is Exactly-once Semantics About?
      Are retries safe? With exactly-once, yes!
      Exactly-once is about masking errors via safe retries.
      The result of an exactly-once retry is semantically the same as if no error had occurred.
  • 6. Common Misconceptions
      Kafka as an intermediate
      • Pattern: Produce -> Kafka -> Consume
      • No exactly-once semantics:
        • Upstream write-only producer!
  • 7. There is no* Write-only Exactly-once!
      (*) Write-only exactly-once is possible for idempotent updates (but Kafka is append-only…)
  • 8. Common Misconceptions
      Kafka as an intermediate
      • Pattern: Produce -> Kafka -> Consume
      • No exactly-once semantics:
        • Upstream write-only producer!
        • Downstream read-only consumer!
  • 9. There is NO Read-only Exactly-once!
  • 10. Common Misconceptions
      Kafka as an intermediate
      • Pattern: Produce -> Kafka -> Consume
      • No exactly-once semantics.
      Kafka for processing
      • Pattern: Consume -> Process -> Produce
      • Built-in exactly-once via Kafka Streams (or DIY).
      • Also possible with external source/target system!
  • 11. Let’s Break it Down
      Steps in a Processing Pipeline
      • Read input:
        • Does not modify state; re-reading is always safe.
      • Process data:
        • Stateless re-processing (filter, map, etc.) is always safe.
        • Stateful re-processing: need to roll back state before we can retry.
      • Update result:
        • Need to “retract” (partial) results.
        • Or: rely on idempotent updates. (There are dragons!)
      • Record progress:
        • Modifies state in the source system (or does it?)
  • 13. Idempotent Updates (Internal State)?
      Stateful processing
      Stateful processing is usually a “read and modify” pattern, e.g., increase a counter.
      • It’s context sensitive!
      First attempt: Cnt: 73 -> (73+1) -> Cnt: 74
      Retry: Cnt: 74 -> (74+1) -> Cnt: 75 ☹
  • 14. Idempotent Updates? Maybe…
      Stateful processing
      Stateful processing is usually a “read and modify” pattern, e.g., increase a counter.
      • It’s context sensitive!
      • Idempotency requires context-agnostic state modifications, e.g., set a new address.
      First attempt: City: LA -> (Set “NY”) -> City: NY
      Retry: City: NY -> (Set “NY”) -> City: NY ☺
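The contrast between slides 13 and 14 can be sketched with a toy in-memory store. This is only an illustration: `RetryDemo`, `increment`, and `setCity` are hypothetical names that do not appear in the talk, and the "retry" is simply a second call that stands in for re-processing after a lost acknowledgement.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy state store (hypothetical) contrasting a context-sensitive update with an idempotent one. */
final class RetryDemo {
    final Map<String, Integer> counters = new HashMap<>();
    final Map<String, String> cities = new HashMap<>();

    /** Read-and-modify: the outcome depends on the current value, so a retry double counts. */
    void increment(String key) {
        counters.merge(key, 1, Integer::sum);
    }

    /** Blind overwrite: the outcome ignores the current value, so a retry is harmless. */
    void setCity(String key, String city) {
        cities.put(key, city);
    }

    public static void main(String[] args) {
        RetryDemo store = new RetryDemo();
        store.counters.put("cnt", 73);
        store.cities.put("user", "LA");

        store.increment("cnt");       // first attempt: 73 -> 74
        store.increment("cnt");       // retry of the same input: 74 -> 75 (wrong)

        store.setCity("user", "NY");  // first attempt: LA -> NY
        store.setCity("user", "NY");  // retry of the same input: NY -> NY (still correct)

        System.out.println(store.counters.get("cnt")); // prints 75
        System.out.println(store.cities.get("user"));  // prints NY
    }
}
```

The counter ends up at 75 although only one input record was processed, while the city update survives the retry unchanged.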
  • 15. Idempotent Updates (External State)
      The issue of time travel…
      Updates: City: LA -> (Set “NY”) -> City: NY -> (Set “BO”) -> City: BO
      Reads observe, in order: LA, NY, BO
  • 16. Idempotent Updates (External State)
      Retrying a sequence of updates:
      Updates: City: BO -> (Set “NY”) -> City: NY -> (Set “BO”) -> City: BO
      Reads observe, in order: BO ☺, NY ☹, BO ☺
  • 17. Idempotency is not enough. All State Changes must be Atomic!
  • 18. All State Changes must be Atomic
      What is “state”?
      • Internal processing state.
      • External state, i.e., result state.
      • External state, i.e., source progress.
      Transactions to the rescue!
      Do we want to (can we) do a cross-system distributed transaction?
      Good news: we don’t have to…
  • 19. Exactly-Once with Kafka and External Systems
      Example: Downstream target RDBMS
      • Atomic write via ACID transaction: State, Result, Offsets.
      • (Async) offset update (not part of the transaction).
  • 20. Exactly-Once with Kafka and External Systems
      Example: Downstream target RDBMS
      • State, Result, Offsets.
      • Reset offsets and retry.
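The pattern on slides 19 and 20 (write result and source offsets in one ACID transaction, then reset to the stored offsets on retry) can be sketched as follows. This is a minimal in-memory simulation, not a real RDBMS or Kafka client: `ToyDatabase`, `ExactlyOnceSink`, the batch size of two, and the `crashAtOffset` parameter are all assumptions made for illustration.

```java
import java.util.List;

/** Hypothetical sink task: sums input records and stores its progress next to the result. */
final class ExactlyOnceSink {
    /**
     * Processes the input in batches of two records. If crashAtOffset >= 0, the run
     * throws while handling that offset, before the surrounding batch is committed.
     */
    static void run(List<Integer> input, ToyDatabase db, int crashAtOffset) {
        int offset = db.nextOffset();           // on (re)start: reset to the committed offset
        while (offset < input.size()) {
            long sum = db.sum();                // uncommitted working copy of the result
            int end = Math.min(offset + 2, input.size());
            for (int i = offset; i < end; i++) {
                sum += input.get(i);
                if (i == crashAtOffset) {
                    throw new IllegalStateException("simulated crash at offset " + i);
                }
            }
            db.commit(sum, end);                // result + offset change together
            offset = end;
        }
    }

    public static void main(String[] args) {
        List<Integer> input = List.of(1, 2, 3, 4, 5);
        ToyDatabase db = new ToyDatabase();
        try {
            run(input, db, 3);                  // fails inside the second batch
        } catch (IllegalStateException e) {
            System.out.println("retrying from offset " + db.nextOffset());
        }
        run(input, db, -1);                     // retry without errors
        System.out.println(db.sum());           // prints 15
    }
}

/** Toy stand-in for the target RDBMS: result and offset are only ever replaced together. */
final class ToyDatabase {
    private long sum = 0;        // result state
    private int nextOffset = 0;  // source progress

    /** Stand-in for one ACID transaction: both fields change or neither does. */
    synchronized void commit(long newSum, int newNextOffset) {
        sum = newSum;
        nextOffset = newNextOffset;
    }

    synchronized long sum() { return sum; }
    synchronized int nextOffset() { return nextOffset; }
}
```

Because the offset is read back from the same store that holds the result, the retry resumes at offset 2 and the final sum matches an error-free run.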
  • 21. Kafka Connect (Part 1)
      Exactly-once Sink
      • Has “nothing” to do with Kafka:
        • Kafka provides source system progress tracking via offsets.
        • Connect provides an API to fetch start offsets from the target system.
      • Depends on target system properties / features.
      • Each individual connector must implement it.
  • 22. How does Kafka Tackle Exactly-once?
      Kafka Transactions
      Multi-partition/multi-topic atomic write:
      [Figure: records written across partitions t1-p0, t1-p1, t2-p0, t2-p1, and t2-p2]
  • 23. How does Kafka Tackle Exactly-once?
      Kafka Transactions
      Multi-partition/multi-topic atomic write:
        producer.beginTransaction();
        // state updates (changelogs + result)
        producer.send(…);
        producer.send(…);
        …
        producer.commitTransaction(); // or .abortTransaction()
  • 24. Exactly-Once with Kafka
      Kafka as Sink
      Requirement: ability to track source system progress.
      • Result state (via changelogs)
      • Source progress (via custom metadata topic)
  • 25. Kafka Connect (Part 2)
      Exactly-once Source
      • “Exactly-once, Again: Adding EOS Support for Kafka Connect Source Connectors”
        • Tomorrow: 2pm
        • Chris Egerton, Aiven
      • KIP-618 (Apache Kafka 3.3):
        • https://cwiki.apache.org/confluence/display/KAFKA/KIP-618%3A+Exactly-Once+Support+for+Source+Connectors
  • 28. Kafka Streams
      Kafka Transactions
      Multi-partition/multi-topic atomic write:
        producer.beginTransaction();
        // state updates (changelogs + result)
        producer.send(…);
        producer.send(…);
        …
        producer.addOffsetsToTransaction(…);
        producer.commitTransaction(); // or .abortTransaction()
  • 29. Kafka Streams
      Single vs Multi-cluster
      Kafka Streams (currently) only works against a single broker cluster:
      • Does not really matter. We still rely on the brokers as target system.
      • Need source offsets but commit them via the producer.
      • Single broker cluster only avoids “dual” commit of source offsets.
      Supporting cross-cluster EOS with Kafka Streams is possible:
      • Add custom metadata topic to target cluster.
      • Replace addOffsetsToTransaction() with send().
      • Fetch consumer offset manually from metadata topic.
      • Issues:
        • EOS v2 implementation (producer per thread) not possible.
        • Limited to single target cluster.
  • 30. The Big Challenge
      Error Handling in a (Distributed) Application
      Kafka transactions allow fencing “zombie” producers.
      Any EOS target system needs to support something similar (or rely on idempotency if possible).
      Kafka Connect Sink Connectors:
      • Idempotency or sink system fencing required; the Connect framework cannot help at all.
      Kafka Connect Source Connectors:
      • Rely on producer fencing.
      • Use a producer per task (similar to Kafka Streams’ EOS v1 implementation).
      Kafka Streams:
      • Relies on producer fencing (EOS v1) or consumer fencing (EOS v2).
      • EOS v2 implementation (producer per thread) relies on consumer/producer integration inside the same broker cluster.
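What "something similar" to zombie fencing could look like in a target system can be sketched with an epoch check. This is a toy model under stated assumptions, not a Kafka API: `FencingLog`, `register`, and `append` are hypothetical names, and the epoch-per-writer-id scheme only loosely mirrors the idea that a newly initialized writer invalidates older instances with the same id.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy target system (hypothetical) that fences stale writers by epoch. */
final class FencingLog {
    private final Map<String, Integer> currentEpoch = new HashMap<>();
    private final List<String> records = new ArrayList<>();

    /** Registering under a writer id bumps its epoch, which fences all older instances. */
    synchronized int register(String writerId) {
        return currentEpoch.merge(writerId, 1, Integer::sum);
    }

    /** Appends only if the caller holds the latest epoch for its writer id. */
    synchronized void append(String writerId, int epoch, String record) {
        Integer latest = currentEpoch.get(writerId);
        if (latest == null || epoch != latest) {
            throw new IllegalStateException("fenced: stale epoch " + epoch + " for " + writerId);
        }
        records.add(record);
    }

    synchronized List<String> records() {
        return new ArrayList<>(records);
    }

    public static void main(String[] args) {
        FencingLog log = new FencingLog();
        int oldEpoch = log.register("task-0");   // original instance
        log.append("task-0", oldEpoch, "a");

        int newEpoch = log.register("task-0");   // replacement instance after a suspected failure
        try {
            log.append("task-0", oldEpoch, "zombie write");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());  // the zombie is rejected
        }
        log.append("task-0", newEpoch, "b");
        System.out.println(log.records());       // prints [a, b]
    }
}
```

The stale instance cannot add data once its replacement has registered, so a retry by the new instance cannot interleave with late writes from the old one.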
  • 31. What to do in Practice?
      Publishing with producer-only app?
      The important thing is to figure out where to resume on restart:
      • Is there any “source progress” information you can store?
      • You need to add a consumer to your app!
      • On app restart:
        • Initialize producer to fence potential zombies and to force any pending TX to complete.
        • Use consumer (in read-committed mode) to inspect the target cluster’s data.
      Reading with consumer-only app?
      • If there is no target data system, only idempotency can help.
      • With no target data system, everything is basically a side-effect.
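The restart step for a producer-only app can be sketched as a scan over committed output. This is a simplified in-memory model: `OutputRecord`, `ResumePoint`, and the idea of storing a `sourcePosition` with every output record are assumptions for illustration, and the `committed` flag stands in for what a read-committed consumer would filter for you.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that derives the resume position from the target's committed data. */
final class ResumePoint {
    /** Returns the next source position to publish, ignoring records from aborted transactions. */
    static long resumeFrom(List<OutputRecord> targetLog) {
        long next = 0;
        for (OutputRecord r : targetLog) {
            if (r.committed) {
                next = Math.max(next, r.sourcePosition + 1);
            }
        }
        return next;
    }

    public static void main(String[] args) {
        List<OutputRecord> log = new ArrayList<>();
        log.add(new OutputRecord(0, "a", true));
        log.add(new OutputRecord(1, "b", true));
        log.add(new OutputRecord(2, "c", false)); // written by a transaction that was aborted
        System.out.println(resumeFrom(log));      // prints 2
    }
}

/** One output record carrying the source position it was derived from (an assumption). */
final class OutputRecord {
    final long sourcePosition;
    final String value;
    final boolean committed;

    OutputRecord(long sourcePosition, String value, boolean committed) {
        this.sourcePosition = sourcePosition;
        this.value = value;
        this.committed = committed;
    }
}
```

The scan is only meaningful once no transaction is still pending, which is why the slide initializes the producer before inspecting the target.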
  • 32. Exactly-once Key Takeaways
      (A) No producer-only EOS
      (B) No consumer-only EOS
      (C) Read-process-write pattern:
        (1) Need ability to track source system read progress
        (2) Require target system atomic write (plus fencing)
        (3) Source system progress is recorded in target system
      Kafka built-in support via transactions + zero coding with Kafka Streams ✅