Flink Forward San Francisco 2022.
As a payments company, Stripe requires strict correctness and freshness guarantees. We rely on Flink as the natural solution for delivering on these guarantees in support of our Change Data Capture (CDC) infrastructure. We heavily rely on CDC as a tool for capturing data change streams from our databases without critically impacting database reliability, scalability, and maintainability. Data derived from these streams is used broadly across the business and powers many of our critical financial reporting systems, covering over $640 billion in annual payment volume. We use many components of Flink’s flexible DataStream API to perform aggregations and to abstract away the complexities of stream processing from our downstream consumers. In this talk, we’ll walk through our experience from the very beginning to what we have in production today, and share stories about the technical details and trade-offs we encountered along the way.
by Jeff Chao
2. An API that gets out of your way
It’s so easy, we’ve embedded a bunch of examples right here. Copy some of these requests into your terminal and check out what happens.
With wrappers in Ruby, PHP, Python and more, you can get started in minutes.
3. As complexity grew…
Started out with
We had a problem, so we thought to use …
Then we had a ProblemFactory
4. As data volume grew…
Database scalability is a complicated topic…
Started out with
Had to make sure it was web scale
Distributed transactions
Change Data Capture
6. Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Capture
Flink Forward - San Francisco 2022
Jeff Chao
Staff Engineer / Tech Lead for Change Data Capture Infrastructure at Stripe
7. Agenda
1 CDC at Stripe
2 Aggregating Change Events
3 How it Started, How it Ended
Squirreling Away $640 Billion: How Stripe Leverages Flink for Change Data Capture
Change Data Capture (CDC) is widely used at Stripe to capture data changes from databases without critically impacting database reliability and scalability. CDC powers many critical financial use cases at Stripe, such as the Stripe Dashboard, Stripe Search, Sigma, and Financial Reporting.
From idea to production: things may seem straightforward at first, but the details matter. We detail our journey of how we leveraged Flink for Change Data Capture at Stripe to uphold the highest data quality standards. Freshness, Coverage, and Correctness SLOs are paramount to the success of platforms and applications running on top of our CDC infrastructure.
Change Event Streams are ubiquitous across Stripe, given the vast number of applications and employees generating datasets worldwide. Change Event Streams are independent of one another, which leads to the typical challenges of distributed systems. One of the major use cases revolves around aggregating the individual change events of a database transaction to support Stripe’s payments infrastructure.
15. Building a Platform
Abstract Away Internals
Make sure that we abstract away database internals such as sharding topology, and ensure a datastore-agnostic transport.
Interoperable
Build a high-leverage platform that makes working with Change Events interoperable with other systems within the organization.
Operational Excellence
Minimize toil as we scale the number of datasets, ensure clean separation between infrastructure and user issues, create great operator experiences, reduce control plane and data plane blast radius, and maintain good operator tooling, developer experience, and processes.
CDC at Stripe
17. Why?
Aggregating Change Events
Product teams working with payments data use transactions
Arbitrary number of tables in a database transaction
They should be able to get transactions back out from the CDC path
They shouldn’t have to become stream processing experts
34. What is an Aggregated Change Event?
{
  "ts_utc": 1659375300000,
  "data": [
    {
      "operation": "CREATE",
      "transaction": { "id": "txn1" },
      "before": null,
      "after": { ... }
    },
    {
      "operation": "UPDATE",
      "transaction": { "id": "txn1" },
      "before": { ... },
      "after": { ... }
    }
  ]
}
● One transaction with two events having the same transaction ID.
● Events may arrive from an arbitrary number of tables.
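The JSON above only shows the envelope's shape. A minimal Scala sketch of the same structure follows, assuming simplified types: row images are plain string maps, and the names `TransactionRef`, `AggregatedChangeEvent`, and `Aggregation` are hypothetical. It groups interleaved change events by transaction ID in batch form, which deliberately sidesteps the hard streaming question of when a transaction is complete.

```scala
// Hypothetical, simplified model of the aggregated change event envelope.
// Field names mirror the JSON; row images are simplified to string maps.
final case class TransactionRef(id: String)

final case class ChangeEvent(
    operation: String,                   // e.g. "CREATE", "UPDATE"
    transaction: TransactionRef,         // shared by all events of one transaction
    before: Option[Map[String, String]], // None where the JSON has null
    after: Option[Map[String, String]]
)

final case class AggregatedChangeEvent(tsUtc: Long, data: List[ChangeEvent])

object Aggregation {
  // Groups interleaved change events (possibly from many tables) by
  // transaction ID, keeping arrival order within each transaction.
  def groupByTransaction(events: Seq[ChangeEvent]): Map[String, List[ChangeEvent]] =
    events.groupBy(_.transaction.id).map { case (id, evs) => id -> evs.toList }

  // Builds the envelope for one transaction; the caller supplies tsUtc
  // (which timestamp to use is an assumption left to the caller).
  def aggregate(tsUtc: Long, txnEvents: List[ChangeEvent]): AggregatedChangeEvent =
    AggregatedChangeEvent(tsUtc, txnEvents)
}
```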
37. Join
Joins elements of the same key within the same window.
● Produces pairwise elements
(Diagram: Change Events (Event 1, Event 2, Event 3) and Transaction Metadata Events (BEGIN, COMMIT, BEGIN, COMMIT) over time; the join emits pairs such as (Event 1, BEGIN), (Event 1, COMMIT), (Event 2, BEGIN), (Event 2, COMMIT), (Event 3, BEGIN), (Event 3, COMMIT).)
38. Union
Unions multiple streams of the same type into a single stream.
● Requires streams of the same type
(Diagram: Change Events and Transaction Metadata Events over time. No output; won’t compile because the streams are of different types.)
39. Connect
Connects two streams, potentially of different types, into a single connected stream.
● Similar to Union
(Diagram: Change Events and Transaction Metadata Events over time; the output stream contains elements of both types.)
40. What Do We Need?
Support for streams of different types
Support for flexible stream combination semantics
Don’t need pairwise outputs
41. Flink Job Definition
val mainStream =
  transactionMetadataEventStream // uid and name omitted.
    .connect(changeEventStream) // Union different types.
44. Either
Wraps an event containing one of two types, either from the left or the right stream.
● Out-of-box
● No concept of keys
(Diagram: each element of the connected stream becomes an Either with one side set to the event and the other side null.)
45. Custom
Wraps an event containing one of two types, either from the left or the right stream, and a common key among both events.
● Small and simple code addition
● Need to extract keys
(Diagram: events of transaction txn-1 are each wrapped as a WrappedEvent with key = txn-1, one of WrappedEvent.left or WrappedEvent.right set to the event, and the other null.)
46. What Do We Need?
Wrap elements of a connected stream
Be able to identify keys to support aggregations later
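A minimal sketch of the custom wrapper, kept independent of Flink. It assumes simplified event types (`TxnMetadataEvent` and `TxnChangeEvent` are hypothetical) and uses `Option` instead of the `null` shown on the slide. In a Flink job, `WrappedEventFunction` would plausibly be a `CoFlatMapFunction`, whose `flatMap1` and `flatMap2` callbacks handle the first (metadata) and second (change event) inputs of the connected stream; that mapping is an assumption, not the talk's code.

```scala
// Hypothetical stand-ins for the two input types of the connected stream.
final case class TxnMetadataEvent(txnId: String, marker: String, totalEvents: Int)
final case class TxnChangeEvent(txnId: String, table: String, globalPosition: Long)

// Like Either, but with a common key so the stream can be keyed afterwards.
final case class WrappedEvent(
    key: String,                    // transaction ID shared by both sides
    left: Option[TxnMetadataEvent], // set when the element came from the metadata stream
    right: Option[TxnChangeEvent]   // set when the element came from the change stream
)

object WrappedEvent {
  // One constructor per input, mirroring the two callbacks of a co-flat-map.
  def fromMetadata(e: TxnMetadataEvent): WrappedEvent = WrappedEvent(e.txnId, Some(e), None)
  def fromChange(e: TxnChangeEvent): WrappedEvent = WrappedEvent(e.txnId, None, Some(e))
}
```

Keying the wrapped stream by `key` then groups a transaction's metadata and change events together, whichever input they came from.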
47. Flink Job Definition
val mainStream =
  transactionMetadataEventStream // uid and name omitted.
    .connect(changeEventStream) // Union different types.
    .flatMap(new WrappedEventFunction) // Like Either type, but with extra fields.
    .keyBy(_.key) // Group events with the same transaction ID.
49. Aggregation Characteristics
Arbitrary number of Change Event Streams
One Transaction Metadata Event Stream
Change Events in an aggregate must share the same transaction ID
Handle late-arriving or duplicate Change Events and Transaction Metadata Events
Don’t result in infinite state growth
51. Tumbling Windows
Assigns elements to windows of a fixed size.
● Windows don’t overlap
● Late-arriving events? Add delay.
● Large delay? Trade-off: Freshness vs Correctness.
● Not quite right…
(Diagram: Change Events and Transaction Metadata Events over time, divided into fixed-size, non-overlapping windows.)
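To see why tumbling windows are not quite right for transactions, here is a Flink-free Scala sketch of the usual start-of-window arithmetic (start = timestamp minus timestamp mod size), assuming event-time timestamps in milliseconds and a zero window offset. `Tumbling` and its methods are hypothetical helpers.

```scala
// Flink-free sketch: assign millisecond timestamps to fixed-size,
// non-overlapping (tumbling) windows, assuming a zero window offset.
object Tumbling {
  // Start of the tumbling window containing ts; floorMod keeps this
  // correct for negative timestamps as well.
  def windowStart(ts: Long, sizeMs: Long): Long = ts - Math.floorMod(ts, sizeMs)

  // True when all timestamps land in the same window, i.e. the
  // transaction's events would be aggregated together.
  def sameWindow(timestamps: Seq[Long], sizeMs: Long): Boolean =
    timestamps.map(windowStart(_, sizeMs)).distinct.size <= 1
}
```

Whatever the window size, a transaction whose events straddle a boundary (for example 9,999 ms and 10,001 ms with 10 s windows) is split into two partial aggregates; adding delay only changes when each window closes, not which window each event falls into.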
57. Sliding Windows
Assigns elements to windows of a fixed size, but with a slide interval.
● Almost like a tumbling window, but with overlapping windows
● Late-arriving events? Same as tumbling windows.
● Slide interval? Explosion of windows.
● Not quite right…
(Diagram: Change Events and Transaction Metadata Events over time, covered by overlapping fixed-size windows.)
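The window explosion can be quantified with a small Flink-free sketch. Assuming a zero offset and millisecond timestamps, every element falls into `size / slide` overlapping windows when the size is a multiple of the slide; `Sliding.windowStarts` is a hypothetical helper that enumerates them by walking back one slide at a time from the latest window containing the element.

```scala
// Flink-free sketch of sliding-window assignment (zero offset, ms timestamps).
object Sliding {
  // Start times of every window [start, start + size) that contains ts.
  def windowStarts(ts: Long, sizeMs: Long, slideMs: Long): Seq[Long] = {
    val lastStart = ts - Math.floorMod(ts, slideMs) // latest window start <= ts
    // Walk back one slide at a time while the window still covers ts.
    (lastStart until (ts - sizeMs) by -slideMs).toList
  }
}
```

With a 10 s window and a 100 ms slide, each element is copied into 100 windows, and each of those windows can emit its own, possibly partial, aggregate of the same transaction.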
59. Session Windows
Assigns elements that arrive relatively close together in time to the same window.
● Arbitrarily sized windows; no fixed start and end
● Windows don’t overlap
● Windows close based on a defined gap of inactivity
● Session gap too small? Incomplete aggregates
● Session gap too big? Trade-off: Freshness vs Correctness
● Not quite right…
(Diagram: Change Events and Transaction Metadata Events over time, grouped into sessions separated by gaps of inactivity.)
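A Flink-free sketch of how the session gap decides which events end up together. It is a batch approximation, assuming millisecond timestamps: after sorting, two consecutive events share a session when they are less than `gapMs` apart. `Sessions` is a hypothetical helper, not Flink's merging window assigner.

```scala
// Flink-free, batch approximation of event-time session windows.
object Sessions {
  // Splits sorted timestamps into sessions wherever the gap between
  // neighbours reaches gapMs. Sessions are built newest-first, then reversed.
  def sessions(timestamps: Seq[Long], gapMs: Long): List[List[Long]] =
    timestamps.sorted.foldLeft(List.empty[List[Long]]) {
      // current.head is the latest timestamp in the open session
      case (current :: done, ts) if ts - current.head < gapMs => (ts :: current) :: done
      case (acc, ts) => List(ts) :: acc // start a new session
    }.map(_.reverse).reverse
}
```

If BEGIN, a change event, and COMMIT arrive at 0 ms, 400 ms, and 1,500 ms, a 1 s gap splits the transaction into two incomplete aggregates, while a 2 s gap keeps it whole but delays every output by at least the gap.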
67. Global Windows
Assigns elements to a single window.
● Only a single window per key
● Window never closes
● Outputs never get evaluated and materialized
● Needs more…
(Diagram: all Change Events and Transaction Metadata Events over time fall into one window that never closes.)
69. Global Windows + Custom Stateful Trigger
Assign elements to a Global Window and add a custom stateful trigger.
● Flexibly define open/close conditions for non-overlapping windows
● Reasonably handle late-arriving events
● Avoid infinite state growth and reduce the likelihood of incomplete aggregates
70. What Makes an Aggregation Complete?
BEGIN transaction marker seen
COMMIT transaction marker seen
All Change Events of the transaction seen
All Change Events are globally and locally ordered
71. Custom Stateful Trigger: TransactionBoundaryTrigger
if transaction metadata event:
    if begin transaction marker:
        update begin marker state
    else:
        update commit marker state
        update bitmap state using commit marker’s total event count
    set timeout state and register event time timer
else:
    update bitmap state with change event’s global position
    set timeout state and register event time timer
if should trigger(begin, commit, total events):
    clear window
    TriggerResult.FIRE_AND_PURGE
else:
    TriggerResult.CONTINUE
Reference
// ChangeEvent#transaction
{
  "id": "transaction-id",
  "global_position": 1,
  "source_position": 1
}
// TransactionMetadataEvent
{
  "id": "transaction-id",
  "ts_utc": 1659375300000,
  "marker": "COMMIT",
  "total_events": 3,
  "per_source_event_counts": [{ ... }]
}
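The pseudocode above can be modeled as a small per-key state machine. The Scala sketch below is Flink-free and hypothetical: it assumes 1-based `global_position` values and a single fixed timeout re-armed on every element, and it returns `Continue`/`FireAndPurge` in place of Flink's `TriggerResult.CONTINUE` and `TriggerResult.FIRE_AND_PURGE`. In an actual Flink `Trigger`, this logic would live in `onElement` and `onEventTime`, with the flags and bitmap held in partitioned state and the timeout registered through `TriggerContext.registerEventTimeTimer`.

```scala
import java.util.BitSet

// Hypothetical, Flink-free model of the per-key trigger state.
sealed trait TxnInput
final case class Marker(kind: String, totalEvents: Int = 0) extends TxnInput // "BEGIN" or "COMMIT"
final case class Change(globalPosition: Int) extends TxnInput                 // assumed 1-based

sealed trait Decision
case object Continue extends Decision     // like TriggerResult.CONTINUE
case object FireAndPurge extends Decision // like TriggerResult.FIRE_AND_PURGE

final class TxnState(timeoutMs: Long) {
  private var beginSeen = false
  private var totalEvents: Option[Int] = None // known once COMMIT arrives
  private val seen = new BitSet()             // global positions seen so far
  var timerAt: Option[Long] = None            // armed event-time timeout, if any

  def onElement(in: TxnInput, eventTime: Long): Decision = {
    in match {
      case Marker("BEGIN", _)      => beginSeen = true
      case Marker("COMMIT", total) => totalEvents = Some(total)
      case Marker(other, _)        => throw new IllegalArgumentException(s"unknown marker: $other")
      case Change(pos)             => seen.set(pos) // setting the same bit twice is a no-op
    }
    timerAt = Some(eventTime + timeoutMs) // (re)arm the timeout on every element
    if (shouldTrigger) { clear(); FireAndPurge } else Continue
  }

  // Complete when BEGIN and COMMIT were seen and exactly positions 1..total are set.
  private def shouldTrigger: Boolean =
    beginSeen && totalEvents.exists(t => seen.cardinality() == t && seen.nextClearBit(1) > t)

  // Watermark reached the timer: returns true, meaning the caller should emit
  // the buffered events as an incomplete aggregate (e.g. to a DLQ side output).
  def onEventTime(watermark: Long): Boolean =
    if (timerAt.exists(_ <= watermark)) { clear(); true } else false

  private def clear(): Unit = {
    beginSeen = false
    totalEvents = None
    seen.clear()
    timerAt = None
  }
}
```

The bitmap makes duplicates harmless while a transaction is still open, but a duplicate that arrives after the state is purged starts fresh state that can never complete, so it eventually times out as an incomplete aggregate.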
72. Flink Job Definition
val mainStream =
  transactionMetadataEventStream // uid and name omitted.
    .connect(changeEventStream) // Union different types.
    .flatMap(new WrappedEventFunction) // Like Either type, but with extra fields.
    .keyBy(_.key) // Group events with the same transaction ID.
    .window(GlobalWindows.create())
    .trigger(new TransactionBoundaryTrigger(...)) // Flexible windowing semantics.
    .process(new KeyedProcessor(...))
74. Flink Job Definition
val mainStream =
  transactionMetadataEventStream // uid and name omitted.
    .connect(changeEventStream) // Union different types.
    .flatMap(new WrappedEventFunction) // Like Either type, but with extra fields.
    .keyBy(_.key) // Group events with the same transaction ID.
    .window(GlobalWindows.create())
    .trigger(new TransactionBoundaryTrigger(...)) // Flexible windowing semantics.
    .process(new KeyedProcessor(...))

mainStream // Side output to DLQ.
  .getSideOutput(...)
  .addSink(...)

mainStream // Output aggregated change events.
  .addSink(...)
76. From Idea to Production
Coverage
Platform
State
How it Started, How it Ended
80. Observations
Infinite keys due to a continuous stream of new transactions
Using a Global Window; windows possibly not closing properly
No trigger timeouts firing
No watermarks being generated
82. Fix
Fixed an upstream issue where transaction IDs were getting mixed up
Reduce parallelism on source subtasks for all streams
Make sure source parallelism ≤ ∑ topic partitions
In general, check how the connector’s SplitEnumerator classes assign splits to subtasks
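Why extra source subtasks stall event time can be shown with a Flink-free sketch. It assumes a simplified round-robin assignment of partitions to subtasks (real connectors decide this in their SplitEnumerator, as the slide notes) and models the downstream watermark as the minimum over all subtasks, with `Long.MinValue` standing in for a subtask that has never emitted one. `SourceParallelism` is a hypothetical helper.

```scala
// Flink-free sketch: more source subtasks than partitions leaves some
// subtasks without input, and they hold back the downstream watermark.
object SourceParallelism {
  val NoWatermark: Long = Long.MinValue // a subtask that never emitted a watermark

  // Simplified round-robin assignment (an assumption, not a connector's real logic).
  def assign(numPartitions: Int, parallelism: Int): Map[Int, Seq[Int]] =
    (0 until parallelism).map { subtask =>
      subtask -> (0 until numPartitions).filter(_ % parallelism == subtask)
    }.toMap

  // Subtasks that received no partitions at all.
  def idleSubtasks(numPartitions: Int, parallelism: Int): Seq[Int] =
    assign(numPartitions, parallelism).collect { case (s, ps) if ps.isEmpty => s }.toSeq.sorted

  // Downstream watermark: the minimum over every subtask's latest watermark.
  def combinedWatermark(perSubtask: Seq[Long]): Long = perSubtask.min
}
```

With two partitions and a parallelism of four, half of the subtasks never read anything, so the combined watermark never advances and event-time timers never fire, which is consistent with the observations on slide 80.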
85. Observations
State size still growing, but more slowly
Event time timers firing, sometimes
Watermarks are being generated, but not for all subtasks
86. New Observations
(Diagram: source subtasks reading the Transaction Metadata Events stream and the charges (partitions = 2), audits (partitions = 1), and disputes (partitions = 1) topics; one of them is highlighted as a low-volume stream.)
87. Possible Fix
Switch from event time to processing time
Less precise
Could cause premature trigger firing, resulting in incomplete aggregates
88. Actual Fix
Add idleness property on sources
Can still use event time
More precise
Not perfect; can still result in incomplete aggregates in edge cases
That’s the reality of streaming
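The effect of source idleness can be sketched without Flink. Assuming each input reports its latest watermark and the time it last saw data, inputs quiet for at least the idle timeout are left out of the minimum. This mirrors the intent of Flink's `WatermarkStrategy#withIdleness`, but `InputStatus` and `Idleness.combined` are hypothetical simplifications, not Flink's implementation.

```scala
// Hypothetical per-input status: latest watermark and last time data arrived.
final case class InputStatus(watermark: Long, lastActivityMs: Long)

object Idleness {
  // Combined watermark over inputs, skipping those idle for at least the timeout.
  // Returns None when every input is idle (nothing to advance on).
  def combined(inputs: Seq[InputStatus], nowMs: Long, idleTimeoutMs: Option[Long]): Option[Long] = {
    val active = idleTimeoutMs match {
      case Some(timeout) => inputs.filter(s => nowMs - s.lastActivityMs < timeout)
      case None          => inputs // no idleness configured: every input counts
    }
    if (active.isEmpty) None else Some(active.map(_.watermark).min)
  }
}
```

The same mechanism explains the remaining edge cases: if the quiet stream resumes after event time has moved past its pending events, those events are late relative to the watermark and can still produce incomplete aggregates.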
92. Observations
Don’t want to redeploy every time a new dataset (Kafka topic) is added
Redeploys blow through the Freshness SLO’s error budget
Poor developer onboarding experience
93. Fix
Instead of a Kafka topic-list subscriber, use a regex (topic-pattern) subscriber
Subscribe to all topics (for a keyspace) by default
A control plane (external) service produces an event to a Broadcast Stream
On each broadcast element, use Broadcast State to keep onboarded datasets in state
On each element, check Broadcast State and filter for onboarded datasets
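The onboarding flow above can be sketched without Flink. On the Flink side, pattern subscription is available through `KafkaSource.builder().setTopicPattern(...)`, and the two callbacks correspond to `processBroadcastElement` and `processElement` of a `BroadcastProcessFunction`, where the data side only gets read-only access to broadcast state. The sketch below models just the logic: it assumes a dataset is identified by its topic name, and the pattern, topic names, and `OnboardingFilter` are hypothetical.

```scala
import java.util.regex.Pattern
import scala.collection.mutable

final case class Onboard(dataset: String)                  // control-plane event
final case class DataRecord(topic: String, payload: String) // data event

final class OnboardingFilter(topicPattern: Pattern) {
  // Stand-in for broadcast state: every parallel instance would hold the same set.
  private val onboarded = mutable.Set.empty[String]

  // Regex subscription: the job reads every topic matching the pattern.
  def subscribes(topic: String): Boolean = topicPattern.matcher(topic).matches()

  // Analogous to processBroadcastElement: update the broadcast state.
  def processBroadcastElement(e: Onboard): Unit = { onboarded += e.dataset }

  // Analogous to processElement: read the state and keep only onboarded datasets.
  def processElement(r: DataRecord): Option[DataRecord] =
    if (subscribes(r.topic) && onboarded.contains(r.topic)) Some(r) else None
}
```

Onboarding a dataset then becomes a control-plane event rather than a redeploy.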
97. Observations
Incomplete aggregates still happening, but not frequently
Kafka by default is at-least-once delivery
Many independent streams operating at different speeds
98. Fix
Move incomplete aggregate measurement out of the Flink Job and into a downstream system
New system needs to dedupe events… for all time?
Storage will be expensive. Trade-off between confidence and cost-efficiency: KV store or Bloom filter
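The KV-store-or-Bloom-filter trade-off comes down to exactness versus bounded memory. A minimal Scala Bloom filter is sketched below, assuming string event IDs and arbitrary sizing; it derives its bit positions from two `MurmurHash3` seeds (double hashing). It never reports a seen ID as new, but it can occasionally report a new ID as a duplicate, which is presumably the loss of confidence the slide refers to. `BloomFilter` and the ID format in the usage are hypothetical.

```scala
import java.util.BitSet
import scala.util.hashing.MurmurHash3

// Minimal Bloom filter for downstream dedupe (hypothetical sizing).
// No false negatives; false positives grow as the filter fills up.
final class BloomFilter(numBits: Int, numHashes: Int) {
  private val bits = new BitSet(numBits)

  // k bit positions from two MurmurHash3 values (double hashing).
  private def positions(key: String): Seq[Int] = {
    val h1 = MurmurHash3.stringHash(key, 0)
    val h2 = MurmurHash3.stringHash(key, 1)
    (0 until numHashes).map(i => Math.floorMod(h1 + i * h2, numBits))
  }

  def add(key: String): Unit = positions(key).foreach(i => bits.set(i))
  def mightContain(key: String): Boolean = positions(key).forall(i => bits.get(i))

  // True if key is (probably) a duplicate; otherwise records it and returns false.
  def checkAndAdd(key: String): Boolean = {
    val dup = mightContain(key)
    if (!dup) add(key)
    dup
  }
}
```

Memory stays fixed regardless of how long events are retained, whereas an exact KV store grows with every event ID kept "for all time".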
102. Wrap Up
Change Data Capture (CDC) is widely used at Stripe to improve database reliability and scalability
Flink is a critical component in Stripe’s CDC infrastructure that allows us to work with financial streaming data with high data quality guarantees
Aggregating Change Events is relatively straightforward, but the details matter
At what scale? $640B in annual payment volume. Challenging…
Many products, many apps and services, many datasets.
Across many databases of different types (Mongo, MySQL). Multi-region; databases have many shards, which are split as volume grows.
Watermarks are per partition, not per key. Perhaps note that this was an upstream issue; nonetheless, it could have surfaced earlier by testing with late events.
Watermark = minimum across all parallel source instances
Keys can go to the same partition: one key could be late while another is not. The watermark will still progress, so the timeout fires and produces an incomplete aggregate. When the late key’s events come in, they are treated as an incomplete aggregate again.
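The scenario in the note above can be made concrete with a small Flink-free sketch. It assumes one watermark per partition, computed as the highest event time seen minus a fixed delay (a simplification of bounded-out-of-orderness watermarks), and a per-key timeout measured from each key's last event. `PartitionEvent` and `PerPartitionWatermark` are hypothetical names.

```scala
// Hypothetical event on a partition shared by several transaction keys.
final case class PartitionEvent(key: String, eventTime: Long)

object PerPartitionWatermark {
  // One watermark for the whole partition: newest event time minus a delay.
  def watermarkAfter(events: Seq[PartitionEvent], delayMs: Long): Long =
    events.map(_.eventTime).max - delayMs

  // Keys whose timeout (last event time + timeoutMs) has been passed by the watermark.
  def timedOutKeys(lastSeen: Map[String, Long], timeoutMs: Long, watermark: Long): Set[String] =
    lastSeen.collect { case (k, t) if t + timeoutMs <= watermark => k }.toSet
}
```

Because `txn-a` keeps pushing the partition's watermark forward, `txn-b` times out with an incomplete aggregate even though its remaining events may still be on the way.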
Connect with broadcast stream.
processElement -> check broadcast state
processBroadcastElement -> update state
Union or join. Streams are independent, and any one stream can have duplicates. A duplicate will result in an incomplete aggregate for that key, unless all streams have the same number of duplicates for that key, which is unlikely.
Imagine an aggregate was just completed for a key. Then a duplicate arrives, and the event sits in state until it times out.