This document discusses using Apache Kafka for event streaming and summarizes some key challenges and solutions. It addresses how to get data into Kafka, ensure completeness of events from multiple data sources, handle asynchronous workflows, detect absence of expected events, and track complex workflows with a routing slip pattern.
16-17. Change Data Capture (CDC) - Debezium with MySQL
@jbfletch_
● Inserts: op_type: I, after struct
● Updates: op_type: U, before and after structs
● Deletes: op_type: D, before struct, plus an optional tombstone message
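For illustration, an update change event in the shape the bullets describe might look roughly like this (the field layout is an assumption based on the slide; stock Debezium envelopes differ in detail, e.g. they use an `op` field with values like `c`/`u`/`d` rather than `op_type`):

```json
{
  "op_type": "U",
  "before": { "order_id": 42, "status": "PLACED" },
  "after":  { "order_id": 42, "status": "FILLED" }
}
```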
19. Problem 2: Moving to Async from Existing Synchronous Workflows
RED FLAGS!
● How do I delay my data by an hour?
● Is it bad to have 14 retry topics?
28. Embrace Completion Criteria
● Order Placed event needs header and items
● Order Filled event needs header, items, and stock details
● Order Shipped event needs header, items, stock, address, and shipping details
(Diagram: these pieces are assembled from the mainframe plus other sources)
29. Embrace Completion Criteria
1. Define the information you need for completeness.
2. Enumerate and map those sources.
3. Employ a known best practice to stop using time and start using completion.
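The three steps above can be modeled as a simple predicate: each event type declares the pieces of information it needs, and the event is "complete" only when every required source has been seen, regardless of how long that takes. This is a hedged sketch in plain Java; the event names come from the earlier slide, but the field names and the `CompletionCriteria` class itself are illustrative.

```java
import java.util.Map;
import java.util.Set;

// Sketch of "completion criteria": each event type declares the sources it
// needs before it may be emitted. Names are illustrative, not from the deck's code.
class CompletionCriteria {
    static final Map<String, Set<String>> REQUIRED = Map.of(
        "OrderPlaced",  Set.of("header", "items"),
        "OrderFilled",  Set.of("header", "items", "stock"),
        "OrderShipped", Set.of("header", "items", "stock", "address", "shipping"));

    // Complete only when every required source has arrived -- completion, not time.
    static boolean isComplete(String eventType, Set<String> received) {
        return received.containsAll(REQUIRED.get(eventType));
    }
}
```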
31. Change Data Capture (CDC)...Multiple Tables, Uh Oh..
What do I do when my commit is across multiple tables? How do I handle transactions?
34. Becoming an event profiler
1. Trigger the event in the source system.
2. Review the CDC messages generated during the event.
3. Find the event fingerprint that signifies completeness.
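The profiling loop above ends with a "fingerprint": the set of CDC messages that a business event reliably produces. A minimal sketch of checking captured CDC traffic against such a fingerprint, assuming hypothetical table names and a made-up `table:op_type` encoding:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative event-profiling check: after triggering the event in the source
// system, collect the CDC messages it generated and compare them against the
// fingerprint that signifies completeness. The fingerprint here is invented.
class EventProfiler {
    // An "order created" event is deemed complete once inserts have been
    // observed on both the header and item tables.
    static final Set<String> ORDER_CREATED_FINGERPRINT =
        Set.of("ORDER_HEADER:I", "ORDER_ITEM:I");

    static boolean matchesFingerprint(List<String> cdcMessages) {
        return new HashSet<>(cdcMessages).containsAll(ORDER_CREATED_FINGERPRINT);
    }
}
```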
40. Plan of attack
We need both of these things to be true before we fire the event:
Order Event Aggregate: Order # 42, Total # Items: 3
41. How do we wait for all items to arrive?
42. What is the minimum information we need to know to be able to determine event completeness for items? The total number of items per order.
44. Basic Central Event Service Setup
Filter using the event profile op_type = "I" to create:
1. KTable<OrderId, Order> orderTableKeyOrderId <- orders that match our order created event profile
2. KStream<ItemId, Items> itemsKeyedByItemIdStream <- items that match our order created event profile
3. KTable<OrderId, NumberOfItems> totalNumberofItemsTable <- number of items in each order
45. groupBy + aggregate + join + filter

KTable<OrderId, ArrayList<Items>> preItemsTable =
  itemsKeyedByItemIdStream              // KStream<ItemId, Items>
    .groupBy(ORDER_ID)                  // <- OrderId: 42
    .aggregate(ArrayList::new, add(Items), return null for tombstones)
    .join(totalNumberofItemsTable);     // <- itemCount: 3

The optional join allows the total number of order items to be propagated to each item.
48. groupBy + aggregate + join + filter

KTable<OrderId, ArrayList<Items>> fullItemsTable =
  preItemsTable
    .filter((k, v) -> v.size() == v.get(0).get("itemCount").asInt());
    // <- this filter will block until all 3 item lines are in the array
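The aggregate-then-filter step can be modeled without Kafka at all: items for an order pile up in a list, and the aggregate is released only once the list reaches the itemCount that the earlier join stamped onto each item. This plain-Java sketch is a stand-in for the topology, with invented names:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the groupBy/aggregate/filter pattern: accumulate items
// per order, and "pass the filter" only when all expected item lines are present.
class ItemAggregator {
    private final Map<String, List<String>> store = new HashMap<>();

    // Returns the full item list the first time the order becomes complete,
    // and null otherwise -- mirroring a filter that blocks incomplete aggregates.
    List<String> add(String orderId, int itemCount, String sku) {
        List<String> items = store.computeIfAbsent(orderId, k -> new ArrayList<>());
        items.add(sku);
        return items.size() == itemCount ? items : null;
    }
}
```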
49. Part Deux
How can we be sure the order message has arrived?
Order Event Aggregate: Order # 42, Total # Items: 3
50. Thou shall not pass!!! Using inner joins

KTable fullOrderTable =
  orderTableKeyOrderId
    .join(fullItemsTable,        // <- only contains orders with all items arrived
      (orderNode, itemNodes) -> {
        // construct and return the order placed event aggregate
      });
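The point of the inner join is that nothing is emitted until both sides exist: the order header and the completed item list. A minimal model of that gate in plain Java (class and event names are illustrative, not the deck's code):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Model of inner-join semantics: the order event aggregate is produced only
// when both the order header and the full item list have arrived,
// whichever side shows up last.
class OrderJoiner {
    private final Map<String, String> orders = new HashMap<>();
    private final Map<String, List<String>> fullItems = new HashMap<>();

    String onOrder(String orderId, String header) {
        orders.put(orderId, header);
        return tryJoin(orderId);
    }

    String onFullItems(String orderId, List<String> items) {
        fullItems.put(orderId, items);
        return tryJoin(orderId);
    }

    // Inner join: null (no output) until both sides are present for the key.
    private String tryJoin(String orderId) {
        if (!orders.containsKey(orderId) || !fullItems.containsKey(orderId)) return null;
        return "OrderPlaced{" + orders.get(orderId) + ", items=" + fullItems.get(orderId) + "}";
    }
}
```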
53. Absence of an Event Detection with Kafka Streams
Time exceeded? -> Order Header Delay Event / Order Items Delay Event
The Processor API (PAPI) can be used to evaluate how long an entry has existed in a state store; if that exceeds the tolerance you set, an event can be emitted.
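The idea can be sketched without the actual Processor API: stamp each entry when it lands in the "state store", and let a periodic sweep (standing in for a scheduled PAPI punctuator) flag anything that has waited longer than the tolerance. A plain-Java model under those assumptions:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Model of absence-of-event detection: pending keys are timestamped on arrival,
// removed on completion, and a periodic sweep emits a delay signal for any key
// that has waited longer than the configured tolerance.
class AbsenceDetector {
    private final Map<String, Long> arrivalTimes = new HashMap<>();
    private final long toleranceMs;

    AbsenceDetector(long toleranceMs) { this.toleranceMs = toleranceMs; }

    void put(String key, long nowMs) { arrivalTimes.put(key, nowMs); }   // entry arrives
    void complete(String key) { arrivalTimes.remove(key); }              // expected event arrived

    // Stand-in for a scheduled punctuator: returns keys whose wait exceeds the tolerance.
    List<String> punctuate(long nowMs) {
        List<String> delayed = new ArrayList<>();
        arrivalTimes.forEach((k, t) -> { if (nowMs - t > toleranceMs) delayed.add(k); });
        return delayed;
    }
}
```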