This document covers core stream processing concepts (events, event time vs. processing time, sources and sinks) and the main stream operations: aggregation, windowing (tumbling, hopping, session, sliding) and joining. It includes Kafka Streams examples for counting, windowed counting and joining streams.
12. Batch Processing
Batch processing is the processing of blocks of (bounded) data that have already
been stored over a period of time.
Regards to Nir :)
13. Stream Processing
What is a Stream? An unbounded (infinite) data flow.
“a type of data processing that is designed with infinite data sets in mind”
Tyler Akidau, Google
So what is Stream Processing?
16. Let’s cover a few terms we need before diving into stream use cases and implementations
● Event -
from the Oxford dictionary: “A thing that happens or takes place”
in computer systems: “an action or occurrence recognized by software, often originating
asynchronously from the external environment, that may be handled by the software”.
Usually, an event carries additional data describing its state at the time it occurred
● Event Time - the time at which the event actually occurred
● Processing Time - the time at which the event is observed in the system
● Wall clock time - the time that a clock on the wall (or a stopwatch in hand) would
measure as having elapsed between the start of the process and 'now'
● Upstream - the stream processor the current stream comes from
● Downstream - the stream processor the current stream goes to
● Source - the structure (data source) to get data from (in Kafka Streams, a topic)
● Sink - the destination structure to send data to (in Kafka Streams, a topic)
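For example, a minimal Kafka Streams topology wires a source topic straight to a sink topic. A sketch in the snippet style of the following slides (the topic names "events-in" and "events-out" are hypothetical):
// read every record from the source topic and forward it, unchanged, to the sink topic
final StreamsBuilder builder = new StreamsBuilder();
builder.stream("events-in", Consumed.with(Serdes.String(), Serdes.String()))
.to("events-out", Produced.with(Serdes.String(), Serdes.String()));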
17. So what operations could be done with a stream? Aggregation
count
sum, min, max, ...
reduce
aggregate (agg)
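For example, sum can be expressed as a reduce over a grouped stream. A sketch (the topic names "numbers" and "sums" are hypothetical; the same-era Serialized API is assumed):
// keep a running total per key by adding every incoming value to the aggregate
builder.stream("numbers", Consumed.with(Serdes.String(), Serdes.Integer()))
.groupByKey(Serialized.with(Serdes.String(), Serdes.Integer()))
.reduce((total, value) -> total + value)
.toStream()
.to("sums", Produced.with(Serdes.String(), Serdes.Integer()));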
18. So what operations could be done with a stream? Aggregation
// count the number of events per key
final StreamsBuilder builder = new StreamsBuilder();
builder.stream("topic-from", Consumed.with(Serdes.String(), Serdes.Integer()))
.peek((k, v) -> log.info("get event {} with key {}", v, k))
.groupByKey(Serialized.with(Serdes.String(), Serdes.Integer()))
.count()
.toStream()
.peek((k, v) -> log.info("produce value {} with key {}", v, k))
.to("topic-to", Produced.with(Serdes.String(), Serdes.Long()));
// the same pipeline, but this time produce only keys seen more than once
final StreamsBuilder builder = new StreamsBuilder();
builder.stream("topic-from", Consumed.with(Serdes.String(), Serdes.Integer()))
.peek((k, v) -> log.info("get event {} with key {}", v, k))
.groupByKey(Serialized.with(Serdes.String(), Serdes.Integer()))
.count()
.filter((k, v) -> v > 1)
.toStream()
.peek((k, v) -> log.info("produce value {} for key {}", v, k))
.to("topic-to", Produced.with(Serdes.String(), Serdes.Long()));
Input: s1->1, s2->2, s1->1, s3->3, s1->1
Output (first example, no filter):
2019-06-01 12:10:35.499 INFO 19671 ..... : get event 1 with key s1
2019-06-01 12:10:35.510 INFO 19671 ..... : produce value 1 with key s1
2019-06-01 12:10:35.525 INFO 19671 ..... : get event 2 with key s2
2019-06-01 12:10:35.526 INFO 19671 ..... : produce value 1 with key s2
2019-06-01 12:10:35.527 INFO 19671 ..... : get event 1 with key s1
2019-06-01 12:10:35.527 INFO 19671 ..... : produce value 2 with key s1
2019-06-01 12:10:35.527 INFO 19671 ..... : get event 3 with key s3
2019-06-01 12:10:35.528 INFO 19671 ..... : produce value 1 with key s3
2019-06-01 12:10:35.528 INFO 19671 ..... : get event 1 with key s1
2019-06-01 12:10:35.528 INFO 19671 ..... : produce value 3 with key s1
Output (second example, with the filter v > 1):
2019-06-01 12:10:35.499 INFO 19671 ..... : get event 1 with key s1
2019-06-01 12:10:35.525 INFO 19671 ..... : get event 2 with key s2
2019-06-01 12:10:35.527 INFO 19671 ..... : get event 1 with key s1
2019-06-01 12:10:35.527 INFO 19671 ..... : produce value 2 with key s1
2019-06-01 12:10:35.527 INFO 19671 ..... : get event 3 with key s3
2019-06-01 12:10:35.528 INFO 19671 ..... : get event 1 with key s1
2019-06-01 12:10:35.528 INFO 19671 ..... : produce value 3 with key s1
The changelog topic automatically created by Kafka Streams for the aggregation state store:
counter-stream-KSTREAM-AGGREGATE-STATE-STORE-0000000002-changelog
19. So what operations could be done with a stream? Windowing
Static (Tumbling) Window - a fixed-length window that repeats at a non-overlapping interval.
Every record appears in exactly one window (only once)
20. So what operations could be done with a stream? Windowing
final StreamsBuilder builder = new StreamsBuilder();
builder
.stream("counter-topic", Consumed.with(Serdes.String(), Serdes.Integer()))
.peek((key, value) -> log.info("received {}", key))
.groupByKey(Serialized.with(Serdes.String(), Serdes.Integer()))
.windowedBy(TimeWindows.of(TimeUnit.SECONDS.toMillis(5)))
.count()
// key is now Windowed<String>, value is Long
.toStream((key, value) -> {
log.info("{} - {}",
new Date(key.window().start()), new Date(key.window().end()));
return key.key();
})
.filter((k, v) -> v > 1)
.peek((k, v) -> log.info("produce value {} for key {}", v, k))
.to("counter-topic-to", Produced.with(Serdes.String(), Serdes.Long()));
Output:
..... ..... : Sat Jun 01 19:01:40 2019 - Sat Jun 01 19:01:45 2019
..... ..... : received s1
..... ..... : received s2
..... ..... : received s1
..... ..... : produce value 2 for key s1
..... ..... : received s3
..... ..... : received s1
..... ..... : produce value 3 for key s1
..... ..... : Sat Jun 01 19:01:50 2019 - Sat Jun 01 19:01:55 2019
..... ..... : received s1
..... ..... : received s2
..... ..... : received s1
..... ..... : produce value 2 for key s1
..... ..... : received s3
..... ..... : received s1
..... ..... : produce value 3 for key s1
..... ..... : Sat Jun 01 19:01:55 2019 - Sat Jun 01 19:02:00 2019
..... ..... : received s1
..... ..... : received s2
..... ..... : received s1
..... ..... : produce value 2 for key s1
..... ..... : received s3
..... ..... : received s1
..... ..... : produce value 3 for key s1
21. So what operations could be done with a stream? Windowing
Hopping Window - similar to a tumbling window, but hopping windows generally have an
overlapping interval: one record can appear in more than one window
...
.windowedBy(
TimeWindows.of(TimeUnit.SECONDS.toMillis(5))
.advanceBy(TimeUnit.SECONDS.toMillis(1))
)
...
final StreamsBuilder builder = new StreamsBuilder();
builder
.stream("counter-topic", Consumed.with(Serdes.String(), Serdes.Integer()))
.peek((key, value) -> log.info("received {}", key))
.groupByKey(Serialized.with(Serdes.String(), Serdes.Integer()))
.windowedBy(
TimeWindows.of(TimeUnit.SECONDS.toMillis(3))
.advanceBy(TimeUnit.SECONDS.toMillis(1))
)
.count()
// key is now Windowed<String>, value is Long
.toStream((key, value) -> {
log.info("{} - {}",
new Date(key.window().start()), new Date(key.window().end()));
return key.key();
})
.filter((k, v) -> v > 1)
.peek((k, v) -> log.info("produce value {} for key {}", v, k))
.to("counter-topic-to", Produced.with(Serdes.String(), Serdes.Long()));
22. So what operations could be done with a stream? Windowing
Session Window - a special type of window that captures a period of activity in the data,
terminated by a gap of inactivity.
...
.windowedBy(SessionWindows.with(TimeUnit.SECONDS.toMillis(5)))
...
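A fuller sketch (hypothetical topic names), counting events per key within a session that closes after 5 seconds of inactivity:
// count events per key per session; a new session starts after a 5-second gap of inactivity
builder.stream("clicks", Consumed.with(Serdes.String(), Serdes.Integer()))
.groupByKey(Serialized.with(Serdes.String(), Serdes.Integer()))
.windowedBy(SessionWindows.with(TimeUnit.SECONDS.toMillis(5)))
.count()
// drop the window part of the key before producing
.toStream((windowedKey, count) -> windowedKey.key())
.to("clicks-per-session", Produced.with(Serdes.String(), Serdes.Long()));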
23. So what operations could be done with a stream? Windowing - Join
A sliding window, as opposed to a tumbling window, slides over the stream of data, so
sliding windows can overlap.
24. So what operations could be done with a stream? KTable
KTable - a table representation of a stream (backed locally by the RocksDB key-value store)
How to create a KTable: any aggregate or reduce operation returns a KTable:
stream.groupByKey().reduce((aggValue, newValue) -> newValue, Materialized.with(Serdes.String(), new
JSONSerde<>(MyObject.class)));
Or simply create one from the stream builder:
builder.table("my-topic", Consumed.with(Serdes.String(), new JSONSerde<>(MyObject.class)));
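A common use of a KTable is enriching a stream by key. A hedged sketch (the topic names and the String event type are illustrative; JSONSerde and MyObject as above):
// join each event with the latest state stored for its key in the KTable
KTable<String, MyObject> table =
builder.table("state-topic", Consumed.with(Serdes.String(), new JSONSerde<>(MyObject.class)));
builder.stream("events-topic", Consumed.with(Serdes.String(), Serdes.String()))
.join(table, (event, state) -> event + " / " + state,
Joined.with(Serdes.String(), Serdes.String(), new JSONSerde<>(MyObject.class)))
.to("enriched-events", Produced.with(Serdes.String(), Serdes.String()));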
25. So what operations could be done with a stream? Join
Let’s look at a use case: we send messages and, shortly after sending, receive their statuses asynchronously.
First we need to define streams to receive data from the "sent-messages" and "status-messages" topics:
KStream<String, SentMessage> sentMessageStreamBuilder = builder.stream("sent-messages",
Consumed.with(Serdes.String(), new JSONSerde<>(SentMessage.class))
);
KStream<String, MessageStatus> statusMessageStreamBuilder = builder.stream("status-messages",
Consumed.with(Serdes.String(), new JSONSerde<>(MessageStatus.class))
);
We want to define (a business decision) how long we should wait for DRs (delivery reports): 1 hour, 6 hours, ...
KStream<String, MessageStatus> joinedStream = sentMessageStreamBuilder.join(
statusMessageStreamBuilder,
(sent, dr) -> dr.toBuilder().id(sent.getId()).build(),
JoinWindows.of(TimeUnit.SECONDS.toMillis(60L)),
Joined.with(Serdes.String(), new JSONSerde<>(SentMessage.class), new JSONSerde<>(MessageStatus.class))
);
26. So what operations could be done with a stream? Join
So we received the following messages:
message SentMessage(id=1d51a90a-fb90-4988-9636-141d43ba5865, providerId=119,
extMessageId=cc3e48f6-e641-43d6-a30e-2bbd1a33bc02, from=972544406, to=972544306, status=SENT,
statusTime=1559893509893, sentTime=1559893509888, order=6)
message SentMessage(id=4922c6dc-c3f3-44ee-b0c9-e66fa71e39e6, providerId=31,
extMessageId=16f409b4-7dac-4625-9730-80a3523a5962, from=972544403, to=972544303, status=SENT,
statusTime=1559893509893, sentTime=1559893509888, order=3)
message SentMessage(id=aa501317-a1c1-43e5-92c4-c0549b9a30df, providerId=63,
extMessageId=17f54e6f-df15-45ee-859f-48029a3d81d5, from=972544407, to=972544307, status=SENT, statusTime=1559893509893,
sentTime=1559893509888, order=7)
message SentMessage(id=9aed49ff-ceb4-4f80-a3f9-95a6e35400fb, providerId=7,
extMessageId=721cae9a-b102-4a05-bf32-035e10ce098f, from=972544402, to=972544302, status=SENT,
statusTime=1559893509893, sentTime=1559893509888, order=2)
message SentMessage(id=175f47bc-9fd5-49a5-bfa6-52c66253e3d0, providerId=44,
extMessageId=d2a8e6d2-e0db-44b2-b5b8-08ab3f235010, from=972544405, to=972544305, status=SENT,
statusTime=1559893509893, sentTime=1559893509888, order=5)
message SentMessage(id=ed78978b-f719-4699-9d70-5673f57ba59d, providerId=45,
extMessageId=1dfbbfeb-baa7-445a-8c5d-f950bd051c95, from=972544401, to=972544301, status=SENT, statusTime=1559893509893,
sentTime=1559893509888, order=1)
message SentMessage(id=8653b530-ac66-46a4-aaf4-fe3140addcd2, providerId=113,
extMessageId=592f22ed-eb14-4015-be08-8614a44768e8, from=972544409, to=972544309, status=SENT,
statusTime=1559893509893, sentTime=1559893509888, order=9)
27. So what operations could be done with a stream? Join
We received the following statuses:
status MessageStatus(id=4922c6dc-c3f3-44ee-b0c9-e66fa71e39e6, providerId=31, from=972544403,
to=972544303, extMessageId=16f409b4-7dac-4625-9730-80a3523a5962, status=DELIVERED,
statusTime=1559893509908)
status MessageStatus(id=aa501317-a1c1-43e5-92c4-c0549b9a30df, providerId=63, from=972544407,
to=972544307, extMessageId=17f54e6f-df15-45ee-859f-48029a3d81d5, status=DELIVERED,
statusTime=1559893509908)
status MessageStatus(id=9aed49ff-ceb4-4f80-a3f9-95a6e35400fb, providerId=7, from=972544402, to=972544302,
extMessageId=721cae9a-b102-4a05-bf32-035e10ce098f, status=DELIVERED, statusTime=1559893509908)
status MessageStatus(id=175f47bc-9fd5-49a5-bfa6-52c66253e3d0, providerId=44, from=972544405,
to=972544305, extMessageId=d2a8e6d2-e0db-44b2-b5b8-08ab3f235010, status=DELIVERED,
statusTime=1559893509908)
status MessageStatus(id=ed78978b-f719-4699-9d70-5673f57ba59d, providerId=45, from=972544401,
to=972544301, extMessageId=1dfbbfeb-baa7-445a-8c5d-f950bd051c95, status=DELIVERED,
statusTime=1559893509908)
status MessageStatus(id=8653b530-ac66-46a4-aaf4-fe3140addcd2, providerId=113, from=972544409,
to=972544309, extMessageId=592f22ed-eb14-4015-be08-8614a44768e8, status=DELIVERED,
statusTime=1559893509908)
28. So what operations could be done with a stream? Join
And the result of the join operation:
final status for MessageStatus(id=4922c6dc-c3f3-44ee-b0c9-e66fa71e39e6, providerId=31, from=972544403,
to=972544303, extMessageId=16f409b4-7dac-4625-9730-80a3523a5962, status=DELIVERED,
statusTime=1559893509908)
final status for MessageStatus(id=aa501317-a1c1-43e5-92c4-c0549b9a30df, providerId=63, from=972544407,
to=972544307, extMessageId=17f54e6f-df15-45ee-859f-48029a3d81d5, status=DELIVERED,
statusTime=1559893509908)
final status for MessageStatus(id=9aed49ff-ceb4-4f80-a3f9-95a6e35400fb, providerId=7, from=972544402,
to=972544302, extMessageId=721cae9a-b102-4a05-bf32-035e10ce098f, status=DELIVERED,
statusTime=1559893509908)
final status for MessageStatus(id=175f47bc-9fd5-49a5-bfa6-52c66253e3d0, providerId=44, from=972544405,
to=972544305, extMessageId=d2a8e6d2-e0db-44b2-b5b8-08ab3f235010, status=DELIVERED,
statusTime=1559893509908)
final status for MessageStatus(id=ed78978b-f719-4699-9d70-5673f57ba59d, providerId=45, from=972544401,
to=972544301, extMessageId=1dfbbfeb-baa7-445a-8c5d-f950bd051c95, status=DELIVERED,
statusTime=1559893509908)
final status for MessageStatus(id=8653b530-ac66-46a4-aaf4-fe3140addcd2, providerId=113, from=972544409,
to=972544309, extMessageId=592f22ed-eb14-4015-be08-8614a44768e8, status=DELIVERED,
statusTime=1559893509908)
29. Streaming 101: The world beyond batch
Streaming 102: The world beyond batch
Kafka Streams’ Take on Watermarks and Triggers
Introducing Kafka Streams: Stream Processing Made Simple
Big Data Battle : Batch Processing vs Stream Processing
Taming IoT Data: Making Sense of Sensors with SQL Streaming by Hans-Peter Grahsl
Developing Event-Driven Microservices with Event Sourcing and CQRS
Data Stream Processing Concepts and Implementations by Matthias Niehoff
Applying Reactive Programming with Rx
A pattern language for microservices
Kafka Streams - Not Looking at Facebook
Leveraging the Power of a Database Unbundled
Enabling Exactly Once in Kafka Streams
The Event Streaming Platform Explained (For Technical Leaders and Executives)
Sliding Vs Tumbling Windows
Kafka Streams: Streams DSL
Window Functions in Stream Analytics
Streams vs Serverless: Friend or Foe? by Ben Stopford
Introducing Stream Windows in Apache Flink
Flink Streaming - Tumbling and Sliding Windows
Preview of Kafka Streams
Kafka Streams – A First Impression
Kafka Stream Playground Github
Editor’s notes
There are a lot of different solutions for stream processing (mostly from the Apache Foundation)
streaming and batching
evolving architecture
For example, processing all the transactions that have been performed by a major financial firm in a week. This data contains millions of records per day that can be stored as a file, records, etc. The file undergoes processing at the end of the day for the various analyses the firm wants to do, and obviously processing it takes a large amount of time. That is what batch processing is: the program doesn’t react to incoming events, but processes a bounded (and huge) set of data.
We have a similar ETL process for reports in Charlie (implemented inside Oracle with jobs and queries)
“infinite” - we can’t bound the data by a “start” and an “end”, since we have no historical knowledge about the data (opposite to batch, where we can find data boundaries by time or some other parameter)
It can be confusing, since we could call almost any system “streaming”, but let’s divide it into an ingestion stage, where we want to return OK to the customer/system as fast as possible, and the inner propagation of events inside the system. Inside our system we want to react to ingested data in “near real time” and deliver (transform) data to some endpoint (external or internal). For example, when we receive a message, we want to send it to its destination as fast as possible; or, in ad-tech, when we receive a click on some item, we want to find an advertisement to show to the user before he leaves the page (and without affecting page load time).
In streaming we work with discrete elements (events). Sometimes we wish to group them by time - this is called windowing and we’ll talk about it a little bit later; sometimes we want to aggregate (count) them, again during some period of time (since we are “unbounded”, NOT since the beginning of time); and sometimes we simply deliver them further along our system pipeline (pipeline is actually a good word :) )
Another thing about streaming is, in my opinion, resources. Nowadays most systems use shared resources (CPU, memory and so on). We don’t want to take and hold resources when we don’t need them (yes, yes, non-blocking IO and other nice phrases), and this is applicable to the whole system architecture. It means we want to start working only when some “event” occurs. And that takes me to another streaming definition (if not by name, at least by result): event-driven design. We want to react to an event, do the minimum required work, and send the task/event/job to another process which takes care of the next stage, and so on. It means that we stream events, transform events into other events, compare them to some data and/or enrich them, and store them for future processing (in a SQL DB, NoSQL DB, messaging queue, file system...)
Event - something that happened in the past and has, at least, a name (and usually some state which describes the event at the SPECIFIC time it occurred), and usually we know what time the event occurred. An event is something that already happened and, unlike in science fiction, it can’t be changed. This makes events a perfect fit for the logic of a read-only append log (anybody say Kafka?). Every stream processor creates a new event (based on the received event), so when a downstream processor (see previous slide) receives an event, it’s already a new one. Events are, by definition, immutable.
Event Time - we’ll talk a little more about this when we reach windowing. In most cases we’ll have problems if we don’t normalize time to some agreed vector. It should be normalized (at one of the stages) to some standard time (GMT/IST, ...)
Processing Time - the time when a specific element (module) processes the event (usually the time when the message is received by the system, Kafka for example, but it could be the time when a specific microservice processes the event)
Wall-clock time - since it’s the time on a specific computer, processing time is usually set by wall-clock time. It’s important to make sure all components in the system have the same wall-clock time (as much as possible), otherwise you’ll get time skew, which can introduce unexpected behavior in the system
Flink, Kafka Streams/KSQL and Spark Streaming
Let’s look at a simple example where we want to count the number of events per key. For every key that appears again, the counter increases
But this is not so useful (usually), so let’s show only those keys that appear more than once
You can see the topic automatically created by Kafka Streams with the aggregated data for every key. So how does Kafka know to aggregate a specific value (after all, Kafka is not a search engine, so why would it be efficient for Kafka to consume from this topic for every incoming event)? We’ll explain this later (KTable)
Let’s start with the simplest type of window: static (tumbling). (I prefer to call it static since its boundaries don’t move and there is no overlapping)
A tumbling window has a fixed length. The next window is placed right after the end of the previous one on the time axis. Tumbling windows do not overlap and span the whole time domain, i.e. each event is assigned to exactly one window.
Let’s send the same input to Kafka, but 3 times with a 7-second interval.
Here is a simple example of a window: we count keys which appear more than once during a window of 5 sec
We can see that the count result from the previous example appears 3 times (we have 3 window ranges from the input) and resets for every window
Here is an example of the Kafka topic which stores the aggregated results, but per window
In the result (sink) topic we see only the final result
Like tumbling windows, hopping windows have a fixed length. However, they introduce a second configuration parameter: the hop size h. Instead of moving the window of length s forward in time by s, we move it by h.
A common use case for hopping windows is moving-average computations (or an irate-like function, as in Prometheus :) )
Hopping windows are usually confused with sliding windows. In Kafka Streams there is a hopping window, but not a sliding one (sliding exists in the join operation)
Show animation at: https://dev.to/frosnerd/window-functions-in-stream-analytics-1m6c
Simple example: when we want to know how many operations a user performed while logged in (in a session), we can do it with a session window: we count every click while he is in the session. When the session expires (we can set the window time to the same value as the session expiration time) and the user logs in again, we start the counter from zero, and so on
We can describe a sliding window as a window which starts when an event with some key arrives.
If you read articles and watch videos about streaming and batching, you’ll be confused by the hopping and sliding window definitions. Since we are looking into Kafka Streams, we follow Kafka’s notion of windowing.
In order to perform a join, the keys in Kafka should be the same (represent the same value: an “id”, for example)
Since data from the streams is actually stored in memory, we should consider how long we want to keep the data in memory.
The last line in the join statement defines how to (de)serialize the data from the topics
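For example (a sketch; the one-hour figure is illustrative), the retention of the join window could be set explicitly with the same-era until API:
// keep join state for 1 hour so late delivery reports can still be matched
JoinWindows.of(TimeUnit.SECONDS.toMillis(60L))
.until(TimeUnit.HOURS.toMillis(1L));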
So what happened in our application during the join:
During the join, Kafka Streams created 2 KTables:
● a sent-message KTable
● a status-message KTable
When a message comes to the sent-message stream, it:
● checks whether the status-message KTable contains a record for this key whose time fits the defined join window (actually, in the KTable the key is a compound key of the original key and the window, where the window start is the time the event was received and the end is start + the window size/length)
● inserts a record into the sent-message KTable with the key: original key + window, based on the event-received time and the window size/length
When a message comes to the status-message stream, the same happens, but in the opposite direction
Questions: expiration, compaction, application crash and so on….