coolmeen, grinfeld
event driven design
event sourcing and cqrs
reactive programming
stream processing
serverless?
reactive programming serverless?
event driven design
what is not stream processing
what is stream processing
NOT Event Storming
The Big Bang of Technology and Tools
Death Star Architecture
streaming vs batching
Batch Processing
Batch processing is the processing of blocks of (bounded) data that have already been
stored over a period of time.
(regards to Nir :)
Stream Processing
What is a stream? An unbounded (infinite) flow of data
“a type of data processing that is designed with infinite data sets in mind”
Tyler Akidau from Google
So what is Stream Processing?
Let’s cover a few terms we need before diving into stream use cases and implementations
● Event -
from the Oxford dictionary: “A thing that happens or takes place”
in computer systems: “an action or occurrence recognized by software, often originating
asynchronously from the external environment, that may be handled by the software”.
Usually an event carries additional data about its state at the time it occurred
● Event Time - the time at which an event actually occurred
● Processing Time - the time at which an event is observed in the system
● Wall Clock Time - the time that a clock on the wall (or a stopwatch in hand) would
measure as having elapsed between the start of the process and 'now'
● Upstream - the stream processor that the current stream comes from
● Downstream - the stream processor that the current stream goes to
● Source - the structure (data source) to read data from (in Kafka Streams, a topic)
● Sink - the destination structure to write data to (in Kafka Streams, a topic)
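The gap between event time and processing time is easiest to see in code. Below is a minimal plain-Java sketch (no Kafka involved; the Event record is hypothetical) of an event that occurred earlier than it is observed:

```java
import java.time.Instant;

public class EventTimeDemo {
    // Hypothetical event carrying its own occurrence timestamp (event time)
    record Event(String key, int value, Instant eventTime) {}

    public static void main(String[] args) {
        // The event occurred 30 seconds ago (event time)...
        Event e = new Event("s1", 1, Instant.now().minusSeconds(30));
        // ...but is only observed now (processing time)
        Instant processingTime = Instant.now();
        long lagSeconds = processingTime.getEpochSecond() - e.eventTime().getEpochSecond();
        System.out.println("lag between event time and processing time: " + lagSeconds + "s");
    }
}
```

In a healthy pipeline this lag is small; under backpressure or replay it can grow to hours, which is why windowing decisions below must say which notion of time they use.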
So what operations could be done with a stream? Aggregation
count
sum, min, max, ...
reduce
aggregate (agg)
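These operations differ only in how the per-key accumulator is updated. As a plain-Java analogue (not Kafka Streams code), this is the per-key state that something like `stream.groupByKey().reduce(Integer::sum)` would maintain:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReduceDemo {
    // Per-key reduce: merge each incoming value into the running accumulator,
    // analogous to what groupByKey().reduce(Integer::sum) keeps in its state store
    static Map<String, Integer> reduceByKey(String[] keys, int[] values) {
        Map<String, Integer> store = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            store.merge(keys[i], values[i], Integer::sum);
        }
        return store;
    }

    public static void main(String[] args) {
        // Same input as the count example below: s1->1, s2->2, s1->1, s3->3, s1->1
        Map<String, Integer> out = reduceByKey(
                new String[]{"s1", "s2", "s1", "s3", "s1"},
                new int[]{1, 2, 1, 3, 1});
        System.out.println(out); // {s1=3, s2=2, s3=3}
    }
}
```

`count` is the same pattern with an accumulator of `+1` per record, and `aggregate` generalizes it by letting the accumulator have a different type than the values.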
So what operations could be done with a stream?
final StreamsBuilder builder = new StreamsBuilder();
builder.stream("topic-from", Consumed.with(Serdes.String(), Serdes.Integer()))
    .peek((k,v) -> log.info("get event {} with key {}", v, k))
    .groupByKey(Serialized.with(Serdes.String(), Serdes.Integer()))
    .count()
    .toStream()
    .peek((k,v) -> log.info("produce value {} with key {}", v, k))
    .to("topic-to", Produced.with(Serdes.String(), Serdes.Long()));

final StreamsBuilder builder = new StreamsBuilder();
builder.stream("topic-from", Consumed.with(Serdes.String(), Serdes.Integer()))
    .peek((k,v) -> log.info("get event {} with key {}", v, k))
    .groupByKey(Serialized.with(Serdes.String(), Serdes.Integer()))
    .count()
    .filter((k,v) -> v > 1)
    .toStream()
    .peek((k,v) -> log.info("produce value {} for key {}", v, k))
    .to("topic-to", Produced.with(Serdes.String(), Serdes.Long()));
Input: s1->1, s2->2, s1->1, s3->3, s1->1
Output:
2019-06-01 12:10:35.499 INFO 19671 ..... : get event 1 with key s1
2019-06-01 12:10:35.510 INFO 19671 ..... : produce value 1 with key s1
2019-06-01 12:10:35.525 INFO 19671 ..... : get event 2 with key s2
2019-06-01 12:10:35.526 INFO 19671 ..... : produce value 1 with key s2
2019-06-01 12:10:35.527 INFO 19671 ..... : get event 1 with key s1
2019-06-01 12:10:35.527 INFO 19671 ..... : produce value 2 with key s1
2019-06-01 12:10:35.527 INFO 19671 ..... : get event 3 with key s3
2019-06-01 12:10:35.528 INFO 19671 ..... : produce value 1 with key s3
2019-06-01 12:10:35.528 INFO 19671 ..... : get event 1 with key s1
2019-06-01 12:10:35.528 INFO 19671 ..... : produce value 3 with key s1
Output:
2019-06-01 12:10:35.499 INFO 19671 ..... : get event 1 with key s1
2019-06-01 12:10:35.525 INFO 19671 ..... : get event 2 with key s2
2019-06-01 12:10:35.527 INFO 19671 ..... : get event 1 with key s1
2019-06-01 12:10:35.527 INFO 19671 ..... : produce value 2 with key s1
2019-06-01 12:10:35.527 INFO 19671 ..... : get event 3 with key s3
2019-06-01 12:10:35.528 INFO 19671 ..... : get event 1 with key s1
2019-06-01 12:10:35.528 INFO 19671 ..... : produce value 3 with key s1
counter-stream-KSTREAM-AGGREGATE-STATE-STORE-0000000002-changelog
Aggregation
So what operations could be done with a stream? Windowing
Static (Tumbling) Window - repeats at a fixed, non-overlapping interval. Every record appears
in exactly one window.
final StreamsBuilder builder = new StreamsBuilder();
builder
    .stream("counter-topic", Consumed.with(Serdes.String(), Serdes.Integer()))
    .peek((key, value) -> log.info("received {}", key))
    .groupByKey(Serialized.with(Serdes.String(), Serdes.Integer()))
    .windowedBy(TimeWindows.of(TimeUnit.SECONDS.toMillis(3)))
    .count()
    // (Windowed<String>, Long)
    .toStream((key, value) -> {
        log.info("{} - {}",
            new Date(key.window().start()), new Date(key.window().end()));
        return key.key();
    })
    .filter((k,v) -> v > 1)
    .peek((k,v) -> log.info("produce value {} for key {}", v, k))
    .to("counter-topic-to", Produced.with(Serdes.String(), Serdes.Long()));
Output:
..... ..... : Sat Jun 01 19:01:40 2019 - Sat Jun 01 19:01:45 2019
..... ..... : received s1
..... ..... : received s2
..... ..... : received s1
..... ..... : produce value 2 for key s1
..... ..... : received s3
..... ..... : received s1
..... ..... : produce value 3 for key s1
..... ..... : Sat Jun 01 19:01:50 2019 - Sat Jun 01 19:01:55 2019
..... ..... : received s1
..... ..... : received s2
..... ..... : received s1
..... ..... : produce value 2 for key s1
..... ..... : received s3
..... ..... : received s1
..... ..... : produce value 3 for key s1
..... ..... : Sat Jun 01 19:01:55 2019 - Sat Jun 01 19:02:00 2019
..... ..... : received s1
..... ..... : received s2
..... ..... : received s1
..... ..... : produce value 2 for key s1
..... ..... : received s3
..... ..... : received s1
..... ..... : produce value 3 for key s1
So what operations could be done with a stream? Windowing
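The window boundaries in the output above follow directly from the tumbling rule: each timestamp lands in exactly one window whose start is the timestamp rounded down to the window size. A plain-Java sketch of that assignment (not Kafka Streams' internal code, but the same epoch-aligned arithmetic):

```java
public class TumblingDemo {
    // A tumbling window of size W assigns each timestamp to exactly one
    // window [start, start + W), where start = ts - (ts % W)
    static long windowStart(long tsMs, long sizeMs) {
        return tsMs - (tsMs % sizeMs);
    }

    public static void main(String[] args) {
        long size = 3000; // 3-second windows, as in the example above
        System.out.println(windowStart(4500, size)); // 3000
        System.out.println(windowStart(2999, size)); // 0
        System.out.println(windowStart(3000, size)); // 3000
    }
}
```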
So what operations could be done with a stream? Windowing
Hopping Window - similar to a tumbling window, but hopping windows generally overlap. One
record can appear in more than one window.
……
.windowedBy(
    TimeWindows.of(TimeUnit.SECONDS.toMillis(5))
        .advanceBy(TimeUnit.SECONDS.toMillis(1))
)
……
final StreamsBuilder builder = new StreamsBuilder();
builder
    .stream("counter-topic", Consumed.with(Serdes.String(), Serdes.Integer()))
    .peek((key, value) -> log.info("received {}", key))
    .groupByKey(Serialized.with(Serdes.String(), Serdes.Integer()))
    .windowedBy(TimeWindows.of(TimeUnit.SECONDS.toMillis(3))
        .advanceBy(TimeUnit.SECONDS.toMillis(1)))
    .count()
    // (Windowed<String>, Long)
    .toStream((key, value) -> {
        log.info("{} - {}",
            new Date(key.window().start()), new Date(key.window().end()));
        return key.key();
    })
    .filter((k,v) -> v > 1)
    .peek((k,v) -> log.info("produce value {} for key {}", v, k))
    .to("counter-topic-to", Produced.with(Serdes.String(), Serdes.Long()));
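With size 3s and advance 1s, each record is counted in three overlapping windows. A plain-Java sketch of which hopping-window starts cover a given timestamp (an illustration of the semantics, not Kafka Streams' internal code):

```java
import java.util.ArrayList;
import java.util.List;

public class HoppingDemo {
    // With window size W and advance (hop) A, windows start at 0, A, 2A, ...;
    // a timestamp belongs to every window [s, s + W) that covers it
    static List<Long> windowStartsFor(long tsMs, long sizeMs, long advanceMs) {
        List<Long> starts = new ArrayList<>();
        for (long s = 0; s <= tsMs; s += advanceMs) {
            if (s + sizeMs > tsMs) starts.add(s); // window [s, s+size) covers ts
        }
        return starts;
    }

    public static void main(String[] args) {
        // Size 3s, advance 1s: a record at t=4s falls into windows starting
        // at 2s, 3s and 4s, i.e. size/advance = 3 windows
        System.out.println(windowStartsFor(4000, 3000, 1000)); // [2000, 3000, 4000]
    }
}
```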
So what operations could be done with a stream? Windowing
Session Window - a special type of window that captures a period of activity in the data,
terminated by a gap of inactivity.
……..
.windowedBy(SessionWindows.with(TimeUnit.SECONDS.toMillis(5)))
……..
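Unlike tumbling and hopping windows, session boundaries are data-driven: a new session starts whenever the inactivity gap is exceeded. A plain-Java sketch of that grouping with a 5-second gap (an illustration of the semantics, not Kafka Streams code):

```java
import java.util.ArrayList;
import java.util.List;

public class SessionDemo {
    // Split sorted per-key event timestamps (ms) into sessions: a new session
    // starts whenever the gap since the previous event exceeds gapMs
    static List<List<Long>> sessions(long[] timestamps, long gapMs) {
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long ts : timestamps) {
            if (!current.isEmpty() && ts - current.get(current.size() - 1) > gapMs) {
                result.add(current);
                current = new ArrayList<>();
            }
            current.add(ts);
        }
        if (!current.isEmpty()) result.add(current);
        return result;
    }

    public static void main(String[] args) {
        // Activity at 0s, 2s, 3s, then silence, then 20s, 22s -> two sessions
        List<List<Long>> s = sessions(new long[]{0, 2000, 3000, 20000, 22000}, 5000);
        System.out.println(s.size() + " sessions: " + s);
    }
}
```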
So what operations could be done with a stream? Windowing - Join
A Sliding Window, as opposed to a tumbling window, slides continuously over the stream of
data. Because of this, sliding windows can overlap. In Kafka Streams, sliding windows back
the join operation (JoinWindows).
So what operations could be done with a stream? KTable
KTable - a table representation of a stream, holding the latest value per key (backed by a
RocksDB key-value store)
How to create a KTable: any aggregate or reduce operation returns a KTable:
stream.groupByKey().reduce((aggValue, newValue) -> newValue, Materialized.with(Serdes.String(), new
JSONSerde<>(MyObject.class)));
Or simply create it from the stream builder:
builder.table("my-topic", Consumed.with(Serdes.String(), new JSONSerde<>(MyObject.class)));
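The defining property of a KTable is upsert semantics: later records for a key overwrite earlier ones, whereas a KStream keeps every record. A plain-Java sketch of the table view that a changelog stream produces (an illustration of the semantics, not the RocksDB-backed store itself):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TableDemo {
    // A KTable materializes a changelog: for each key, last write wins,
    // unlike a KStream, which retains every record
    static Map<String, String> materialize(String[][] changelog) {
        Map<String, String> table = new LinkedHashMap<>();
        for (String[] record : changelog) {
            table.put(record[0], record[1]); // upsert
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, String> t = materialize(new String[][]{
                {"s1", "SENT"}, {"s2", "SENT"}, {"s1", "DELIVERED"}});
        System.out.println(t); // {s1=DELIVERED, s2=SENT}
    }
}
```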
So what operations could be done with a stream? Join
Let’s look at a use case: we send messages and, shortly after sending, asynchronously receive a status for each.
First we need to define streams to receive data from the “sent-messages” and “status-messages” topics:
KStream<String, SentMessage> sentMessageStreamBuilder = builder.stream("sent-messages",
    Consumed.with(Serdes.String(), new JSONSerde<>(SentMessage.class))
);
KStream<String, MessageStatus> statusMessageStreamBuilder = builder.stream("status-messages",
    Consumed.with(Serdes.String(), new JSONSerde<>(MessageStatus.class))
);
We want to define (a business decision) how long we should wait for DRs (1 hour, 6 hours, ...):
KStream<String, MessageStatus> joinedStream = sentMessageStreamBuilder.join(
    statusMessageStreamBuilder,
    (sent, dr) -> dr.toBuilder().id(sent.getId()).build(),
    JoinWindows.of(TimeUnit.SECONDS.toMillis(60L)),
    Joined.with(Serdes.String(), new JSONSerde<>(SentMessage.class), new JSONSerde<>(MessageStatus.class))
);
So what operations could be done with a stream? Join
So we received the following messages:
message SentMessage(id=1d51a90a-fb90-4988-9636-141d43ba5865, providerId=119,
extMessageId=cc3e48f6-e641-43d6-a30e-2bbd1a33bc02, from=972544406, to=972544306, status=SENT,
statusTime=1559893509893, sentTime=1559893509888, order=6)
message SentMessage(id=4922c6dc-c3f3-44ee-b0c9-e66fa71e39e6, providerId=31,
extMessageId=16f409b4-7dac-4625-9730-80a3523a5962, from=972544403, to=972544303, status=SENT,
statusTime=1559893509893, sentTime=1559893509888, order=3)
message SentMessage(id=aa501317-a1c1-43e5-92c4-c0549b9a30df, providerId=63,
extMessageId=17f54e6f-df15-45ee-859f-48029a3d81d5, from=972544407, to=972544307, status=SENT, statusTime=1559893509893,
sentTime=1559893509888, order=7)
message SentMessage(id=9aed49ff-ceb4-4f80-a3f9-95a6e35400fb, providerId=7,
extMessageId=721cae9a-b102-4a05-bf32-035e10ce098f, from=972544402, to=972544302, status=SENT,
statusTime=1559893509893, sentTime=1559893509888, order=2)
message SentMessage(id=175f47bc-9fd5-49a5-bfa6-52c66253e3d0, providerId=44,
extMessageId=d2a8e6d2-e0db-44b2-b5b8-08ab3f235010, from=972544405, to=972544305, status=SENT,
statusTime=1559893509893, sentTime=1559893509888, order=5)
message SentMessage(id=ed78978b-f719-4699-9d70-5673f57ba59d, providerId=45,
extMessageId=1dfbbfeb-baa7-445a-8c5d-f950bd051c95, from=972544401, to=972544301, status=SENT, statusTime=1559893509893,
sentTime=1559893509888, order=1)
message SentMessage(id=8653b530-ac66-46a4-aaf4-fe3140addcd2, providerId=113,
extMessageId=592f22ed-eb14-4015-be08-8614a44768e8, from=972544409, to=972544309, status=SENT,
statusTime=1559893509893, sentTime=1559893509888, order=9)
So what operations could be done with a stream? Join
We received the following statuses:
status MessageStatus(id=4922c6dc-c3f3-44ee-b0c9-e66fa71e39e6, providerId=31, from=972544403,
to=972544303, extMessageId=16f409b4-7dac-4625-9730-80a3523a5962, status=DELIVERED,
statusTime=1559893509908)
status MessageStatus(id=aa501317-a1c1-43e5-92c4-c0549b9a30df, providerId=63, from=972544407,
to=972544307, extMessageId=17f54e6f-df15-45ee-859f-48029a3d81d5, status=DELIVERED,
statusTime=1559893509908)
status MessageStatus(id=9aed49ff-ceb4-4f80-a3f9-95a6e35400fb, providerId=7, from=972544402, to=972544302,
extMessageId=721cae9a-b102-4a05-bf32-035e10ce098f, status=DELIVERED, statusTime=1559893509908)
status MessageStatus(id=175f47bc-9fd5-49a5-bfa6-52c66253e3d0, providerId=44, from=972544405,
to=972544305, extMessageId=d2a8e6d2-e0db-44b2-b5b8-08ab3f235010, status=DELIVERED,
statusTime=1559893509908)
status MessageStatus(id=ed78978b-f719-4699-9d70-5673f57ba59d, providerId=45, from=972544401,
to=972544301, extMessageId=1dfbbfeb-baa7-445a-8c5d-f950bd051c95, status=DELIVERED,
statusTime=1559893509908)
status MessageStatus(id=8653b530-ac66-46a4-aaf4-fe3140addcd2, providerId=113, from=972544409,
to=972544309, extMessageId=592f22ed-eb14-4015-be08-8614a44768e8, status=DELIVERED,
statusTime=1559893509908)
So what operations could be done with a stream? Join
And the result of the join operation:
final status for MessageStatus(id=4922c6dc-c3f3-44ee-b0c9-e66fa71e39e6, providerId=31, from=972544403,
to=972544303, extMessageId=16f409b4-7dac-4625-9730-80a3523a5962, status=DELIVERED,
statusTime=1559893509908)
final status for MessageStatus(id=aa501317-a1c1-43e5-92c4-c0549b9a30df, providerId=63, from=972544407,
to=972544307, extMessageId=17f54e6f-df15-45ee-859f-48029a3d81d5, status=DELIVERED,
statusTime=1559893509908)
final status for MessageStatus(id=9aed49ff-ceb4-4f80-a3f9-95a6e35400fb, providerId=7, from=972544402,
to=972544302, extMessageId=721cae9a-b102-4a05-bf32-035e10ce098f, status=DELIVERED,
statusTime=1559893509908)
final status for MessageStatus(id=175f47bc-9fd5-49a5-bfa6-52c66253e3d0, providerId=44, from=972544405,
to=972544305, extMessageId=d2a8e6d2-e0db-44b2-b5b8-08ab3f235010, status=DELIVERED,
statusTime=1559893509908)
final status for MessageStatus(id=ed78978b-f719-4699-9d70-5673f57ba59d, providerId=45, from=972544401,
to=972544301, extMessageId=1dfbbfeb-baa7-445a-8c5d-f950bd051c95, status=DELIVERED,
statusTime=1559893509908)
final status for MessageStatus(id=8653b530-ac66-46a4-aaf4-fe3140addcd2, providerId=113, from=972544409,
to=972544309, extMessageId=592f22ed-eb14-4015-be08-8614a44768e8, status=DELIVERED,
statusTime=1559893509908)
Streaming 101: The world beyond batch
Streaming 102: The world beyond batch
Kafka Streams’ Take on Watermarks and Triggers
Introducing Kafka Streams: Stream Processing Made Simple
Big Data Battle : Batch Processing vs Stream Processing
Taming IoT Data: Making Sense of Sensors with SQL Streaming by Hans-Peter Grahsl
Developing Event-Driven Microservices with Event Sourcing and CQRS
Data Stream Processing Concepts and Implementations by Matthias Niehoff
Applying Reactive Programming with Rx
A pattern language for microservices
Kafka Streams - Not Looking at Facebook
Leveraging the Power of a Database Unbundled
Enabling Exactly Once in Kafka Streams
The Event Streaming Platform Explained (For Technical Leaders and Executives)
Sliding Vs Tumbling Windows
Kafka Streams: Streams DSL
Window Functions in Stream Analytics
Streams vs Serverless: Friend or Foe? by Ben Stopford
Introducing Stream Windows in Apache Flink
Flink Streaming - Tumbling and Sliding Windows
Preview of Kafka Streams
Kafka Streams – A First Impression
Kafka Stream Playground Github

Stream Processing Fundamentals and Operations

  • 2. event driven design event sourcing and cqrs reactive programming stream processing serverless? coolmeen, grinfeld
  • 3. reactive programming serverless? event driven design
  • 4. what is not stream processing
  • 5. what is stream processing
  • 8. The Big Bang of Technology and Tools
  • 12. Batch Processing Batch processing is the processing of blocks of (bounded) data that have already been stored over a period of time. (Regards to Nir :) )
  • 13. Stream Processing What is a Stream? An unbounded (infinite) data flow. “a type of data processing that is designed with infinite data sets in mind” - Tyler Akidau, Google. So what is Stream Processing?
  • 16. Let’s cover a few terms we need before diving into stream use cases and implementations:
● Event - from the Oxford dictionary: “A thing that happens or takes place”; in computer systems: “an action or occurrence recognized by software, often originating asynchronously from the external environment, that may be handled by the software”. Usually an event carries additional data about its state at the time it occurred
● Event Time - the time at which the event actually occurred
● Processing Time - the time at which the event is observed in the system
● Wall-clock time - the time that a clock on the wall (or a stopwatch in hand) would measure as having elapsed between the start of the process and ‘now’
● Upstream - the stream processor the current stream comes from
● Downstream - the stream processor the current stream goes to
● Source - the source structure (data source) to get data from (in Kafka Streams, a topic)
● Sink - the destination structure (data store) to send data to (in Kafka Streams, a topic)
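Two of these time notions can be illustrated outside any streaming framework. The following plain-Java sketch (not from the deck; names are illustrative) shows the relationship between event time, processing time and the resulting lag:

```java
// Sketch: event time travels with the record; processing time is read from
// the wall clock when the record is observed. Their difference is the lag.
public class EventTimes {

    // Positive lag means the event was observed after it occurred,
    // which is the normal case for unbounded streams.
    static long lagMs(long eventTimeMs, long processingTimeMs) {
        return processingTimeMs - eventTimeMs;
    }

    public static void main(String[] args) {
        long eventTime = 1_559_393_400_000L;     // when the event actually occurred
        long processingTime = eventTime + 1_500; // observed 1.5 seconds later
        System.out.println("lag: " + lagMs(eventTime, processingTime) + " ms"); // lag: 1500 ms
    }
}
```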
  • 17. So what operations can be done with a stream? Aggregation: count; sum, min, max, ...; reduce; aggregate (agg)
  • 18. So what operations can be done with a stream? Aggregation
Count events per key:
final StreamsBuilder builder = new StreamsBuilder();
builder.stream("topic-from", Consumed.with(Serdes.String(), Serdes.Integer()))
    .peek((k, v) -> log.info("get event {} with key {}", v, k))
    .groupByKey(Serialized.with(Serdes.String(), Serdes.Integer()))
    .count()
    .toStream()
    .peek((k, v) -> log.info("produce value {} with key {}", v, k))
    .to("topic-to", Produced.with(Serdes.String(), Serdes.Long()));
The same, but emitting only keys that appear more than once:
final StreamsBuilder builder = new StreamsBuilder();
builder.stream("topic-from", Consumed.with(Serdes.String(), Serdes.Integer()))
    .peek((k, v) -> log.info("get event {} with key {}", v, k))
    .groupByKey(Serialized.with(Serdes.String(), Serdes.Integer()))
    .count()
    .filter((k, v) -> v > 1)
    .toStream()
    .peek((k, v) -> log.info("produce value {} for key {}", v, k))
    .to("topic-to", Produced.with(Serdes.String(), Serdes.Long()));
Input: s1->1, s2->2, s1->1, s3->3, s1->1
Output (without the filter):
2019-06-01 12:10:35.499 INFO 19671 ..... : get event 1 with key s1
2019-06-01 12:10:35.510 INFO 19671 ..... : produce value 1 with key s1
2019-06-01 12:10:35.525 INFO 19671 ..... : get event 2 with key s2
2019-06-01 12:10:35.526 INFO 19671 ..... : produce value 1 with key s2
2019-06-01 12:10:35.527 INFO 19671 ..... : get event 1 with key s1
2019-06-01 12:10:35.527 INFO 19671 ..... : produce value 2 with key s1
2019-06-01 12:10:35.527 INFO 19671 ..... : get event 3 with key s3
2019-06-01 12:10:35.528 INFO 19671 ..... : produce value 1 with key s3
2019-06-01 12:10:35.528 INFO 19671 ..... : get event 1 with key s1
2019-06-01 12:10:35.528 INFO 19671 ..... : produce value 3 with key s1
Output (with the filter):
2019-06-01 12:10:35.499 INFO 19671 ..... : get event 1 with key s1
2019-06-01 12:10:35.525 INFO 19671 ..... : get event 2 with key s2
2019-06-01 12:10:35.527 INFO 19671 ..... : get event 1 with key s1
2019-06-01 12:10:35.527 INFO 19671 ..... : produce value 2 with key s1
2019-06-01 12:10:35.527 INFO 19671 ..... : get event 3 with key s3
2019-06-01 12:10:35.528 INFO 19671 ..... : get event 1 with key s1
2019-06-01 12:10:35.528 INFO 19671 ..... : produce value 3 with key s1
The aggregated state is kept in the changelog topic counter-stream-KSTREAM-AGGREGATE-STATE-STORE-0000000002-changelog
  • 19. So what operations can be done with a stream? Windowing Static (Tumbling) Window - repeats at a non-overlapping interval. Every record appears in only one window (only once)
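The “exactly one window” property follows from simple arithmetic. A minimal sketch (plain Java, not the Kafka Streams API; assumes windows aligned to the epoch):

```java
// Sketch: a tumbling window of size s assigns a timestamp t to exactly one
// window, the one starting at floor(t / s) * s.
public class TumblingWindowSketch {

    static long windowStart(long timestampMs, long windowSizeMs) {
        return (timestampMs / windowSizeMs) * windowSizeMs;
    }

    public static void main(String[] args) {
        long size = 3_000L; // 3-second windows
        System.out.println(windowStart(7_100, size)); // 6000
        System.out.println(windowStart(8_900, size)); // 6000 - same window
        System.out.println(windowStart(9_000, size)); // 9000 - the next window begins
    }
}
```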
  • 20. So what operations can be done with a stream? Windowing
final StreamsBuilder builder = new StreamsBuilder();
builder
    .stream("counter-topic", Consumed.with(Serdes.String(), Serdes.Integer()))
    .peek((key, value) -> log.info("received {}", key))
    .groupByKey(Serialized.with(Serdes.String(), Serdes.Integer()))
    .windowedBy(TimeWindows.of(TimeUnit.SECONDS.toMillis(5)))
    .count() // (Windowed<String>, Long)
    .toStream((key, value) -> {
        log.info("{} - {}", new Date(key.window().start()), new Date(key.window().end()));
        return key.key();
    })
    .filter((k, v) -> v > 1)
    .peek((k, v) -> log.info("produce value {} for key {}", v, k))
    .to("counter-topic-to", Produced.with(Serdes.String(), Serdes.Long()));
Output:
..... ..... : Sat Jun 01 19:01:40 2019 - Sat Jun 01 19:01:45 2019
..... ..... : received s1
..... ..... : received s2
..... ..... : received s1
..... ..... : produce value 2 for key s1
..... ..... : received s3
..... ..... : received s1
..... ..... : produce value 3 for key s1
..... ..... : Sat Jun 01 19:01:50 2019 - Sat Jun 01 19:01:55 2019
..... ..... : received s1
..... ..... : received s2
..... ..... : received s1
..... ..... : produce value 2 for key s1
..... ..... : received s3
..... ..... : received s1
..... ..... : produce value 3 for key s1
..... ..... : Sat Jun 01 19:01:55 2019 - Sat Jun 01 19:02:00 2019
..... ..... : received s1
..... ..... : received s2
..... ..... : received s1
..... ..... : produce value 2 for key s1
..... ..... : received s3
..... ..... : received s1
..... ..... : produce value 3 for key s1
  • 21. So what operations can be done with a stream? Windowing
Hopping Window - similar to a tumbling window, but a hopping window generally has an overlapping advance interval. One record can appear in more than one window.
......
.windowedBy(
    TimeWindows.of(TimeUnit.SECONDS.toMillis(5))
        .advanceBy(TimeUnit.SECONDS.toMillis(1))
)
......
final StreamsBuilder builder = new StreamsBuilder();
builder
    .stream("counter-topic", Consumed.with(Serdes.String(), Serdes.Integer()))
    .peek((key, value) -> log.info("received {}", key))
    .groupByKey(Serialized.with(Serdes.String(), Serdes.Integer()))
    .windowedBy(
        TimeWindows.of(TimeUnit.SECONDS.toMillis(3))
            .advanceBy(TimeUnit.SECONDS.toMillis(1))
    )
    .count() // (Windowed<String>, Long)
    .toStream((key, value) -> {
        log.info("{} - {}", new Date(key.window().start()), new Date(key.window().end()));
        return key.key();
    })
    .filter((k, v) -> v > 1)
    .peek((k, v) -> log.info("produce value {} for key {}", v, k))
    .to("counter-topic-to", Produced.with(Serdes.String(), Serdes.Long()));
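The overlap can be made concrete with a small helper. This plain-Java sketch (not the Kafka Streams API) assumes window starts aligned to multiples of the hop size and returns every window a timestamp falls into:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a hopping window of size s advancing by h contains timestamp t in
// every window whose start lies in (t - s, t] and is a multiple of h.
public class HoppingWindowSketch {

    static List<Long> windowStartsFor(long t, long sizeMs, long hopMs) {
        List<Long> starts = new ArrayList<>();
        long earliest = Math.max(0, t - sizeMs + 1);
        long start = ((earliest + hopMs - 1) / hopMs) * hopMs; // round up to a hop boundary
        for (; start <= t; start += hopMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // size 5 s, hop 1 s: each record falls into 5 overlapping windows
        System.out.println(windowStartsFor(7_000, 5_000, 1_000));
        // [3000, 4000, 5000, 6000, 7000]
    }
}
```

When the hop equals the window size, no overlap remains and the behavior degenerates to a tumbling window.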
  • 22. So what operations can be done with a stream? Windowing Session Window - sessions are a special type of window that captures a period of activity in the data, terminated by a gap of inactivity.
......
.windowedBy(SessionWindows.with(TimeUnit.SECONDS.toMillis(5)))
......
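Session semantics boil down to “merge events whose gap is below the inactivity threshold”. A minimal plain-Java sketch (illustrative only, not the Kafka Streams implementation):

```java
// Sketch: count sessions in an ascending list of event timestamps.
// A gap larger than inactivityGapMs closes the current session.
public class SessionWindowSketch {

    static int countSessions(long[] sortedTimestampsMs, long inactivityGapMs) {
        if (sortedTimestampsMs.length == 0) return 0;
        int sessions = 1;
        for (int i = 1; i < sortedTimestampsMs.length; i++) {
            if (sortedTimestampsMs[i] - sortedTimestampsMs[i - 1] > inactivityGapMs) {
                sessions++; // a gap of inactivity terminates the previous session
            }
        }
        return sessions;
    }

    public static void main(String[] args) {
        long[] clicks = {0, 1_000, 2_500, 20_000, 21_000}; // 17.5 s of silence in the middle
        System.out.println(countSessions(clicks, 5_000));  // 2
    }
}
```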
  • 23. So what operations can be done with a stream? Windowing - Join A sliding window, as opposed to a tumbling window, slides over the stream of data. Because of this, sliding windows can overlap.
  • 24. So what operations can be done with a stream? KTable
KTable - a table representation of a stream (backed by a RocksDB key-value store).
How to create a KTable: any aggregate or reduce operation returns a KTable:
stream.groupByKey().reduce((aggValue, newValue) -> newValue,
    Materialized.with(Serdes.String(), new JSONSerde<>(MyObject.class)));
Or simply create one from the stream builder:
builder.table("my-topic", Consumed.with(Serdes.String(), new JSONSerde<>(MyObject.class)));
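The table-from-stream idea can be sketched without Kafka at all: replaying a changelog of (key, value) updates and keeping only the latest value per key yields the KTable view. A plain-Java illustration (names are made up for this sketch):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a KTable is the materialized "latest value per key" view of a
// changelog stream - later updates for a key overwrite earlier ones.
public class KTableSketch {

    record Update(String key, long value) {}

    static Map<String, Long> materialize(Update... changelog) {
        Map<String, Long> table = new LinkedHashMap<>();
        for (Update u : changelog) {
            table.put(u.key(), u.value()); // overwrite: only the newest value survives
        }
        return table;
    }

    public static void main(String[] args) {
        // Mirrors the per-key count example: s1 is updated three times
        Map<String, Long> table = materialize(
                new Update("s1", 1), new Update("s2", 1),
                new Update("s1", 2), new Update("s3", 1),
                new Update("s1", 3));
        System.out.println(table); // {s1=3, s2=1, s3=1}
    }
}
```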
  • 25. So what operations can be done with a stream? Join
Let's look at a use case: we send messages and receive their statuses asynchronously, shortly after sending.
First we define streams to receive data from the "sent-messages" and "status-messages" topics:
KStream<String, SentMessage> sentMessageStreamBuilder =
    builder.stream("sent-messages",
        Consumed.with(Serdes.String(), new JSONSerde<>(SentMessage.class)));
KStream<String, MessageStatus> statusMessageStreamBuilder =
    builder.stream("status-messages",
        Consumed.with(Serdes.String(), new JSONSerde<>(MessageStatus.class)));
We want to define (a business decision) how much time we should wait for DRs (1 hour, 6 hours...):
KStream<String, MessageStatus> joinedStream = sentMessageStreamBuilder.join(
    statusMessageStreamBuilder,
    (sent, dr) -> dr.toBuilder().id(sent.getId()).build(),
    JoinWindows.of(TimeUnit.SECONDS.toMillis(60L)),
    Joined.with(Serdes.String(), new JSONSerde<>(SentMessage.class),
        new JSONSerde<>(MessageStatus.class)));
  • 26. So what operations can be done with a stream? Join So we received the following messages:
message SentMessage(id=1d51a90a-fb90-4988-9636-141d43ba5865, providerId=119, extMessageId=cc3e48f6-e641-43d6-a30e-2bbd1a33bc02, from=972544406, to=972544306, status=SENT, statusTime=1559893509893, sentTime=1559893509888, order=6)
message SentMessage(id=4922c6dc-c3f3-44ee-b0c9-e66fa71e39e6, providerId=31, extMessageId=16f409b4-7dac-4625-9730-80a3523a5962, from=972544403, to=972544303, status=SENT, statusTime=1559893509893, sentTime=1559893509888, order=3)
message SentMessage(id=aa501317-a1c1-43e5-92c4-c0549b9a30df, providerId=63, extMessageId=17f54e6f-df15-45ee-859f-48029a3d81d5, from=972544407, to=972544307, status=SENT, statusTime=1559893509893, sentTime=1559893509888, order=7)
message SentMessage(id=9aed49ff-ceb4-4f80-a3f9-95a6e35400fb, providerId=7, extMessageId=721cae9a-b102-4a05-bf32-035e10ce098f, from=972544402, to=972544302, status=SENT, statusTime=1559893509893, sentTime=1559893509888, order=2)
message SentMessage(id=175f47bc-9fd5-49a5-bfa6-52c66253e3d0, providerId=44, extMessageId=d2a8e6d2-e0db-44b2-b5b8-08ab3f235010, from=972544405, to=972544305, status=SENT, statusTime=1559893509893, sentTime=1559893509888, order=5)
message SentMessage(id=ed78978b-f719-4699-9d70-5673f57ba59d, providerId=45, extMessageId=1dfbbfeb-baa7-445a-8c5d-f950bd051c95, from=972544401, to=972544301, status=SENT, statusTime=1559893509893, sentTime=1559893509888, order=1)
message SentMessage(id=8653b530-ac66-46a4-aaf4-fe3140addcd2, providerId=113, extMessageId=592f22ed-eb14-4015-be08-8614a44768e8, from=972544409, to=972544309, status=SENT, statusTime=1559893509893, sentTime=1559893509888, order=9)
  • 27. So what operations can be done with a stream? Join We received the following statuses:
status MessageStatus(id=4922c6dc-c3f3-44ee-b0c9-e66fa71e39e6, providerId=31, from=972544403, to=972544303, extMessageId=16f409b4-7dac-4625-9730-80a3523a5962, status=DELIVERED, statusTime=1559893509908)
status MessageStatus(id=aa501317-a1c1-43e5-92c4-c0549b9a30df, providerId=63, from=972544407, to=972544307, extMessageId=17f54e6f-df15-45ee-859f-48029a3d81d5, status=DELIVERED, statusTime=1559893509908)
status MessageStatus(id=9aed49ff-ceb4-4f80-a3f9-95a6e35400fb, providerId=7, from=972544402, to=972544302, extMessageId=721cae9a-b102-4a05-bf32-035e10ce098f, status=DELIVERED, statusTime=1559893509908)
status MessageStatus(id=175f47bc-9fd5-49a5-bfa6-52c66253e3d0, providerId=44, from=972544405, to=972544305, extMessageId=d2a8e6d2-e0db-44b2-b5b8-08ab3f235010, status=DELIVERED, statusTime=1559893509908)
status MessageStatus(id=ed78978b-f719-4699-9d70-5673f57ba59d, providerId=45, from=972544401, to=972544301, extMessageId=1dfbbfeb-baa7-445a-8c5d-f950bd051c95, status=DELIVERED, statusTime=1559893509908)
status MessageStatus(id=8653b530-ac66-46a4-aaf4-fe3140addcd2, providerId=113, from=972544409, to=972544309, extMessageId=592f22ed-eb14-4015-be08-8614a44768e8, status=DELIVERED, statusTime=1559893509908)
  • 28. So what operations can be done with a stream? Join And the result of the join operation:
final status for MessageStatus(id=4922c6dc-c3f3-44ee-b0c9-e66fa71e39e6, providerId=31, from=972544403, to=972544303, extMessageId=16f409b4-7dac-4625-9730-80a3523a5962, status=DELIVERED, statusTime=1559893509908)
final status for MessageStatus(id=aa501317-a1c1-43e5-92c4-c0549b9a30df, providerId=63, from=972544407, to=972544307, extMessageId=17f54e6f-df15-45ee-859f-48029a3d81d5, status=DELIVERED, statusTime=1559893509908)
final status for MessageStatus(id=9aed49ff-ceb4-4f80-a3f9-95a6e35400fb, providerId=7, from=972544402, to=972544302, extMessageId=721cae9a-b102-4a05-bf32-035e10ce098f, status=DELIVERED, statusTime=1559893509908)
final status for MessageStatus(id=175f47bc-9fd5-49a5-bfa6-52c66253e3d0, providerId=44, from=972544405, to=972544305, extMessageId=d2a8e6d2-e0db-44b2-b5b8-08ab3f235010, status=DELIVERED, statusTime=1559893509908)
final status for MessageStatus(id=ed78978b-f719-4699-9d70-5673f57ba59d, providerId=45, from=972544401, to=972544301, extMessageId=1dfbbfeb-baa7-445a-8c5d-f950bd051c95, status=DELIVERED, statusTime=1559893509908)
final status for MessageStatus(id=8653b530-ac66-46a4-aaf4-fe3140addcd2, providerId=113, from=972544409, to=972544309, extMessageId=592f22ed-eb14-4015-be08-8614a44768e8, status=DELIVERED, statusTime=1559893509908)
  • 29. Streaming 101: The world beyond batch
Streaming 102: The world beyond batch
Kafka Streams’ Take on Watermarks and Triggers
Introducing Kafka Streams: Stream Processing Made Simple
Big Data Battle: Batch Processing vs Stream Processing
Taming IoT Data: Making Sense of Sensors with SQL Streaming by Hans-Peter Grahsl
Developing Event-Driven Microservices with Event Sourcing and CQRS
Data Stream Processing Concepts and Implementations by Matthias Niehoff
Applying Reactive Programming with Rx
A pattern language for microservices
Kafka Streams - Not Looking at Facebook
Leveraging the Power of a Database Unbundled
Enabling Exactly Once in Kafka Streams
The Event Streaming Platform Explained (For Technical Leaders and Executives)
Sliding Vs Tumbling Windows
Kafka Streams: Streams DSL
Window Functions in Stream Analytics
Streams vs Serverless: Friend or Foe? by Ben Stopford
Introducing Stream Windows in Apache Flink
Flink Streaming - Tumbling and Sliding Windows
Preview of Kafka Streams
Kafka Streams – A First Impression
Kafka Stream Playground Github

Editor's notes

  1. There are a lot of different solution for stream processing (mostly from Apache Foundation)
  2. streaming and batching
  3. evolving architecture
  4. For example, processing all the transactions that have been performed by a major financial firm in a week. This data contains millions of records per day that can be stored as a file, records, etc. This file undergoes processing at the end of the day for the various analyses the firm wants to do. Obviously it takes a large amount of time for that file to be processed. That is what batch processing is. The program doesn't react to some incoming event, but processes a limited (and huge) set of data. We have a similar ETL process for reports in Charlie (implemented inside Oracle with jobs and queries)
  5. “infinite” - we can’t bound the data by a “start” and an “end”, since we don’t have any historical knowledge about the data (as opposed to batch, where we can find data boundaries by time or some other parameter)
  6. It can be confusing, since we could call almost any system “streaming”, but let’s divide it into the ingestion stage, where we want to return OK to the customer/system as fast as possible, and the inner propagation of events inside the system. Inside our system we want to react to ingested data in near real time and deliver (transform) data to some endpoint (external or internal). For example, when we receive a message, we want to send it to its destination as fast as possible; or, in the ad-tech case, when we receive a click on some item, we want to find an advertisement to show the user before he leaves the page (and without affecting page load time). In streaming we work with discrete elements (events). Sometimes we’ll want to group them by time, for example - this is called windowing and we’ll talk about it a little bit later; sometimes we’ll want to aggregate (count) them, again during some period of time (since we are “unbounded”, NOT since the beginning of time); or simply deliver them further along our system pipeline (pipeline is actually a good word :) )
  7. Another thing about streaming is, in my opinion, resources. Nowadays most systems use shared resources (CPU, memory and so on). We don’t want to take and hold resources when we don’t need them - yes, yes, non-blocking IO and other nice phrases - and this applies to the whole system architecture. It means we want to start working only when some “event” occurs. And that takes me to another (if not the name, then at least the result of a) streaming definition - event-driven design. We want to react to an event, do the minimum required work and send the task/event/job to another process which will take care of the next stage, and so on. It means that we stream events, transform events into other events, compare them to some data or/and enrich them, and store them for future processing (in a SQL DB, NoSQL DB, messaging queue, file system…). Event - something that happened in the past and has, at least, a name (usually some state which describes the event at the SPECIFIC time it occurred), and usually we know what time this event occurred. An event is something that already happened and, unlike science fiction, it cannot be changed. This makes events a perfect fit for the logic of a read-only append log (anybody said Kafka?). Every stream processor creates a new event (based on the received event), so when a downstream processor (see previous slide) receives an event, it’s already a new one. So events are by definition immutable. Event Time - we’ll talk a little more about this when we reach windowing. In most cases we’ll have problems if we don’t normalize time to some agreed vector. It should be normalized (at one of the stages) to some standard time (GMT/IST...). Processing Time - the time when a specific element (module) processes the event (usually the time when the message is received by the system - Kafka, for example - but it could be the time when a specific microservice processes the event). Wall-clock time - since it’s the time on a specific computer, processing time is usually set by wall-clock time. It’s important to be sure that all components in your system have the same wall-clock time (as much as possible), or else you’ll get time skew, and it can introduce unexpected behavior in the system
  8. Flink, Kafka Streams/KSQL and Spark Streaming
  9. Let’s look at a simple example where we want to count the number of events per key. For every key that appears again, the counter increases. But that’s not so useful (usually), so let’s show only those keys that appear more than once. You can see the topic automatically created by Kafka Streams with the aggregated data for every key. So how does Kafka know to aggregate a specific value (after all, Kafka is not a search engine, and how could it be efficient if Kafka consumed from this topic for every incoming event)? We’ll explain this topic later (KTable)
  10. Let’s start with the simplest type of window - static (tumbling). (I prefer to call it static since its boundaries don’t move and there is no overlapping.) A tumbling window has a fixed length. The next window is placed right after the end of the previous one on the time axis. Tumbling windows do not overlap and span the whole time domain, i.e. each event is assigned to exactly one window.
  11. Let’s send the same input to Kafka, 3 times at 7-second intervals. Here is a simple example of a window: we count keys which appear more than 1 time during a window of 5 sec. We can see that the count result from the previous example appears 3 times (we have 3 window ranges from the input) and resets for every window. Here is an example of the Kafka topic which stores the aggregated results, per window. In the result (sink) topic we see only the final result
  12. Like tumbling windows, hopping windows also have a fixed length. However, they introduce a second configuration parameter: the hop size h. Instead of moving the window of length s forward in time by s, we move it by h. A common use case for hopping windows is moving-average computations (or the irate function, like in Prometheus :) ). Hopping windows are usually confused with sliding windows. In Kafka Streams there is a hopping window, but not a sliding one (sliding exists in the join operation). Show animation at: https://dev.to/frosnerd/window-functions-in-stream-analytics-1m6c
  13. Simple example: when we want to know how many operations a user performed while logged in (in a session), we can do it with a session window: we (aggregate) count every click while the user is in the session. When the session expires (we can set the window time to the session expiration time) and the user logs in again, we start the counter from zero, and so on
  14. We can explain the sliding window as a window which starts when an event with some key arrives. If you read articles and watch videos about streaming and batching, you’ll be confused by the hopping and sliding window definitions. Since we are looking into Kafka Streams, we follow Kafka’s notion of windowing.
  15. In order to perform a join, the “keys” in Kafka should be the same (represent the same value: “id”, for example). Since data from the streams is actually stored in memory, we should consider how long we want to keep data in memory. The last line in the join statement defines how to serialize the data from the topic
  16. So what happened in our application during the join? Kafka Streams created 2 KTables: a sent-message KTable and a status-message KTable. When a message comes into the sent-message stream, it checks whether the status-message KTable contains a record for this key whose time fits the defined join window (actually, in the KTable the key is a compound key of the original key and the window, where the window start is the time the event was received and the end is start + window size/length), and it inserts a record into the sent-message KTable with key: original key + window based on the event-received time and window size/length. When a message comes into the status-message stream, the same happens, but in the opposite direction. Questions: expiration, compaction, application crash and so on...