Joining Infinity — Windowless Stream Processing with Flink
Sanjar Akhmedov, Software Engineer, ResearchGate
It started when two researchers discovered first-hand that collaborating with a friend or colleague on the other side of the world was no easy task.
ResearchGate is a social network for scientists.
Connect the world of science.
Make research open to all.
Structured system
We have changed, and are continuing to change, how scientific knowledge is shared and discovered.
10,000,000 Members
102,000,000 Publications
30,000,… Visitors
Feature: Research Timeline
Diverse data sources
Proxy
Frontend
Services
memcache MongoDB Solr PostgreSQL
Infinispan HBase MongoDB Solr
Big data pipeline
Change data capture
Import
Hadoop cluster
Export
Data Model
Account 1 → * Claim → Author
Publication 1 → * Authorship → Author
Hypothetical SQL
CREATE TABLE publications (
  id SERIAL PRIMARY KEY,
  author_ids INTEGER[]
);
CREATE TABLE accounts (
  id SERIAL PRIMARY KEY,
  claimed_author_ids INTEGER[]
);
CREATE MATERIALIZED VIEW account_publications
REFRESH FAST ON COMMIT
AS
SELECT
  accounts.id AS account_id,
  publications.id AS publication_id
FROM accounts
JOIN publications
  ON ANY (accounts.claimed_author_ids) = ANY (publications.author_ids);
• Data sources are distributed across different DBs
• Dataset doesn’t fit in memory on a single machine
• Join process must be fault tolerant
• Deploy changes fast
• Up-to-date join result in near real-time
• Join result must be accurate
Challenges
Change data capture (CDC)
User → Microservice (Request), Microservice → DB (Write)
Sync each target separately: Cache, Solr/ES, HBase/HDFS
Change data capture (CDC)
User → Microservice (Request), Microservice → DB (Write)
Extract the DB write log: K2 → 1, K1 → 4, K1 → Ø, …, KN → 42
Sync the extracted log into Cache, Solr/ES, and HBase/HDFS
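The extracted log (K2 → 1, K1 → 4, K1 → Ø, …) is replayed into each sync target. A minimal sketch of that replay, with `null` standing in for the tombstone Ø (the class and method names are illustrative, not from the talk's codebase):

```java
import java.util.HashMap;
import java.util.Map;

/** Replays a CDC log into a materialized view: last write wins, null (Ø) deletes the key. */
public class CdcReplay {
    private final Map<String, Integer> view = new HashMap<>();

    public void apply(String key, Integer value) {
        if (value == null) {
            view.remove(key); // tombstone: the Ø records in the log
        } else {
            view.put(key, value); // upsert: later writes overwrite earlier ones
        }
    }

    public Map<String, Integer> view() {
        return view;
    }
}
```

Replaying the log from the slide (K2 → 1, K1 → 4, K1 → Ø, KN → 42) leaves the view with K2 and KN only.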
Join two CDC streams into one
NoSQL1 → Kafka, SQL → Kafka; both topics → Flink Streaming (Join) → Kafka → NoSQL2
Flink job topology
The Accounts Stream and the Publications Stream feed a Join (CoFlatMap) whose state is partitioned per author (Author 1, Author 2, …, Author N); the output is Account Publications.
DataStream<Account> accounts = kafkaTopic("accounts");
DataStream<Publication> publications = kafkaTopic("publications");
DataStream<AccountPublication> result = accounts.connect(publications)
    .keyBy("claimedAuthorId", "publicationAuthorId")
    .flatMap(new RichCoFlatMapFunction<Account, Publication, AccountPublication>() {
        transient ValueState<String> authorAccount;
        transient ValueState<String> authorPublication;

        public void open(Configuration parameters) throws Exception {
            authorAccount = getRuntimeContext().getState(new ValueStateDescriptor<>("authorAccount", String.class, null));
            authorPublication = getRuntimeContext().getState(new ValueStateDescriptor<>("authorPublication", String.class, null));
        }

        public void flatMap1(Account account, Collector<AccountPublication> out) throws Exception {
            authorAccount.update(account.id);
            if (authorPublication.value() != null) {
                out.collect(new AccountPublication(authorAccount.value(), authorPublication.value()));
            }
        }

        public void flatMap2(Publication publication, Collector<AccountPublication> out) throws Exception {
            authorPublication.update(publication.id);
            if (authorAccount.value() != null) {
                out.collect(new AccountPublication(authorAccount.value(), authorPublication.value()));
            }
        }
    });
Prototype implementation
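The CoFlatMap keeps one value per side for each author key and emits a pair as soon as both sides are present. A minimal single-process model of that state machine, useful for reasoning about the later failure cases (this is plain Java, not the Flink API; the class and method names are illustrative):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Simplified model of the keyed CoFlatMap join: one account and one publication per author key. */
public class CoJoinSim {
    // Mirrors the two ValueStates, but across all keys at once.
    private final Map<String, String> accountByAuthor = new HashMap<>();
    private final Map<String, String> publicationByAuthor = new HashMap<>();

    /** Like flatMap1: remember the account, emit a pair if the publication side is present. */
    public List<String[]> onAccount(String claimedAuthorId, String accountId) {
        accountByAuthor.put(claimedAuthorId, accountId);
        return maybeEmit(claimedAuthorId);
    }

    /** Like flatMap2: remember the publication, emit a pair if the account side is present. */
    public List<String[]> onPublication(String publicationAuthorId, String publicationId) {
        publicationByAuthor.put(publicationAuthorId, publicationId);
        return maybeEmit(publicationAuthorId);
    }

    private List<String[]> maybeEmit(String authorId) {
        List<String[]> out = new ArrayList<>();
        String account = accountByAuthor.get(authorId);
        String publication = publicationByAuthor.get(authorId);
        if (account != null && publication != null) {
            out.add(new String[] {account, publication});
        }
        return out;
    }
}
```

Feeding it (Bob, author 1) and then (Paper1, author 1) yields one (Bob, Paper1) pair, matching the dataflow example that follows.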
Example dataflow
Input: Accounts stream (Alice, 2), (Bob, 1); Publications stream (Paper1, 1).
1. (Alice, 2) is keyed to Author 2; its state now holds account Alice.
2. (Bob, 1) is keyed to Author 1; its state now holds account Bob.
3. (Paper1, 1) is keyed to Author 1; its state now holds account Bob and publication Paper1.
4. Both sides are present for Author 1, so the join emits K1 → (Bob, Paper1) to Account Publications.
• ✔ Data sources are distributed across different DBs
• ✔ Dataset doesn’t fit in memory on a single machine
• ✔ Join process must be fault tolerant
• ✔ Deploy changes fast
• ✔ Up-to-date join result in near real-time
• ? Join result must be accurate
Challenges
Paper1 gets deleted
A tombstone (Paper1, Ø) arrives on the Publications stream and is keyed to Author 1, whose state holds account Bob and publication Paper1.
With only the latest value in state, the operator cannot tell which earlier join result the delete invalidates; it needs the previous value.
Diffing the incoming element with the previous state lets the join retract the stale result by emitting K1 → Ø to Account Publications.
That retraction must reproduce the key the result was originally emitted under, e.g. K1 = 𝒇(Bob, Paper1).
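A retraction only works if the delete can reproduce the same key the original result was emitted under, i.e. the key must be a deterministic function of the joined fields. One possible sketch of such a natural ID (the scheme is illustrative; any stable function of the fields works):

```java
/** Illustrative deterministic natural ID for a join result: same inputs always yield the same key. */
public class NaturalId {
    public static String of(String accountId, String publicationId) {
        // A later delete recomputes this from its own state and targets the earlier result.
        return accountId + "|" + publicationId; // or hash this string for fixed-size keys
    }
}
```

Because the key is derived rather than generated, the operator that sees (Paper1, Ø) can rebuild K1 from its remembered previous value and emit K1 → Ø.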
Paper1 gets updated
An update (Paper1, 2) arrives on the Publications stream: Paper1 moves from Author 1 to Author 2.
The Author 2 state now holds account Alice and publication Paper1, so the join emits (Alice, Paper1).
Two problems remain: the stale K1 → (Bob, Paper1) in Account Publications is never retracted, and it is unclear which key (??) the new (Alice, Paper1) result should get.
Alice claims Paper1 via a different author
Starting state: Publications has Paper1 → (1, 2); Account Publications has K1 → (Bob, Paper1) and K2 → (Alice, Paper1).
1. (Bob, Ø) arrives on the Accounts stream: Bob's claim is removed, Author 1's account state is cleared, and the join retracts K1 → Ø.
2. (Alice, 1) arrives: Alice also claims Author 1, so the Author 1 state holds account Alice and publication Paper1, and the join emits another (Alice, Paper1).
If that result reuses K2 = 𝒇(Alice, Paper1), the two join paths collide on one key, and a later retraction of either would wrongly delete a pair that is still valid via the other author.
Pick correct natural IDs that include the join key, e.g. K3 = 𝒇(Alice, Author1, Paper1).
• Keep the previous element state to update the previous join result
• Stream elements are not domain entities but commands such as delete or upsert
• The joined stream must have natural IDs to propagate deletes and updates
How to solve deletes and updates
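The first two points, diffing each element against its previous state and emitting commands instead of entities, can be sketched as follows (a simplified, non-Flink model; the class name and command encoding are illustrative):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Turns a stream of (id, authorIds) upserts into per-author add/remove commands. */
public class Differ {
    private final Map<String, Set<String>> previous = new HashMap<>();

    /**
     * Emits "+author" for newly added authors and "-author" for removed ones.
     * A delete of the entity is just an upsert with an empty author set.
     */
    public List<String> diff(String id, Set<String> authorIds) {
        Set<String> old = previous.getOrDefault(id, Set.of());
        List<String> commands = new ArrayList<>();
        for (String a : authorIds) {
            if (!old.contains(a)) commands.add("+" + a); // author gained
        }
        for (String a : old) {
            if (!authorIds.contains(a)) commands.add("-" + a); // author lost, triggers retraction
        }
        previous.put(id, authorIds); // keep previous state for the next diff
        return commands;
    }
}
```

Downstream, a "+author" command produces a join result under its natural ID and a "-author" command retracts it, which is exactly what the delete and update scenarios above require.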
Generic join graph
Accounts Stream → Diff (keyed by account: Alice, Bob, …)
Publications Stream → Diff (keyed by publication: Paper1, …, PaperN)
Both Diff outputs → Join (keyed by author: Author1, …, AuthorM) → Account Publications
Generic join graph: operate on commands
The same graph, where the streams between Diff and Join carry upsert/delete commands rather than domain entities.
Memory requirements
In the same graph, each Diff keeps a full copy of its input (the Accounts stream and the Publications stream respectively), and the Join keeps a full copy of the Accounts stream on its left side and of the Publications stream on its right side.
Network load
In the same graph, the data is reshuffled over the network twice: once into the Diff operators and once into the Join.
• In addition to handling Kafka traffic, we need to reshuffle all data twice over the network
• We need to keep two full copies of each joined stream in memory
Resource considerations
Questions
We are hiring - www.researchgate.net/careers
Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink
Office 365 Groups and Tasks API - Getting Started
 
Let's talk about NoSQL Standard
Let's talk about NoSQL StandardLet's talk about NoSQL Standard
Let's talk about NoSQL Standard
 
Let's talk about NoSQL Standard
Let's talk about NoSQL StandardLet's talk about NoSQL Standard
Let's talk about NoSQL Standard
 
JOOQ and Flyway
JOOQ and FlywayJOOQ and Flyway
JOOQ and Flyway
 
Asynchronous stream processing with Akka Streams
Asynchronous stream processing with Akka StreamsAsynchronous stream processing with Akka Streams
Asynchronous stream processing with Akka Streams
 
Big Data otimizado: Arquiteturas eficientes para construção de Pipelines MapR...
Big Data otimizado: Arquiteturas eficientes para construção de Pipelines MapR...Big Data otimizado: Arquiteturas eficientes para construção de Pipelines MapR...
Big Data otimizado: Arquiteturas eficientes para construção de Pipelines MapR...
 

More from Flink Forward

More from Flink Forward (20)

“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
Welcome to the Flink Community!
Welcome to the Flink Community!Welcome to the Flink Community!
Welcome to the Flink Community!
 
Practical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobsPractical learnings from running thousands of Flink jobs
Practical learnings from running thousands of Flink jobs
 

Recently uploaded

Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
gajnagarg
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
gajnagarg
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
amitlee9823
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
gajnagarg
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
gajnagarg
 

Recently uploaded (20)

Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls Bellary Escorts ☎️9352988975 Two shot with one girl ...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
Just Call Vip call girls Erode Escorts ☎️9352988975 Two shot with one girl (E...
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
Just Call Vip call girls roorkee Escorts ☎️9352988975 Two shot with one girl ...
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 

Sanjar Akhmedov - Joining Infinity – Windowless Stream Processing with Flink

  • 1. Joining Infinity — Windowless Stream Processing with Flink Sanjar Akhmedov, Software Engineer, ResearchGate
  • 2. It started when two researchers discovered first-hand that collaborating with a friend or colleague on the other side of the world was no easy task. ResearchGate is a social network for scientists.
  • 3. Connect the world of science. Make research open to all.
  • 4. Structured system. We have changed, and are continuing to change, how scientific knowledge is shared and discovered.
  • 8. Diverse data sources Proxy Frontend Services memcache MongoDB Solr PostgreSQL Infinispan HBase MongoDB Solr
  • 10. Data Model (diagram: Account 1-* Claim → Author; Publication 1-* Authorship → Author)
  • 11. Hypothetical SQL (same diagram: Account 1-* Claim → Author; Publication 1-* Authorship → Author)

    CREATE TABLE publications (
      id SERIAL PRIMARY KEY,
      author_ids INTEGER[]
    );

    CREATE TABLE accounts (
      id SERIAL PRIMARY KEY,
      claimed_author_ids INTEGER[]
    );

    CREATE MATERIALIZED VIEW account_publications
    REFRESH FAST ON COMMIT
    AS
    SELECT
      accounts.id AS account_id,
      publications.id AS publication_id
    FROM accounts
    JOIN publications
      ON ANY (accounts.claimed_author_ids) = ANY (publications.author_ids);
  • 12. Challenges
    • Data sources are distributed across different DBs
    • Dataset doesn’t fit in memory on a single machine
    • Join process must be fault tolerant
    • Deploy changes fast
    • Up-to-date join result in near real-time
    • Join result must be accurate
  • 13. Change data capture (CDC) User Microservice DB Request Write Cache Sync Solr/ES Sync HBase/HDFS Sync
  • 14. Change data capture (CDC) User Microservice DB Request Write Log K2 1 K1 4 Extract
  • 15. Change data capture (CDC) User Microservice DB Request Write Log K2 1 K1 4 K1 Ø Extract
  • 16. Change data capture (CDC) User Microservice DB Request Write Log K2 1 K1 4 K1 Ø KN 42 … Extract
  • 17. Change data capture (CDC) User Microservice DB Cache Request Write Log K2 1 K1 4 K1 Ø KN 42 … Extract Sync
  • 18. Change data capture (CDC) User Microservice DB Cache Request Write Log K2 1 K1 4 K1 Ø KN 42 … Extract Sync HBase/HDFS Solr/ES
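The CDC mechanism in slides 13–18 boils down to an ordered changelog of (key, value) entries, where Ø marks a delete; replaying that log into each downstream store keeps it in sync with the source DB. A minimal plain-Java sketch of the replay step (the class and method names are illustrative, not from the talk):

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a CDC changelog is an ordered list of (key, value)
// changes, where a null value is a tombstone (the Ø in the slides).
// Replaying it into a map reproduces the current state of the source table.
class ChangelogReplay {
    static Map<String, Integer> replay(List<Map.Entry<String, Integer>> log) {
        Map<String, Integer> store = new HashMap<>();
        for (Map.Entry<String, Integer> change : log) {
            if (change.getValue() == null) {
                store.remove(change.getKey());                 // tombstone: delete downstream
            } else {
                store.put(change.getKey(), change.getValue()); // upsert
            }
        }
        return store;
    }
}
```

Because the log is keyed and ordered, any sink (cache, Solr/ES, HBase/HDFS) that applies it in order converges to the same state.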
  • 19. Join two CDC streams into one NoSQL1 SQL Kafka Kafka Flink Streaming Join Kafka NoSQL2
  • 21. Prototype implementation

    DataStream<Account> accounts = kafkaTopic("accounts");
    DataStream<Publication> publications = kafkaTopic("publications");
    DataStream<AccountPublication> result = accounts.connect(publications)
        .keyBy("claimedAuthorId", "publicationAuthorId")
        .flatMap(new RichCoFlatMapFunction<Account, Publication, AccountPublication>() {
          transient ValueState<String> authorAccount;
          transient ValueState<String> authorPublication;

          public void open(Configuration parameters) throws Exception {
            authorAccount = getRuntimeContext().getState(
                new ValueStateDescriptor<>("authorAccount", String.class, null));
            authorPublication = getRuntimeContext().getState(
                new ValueStateDescriptor<>("authorPublication", String.class, null));
          }

          public void flatMap1(Account account, Collector<AccountPublication> out) throws Exception {
            authorAccount.update(account.id);
            if (authorPublication.value() != null) {
              out.collect(new AccountPublication(authorAccount.value(), authorPublication.value()));
            }
          }

          public void flatMap2(Publication publication, Collector<AccountPublication> out) throws Exception {
            authorPublication.update(publication.id);
            if (authorAccount.value() != null) {
              out.collect(new AccountPublication(authorAccount.value(), authorPublication.value()));
            }
          }
        });
  • 28. Example dataflow Account Publications Accounts Alice 2 Publications Accounts Stream Join Account Publications Publications Stream Author 1 Author 2 Author N …
  • 30. Example dataflow Account Publications Accounts Alice 2 Publications Accounts Stream Join Account Publications Publications Stream Author 1 Author 2 Alice Author N …
  • 31. Example dataflow Account Publications Accounts Alice 2 Bob 1 Publications Accounts Stream Join Account Publications Publications Stream Author 1 Author 2 Alice Author N …
  • 32. Example dataflow Account Publications Accounts Alice 2 Bob 1 Publications Accounts Stream Join Account Publications Publications Stream Author 1 Author 2 Alice Author N … (Bob, 1)
  • 33. Example dataflow Account Publications Accounts Alice 2 Bob 1 Publications Accounts Stream Join Account Publications Publications Stream Author 1 Bob Author 2 Alice Author N … (Bob, 1)
  • 34. Example dataflow Account Publications Accounts Alice 2 Bob 1 Publications Paper1 1 Accounts Stream Join Account Publications Publications Stream Author 1 Bob Author 2 Alice Author N …
  • 36. Example dataflow Account Publications Accounts Alice 2 Bob 1 Publications Paper1 1 Accounts Stream Join Account Publications Publications Stream Author 1 Bob Paper1 Author 2 Alice Author N …
  • 38. Example dataflow Account Publications K1 (Bob, Paper1) Accounts Alice 2 Bob 1 Publications Paper1 1 Accounts Stream Join Account Publications Publications Stream Author 1 Bob Paper1 Author 2 Alice Author N …
  • 39. Challenges
    • ✔ Data sources are distributed across different DBs
    • ✔ Dataset doesn’t fit in memory on a single machine
    • ✔ Join process must be fault tolerant
    • ✔ Deploy changes fast
    • ✔ Up-to-date join result in near real-time
    • ? Join result must be accurate
  • 40. Paper1 gets deleted Account Publications K1 (Bob, Paper1) Accounts Alice 2 Bob 1 Publications Paper1 1 Paper1 Ø Accounts Stream Join Account Publications Publications Stream Author 1 Bob Paper1 Author 2 Alice Author N …
  • 42. Paper1 gets deleted Account Publications K1 (Bob, Paper1) Accounts Alice 2 Bob 1 Publications Paper1 1 Paper1 Ø Accounts Stream Join Account Publications Publications Stream Author 1 Bob Paper1 Author 2 Alice Author N … ?
  • 43. Paper1 gets deleted Account Publications K1 (Bob, Paper1) Accounts Alice 2 Bob 1 Publications Paper1 1 Paper1 Ø Accounts Stream Join Account Publications Publications Stream Author 1 Bob Paper1 Author 2 Alice Author N … Need previous value
  • 44. Paper1 gets deleted Account Publications K1 (Bob, Paper1) Accounts Alice 2 Bob 1 Publications Paper1 1 Paper1 Ø Accounts Stream Join Account Publications Diff with Previous State Publications Stream Author 1 Bob Paper1 Author 2 Alice Author N …
  • 45. Paper1 gets deleted Account Publications K1 (Bob, Paper1) K1 Ø Accounts Alice 2 Bob 1 Publications Paper1 1 Paper1 Ø Accounts Stream Join Account Publications Diff with Previous State Publications Stream Author 1 Bob Paper1 Author 2 Alice Author N …
  • 46. Paper1 gets deleted Account Publications K1 (Bob, Paper1) K1 Ø Accounts Alice 2 Bob 1 Publications Paper1 1 Paper1 Ø Accounts Stream Join Account Publications Diff with Previous State Publications Stream Author 1 Bob Paper1 Author 2 Alice Author N … Need K1 here, e.g. K1 = 𝒇(Bob, Paper1)
  • 47. Paper1 gets updated Account Publications K1 (Bob, Paper1) Accounts Alice 2 Bob 1 Publications Paper1 1 Paper1 2 Accounts Stream Join Account Publications Publications Stream Author 1 Bob Paper1 Author 2 Alice Author N …
  • 49. Paper1 gets updated Account Publications K1 (Bob, Paper1) Accounts Alice 2 Bob 1 Publications Paper1 1 Paper1 2 Accounts Stream Join Account Publications Publications Stream Author 1 Bob Paper1 Author 2 Alice Paper1 Author N …
  • 50. Paper1 gets updated Account Publications K1 (Bob, Paper1) (Alice, Paper1) Accounts Alice 2 Bob 1 Publications Paper1 1 Paper1 2 Accounts Stream Join Account Publications Publications Stream Author 1 Bob Paper1 Author 2 Alice Paper1 Author N … (Alice, Paper1)
  • 51. Paper1 gets updated Account Publications K1 (Bob, Paper1) ?? (Alice, Paper1) Accounts Alice 2 Bob 1 Publications Paper1 1 Paper1 2 Accounts Stream Join Account Publications Publications Stream Author 1 Bob Paper1 Author 2 Alice Paper1 Author N …
  • 52. Alice claims Paper1 via different author Account Publications K1 (Bob, Paper1) K2 (Alice, Paper1) Accounts Alice 2 Bob 1 Publications Paper1 1 Paper1 (1, 2) Accounts Stream Join Account Publications Publications Stream Author 1 Bob Paper1 Author 2 Alice Paper1 Author N …
  • 53. Alice claims Paper1 via different author Account Publications K1 (Bob, Paper1) K2 (Alice, Paper1) Accounts Alice 2 Bob 1 Bob Ø Publications Paper1 1 Paper1 (1, 2) Accounts Stream Join Account Publications Publications Stream Author 1 Bob Paper1 Author 2 Alice Paper1 Author N …
  • 55. Alice claims Paper1 via different author Account Publications K1 (Bob, Paper1) K2 (Alice, Paper1) K1 Ø Accounts Alice 2 Bob 1 Bob Ø Publications Paper1 1 Paper1 (1, 2) Accounts Stream Join Account Publications Publications Stream Author 1 Ø Paper1 Author 2 Alice Paper1 Author N …
  • 57. Alice claims Paper1 via different author Account Publications K1 (Bob, Paper1) K2 (Alice, Paper1) K1 Ø Accounts Alice 2 Bob 1 Bob Ø Alice 1 Publications Paper1 1 Paper1 (1, 2) Accounts Stream Join Account Publications Publications Stream Author 1 Ø Paper1 Author 2 Alice Paper1 Author N …
  • 58. Alice claims Paper1 via different author Account Publications K1 (Bob, Paper1) K2 (Alice, Paper1) K1 Ø Accounts Alice 2 Bob 1 Bob Ø Alice 1 Publications Paper1 1 Paper1 (1, 2) Accounts Stream Join Account Publications Publications Stream Author 1 Ø Paper1 Author 2 Alice Paper1 Author N … 2. (Alice, 1)
  • 59. Alice claims Paper1 via different author Account Publications K1 (Bob, Paper1) K2 (Alice, Paper1) K1 Ø Accounts Alice 2 Bob 1 Bob Ø Alice 1 Publications Paper1 1 Paper1 (1, 2) Accounts Stream Join Account Publications Publications Stream Author 1 Alice Paper1 Author 2 Ø Paper1 Author N … 2. (Alice, 1)
  • 60. Alice claims Paper1 via different author Account Publications K1 (Bob, Paper1) K2 (Alice, Paper1) K1 Ø Accounts Alice 2 Bob 1 Bob Ø Alice 1 Publications Paper1 1 Paper1 (1, 2) Accounts Stream Join Account Publications Publications Stream Author 1 Alice Paper1 Author 2 Ø Paper1 Author N … 2. (Alice, 1) (Alice, Paper1)
  • 61. Alice claims Paper1 via different author Account Publications K1 (Bob, Paper1) K2 (Alice, Paper1) K1 Ø K2 (Alice, Paper1) K2 Ø Accounts Alice 2 Bob 1 Bob Ø Alice 1 Publications Paper1 1 Paper1 (1, 2) Accounts Stream Join Account Publications Publications Stream Author 1 Alice Paper1 Author 2 Ø Paper1 Author N … 2. (Alice, 1) (Alice, Paper1)
  • 62. Alice claims Paper1 via different author Account Publications K1 (Bob, Paper1) K2 (Alice, Paper1) K1 Ø K3 (Alice, Paper1) K2 Ø Accounts Alice 2 Bob 1 Bob Ø Alice 1 Publications Paper1 1 Paper1 (1, 2) Accounts Stream Join Account Publications Publications Stream Author 1 Alice Paper1 Author 2 Ø Paper1 Author N … 2. (Alice, 1) (Alice, Paper1) Pick correct natural IDs e.g. K3 = 𝒇(Alice, Author1, Paper1)
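The "pick correct natural IDs" point above, e.g. K3 = 𝒇(Alice, Author1, Paper1), can be sketched as a deterministic key function. This is an illustrative assumption, not code from the talk; the helper name `join_key` and the SHA-1 choice are hypothetical:

```python
# Hypothetical sketch: a deterministic natural ID K = f(account, author, publication),
# so that a later delete or update command can address exactly the join result
# that was emitted earlier for the same claim.
import hashlib

def join_key(account, author, publication):
    raw = f"{account}|{author}|{publication}".encode("utf-8")
    return hashlib.sha1(raw).hexdigest()

# Same inputs always produce the same key; a claim via a different author
# produces a different key, which avoids the K1/K2 collision on the slides.
assert join_key("Alice", "Author1", "Paper1") == join_key("Alice", "Author1", "Paper1")
assert join_key("Alice", "Author1", "Paper1") != join_key("Alice", "Author2", "Paper1")
```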
  • 63. How to solve deletes and updates • Keep the previous element state so the previous join result can be updated • Stream elements are not domain entities but commands such as delete or upsert • The joined stream must carry natural IDs to propagate deletes and updates
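A minimal sketch of the "diff with previous state" step described above, assuming a plain dict stands in for Flink keyed state (the function name and the command-tuple shape are illustrative, not the talk's code):

```python
# Turn raw change events into upsert/delete commands by diffing against
# the previously seen value for the same entity.
def diff(state, entity_id, new_author_ids):
    old = state.get(entity_id, set())
    new = set(new_author_ids or [])
    commands = []
    for author_id in old - new:          # claims that disappeared
        commands.append(("delete", entity_id, author_id))
    for author_id in new - old:          # claims that were added
        commands.append(("upsert", entity_id, author_id))
    if new:
        state[entity_id] = new           # remember current value for next diff
    else:
        state.pop(entity_id, None)       # tombstone: entity was deleted
    return commands

state = {}
diff(state, "Paper1", [1])      # emits an upsert for author 1
diff(state, "Paper1", [2])      # emits a delete for author 1, an upsert for author 2
diff(state, "Paper1", None)     # emits a delete for author 2
```

The key point is that a delete (the Ø events on the slides) carries no payload, so only the stored previous value tells us which join results must be retracted.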
  • 65. Generic join graph Operate on commands Account Publications Accounts Stream Publications Stream Diff Alice Bob … Diff Paper1 PaperN … Join Author1 AuthorM …
  • 66. Memory requirements Account Publications Accounts Stream Publications Stream Diff Alice Bob … Diff Paper1 PaperN … Join Author1 AuthorM … Each Diff keeps a full copy of its input stream; the Join keeps a full copy of the Accounts stream on its left side and a full copy of the Publications stream on its right side
  • 68. Resource considerations • In addition to handling Kafka traffic, we need to reshuffle all data twice over the network • We need to keep two full copies of each joined stream in memory
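To see why the join side must hold a full copy of each stream, consider a toy in-memory version of the author-keyed join (class and method names are assumptions for illustration, not the talk's code): an element arriving on one side must be matched against everything ever seen on the other side for the same author key, so neither side's state can be discarded.

```python
# Toy two-sided keyed join: state on both sides grows with the streams,
# because a late-arriving element on either side may still need a match.
class AuthorJoin:
    def __init__(self):
        self.accounts = {}      # author_id -> account (full copy of Accounts side)
        self.publications = {}  # author_id -> set of publications (full copy of Publications side)

    def on_account(self, author_id, account):
        self.accounts[author_id] = account
        return [(account, pub) for pub in sorted(self.publications.get(author_id, set()))]

    def on_publication(self, author_id, publication):
        self.publications.setdefault(author_id, set()).add(publication)
        account = self.accounts.get(author_id)
        return [(account, publication)] if account is not None else []

join = AuthorJoin()
join.on_account(1, "Bob")           # no publications for author 1 yet
join.on_publication(1, "Paper1")    # matches the stored account: (Bob, Paper1)
```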
  • 69. Questions We are hiring - www.researchgate.net/careers