SlideShare a Scribd company logo
1 of 16
Download to read offline
Re-introducing the Stream
ProcessorA Universal Tool for Continuous Data Analytical Needs
A Universal Tool for Continuous Data Analysis
Paris Carbone
Committer @ Apache Flink
PhD Candidate @ KTH
Data Stream Processors
Data Stream
Processor
can set up any data
pipeline for you
http://edge.alluremedia.com.au/m/l/2014/10/CoolingPipes.jpg
Is this really a step forward in data processing?
A growing open-source ecosystem:
kafkaflink beam apex
e.g.
General Idea of the tech:
• Processes pipeline computation in a cluster
• Computation is continuous and parallel (like data)
• Event-processing logic <-> Application state
• It’s production-ready and aims to simplify analytics
Data Stream Processors
streams
complex event proc
fast approximate
streaming
ETL
event logs
production
database
4 Aspects of Data Processing
rules
data warehouses
+ historical data
application
state
+
failover
“microservices"
complex analytics
large-scale
processing systems
interactive
queries
data science
reports
dev
user analyst
data engineer
complex event proc
fast approximate
streaming
ETL
event logs
production
database
4 Aspects of Data Processing
rules
data warehouses
+ historical data
application
state
+
failover
“microservices"
complex analytics
large-scale
processing systems
interactive
queries
data science
reports
dev
user analyst
data engineer
complex event proc
fast approximate
streaming
ETL
event logs
production
database
4 Aspects of Data Processing
rules
data warehouses
+ historical data
application
state
+
failover
“microservices"
complex analytics
large-scale
processing systems
interactive
queries
data science
reports
dev
user analyst
data engineer
1. Speed
stream
processor
1. Speed
Low-Latency Data Processing
Traditionally the sole reason stream processing was used
• No intermediate scheduling (you let it run)
• No physical blocking (pre-compute on the go)
• Copy-on-write for state and output
How do stream processors achieve low latency?
But Is this is only relevant for live data?
CEP semantics etc. are nowadays provided as additional
libraries for stream processors
complex event proc
fast approximate
streaming
ETL
event logs
production
database
4 Aspects of Data Processing
rules
data warehouses
+ historical data
application
state
+
failover
“microservices"
complex analytics
large-scale
processing systems
interactive
queries
data science
reports
dev
user analyst
data engineer
1. Speed 2. History
stream
processor
2. History
Offline Data Processing
It is possible and better over bulk historical data analysis
• Ability to define custom state to build up models
• Large-scale support is a given (inherits cluster computing benefits)
• Separation of notions of time and out-of-order processing
What can stream processors do for historical data?
But isn’t streaming hard to deal with failures?
session
windows
event-timewindowse.g.,
complex event proc
fast approximate
streaming
ETL
event logs
production
database
4 Aspects of Data Processing
rules
data warehouses
+ historical data
application
state
+
failover
“microservices"
complex analytics
large-scale
processing systems
interactive
queries
data science
reports
dev
user analyst
data engineer
1. Speed 2. History
3. Durability
stream
processor
3. Durability
Exactly-Once Data Processing
Traditionally streaming ~ lossy, approximate processing
This is no longer true. Forget the ‘lambda architecture’.
• Input records are durably stored and indexed in logs (e.g., Kafka)
• Systems handle state snapshotting & transactions with external
stores transparently.
• Idempontent and transactional writes to external stores
part 1 part 2 part 3 part 4
on Flink each stream computation either completes or repeats
e.g.
3. Durability
Exactly-Once Data Processing
input
streams
application
states
stream
processor
rollback
complex event proc
fast approximate
streaming
ETL
event logs
production
database
4 Aspects of Data Processing
rules
data warehouses
+ historical data
application
state
+
failover
“microservices"
complex analytics
large-scale
processing systems
interactive
queries
data science
reports
dev
user analyst
data engineer
1. Speed 2. History
3. Durability
stream
processor
4. Interactivity
4. Interactivity
Querying Data Processing State
Stream Processor ~ Inverse DBMS
Application state holds fresh knowledge we want to query:
• In some systems (e.g. Kafka-Streams) we can use the changelog
• In other systems (i.e., Flink) we can query the state externally…or
stream queries on custom query processor on-top of them*
Alice
Bob? Bob=…
*https://techblog.king.com/rbea-scalable-real-time-analytics-king/
4 Aspects of Data Processing
1. Speed 2. History
3. Durability 4. Interactivity
stream
processor
• no physical blocking/staging
• no rescheduling
• efficient pipelining
• copy-on-write data structures
• different notions of time
• flexible stateful processing
• high throughput
• durable input logging is a standard
• automated state management
• exactly-once processing
• output commit & Idempotency
• external access to state/
changelogs
• ability to ‘stream queries’ over state
@SenorCarbone
Try out Stream Processing
https://flink.apache.org/
https://kafka.apache.org/
https://beam.apache.org/

More Related Content

What's hot

Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache FlinkTill Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Flink Forward
 
Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on FlinkTran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Flink Forward
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Flink Forward
 
Apache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonApache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World London
Stephan Ewen
 

What's hot (20)

Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache FlinkTill Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
Till Rohrmann – Fault Tolerance and Job Recovery in Apache Flink
 
An Introduction to Distributed Data Streaming
An Introduction to Distributed Data StreamingAn Introduction to Distributed Data Streaming
An Introduction to Distributed Data Streaming
 
Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on FlinkTran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
Tran Nam-Luc – Stale Synchronous Parallel Iterations on Flink
 
Unified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache FlinkUnified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache Flink
 
Zurich Flink Meetup
Zurich Flink MeetupZurich Flink Meetup
Zurich Flink Meetup
 
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
 
Marton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream ProcessingMarton Balassi – Stateful Stream Processing
Marton Balassi – Stateful Stream Processing
 
A look at Flink 1.2
A look at Flink 1.2A look at Flink 1.2
A look at Flink 1.2
 
Stateful stream processing with Apache Flink
Stateful stream processing with Apache FlinkStateful stream processing with Apache Flink
Stateful stream processing with Apache Flink
 
Stream Loops on Flink - Reinventing the wheel for the streaming era
Stream Loops on Flink - Reinventing the wheel for the streaming eraStream Loops on Flink - Reinventing the wheel for the streaming era
Stream Loops on Flink - Reinventing the wheel for the streaming era
 
Introduction to Stateful Stream Processing with Apache Flink.
Introduction to Stateful Stream Processing with Apache Flink.Introduction to Stateful Stream Processing with Apache Flink.
Introduction to Stateful Stream Processing with Apache Flink.
 
Data Stream Analytics - Why they are important
Data Stream Analytics - Why they are importantData Stream Analytics - Why they are important
Data Stream Analytics - Why they are important
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkTzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
 
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
 
Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016Continuous Processing with Apache Flink - Strata London 2016
Continuous Processing with Apache Flink - Strata London 2016
 
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
Keynote: Building and Operating A Serverless Streaming Runtime for Apache Bea...
 
Apache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World LondonApache Flink@ Strata & Hadoop World London
Apache Flink@ Strata & Hadoop World London
 
Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016
 
Francesco Versaci - Flink in genomics - efficient and scalable processing of ...
Francesco Versaci - Flink in genomics - efficient and scalable processing of ...Francesco Versaci - Flink in genomics - efficient and scalable processing of ...
Francesco Versaci - Flink in genomics - efficient and scalable processing of ...
 
Don't Cross The Streams - Data Streaming And Apache Flink
Don't Cross The Streams  - Data Streaming And Apache FlinkDon't Cross The Streams  - Data Streaming And Apache Flink
Don't Cross The Streams - Data Streaming And Apache Flink
 

Similar to Reintroducing the Stream Processor: A universal tool for continuous data analysis

The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
Databricks
 

Similar to Reintroducing the Stream Processor: A universal tool for continuous data analysis (20)

Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
Modern Stream Processing With Apache Flink @ GOTO Berlin 2017
 
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and TransformIntro to Apache Apex - Next Gen Platform for Ingest and Transform
Intro to Apache Apex - Next Gen Platform for Ingest and Transform
 
Continuous Intelligence - Intersecting Event-Based Business Logic and ML
Continuous Intelligence - Intersecting Event-Based Business Logic and MLContinuous Intelligence - Intersecting Event-Based Business Logic and ML
Continuous Intelligence - Intersecting Event-Based Business Logic and ML
 
Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex Big Data Berlin v8.0 Stream Processing with Apache Apex
Big Data Berlin v8.0 Stream Processing with Apache Apex
 
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
Thomas Weise, Apache Apex PMC Member and Architect/Co-Founder, DataTorrent - ...
 
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
The Future of Data Science and Machine Learning at Scale: A Look at MLflow, D...
 
[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL[WSO2Con EU 2018] The Rise of Streaming SQL
[WSO2Con EU 2018] The Rise of Streaming SQL
 
Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex
 
Io t data streaming
Io t data streamingIo t data streaming
Io t data streaming
 
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexHadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
 
Big Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICSBig Data Analytics Platforms by KTH and RISE SICS
Big Data Analytics Platforms by KTH and RISE SICS
 
Data Infrastructure at LinkedIn
Data Infrastructure at LinkedInData Infrastructure at LinkedIn
Data Infrastructure at LinkedIn
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
 
Streaming Data Ingest and Processing with Apache Kafka
Streaming Data Ingest and Processing with Apache KafkaStreaming Data Ingest and Processing with Apache Kafka
Streaming Data Ingest and Processing with Apache Kafka
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 
Using Perforce Data in Development at Tableau
Using Perforce Data in Development at TableauUsing Perforce Data in Development at Tableau
Using Perforce Data in Development at Tableau
 
[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale
[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale
[DSC Europe 23] Pramod Immaneni - Real-time analytics at IoT scale
 
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
 IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
IoT Ingestion & Analytics using Apache Apex - A Native Hadoop Platform
 
Lessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics PlatformLessons Learned from Modernizing USCIS Data Analytics Platform
Lessons Learned from Modernizing USCIS Data Analytics Platform
 
Stream Processing – Concepts and Frameworks
Stream Processing – Concepts and FrameworksStream Processing – Concepts and Frameworks
Stream Processing – Concepts and Frameworks
 

Recently uploaded

Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
amitlee9823
 

Recently uploaded (20)

Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 

Reintroducing the Stream Processor: A universal tool for continuous data analysis

  • 1. Re-introducing the Stream ProcessorA Universal Tool for Continuous Data Analytical Needs A Universal Tool for Continuous Data Analysis Paris Carbone Committer @ Apache Flink PhD Candidate @ KTH
  • 2. Data Stream Processors Data Stream Processor can set up any data pipeline for you http://edge.alluremedia.com.au/m/l/2014/10/CoolingPipes.jpg
  • 3. Is this really a step forward in data processing? A growing open-source ecosystem: kafkaflink beam apex e.g. General Idea of the tech: • Processes pipeline computation in a cluster • Computation is continuous and parallel (like data) • Event-processing logic <-> Application state • It’s production-ready and aims to simplify analytics Data Stream Processors streams
  • 4. complex event proc fast approximate streaming ETL event logs production database 4 Aspects of Data Processing rules data warehouses + historical data application state + failover “microservices" complex analytics large-scale processing systems interactive queries data science reports dev user analyst data engineer
  • 5. complex event proc fast approximate streaming ETL event logs production database 4 Aspects of Data Processing rules data warehouses + historical data application state + failover “microservices" complex analytics large-scale processing systems interactive queries data science reports dev user analyst data engineer
  • 6. complex event proc fast approximate streaming ETL event logs production database 4 Aspects of Data Processing rules data warehouses + historical data application state + failover “microservices" complex analytics large-scale processing systems interactive queries data science reports dev user analyst data engineer 1. Speed stream processor
  • 7. 1. Speed Low-Latency Data Processing Traditionally the sole reason stream processing was used • No intermediate scheduling (you let it run) • No physical blocking (pre-compute on the go) • Copy-on-write for state and output How do stream processors achieve low latency? But Is this is only relevant for live data? CEP semantics etc. are nowadays provided as additional libraries for stream processors
  • 8. complex event proc fast approximate streaming ETL event logs production database 4 Aspects of Data Processing rules data warehouses + historical data application state + failover “microservices" complex analytics large-scale processing systems interactive queries data science reports dev user analyst data engineer 1. Speed 2. History stream processor
  • 9. 2. History Offline Data Processing It is possible and better over bulk historical data analysis • Ability to define custom state to build up models • Large-scale support is a given (inherits cluster computing benefits) • Separation of notions of time and out-of-order processing What can stream processors do for historical data? But isn’t streaming hard to deal with failures? session windows event-timewindowse.g.,
  • 10. complex event proc fast approximate streaming ETL event logs production database 4 Aspects of Data Processing rules data warehouses + historical data application state + failover “microservices" complex analytics large-scale processing systems interactive queries data science reports dev user analyst data engineer 1. Speed 2. History 3. Durability stream processor
  • 11. 3. Durability Exactly-Once Data Processing Traditionally streaming ~ lossy, approximate processing This is no longer true. Forget the ‘lambda architecture’. • Input records are durably stored and indexed in logs (e.g., Kafka) • Systems handle state snapshotting & transactions with external stores transparently. • Idempontent and transactional writes to external stores part 1 part 2 part 3 part 4 on Flink each stream computation either completes or repeats e.g.
  • 12. 3. Durability Exactly-Once Data Processing input streams application states stream processor rollback
  • 13. complex event proc fast approximate streaming ETL event logs production database 4 Aspects of Data Processing rules data warehouses + historical data application state + failover “microservices" complex analytics large-scale processing systems interactive queries data science reports dev user analyst data engineer 1. Speed 2. History 3. Durability stream processor 4. Interactivity
  • 14. 4. Interactivity Querying Data Processing State Stream Processor ~ Inverse DBMS Application state holds fresh knowledge we want to query: • In some systems (e.g. Kafka-Streams) we can use the changelog • In other systems (i.e., Flink) we can query the state externally…or stream queries on custom query processor on-top of them* Alice Bob? Bob=… *https://techblog.king.com/rbea-scalable-real-time-analytics-king/
  • 15. 4 Aspects of Data Processing 1. Speed 2. History 3. Durability 4. Interactivity stream processor • no physical blocking/staging • no rescheduling • efficient pipelining • copy-on-write data structures • different notions of time • flexible stateful processing • high throughput • durable input logging is a standard • automated state management • exactly-once processing • output commit & Idempotency • external access to state/ changelogs • ability to ‘stream queries’ over state
  • 16. @SenorCarbone Try out Stream Processing https://flink.apache.org/ https://kafka.apache.org/ https://beam.apache.org/