Till Rohrmann
trohrmann@apache.org
@stsffap
Unifying Stream SQL and CEP for Declarative Stream Processing with Apache Flink
2
Original creators of Apache Flink®
Providers of the dA Platform, a supported Flink distribution
Streams are Everywhere
• Most data is continuously produced as streams
• Processing data as it arrives is becoming very popular
• Many diverse applications and use cases
3
Batch Analytics
4
• The batch approach to data analytics
Streaming Analytics
• Online aggregation of streams
  • No delay – continuous results
• Stream analytics subsumes batch analytics
  • Batch is a finite stream
• Demanding requirements on stream processor
  • High throughput
  • Exactly-once semantics & event-time support
  • Advanced window support
5
Complex Event Processing
• Analyzing a stream of events and drawing conclusions
  • Detect patterns and assemble new events
• Applications
  • Network intrusion detection
  • Process monitoring
  • Algorithmic trading
• Demanding requirements on stream processor
  • Low latency!
  • Exactly-once semantics & event-time support
6
Apache Flink®
• Platform for scalable stream processing
• Meets requirements of CEP and stream analytics
  • Low latency and high throughput
  • Exactly-once semantics
  • Event-time support
  • Advanced windowing
• Core DataStream API available for Java & Scala
7
Tracking an Order Process
Use Case
8
Order Process
9
Order Events
• Process is reflected in a stream of order events
• Order(orderId, tStamp, "received")
• Shipment(orderId, tStamp, "shipped")
• Delivery(orderId, tStamp, "delivered")
• orderId: Identifies the order
• tStamp: Time at which the event happened
10
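For illustration, these events might be modeled as Scala case classes along the following lines; this is a minimal sketch, and the field types, the default status values, and the common Event supertype (which the CEP example later matches on) are assumptions rather than the exact classes from the talk.

// Minimal sketch of the order event types; field types and the shared
// Event supertype are assumptions.
sealed trait Event {
  def orderId: Long
  def tStamp: Long
  def status: String
}

case class Order(orderId: Long, tStamp: Long, status: String = "received") extends Event
case class Shipment(orderId: Long, tStamp: Long, status: String = "shipped") extends Event
case class Delivery(orderId: Long, tStamp: Long, status: String = "delivered") extends Event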
Aggregating Massive Streams
Stream Analytics
11
Stream Analytics
• Traditional batch analytics
  • Repeated queries on finite and changing data sets
  • Queries join and aggregate large data sets
• Stream analytics
  • "Standing" query produces continuous results from an infinite input stream
  • Query computes aggregates on high-volume streams
• How to compute aggregates on infinite streams?
12
Compute Aggregates on Streams
• Split infinite stream into finite "windows"
  • Split usually by time
• Tumbling windows
  • Fixed size & consecutive
• Sliding windows
  • Fixed size & may overlap
• Event time mandatory for correct & consistent results!
13
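As a rough illustration of tumbling windows on this use case, the hourly order count (shown as SQL on the next slides) could also be written directly against the DataStream API. This is a minimal sketch, assuming the event types sketched earlier and that event-time timestamps and watermarks have already been assigned to the `orders` stream.

import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// Sketch: count "received" orders per tumbling 1-hour window.
// Assumes the environment runs in event time and `orders` has
// timestamps/watermarks assigned.
val orders: DataStream[Event] = ???

val hourlyCounts: DataStream[(String, Long)] = orders
  .filter(_.status == "received")
  .map(e => (e.status, 1L))        // ("received", 1)
  .keyBy(0)                        // single logical key
  .timeWindow(Time.hours(1))       // tumbling 1-hour window
  .sum(1)                          // count per window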
Example: Count Orders by Hour
14
Example: Count Orders by Hour
15
SELECT
  TUMBLE_START(tStamp, INTERVAL '1' HOUR) AS hour,
  COUNT(*) AS cnt
FROM events
WHERE
  status = 'received'
GROUP BY
  TUMBLE(tStamp, INTERVAL '1' HOUR)
Stream SQL Architecture
• Flink features SQL on static and streaming tables
• Parsing and optimization by Apache Calcite
• SQL queries are translated into native Flink programs
16
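Putting the pieces together, registering the order stream as a table and submitting the query from the previous slide might look roughly like this. It is a minimal sketch that mirrors the toTable/registerTable/sql calls used later in the talk; the rowtime declaration and field names are assumptions.

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.{Table, TableEnvironment}
import org.apache.flink.table.api.scala._

// Sketch: expose the event stream as a table and run the windowed SQL query.
// Field names and the rowtime declaration are assumptions.
val env = StreamExecutionEnvironment.getExecutionEnvironment
val tableEnv = TableEnvironment.getTableEnvironment(env)

val events: DataStream[Event] = ???   // timestamps & watermarks already assigned

val eventsTable: Table = events.toTable(tableEnv, 'orderId, 'status, 'tStamp.rowtime)
tableEnv.registerTable("events", eventsTable)

val hourlyCounts: Table = tableEnv.sql(
  """SELECT
    |  TUMBLE_START(tStamp, INTERVAL '1' HOUR) AS hour,
    |  COUNT(*) AS cnt
    |FROM events
    |WHERE status = 'received'
    |GROUP BY TUMBLE(tStamp, INTERVAL '1' HOUR)""".stripMargin)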
Pattern Matching on Streams
Complex Event Processing
17
Real-time Warnings
18
CEP to the Rescue
• Define processing and delivery intervals (SLAs)
• ProcessSucc(orderId, tStamp, duration)
• ProcessWarn(orderId, tStamp)
• DeliverySucc(orderId, tStamp, duration)
• DeliveryWarn(orderId, tStamp)
• orderId: Identifies the order
• tStamp: Time when the event happened
• duration: Duration of the processing/delivery
19
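The success and warning events produced by the patterns below could be plain case classes; a minimal sketch with assumed field types.

// Sketch of the CEP result events; field types are assumptions.
case class ProcessSucc(orderId: Long, tStamp: Long, duration: Long)
case class ProcessWarn(orderId: Long, tStamp: Long)
case class DeliverySucc(orderId: Long, tStamp: Long, duration: Long)
case class DeliveryWarn(orderId: Long, tStamp: Long)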
CEP Example
20
Processing: Order → Shipment
21
val processingPattern = Pattern
  .begin[Event]("received").subtype(classOf[Order])
  .followedBy("shipped").where(_.status == "shipped")
  .within(Time.hours(1))

val processingPatternStream = CEP.pattern(
  input.keyBy("orderId"),
  processingPattern)

val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] =
  processingPatternStream.select {
    (pP, timestamp) => // Timeout handler
      ProcessWarn(pP("received").orderId, timestamp)
  } {
    fP => // Select function
      ProcessSucc(
        fP("received").orderId, fP("shipped").tStamp,
        fP("shipped").tStamp - fP("received").tStamp)
  }
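Each matched (or timed-out) order yields an Either of warning or success; downstream, the two cases can be split into separate streams in the same way the delivery result is handled a few slides later. A minimal sketch:

// Split the Either result into separate warning and success streams.
val procWarn: DataStream[ProcessWarn] = procResult.flatMap(_.left.toOption)
val procSucc: DataStream[ProcessSucc] = procResult.flatMap(_.right.toOption)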
… and both at the same time!
Integrated Stream Analytics with CEP
22
Count Delayed Shipments
23
Compute Avg Processing Time
24
CEP + Stream SQL
25
// complex event processing result
val delResult: DataStream[Either[DeliveryWarn, DeliverySucc]] = …

val delWarn: DataStream[DeliveryWarn] = delResult.flatMap(_.left.toOption)

val deliveryWarningTable: Table = delWarn.toTable(tableEnv)
tableEnv.registerTable("deliveryWarnings", deliveryWarningTable)

// calculate the delayed deliveries per day
val delayedDeliveriesPerDay = tableEnv.sql(
  """SELECT
    |  TUMBLE_START(tStamp, INTERVAL '1' DAY) AS day,
    |  COUNT(*) AS cnt
    |FROM deliveryWarnings
    |GROUP BY TUMBLE(tStamp, INTERVAL '1' DAY)""".stripMargin)
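To actually emit the daily counts, the result table can be converted back into a stream; a minimal sketch, assuming the Table-to-DataStream conversion of Flink's Table API (exact method names vary between versions).

import org.apache.flink.streaming.api.scala._
import org.apache.flink.table.api.scala._
import org.apache.flink.types.Row

// Each tumbling window produces one final row, so an append-only
// conversion is sufficient here (assumed API: toAppendStream).
val delayedPerDay: DataStream[Row] = delayedDeliveriesPerDay.toAppendStream[Row]
delayedPerDay.print()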
CEP-enriched Stream SQL
26
SELECT
  TUMBLE_START(tStamp, INTERVAL '1' DAY) AS day,
  AVG(duration) AS avgDuration
FROM (
  -- CEP pattern
  SELECT duration, tStamp
  FROM inputs MATCH_RECOGNIZE (
    PARTITION BY orderId ORDER BY tStamp
    MEASURES END.tStamp - START.tStamp AS duration, END.tStamp AS tStamp
    PATTERN (START OTHER* END)
    INTERVAL '1' HOUR
    DEFINE
      START AS START.status = 'received',
      END AS END.status = 'shipped'
  )
)
GROUP BY
  TUMBLE(tStamp, INTERVAL '1' DAY)
Conclusion
• Apache Flink handles CEP and analytical workloads
• Apache Flink offers intuitive APIs
• Integrating CEP and Stream SQL enables a new class of applications
27
28
Thank you!
@stsffap
@ApacheFlink
@dataArtisans
29
Stream Processing and Apache Flink®'s approach to it
@StephanEwen
Apache Flink PMC
CTO @ data Artisans

Flink Forward is coming back to Berlin
September 11-13, 2017
berlin.flink-forward.org

We are hiring!
data-artisans.com/careers