Till Rohrmann
trohrmann@apache.org
@stsffap
Unifying Stream SQL and CEP
for Declarative Stream
Processing with Apache Flink
Original creators of Apache Flink®
Providers of the dA Platform, a supported Flink distribution
Streams are Everywhere
▪ Most data is continuously produced as a stream
▪ Processing data as it arrives is becoming very popular
▪ Many diverse applications and use cases
Batch Analytics
▪ The batch approach to data analytics
Streaming Analytics
▪ Online aggregation of streams
  • No delay: continuous results
▪ Stream analytics subsumes batch analytics
  • Batch is a finite stream
▪ Demanding requirements on the stream processor
  • High throughput
  • Exactly-once semantics & event-time support
  • Advanced window support
Complex Event Processing
▪ Analyzing a stream of events and drawing conclusions
  • Detect patterns and assemble new events
▪ Applications
  • Network intrusion
  • Process monitoring
  • Algorithmic trading
▪ Demanding requirements on the stream processor
  • Low latency!
  • Exactly-once semantics & event-time support
Apache Flink®
▪ Platform for scalable stream processing
▪ Meets requirements of CEP and stream analytics
  • Low latency and high throughput
  • Exactly-once semantics
  • Event-time support
  • Advanced windowing
▪ Core DataStream API available for Java & Scala
Tracking an Order Process
Use Case
Order Process
Order Events
▪ Process is reflected in a stream of order events
▪ Order(orderId, tStamp, "received")
▪ Shipment(orderId, tStamp, "shipped")
▪ Delivery(orderId, tStamp, "delivered")
▪ orderId: Identifies the order
▪ tStamp: Time at which the event happened
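The three event types above could be modeled in Scala roughly as follows. This is a minimal sketch, not the model used in the talk: the common `Event` trait, the `Long` field types, and the helper `inEventTimeOrder` are assumptions for illustration (the CEP code later in the deck only shows that events expose `orderId`, `tStamp` and `status`).

```scala
// Hypothetical event model; field types are assumptions.
object OrderEvents {
  sealed trait Event {
    def orderId: Long
    def tStamp: Long   // event time, assumed to be epoch milliseconds
    def status: String
  }

  final case class Order(orderId: Long, tStamp: Long, status: String = "received") extends Event
  final case class Shipment(orderId: Long, tStamp: Long, status: String = "shipped") extends Event
  final case class Delivery(orderId: Long, tStamp: Long, status: String = "delivered") extends Event

  // Sorts a finite batch of events by event time (illustrative helper).
  def inEventTimeOrder(events: Seq[Event]): Seq[Event] = events.sortBy(_.tStamp)
}
```

Keeping `status` as a field on every event lets later predicates filter on it without caring about the concrete subtype.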
Aggregating Massive Streams
Stream Analytics
Stream Analytics
▪ Traditional batch analytics
  • Repeated queries on finite and changing data sets
  • Queries join and aggregate large data sets
▪ Stream analytics
  • "Standing" query produces continuous results from an infinite input stream
  • Query computes aggregates on high-volume streams
▪ How to compute aggregates on infinite streams?
Compute Aggregates on Streams
▪ Split infinite stream into finite "windows"
  • Split usually by time
▪ Tumbling windows
  • Fixed size & consecutive
▪ Sliding windows
  • Fixed size & may overlap
▪ Event time mandatory for correct & consistent results!
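To make the two window types concrete, here is a small plain-Scala sketch of how an event timestamp might be mapped to windows. It is not Flink's implementation; the object and function names are hypothetical, and timestamps and sizes are assumed to be in the same unit (for example milliseconds).

```scala
// Illustrative window assignment; names are assumptions, not Flink API.
object Windows {
  // Start of the single tumbling window of length `size` that contains `tStamp`.
  def tumbleStart(tStamp: Long, size: Long): Long =
    tStamp - Math.floorMod(tStamp, size)

  // Starts of all sliding windows (length `size`, advancing by `slide`)
  // that contain `tStamp`, in ascending order.
  def slidingStarts(tStamp: Long, size: Long, slide: Long): List[Long] = {
    val lastStart = tStamp - Math.floorMod(tStamp, slide)
    Iterator.iterate(lastStart)(_ - slide)
      .takeWhile(_ > tStamp - size)
      .toList
      .reverse
  }
}
```

With `slide` equal to `size`, `slidingStarts` returns exactly one window, which matches `tumbleStart`: a tumbling window is the non-overlapping special case of a sliding window.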
Example: Count Orders by Hour
SELECT
  TUMBLE_START(tStamp, INTERVAL '1' HOUR) AS hour,
  COUNT(*) AS cnt
FROM events
WHERE status = 'received'
GROUP BY TUMBLE(tStamp, INTERVAL '1' HOUR)
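What the query computes can be sketched over a finite sample in plain Scala, without Flink. This is only an illustration of the semantics (filter, group by tumbling one-hour window, count); the `Event` case class and the millisecond timestamps are assumptions.

```scala
// Plain-Scala illustration of the hourly count; not how Flink executes the query.
object HourlyOrderCount {
  final case class Event(orderId: Long, tStamp: Long, status: String)

  val HourMillis: Long = 60L * 60L * 1000L

  // Mirrors the query: keep "received" events, group by the start of their
  // one-hour tumbling window, and count the events per window.
  def countReceivedPerHour(events: Seq[Event]): Map[Long, Int] =
    events
      .filter(_.status == "received")
      .groupBy(e => e.tStamp - Math.floorMod(e.tStamp, HourMillis))
      .map { case (hourStart, es) => hourStart -> es.size }
}
```

On a real stream the result for a window can only be emitted once the system decides the window is complete, which is where event time comes in.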
Stream SQL Architecture
▪ Flink features SQL on static and streaming tables
▪ Parsing and optimization by Apache Calcite
▪ SQL queries are translated into native Flink programs
Pattern Matching on Streams
Complex Event Processing
Real-time Warnings
CEP to the Rescue
▪ Define processing and delivery intervals (SLAs)
▪ ProcessSucc(orderId, tStamp, duration)
▪ ProcessWarn(orderId, tStamp)
▪ DeliverySucc(orderId, tStamp, duration)
▪ DeliveryWarn(orderId, tStamp)
▪ orderId: Identifies the order
▪ tStamp: Time when the event happened
▪ duration: Duration of the processing/delivery
CEP Example
Processing: Order → Shipment
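// Imports assume the Flink 1.3-era Scala CEP API in use at the time of this talk.
import org.apache.flink.cep.scala.CEP
import org.apache.flink.cep.scala.pattern.Pattern
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

// An Order event followed by a "shipped" event within one hour
val processingPattern = Pattern
  .begin[Event]("received").subtype(classOf[Order])
  .followedBy("shipped").where(_.status == "shipped")
  .within(Time.hours(1))

// Apply the pattern separately to the events of each order
val processingPatternStream = CEP.pattern(
  input.keyBy("orderId"),
  processingPattern)

val procResult: DataStream[Either[ProcessWarn, ProcessSucc]] =
  processingPatternStream.select {
    (pP, timestamp) => // Timeout handler
      ProcessWarn(pP("received").orderId, timestamp)
  } {
    fP => // Select function
      ProcessSucc(
        fP("received").orderId, fP("shipped").tStamp,
        fP("shipped").tStamp - fP("received").tStamp)
  }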
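The logic of this pattern can be approximated on a finite batch of events without Flink. The sketch below is a simplification under stated assumptions: all events are already available, timestamps are in milliseconds, the warning timestamp is taken to be the deadline (`received` time plus the limit), and the names `ProcessingSla`, `checkOrder` and `check` are hypothetical.

```scala
// Batch approximation of "received followed by shipped within one hour".
object ProcessingSla {
  final case class Event(orderId: Long, tStamp: Long, status: String)
  final case class ProcessWarn(orderId: Long, tStamp: Long)
  final case class ProcessSucc(orderId: Long, tStamp: Long, duration: Long)

  val OneHourMillis: Long = 60L * 60L * 1000L

  // Result for one order: Right if a "shipped" event follows "received"
  // before the deadline, Left otherwise; None if the order was never received.
  def checkOrder(orderId: Long, events: Seq[Event], limit: Long): Option[Either[ProcessWarn, ProcessSucc]] = {
    val sorted = events.sortBy(_.tStamp)
    sorted.find(_.status == "received").map { received =>
      val deadline = received.tStamp + limit
      val shipped = sorted.find { e =>
        e.status == "shipped" && e.tStamp >= received.tStamp && e.tStamp < deadline
      }
      shipped match {
        case Some(s) => Right(ProcessSucc(orderId, s.tStamp, s.tStamp - received.tStamp))
        case None    => Left(ProcessWarn(orderId, deadline))
      }
    }
  }

  // Groups by orderId (the counterpart of keyBy) and checks each order.
  def check(events: Seq[Event], limit: Long = OneHourMillis): Seq[Either[ProcessWarn, ProcessSucc]] =
    events.groupBy(_.orderId).toSeq.sortBy(_._1).flatMap {
      case (orderId, es) => checkOrder(orderId, es, limit).toList
    }
}
```

The `Either` result mirrors the two handlers in the CEP code: the timeout handler fills the `Left` side and the select function fills the `Right` side.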
… and both at the same time!
Integrated Stream Analytics with CEP
Count Delayed Shipments
Compute Avg Processing Time
CEP + Stream SQL
// complex event processing result
val delResult: DataStream[Either[DeliveryWarn, DeliverySucc]] = …
// keep only the warnings (the Left side of each Either)
val delWarn: DataStream[DeliveryWarn] = delResult.flatMap(_.left.toOption)
val deliveryWarningTable: Table = delWarn.toTable(tableEnv)
tableEnv.registerTable("deliveryWarnings", deliveryWarningTable)

// calculate the delayed deliveries per day
val delayedDeliveriesPerDay = tableEnv.sql(
  """SELECT
    |  TUMBLE_START(tStamp, INTERVAL '1' DAY) AS day,
    |  COUNT(*) AS cnt
    |FROM deliveryWarnings
    |GROUP BY TUMBLE(tStamp, INTERVAL '1' DAY)""".stripMargin)
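The same two steps, extracting the warnings from the `Either` results and counting them per day, can be sketched on an ordinary Scala collection. This is an illustration only; the object name, the `Long` millisecond timestamps and the function `warningsPerDay` are assumptions.

```scala
// Collection-based illustration of "CEP result, then windowed count".
object DelayedDeliveries {
  final case class DeliveryWarn(orderId: Long, tStamp: Long)
  final case class DeliverySucc(orderId: Long, tStamp: Long, duration: Long)

  val DayMillis: Long = 24L * 60L * 60L * 1000L

  // Keeps only the Left (warning) side, then counts warnings per
  // tumbling one-day window, keyed by the window start.
  def warningsPerDay(results: Seq[Either[DeliveryWarn, DeliverySucc]]): Map[Long, Int] =
    results
      .flatMap(_.left.toOption)
      .groupBy(w => w.tStamp - Math.floorMod(w.tStamp, DayMillis))
      .map { case (dayStart, ws) => dayStart -> ws.size }
}
```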
CEP-enriched Stream SQL
SELECT
  TUMBLE_START(tStamp, INTERVAL '1' DAY) AS day,
  AVG(duration) AS avgDuration
FROM (
  -- CEP pattern
  SELECT duration, tStamp
  FROM inputs MATCH_RECOGNIZE (
    PARTITION BY orderId ORDER BY tStamp
    MEASURES END.tStamp - START.tStamp AS duration, END.tStamp AS tStamp
    PATTERN (START OTHER* END)
    INTERVAL '1' HOUR
    DEFINE
      START AS START.status = 'received',
      END AS END.status = 'shipped'
  )
)
GROUP BY TUMBLE(tStamp, INTERVAL '1' DAY)
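The outer part of this query, averaging the measured durations per day, can be illustrated on a finite set of pattern matches. A minimal sketch, assuming each match has already been reduced to the two measures `tStamp` and `duration` (both in milliseconds); `ShipmentMatch` and `avgDurationPerDay` are hypothetical names.

```scala
// Illustration of the outer aggregation over rows produced by the pattern.
object AvgProcessingTime {
  // One row from the inner query: match end time and measured duration.
  final case class ShipmentMatch(tStamp: Long, duration: Long)

  val DayMillis: Long = 24L * 60L * 60L * 1000L

  // Average duration per tumbling one-day window, keyed by the window start.
  def avgDurationPerDay(matches: Seq[ShipmentMatch]): Map[Long, Double] =
    matches
      .groupBy(m => m.tStamp - Math.floorMod(m.tStamp, DayMillis))
      .map { case (dayStart, ms) =>
        dayStart -> ms.map(_.duration).sum.toDouble / ms.size
      }
}
```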
Conclusion
▪ Apache Flink handles CEP and analytical workloads
▪ Apache Flink offers intuitive APIs
▪ New class of applications by CEP and Stream SQL integration
Thank you!
@stsffap
@ApacheFlink
@dataArtisans
Stream Processing and Apache Flink®'s approach to it
@StephanEwen
Apache Flink PMC
CTO @ data Artisans
FLINK FORWARD IS COMING BACK TO BERLIN
SEPTEMBER 11-13, 2017
BERLIN.FLINK-FORWARD.ORG
We are hiring!
data-artisans.com/careers
