Predictive Maintenance with Deep Learning and Flink
Dongwon Kim, PhD
Solution R&D Center, SK Telecom
About me (@eastcirclek)
• Big Data processing engines
  • MapReduce, Tez, Spark, Flink, Hive, Presto
• Recent interests (covered in this talk)
  • Deep learning model serving (TensorFlow Serving)
  • Containerization (Docker, Kubernetes)
  • Time-series data (InfluxDB, Prometheus, Grafana)
• Flink Forward 2015 Berlin
  • A Comparative Performance Evaluation of Flink
Refinery and semiconductor companies depend on their equipment
• Equipment breakdowns significantly affect company profit
• Equipment maintenance aims to minimize breakdowns
  • Planned maintenance: shut down equipment on a regular basis for parts replacement, cleaning, adjustments, etc.
  • Predictive Maintenance (PdM): unplanned maintenance based on predictions of equipment condition
[Diagram: timeline from 2015.07 to 2017.07 showing breakdowns, planned maintenance, and predictive maintenance; equipment sensors feed a predictive maintenance system]
Our approach to Predictive Maintenance
• Better prediction of equipment condition using Deep Learning
[Diagram: PdM, planned maintenance, and breakdowns with Machine Learning & Statistics alone vs. with Machine Learning & Statistics + Deep Learning]
Contents
1. Why we use Flink for our time-series prediction model
2. Flink pipeline design for rendezvous and DNN ensemble
3. Solution packaging and monitoring with Docker and Prometheus
[Diagram: the software stack around the time-series prediction model: Flink, MySQL, InfluxDB, TF Serving, Prometheus, Grafana]
My team consists of two groups
• Model developers: train DNN models on historical data
• Data engineers: apply the DNN models to live data for prediction and alarms
[Diagram: the role and toolbox of each group]
* The diagram is based on Ted Dunning’s slides (Flink Forward 2017 SF)
Model developers give a Convolutional LSTM to data engineers
• Input (X): a 1-week time-series (10080 records*)
• Output (Ŷ): expected sensor values after 2 days
• The model does not return whether the equipment breaks down after 2 days
[Diagram: CNN layers followed by RNN (LSTM) layers, with weights W1, W2, W3, U and output O]
* assuming a one-minute sampling rate (7 × 24 × 60 = 10080)
Data engineers apply the Convolutional LSTM to live sensor data
• We have multi-sensor data: each piece of equipment produces a vector of sensor values [y1, y2, y3, ..., ym]
• Multi-sensor data arrive at a fixed interval of 1 minute
• We maintain a count window over the latest 10080 records
• It slides as each new record arrives (a sliding count window)
[Diagram: overlapping windows X of 10080 records each, sliding along the timeline]
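The sliding count window can be sketched outside Flink with a bounded deque. This is a plain-Python simulation, not Flink's countWindowAll API: the window size is shrunk from 10080 to 3 so the behavior is visible, and the choice to emit only full windows is an assumption (the model needs exactly 10080 inputs).

```python
from collections import deque
from typing import Iterable, Iterator, List, Sequence

WEEK_OF_MINUTES = 7 * 24 * 60  # 10080 records at a one-minute sampling rate

def sliding_count_windows(records: Iterable[Sequence[float]],
                          size: int = WEEK_OF_MINUTES) -> Iterator[List[Sequence[float]]]:
    """Yield the latest `size` records every time a new record arrives.

    The deque drops its oldest record automatically once full, so the window
    slides by one record per arrival. Windows are emitted only once they are
    full, on the assumption that the model needs exactly `size` inputs.
    """
    window = deque(maxlen=size)
    for record in records:
        window.append(record)
        if len(window) == size:
            yield list(window)

# Toy stream of 2-sensor vectors [y1, y2] with a window of 3 instead of 10080.
stream = [[float(i), 10.0 * i] for i in range(5)]
windows = list(sliding_count_windows(stream, size=3))
for w in windows:
    print([record[0] for record in w])
```

Each new record produces a new window that shares all but one record with the previous window, which is why the model is applied once per arriving record.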
• Given a one-week time-series (X), the DNN returns the predicted values after 2 days (Ŷ)
• Raise an alarm if the distance between the two vectors (Ŷ and the measured Y, 2 days later) is above a defined threshold
• Whenever the sliding window moves, we apply the Convolutional LSTM to ...
[Diagram: X (10080 records) fed into the CNN + RNN (LSTM) model; its output Ŷ = [ŷ1, ŷ2, ŷ3, ..., ŷm] is compared with the measurement Y = [y1, y2, y3, ..., ym] 2 days later]
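The alarm rule compares two vectors against a threshold. The slides do not name the distance metric, so this plain-Python sketch assumes Euclidean distance; the threshold value and sample vectors are hypothetical.

```python
import math
from typing import Sequence

def distance(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Euclidean distance between Ŷ and Y (the metric is an assumption)."""
    if len(predicted) != len(measured):
        raise ValueError("Ŷ and Y must cover the same sensors")
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(predicted, measured)))

def should_alarm(predicted: Sequence[float], measured: Sequence[float],
                 threshold: float) -> bool:
    """Raise an alarm when the distance is above the threshold."""
    return distance(predicted, measured) > threshold

y_hat = [1.0, 2.0, 3.0]  # prediction made two days earlier for time t
y_ok = [1.1, 2.0, 2.9]   # measurement at t, close to the prediction
y_bad = [4.0, 6.0, 3.0]  # measurement at t, far from the prediction
print(should_alarm(y_hat, y_ok, threshold=1.0))   # False
print(should_alarm(y_hat, y_bad, threshold=1.0))  # True
```

Any other vector distance (for example, a per-sensor normalized one) could be swapped into `distance` without changing the alarm logic.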
Desired streaming topology by data engineers
[Diagram: Stream source → Outlier filter → Sliding count window → Apply DNN to X → prediction stream (Ŷ); the filtered records form the measurement stream (Y); the two streams meet in Join → Score → Sink]
• Requirement 1: a count window that maintains 10080 records, instead of a 1-week event-time window
• Requirement 2: joining of two streams on event time (rendezvous)
Proof-of-Concept with Spark Structured Streaming
• Generate an input stream from local files (Input Streaming Dataset)
• Apply the DNN to a 1-week time-series, not to 10080 records, to get the Prediction Streaming Dataset (sliding count windows are not supported)
• Join the two streams to get the Score Streaming Dataset: an inner join between two streaming Datasets is not supported in Spark Structured Streaming
[Excerpt: Unsupported Operations on Spark Structured Streaming (v2.2)]
That’s why we moved to the Flink DataStream API
Spark Structured Streaming
• Sliding count window: not supported
• Joining of two streams: not supported
• Micro-batch behind the scenes
  • Continuous processing proposed in SPARK-20928
Flink DataStream API
• Sliding count window: supported
• Joining of two streams: supported
• Scalability and performance proven by other use cases
* It might be possible to use our Convolutional LSTM model with Spark Structured Streaming in some other way
Data processing pipeline with Flink DataStream API
[Diagram: addSource → process → countWindowAll (custom evictor) → apply → assignTimestampsAndWatermarks (+2 days) forms the prediction stream (Ŷ); the output of process forms the measurement stream (Y); both streams meet in join → window → apply; sinks: Input, Outlier, Prediction, Score; each record @t is followed by watermark W(t)]
Flink can faithfully implement our streaming topology design
[Diagram: topology design vs. Flink implementation]
• Stream source → addSource
• Outlier filter → process
• Sliding count window → countWindowAll (custom evictor)
• Apply DNN to X → apply, then assignTimestampsAndWatermarks
• Join → join, window, apply
2. Flink pipeline design for rendezvous and DNN ensemble
Data processing pipeline with Flink DataStream API: addSource
• We read data from MySQL
Stateful custom source to read from MySQL
• We assume that sensor data arrive in order
• Emit an input record and a watermark with the same timestamp
• Increase lastTimestamp afterward (11:15 → 11:16)
• Exactly-once semantics
  • Store lastTimestamp when taking a snapshot
  • Restore lastTimestamp when restarted
The source polls the input table over a JDBC connection:
    SELECT timestamp, measured
    FROM input
    WHERE timestamp > $lastTimestamp
[Diagram: the input table in MySQL holds rows (timestamp, measured = [y1, y2, ..., ym]) from 2017-9-13 11:13 to 11:16; with lastTimestamp = 11:15, addSource emits the 11:16 row @11:16, followed by watermark W(11:16)]
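The polling loop and its single piece of state can be simulated with Python's built-in sqlite3 standing in for MySQL. This is a sketch under assumptions: the class and method names are hypothetical, timestamps are zero-padded strings so they compare correctly as text, and the checkpoint lock a real Flink source would hold while emitting and updating state is not modeled.

```python
import json
import sqlite3
from typing import List, Tuple

class MeasurementSource:
    """Hypothetical polling source whose only state is `last_timestamp`."""

    def __init__(self, conn: sqlite3.Connection, last_timestamp: str = ""):
        self.conn = conn
        self.last_timestamp = last_timestamp  # operator state

    def poll(self) -> List[Tuple[str, object]]:
        """Emit ('record', (ts, values)) then ('watermark', ts) for each new row."""
        rows = self.conn.execute(
            "SELECT timestamp, measured FROM input WHERE timestamp > ? ORDER BY timestamp",
            (self.last_timestamp,),
        ).fetchall()
        emitted: List[Tuple[str, object]] = []
        for ts, measured in rows:
            emitted.append(("record", (ts, json.loads(measured))))
            emitted.append(("watermark", ts))  # watermark with the same time as the record
            self.last_timestamp = ts           # advance the state only after emitting
        return emitted

    def snapshot(self) -> str:
        return self.last_timestamp

    @classmethod
    def restore(cls, conn: sqlite3.Connection, state: str) -> "MeasurementSource":
        return cls(conn, last_timestamp=state)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE input (timestamp TEXT, measured TEXT)")
for minute in (13, 14, 15):
    conn.execute("INSERT INTO input VALUES (?, ?)",
                 (f"2017-09-13 11:{minute}", json.dumps([1.0, 2.0])))

source = MeasurementSource(conn)
first = source.poll()            # rows 11:13, 11:14, 11:15
checkpoint = source.snapshot()   # "2017-09-13 11:15"

conn.execute("INSERT INTO input VALUES (?, ?)", ("2017-09-13 11:16", json.dumps([1.5, 2.5])))
restarted = MeasurementSource.restore(conn, checkpoint)  # e.g., after a failure
second = restarted.poll()
print(second)
```

Because the restored source resumes from the checkpointed timestamp, the 11:16 row is emitted once and the earlier rows are not re-emitted.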
Data processing pipeline with Flink DataStream API: process and countWindowAll
• process: filter out outliers and emit them to a side output
• countWindowAll: maintain the last N elements
  • An event-time window of 1 week cannot guarantee 10080 records, as data can be missing or filtered
  • We define a custom evictor
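The talk does not state its outlier criterion, so this plain-Python sketch of the process step uses a hypothetical range check; in Flink the same routing would be done in a ProcessFunction that sends outliers to a side output via an OutputTag. Only the routing into a main output and a side output is the point here.

```python
from typing import List, Sequence, Tuple

def split_outliers(records: Sequence[Sequence[float]], low: float,
                   high: float) -> Tuple[List[Sequence[float]], List[Sequence[float]]]:
    """Route each record to the main output or to an outlier side output.

    The rule (any sensor value outside [low, high]) is a placeholder,
    not the talk's actual outlier criterion.
    """
    main, side = [], []
    for record in records:
        if all(low <= v <= high for v in record):
            main.append(record)  # continues to the count window
        else:
            side.append(record)  # goes to the Outlier sink
    return main, side

main, side = split_outliers([[1.0, 2.0], [1.0, 99.0], [0.5, 0.7]], low=0.0, high=10.0)
print(main)  # [[1.0, 2.0], [0.5, 0.7]]
print(side)  # [[1.0, 99.0]]
```

Filtered records never reach the window, which is exactly why a 1-week event-time window cannot guarantee 10080 records.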
What if data is absent or filtered for a long period of time?
• With 3 missing days, the windows before and after the gap look totally different!
• We’d better start a new sliding window for the time-series!
How to start a new sliding count window after a long break
• Example: a sliding count window of size 3; records arrive at timestamps 1, 2, 3, 4, then there are no records for a while (5 to 8), then 9 arrives
• We want to start a new window after missing 4 timestamps
• CountTrigger.of(1) fires every time a record comes in
• EvictingWindowOperator adds each new input record to InternalListState: [2, 3, 4] becomes [2, 3, 4, 9]
• CountEvictor.of(3) evicts elements beyond its capacity: [3, 4, 9]
• CustomEvictor.of(3, timeThreshold=4) evicts all but the last one when the last one occurs after timeThreshold: [9]
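The difference between the two evictors can be replayed on the slide's example in plain Python. This is a sketch, not Flink's Evictor interface: each element is represented by its timestamp alone, eviction runs before the window function (as in Flink's evictBefore), and "occurs after timeThreshold" is read as a gap between the last two timestamps larger than timeThreshold.

```python
from typing import Callable, Iterable, List

def count_evict(window: List[int], size: int) -> List[int]:
    """CountEvictor-like behavior: keep only the last `size` elements."""
    return window[-size:]

def custom_evict(window: List[int], size: int, time_threshold: int) -> List[int]:
    """Keep only the newest element if it arrived more than `time_threshold`
    timestamps after the previous one; otherwise behave like count_evict."""
    if len(window) >= 2 and window[-1] - window[-2] > time_threshold:
        return window[-1:]
    return count_evict(window, size)

def simulate_window(timestamps: Iterable[int],
                    evict: Callable[[List[int]], List[int]]) -> List[List[int]]:
    """Fire on every element (like CountTrigger.of(1)) and record each window."""
    state: List[int] = []
    fired: List[List[int]] = []
    for ts in timestamps:
        state.append(ts)       # add the record to the list state
        state = evict(state)   # evict before the window function runs
        fired.append(list(state))
    return fired

timeline = [1, 2, 3, 4, 9]  # no records at timestamps 5 to 8
print(simulate_window(timeline, lambda w: count_evict(w, 3))[-1])      # [3, 4, 9]
print(simulate_window(timeline, lambda w: custom_evict(w, 3, 4))[-1])  # [9]
```

After the reset, the window refills normally: records 10 and 11 would give [9, 10] and then [9, 10, 11].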
Data processing pipeline with Flink DataStream API: apply (DNN inference)
Working with model developers
• They stick to using Python
• They develop models using a Python library called Keras
Data engineer: "I want to develop our solution on the JVM! Why don’t we develop models using Deeplearning4J?"
Model developer: "We use Keras on Python! I don’t want to use Deeplearning4J because that’s Java…"
Data engineer: "How do we load Keras models in Flink? I don’t know how to do it!"
Loading Keras models in the JVM
• Convert Keras models to TensorFlow SavedModel
  • Use tensorflow.python.saved_model.builder.SavedModelBuilder
  • TensorFlow Java API (Flink TensorFlow): do inference inside the JVM process
  • TensorFlow Serving: do inference outside the JVM process
• Execute Keras through CPython inside the JVM
  • Do inference inside the JVM process
  • Java Embedded Python (JEP) eases the use of CPython: https://github.com/ninia/jep
• Use KerasModelImport from Deeplearning4J
  • Not mature enough
Comparison of approaches to use Keras models in the JVM
• TensorFlow Java API: inside the TaskManager process, a RichWindowFunction passes X to the TensorFlow Java API (a very thin wrapper over the TensorFlow native library), which runs a SavedModel converted from the Keras model and returns Ŷ
• Java Embedded Python (JEP): inside the TaskManager process, a RichWindowFunction uses a JEP Java object and JEP native code to execute Python commands in CPython
  - import keras
  - load a model & weights
  - pass X and get Ŷ
• TensorFlow Serving: a RichWindowFunction in the TaskManager process sends X through a gRPC client to a separate TFServing process, where a DynamicManager and Loader serve SavedModel versions (v1, v2) converted from the Keras model, and receives Ŷ
Comparison of runtime inference performance
• TensorFlow Java API: 77.7 milliseconds per inference
• TensorFlow Serving: 71.2 milliseconds per inference
• Keras inside CPython w/ TensorFlow backend: 32 milliseconds per inference
(* The Theano backend is extremely slow in our case)
(* We do not batch inference calls)
Data processing pipeline with Flink DataStream API: join and window
• window: TumblingEventTimeWindows (size = interval)
Joining two streams on event time
• At a certain time t:
  • Y of timestamp t is arriving
  • Ŷ of timestamp t+2d is arriving
  • Ŷ of timestamp t arrived two days ago
• TumblingEventTimeWindows.of(Time.seconds(timeUnit))
  • Maintains a window for a single pair of Y and Ŷ
  • A window is triggered when watermarks from both streams have arrived
[Diagram: the measurement stream carries Y @t with watermark W(t); the prediction stream, shifted by +2 days in assignTimestampsAndWatermarks, carries Ŷ @t+2d with watermark W(t+2d); the window for t triggers once both watermarks have passed t]
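The rendezvous can be simulated in plain Python to show why each tumbling window holds one (Y, Ŷ) pair and when it fires. Assumptions in this sketch: timestamps are integer minutes, a shift of 2 stands in for 2 days, a window fires once the smaller of the two input watermarks reaches its last timestamp (as Flink's event-time trigger does for two-input operators), and the class is hypothetical rather than Flink's join API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

class EventTimeJoin:
    """Pairs Y and Ŷ that share an event time, using tumbling windows of `interval`."""

    def __init__(self, interval: int):
        self.interval = interval
        self.buffers: Dict[int, Dict[str, list]] = defaultdict(lambda: {"Y": [], "Y_hat": []})
        self.watermarks = {"Y": float("-inf"), "Y_hat": float("-inf")}
        self.fired: List[Tuple[int, str, str]] = []

    def element(self, side: str, ts: int, value: str) -> None:
        start = ts - ts % self.interval  # start of the tumbling window containing ts
        self.buffers[start][side].append(value)

    def watermark(self, side: str, ts: int) -> None:
        self.watermarks[side] = max(self.watermarks[side], ts)
        combined = min(self.watermarks.values())  # wait for the slower input
        for start in sorted(self.buffers):
            if combined < start + self.interval - 1:  # window's last timestamp not reached
                break
            window = self.buffers.pop(start)
            # Inner join: a window holding only one side emits nothing.
            for y in window["Y"]:
                for y_hat in window["Y_hat"]:
                    self.fired.append((start, y, y_hat))

SHIFT = 2  # stand-in for 2 days; at one-minute sampling that is 2 * 24 * 60 = 2880
join = EventTimeJoin(interval=1)
for t in range(6):
    # Measurement stream: Y@t followed by watermark W(t).
    join.element("Y", t, f"Y{t}")
    join.watermark("Y", t)
    # Prediction stream: the DNN run at time t forecasts t + SHIFT.
    join.element("Y_hat", t + SHIFT, f"Y_hat{t + SHIFT}")
    join.watermark("Y_hat", t + SHIFT)
print(join.fired)
```

The first SHIFT windows contain only Y and produce nothing, while the last predictions stay buffered until their measurements arrive.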
Data processing pipeline with Flink DataStream API: apply (scoring)
• Raise an alarm if the distance between Ŷ and Y goes beyond a defined threshold
Data processing pipeline with Flink DataStream API: sinks
• The Input, Prediction, and Score sinks write records to InfluxDB
• We then plot the time-series using Grafana
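The talk does not show the sink code. InfluxDB ingests points in its line protocol, so this sketch shows how one record could be formatted before being written; the measurement name, the tag, and the use of whole-second UTC timestamps are assumptions, and escaping of special characters is deliberately left out.

```python
from datetime import datetime, timezone
from typing import Dict, Sequence

def to_line_protocol(measurement: str, tags: Dict[str, str],
                     values: Sequence[float], ts: datetime) -> str:
    """Format one sensor vector as an InfluxDB line-protocol point.

    Fields are named y1..ym after the slides' notation. Escaping is omitted,
    so names and tag values must not contain spaces, commas, or '='.
    """
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"y{i}={float(v)!r}" for i, v in enumerate(values, start=1))
    nanos = int(ts.timestamp()) * 1_000_000_000  # whole seconds, in nanosecond precision
    return f"{measurement}{tag_part} {field_part} {nanos}"

line = to_line_protocol("prediction", {"equipment": "eq1"}, [1.5, 2.0],
                        datetime(2017, 9, 13, 11, 16, tzinfo=timezone.utc))
print(line)  # prediction,equipment=eq1 y1=1.5,y2=2.0 1505301360000000000
```

Keeping the Input, Prediction, and Score records in separate measurements lets Grafana plot Y and Ŷ side by side with one query per measurement.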
Predicting from a single DNN is not enough!
• Prediction from a single DNN vs. measurement: possibly biased prediction
• Mean prediction from an ensemble of 10 DNNs vs. measurement: more reliable prediction!
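DNN ensemble for reliable prediction
• Different Convolutional LSTM networks return slightly different prediction results (Ŷ)
• The ensemble's prediction is the mean of these results
• Raise an alarm if the distance between the two vectors (the mean Ŷ and the measured Y, 2 days later) is above a defined threshold
[Diagram: X (one-week time-series) fed into each DNN of the ensemble along the timeline]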
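Averaging the ensemble is a per-sensor mean over the members' Ŷ vectors. This plain-Python sketch uses three hypothetical members instead of ten; `statistics.fmean` requires Python 3.8 or later.

```python
from statistics import fmean
from typing import List, Sequence

def ensemble_mean(predictions: Sequence[Sequence[float]]) -> List[float]:
    """Average the ensemble's Ŷ vectors sensor by sensor."""
    if not predictions:
        raise ValueError("need at least one prediction")
    width = len(predictions[0])
    if any(len(p) != width for p in predictions):
        raise ValueError("all predictions must cover the same sensors")
    return [fmean(column) for column in zip(*predictions)]

# Three hypothetical ensemble members, each predicting two sensors.
mean_prediction = ensemble_mean([[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]])
print(mean_prediction)  # [2.0, 5.0]
```

The averaged vector is what gets compared with the measurement Y, so a single biased member shifts the alarm score by only a fraction of its error.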
Data processing pipeline with Flink DataStream API: DNN ensemble
• How do we implement our ensemble pipeline?
[Diagram: between process and join, the single DNN is replaced by a DNN ensemble whose predictions (Ŷ) are averaged into one Ŷ]
Data processing pipeline with Flink DataStream API: DNN ensemble
• flatMap: replicate each record 10 times
• keyBy → countWindow (custom evictor) → apply: one DNN per key, with setParallelism(ensembleSize=10)
• assignTimestampsAndWatermarks
• windowAll (TumblingEventTimeWindow) → apply: take the mean of the 10 predictions (Ŷ), with setParallelism(1)
[Diagram: addSource → process → flatMap → 10 parallel windowed DNNs → windowAll (mean) → prediction stream; join → window → apply; sinks: Input, Outlier, Prediction, Score]
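The ensemble dataflow can be simulated in a single thread to show what each operator contributes. In this sketch the "DNNs" are hypothetical stand-ins (window mean plus a fixed bias), three members replace ten, and a dict keyed by timestamp plays the role of the tumbling windowAll; in Flink the ten predictions come from ten parallel instances, which is why they must be collected by event time before averaging.

```python
from collections import defaultdict, deque
from statistics import fmean
from typing import Callable, Dict, List, Sequence, Tuple

Model = Callable[[List[float]], float]

def run_ensemble(stream: Sequence[Tuple[int, float]], models: Sequence[Model],
                 window_size: int) -> List[Tuple[int, float]]:
    """Simulate flatMap -> keyBy/countWindow/apply per key -> windowAll mean."""
    windows = [deque(maxlen=window_size) for _ in models]  # keyed window state, one per key
    pending: Dict[int, List[float]] = defaultdict(list)    # windowAll buffer per timestamp
    averaged: List[Tuple[int, float]] = []
    for ts, value in stream:
        for key, model in enumerate(models):  # flatMap: one copy of the record per key
            windows[key].append(value)          # keyBy + count window for this key
            if len(windows[key]) == window_size:
                pending[ts].append(model(list(windows[key])))  # apply: this key's DNN
        if ts in pending and len(pending[ts]) == len(models):
            averaged.append((ts, fmean(pending.pop(ts))))    # windowAll + apply: mean
    return averaged

# Three hypothetical "DNNs" (the talk uses ten): window mean plus a fixed bias.
models = [lambda w, b=bias: fmean(w) + b for bias in (-1.0, 0.0, 1.0)]
stream = [(0, 1.0), (1, 2.0), (2, 3.0), (3, 4.0)]
result = run_ensemble(stream, models, window_size=2)
print(result)  # [(1, 1.5), (2, 2.5), (3, 3.5)]
```

Every key sees the same records, so the members differ only in their models, matching the "replicate 10 times" step.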
Distribute 10 keys evenly over 10 partitions
• flatMap replicates each record 10 times with different keys (KEY_0, KEY_1, ..., KEY_9)
• keyBy assigns keys to partitions using murmurHash
• Carefully generate the keys so that no two of them belong to the same partition (KEY_0 → PARTITION_0, KEY_1 → PARTITION_1, ..., KEY_9 → PARTITION_9)
• PARTITION = (murmurHash(KEY) % maxParallelism) * parallelism / maxParallelism
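A search for such keys can be sketched in plain Python. Flink computes the key group with its own murmur-based hash of the key's hashCode; this sketch plugs in zlib.crc32 as a stand-in hash, so the concrete keys it finds will differ from what Flink would pick, and maxParallelism = 128 is an assumed value. The strategy is the point: try candidate keys and keep the first one that lands on each partition.

```python
import zlib
from typing import Callable, Dict

def operator_index(key: str, parallelism: int, max_parallelism: int,
                   hash_fn: Callable[[str], int]) -> int:
    """Key group = hash % maxParallelism; partition = keyGroup * parallelism // maxParallelism."""
    key_group = hash_fn(key) % max_parallelism
    return key_group * parallelism // max_parallelism

def keys_for_partitions(parallelism: int, max_parallelism: int,
                        hash_fn: Callable[[str], int],
                        max_candidates: int = 100_000) -> Dict[int, str]:
    """Try KEY_0, KEY_1, ... and keep the first key that lands on each partition."""
    chosen: Dict[int, str] = {}
    for candidate in range(max_candidates):
        key = f"KEY_{candidate}"
        chosen.setdefault(operator_index(key, parallelism, max_parallelism, hash_fn), key)
        if len(chosen) == parallelism:
            return chosen
    raise RuntimeError("could not cover every partition; raise max_candidates")

def crc_hash(key: str) -> int:
    # Stand-in hash for illustration only; Flink uses murmurHash(key.hashCode()).
    return zlib.crc32(key.encode("utf-8"))

keys = keys_for_partitions(parallelism=10, max_parallelism=128, hash_fn=crc_hash)
for partition in sorted(keys):
    print(partition, keys[partition])
```

To reproduce Flink's actual placement, `crc_hash` would have to be replaced by a faithful port of Flink's hash of the key's Java hashCode.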
3. Solution packaging and monitoring with Docker and Prometheus
A simple software stack on top of Docker
• On the customer machine, the Docker engine runs official images only: Flink, MySQL, InfluxDB, Grafana, Prometheus, and TensorFlow Serving
• No custom Docker image!
• A single yml file is enough to deploy our software stack!
Launch the JobManager and TaskManagers with some changes to the official repository of the Docker image for Flink
• You need to get flink-metrics-prometheus-1.4-SNAPSHOT.jar yourself until Flink 1.4 is officially released
• flink-conf.yaml:
    metrics.reporters: prom
    metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
    jobmanager.heap.mb: 10240
    taskmanager.heap.mb: 10240
* Every process runs inside a Docker container
Solution deployment and monitoring
• We submit a Flink job to launch our pipeline on the JobManager and TaskManagers: it reads from the MySQL source, sends inference requests to TensorFlow Serving*, and writes to the InfluxDB sink
• Prometheus scrapes the HTTP endpoints of the metrics exporters specified in its configuration
  • :9249/metrics (JobManager and TaskManagers): Flink runtime metrics & custom metrics
  • :9104/metrics (MySQLd Exporter): MySQL metrics
  • :9100/metrics (Node Exporter): server system metrics (CPU/disk, memory, network)
  • :8080/metrics (cAdvisor): container metrics
• Grafana shows a sensor time-series dashboard (from InfluxDB) and a solution monitoring dashboard (from Prometheus)
* if using TFServing
Solution monitoring dashboard
• Server memory/CPU/filesystem usage (by Node Exporter)
• Container CPU usage (by cAdvisor)
• Inference time of each DNN (custom metrics)
• TaskManager JVM memory usage
• Number of records written to sinks (custom metrics)
* This dashboard is based on "Docker Dashboard" by Brian Christner
Recap
1. Why we use Flink for our time-series prediction model
2. Flink pipeline design for rendezvous and DNN ensemble
3. Solution packaging and monitoring with Docker and Prometheus
Conclusion
• Flink helps us concentrate on the core logic
  • The DataStream API reads like natural language when expressing streaming topologies
    • Flexible windowing mechanisms (count windows and evictors)
    • Joining of two streams on event time
  • Thanks to it, we can focus on
    • implementing custom sources/sinks to meet customer requirements
    • interacting with DNN ensembles
• It has a nice ecosystem to help build a solution
  • Docker
  • Prometheus metric reporter
THE END
You cannot use TFServing with Flink 1.3.2
• Netty binary incompatibility
  • flink-runtime_2.11:1.3.2 depends on io.netty:4.0.27.Final
  • grpc-netty:4.1.14 depends on io.netty:4.1.14.Final
  • From the "New and noteworthy in 4.1" page of the Netty project: "4.1 contains multiple additions which might not be fully backward-compatible with 4.0"
• You could use grpc-okhttp instead of grpc-netty
  • but grpc-okhttp conflicts with another library (the InfluxDB client for Java)
• You can use TFServing with Flink 1.4-SNAPSHOT
  • [FLINK-7013] Add shaded netty dependency
  • io.netty.* was recently relocated to org.apache.flink.shaded.netty4.*
Looking forward to the official release of v1.4: no more 1.4-SNAPSHOT on the customer site!

Presented at Flink Forward Berlin 2017 (Dongwon Kim, "Predictive Maintenance with Apache Flink").

amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Último (20)

Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 

Flink Forward Berlin 2017: Dongwon Kim - Predictive Maintenance with Apache Flink

  • 1. Predictive Maintenance with Deep Learning and Flink. Dongwon Kim, PhD, Solution R&D Center, SK Telecom
  • 2. About me (@eastcirclek). Big Data processing engines: MapReduce, Tez, Spark, Flink, Hive, Presto. Recent interests (covered in this talk): deep learning model serving (TensorFlow Serving), containerization (Docker, Kubernetes), time-series data (InfluxDB, Prometheus, Grafana). Flink Forward 2015 Berlin: A Comparative Performance Evaluation of Flink.
  • 3. Refinery and semiconductor companies depend on their equipment, and equipment breakdowns significantly affect company profit.
  • 4. Equipment maintenance to minimize breakdowns. Planned maintenance: shut down equipment on a regular basis for parts replacement, cleaning, adjustments, etc. Predictive maintenance (PdM): unplanned maintenance based on a prediction of equipment condition, made by a predictive maintenance system from equipment sensor data. (Timeline figure, 2015.07 to 2017.07: breakdowns, planned maintenance, and predictive maintenance.)
  • 5. Our approach to predictive maintenance: better prediction of equipment condition using deep learning. (Figure: two timelines of PdM, planned maintenance, and breakdowns, one using machine learning and statistics alone, the other using machine learning and statistics plus deep learning.)
  • 6. Contents: 1. Why we use Flink for our time-series prediction model; 2. Flink pipeline design for rendezvous and DNN ensemble; 3. Solution packaging and monitoring with Docker and Prometheus. (Toolbox: time-series prediction model, TF Serving, Grafana, Flink, MySQL, Prometheus, InfluxDB.)
  • 7. My team consists of two groups: model developers, who train a DNN model on historical data, and data engineers, who apply the DNN model to live data for prediction and alarms. (* The diagram is based on Ted Dunning's slides, Flink Forward 2017 SF.)
  • 8. Model developers give a Convolutional LSTM to engineers. Input (X): a one-week time series of 10080 records (* assuming a one-minute sampling rate), passed through CNN and RNN (LSTM) layers. Output (Ŷ): the expected sensor values after 2 days. Note that the model does not return whether the equipment breaks down after 2 days.
  • 9. Data engineers apply the Convolutional LSTM to live sensor data. We have multi-sensor data: the sensors on each piece of equipment produce a vector [y1, y2, y3, ..., ym].
  • 10. Data engineers apply the Convolutional LSTM to live sensor data. Multi-sensor vectors [y1, y2, y3, ..., ym] arrive at a fixed interval of 1 minute along the timeline.
  • 11. Data engineers apply the Convolutional LSTM to live sensor data. We maintain a count window X over the latest 10080 records; it slides by one as each new record arrives (a sliding count window).
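The sliding count window can be sketched in plain Python (this is not Flink's API) with a bounded deque. The constants spell out the arithmetic behind 10080 records and a two-day horizon at a one-minute sampling rate. The model passed in is a hypothetical stand-in for the Convolutional LSTM, and the sketch assumes a prediction is only made once a full window is available; the talk does not say how partially filled windows are handled.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Sequence

Vector = Sequence[float]

# At a one-minute sampling rate:
WINDOW_SIZE = 7 * 24 * 60  # one week of records = 10080
HORIZON = 2 * 24 * 60      # two days ahead = 2880 records


class SlidingCountWindow:
    """Keeps the latest `size` records and applies `model` every time it slides."""

    def __init__(self, size: int, model: Callable[[List[Vector]], Vector]):
        self.size = size
        # A bounded deque drops its oldest element when a new one is appended.
        self.records: Deque[Vector] = deque(maxlen=size)
        self.model = model

    def add(self, record: Vector) -> Optional[Vector]:
        self.records.append(record)
        if len(self.records) < self.size:
            return None  # assumption: wait for a full window before predicting
        return self.model(list(self.records))  # X (latest `size` records) -> Y_hat
```

Appending to a full `deque(maxlen=size)` discards the oldest element, which is the same "keep the last N" behaviour a count evictor gives a count window.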
  • 12. Data engineers apply the Convolutional LSTM to live sensor data. Whenever the sliding window moves, we apply the Convolutional LSTM (CNN and RNN/LSTM layers) to X (10080 records). Given a one-week time series X, the DNN returns the predicted values two days later, Ŷ = [ŷ1, ŷ2, ŷ3, ..., ŷm]. When the measured vector Y = [y1, y2, y3, ..., ym] for that time arrives, we raise an alarm if the distance between the two vectors is above a defined threshold.
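A minimal sketch of the alarm rule, assuming Euclidean distance: the talk only says "distance", so both the metric and any threshold value used with it are assumptions.

```python
import math
from typing import Sequence


def euclidean_distance(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Distance between the measured vector Y and the predicted vector Y_hat."""
    if len(y) != len(y_hat):
        raise ValueError("Y and Y_hat must have the same number of sensors")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)))


def should_alarm(y: Sequence[float], y_hat: Sequence[float], threshold: float) -> bool:
    """Raise an alarm only when the distance is strictly above the threshold."""
    return euclidean_distance(y, y_hat) > threshold
```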
  • 13. Desired streaming topology by data engineers: a stream source feeds an outlier filter, whose output goes both to (a) a sliding count window, after which the DNN is applied to X to produce the prediction stream Ŷ, and (b) the measurement stream Y; the two streams are joined, scored, and written to a sink. Requirement 1: a count window that maintains 10080 records instead of a one-week event-time window. Requirement 2: joining two streams on event time (rendezvous).
  • 14. Proof of concept with Spark Structured Streaming: an input streaming Dataset generated from local files; a prediction streaming Dataset that applies the DNN to a one-week time series rather than 10080 records (sliding count windows are not supported); and a score streaming Dataset obtained by joining the two streams.
  • 15. Score Dataset: inner join between two streaming Datasets is not supported in Spark Structured Streaming.
  • 16. Unsupported operations in Spark Structured Streaming (v2.2): Requirement 1, a count window that maintains 10080 records instead of a one-week event-time window; Requirement 2, joining two streams on event time (rendezvous).
  • 17. That's why we moved to the Flink DataStream API. Spark Structured Streaming: sliding count window not supported; joining of two streams not supported; micro-batches behind the scenes (continuous processing is proposed in SPARK-20928). Flink DataStream API: sliding count window supported; joining of two streams supported; scalability and performance proven by other use cases. (* It might be possible to use our Convolutional LSTM model with Spark Structured Streaming in some other way.)
  • 18. Data processing pipeline with the Flink DataStream API: addSource → process → countWindowAll (custom evictor) → apply (X → Ŷ) → assignTimestampsAndWatermarks (+2 days) forms the prediction stream; the output of process forms the measurement stream; the two streams go through join → window → apply. The source emits each record @t with a watermark W(t). Sinks: input sink, outlier sink, prediction sink, and score sink.
  • 19. Flink can faithfully implement our streaming topology design. Topology design → Flink implementation: stream source → addSource; outlier filter → process; sliding count window → countWindowAll (custom evictor); apply DNN to X → apply; prediction stream Ŷ → assignTimestampsAndWatermarks; join → join and window; score → apply; sink.
  • 20. Contents (repeated from slide 6); next: 2. Flink pipeline design for rendezvous and DNN ensemble.
  • 21. Data processing pipeline with the Flink DataStream API (as in slide 18), starting with the source: we read data from MySQL.
  • 22. Stateful custom source (addSource) to read from MySQL. We assume that sensor data arrive in order. Over a JDBC connection, the source runs SELECT timestamp, measured FROM input WHERE timestamp > $lastTimestamp, emits each input record together with a watermark of the same time (e.g., the record @ 2017-9-13 11:16 and W(11:16)), and then increases lastTimestamp (11:15 → 11:16). For exactly-once semantics, lastTimestamp is kept as state: it is stored when taking a snapshot and restored when the job is restarted. (Figure: the MySQL input table, with one [y1, y2, ..., ym] measurement per timestamp from 2017-9-13 11:13 to 11:16.)
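The source logic above can be sketched in plain Python. This is not a Flink SourceFunction: Python's built-in sqlite3 stands in for the MySQL/JDBC connection so the example is self-contained, timestamps are assumed to be zero-padded "YYYY-MM-DD HH:MM" strings (which sort correctly as text), and the class and method names are hypothetical.

```python
import sqlite3
from typing import List, Tuple, Union

# ("record", timestamp, measured) or ("watermark", timestamp)
Event = Union[Tuple[str, str, float], Tuple[str, str]]


class PollingSource:
    """Hypothetical stateful polling source; sqlite3 stands in for MySQL."""

    QUERY = ("SELECT timestamp, measured FROM input "
             "WHERE timestamp > ? ORDER BY timestamp")

    def __init__(self, conn: sqlite3.Connection, last_timestamp: str = ""):
        self.conn = conn
        self.last_timestamp = last_timestamp  # the only piece of operator state

    def poll(self) -> List[Event]:
        """Read rows newer than last_timestamp, assuming they arrive in order."""
        events: List[Event] = []
        for ts, measured in self.conn.execute(self.QUERY, (self.last_timestamp,)):
            events.append(("record", ts, measured))  # emit the input record...
            events.append(("watermark", ts))         # ...and a watermark of the same time
            self.last_timestamp = ts                 # then advance lastTimestamp
        return events

    def snapshot_state(self) -> str:
        """Called when a checkpoint is taken: store lastTimestamp."""
        return self.last_timestamp

    @classmethod
    def restore(cls, conn: sqlite3.Connection, state: str) -> "PollingSource":
        """Called on restart: resume right after the last emitted timestamp."""
        return cls(conn, last_timestamp=state)
```

In a real Flink source, emitting a record and advancing lastTimestamp would be done while holding the checkpoint lock, so a snapshot never sees one without the other; here `poll()` simply updates the state and returns the batch in one call.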
  • 23. Data processing pipeline with the Flink DataStream API (as in slide 18): process filters out outliers and emits them to a side output (the outlier sink), and countWindowAll maintains the last N elements. An event-time window of one week cannot guarantee 10080 records, because data can be missing or filtered, so we define a custom evictor.
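The outlier split corresponds, in Flink, to a ProcessFunction that writes rejected records to a side output identified by an OutputTag. The plain-Python sketch below only shows the routing; the out-of-range rule and its bounds are hypothetical, since the talk does not say how outliers are detected.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Vector = Sequence[float]


def split_outliers(records: Iterable[Vector],
                   is_outlier: Callable[[Vector], bool]) -> Tuple[List[Vector], List[Vector]]:
    """Route normal records to the main output and outliers to a side output."""
    main: List[Vector] = []
    side: List[Vector] = []
    for record in records:
        (side if is_outlier(record) else main).append(record)
    return main, side


def out_of_range(low: float, high: float) -> Callable[[Vector], bool]:
    """Hypothetical outlier rule: any sensor value outside [low, high]."""
    return lambda record: any(v < low or v > high for v in record)
```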
  • 24. What if data is absent or filtered for a long period of time? With 3 missing days, the time series before and after the gap look totally different, so we'd better start a new sliding window for the time series.
  • 25. How to start a new sliding count window after a long break. Example: a sliding count window of size 3 over records 1, 2, 3, 4, then no records for a while (5 to 8 are missing), then record 9; we want to start a new window after 4 missing timestamps. CountTrigger.of(1) fires every time a record comes in, and EvictingWindowOperator adds the new input record to InternalListState, turning [2 3 4] into [2 3 4 9]. CountEvictor.of(3) evicts elements beyond its capacity, leaving [3 4 9]. CustomEvictor.of(3, timeThreshold=4) evicts all but the last one when the last one occurs after timeThreshold, leaving [9].
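The eviction rule can be sketched on a plain list of (timestamp, value) pairs. In Flink this logic would live in a class implementing the Evictor interface (its evictBefore/evictAfter hooks); here the function names are hypothetical, and the gap test (at least time_threshold missing one-minute timestamps between the two newest records) is our reading of the example above.

```python
from typing import List, Tuple

Record = Tuple[int, float]  # (timestamp in minutes, value)


def count_evict(window: List[Record], max_count: int) -> List[Record]:
    """CountEvictor-like behaviour: keep only the last `max_count` elements."""
    return window[-max_count:]


def custom_evict(window: List[Record], max_count: int, time_threshold: int) -> List[Record]:
    """Keep only the newest record if it follows a gap of at least
    `time_threshold` missing timestamps; otherwise behave like count_evict."""
    if len(window) >= 2:
        missing = window[-1][0] - window[-2][0] - 1  # timestamps skipped before the newest
        if missing >= time_threshold:
            return window[-1:]  # start a new sliding window
    return count_evict(window, max_count)


def add_record(window: List[Record], record: Record,
               max_count: int, time_threshold: int) -> List[Record]:
    # The window operator appends first; the evictor then trims the list.
    return custom_evict(window + [record], max_count, time_threshold)
```

Feeding timestamps 1, 2, 3, 4 with `max_count=3` leaves [2, 3, 4]; adding 9 (four missing timestamps) leaves [9], whereas `count_evict` alone would keep [3, 4, 9].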
  • 26. Data processing pipeline with the Flink DataStream API (as in slide 18), moving on to the apply step that runs the DNN.
  • 27. Working with model developers. Data engineer: "I want to develop our solution on the JVM! Why don't we develop models using Deeplearning4J?" Model developers: "We use Keras on Python! I don't want to use Deeplearning4J because that's Java…" They stick to Python and develop models using a Python library called Keras.
  • 28. How do we load Keras models in Flink? "I don't know how to do it!"
  • 29. Loading Keras models in the JVM. (1) Convert Keras models to a TensorFlow SavedModel using tensorflow.python.saved_model.builder.SavedModelBuilder, then either use the TensorFlow Java API (Flink TensorFlow) to do inference inside the JVM process, or use TensorFlow Serving to do inference outside the JVM process. (2) Execute Keras through CPython inside the JVM to do inference inside the JVM process, using Java Embedded Python (JEP, https://github.com/ninia/jep) to ease the use of CPython. (3) Use KerasModelImport from Deeplearning4J, which is not mature enough.
  • 30. Comparison of approaches to using Keras models in the JVM, each called from a RichWindowFunction in the TaskManager process. TensorFlow Java API: the Keras model is converted to a SavedModel and executed by the TensorFlow native library through the TensorFlow Java API, a very thin wrapper. Java Embedded Python (JEP): a JEP Java object drives CPython through JEP native code and executes Python commands (import keras; load a model and weights; pass X and get Ŷ). TensorFlow Serving: a gRPC client sends X to a separate TFServing process, where a DynamicManager and Loader serve SavedModel versions (v1, v2) converted from the Keras model, and receives Ŷ.
  • 31. Comparison of runtime inference performance: TensorFlow Java API, 77.7 milliseconds per inference; TensorFlow Serving, 71.2 milliseconds per inference; Keras inside CPython with the TensorFlow backend, 32 milliseconds per inference. (* The Theano backend is extremely slow in our case. * We do not batch inference calls.)
  • 32. Data processing pipeline with the Flink DataStream API (as in slide 18), now focusing on the join, which uses TumblingEventTimeWindows (size = interval).
  • 33. Joining two streams on event time. At a certain time t: Y with timestamp t is arriving on the measurement stream; Ŷ with timestamp t+2d is arriving on the prediction stream (assignTimestampsAndWatermarks shifts predictions by +2 days); and Ŷ with timestamp t already arrived two days ago. TumblingEventTimeWindows.of(Time.seconds(timeUnit)) maintains a window for a single pair of Y and Ŷ. A window is triggered when watermarks from both streams have arrived: W(t) from the measurement stream and W(t+2d) from the prediction stream.
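The rendezvous semantics can be sketched without Flink: each record lands in a tumbling window of size `interval`, and a window fires once the minimum of the two streams' watermarks reaches the window's last timestamp (start + interval - 1, mirroring Flink's maxTimestamp convention with integer time). Time is in minutes here and all names are hypothetical; this is not the DataStream join API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class TumblingJoin:
    """Hypothetical sketch: pair Y and Y_hat that fall into the same tumbling window."""

    def __init__(self, interval: int):
        self.interval = interval
        self.pending: Dict[int, Dict[str, object]] = defaultdict(dict)
        self.watermarks: Dict[str, float] = {"y": float("-inf"), "y_hat": float("-inf")}

    def add(self, side: str, timestamp: int, value: object) -> None:
        start = timestamp - timestamp % self.interval  # start of the tumbling window
        self.pending[start][side] = value

    def on_watermark(self, side: str, watermark: int) -> List[Tuple[int, object, object]]:
        self.watermarks[side] = max(self.watermarks[side], watermark)
        combined = min(self.watermarks.values())  # event time follows the slower stream
        fired = []
        for start in sorted(self.pending):
            if start + self.interval - 1 > combined:
                break  # this window (and every later one) is still open
            sides = self.pending.pop(start)
            if "y" in sides and "y_hat" in sides:  # inner join: emit only complete pairs
                fired.append((start, sides["y"], sides["y_hat"]))
        return fired
```

Because the prediction stream's watermark runs two days ahead, the combined watermark is driven by the measurement stream, so each pair fires as soon as W(t) arrives from the measurements.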
  • 34. Data processing pipeline with the Flink DataStream API (as in slide 18): the apply step after the join raises an alarm if the distance between Ŷ and Y goes beyond a defined threshold.
  • 35. Data processing pipeline with the Flink DataStream API (as in slide 18), highlighting the sinks: input sink, outlier sink, prediction sink, and score sink.
  • 36. The input, prediction, and score sinks write records to InfluxDB; we then plot the time series using Grafana.
  • 37. Predicting from a single DNN is not enough! A single DNN gives a possibly biased prediction relative to the measurement, while an ensemble of 10 DNNs gives a more reliable prediction.
  • 38. DNN ensemble for reliable prediction. Different Convolutional LSTM networks return slightly different prediction results for the same X (a one-week time series); we take their mean as Ŷ and, two days later, raise an alarm if the distance between Ŷ and the measured Y is above a defined threshold.
  • 39. Data processing pipeline with the Flink DataStream API: how do we implement our ensemble pipeline? The DNN ensemble produces several Ŷ for each window, and their mean must feed the prediction stream, which is then joined with the measurement stream as before.
  • 40. Ensemble pipeline with the Flink DataStream API: addSource → process → flatMap (replicate each record 10 times) → keyBy → countWindow (custom evictor) → apply, run with setParallelism(ensembleSize=10) so that each parallel instance evaluates one DNN of the ensemble → assignTimestampsAndWatermarks → windowAll (TumblingEventTimeWindow) → apply (mean Ŷ), run with setParallelism(1) → prediction stream, followed by join → window → apply as before. Sinks: input sink, outlier sink, prediction sink, and score sink.
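A plain-Python sketch of the fan-out/fan-in shape: replicate X once per ensemble member (the flatMap), run member i's model on copy i (one keyed, parallel apply per member), then gather the predictions for a timestamp and take their element-wise mean (the parallelism-1 windowAll). All names are hypothetical, and unlike the TumblingEventTimeWindow in the pipeline, this sketch releases the mean as soon as all members have reported instead of waiting for a watermark.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = List[float]
Model = Callable[[Sequence[Vector]], Vector]


def replicate(x: Sequence[Vector], ensemble_size: int) -> List[Tuple[int, Sequence[Vector]]]:
    """flatMap step: copy X once per ensemble member, tagged with the member's key."""
    return [(member, x) for member in range(ensemble_size)]


class EnsembleGather:
    """windowAll step: collect one prediction per member for a timestamp, then average."""

    def __init__(self, ensemble_size: int):
        self.ensemble_size = ensemble_size
        self.buffer: Dict[int, List[Vector]] = defaultdict(list)

    def add(self, timestamp: int, y_hat: Vector) -> Optional[Vector]:
        preds = self.buffer[timestamp]
        preds.append(y_hat)
        if len(preds) < self.ensemble_size:
            return None  # still waiting for other members
        del self.buffer[timestamp]
        # Element-wise mean over the ensemble members.
        return [sum(values) / len(values) for values in zip(*preds)]


def run_ensemble(timestamp: int, x: Sequence[Vector], models: List[Model]) -> Optional[Vector]:
    gather = EnsembleGather(len(models))
    result = None
    for member, x_copy in replicate(x, len(models)):
        result = gather.add(timestamp, models[member](x_copy))
    return result
```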
  • 41. Distribute 10 keys evenly over 10 partitions. The flatMap replicates each record 10 times with different keys, and the keys must be generated carefully so that no two of them belong to the same partition after keyBy, which uses murmurHash: KEY_0 → PARTITION_0, KEY_1 → PARTITION_1, …, KEY_9 → PARTITION_9, where PARTITION = (murmurHash(KEY.hashCode()) % maxParallelism) * parallelism / maxParallelism.
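The key search can be sketched as below. In Flink, a key is mapped to a key group with murmurHash(key.hashCode()) % maxParallelism, and the key group to an operator instance with keyGroup * parallelism / maxParallelism (see the KeyGroupRangeAssignment class). To keep the example self-contained, `stand_in_hash` uses SHA-256 instead of Flink's hash, so the keys it finds are not the keys Flink would need; the search itself is the same once the real hash is plugged in. A maxParallelism of 128 is assumed.

```python
import hashlib
from typing import Callable, Dict

KeyHash = Callable[[str], int]


def operator_index(key: str, parallelism: int, max_parallelism: int, key_hash: KeyHash) -> int:
    """Key -> key group -> index of the parallel operator instance."""
    key_group = key_hash(key) % max_parallelism
    return key_group * parallelism // max_parallelism


def keys_for_partitions(parallelism: int, max_parallelism: int, key_hash: KeyHash,
                        prefix: str = "KEY_") -> Dict[int, str]:
    """Try candidate keys until every partition owns exactly one chosen key."""
    if not 0 < parallelism <= max_parallelism:
        raise ValueError("need 0 < parallelism <= max_parallelism")
    chosen: Dict[int, str] = {}
    candidate = 0
    while len(chosen) < parallelism:
        key = f"{prefix}{candidate}"
        partition = operator_index(key, parallelism, max_parallelism, key_hash)
        chosen.setdefault(partition, key)  # keep the first key that lands here
        candidate += 1
    return chosen


def stand_in_hash(key: str) -> int:
    """Placeholder hash, NOT Flink's murmurHash(key.hashCode())."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:4], "big")
```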
  • 42. Contents (repeated from slide 6); next: 3. Solution packaging and monitoring with Docker and Prometheus.
  • 43. A simple software stack on top of Docker: on the customer machine, the Docker engine runs the official images of MySQL, Grafana, Prometheus, TensorFlow Serving, InfluxDB, and Flink. No custom Docker image! A single yml file is enough to deploy our software stack.
  • 44. Launch the JobManager and TaskManager with a few changes to the official repository of the Flink Docker image. You need to obtain flink-metrics-prometheus-1.4-SNAPSHOT.jar yourself until Flink 1.4 is officially released. Configuration (flink-conf.yaml): metrics.reporters: prom; metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter; jobmanager.heap.mb: 10240; taskmanager.heap.mb: 10240.
  • 45. Solution deployment (* every process runs inside a Docker container). A Flink job with a MySQL source, TensorFlow Serving inference (* if using TFServing), and an InfluxDB sink is submitted to launch our pipeline on a JobManager and three TaskManagers. Prometheus scrapes the HTTP endpoints of the metrics exporters specified in its configuration: :9249/metrics on the JobManager and TaskManagers (Flink runtime metrics and custom metrics), :9104/metrics (MySQLd Exporter: MySQL metrics), :9100/metrics (Node Exporter: server system metrics for CPU, disk, memory, and network), and :8080/metrics (cAdvisor: container metrics). Grafana shows a sensor time-series dashboard (from InfluxDB) and a solution monitoring dashboard (from Prometheus).
  • 46. Solution monitoring dashboard (* based on "Docker Dashboard" by Brian Christner): server memory/CPU/filesystem usage (by Node Exporter), container CPU usage (by cAdvisor), inference time of each DNN (custom metrics), TaskManager JVM memory usage, and the number of records written to sinks (custom metrics).
  • 47. Recap of contents: 1. Why we use Flink for our time-series prediction model; 2. Flink pipeline design for rendezvous and DNN ensemble; 3. Solution packaging and monitoring with Docker and Prometheus.
  • 48. Conclusion. Flink helps us concentrate on the core logic: the DataStream API reads almost like natural language when expressing streaming topologies, with flexible windowing mechanisms (count windows and evictors) and joining of two streams on event time. Thanks to it, we can focus on implementing custom sources and sinks to meet customer requirements and on interacting with DNN ensembles. Flink also has a nice ecosystem that helps build a solution: Docker and the Prometheus metric reporter.
  • 50. You cannot use TFServing with Flink 1.3.2 because of a Netty binary incompatibility: flink-runtime_2.11:1.3.2 depends on io.netty:4.0.27.Final, while grpc-netty:4.1.14 depends on io.netty:4.1.14.Final. As the "New and noteworthy in 4.1" page of the Netty project puts it, "4.1 contains multiple additions which might not be fully backward-compatible with 4.0". You could use grpc-okhttp instead of grpc-netty, but grpc-okhttp conflicts with another library (the InfluxDB client for Java). You can use TFServing with Flink 1.4-SNAPSHOT: [FLINK-7013] adds a shaded Netty dependency, and io.netty.* has recently been relocated to org.apache.flink.shaded.netty4.*.
  • 51. Looking forward to the official release of v1.4: no more 1.4-SNAPSHOT on the customer site!