This document discusses applying machine learning models to real-time stream processing using Apache Kafka. It covers building analytic models from historical data, applying those models to real-time streams without redevelopment, and techniques for online training of models. Live demos are presented using open source tools like Kafka Streams, Kafka Connect, and H2O to apply machine learning to streaming use cases like flight delay prediction. The key takeaway is that streaming platforms can leverage pre-built machine learning models to power real-time analytics and actions.
Machine Learning and Deep Learning Applied to Real Time with Apache Kafka Streams
1. Confidential
Apache Kafka + Machine Learning
Analytic Models Applied to Real Time Stream Processing
Kai Waehner
Technology Evangelist
kontakt@kai-waehner.de
LinkedIn
@KaiWaehner
www.kai-waehner.de
2. Apache Kafka and Machine Learning
Agenda
1) Machine Learning in the Real World
2) Building an Analytic Model
3) Applying an Analytic Model in Real Time
4) Online Training of Models
4. Machine Learning
... allows computers to find hidden insights without being explicitly programmed where to look.
5. Real World Examples of Machine Learning
Spam Detection
Search Results + Product Recommendation
Picture Detection (Friends, Locations, Products)
Your Company
The Next Disruption:
Google Beats Go Champion
6. Leverage Machine Learning to Analyze and Act on Critical Business Moments
Windows of Opportunity (Seconds, Minutes, Hours):
• Price Optimization
• Predictive Maintenance
• Fraud Detection
• Cross Selling
• Transportation Rerouting
• Customer Service
• Inventory Management
8. Big Data Analytics
• Volume (terabytes, petabytes)
• Variety (social networks, blog posts, logs, sensors, etc.)
• Velocity ("real time")
• Value
9. Big Data Analytics for Actionable Insights
From Insight to Action (continuously closed loop)
10. Streaming Platform and Big Data Analytics
[Architecture diagram]
1) Data Producer: Database, IoT Device, Streaming Producer, …
2) Analytics Platform (Big Data Analytics, batch): DWH, Data Integration, Connect, Data Lake, Model Building
3) Streaming Platform (real time): Connect, Stream Processing, Model, REST Interface, Schema Registry / Governance
4) Data Consumer: IoT Device, Mobile App, Streaming Consumer, BI Tool, Messaging, Web Application
11. Agenda
2) Building an Analytic Model
12. Streaming Platform and Big Data Analytics
[Architecture diagram, same as slide 10]
13. Hidden Technical Debt in Machine Learning Systems
https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
Writing source code is not the time-consuming task!
14. Analytical Pipeline
1. Data Access
2. Data Preparation
3. Exploratory Data Analysis
4. Model Building
5. Model Execution
6. Model Validation
7. Deployment
15. Data Access
Find insights to create added business value by correlating various data sources!
16. Data Preparation
http://www.slideshare.net/odsc/feature-engineering
23. Languages, Frameworks and Tools
Many more …
Portable Format for Analytics (PFA)
24. Live Demos with Open Source Technologies
Development of Analytic Models with R, TensorFlow, Apache Spark, H2O.ai, RapidMiner
25. Live Demo
Use Case: Customer Churn Prediction
Machine Learning Algorithm: Generalized Linear Model (GLM) using Logistic Regression
Technology: Open Source R
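The slide names the algorithm but shows no code. As a rough illustration of what a GLM with a logit link does, here is a minimal sketch of logistic regression fit by plain gradient descent. It is written in Python rather than R, and the two features (support calls, scaled tenure), the toy data and all function names are assumptions made up for this sketch, not the data set used in the demo.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_churn(weights, x):
    """Return the estimated churn probability; weights[0] is the intercept."""
    z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return sigmoid(z)

def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit the weights by batch gradient descent on the log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict_churn(weights, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights

# Hypothetical toy data: (support_calls, tenure scaled to 0..1) -> churned?
ROWS = [(0.0, 1.0), (1.0, 0.9), (0.0, 0.8), (4.0, 0.2), (5.0, 0.1), (3.0, 0.3)]
LABELS = [0, 0, 0, 1, 1, 1]
WEIGHTS = fit_logistic(ROWS, LABELS)
```

Calling `predict_churn(WEIGHTS, (5.0, 0.1))` should give a probability above 0.5 for this toy data, and `predict_churn(WEIGHTS, (0.0, 1.0))` one below 0.5.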
26. Live Demo
Use Case: Airline Flight Delay Prediction
Machine Learning Algorithm: Gradient Boosted Machines (GBM) using Decision Trees
Technology: H2O.ai
27. Live Demo
Use Case: Predictive Maintenance (Anomaly Detection in Telco Networks)
Deep Learning Algorithm: Artificial Neural Networks (ANN) using Autoencoders
Technology: TensorFlow + Python API
28. Live Demo
Use Case: Classification (Prediction of Titanic Survivors)
Deep Learning Algorithm: Recurrent Neural Networks (RNN)
Technology: RapidMiner
29. Agenda
3) Applying an Analytic Model in Real Time
30. Analytical Pipeline
1. Data Access
2. Data Preparation
3. Exploratory Data Analysis
4. Model Building
5. Model Execution
6. Model Validation
7. Deployment
31. Streaming Platform and Big Data Analytics
[Architecture diagram, same as slide 10]
32. Definition of Stream Processing
Data at Rest vs. Data in Motion
36. Stream Processing Use Cases
• Real Time Applications
• Stateful Streaming Analytics
• Stateless “Real Time ETL”
37. Event Processing Windows
Various Options for Windowing (Fixed, Sliding, Session, …)
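To make the three window types concrete, here is a minimal sketch in plain Python. It does not use any Kafka Streams API; the function names, the integer timestamps and the sample events are assumptions for illustration only.

```python
from collections import defaultdict

def fixed_windows(events, size):
    """Fixed (tumbling) windows: non-overlapping, keyed by window start."""
    windows = defaultdict(list)
    for ts, value in events:
        windows[(ts // size) * size].append(value)
    return dict(windows)

def sliding_windows(events, size, step):
    """Overlapping windows of length `size` that start every `step` time units."""
    windows = defaultdict(list)
    for ts, value in events:
        # Earliest window start (a multiple of step) whose range still covers ts.
        start = max(((ts - size) // step + 1) * step, 0)
        while start <= ts:
            windows[start].append(value)
            start += step
    return dict(windows)

def session_windows(events, gap):
    """Session windows: a new session starts after more than `gap` of inactivity."""
    sessions = []
    last_ts = None
    for ts, value in sorted(events):
        if last_ts is None or ts - last_ts > gap:
            sessions.append([])
        sessions[-1].append(value)
        last_ts = ts
    return sessions

# Hypothetical (timestamp, value) events with integer timestamps.
EVENTS = [(1, "a"), (4, "b"), (12, "c"), (13, "d"), (31, "e")]
FIXED = fixed_windows(EVENTS, size=10)
SLIDING = sliding_windows(EVENTS, size=10, step=5)
SESSIONS = session_windows(EVENTS, gap=5)
```

With these inputs an event falls into exactly one fixed window, into two sliding windows (because size is twice the step), and into a session whose boundaries depend only on the gaps between events.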
38. How to apply analytic models to real time processing without redevelopment?
39. Application of Analytic Models to Real Time without Redevelopment
Models built with H2O.ai, R, Python, Spark ML, MATLAB or SAS, or exported as PMML, are applied in Stream Processing.
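One way to avoid redevelopment is to hand the trained model to the stream processor as a document in a portable format. The sketch below illustrates the idea with a small JSON document standing in for a real interchange format such as PMML. The document layout, the field names and both function names are assumptions invented for this sketch; they are not the PMML schema.

```python
import json
import math

def export_model(feature_names, coefficients, intercept):
    """Training side: serialize a fitted logistic regression to a JSON string."""
    return json.dumps({
        "model_type": "logistic_regression",
        "features": list(feature_names),
        "coefficients": list(coefficients),
        "intercept": intercept,
    })

def load_scorer(document):
    """Scoring side: build a scoring function without knowing the training tool."""
    spec = json.loads(document)
    if spec["model_type"] != "logistic_regression":
        raise ValueError("unsupported model type: " + spec["model_type"])
    names = spec["features"]
    coefs = spec["coefficients"]
    intercept = spec["intercept"]

    def score(event):
        z = intercept + sum(c * event[n] for n, c in zip(names, coefs))
        return 1.0 / (1.0 + math.exp(-z))

    return score

# Hypothetical model with one feature.
DOCUMENT = export_model(["dep_delay_min"], [0.1], -1.0)
scorer = load_scorer(DOCUMENT)
PROBABILITY = scorer({"dep_delay_min": 10})
```

The scoring side only depends on the document, so the model can be retrained in a different tool as long as the export keeps the same structure.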
40. Streaming Analytics - Processing Pipeline
Stream Ingest: APIs, Adapters / Channels, Integration, Messaging
Stream Preprocessing: Transformation, Aggregation, Enrichment, Filtering, Normalization
Stream Analytics: Contextual Rules, Windowing, Patterns, Analytics, Machine Learning, …
Stream Outcomes: Process Management, Analytics (Real Time), Applications & APIs, Analytics / DW Reporting, Index / Search
Applying an Analytic Model is just a piece of the puzzle!
42. Frameworks and Products
[Quadrant chart: Open Source vs. Closed Source, Framework vs. Product]
Microsoft Azure Stream Analytics
44. When to use Kafka Streams for Stream Processing?
• No need for a Big Data cluster
• Deploy in your existing infrastructure
• Kafka manages scalability / fail-over
• Focus on development of business logic in your department
46. A complete streaming microservice, ready for production at large scale
Example: WordCount
• App configuration
• Define processing (here: WordCount)
• Start processing
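The code from the slide did not survive extraction. The three parts it labels can be sketched as follows, in plain Python over an in-memory list instead of a Kafka topic. This is not the Kafka Streams API: the configuration keys only mirror the names of common Kafka Streams settings and are not used to contact a broker, and all values and names are assumptions.

```python
from collections import Counter

# 1) App configuration (illustrative only; nothing connects to a broker here)
CONFIG = {
    "application.id": "wordcount-sketch",
    "bootstrap.servers": "localhost:9092",
}

# 2) Define processing (here: WordCount)
def word_count(lines):
    """Split each line into words and emit the updated count per word."""
    counts = Counter()
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
            yield word, counts[word]

# 3) Start processing
INPUT_LINES = ["Kafka Streams", "kafka connect"]
UPDATES = list(word_count(INPUT_LINES))
```

Each input word produces one update record, so a word seen twice appears twice in `UPDATES` with an increasing count.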
47. Confluent Platform: the Free, Open-Source Streaming Platform
[Platform diagram; legend: Open Source, External, Commercial]
Data sources: Database Changes, Log Events, IoT Data, Web Events, …
Data Integration: CRM, Data Warehouse, Database, Hadoop, …
Real-time Applications: Monitoring, Analytics, Custom Apps, Transformations, …
Confluent Platform: Control Center, Auto-data Balancing, Multi-Data Center Replication, 24/7 Support, Supported Connectors, Clients, Schema Registry, REST Proxy
Apache Kafka: Kafka Core, Kafka Connect, Kafka Streams
48. Streaming Platform and Big Data Analytics
[Architecture diagram, same as slide 10]
49. Streaming Platform and Big Data Analytics
[Architecture diagram with example technologies]
1) Data Producer: Oracle DB, CoAP IoT, Kafka Java Client, …
2) Analytics Platform (batch): HP Vertica, Data Integration, Flume, Hadoop, Hive; model building with H2O.ai, Spark, TensorFlow
3) Streaming Platform (real time): Kafka Connect, Confluent REST Proxy, Confluent Schema Registry (Avro), Kafka Streams + H2O.ai (Mesos), Kafka Streams + TensorFlow (Kubernetes)
4) Data Consumer: MQTT IoT, iPhone App, Kafka Go Client, Grafana, Kafka, Java EE Web App
50. Live Demos with Open Source Technologies
Development of Analytic Models with Apache Kafka Messaging, Kafka Streams, Kafka Connect, Confluent Schema Registry
51. Live Demo
Use Case: Airline Flight Delay Prediction
Machine Learning Algorithm: Any! (in our example, H2O.ai GBM)
Streaming Platform: Apache Kafka Core, Kafka Connect, Kafka Streams, Confluent Schema Registry
52. H2O.ai Model + Kafka Streams
Stream operations highlighted: Filter, Map
1) Create H2O ML model
2) Configure Kafka Streams Application
3) Apply H2O ML model to Streaming Data
4) Start Kafka Streams App
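The four steps above can be sketched in plain Python. Nothing here is the H2O or Kafka Streams API: the model is a hand-written stand-in for a generated H2O model, the in-memory list stands in for an input topic, and the field names and the 15-minute rule are assumptions invented for this sketch.

```python
# 1) Create the model (stand-in for a model exported by a training tool)
def load_model():
    def predict(flight):
        # Invented rule; a real model would be loaded from generated code or a file.
        return "YES" if flight["dep_delay_min"] > 15 else "NO"
    return predict

# 2) Configure the application and 3) apply the model to streaming data
def build_topology(model):
    def process(records):
        # Filter: drop records that lack the feature the model needs.
        valid = (r for r in records if r.get("dep_delay_min") is not None)
        # Map: enrich each remaining record with the model's prediction.
        return ({**r, "predicted_delay": model(r)} for r in valid)
    return process

# Hypothetical input records standing in for an input topic.
INPUT_TOPIC = [
    {"flight": "LH123", "dep_delay_min": 30},
    {"flight": "BA7", "dep_delay_min": None},
    {"flight": "UA9", "dep_delay_min": 0},
]

# 4) Start the app
topology = build_topology(load_model())
OUTPUT_TOPIC = list(topology(INPUT_TOPIC))
```

The model is just a function passed into the topology, so swapping it for a different algorithm does not change the filter and map steps.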
53. End-to-End Stream Monitoring and Alerting
Confluent Control Center
Data Stream Monitoring and Alerting
Multi-cluster monitoring and management
Kafka Connect Configuration
• Message delivery?
• Delays?
• Where did it get stuck?
• Lost messages?
• Broker issues?
• Performance?
http://docs.confluent.io/3.2.0/control-center/docs/monitoring.html
54. Agenda
4) Online Training of Models
55. Let’s improve the analytic model continuously…
56. Analytical Pipeline
1. Data Access
2. Data Preparation
3. Exploratory Data Analysis
4. Model Building
5. Model Execution
6. Model Validation
7. Deployment
Online Training: Continuously train and improve the model with every new event
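"With every new event" can be read as one small model update per event. The sketch below shows that idea with a logistic regression updated by stochastic gradient descent. The class name, the learning rate and the synthetic stream are assumptions for illustration; the deck does not say which algorithm the online training uses.

```python
import math

class OnlineLogisticModel:
    """Logistic regression that takes one gradient step per incoming event."""

    def __init__(self, n_features, lr=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.lr = lr
        self.seen = 0

    def predict(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x, y):
        # Gradient of the log-loss for a single labeled event.
        err = self.predict(x) - y
        self.weights = [w - self.lr * err * xi for w, xi in zip(self.weights, x)]
        self.bias -= self.lr * err
        self.seen += 1

# Hypothetical stream of (features, label) events.
STREAM = [((1.0,), 1), ((-1.0,), 0)] * 200

model = OnlineLogisticModel(n_features=1)
for features, label in STREAM:
    model.predict(features)          # score the event with the current model
    model.learn_one(features, label)  # then learn from it once the label is known
```

Scoring before learning keeps each prediction honest: the model never sees an event's label before it has predicted that event.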
57. Online Training of Analytic Models
How to improve models?
1. Manual Update
2. Automated Batch
3. Real Time
58. Streaming Platform and Big Data Analytics
[Architecture diagram with example technologies]
Big Data Analytics: Flume, Hadoop, Hive; H2O.ai, Spark, TensorFlow
Streaming Platform: Kafka, Confluent Schema Registry (Avro), Kafka Streams + H2O.ai (Mesos), Kafka Streams + TensorFlow (Kubernetes)
1) Get new Input Event via Kafka Topic
2) Improve Model in Big Data Cluster
3) Update deployed Model via Kafka Topic
4) Leverage Improved Model for new Events
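Steps 3 and 4 of the loop above amount to a scorer that reads two kinds of messages: events to score and model updates to apply. Here is a minimal sketch in plain Python, with an in-memory list of (topic, payload) pairs standing in for Kafka topics. The topic names, the payload fields and the threshold "model" are assumptions invented for this sketch.

```python
def run_scorer(messages, initial_threshold, initial_version=1):
    """Score events with the current model and swap it when an update arrives."""
    threshold = initial_threshold
    version = initial_version
    scored = []
    for topic, payload in messages:
        if topic == "model-updates":
            # 3) Update the deployed model without restarting the scorer.
            threshold = payload["threshold"]
            version = payload["version"]
        elif topic == "events":
            # 4) Every later event is scored with the improved model.
            alert = payload["value"] > threshold
            scored.append((payload["id"], alert, version))
    return scored

# Hypothetical interleaving of the two topics.
MESSAGES = [
    ("events", {"id": 1, "value": 12}),
    ("model-updates", {"version": 2, "threshold": 10}),
    ("events", {"id": 2, "value": 12}),
]
SCORED = run_scorer(MESSAGES, initial_threshold=15)
```

The same input value is scored differently before and after the update, and the recorded version makes it possible to trace which model produced each result.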
59. Caveats for Online Model Training
• Processes and infrastructure not ready
• Validation needed before production
• Slows down the system
• Only a few ML implementations supported
• Many use cases do not need it
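The "validation needed before production" caveat can be handled with a gate between training and deployment. The sketch below promotes a retrained model only if it beats the deployed one on held-out data. The function names, the `min_gain` margin and the toy data are assumptions for illustration, not a procedure described in the deck.

```python
def accuracy(model, holdout):
    """Fraction of held-out (input, label) pairs the model classifies correctly."""
    hits = sum(1 for x, y in holdout if model(x) == y)
    return hits / len(holdout)

def promote_if_better(current, candidate, holdout, min_gain=0.01):
    """Return the candidate only if it beats the current model by min_gain."""
    if accuracy(candidate, holdout) >= accuracy(current, holdout) + min_gain:
        return candidate
    return current

# Hypothetical models: each maps a sensor value to a 0/1 label.
def current_model(x):
    return int(x > 15)

def better_model(x):
    return int(x > 10)

def worse_model(x):
    return int(x > 25)

HOLDOUT = [(5, 0), (20, 1), (12, 1), (8, 0)]
DEPLOYED_AFTER_GOOD = promote_if_better(current_model, better_model, HOLDOUT)
DEPLOYED_AFTER_BAD = promote_if_better(current_model, worse_model, HOLDOUT)
```

A gate like this adds latency to every model update, which is one reason fully real-time retraining is not always worth it.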
60. Key Take-Aways
• Insights are hidden in Historical Data on Big Data Platforms
• Machine Learning and Big Data Analytics find these Insights by building Analytic Models
• The Streaming Platform uses these Models (without Redevelopment) to take Action in Real Time
61. Questions? Feedback? Please contact me!
Kai Waehner
Technology Evangelist
kontakt@kai-waehner.de
@KaiWaehner
www.kai-waehner.de
LinkedIn