(Marcus Urbatschek, Confluent)
Presentation during Confluent’s streaming event in Munich. This three-day hands-on course focused on how to build, manage, and monitor clusters using industry best practices developed by the world’s foremost Apache Kafka™ experts. The sessions focused on how Kafka and the Confluent Platform work, how their main subsystems interact, and how to set up, manage, monitor, and tune your cluster.
Processing Real-Time Data at Scale: A streaming platform as a central nervous system in the enterprise
1. Processing Real-Time Data at Scale
A streaming platform as a central nervous system in the enterprise
October 30th, 2018
Marcus Urbatschek, Confluent
marcus.urbatschek@confluent.io
+49 171 77 433 83
3. Survey #1
What is your key requirement for your future modern architecture?
(Multiple answers possible)
- Digitalization
- Speed / Time to market
- Innovation and agile projects
- Real-time Insights
7. Legacy Data Infrastructure Solutions Have Architectural Flaws
These solutions can be
● Batch-oriented, instead of event-oriented in real time
● Complex to scale at high throughput
● Connected point-to-point, instead of publish / subscribe
● Lacking data persistence and retention
● Incapable of in-flight message processing
[Diagram: data flow between apps, DBs, transactional databases, analytics databases, and a DWH via MOM, ETL, and ESB]
8. Modern Architectures are Adapting to New Data Requirements
But how do we revolutionize data flow in a world of exploding, distributed and ever-changing data?
[Diagram: the legacy data flow (apps, DBs, transactional databases, analytics databases, DWH, MOM, ETL, ESB) extended with NoSQL DBs and Big Data Analytics]
9. The Solution is a Streaming Platform for Real-Time Data Processing
A Streaming Platform provides a single source of truth about your data to everyone in your organization
[Diagram: a Streaming Platform at the center of the data flow, connecting apps, DBs, transactional databases, analytics databases, DWH, NoSQL DBs, and Big Data Analytics]
10. Business Digitization Trends are Revolutionizing your Data Flow
Mobile, Cloud, Microservices, Internet of Things, Machine Learning
● Massive volumes of new data generated every day
● Distributed across apps, devices, datacenters, clouds
● Structured, unstructured, polymorphic
18. What is a Streaming Platform?
01 Messaging, Done Right
02 Foundation for ETL & Data Integration
03 Hadoop Made Fast
[Diagram: a streaming platform connecting Search, Stream Processing, DWH, Hadoop, RDBMS, Apps, Real-Time Analytics, Monitoring, K/V]
21. Survey #2
Where are you in your journey to establish a modern streaming platform in your enterprise?
(One answer possible)
- 1 – Pre-Streaming (Batch, Legacy)
- 2 – Interest (first proof-of-concepts or pilots)
- 3 – Early Production (some independent projects in production)
- 4 – Integrated Streaming (streaming platform with different projects in production)
- 5 – Streaming Platform (streaming enterprise with mostly event-based applications)
Things to consider
- How many in each stage?
- Where do you want to be in 12-24 months?
- How big is your jump?
28. Apache Kafka®: Open Source Streaming Platform Battle-Tested at Scale
The birthplace of Apache Kafka
More than 1 petabyte of data in Kafka
Over 4.5 trillion messages per day
60,000+ data streams
Source of all data warehouse & Hadoop data
Over 300 billion user-related events per day
29. The Future of the Automotive Industry is a Real Time Data Cluster
Front, rear and top view cameras: Parking assistant, Environment pointer
Ultrasonic Sensors: Parking assistant with front and rear camera plus environment indicator
Crash Sensors: Front protection adaptivity, Side protection, Tail impact protection
Front Camera: Audi Active lane assistant, Speed limit indicator, Adaptive light
Infrared Camera: Rearview assistance with Pedestrian recognition
Front and Rear Radar Sensors: ACC with stop and go function, Side assist
30. The Future of the Automotive Industry is a Real Time Data Cluster
[Diagram: Front, rear and top view cameras, Ultrasonic Sensors, Crash Sensors, Front Camera, Infrared Camera, and Front and Rear Radar Sensors, each connected via MQTT]
Traffic Alerts, Hazard Alerts, Personalization, Anomaly Detection
31. Retail: Hypercompetitive market with a need to respond to customer demand in real-time
● Technology Issue: Base systems in legacy architecture built around Hadoop with Spark & traditional ETL – slow response times not meeting business needs.
● Challenges to synchronize data and have visibility across systems including online, supply chain and vendors.
32. Retail: Real-Time Customer Experience
“Wal-Mart is able to take data from your past buying patterns, their internal stock information, your mobile phone location data, social media as well as external weather information and analyse all of this in seconds so it can send you a voucher for a BBQ cleaner to your phone – but only if you own a barbeque, the weather is nice and you currently are within a 3 miles radius of a Wal-Mart store that has the BBQ cleaner in stock.”
Results
33. Winning in the Digital Era doesn’t have to be hard.
Your data infrastructure was built for a different era | Imagine a world…
Mainframes | Scalable machine clusters
Proprietary messaging systems | No bottlenecks from message queues
Monolithic application development | Agile software development through microservices
On-premises data centers | Cloud capable, and even…
Batch-oriented, closed systems | Data systems turned inside out, open & transparent
Slow speed of execution | Fast and flexible
34. Survey #3
What use cases do you see for a streaming platform?
(Multiple answers possible)
- New real-time applications
- IoT / Connected XYZ
- Decoupling between different legacy applications and modern applications
- (New) Legacy Offloading
- Microservices Architecture
- Fast data processing / Real-time decisioning
- Analytics (e.g. for data science, machine learning)
- Mission critical applications (e.g. payments, fraud detection, customer experience)
- Other?
36. Confluent Delivers a Mission-Critical Streaming Platform
[Diagram: Confluent Platform]
Sources: Database Changes, Log Events, IoT Data, Web Events, other events
DATA INTEGRATION: Hadoop, Database, Data Warehouse, CRM, other
REAL-TIME APPLICATIONS: Transformations, Custom Apps, Analytics, Monitoring, other
Apache Kafka®: Core | Connect API | Streams API
Development & Connectivity: Clients | Connectors | REST Proxy | KSQL
Data Compatibility: Schema Registry
Management & Monitoring: Control Center | Security
Enterprise Operations: Replicator | Auto Data Balancer | Connectors | MQTT Proxy | Operator
Legend: OPEN SOURCE FEATURES, COMMERCIAL FEATURES
CUSTOMER SELF-MANAGED: Datacenter, Public Cloud
CONFLUENT FULLY-MANAGED: Confluent Cloud
38. KSQL
Streaming SQL Engine for Apache Kafka
ksql> CREATE STREAM vip_actions AS
  SELECT userid, page, action
  FROM clickstream c
  LEFT JOIN users u ON c.userid = u.user_id
  WHERE u.level = 'Platinum';
confluent.io/product/ksql
Develop real-time stream processing apps writing only SQL!
No Java, Python, or other boilerplate to wrap around it
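To make the semantics of the vip_actions query concrete, here is a minimal Python sketch of the kind of hand-written code the query stands in for. It is an illustration only, not how KSQL is implemented: the sample records, the in-memory users dictionary, and the function name are assumptions made for this example, and a real deployment would read from Kafka topics continuously rather than from a list.

```python
from typing import Dict, Iterable, Iterator, Optional

# Hypothetical sample data; field names follow the query on the slide.
users: Dict[str, dict] = {
    "u1": {"user_id": "u1", "level": "Platinum"},
    "u2": {"user_id": "u2", "level": "Gold"},
}

clickstream = [
    {"userid": "u1", "page": "/cart", "action": "view"},
    {"userid": "u2", "page": "/home", "action": "click"},
    {"userid": "u3", "page": "/home", "action": "view"},  # no matching user
    {"userid": "u1", "page": "/checkout", "action": "buy"},
]


def vip_actions(clicks: Iterable[dict], user_table: Dict[str, dict]) -> Iterator[dict]:
    """Left-join each click to the users table, then keep Platinum users."""
    for click in clicks:
        # LEFT JOIN: the lookup may find no user, in which case level is None.
        user: Optional[dict] = user_table.get(click["userid"])
        level = user["level"] if user else None
        # WHERE u.level = 'Platinum' drops unmatched and non-Platinum rows.
        if level == "Platinum":
            yield {key: click[key] for key in ("userid", "page", "action")}


result = list(vip_actions(clickstream, users))
```

Because the WHERE clause filters on a column from the users side, clicks without a matching user are dropped even though the join itself is a left join.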
39. KSQL: the Simplest Way to Do Stream Processing
1 Streaming ETL
CREATE STREAM vip_actions AS
  SELECT userid, page, action
  FROM clickstream c
  LEFT JOIN users u ON c.userid = u.user_id
  WHERE u.level = 'Platinum';
2 Anomaly detection
CREATE TABLE possible_fraud AS
  SELECT card_number, count(*)
  FROM authorization_attempts
  WINDOW TUMBLING (SIZE 5 SECONDS)
  GROUP BY card_number
  HAVING count(*) > 3;
3 Monitoring
CREATE TABLE error_counts AS
  SELECT error_code, count(*)
  FROM monitoring_stream
  WINDOW TUMBLING (SIZE 1 MINUTE)
  WHERE type = 'ERROR'
  GROUP BY error_code;
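The tumbling-window logic behind the possible_fraud query can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: events are hypothetical (timestamp in milliseconds, card number) pairs held in a finite list, windows are aligned to multiples of the window size, and out-of-order or late events are ignored. KSQL itself evaluates the query continuously over a Kafka topic.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

WINDOW_MS = 5_000  # WINDOW TUMBLING (SIZE 5 SECONDS)
THRESHOLD = 3      # HAVING count(*) > 3


def possible_fraud(attempts: Iterable[Tuple[int, str]]) -> Dict[Tuple[int, str], int]:
    """Count attempts per (window start, card number); keep counts above THRESHOLD.

    Each attempt is a hypothetical (timestamp_ms, card_number) pair.
    """
    counts: Counter = Counter()
    for timestamp_ms, card_number in attempts:
        # Tumbling windows do not overlap: each event belongs to exactly one window.
        window_start = timestamp_ms - (timestamp_ms % WINDOW_MS)
        counts[(window_start, card_number)] += 1
    return {key: n for key, n in counts.items() if n > THRESHOLD}


attempts = [
    (1_000, "card-A"), (1_500, "card-A"), (2_000, "card-A"), (4_900, "card-A"),
    (5_100, "card-A"),  # falls into the next window
    (1_200, "card-B"), (3_000, "card-B"),
]
flagged = possible_fraud(attempts)
```

With this sample input only card-A in the window starting at 0 ms is flagged, since it is the only key with more than three attempts inside a single five-second window.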
40. Survey #4
Do you use or plan to use the Apache Kafka ecosystem in your projects?
(One answer possible)
• We already use Apache Kafka.
• We already use Apache Kafka and also leverage Confluent components.
• We do not use Apache Kafka yet but plan to use it soon.
• We do not plan to use Apache Kafka soon to build a streaming platform.