1. Real-time processing of large amounts of data
A streaming platform as a central nervous system in the enterprise
Perry Krol, Head of Systems Engineering CEMEA, Confluent
Streaming Event, München, 24 July 2019
11. ETL/Data Integration, Messaging, Event Streaming
ETL/Data Integration
● Strengths: highly scalable, persistent
● Weaknesses: batch, expensive, time consuming
Messaging
● Strengths: real-time
● Weaknesses: difficult to scale, no persistence after consumption, no replay
Event Streaming
● Highly scalable, durable, persistent, ordered, real-time
12. ETL/Data Integration, Messaging, Event Streaming
ETL/Data Integration: what happened in the world (stored records)
● Strengths: highly scalable, durable, persistent, maintains order
● Weaknesses: batch, expensive, time consuming
Messaging: what is happening in the world (transient messages)
● Strengths: fast (low latency)
● Weaknesses: difficult to scale, no persistence after consumption, no replay
Event Streaming: what is contextually happening in the world (data as a continually updating stream of events)
● Highly scalable, durable, persistent, ordered, real-time
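The difference between "no replay" in messaging and a persistent, ordered event stream can be illustrated with a toy in-memory model. This is only a sketch: the class names `MessageQueue` and `EventLog` are hypothetical and do not correspond to any Kafka API.

```python
from collections import deque


class MessageQueue:
    """Transient messaging: a message is gone once it has been consumed."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def consume(self):
        # Removing the message on read is what makes replay impossible.
        return self._messages.popleft() if self._messages else None


class EventLog:
    """Event streaming: an append-only log that readers address by offset."""

    def __init__(self):
        self._events = []

    def append(self, event):
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def read_from(self, offset):
        # Reading never deletes, so any consumer can replay from any offset.
        return list(self._events[offset:])


queue, log = MessageQueue(), EventLog()
for event in ["order_created", "order_paid", "order_shipped"]:
    queue.publish(event)
    log.append(event)

first_pass = [queue.consume() for _ in range(3)]  # drains the queue
second_pass = queue.consume()                     # None: nothing left to replay
replayed = log.read_from(0)                       # full history, still in order
```

After the queue has been drained, a second consumer sees nothing, while the log can still be read from any offset.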
13. Why Combine Real-time With Historical Context?
Event-Driven App (Location Tracking): "Where is my driver?"
● Only real-time events
● Messaging queues and event streaming platforms can do this
Contextual Event-Driven App (ETA): "When will my driver get here?" (2 min)
● Real-time combined with stored data
● Only event streaming platforms can do this
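The two questions on this slide can be sketched in a few lines. The first is answered from the latest event alone; the second also needs stored context. All names here (`LocationEvent`, `AVERAGE_SPEED_KMH`, the field `distance_to_rider_km`) and the numbers are hypothetical, chosen only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class LocationEvent:
    driver_id: str
    distance_to_rider_km: float  # hypothetical field on each real-time event


# Stored (historical) data: hypothetical average speed per driver in km/h.
AVERAGE_SPEED_KMH = {"driver-42": 30.0}


def where_is_my_driver(event):
    """Answerable from the real-time event alone."""
    return f"{event.driver_id} is {event.distance_to_rider_km:.1f} km away"


def when_will_my_driver_arrive(event, speeds):
    """Needs the real-time event plus stored context (historical speed)."""
    speed_kmh = speeds[event.driver_id]
    return event.distance_to_rider_km * 60 / speed_kmh  # minutes


latest = LocationEvent("driver-42", distance_to_rider_km=1.0)
location_answer = where_is_my_driver(latest)
eta_minutes = when_will_my_driver_arrive(latest, AVERAGE_SPEED_KMH)
```

A real ETA would draw on far richer history (traffic, routes, time of day), but the shape is the same: a live event joined with stored data.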
16. Apache Kafka, the de-facto OSS standard for event streaming
Real-time | Uses disk structure for constant performance at Petabyte scale
Scalable | Distributed, scales quickly and easily without downtime
Persistent | Persists messages on disks, enables intra-cluster replication
Reliable | Replicates data, auto balances consumers upon failure
● In production at more than a third of the Fortune 500
● 2 trillion messages a day at LinkedIn
● 500 billion events a day (1.3 PB) at Netflix
17. Event Streaming Maturity Model
(Chart: value plotted against investment & time, rising through five stages)
Stage 1: Initial Awareness / Pilot
Stage 2: Start to Build Pipeline / Deliver 1 New Outcome
Stage 3: Leverage Stream Processing
Stage 4: Build Contextual Event-Driven Apps
Stage 5: Central Nervous System
Product, Support, Training, Partners, Technical Account Management...
23. Apache Kafka®: Open Source Streaming Platform Battle-Tested at Scale
The birthplace of Apache Kafka
● More than 1 petabyte of data in Kafka
● Over 4.5 trillion messages per day
● 60,000+ data streams
● Source of all data warehouse & Hadoop data
● Over 300 billion user-related events per day
24. The Future of the Automotive Industry is a Real Time Data Cluster
● Front, rear and top view cameras: parking assistant, environment pointer
● Ultrasonic sensors: parking assistant with front and rear camera plus environment indicator
● Crash sensors: front protection adaptivity, side protection, tail impact protection
● Front camera: Audi active lane assistant, speed limit indicator, adaptive light
● Infrared camera: rearview assistance with pedestrian recognition
● Front and rear radar sensors: ACC with stop and go function, side assist
25. The Future of the Automotive Industry is a Real Time Data Cluster
Data sources (via MQTT):
● Front, rear and top view cameras
● Ultrasonic sensors
● Crash sensors
● Front camera
● Infrared camera
● Front and rear radar sensors
Use cases:
● Traffic alerts
● Hazard alerts
● Personalization
● Anomaly detection
27. Nordea Kafka-Powered MiFID II Compliance
Nordea was able to reduce its platform costs by 73%, cut analytics turnaround time from 16 weeks to instantaneous reporting, and can now give all analysts access to trade data in real time, so they can observe important patterns in the data and respond to them as they occur.
28. Retail: Hypercompetitive market with a need to respond to customer demand in real-time
● Technology issue: base systems in a legacy architecture built around Hadoop with Spark & traditional ETL; slow response times not meeting business needs.
● Challenges to synchronize data and have visibility across systems including online, supply chain and vendors.
29. Retail: Real-Time Customer Experience
Results
“Wal-Mart is able to take data from your past buying patterns, their internal stock information, your mobile phone location data, social media as well as external weather information and analyse all of this in seconds so it can send you a voucher for a BBQ cleaner to your phone – but only if you own a barbeque, the weather is nice and you currently are within a 3 miles radius of a Wal-Mart store that has the BBQ cleaner in stock.”
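The conditions named in the quote can be written as a single decision rule. The sketch below is purely illustrative and says nothing about how the retailer actually implements this. The record fields (`owns_barbecue`, `bbq_cleaner_in_stock`, `lat`, `lon`) are hypothetical, and distance is approximated with the haversine formula.

```python
import math


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance via the haversine formula."""
    earth_radius_miles = 3958.8  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    h = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_miles * math.asin(math.sqrt(h))


def should_send_voucher(customer, store, weather_is_nice, max_miles=3.0):
    """All four conditions from the quote must hold at the same time."""
    nearby = distance_miles(customer["lat"], customer["lon"],
                            store["lat"], store["lon"]) <= max_miles
    return (customer["owns_barbecue"] and weather_is_nice
            and nearby and store["bbq_cleaner_in_stock"] > 0)


# Hypothetical inputs: stored data (ownership, stock) plus live data (location, weather).
customer = {"owns_barbecue": True, "lat": 0.0, "lon": 0.0}
near_store = {"lat": 0.0, "lon": 0.01, "bbq_cleaner_in_stock": 5}
far_store = {"lat": 0.0, "lon": 1.0, "bbq_cleaner_in_stock": 5}

send_near = should_send_voucher(customer, near_store, weather_is_nice=True)
send_far = should_send_voucher(customer, far_store, weather_is_nice=True)
```

The rule is cheap to evaluate. The hard part, and the reason for a streaming platform, is having all four inputs current at the moment the customer's location changes.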
30. Severstal
Challenge: Make use of the multiple terabytes of time series data generated weekly by industrial equipment to reduce downtime and increase efficiency.
Solution: Use Confluent Platform to feed machine learning models and data analytics algorithms with near real-time data streams of plant data.
Results
● Reduced plant downtime
● Completed initial deployment quickly
● Received support for securing in-transit data
● Achieved one-second latencies
33. Apache Kafka: a distributed streaming platform
Scalability of a filesystem
● hundreds of MB/s
● many TBs per server
● commodity hardware
Guarantees of a database
● persistence
● ordering
Distributed by design
● replication
● partitioning
● horizontal scalability
● fault tolerance
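How partitioning and ordering fit together can be shown with a toy in-memory model: records with the same key always land in the same partition, so their relative order is preserved even though the topic as a whole is spread out. This is a sketch, not Kafka's implementation. `PartitionedLog` is a hypothetical class, and the CRC32 hash is a stand-in for whatever hash a real partitioner uses.

```python
import zlib


class PartitionedLog:
    """Toy in-memory model of a topic split into append-only partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Deterministic hash, so the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def append(self, key, value):
        index = self.partition_for(key)
        self.partitions[index].append((key, value))
        return index, len(self.partitions[index]) - 1  # (partition, offset)

    def events_for(self, key):
        # All events for a key live in one partition, in append order.
        partition = self.partitions[self.partition_for(key)]
        return [value for k, value in partition if k == key]


log = PartitionedLog(num_partitions=3)
for key, value in [("acct-1", "deposit"), ("acct-2", "deposit"),
                   ("acct-1", "withdraw"), ("acct-1", "close")]:
    log.append(key, value)

acct1_history = log.events_for("acct-1")  # order preserved per key
```

Adding partitions spreads load across more servers, which is the horizontal scalability the slide refers to. Replication and fault tolerance are not modelled here.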
44. About Confluent: We Are The Kafka Experts
● 30% of Fortune 100
● Confluent founders created Kafka
● Confluent team wrote 80% of Kafka
● We have over 300,000 hours of Kafka experience
47. This New Paradigm is the Future of Data
● Cloud: infrastructure as code (future of the datacenter)
● Event Streaming: data as a continuous stream of events (future of data)
48. Winning in the Digital Era doesn’t have to be hard.
Your data infrastructure was built for a different era (slow speed of execution):
● Mainframes
● Proprietary messaging systems
● Monolithic application development
● On-premises data centers
● Batch-oriented, closed systems
Imagine a world… (fast and flexible):
● Scalable machine clusters
● No bottlenecks from message queues
● Agile software development through microservices
● Cloud capable, and even…
● Data systems turned inside out, open & transparent
49. Confluent Platform
Complete Event Streaming Platform | Mission-critical Reliability | Freedom of Choice
Apache Kafka: Connect | Continuous Commit Log | Streams
Development & Stream Processing: Connectors | Clients | REST Proxy | MQTT Proxy | Schema Registry | KSQL
Operations and Security: Security plugins | Role-Based Access Control | Control Center | Replicator | Auto Data Balancer | Operator
Support, services, training & partners
Self-Managed Software: Datacenter | Public Cloud
Fully-Managed Service: Confluent Cloud
50. Deploy and Stream with Confluent Across Any Cloud.
Self-managed software: Confluent Platform, the leading distribution of Apache Kafka
● Deploy on any platform, on-premises or in public clouds
● confluent.io/download
Fully-managed service: Confluent Cloud, cloud-native Apache Kafka
● Available on the leading public clouds
● confluent.io/confluent-cloud/
51. Kafka Integration Architecture
Producer | Consumer
52. ATM Fraud Detection with Apache Kafka and KSQL
@rmoff
Confluent Hub
hub.confluent.io
One-stop place to discover and download:
• Connectors
• Transformations
• Converters
53. Stream Processing Analogy
Kafka Cluster: Connect API | Stream Processing | Connect API
$ cat < in.txt | grep "ksql" | tr a-z A-Z > out.txt
55. Stream processing with KSQL
CREATE STREAM ATM_POSSIBLE_FRAUD_ENRICHED AS
SELECT t.account_id,
       a.first_name + ' ' + a.last_name AS cust_name,
       t.atm, t.amount,
       TIMESTAMPTOSTRING(t.ROWTIME, 'HH:mm:ss') AS tx_time
FROM atm_txns t
INNER JOIN accounts a
ON t.account_id = a.account_id;
Simple SQL syntax for expressing reasoning along and across data streams.
You can write user-defined functions in Java.
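To make the semantics of the query concrete, here is a plain-Python sketch of what an inner join between transaction events and account data produces. It is not how KSQL executes the query. Treating `accounts` as a simple lookup table, the sample records, and formatting the timestamp in UTC are all assumptions made for illustration.

```python
from datetime import datetime, timezone

# Hypothetical lookup table standing in for the `accounts` side of the join.
accounts = {
    "a1": {"first_name": "Ada", "last_name": "Lovelace"},
}

# Hypothetical transaction events; `rowtime_ms` mimics ROWTIME (epoch millis).
atm_txns = [
    {"account_id": "a1", "atm": "ATM-7", "amount": 200, "rowtime_ms": 3_723_000},
    {"account_id": "zz", "atm": "ATM-9", "amount": 50, "rowtime_ms": 3_724_000},
]


def enrich(txns, account_table):
    """Inner join: emit a transaction only if its account is known."""
    for t in txns:
        a = account_table.get(t["account_id"])
        if a is None:
            continue  # an inner join drops unmatched events
        tx_time = datetime.fromtimestamp(t["rowtime_ms"] / 1000, tz=timezone.utc)
        yield {
            "account_id": t["account_id"],
            "cust_name": a["first_name"] + " " + a["last_name"],
            "atm": t["atm"],
            "amount": t["amount"],
            "tx_time": tx_time.strftime("%H:%M:%S"),
        }


enriched = list(enrich(atm_txns, accounts))
```

The second transaction has no matching account and is therefore absent from the output, which is the behaviour `INNER JOIN` expresses in the query.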
56. KSQL in Development and Production
Interactive KSQL for development and testing (REST): “Hmm, let me try out this idea...”
Headless KSQL for production: desired KSQL queries have been identified
58. KSQL example use cases
● Data exploration
● Data enrichment
● Streaming ETL
● Filter, cleanse, mask
● Real-time monitoring
● Anomaly detection
59. Complete Portfolio of Products and Services Built around Kafka
Software, services and support across the entire adoption lifecycle
● Confluent Platform
● Confluent Cloud
● Professional Services
● Enterprise Support
● Kafka Training
60. Confluent Streaming Events
● Confluent Streaming Event Frankfurt: Steigenberger Frankfurter Hof, 11 November 2019
● Confluent Streaming Event Zürich: Novotel Zürich City West, 13 November 2019
62. Perry Krol
Head of Systems Engineering CEMEA
Email: perry@confluent.io
LinkedIn: https://www.linkedin.com/in/perrykrol/
Questions? Feedback?
Please contact me!