Ingesting and Processing IoT Data - using MQTT, Kafka Connect and KSQL

Guido Schmutz
Kafka Summit 2018 – 16.10.2018
@gschmutz guidoschmutz.wordpress.com
Guido Schmutz

Working at Trivadis for more than 21 years
Oracle ACE Director for Fusion Middleware and SOA
Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data
Head of Trivadis Architecture Board
Technology Manager @ Trivadis
More than 30 years of software development experience

Contact: guido.schmutz@trivadis.com
Blog: http://guidoschmutz.wordpress.com
Slideshare: http://www.slideshare.net/gschmutz
Twitter: gschmutz
Agenda
1. Introduction
2. IoT Logistics use case – Kafka Ecosystem "in Action"
3. Stream Data Integration – IoT Device to Kafka over MQTT
4. Stream Analytics with KSQL
5. Summary
Introduction
Big Data Reference Architecture for Data Analytics Solutions

[Figure: bulk sources (DB, file) and event sources (IoT data, mobile apps, social, telemetry) feed – via file/SQL import, change data capture, data flows and an event hub – big-data storage (raw and refined), parallel processing, an enterprise data warehouse with BI tools, a search/explore service, stream analytics (stream processor with state) and microservices delivering results.]
[Figure: the same reference architecture, this time highlighting the event-stream path – event hub, stream processor and stream analytics producing results.]
Two Types of Stream Processing (from Gartner)

Stream Data Integration
• Primarily covers streaming ETL
• Integration of data sources and data sinks
• Filter and transform data
• (Enrich data)
• Route data

Stream Analytics
• Covers analytics use cases
• Calculating aggregates and detecting patterns to generate higher-level, more relevant summary information (complex events => used to be called CEP)
• Complex events may signify threats or opportunities that require a response
Stream Data Integration and Stream Analytics with Kafka

[Figure: source connector → Kafka broker (topic trucking_driver) → sink connector, with stream processing reading from and writing back to the broker.]
Unified Architecture for Modern Data Analytics Solutions

[Figure: the reference architecture again, now unified around a central event hub – the same Kafka-based backbone serves both stream data integration and stream analytics alongside batch storage and processing.]
Various IoT Data Protocols
• MQTT (Message Queue Telemetry Transport)
• CoAP (Constrained Application Protocol)
• AMQP
• DDS (Data Distribution Service)
• STOMP
• REST
• WebSockets
• …
IoT Logistics use case – Kafka Ecosystem "in Action"
Demo - IoT Logistics Use Case

Trucks are sending driving info and geo-position data in one single message (test data generator originally by Hortonworks):

{
  "timestamp":1537343400827,
  "truckId":87,
  "driverId":13,
  "routeId":987179512,
  "eventType":"Normal",
  "latitude":38.65,
  "longitude":-90.21,
  "correlationId":"-3208700263746910537"
}
Stream Data Integration – IoT Device to Kafka over MQTT
Stream Data Integration

[Figure: the source-connector part of the pipeline is highlighted – source connector → Kafka broker → sink connector / stream processing.]
(I) IoT Device sends data via MQTT

Message Queue Telemetry Transport (MQTT)
• Pub/Sub architecture with Message Broker
• Built-in retry / QoS mechanism
• Last Will and Testament (LWT)
• Not all MQTT brokers are scalable
• Available
• Does not provide state (history)

The trucks publish their position and driving info as one JSON message to MQTT topics of the form truck/nn/position.
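Publishing to such a topic needs nothing Kafka-specific on the device side. A minimal sketch using the Eclipse Paho Java client, assuming the Mosquitto broker at tcp://mosquitto:1883 used later in this demo (client id and payload are illustrative):

import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.MqttMessage;

public class TruckPositionPublisher {
  public static void main(String[] args) throws MqttException {
    // connect to the MQTT broker (host/port assumed from the demo setup)
    MqttClient client = new MqttClient("tcp://mosquitto:1883", "truck-87");
    client.connect();

    String payload = "{\"timestamp\":1537343400827,\"truckId\":87,"
        + "\"driverId\":13,\"routeId\":987179512,\"eventType\":\"Normal\","
        + "\"latitude\":38.65,\"longitude\":-90.21}";

    MqttMessage message = new MqttMessage(payload.getBytes());
    message.setQos(0); // fire-and-forget, matching the mqtt.qos=0 used below
    client.publish("truck/87/position", message); // topic pattern truck/nn/position
    client.disconnect();
  }
}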
MQTT to Kafka using Confluent MQTT Connector

IoT Device sends data via MQTT – how do we get the data into Kafka?

[Figure: trucks publish to MQTT topics truck/nn/position; some yet-to-be-chosen component (?) must forward the messages to the Kafka topic truck_position.]
2 Ways for MQTT with Confluent Streaming Platform

Confluent MQTT Connector (Preview)
• Pull-based
• Integrates with (existing) MQTT servers
• Can be used both as a Source and a Sink
• Output is an envelope with all of the properties of the incoming message
  • Value: body of the MQTT message
  • Key: the MQTT topic the message was written to
• Can consume multiple MQTT topics and write to one single Kafka topic
• RegexRouter SMT can be used to change topic names

Confluent MQTT Proxy
• Push-based
• Enables MQTT clients to use the MQTT protocol to publish data directly to Kafka
• MQTT Proxy is stateless and independent of other instances
• Simple mapping scheme of MQTT topics to Kafka topics based on regular expressions
• Reduced lag in message publishing compared to traditional MQTT brokers
(II) MQTT to Kafka using Confluent MQTT Connector

[Figure: trucks → MQTT topics truck/nn/position → "mqtt to kafka" source connector → Kafka topic truck_position → kafkacat as consumer.]
Confluent MQTT Connector

Currently available as a Preview on Confluent Hub:

confluent-hub install confluentinc/kafka-connect-mqtt:1.0.0-preview

Set up plugin.path to specify the additional folder:

plugin.path=/usr/share/java,/etc/kafka-connect/custom-plugins,/usr/share/confluent-hub-components
Create an instance of Confluent MQTT Connector

#!/bin/bash
curl -X "POST" "http://192.168.69.138:8083/connectors" \
  -H "Content-Type: application/json" \
  -d $'{
  "name": "mqtt-source",
  "config": {
    "connector.class": "io.confluent.connect.mqtt.MqttSourceConnector",
    "tasks.max": "1",
    "name": "mqtt-source",
    "mqtt.server.uri": "tcp://mosquitto:1883",
    "mqtt.topics": "truck/+/position",
    "kafka.topic": "truck_position",
    "mqtt.clean.session.enabled": "true",
    "mqtt.connect.timeout.seconds": "30",
    "mqtt.keepalive.interval.seconds": "60",
    "mqtt.qos": "0"
  }
}'
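Once the connector is registered, a quick sanity check from the shell – assuming the Connect REST port and the broker address used elsewhere in this demo setup:

# connector state should be RUNNING
curl -s "http://192.168.69.138:8083/connectors/mqtt-source/status"

# consume the resulting records; the key (%k) is the originating MQTT topic
kafkacat -b broker-1:9092 -t truck_position -C -q -f '%k => %s\n'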
(III) MQTT to Kafka using Confluent MQTT Proxy

[Figure: trucks send position and engine metrics straight to the MQTT Proxy, which writes them to the Kafka topics truck_position and engine_metric, read by console consumers.]
Configure MQTT Proxy

Configure MQTT Proxy (kafka-mqtt.properties):

topic.regex.list=truck_position:.*position,engine_metric:.*engine_metric
listeners=0.0.0.0:1883
bootstrap.servers=PLAINTEXT://broker-1:9092
confluent.topic.replication.factor=1

Start MQTT Proxy:

bin/kafka-mqtt-start kafka-mqtt.properties
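Because the proxy speaks plain MQTT on port 1883, any MQTT client can publish to it directly – for example mosquitto_pub (the host name mqtt-proxy is an assumption for this setup):

mosquitto_pub -h mqtt-proxy -p 1883 -q 0 \
  -t truck/99/position \
  -m '{"truckId":99,"eventType":"Normal","latitude":38.65,"longitude":-90.21}'

The message ends up in the Kafka topic truck_position, because the MQTT topic matches the .*position entry in topic.regex.list above.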
MQTT Connector vs. MQTT Proxy

MQTT Connector
• Pull-based
• Use existing MQTT infrastructures
• Bi-directional

MQTT Proxy
• Push-based
• Does not provide all MQTT functionality
• Only uni-directional

[Figure: two deployment topologies – with the connector, each regional DC runs an MQTT broker with truck/nn/position and truck/nn/driving-info topics plus an "mqtt to kafka" bridge into the headquarters DC's Kafka topics (truck position, truck driving info); with the proxy, trucks publish straight to MQTT Proxies in the regional DCs, which forward into the same central topics.]
(IV) MQTT to Kafka using StreamSets Data Collector

[Figure: trucks → MQTT topics truck/nn/position → StreamSets "mqtt to kafka" pipeline → Kafka topic truck_position → console consumer.]

[Screenshot: the MQTT-to-Kafka pipeline in the StreamSets Data Collector UI.]
Wait … there is more ….

[Figure: the pipeline now splits the incoming message into two Kafka topics – truck_position and truck_driving_info – each read by a console consumer. What about some analytics?]
Stream Analytics with KSQL
Stream Analytics

[Figure: the stream-processing part of the pipeline is highlighted – Kafka broker topics feeding stream processing.]
KSQL - Terminology

Enables stream processing with zero coding required – the simplest way to process streams of data in real-time.

Stream
• "History"
• An unbounded sequence of structured data ("facts")
• Facts in a stream are immutable: new facts can be inserted to a stream, but existing facts can never be updated or deleted
• Streams can be created from a Kafka topic or derived from an existing stream

Table
• "State"
• A view of a stream, or another table, representing a collection of evolving facts
• Facts in a table are mutable: new facts can be inserted to the table, and existing facts can be updated or deleted
• Tables can be created from a Kafka topic or derived from existing streams and tables
(V) Create STREAM on truck_position and use it in KSQL CLI

[Figure: MQTT → "mqtt-to-kafka" → Kafka topic truck-position → KSQL Stream, queried interactively from the KSQL CLI.]
Create a STREAM on truck_driving_info

ksql> CREATE STREAM truck_driving_info_s
        (ts VARCHAR,
         truckId VARCHAR,
         driverId BIGINT,
         routeId BIGINT,
         eventType VARCHAR,
         latitude DOUBLE,
         longitude DOUBLE,
         correlationId VARCHAR)
      WITH (kafka_topic='truck_driving_info',
            value_format='JSON');

 Message
----------------
 Stream created
Describe the STREAM

ksql> describe truck_driving_info_s;

 Field         | Type
---------------------------------
 ROWTIME       | BIGINT
 ROWKEY        | VARCHAR(STRING)
 TS            | VARCHAR(STRING)
 TRUCKID       | VARCHAR(STRING)
 DRIVERID      | BIGINT
 ROUTEID       | BIGINT
 EVENTTYPE     | VARCHAR(STRING)
 LATITUDE      | DOUBLE
 LONGITUDE     | DOUBLE
 CORRELATIONID | VARCHAR(STRING)
KSQL - SELECT

Selects rows from a KSQL stream or table. The result of this statement will not be persisted in a Kafka topic and will only be printed out in the console.

from_item is one of the following: stream_name, table_name

SELECT select_expr [, ...]
  FROM from_item
  [ LEFT JOIN join_table ON join_criteria ]
  [ WINDOW window_expression ]
  [ WHERE condition ]
  [ GROUP BY grouping_expression ]
  [ HAVING having_expression ]
  [ LIMIT count ];
Use SELECT to browse from Stream

ksql> SELECT * FROM truck_driving_info_s;
1539711991642 | truck/24/position | null | 24 | 10 | 1198242881 | Normal | 36.84 | -94.83 | -6187001306629414077
1539711991691 | truck/26/position | null | 26 | 13 | 1390372503 | Normal | 42.04 | -88.02 | -6187001306629414077
1539711991882 | truck/66/position | null | 66 | 22 | 1565885487 | Normal | 38.33 | -94.35 | -6187001306629414077
1539711991902 | truck/22/position | null | 22 | 26 | 1198242881 | Normal | 36.73 | -95.01 | -6187001306629414077

ksql> SELECT * FROM truck_driving_info_s WHERE eventType != 'Normal';
1539712101614 | truck/67/position | null | 67 | 11 | 160405074 | Lane Departure | 38.98 | -92.53 | -6187001306629414077
1539712116450 | truck/18/position | null | 18 | 25 | 987179512 | Overspeed | 40.76 | -88.77 | -6187001306629414077
1539712120102 | truck/31/position | null | 31 | 12 | 927636994 | Unsafe following distance | 38.22 | -91.18 | -6187001306629414077
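Note that KSQL starts reading at the latest offset by default, so a SELECT only shows newly arriving messages. To replay a topic from the beginning while browsing, the offset can be reset for the CLI session:

ksql> SET 'auto.offset.reset' = 'earliest';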
(VI) – CREATE AS … SELECT …

[Figure: a "detect_dangerous_driving" query reads the truck-position Stream and derives a new Stream "dangerous-driving".]
CREATE STREAM … AS SELECT …

Create a new KSQL stream along with the corresponding Kafka topic and stream the result of the SELECT query as a changelog into the topic.

The WINDOW clause can only be used if the from_item is a stream.

CREATE STREAM stream_name
  [ WITH ( property_name = expression [, ...] ) ]
  AS SELECT select_expr [, ...]
  FROM from_stream
  [ [ LEFT | FULL | INNER ] JOIN [join_table | join_stream]
    [ WITHIN [(before TIMEUNIT, after TIMEUNIT) | N TIMEUNIT] ] ON join_criteria ]
  [ WHERE condition ]
  [ PARTITION BY column_name ];
INSERT INTO … SELECT …

Stream the result of the SELECT query into an existing stream and its underlying topic. The schema and partitioning column produced by the query must match the stream's schema and key. If the schema and partitioning column are incompatible with the stream, then the statement will return an error.

stream_name and from_item must both refer to a Stream. Tables are not supported!

CREATE STREAM stream_name ...;

INSERT INTO stream_name
  SELECT select_expr [, ...]
  FROM from_stream
  [ WHERE condition ]
  [ PARTITION BY column_name ];
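The deck shows no concrete INSERT INTO, so here is a small sketch: assuming a hypothetical second source stream truck_driving_info_eu_s with the same schema, its dangerous events could be merged into the dangerous_driving_s stream created on the next slide:

ksql> INSERT INTO dangerous_driving_s
      SELECT * FROM truck_driving_info_eu_s  -- hypothetical stream, same schema
      WHERE eventType != 'Normal';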
CREATE AS … SELECT …

ksql> CREATE STREAM dangerous_driving_s
        WITH (kafka_topic='dangerous_driving_s',
              value_format='JSON')
      AS SELECT * FROM truck_driving_info_s
      WHERE eventtype != 'Normal';

 Message
----------------------------
 Stream created and running

ksql> SELECT * FROM dangerous_driving_s;
1539712399201 | truck/67/position | null | 67 | 11 | 160405074 | Unsafe following distance | 38.65 | -90.21 | -6187001306629414077
1539712416623 | truck/67/position | null | 67 | 11 | 160405074 | Unsafe following distance | 39.1 | -94.59 | -6187001306629414077
1539712430051 | truck/18/position | null | 18 | 25 | 987179512 | Lane Departure | 35.1 | -90.07 | -6187001306629414077
Windowing

Streams are unbounded, so we need some meaningful time frames to do computations (i.e. aggregations). Computations over events are done using windows of data. Windows are tracked per unique key.

[Figure: a stream of data divided into windows of data over time – Fixed Window, Sliding Window, Session Window.]
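In KSQL these window types map onto the WINDOW clause roughly as follows (the sizes are illustrative; HOPPING is KSQL's sliding variant):

WINDOW TUMBLING (SIZE 30 SECONDS)                        -- fixed window
WINDOW HOPPING (SIZE 30 SECONDS, ADVANCE BY 10 SECONDS)  -- sliding window
WINDOW SESSION (60 SECONDS)                              -- session window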
(VII) Aggregate and Window

[Figure: the dangerous-driving Stream is aggregated by a "count_by_eventType" query into a Table "dangerous-driving-count".]
SELECT COUNT … GROUP BY

ksql> CREATE TABLE dangerous_driving_count AS
      SELECT eventType, count(*) nof
      FROM dangerous_driving_s
      WINDOW TUMBLING (SIZE 30 SECONDS)
      GROUP BY eventType;

 Message
----------------------------
 Table created and running

ksql> SELECT TIMESTAMPTOSTRING(ROWTIME, 'yyyy-MM-dd HH:mm:ss.SSS'),
             eventType, nof
      FROM dangerous_driving_count;
2018-10-16 05:12:19.408 | Unsafe following distance | 1
2018-10-16 05:12:38.926 | Unsafe following distance | 1
2018-10-16 05:12:39.615 | Unsafe tail distance | 1
2018-10-16 05:12:43.155 | Overspeed | 1
Joining

[Figure: three join patterns over time – Stream-to-Static (Table) Join, Stream-to-Stream Join (one window join), Stream-to-Stream Join (two window join).]
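The demo only shows the stream-to-table case; a sketch of a stream-to-stream join, assuming a second stream truck_position_s defined over the truck_position topic from the split shown earlier:

ksql> SELECT p.truckId, p.latitude, p.longitude, d.eventType
      FROM truck_position_s p
      INNER JOIN truck_driving_info_s d
        WITHIN 5 MINUTES
        ON p.truckId = d.truckId;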
(VIII) – Join Table to enrich with Driver data

[Figure: a "jdbc-to-kafka" connector streams the driver table (e.g. {"id":27,"firstName":"Walter","lastName":"Ward","available":"Y","birthdate":"24-JUL-85","last_update":1506923052012}) into a Kafka topic backing a KSQL Table; a "join dangerous-driving & driver" query enriches the dangerous-driving Stream into a new Stream "dangerous-driving & driver".]
Join Table to enrich with Driver data

#!/bin/bash
curl -X "POST" "http://192.168.69.138:8083/connectors" \
  -H "Content-Type: application/json" \
  -d $'{
  "name": "jdbc-driver-source",
  "config": {
    "connector.class": "JdbcSourceConnector",
    "connection.url": "jdbc:postgresql://db/sample?user=sample&password=sample",
    "mode": "timestamp",
    "timestamp.column.name": "last_update",
    "table.whitelist": "driver",
    "validate.non.null": "false",
    "topic.prefix": "truck_",
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter.schemas.enable": "false",
    "transforms": "createKey,extractInt",
    "transforms.createKey.type": "org.apache.kafka.connect.transforms.ValueToKey",
    "transforms.createKey.fields": "id",
    "transforms.extractInt.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
    "transforms.extractInt.field": "id"
  }
}'
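Before defining a KSQL table on top, it is worth verifying that the connector emits one changelog record per driver row, keyed by id – a quick look with the console consumer (broker address as used earlier in this setup):

kafka-console-consumer --bootstrap-server broker-1:9092 \
  --topic truck_driver --from-beginning \
  --property print.key=true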
Create Table with Driver State

ksql> CREATE TABLE driver_t
        (id BIGINT,
         first_name VARCHAR,
         last_name VARCHAR,
         available VARCHAR)
      WITH (kafka_topic='truck_driver',
            value_format='JSON',
            key='id');

 Message
----------------
 Table created
Join the Stream with the Driver Table

ksql> CREATE STREAM dangerous_driving_and_driver_s
        WITH (kafka_topic='dangerous_driving_and_driver_s',
              value_format='JSON', partitions=8)
      AS SELECT driverId, first_name, last_name, truckId, routeId, eventtype,
                latitude, longitude
      FROM dangerous_driving_s
      LEFT JOIN driver_t
      ON dangerous_driving_s.driverId = driver_t.id;

 Message
----------------------------
 Stream created and running

ksql> SELECT * FROM dangerous_driving_and_driver_s;
1539713095921 | 11 | 11 | Micky | Isaacson | 67 | 160405074 | Lane Departure | 39.01 | -93.85
1539713113254 | 11 | 11 | Micky | Isaacson | 67 | 160405074 | Unsafe following distance | 39.0 | -93.65
(IX) – Custom UDF for calculating Geohash

[Figure: the pipeline from (VIII), extended with a "dangerous driving by geo" query that uses a custom geohash UDF to derive a Stream "dangerous-driving-geohash".]
Custom UDF for calculating Geohashes

Geohash is a geocoding which encodes a geographic location into a short string of letters and digits; a hierarchical spatial data structure which subdivides space into buckets of grid shape.

Length | Area (width x height)
1      | 5,009.4km x 4,992.6km
2      | 1,252.3km x 624.1km
3      | 156.5km x 156km
4      | 39.1km x 19.5km
5      | 4.9km x 4.9km
12     | 3.7cm x 1.9cm

ksql> SELECT latitude, longitude,
             geohash(latitude, longitude, 4)
      FROM dangerous_driving_s;
38.31 | -91.07 | 9yz1
37.7 | -92.61 | 9ywn
34.78 | -92.31 | 9ynm
42.23 | -91.78 | 9zw8
...

http://geohash.gofreerange.com/
Add an UDF sample

Geohash and join to some important messages for drivers

// imports assumed: the KSQL UDF annotations and the davidmoten "geo"
// library providing GeoHash.encodeHash(...)
import io.confluent.ksql.function.udf.Udf;
import io.confluent.ksql.function.udf.UdfDescription;
import com.github.davidmoten.geo.GeoHash;

@UdfDescription(name = "geohash",
    description = "returns the geohash for a given LatLong")
public class GeoHashUDF {

  @Udf(description = "encode lat/long to geohash of specified length.")
  public String geohash(final double latitude, final double longitude,
                        final int length) {
    return GeoHash.encodeHash(latitude, longitude, length);
  }

  @Udf(description = "encode lat/long to geohash.")
  public String geohash(final double latitude, final double longitude) {
    return GeoHash.encodeHash(latitude, longitude);
  }
}
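To make the UDF visible to KSQL, the compiled jar (bundled with its geohash dependency) has to be copied into the KSQL extension directory and the server restarted – a sketch, with path and jar name as assumptions:

# build an uber-jar containing GeoHashUDF and the geo library, then:
cp geohash-udf-1.0-jar-with-dependencies.jar /etc/ksql/ext/

# ksql-server.properties
ksql.extension.dir=/etc/ksql/ext/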
Summary
Summary
Two ways to bring in MQTT data => MQTT Connector or MQTT Proxy
KSQL is another way to work with data in Kafka => you can (re)use some of your SQL knowledge
• Similar semantics to SQL, but for queries on continuous, streaming data
Well-suited for structured data (there is the "S" in KSQL)
There is more
• Stream to Stream Join
• REST API for executing KSQL
• Avro Format & Schema Registry
• Using Kafka Connect to write results to data stores
• …
Choosing the Right API

Consumer, Producer API
• Java, C#, C++, Scala, Python, Node.js, Go, PHP …
• subscribe(), poll(), send(), flush()
• Anything Kafka

Kafka Streams
• Fluent Java API
• mapValues(), filter(), flush()
• Stream Analytics

KSQL
• SQL dialect
• SELECT … FROM …, JOIN ... WHERE, GROUP BY
• Stream Analytics

Kafka Connect
• Declarative
• Configuration, REST API
• Out-of-the-box connectors
• Stream Integration

Flexibility ←→ Simplicity (the APIs above are ordered from most flexible to simplest)

Source: adapted from Confluent
Technology on its own won't help you.
You need to know how to use it properly.