Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kafka (John Neal, Qlik) Kafka Summit 2020

Data in Motion
Building Stream-Based
Architectures with Qlik
Replicate & Kafka
John Neal
Senior Solution Architect
Qlik Partner Engineering

2
Qlik Data
Integration
Quick Overview

3
Data Warehouse Automation
Streaming Data Pipeline Automation
Design, Manage & Monitor
Modernize and Automate Data Integration
CDC Streaming
Azure SQL DW
Amazon Redshift
Managed Data Lake Creation
Generate Change Data Streams
Deliver To Clouds, Lakes…
Refine & Merge For Analytics, AI/ML, Data Science…
AI/ML
Analytics
Data Science
Model
Commit
Conform
Consume
Catalog
Shop, Prepare & Provision
RDBMS
Data Warehouse
Files
Mainframe
SAAS
APPS
SAP
Amazon RDS
Azure SQL DB
Google Cloud SQL

4
Streaming Data Pipeline Automation
Design, Manage & Monitor
Our Focus for Today: Qlik Replicate & Kafka
Generate Change Data Streams
Deliver To Clouds, Lakes…
Refine & Merge For Analytics, AI/ML, Data Science…
RDBMS
Data Warehouse
Files
Mainframe
SAAS
APPS
SAP

5
TARGET SCHEMA CREATION
SAP
RDBMS
EDW
FILE
MAINFRAME
HETEROGENEOUS DATA TYPE MAPPING
BATCH TO CDC TRANSITION
DDL CHANGE PROPAGATION
FILTERING
TRANSFORMATIONS
RDBMS
EDW
FILES
STREAMING
DATA LAKE
Log Based CDC
BATCH
IN-MEMORY
Replicate
Qlik Replicate
Automated Real-time Data Delivery

6
Physics 101
As It Applies to Data

7
Newton’s First Law of Motion
Inertia
An object will not change its motion unless acted on by an unbalanced force.
• If it is at rest, it will stay at rest
• If it is in motion, it will remain at the same velocity
Corollary: Objects with greater mass have more inertia. It therefore takes more force to change their motion.

8
Data in motion tends to stay in motion until it comes to rest on disk.
Similarly, if data is at rest, it will remain at rest until an external “force” puts it in motion again.
— John Neal *
* With apologies to Sir Isaac Newton

9
Writing Data to a Database Introduces Friction
Data in Motion
Friction
How do we get the data moving again?
STOP

10
Get Landed Data Moving
Overcoming Storage “Friction”
File I/O (reads)
• Parsing challenges
• No deltas
Database Queries
• Not real-time
• Added database load
Database Triggers
• Don’t scale
ETL Tools
• Not real-time
• Getting deltas is hard
Qlik Replicate
• Real-time
• Reads the DB logs
• CDC provides delta processing

11
Getting Data in
Motion Again
With Qlik Replicate & Kafka

12
“Modern” Applications Leverage Microservices
• Components are “decoupled” and have well-defined interfaces
- Changes are easier to make because they are localized and isolated
- Results in increased reliability
- Allows for a faster release schedule supporting agile approaches
- Increases opportunity to innovate
• Microservices can use “purpose built” storage rather than a central repository
- Teams are free to choose the most appropriate repository for the problem at hand … a relational database is not always the answer.
• Data flows between components
Microservices

13
Data Catalog
Microservice-Based Applications
A Bucket of Bricks
Data Warehouse
Automation
Media
Data Streaming
(CDC)
Analytics
Security
Kafka
Streaming
Services
Event Processing
RDBMS
Wide-Column
Store
Spark /
ML
Cloud DW
Hadoop
Key-Value
Store
Graph DB
(NoSQL)
File Storage
Document
Store
(NoSQL)
IoT
Qlik

14
Lambda-Style Architectures
Streaming and batch working together
NoSQL
IoT
Mobile
Apps
Web
Legacy
DB/DW
Incoming Data
Streaming (Speed) Layer
Serving Layer
Batch Layer
Stream Processing
(Spark Streaming,
Storm, Flink, …)
Incremental
Views
All Data Pre-Compute
Views
(Spark, M/R, HQL, …)
Real-time Views
Batch Views
Queries /
ML /
Analytics
Ingest & Store | Prepare / Curate | Publish | Consume
Data
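
The way the three layers cooperate can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the event shape, and the "count events per source" view are all hypothetical and do not come from the slides. It shows a batch view recomputed from all landed data, an incremental view maintained by the speed layer, and a serving-layer query that combines the two.

```python
from collections import Counter

def build_batch_view(master_dataset):
    """Batch layer: recompute the view from all data landed so far."""
    return Counter(event["source"] for event in master_dataset)

def update_realtime_view(view, event):
    """Speed layer: fold one newly arrived event into the incremental view."""
    view[event["source"]] += 1

def query(batch_view, realtime_view, source):
    """Serving layer: combine both views to answer a query."""
    return batch_view[source] + realtime_view[source]

# Hypothetical events: already landed (batch) and newly arrived (stream).
master = [{"source": "web"}, {"source": "iot"}, {"source": "web"}]
batch = build_batch_view(master)

realtime = Counter()
for event in [{"source": "web"}, {"source": "mobile"}]:
    update_realtime_view(realtime, event)

print(query(batch, realtime, "web"))     # 3
print(query(batch, realtime, "mobile"))  # 1
```

In a real deployment the real-time view would be reset whenever a new batch view covering the same events is published; that bookkeeping is left out of the sketch.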

15
Kappa-Style Architectures
Where everything is a stream
Streaming Data
Streaming Layer
Stream Processing
(Spark Streaming,
Storm, Flink, …)
Real-time Results
Serving Layer
Real-time View
Queries /
ML /
Analytics
Mirror events to long term storage
Storage Layer
Raw Data History
Re-compute events from storage if needed
Historical View
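
The "mirror, then re-compute if needed" idea can be sketched as follows. This is a toy model under stated assumptions: `KappaPipeline`, `process_event`, and the account/amount event shape are hypothetical names chosen for illustration, and a Python list stands in for long-term storage. The point it illustrates is that live and replayed events go through the same processing code.

```python
def process_event(view, event):
    """Stream processor: the one code path shared by live and replayed events."""
    account = event["account"]
    view[account] = view.get(account, 0) + event["amount"]

class KappaPipeline:
    def __init__(self):
        self.history = []  # stand-in for long-term storage (raw data history)
        self.view = {}     # stand-in for the real-time view in the serving layer

    def on_event(self, event):
        self.history.append(event)       # mirror the event to storage
        process_event(self.view, event)  # update the real-time view

    def recompute(self, processor=process_event):
        """Rebuild a view by replaying every stored event through `processor`."""
        rebuilt = {}
        for event in self.history:
            processor(rebuilt, event)
        return rebuilt

pipeline = KappaPipeline()
for event in [
    {"account": "A", "amount": 20},
    {"account": "B", "amount": 5},
    {"account": "A", "amount": 7},
]:
    pipeline.on_event(event)

print(pipeline.view)                          # {'A': 27, 'B': 5}
print(pipeline.recompute() == pipeline.view)  # True
```

Passing a different `processor` to `recompute` models the case where the processing logic changes and the historical view has to be rebuilt from the raw events.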

16
Making Rubber
Meet the Road
Innovate by Keeping Data in Motion

17
Source
Legacy
SAP
Kafka
Streaming Data with Qlik Replicate
And then what?
Qlik
Replicate
CDC

18
Source
Data
Kafka
A Real-World Example
Credit Card Authorization
Qlik
Replicate
Spark
HBase
Hive
Machine
Learning
Decision
Service
Engine
Analytics
Application
Spark
Models
Data Lake
CDC

20
Sample Records – Willie Mays
Load / Reload, UPDATE
{"data": {"playerID": "mayswi01", "birthYear": 1931, "birthMonth": 5, "birthDay": 6, "birthCountry": "USA",
"birthState": "AL", "birthCity": "Westfield", "deathYear": "", "deathCountry": "", "deathState": "",
"deathCity": "", "nameFirst": "Willie", "nameLast": "Mays", "nameGiven": "Willie Howard", "weight": 170,
"height": 70, "bats": "R", "throws": "R", "debut": "1951-05-25", "finalGame": "1973-09-09", "retroID":
"maysw101", "bbrefID": "mayswi01"}, "beforeData": null, "headers": {"operation": "REFRESH",
"changeSequence": "", "timestamp": "", "streamPosition": "", "transactionId": "", "changeMask": null,
"columnMask": null, "transactionEventCounter": null, "transactionLastEvent": null}}
{"data": {"playerID": "mayswi01", "birthYear": 1931, "birthMonth": 5, "birthDay": 6, "birthCountry":
"NewCountry", "birthState": "AL", "birthCity": "Westfield", "deathYear": "", "deathCountry": "",
"deathState": "", "deathCity": "", "nameFirst": "Willie", "nameLast": "Mays", "nameGiven": "Willie Howard",
"weight": 170, "height": 70, "bats": "R", "throws": "R", "debut": "1951-05-25", "finalGame": "1973-09-09",
"retroID": "maysw101", "bbrefID": "mayswi01"}, "beforeData": {"playerID": "mayswi01", "birthYear": 1931,
"birthMonth": 5, "birthDay": 6, "birthCountry": "USA", "birthState": "AL", "birthCity": "Westfield",
"deathYear": "", "deathCountry": "", "deathState": "", "deathCity": "", "nameFirst": "Willie", "nameLast":
"Mays", "nameGiven": "Willie Howard", "weight": 170, "height": 70, "bats": "R", "throws": "R", "debut":
"1951-05-25", "finalGame": "1973-09-09", "retroID": "maysw101", "bbrefID": "mayswi01"}, "headers":
{"operation": "UPDATE", "changeSequence": "20200713204536000000000000000110813",
"timestamp": "2020-07-13T20:45:36.000",
"streamPosition": "mysql-bin.000004:415943395:20:415951456:17592712139:mysql-bin.000004:412843032",
"transactionId": "000000000000000000000004189B7BCB", "changeMask": "000010",
"columnMask": "3FFFFF", "transactionEventCounter": 10962, "transactionLastEvent": false}}

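The "changeMask" and "columnMask" headers in the records above look like hexadecimal bitmasks over the table's columns. The sketch below decodes them under two assumptions that are inferred from the sample records rather than stated on the slides: bit 0 (the least significant bit) corresponds to the first column, and the column order matches the key order of "data". The helper name `columns_in_mask` is hypothetical. Check the Qlik Replicate documentation before relying on this interpretation.

```python
# Assumption: column order is taken from the key order of "data" in the sample record.
COLUMNS = [
    "playerID", "birthYear", "birthMonth", "birthDay", "birthCountry",
    "birthState", "birthCity", "deathYear", "deathCountry", "deathState",
    "deathCity", "nameFirst", "nameLast", "nameGiven", "weight", "height",
    "bats", "throws", "debut", "finalGame", "retroID", "bbrefID",
]

def columns_in_mask(mask_hex, columns):
    """Return the columns whose bit is set in a hexadecimal mask string.

    Assumes bit i (least significant bit = 0) maps to columns[i].
    A null or empty mask, as in the REFRESH record, yields an empty list.
    """
    if not mask_hex:
        return []
    mask = int(mask_hex, 16)
    return [name for i, name in enumerate(columns) if (mask >> i) & 1]

# changeMask of the UPDATE record, where only birthCountry differs from beforeData
print(columns_in_mask("000010", COLUMNS))  # ['birthCountry']
```

Under these assumptions the UPDATE record's mask points at "birthCountry", which is the one field that differs between "data" and "beforeData", and "3FFFFF" covers all 22 columns.
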
21
Sample Records – Willie Mays
DELETE, INSERT
{"data": {"playerID": "mayswi01", "birthYear": 1931, "birthMonth": 5, "birthDay": 6, "birthCountry":
"NewCountry", "birthState": "AL", "birthCity": "Westfield", "deathYear": "", "deathCountry": "",
"deathState": "", "deathCity": "", "nameFirst": "Willie", "nameLast": "Mays", "nameGiven": "Willie Howard",
"weight": 170, "height": 70, "bats": "R", "throws": "R", "debut": "1951-05-25", "finalGame": "1973-09-09",
"retroID": "maysw101", "bbrefID": "mayswi01"}, "beforeData": null, "headers": {"operation": "DELETE",
"changeSequence": "20200713204542000000000000000219813", "timestamp": "2020-07-13T20:45:42.000",
"streamPosition": "mysql-bin.000004:419832331:55:419840520:17598121412:mysql-bin.000004:418252305",
"transactionId": "00000000000000000000000418EE05C4", "changeMask": "000001", "columnMask": "3FFFFF",
"transactionEventCounter": 10962, "transactionLastEvent": false}}
{"data": {"playerID": "mayswi01", "birthYear": 1931, "birthMonth": 5, "birthDay": 6, "birthCountry": "USA",
"birthState": "AL", "birthCity": "Westfield", "deathYear": "", "deathCountry": "", "deathState": "",
"deathCity": "", "nameFirst": "Willie", "nameLast": "Mays", "nameGiven": "Willie Howard", "weight": 170,
"height": 70, "bats": "R", "throws": "R", "debut": "1951-05-25", "finalGame": "1973-09-09", "retroID":
"maysw101", "bbrefID": "mayswi01"}, "beforeData": null, "headers": {"operation": "INSERT", "changeSequence":
"20200713204606000000000000000297113", "timestamp": "2020-07-13T20:46:06.000",
"streamPosition": "mysql-bin.000004:422559929:1:422565460:17602420793:mysql-bin.000004:422551686",
"transactionId": "000000000000000000000004192FA039", "changeMask": "3FFFFF", "columnMask": "3FFFFF",
"transactionEventCounter": 62, "transactionLastEvent": false}}

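A consumer can rebuild the current state of the source table by applying these records in order. The sketch below is a minimal illustration, not Qlik's implementation: `apply_event` and `make_record` are hypothetical helpers, "playerID" is assumed to be the primary key, and the records are trimmed to two fields. It uses the envelope fields visible in the samples ("data", "beforeData", "headers.operation").

```python
import json

def apply_event(table, raw, key="playerID"):
    """Apply one JSON change record to `table`, a dict keyed by primary key."""
    record = json.loads(raw)
    operation = record["headers"]["operation"]
    row = record["data"]
    if operation in ("REFRESH", "INSERT", "UPDATE"):
        before = record.get("beforeData")
        # If an UPDATE changed the key itself, drop the row stored under the old key.
        if operation == "UPDATE" and before and before[key] != row[key]:
            table.pop(before[key], None)
        table[row[key]] = row
    elif operation == "DELETE":
        table.pop(row[key], None)
    else:
        raise ValueError(f"unexpected operation: {operation}")
    return table

def make_record(operation, data, before=None):
    """Build a trimmed record in the same envelope shape as the samples."""
    return json.dumps(
        {"data": data, "beforeData": before, "headers": {"operation": operation}}
    )

usa = {"playerID": "mayswi01", "birthCountry": "USA"}
changed = {"playerID": "mayswi01", "birthCountry": "NewCountry"}
events = [
    make_record("REFRESH", usa),
    make_record("UPDATE", changed, before=usa),
    make_record("DELETE", changed),
    make_record("INSERT", usa),
]

table = {}
for raw in events:
    apply_event(table, raw)
print(table)  # {'mayswi01': {'playerID': 'mayswi01', 'birthCountry': 'USA'}}
```

In practice the raw strings would come from a Kafka consumer; that part is omitted so the sketch runs without a broker.
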
23
Summarizing Key Points
• Physics applies to data.
• Qlik Replicate delivers data from databases to Kafka in real time.
• “Modern” architectures want data to be in motion.
• Kafka is a key component.
• Feedback loops can be a useful way to keep data moving.

https://www.qlik.com/products/data-integration-products
john.neal@qlik.com
