Data in Motion
Building Stream-Based
Architectures with Qlik
Replicate & Kafka
John Neal
Senior Solution Architect
Qlik Partner Engineering
2
Qlik Data
Integration
Quick Overview
3
Data Warehouse Automation
Streaming Data Pipeline Automation
Design, Manage & Monitor
Modernize and Automate Data Integration
CDC Streaming
Azure
SQL DW
Amazon
Redshift
Managed Data Lake Creation
Generate
Change Data
Streams
Deliver
To Clouds,
Lakes…
Refine &
Merge
For Analytics,
AI/ML, Data
Science…
AI/ML
Analytics
Data
Science
Model
Commit
Conform
Consume
Catalog
Shop, Prepare & Provision
Catalog
Shop, Prepare & Provision
RDBMS
Data Warehouse
Files
Mainframe
SAAS
APPS
SAP
Amazon RDS Azure SQL DB
Google Cloud SQL
4
Streaming Data Pipeline Automation
Design, Manage & Monitor
Our Focus for Today: Qlik Replicate & Kafka
Generate
Change Data
Streams
Deliver
To Clouds,
Lakes…
Refine &
Merge
For Analytics,
AI/ML, Data
Science…
RDBMS
Data Warehouse
Files
Mainframe
SAAS
APPS
SAP
5
TARGET SCHEMA
CREATION
SAP
RDBMS
EDW
FILE
MAINFRAME
HETEROGENEOUS
DATA TYPE MAPPING
BATCH TO CDC
TRANSITION
DDL CHANGE
PROPAGATION
FILTERING
TRANSFORMATIONS
RDBMS
EDW
FILES
STREAMING
DATA LAKE
Log Based
CDC
BATCH
IN-MEMORY
Replicate
Qlik Replicate
Automated Real-time Data Delivery
6
Physics 101
As It Applies to Data
7
An object will not change its
motion unless acted on by an
unbalanced force.
• If it is at rest, it will stay at rest
• If it is in motion, it will remain at the
same velocity
Corollary: Objects with greater mass
have more inertia. It therefore takes
more force to change their motion.
Newton’s First Law of
Motion
Inertia
8
Data in motion tends to stay in motion until it
comes to rest on disk.
Similarly, if data is at rest, it will remain at rest
until an external “force” puts it in motion
again.
— John Neal *
* With apologies to Sir Isaac Newton
9
Writing Data to a Database Introduces Friction
Data in Motion
Friction
How do we get the
data moving
again?
STOP
10
Get Landed Data Moving
Overcoming Storage “Friction”
File I/O (reads)
• Parsing challenges
• No deltas
Database Queries
• Not real-time
• Added database load
Database Triggers
• Added database load
• Doesn’t scale
ETL Tools
• Not real-time
• Added database load
• Getting deltas is hard
Qlik Replicate
• Real-time
• Reads the DB logs
• CDC provides delta processing
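The "getting deltas is hard" point can be made concrete. Without access to the database log, a tool usually has to compare two full snapshots to work out what changed. The Python sketch below is a hypothetical illustration only: the function name snapshot_diff and the sample rows are assumptions, and none of it is Qlik Replicate code. It needs both complete copies of the table, and any change that was made and then reverted between the two snapshots never shows up.

```python
def snapshot_diff(old: dict, new: dict) -> dict:
    """Compare two full table snapshots keyed by primary key."""
    inserted = {k: new[k] for k in new.keys() - old.keys()}
    deleted = {k: old[k] for k in old.keys() - new.keys()}
    updated = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return {"insert": inserted, "update": updated, "delete": deleted}


# Hypothetical snapshots returned by two consecutive polling queries.
first_poll = {
    "mayswi01": {"nameLast": "Mays", "birthCountry": "USA"},
    "aaronha01": {"nameLast": "Aaron", "birthCountry": "USA"},
}
second_poll = {
    "mayswi01": {"nameLast": "Mays", "birthCountry": "USA"},
    "ruthba01": {"nameLast": "Ruth", "birthCountry": "USA"},
}

delta = snapshot_diff(first_poll, second_poll)
print(delta)
```

A log-based CDC reader avoids the full comparison because each change arrives as its own event.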
11
Getting Data in
Motion Again
With Qlik Replicate & Kafka
12
“Modern” Applications Leverage Microservices
• Components are “decoupled” and have well-defined interfaces
- Changes are easier to make because they are localized and isolated
- Results in increased reliability
- Allows for a faster release schedule supporting agile approaches
- Increases opportunity to innovate
• Microservices can use “purpose built” storage rather than a central
repository
- Teams are free to choose the most appropriate repository for the problem
at hand … a relational database is not always the answer.
• Data flows between components
Microservices
13
Data Catalog
Microservice-Based Applications
A Bucket of Bricks
Data Warehouse
Automation
Media
Data Streaming
(CDC)
Analytics
Security
Kafka
Streaming
Services
Event Processing
RDBMS
Wide-Column
Store
Spark /
ML
Cloud DW
Hadoop
Key-Value
Store
Graph DB
(NoSQL)
File Storage
Document
Store
(NoSQL)
IoT
Qlik
14
Lambda-Style Architectures
Streaming and batch working together
NoSQL
IoT
Mobile
Apps
Web
Legacy
DB/DW
Incoming Data
Streaming (Speed) Layer
Serving Layer
Batch Layer
Stream Processing
(Spark Streaming,
Storm, Flink, …)
Incremental
Views
All Data Pre-Compute
Views
(Spark, M/R, HQL, …)
Real-time Views
Batch Views
Queries /
ML /
Analytics
Ingest & Store Prepare / Curate Publish Consume Data
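The way the three Lambda layers cooperate can be sketched in a few lines of Python. This is a toy model under stated assumptions: the events, the cutoff value and the function names are hypothetical, and in-memory counters stand in for the batch and stream processing engines named on the slide.

```python
from collections import Counter

# Hypothetical incoming events: (timestamp, source) pairs.
events = [
    (1, "web"), (2, "mobile"), (3, "web"),  # covered by the last batch run
    (4, "iot"), (5, "web"),                 # arrived after the batch cutoff
]
BATCH_CUTOFF = 3  # assumed: the batch layer has processed everything up to here


def batch_view(all_events, cutoff):
    """Batch layer: pre-compute counts from all data up to the cutoff."""
    return Counter(src for ts, src in all_events if ts <= cutoff)


def realtime_view(all_events, cutoff):
    """Speed layer: incremental counts for events newer than the cutoff."""
    return Counter(src for ts, src in all_events if ts > cutoff)


def query(source, batch, realtime):
    """Serving layer: answer a query by combining both views."""
    return batch[source] + realtime[source]


batch = batch_view(events, BATCH_CUTOFF)
realtime = realtime_view(events, BATCH_CUTOFF)
print(query("web", batch, realtime))
```

The serving layer only gives a complete answer when it reads both views, which is the operational cost of running streaming and batch side by side.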
15
Kappa-Style Architectures
Where everything is a stream
Streaming Data
Streaming Layer
Stream Processing
(Spark Streaming,
Storm, Flink, …)
Real-time Results
Serving Layer
Real-time View
Queries /
ML /
Analytics
Mirror events
to long term
storage
Storage Layer
Raw Data History
Re-compute
events from
storage if
needed
Historical View
Ingest & Store Prepare / Curate Publish Consume Data
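The "mirror events to long term storage" and "re-compute events from storage if needed" steps can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: a plain list stands in for the storage layer, the card events are invented, and the function names are hypothetical.

```python
# Hypothetical append-only event log (the "Raw Data History" in the diagram).
event_log = []
realtime_view = {}


def total_step(view, event):
    """Stream processing step: fold one event into a per-card total."""
    view[event["card"]] = view.get(event["card"], 0) + event["amount"]


def count_step(view, event):
    """Alternative logic introduced later: count events per card."""
    view[event["card"]] = view.get(event["card"], 0) + 1


def ingest(event):
    """Mirror the event to storage, then update the real-time view."""
    event_log.append(event)
    total_step(realtime_view, event)


def recompute(log, step):
    """Rebuild a view from scratch by replaying the stored events."""
    view = {}
    for event in log:
        step(view, event)
    return view


for e in [
    {"card": "A", "amount": 20},
    {"card": "B", "amount": 5},
    {"card": "A", "amount": 30},
]:
    ingest(e)

historical_view = recompute(event_log, total_step)
counts = recompute(event_log, count_step)
print(realtime_view, historical_view, counts)
```

Because the same step function serves both live processing and replay, there is only one code path to maintain, and new logic can be applied to old data by replaying the log.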
16
Making Rubber
Meet the Road
Innovate by Keeping Data in Motion
17
Source
Legacy
SAP
Kafka
Streaming Data with Qlik Replicate
Ingest & Store Prepare / Curate Publish Consume Data
And then what?
Qlik
Replicate
CDC
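One possible answer to "And then what?" is a downstream consumer that applies each change event to its own copy of the data. The Python sketch below assumes the envelope layout shown in the demo sample records (a "data" row image plus a "headers" object with an "operation" field) and uses playerID as the key. The helper names are hypothetical, and a plain list stands in for messages read from a Kafka topic.

```python
def apply_change(table: dict, event: dict, key: str = "playerID") -> None:
    """Apply one change event to an in-memory copy of the source table."""
    operation = event["headers"]["operation"]
    row = event["data"]
    if operation in ("REFRESH", "INSERT", "UPDATE"):
        table[row[key]] = row      # upsert the latest row image
    elif operation == "DELETE":
        table.pop(row[key], None)  # "data" holds the deleted row
    else:
        raise ValueError(f"unexpected operation: {operation!r}")


def make_event(operation: str, **columns) -> dict:
    """Build an abbreviated event (hypothetical helper for this example)."""
    return {"data": columns, "beforeData": None,
            "headers": {"operation": operation}}


# Same order of operations as the demo records, with most columns omitted.
stream = [
    make_event("REFRESH", playerID="mayswi01", birthCountry="USA"),
    make_event("UPDATE", playerID="mayswi01", birthCountry="NewCountry"),
    make_event("DELETE", playerID="mayswi01", birthCountry="NewCountry"),
    make_event("INSERT", playerID="mayswi01", birthCountry="USA"),
]

players = {}
row_counts = []
for change in stream:
    apply_change(players, change)
    row_counts.append(len(players))

print(row_counts, players)
```

The consumer never queries the source database; it stays current purely from the stream.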
18
Source
Data
Kafka
A Real-World Example
Credit Card Authorization
Ingest & Store Prepare / Curate Publish Consume Data
Qlik
Replicate
Spark
HBase
Hive
Machine
Learning
Decision
Service
Engine
Analytics
Application
Spark
Models
Data Lake
CDC
19
Demo
20
Sample Records – Willie Mays
Load / Reload, UPDATE
{"data": {"playerID": "mayswi01", "birthYear": 1931, "birthMonth": 5, "birthDay": 6, "birthCountry": "USA",
"birthState": "AL", "birthCity": "Westfield", "deathYear": "", "deathCountry": "", "deathState": "",
"deathCity": "", "nameFirst": "Willie", "nameLast": "Mays", "nameGiven": "Willie Howard", "weight": 170,
"height": 70, "bats": "R", "throws": "R", "debut": "1951-05-25", "finalGame": "1973-09-09", "retroID":
"maysw101", "bbrefID": "mayswi01"}, "beforeData": null, "headers": {"operation": "REFRESH",
"changeSequence": "", "timestamp": "", "streamPosition": "", "transactionId": "", "changeMask": null,
"columnMask": null, "transactionEventCounter": null, "transactionLastEvent": null}}
{"data": {"playerID": "mayswi01", "birthYear": 1931, "birthMonth": 5, "birthDay": 6, "birthCountry":
"NewCountry", "birthState": "AL", "birthCity": "Westfield", "deathYear": "", "deathCountry": "",
"deathState": "", "deathCity": "", "nameFirst": "Willie", "nameLast": "Mays", "nameGiven": "Willie Howard",
"weight": 170, "height": 70, "bats": "R", "throws": "R", "debut": "1951-05-25", "finalGame": "1973-09-09",
"retroID": "maysw101", "bbrefID": "mayswi01"}, "beforeData": {"playerID": "mayswi01", "birthYear": 1931,
"birthMonth": 5, "birthDay": 6, "birthCountry": "USA", "birthState": "AL", "birthCity": "Westfield",
"deathYear": "", "deathCountry": "", "deathState": "", "deathCity": "", "nameFirst": "Willie", "nameLast":
"Mays", "nameGiven": "Willie Howard", "weight": 170, "height": 70, "bats": "R", "throws": "R", "debut":
"1951-05-25", "finalGame": "1973-09-09", "retroID": "maysw101", "bbrefID": "mayswi01"}, "headers":
{"operation": "UPDATE", "changeSequence": "20200713204536000000000000000110813",
"timestamp": "2020-07-13T20:45:36.000",
"streamPosition": "mysql-bin.000004:415943395:20:415951456:17592712139:mysql-bin.000004:412843032",
"transactionId": "000000000000000000000004189B7BCB", "changeMask": "000010",
"columnMask": "3FFFFF", "transactionEventCounter": 10962, "transactionLastEvent": false}}
21
Sample Records – Willie Mays
DELETE, INSERT
{"data": {"playerID": "mayswi01", "birthYear": 1931, "birthMonth": 5, "birthDay": 6, "birthCountry":
"NewCountry", "birthState": "AL", "birthCity": "Westfield", "deathYear": "", "deathCountry": "",
"deathState": "", "deathCity": "", "nameFirst": "Willie", "nameLast": "Mays", "nameGiven": "Willie Howard",
"weight": 170, "height": 70, "bats": "R", "throws": "R", "debut": "1951-05-25", "finalGame": "1973-09-09",
"retroID": "maysw101", "bbrefID": "mayswi01"}, "beforeData": null, "headers": {"operation": "DELETE",
"changeSequence": "20200713204542000000000000000219813", "timestamp": "2020-07-13T20:45:42.000",
"streamPosition": "mysql-bin.000004:419832331:55:419840520:17598121412:mysql-bin.000004:418252305",
"transactionId": "00000000000000000000000418EE05C4", "changeMask": "000001", "columnMask": "3FFFFF",
"transactionEventCounter": 10962, "transactionLastEvent": false}}
{"data": {"playerID": "mayswi01", "birthYear": 1931, "birthMonth": 5, "birthDay": 6, "birthCountry": "USA",
"birthState": "AL", "birthCity": "Westfield", "deathYear": "", "deathCountry": "", "deathState": "",
"deathCity": "", "nameFirst": "Willie", "nameLast": "Mays", "nameGiven": "Willie Howard", "weight": 170,
"height": 70, "bats": "R", "throws": "R", "debut": "1951-05-25", "finalGame": "1973-09-09", "retroID":
"maysw101", "bbrefID": "mayswi01"}, "beforeData": null, "headers": {"operation": "INSERT", "changeSequence":
"20200713204606000000000000000297113", "timestamp": "2020-07-13T20:46:06.000",
"streamPosition": "mysql-bin.000004:422559929:1:422565460:17602420793:mysql-bin.000004:422551686",
"transactionId": "000000000000000000000004192FA039", "changeMask": "3FFFFF", "columnMask": "3FFFFF",
"transactionEventCounter": 62, "transactionLastEvent": false}}
22
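The UPDATE sample record above carries both the new row image ("data") and the prior image ("beforeData"), so a consumer can compare the two to see which columns changed. The following is a minimal Python sketch assuming that envelope layout. The function name changed_columns is hypothetical, the record is abbreviated from the sample, and the sketch makes no attempt to decode the changeMask field.

```python
import json


def changed_columns(event: dict) -> dict:
    """Return {column: (old, new)} for columns whose values differ."""
    before = event.get("beforeData") or {}  # null for REFRESH/INSERT/DELETE samples
    after = event.get("data") or {}
    return {
        col: (before[col], new)
        for col, new in after.items()
        if col in before and before[col] != new
    }


# Abbreviated UPDATE record modeled on the sample (most columns omitted).
raw = """
{"data": {"playerID": "mayswi01", "birthCountry": "NewCountry", "birthState": "AL"},
 "beforeData": {"playerID": "mayswi01", "birthCountry": "USA", "birthState": "AL"},
 "headers": {"operation": "UPDATE", "timestamp": "2020-07-13T20:45:36.000"}}
"""

event = json.loads(raw)
diff = changed_columns(event)
print(event["headers"]["operation"], diff)
```

For records where "beforeData" is null, as in the REFRESH, DELETE and INSERT samples, the function returns an empty dictionary.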
Wrapping Up
23
Summarizing Key Points
Physics applies to data
Qlik Replicate delivers
data from databases to
Kafka in real-time.
“Modern” architectures
want data to be in
motion.
Kafka is a key
component.
Feedback loops can be
a useful way to keep
data moving
https://www.qlik.com/products/data-integration-products
john.neal@qlik.com

Data in Motion: Building Stream-Based Architectures with Qlik Replicate & Kafka (John Neal, Qlik) Kafka Summit 2020

  • 1. Data in Motion Building Stream-Based Architectures with Qlik Replicate & Kafka John Neal Senior Solution Architect Qlik Partner Engineering
  • 3. 3 Data Warehouse Automation Streaming Data Pipeline Automation Design, Manage & Monitor Modernize and Automate Data Integration CDC Streaming Azure SQL DW Amazon Redshift Managed Data Lake Creation Generate Change Data Streams Deliver To Clouds, Lakes… Refine & Merge For Analytics, AI/ML, Data Science… AI/ML Analytics Data Science Model Commit Conform Consume Catalog Shop, Prepare & Provision Catalog Shop, Prepare & Provision RDBMS Data Warehouse Files Mainframe SAAS APPS SAP Amazon RDS Azure SQL DB Google Cloud SQL
  • 4. 4 Streaming Data Pipeline Automation Design, Manage & Monitor Our Focus for Today: Qlik Replicate & Kafka Generate Change Data Streams Deliver To Clouds, Lakes… Refine & Merge For Analytics, AI/ML, Data Science… RDBMS Data Warehouse Files Mainframe SAAS APPS SAP
  • 5. 5 TARGET SCHEMA CREATION SAP RDBMS EDW FILE MAINFRAME HETEROGENEOUS DATA TYPE MAPPING BATCH TO CDC TRANSITION DDL CHANGE PROPAGATION FILTERING TRANSFORMATIONS RDBMS EDW FILES STREAMING DATA LAKE Log Based CDC BATCH IN-MEMORY Replicate Qlik Replicate Automated Real-time Data Delivery
  • 6. 6 Physics 101 As It Applies to Data
  • 7. 7 An object will not change its motion unless acted on by an unbalanced force. • If it is at rest, it will stay at rest • If it is in motion, it will remain at the same velocity Corollary: Objects with greater mass have more inertia. It therefore takes more force to change their motion. Newton’s First Law of Motion Inertia
  • 8. 8 Data in motion tends to stay in motion until it comes rest on disk. Similarly, if data is at rest, it will remain at rest until an external “force” puts it in motion again. — John Neal * * With apologies to Sir Isaac Newton
  • 9. 9 Writing Data to a Database Introduces Friction Data in Motion Friction How do we get the data moving again? STOP
  • 10. 10 Get Landed Data Moving Overcoming Storage “Friction” File I/O (reads) • Parsing challenges • No deltas Database Queries • Not real-time • Added database load Database Triggers • Added database load • Doesn’t scale ETL Tools • Not real-time • Added database load • Getting deltas is hard Qlik Replicate • Real-time • Reads the DB logs • CDC provides delta processing
  • 11. 11 Getting Data in Motion Again With Qlik Replicate & Kafka
  • 12. 12 “Modern” Applications Leverage Microservices • Components are “decoupled” and have well-defined interfaces - Changes are easier to make because they are localized and isolated - Results in increased reliability - Allows for a faster release schedule supporting agile approaches - Increases opportunity to innovate • Microservices can use “purpose built” storage rather than a central repository - Teams are free to choose the most appropriate repository for the problem at hand … a relational database is not always the answer. • Data flows between components Microservices
  • 13. 13 Data Catalog Microservice-Based Applications A Bucket of Bricks Data Warehouse Automation Media Data Streaming (CDC) Analytics Security Kafka Streaming Services Event Processing RDBMS Wide-Column Store Spark / ML Cloud DW Hadoop Key-Value Store Graph DB (NoSQL) File Storage Document Store (NoSQL) IoT Qlik
  • 14. 14 Lambda-Style Architectures Streaming and batch working together NoSQL IoT Mobile Apps Web Legacy DB/DW Incoming Data Streaming (Speed) Layer Serving Layer Batch Layer Stream Processing (Spark Streaming, Storm, Flink, …) Incremental Views All Data Pre-Compute Views (Spark, M/R, HQL, …) Real-time Views Batch Views Queries / ML / Analytics Ingest & Store Prepare / Curate Publish Consume Data
  • 15. 15 Kappa-Style Architectures Where everything is a stream Streaming Data Streaming Layer Stream Processing (Spark Streaming, Storm, Flink, …) Real-time Results Serving Layer Real-time View Queries / ML / Analytics Mirror events to long term storage Storage Layer Raw Data History Re-compute events from storage if needed Historical View Ingest & Store Prepare / Curate Publish Consume Data
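The Kappa-style idea of mirroring events to storage and re-computing a view from that history can be sketched in a few lines. This is a minimal illustration, not a Kafka or Qlik API. It assumes change events are dicts with a `headers.operation` field and a `data` row (the shape of the sample records in this deck), stands in a plain Python list for long-term storage, and uses hypothetical helper names (`apply_event`, `rebuild_view`, `make_event`).

```python
from typing import Dict, Iterable, List

Event = dict
View = Dict[str, dict]


def apply_event(view: View, event: Event, key: str = "playerID") -> None:
    """Fold one change event into a keyed view (last write wins)."""
    op = event["headers"]["operation"]
    row = event["data"]
    if op in ("REFRESH", "INSERT", "UPDATE"):
        view[row[key]] = row
    elif op == "DELETE":
        view.pop(row[key], None)
    else:
        raise ValueError(f"unknown operation: {op}")


def rebuild_view(history: Iterable[Event]) -> View:
    """Re-compute a view from scratch by replaying stored events in order."""
    view: View = {}
    for event in history:
        apply_event(view, event)
    return view


def make_event(op: str, player_id: str, country: str) -> Event:
    """Build a heavily abbreviated event (two columns only) for the demo."""
    return {
        "headers": {"operation": op},
        "data": {"playerID": player_id, "birthCountry": country},
    }


history: List[Event] = []   # stand-in for the long-term storage layer
realtime_view: View = {}    # stand-in for the real-time view

incoming = [
    make_event("REFRESH", "mayswi01", "USA"),
    make_event("UPDATE", "mayswi01", "NewCountry"),
    make_event("DELETE", "mayswi01", "NewCountry"),
    make_event("INSERT", "mayswi01", "USA"),
]
for event in incoming:
    history.append(event)               # mirror the event to storage
    apply_event(realtime_view, event)   # update the real-time view

# Replaying the stored history reproduces the real-time view.
historical_view = rebuild_view(history)
print(historical_view == realtime_view)
```

Because the view is a pure function of the ordered event history, changing the processing logic only requires replaying the stored events through the new code.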
  • 16. 16 Making Rubber Meet the Road Innovate by Keeping Data in Motion
  • 17. 17 Source Legacy SAP Kafka Streaming Data with Qlik Replicate Ingest & Store Prepare / Curate Publish Consume Data And then what? Qlik Replicate CDC
  • 18. 18 Source Data Kafka A Real-World Example Credit Card Authorization Ingest & Store Prepare / Curate Publish Consume Data Qlik Replicate Spark HBase Hive Machine Learning Decision Service Engine Analytics Application Spark Models Data Lake CDC
  • 20. 20 Sample Records – Willie Mays Load / Reload, UPDATE {"data": {"playerID": "mayswi01", "birthYear": 1931, "birthMonth": 5, "birthDay": 6, "birthCountry": "USA", "birthState": "AL", "birthCity": "Westfield", "deathYear": "", "deathCountry": "", "deathState": "", "deathCity": "", "nameFirst": "Willie", "nameLast": "Mays", "nameGiven": "Willie Howard", "weight": 170, "height": 70, "bats": "R", "throws": "R", "debut": "1951-05-25", "finalGame": "1973-09-09", "retroID": "maysw101", "bbrefID": "mayswi01"}, "beforeData": null, "headers": {"operation": "REFRESH", "changeSequence": "", "timestamp": "", "streamPosition": "", "transactionId": "", "changeMask": null, "columnMask": null, "transactionEventCounter": null, "transactionLastEvent": null}} {"data": {"playerID": "mayswi01", "birthYear": 1931, "birthMonth": 5, "birthDay": 6, "birthCountry": "NewCountry", "birthState": "AL", "birthCity": "Westfield", "deathYear": "", "deathCountry": "", "deathState": "", "deathCity": "", "nameFirst": "Willie", "nameLast": "Mays", "nameGiven": "Willie Howard", "weight": 170, "height": 70, "bats": "R", "throws": "R", "debut": "1951-05-25", "finalGame": "1973-09-09", "retroID": "maysw101", "bbrefID": "mayswi01"}, "beforeData": {"playerID": "mayswi01", "birthYear": 1931, "birthMonth": 5, "birthDay": 6, "birthCountry": "USA", "birthState": "AL", "birthCity": "Westfield", "deathYear": "", "deathCountry": "", "deathState": "", "deathCity": "", "nameFirst": "Willie", "nameLast": "Mays", "nameGiven": "Willie Howard", "weight": 170, "height": 70, "bats": "R", "throws": "R", "debut": "1951-05-25", "finalGame": "1973-09-09", "retroID": "maysw101", "bbrefID": "mayswi01"}, "headers": {"operation": "UPDATE", "changeSequence": "20200713204536000000000000000110813", "timestamp": "2020-07-13T20:45:36.000", "streamPosition": "mysql-bin.000004:415943395:20:415951456:17592712139:mysql-bin.000004:412843032", "transactionId": "000000000000000000000004189B7BCB", "changeMask": "000010", "columnMask": "3FFFFF", "transactionEventCounter": 10962, "transactionLastEvent": false}}
  • 21. 21 Sample Records – Willie Mays DELETE, INSERT {"data": {"playerID": "mayswi01", "birthYear": 1931, "birthMonth": 5, "birthDay": 6, "birthCountry": "NewCountry", "birthState": "AL", "birthCity": "Westfield", "deathYear": "", "deathCountry": "", "deathState": "", "deathCity": "", "nameFirst": "Willie", "nameLast": "Mays", "nameGiven": "Willie Howard", "weight": 170, "height": 70, "bats": "R", "throws": "R", "debut": "1951-05-25", "finalGame": "1973-09-09", "retroID": "maysw101", "bbrefID": "mayswi01"}, "beforeData": null, "headers": {"operation": "DELETE", "changeSequence": "20200713204542000000000000000219813", "timestamp": "2020-07-13T20:45:42.000", "streamPosition": "mysql-bin.000004:419832331:55:419840520:17598121412:mysql-bin.000004:418252305", "transactionId": "00000000000000000000000418EE05C4", "changeMask": "000001", "columnMask": "3FFFFF", "transactionEventCounter": 10962, "transactionLastEvent": false}} {"data": {"playerID": "mayswi01", "birthYear": 1931, "birthMonth": 5, "birthDay": 6, "birthCountry": "USA", "birthState": "AL", "birthCity": "Westfield", "deathYear": "", "deathCountry": "", "deathState": "", "deathCity": "", "nameFirst": "Willie", "nameLast": "Mays", "nameGiven": "Willie Howard", "weight": 170, "height": 70, "bats": "R", "throws": "R", "debut": "1951-05-25", "finalGame": "1973-09-09", "retroID": "maysw101", "bbrefID": "mayswi01"}, "beforeData": null, "headers": {"operation": "INSERT", "changeSequence": "20200713204606000000000000000297113", "timestamp": "2020-07-13T20:46:06.000", "streamPosition": "mysql-bin.000004:422559929:1:422565460:17602420793:mysql-bin.000004:422551686", "transactionId": "000000000000000000000004192FA039", "changeMask": "3FFFFF", "columnMask": "3FFFFF", "transactionEventCounter": 62, "transactionLastEvent": false}}
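A consumer of these records usually wants to know which columns changed. The sketch below shows two ways to find out, using an abbreviated copy of the UPDATE record (five of the 22 columns). The first compares `data` with `beforeData`. The second decodes `changeMask`, under the assumption that the mask reads as a plain hexadecimal integer whose bit 0 corresponds to the first column. That reading is consistent with the samples (`000010` marks the fifth column, `birthCountry`; `3FFFFF` covers all 22 columns) but should be checked against the Qlik Replicate documentation. The helper names `changed_fields` and `columns_from_mask` are hypothetical.

```python
import json

# Column order as it appears in the sample records.
COLUMNS = [
    "playerID", "birthYear", "birthMonth", "birthDay", "birthCountry",
    "birthState", "birthCity", "deathYear", "deathCountry", "deathState",
    "deathCity", "nameFirst", "nameLast", "nameGiven", "weight", "height",
    "bats", "throws", "debut", "finalGame", "retroID", "bbrefID",
]


def changed_fields(record: dict) -> dict:
    """Return {column: (old, new)} for an UPDATE record, else an empty dict."""
    before = record.get("beforeData")
    after = record.get("data") or {}
    if record["headers"]["operation"] != "UPDATE" or before is None:
        return {}
    return {
        name: (before.get(name), value)
        for name, value in after.items()
        if before.get(name) != value
    }


def columns_from_mask(mask_hex: str, column_names: list) -> list:
    """Decode a hex bitmask into column names.

    Assumption: the string is a plain hex integer and bit 0 is the first
    column; this matches the sample records but is not confirmed here.
    """
    bits = int(mask_hex, 16)
    return [name for i, name in enumerate(column_names) if (bits >> i) & 1]


# Abbreviated copy of the UPDATE record from the slide.
update = json.loads("""
{"data": {"playerID": "mayswi01", "birthYear": 1931, "birthMonth": 5,
          "birthDay": 6, "birthCountry": "NewCountry"},
 "beforeData": {"playerID": "mayswi01", "birthYear": 1931, "birthMonth": 5,
                "birthDay": 6, "birthCountry": "USA"},
 "headers": {"operation": "UPDATE", "changeMask": "000010",
             "columnMask": "3FFFFF"}}
""")

by_diff = changed_fields(update)
by_mask = columns_from_mask(update["headers"]["changeMask"], COLUMNS)
print(by_diff, by_mask)
```

Under the same assumption, the DELETE record's mask `000001` selects only `playerID`, and the INSERT record's mask `3FFFFF` selects every column.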
  • 23. 23 Summarizing Key Points: Physics applies to data. Qlik Replicate delivers data from databases to Kafka in real time. “Modern” architectures want data to be in motion. Kafka is a key component. Feedback loops can be a useful way to keep data moving.