©2018 Impetus Technologies, Inc. All rights reserved.
You are prohibited from making a copy or modification of, or from redistributing,
rebroadcasting, or re-encoding of this content without the prior written consent of
Impetus Technologies.
This presentation may include images from other products and services. These
images are used for illustrative purposes only. Unless explicitly stated there is no
implied endorsement or sponsorship of these products by Impetus Technologies. All
copyrights and trademarks are property of their respective owners.
Planning your Next-Gen Change Data Capture (CDC) Architecture
December 19, 2018
Agenda
What is CDC?
Various methods for CDC in the enterprise data warehouse
Key considerations for implementing a next-gen CDC architecture
Demo
Q&A
About Impetus
We exist to create powerful and intelligent enterprises through deep
data awareness, data integration, and data analytics.
Many of North America’s most respected and well-known brands trust
us as their strategic big data and analytics partner.
End-to-End Big Data Solutions
Transformation: Legacy EDW to big data/cloud
Unification: Data processing, preparation, and access
Analytics: Real-time, machine learning, and AI
Self-service: BI on big data/cloud
What are the different change data capture use cases currently deployed
in your organization (choose all that apply)?
Continuous Ingestion in the Data Lake
Capturing streaming data changes
Database migration to cloud
Data preparation for analytics and ML jobs
We still have a legacy system
Our Speakers Today
SAURABH DUTTA
Technical Product Manager
SAMEER BHIDE
Senior Solutions Architect
What is Change Data Capture (CDC)?
CDC is the process of capturing changes made at the data source and
applying them throughout the enterprise.
Let’s Take a Closer Look
A WebApp writes to a Source Database; CDC captures each change and applies it to a Target Database.

Create
WebApp → Source Database: Customer { Telephone: "111" }
Change Data Capture Event → Target Database: Customer { Telephone: "111" }

Update
WebApp → Source Database: Customer { Telephone: "222" }
Change Data Capture Event → Target Database: Customer { Telephone: "222" }
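The create/update flow above can be sketched as a minimal CDC event applier. This is an illustrative Python sketch; the event shape (op, key, data) is an assumption for the example, not any vendor's wire format.

```python
# Minimal sketch of applying CDC events (create/update/delete) to a target
# store, modelled here as an in-memory dict. Event shape is illustrative.

def apply_cdc_event(target: dict, event: dict) -> None:
    """Apply one change event to the 'target database'."""
    op, key = event["op"], event["key"]
    if op in ("create", "update"):
        target[key] = event["data"]   # upsert the new row image
    elif op == "delete":
        target.pop(key, None)         # remove the row if present
    else:
        raise ValueError(f"unknown op: {op}")

target_db = {}
events = [
    {"op": "create", "key": "cust-1", "data": {"Telephone": "111"}},
    {"op": "update", "key": "cust-1", "data": {"Telephone": "222"}},
]
for e in events:
    apply_cdc_event(target_db, e)

print(target_db)  # {'cust-1': {'Telephone': '222'}}
```

After both events are applied, the target holds the final telephone number, mirroring the Update slide.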
What Does CDC Mean for the Enterprise?
Sources (RDBMS, data warehouse, files, legacy systems) are replicated through in-memory filter and transform steps, either in batch or as real-time incremental streams, into targets (RDBMS, data warehouse, Hadoop, streaming).
Modern CDC Applications
Data lake: Continuous ingestion and pipeline automation
Streaming: Data changes to Kafka, Kinesis, or other queues
Cloud: Data workload migration
Business applications: Data preparation for analytics and ML jobs
Legacy systems: Data delivery and query offload
Methods of Change Data Capture
• Database triggers
• Date modification stamping
• Log-based CDC
Database Triggers
Triggers write each change to shadow tables
Challenges
• Adds overhead to every transaction on the source
• Retrieving changes from shadow tables increases database load
• Can lose intermediate changes
Date Modification Stamping
Transactional applications keep track of metadata in every row:
• When the row was created and last modified
• Changes are extracted by filtering on the DATE_MODIFIED column
Challenges
• A deleted row has no DATE_MODIFIED, so deletes are missed
• Keeping DATE_MODIFIED accurate often requires triggers
• Extracting changes by scanning tables is resource intensive
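The DATE_MODIFIED approach above amounts to polling with a checkpoint. A minimal sketch using SQLite (table and column names are illustrative): fetch every row modified after the last checkpoint. Note that deletes never show up in the result, matching the first challenge listed.

```python
# Sketch of DATE_MODIFIED-based change extraction: poll for rows modified
# since the last checkpoint. Table and column names are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, phone TEXT, date_modified TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [(1, "111", "2018-12-01T10:00:00"),
     (2, "222", "2018-12-19T09:30:00")],
)

def extract_changes(conn, last_checkpoint: str):
    """Return rows modified after the checkpoint (deletes are invisible here)."""
    cur = conn.execute(
        "SELECT id, phone, date_modified FROM customer "
        "WHERE date_modified > ? ORDER BY date_modified",
        (last_checkpoint,),
    )
    return cur.fetchall()

changes = extract_changes(conn, "2018-12-10T00:00:00")
print(changes)  # [(2, '222', '2018-12-19T09:30:00')]
```

ISO-8601 timestamps compare correctly as text, which keeps the checkpoint filter a simple string comparison.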
Log-Based CDC
Reads changes from the database transaction logs
Challenges
• Interpreting proprietary transaction log formats
• Vendors provide no direct interface to the transaction log
• Agents and interfaces change with new database versions
• Supplemental logging increases the volume of logged data
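Log-based CDC tools typically hide the log format and emit structured change events with before/after row images and an operation code. A sketch of decoding such an event, assuming a Debezium-style JSON envelope (op codes c/u/d); the exact payload shape varies by tool.

```python
# Decode a Debezium-style change event envelope. The envelope layout
# (op, before, after, ts_ms) is an assumption modelled on a common
# open-source CDC format; real payloads vary by tool and connector.
import json

raw = json.dumps({
    "op": "u",
    "before": {"id": 1, "telephone": "111"},
    "after": {"id": 1, "telephone": "222"},
    "ts_ms": 1545216000000,
})

def decode_change(raw_event: str) -> tuple:
    evt = json.loads(raw_event)
    op = {"c": "insert", "u": "update", "d": "delete"}[evt["op"]]
    # For deletes only the "before" image exists; otherwise use "after".
    row = evt["before"] if op == "delete" else evt["after"]
    return op, row

op, row = decode_change(raw)
print(op, row)  # update {'id': 1, 'telephone': '222'}
```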
Change Data Capture Implementation Steps
1. Enable CDC for the database
2. Prepare tables for CDC
3. Define the target
4. Create a table to handle CDC states
5. Run the initial load
6. Apply incremental updates
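The "table to handle CDC states" step above is essentially a checkpoint store: persist the last-processed position per source table so each incremental run resumes where the previous one stopped. A minimal sketch (table, column, and position names are illustrative):

```python
# Sketch of a CDC state table: one row per source table recording the
# last-processed position (an LSN, SCN, or timestamp, depending on the
# source database). Names here are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cdc_state (table_name TEXT PRIMARY KEY, last_position TEXT)"
)

def save_position(conn, table: str, position: str) -> None:
    # Upsert so repeated runs advance the checkpoint in place.
    conn.execute(
        "INSERT INTO cdc_state VALUES (?, ?) "
        "ON CONFLICT(table_name) DO UPDATE SET last_position = excluded.last_position",
        (table, position),
    )

def load_position(conn, table: str, default: str = "0") -> str:
    row = conn.execute(
        "SELECT last_position FROM cdc_state WHERE table_name = ?", (table,)
    ).fetchone()
    return row[0] if row else default

save_position(conn, "customer", "lsn-001")
save_position(conn, "customer", "lsn-002")   # a later run advances the position
print(load_position(conn, "customer"))       # lsn-002
```

An unknown table falls back to the default position, which triggers the initial load path.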
Next-Gen Architecture Considerations
Ease of use: Pre-packaged operators, extensibility, and a modern user experience
Real-time: Change data capture that streams live updates, optimized for high performance
Hybrid: Multiple vendors; on-premises and cloud; databases, data warehouses, and data lakes
Value Proposition of CDC
Incremental update efficiency
Source/production impact
Time to value
TCO
Scale and flexibility
What are the different change data capture use cases currently deployed
in your organization (choose all that apply)?
Continuous ingestion in the data lake: 46%
Capturing streaming data changes: 58%
Database migration to cloud: 38%
Data preparation for analytics and ML jobs: 35%
We still have a legacy system: 46%
StreamAnalytix: An ETL, real-time stream processing, and machine learning platform, plus a visual IDE for Apache Spark
CDC with StreamAnalytix
Turnkey adapters for CDC vendors, visual ETL and data wrangling operators, and elastic compute.
Sources: CDC streams, structured data stores, unstructured data streams, and file stores
Processing: Transform, enrich, and reconcile
Targets: Structured data stores, message queues, Hadoop/Hive, and cloud storage and DW
CDC Capabilities in StreamAnalytix
• Integration with CDC providers
• LogMiner integration
• Turnkey reconciliation feature for Hadoop offload
• Large set of visual operators for ETL, analytics, and stream processing
• Zero-code approach to ETL design
• Built-in NFR support
StreamAnalytix CDC Solution Design
A complete CDC solution has three parts, each modelled as a StreamAnalytix pipeline:
Data ingestion and staging
Stream data from Attunity Replicate via Kafka, or from LogMiner, for multiple tables, and store the raw data in HDFS
Data de-normalization
Join transactional data with data at rest, and store the de-normalized data in HDFS
Incremental updates in Hive
Merge previously processed transactional data with new incremental updates
Pipeline #1: Data Ingestion and Staging (Streaming)
Data ingestion via Attunity ‘Channel’
Reads the data that Attunity Replicate delivers to Kafka; configured to read data feeds and their metadata from separate topics
Data enrichment
Enriches incoming data with metadata and an event timestamp
HDFS
Stores CDC data in an HDFS landing area using the out-of-the-box (OOB) HDFS emitter; files are rotated based on time and size
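The enrichment step in Pipeline #1 can be sketched as stamping each incoming CDC record with source metadata and an event timestamp before staging. The field names below are illustrative, not the StreamAnalytix schema.

```python
# Sketch of Pipeline #1's enrichment step: add source metadata and a
# millisecond event timestamp to each record. Field names are illustrative.
import time

def enrich(record: dict, source_table: str, now: float = None) -> dict:
    enriched = dict(record)                       # don't mutate the input
    enriched["_source_table"] = source_table
    ts = now if now is not None else time.time()
    enriched["_event_ts"] = int(ts * 1000)        # epoch milliseconds
    return enriched

out = enrich({"id": 1, "telephone": "222"}, "customer", now=1545216000.0)
print(out)
```

Passing `now` explicitly keeps the function deterministic for testing; in the streaming pipeline the wall clock would be used.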
Pipeline #2: Data De-normalization (Batch)
HDFS data channel
Ingests incremental data from previous runs of the staging location, and reads reference data (data at rest) from a fixed HDFS location
Join
Performs an outer join to merge incremental and static data
Store
Writes the de-normalized data to an HDFS directory
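Pipeline #2's outer join can be sketched as a full outer join by key between incremental records and static reference data, with incremental values winning on conflict. Keys and field names are illustrative.

```python
# Sketch of Pipeline #2's outer join: merge incremental records with static
# reference data by key; incremental rows override static values.
def denormalize(incremental: dict, reference: dict) -> dict:
    """Full outer join by key, keyed dicts of row dicts."""
    merged = {}
    for key in set(incremental) | set(reference):
        row = dict(reference.get(key, {}))     # start from data at rest
        row.update(incremental.get(key, {}))   # incremental overrides static
        merged[key] = row
    return merged

reference = {"cust-1": {"name": "Ada", "telephone": "111"}}
incremental = {"cust-1": {"telephone": "222"}, "cust-2": {"name": "Grace"}}
print(denormalize(incremental, reference))
```

Because the join is outer, keys present only in the reference data or only in the incremental batch both survive into the merged output.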
Pipeline #3: Incremental Updates in Hive (Batch)
Load step
Runs a Hive SQL query to load a managed table from the HDFS incremental data generated by Pipeline #2
Reconciliation step
Hive “merge into” SQL performs insert, update, and delete operations based on the operation flag carried in the incremental data
Clean-up step
Drops the managed table to clean up processed data and avoid repeated processing
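The reconciliation step uses Hive's MERGE INTO. A sketch that generates such a statement for a simple table: the table names, the `op` flag column, and the delete-marker value `'D'` are illustrative assumptions, and the generated SQL is built as a string here, not executed against Hive.

```python
# Build a Hive-style MERGE INTO statement that deletes, updates, or inserts
# based on an operation flag in the incremental (staging) data. Names and
# the 'D' delete marker are illustrative; simplification: a not-matched
# delete record would still be inserted.
def build_merge_sql(target: str, staging: str, key: str, cols: list) -> str:
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in cols)
    insert_vals = ", ".join(f"s.{c}" for c in [key] + cols)
    return (
        f"MERGE INTO {target} t USING {staging} s ON t.{key} = s.{key} "
        f"WHEN MATCHED AND s.op = 'D' THEN DELETE "
        f"WHEN MATCHED THEN UPDATE SET {set_clause} "
        f"WHEN NOT MATCHED THEN INSERT VALUES ({insert_vals})"
    )

sql = build_merge_sql("customer", "customer_incremental", "id", ["telephone"])
print(sql)
```

With two WHEN MATCHED clauses, Hive requires the first to carry an extra condition, which is why the delete branch tests the `op` flag before the unconditional update branch.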
Workflow: Oozie Coordinator Job
An Oozie orchestration flow, created using the StreamAnalytix web studio, orchestrates Pipeline #2 and Pipeline #3 into a single Oozie flow that can be scheduled.
Demo
Data channels
• Attunity Replicate and LogMiner
Data processing pipeline walkthrough
• Data filters and enrichment
• Analytics and data processing operators
• Data stores
Summary
Do more with your data acquisition flows
• Acquire and process data in real-time
• Enrich data from data marts
• Publish processed data as it arrives
• Multiple parallel processing paths (read once, process multiple times)
Move away from fragmented processes
• Unify data analytics and data processing/ETL flows
Conclusion
The right next-gen CDC solution can make data ready for analytics as it arrives, in near real-time
CDC-based data integration is far more complex than a full export and import of your database
A unified platform simplifies and reduces the complexity of operationalizing CDC flows
LIVE Q&A
For a free trial download or cloud access visit www.StreamAnalytix.com
For any questions, contact us at inquiry@streamanalytix.com
 
Solving the EDW transformation conundrum - Impetus webinar
Solving the EDW transformation conundrum - Impetus webinarSolving the EDW transformation conundrum - Impetus webinar
Solving the EDW transformation conundrum - Impetus webinar
 
Anomaly detection with machine learning at scale
Anomaly detection with machine learning at scaleAnomaly detection with machine learning at scale
Anomaly detection with machine learning at scale
 
Keys to Formulating an Effective Data Management Strategy in the Age of Data
Keys to Formulating an Effective Data Management Strategy in the Age of DataKeys to Formulating an Effective Data Management Strategy in the Age of Data
Keys to Formulating an Effective Data Management Strategy in the Age of Data
 
Build Spark-based ETL Workflows on Cloud in Minutes
Build Spark-based ETL Workflows on Cloud in MinutesBuild Spark-based ETL Workflows on Cloud in Minutes
Build Spark-based ETL Workflows on Cloud in Minutes
 
Apache Spark – The New Enterprise Backbone for ETL, Batch Processing and Real...
Apache Spark – The New Enterprise Backbone for ETL, Batch Processing and Real...Apache Spark – The New Enterprise Backbone for ETL, Batch Processing and Real...
Apache Spark – The New Enterprise Backbone for ETL, Batch Processing and Real...
 
Streaming Analytics for IoT with Apache Spark
Streaming Analytics for IoT with Apache SparkStreaming Analytics for IoT with Apache Spark
Streaming Analytics for IoT with Apache Spark
 
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live ImplementationAnomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation
 
The structured streaming upgrade to Apache Spark and how enterprises can bene...
The structured streaming upgrade to Apache Spark and how enterprises can bene...The structured streaming upgrade to Apache Spark and how enterprises can bene...
The structured streaming upgrade to Apache Spark and how enterprises can bene...
 
Apache spark empowering the real time data driven enterprise - StreamAnalytix...
Apache spark empowering the real time data driven enterprise - StreamAnalytix...Apache spark empowering the real time data driven enterprise - StreamAnalytix...
Apache spark empowering the real time data driven enterprise - StreamAnalytix...
 
Anomaly Detection and Spark Implementation - Meetup Presentation.pptx
Anomaly Detection and Spark Implementation - Meetup Presentation.pptxAnomaly Detection and Spark Implementation - Meetup Presentation.pptx
Anomaly Detection and Spark Implementation - Meetup Presentation.pptx
 
Importance of Big Data Analytics
Importance of Big Data AnalyticsImportance of Big Data Analytics
Importance of Big Data Analytics
 

Último

VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 

Último (20)

VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 

Planning your Next-Gen Change Data Capture (CDC) Architecture in 2019 - StreamAnalytix Webinar

  • 1. ©2018 Impetus Technologies, Inc. All rights reserved. You are prohibited from making a copy or modification of, or from redistributing, rebroadcasting, or re-encoding of this content without the prior written consent of Impetus Technologies. This presentation may include images from other products and services. These images are used for illustrative purposes only. Unless explicitly stated there is no implied endorsement or sponsorship of these products by Impetus Technologies. All copyrights and trademarks are property of their respective owners.
  • 2. Planning your Next-Gen Change Data Capture (CDC) Architecture December 19, 2018
  • 3. Agenda What is CDC? Various methods for CDC in the enterprise data warehouse Key considerations for implementing a next-gen CDC architecture Demo Q&A
  • 4. About Impetus We exist to create powerful and intelligent enterprises through deep data awareness, data integration, and data analytics.
  • 5. About Impetus Many of North America’s most respected and well-known brands trust us as their strategic big data and analytics partner.
  • 6. Transformation Legacy EDW to big data/cloud Unification Data processing, preparation, and access Analytics Real-time, machine learning, and AI Self-service BI on big data/ cloud End-to-End Big Data Solutions
  • 7. What are the different change data capture use cases currently deployed in your organization (choose all that apply)? Continuous Ingestion in the Data Lake Capturing streaming data changes Database migration to cloud Data preparation for analytics and ML jobs We still have a legacy system
  • 8. Our Speakers Today SAURABH DUTTA Technical Product Manager SAMEER BHIDE Senior Solutions Architect
  • 9. What is Change Data Capture (CDC)? CDC is the process of capturing changes made at the data source and applying them throughout the enterprise.
  • 10. Let’s Take a Closer Look Source Database Target Database
  • 11. Let’s Take a Closer Look Source Database Target Database WebApp
  • 12. Let’s Take a Closer Look Source Database Target Database Customer { Telephone: “111” } Customer { Telephone: “111” } Create WebApp
  • 13. Let’s Take a Closer Look Change Data Capture Event Source Database Target Database Customer { Telephone: “111” } Customer { Telephone: “111” } Customer { Telephone: “111” } Create WebApp
  • 14. Let’s Take a Closer Look Change Data Capture Event Source Database Target Database Customer { Telephone: “222” } Customer { Telephone: “222” } Customer { Telephone: “222” } Update WebApp
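The create/update sequence in the slides above can be sketched as a stream of change events applied in order to a target store. The event shape used here (op, key, data) is illustrative, not any vendor's wire format.

```python
def apply_event(target, event):
    """Apply a single CDC event (create/update/delete) to a key-value target."""
    op, key = event["op"], event["key"]
    if op in ("create", "update"):
        target[key] = event["data"]
    elif op == "delete":
        target.pop(key, None)
    return target

# Replay the slides' sequence: create with "111", then update to "222".
target = {}
events = [
    {"op": "create", "key": "cust-1", "data": {"Telephone": "111"}},
    {"op": "update", "key": "cust-1", "data": {"Telephone": "222"}},
]
for e in events:
    apply_event(target, e)

print(target["cust-1"]["Telephone"])  # prints 222
```

Because events are applied in source order, replaying the same stream against an empty target reproduces the same final state, which is the property CDC pipelines rely on when reconciling the target with the source.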
  • 15. What Does CDC Mean for the Enterprise? [Diagram: data flows from sources (RDBMS, data warehouse, files, legacy) through replicate, filter, and transform steps (batch, in-memory, streaming, real-time, incremental) to targets (RDBMS, data warehouse, Hadoop).]
  • 16. Modern CDC Applications Data lake: Continuous ingestion and pipeline automation Streaming: Data changes to Kafka, Kinesis, or other queues Cloud: Data workload migration Business applications: Data preparation for analytics and ML jobs Legacy system: Data delivery and query offload [Diagram: data warehouse, files, and legacy RDBMS sources feeding data lakes, streaming, and cloud targets.]
  • 17. Methods of Change Data Capture Database triggers Date modification stamping Log-based CDC
  • 18. Database Triggers Uses shadow tables Challenges • Introduces overhead • Increases load to retrieve • Loses intermediate changes
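A toy simulation of the shadow-table pattern the slide describes (table and field names are made up): a "trigger" fires on every write to the main table and appends a change row to the shadow table. The double write per change is exactly the overhead the slide lists as a challenge.

```python
# Toy simulation of trigger-based CDC with a shadow (change) table.
main_table = {}
shadow_table = []

def insert_with_trigger(key, row):
    main_table[key] = row  # the application write
    # The trigger fires and records the change: a second write per change.
    shadow_table.append({"op": "I", "key": key, "row": dict(row)})

insert_with_trigger("c1", {"telephone": "111"})
insert_with_trigger("c2", {"telephone": "222"})
print(len(shadow_table))  # prints 2
```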
  • 19. Date Modification Stamping Transactional applications keep track of metadata in every row • Tracks when the row was created and last modified • Enables filtering on the DATE_MODIFIED column Challenges • There is no DATE_MODIFIED for a deleted row • DATE_MODIFIED often has to be maintained by triggers • Extraction is resource-intensive
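The incremental extract the slide describes amounts to a watermark filter on DATE_MODIFIED. A minimal sketch (the row shape is illustrative): rows modified after the last run's watermark are extracted, and, as the slide's first challenge notes, a deleted row simply never appears in the result.

```python
from datetime import datetime

# Hypothetical rows carrying a DATE_MODIFIED stamp.
rows = [
    {"id": 1, "date_modified": datetime(2018, 12, 1)},
    {"id": 2, "date_modified": datetime(2018, 12, 15)},
]

# Watermark = the high-water mark of the previous extraction run.
watermark = datetime(2018, 12, 10)
changed = [r for r in rows if r["date_modified"] > watermark]

print([r["id"] for r in changed])  # prints [2]
```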
  • 20. Log-Based CDC Uses transaction logs Challenges • Interpreting transaction logs is complex • Some vendors provide no direct interface to the transaction log • Agents and interfaces change with new database versions • Supplemental logging increases the volume of data
  • 21. Change Data Capture Implementation Steps Enable CDC for the database • Prepare the table for CDC • Define a target table to handle CDC states • Run the initial load • Apply incremental updates
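Assuming a simple key-value target for illustration, the load sequence on this slide (initial full load, then incremental updates) might be orchestrated like this sketch; the I/U/D op codes are a common but not universal convention.

```python
def initial_load(source_rows, target):
    """Initial load: copy every source row into the target once."""
    target.update({r["id"]: r for r in source_rows})

def apply_incremental(changes, target):
    """Incremental updates: apply only the captured changes (I/U/D)."""
    for c in changes:
        if c["op"] == "D":
            target.pop(c["id"], None)
        else:  # insert or update
            target[c["id"]] = c["row"]

target = {}
initial_load([{"id": 1, "v": "a"}, {"id": 2, "v": "b"}], target)
apply_incremental([{"op": "U", "id": 2, "row": {"id": 2, "v": "b2"}},
                   {"op": "D", "id": 1}], target)
print(sorted(target))  # prints [2]
```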
  • 22. Next-Gen Architecture Considerations Ease of use: pre-packaged operators, extensibility, modern user experience Real-time change data capture: stream live updates, optimized for high performance Hybrid: multiple vendors; on-premise and cloud; databases, data warehouses, and data lakes
  • 23. Value Proposition of CDC Incremental update efficiency Source/production impact Time to value TCO Scale and flexibility
  • 24. Continuous Ingestion in the Data Lake 46% Capturing streaming data changes 58% Database migration to cloud 38% Data preparation for analytics and ML jobs 35% We still have a legacy system 46% What are the different change data capture use cases currently deployed in your organization (choose all that apply)?
  • 25. ETL, Real-time Stream Processing and Machine Learning Platform + A Visual IDE for Apache Spark
  • 26. CDC with StreamAnalytix Turnkey adapters for CDC vendors ETL and data wrangling visual operators Elastic compute Reconcile, Transform, Enrich Sources: structured data stores, CDC streams, unstructured data streams, file stores Targets: structured data stores, message queues, Hadoop/Hive, cloud storage and DW
  • 27. CDC Capabilities in StreamAnalytix Integration with CDC providers
  • 28. CDC Capabilities in StreamAnalytix LogMiner integration
  • 29. CDC Capabilities in StreamAnalytix Turnkey reconciliation feature for Hadoop offload
  • 30. CDC Capabilities in StreamAnalytix Large set of visual operators for ETL, analytics, and stream processing Zero code approach to ETL design Built in NFR support
  • 31. StreamAnalytix CDC Solution Design A complete CDC solution has three parts, each modelled as a StreamAnalytix pipeline: (1) Data ingestion and staging: stream data from Attunity Replicate (via Kafka) or LogMiner for multiple tables, and store raw data into HDFS. (2) Data de-normalization: join transactional data with data at rest, and store de-normalized data on HDFS. (3) Incremental updates in Hive: merge previously processed transactional data with new incremental updates.
  • 32. Pipeline #1: Data Ingestion and Staging (Streaming) Data ingestion via Attunity 'Channel': reads the data that Attunity Replicate delivers to Kafka; configured to read data feeds and metadata from separate topics. Data enrichment: enriches incoming data with metadata information and an event timestamp. HDFS: stores CDC data on HDFS in a landing area using the OOB HDFS emitter; HDFS files are rotated based on time and size.
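The enrichment step in Pipeline #1 (tagging each incoming record with metadata and an event timestamp before it lands on HDFS) can be sketched as follows; the underscore-prefixed field names are hypothetical, not StreamAnalytix conventions.

```python
import json
import time

def enrich(record, table):
    """Tag a CDC record with its source table and an event timestamp."""
    record["_table"] = table
    record["_event_ts"] = int(time.time())
    return record

# A raw CDC record as it might arrive off the Kafka topic.
raw = json.loads('{"id": 1, "telephone": "111"}')
enriched = enrich(raw, "customers")
print(enriched["_table"])  # prints customers
```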
  • 35. Pipeline #2: Data De-normalization (Batch) HDFS data channel: ingests incremental data from previous runs of the staging location and reads reference data (data at rest) from a fixed HDFS location. Performs an outer join to merge incremental and static data. Stores de-normalized data to an HDFS directory.
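As a rough illustration of the de-normalization step, this sketch enriches incremental transactions with reference data at rest, keyed on a hypothetical customer_id. For simplicity it behaves like a left join; the pipeline itself performs a full outer join.

```python
# Reference data at rest (e.g. read from a fixed HDFS location).
reference = {"c1": {"segment": "retail"}, "c2": {"segment": "corp"}}

# Incremental transactions from the staging area.
incremental = [
    {"customer_id": "c1", "amount": 40},
    {"customer_id": "c3", "amount": 7},   # no reference row yet
]

# Merge each transaction with its reference row, if one exists.
denormalized = []
for txn in incremental:
    ref = reference.get(txn["customer_id"], {})
    denormalized.append({**txn, **ref})

print(denormalized[0])  # prints {'customer_id': 'c1', 'amount': 40, 'segment': 'retail'}
```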
  • 39. Pipeline #3: Incremental Updates in Hive (Batch) Runs a Hive SQL query to load a managed table from the HDFS incremental data generated by Pipeline #2. Reconciliation step: Hive "MERGE INTO" SQL performs insert, update, and delete operations based on the operation code in the incremental data. Clean-up step: runs a drop table command on the managed table to clean up processed data and avoid repeated processing.
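The reconciliation step relies on Hive's MERGE INTO over an ACID table. A statement of roughly this shape handles insert, update, and delete in one pass; the table names, columns, and the I/U/D op-code convention here are hypothetical.

```python
# Illustrative reconciliation statement: 'customers' and 'staging_updates'
# are hypothetical tables, and s.op carries the change type (I/U/D).
merge_sql = """
MERGE INTO customers AS t
USING staging_updates AS s
ON t.id = s.id
WHEN MATCHED AND s.op = 'D' THEN DELETE
WHEN MATCHED THEN UPDATE SET telephone = s.telephone
WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.telephone, s.op)
""".strip()

print(merge_sql.splitlines()[0])  # prints MERGE INTO customers AS t
```

After the merge, the clean-up step's drop table on the staging-side managed table is what prevents the same incremental batch from being merged twice on the next run.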
  • 42. Workflow: Oozie Coordinator Job An Oozie orchestration flow created using the StreamAnalytix web studio orchestrates Pipeline #2 and Pipeline #3 into a single Oozie flow that can be scheduled.
  • 43. Demo Data channels • Attunity Replicate and LogMiner Data processing pipeline walkthrough • Data filters and enrichment • Analytics and data processing operators • Data stores
  • 44. Summary Do more with your data acquisition flows • Acquire and process data in real-time • Enrich data from data marts • Publish processed data as it arrives • Multiple parallel processing paths (read once, process multiple times) Move away from fragmented processes • Unify data analytics and data processing/ETL flows
  • 45. Conclusion The right next-gen CDC solution can make data ready for analytics as it arrives, in near real-time CDC-based data integration is far more complex than a full export and import of your database A unified platform simplifies and reduces the complexity of operationalizing CDC flows
  • 46. LIVE Q&A For a free trial download or cloud access visit www.StreamAnalytix.com For any questions, contact us at inquiry@streamanalytix.com

Notas del editor

  1. Intro - 5 min; Poll - 2 min. Saurabh: background 2 min, goals 3 min, steps 3 min, methods 3 min, architectural considerations 2 min. Sameer: CDC with SAX 3 min, deployment and key benefits 3 min, demo 10 min, beyond CDC 5 min. Q&A - 10 min.
  2. CDC minimizes the resources required for ETL (extract, transform, load) processes because it only deals with data changes. The goal of CDC is to ensure data synchronicity.
  3. https://www.youtube.com/watch?v=v_hQyUZzLsA
  4. https://www.youtube.com/watch?v=v_hQyUZzLsA
  5. https://www.youtube.com/watch?v=v_hQyUZzLsA
  6. https://www.youtube.com/watch?v=v_hQyUZzLsA
  7. https://www.youtube.com/watch?v=v_hQyUZzLsA
  8. https://www.youtube.com/watch?v=1WrfgBx3hiQ
  9. Reference: https://www.slideshare.net/jimdeppen/change-data-capture-13718162 https://www.hvr-software.com/blog/change-data-capture/
  10. https://blog.exsilio.com/all/how-to-use-change-data-capture/
  11. https://www.youtube.com/watch?v=1WrfgBx3hiQ
  12. https://www.youtube.com/watch?v=1WrfgBx3hiQ
  13. Join Processor: performs an outer join to merge incremental and static data. HDFS: stores the de-normalized output.
  17. DataGenerator: It generates a dummy record on start of pipeline. LoadStagingInHive: It runs a Hive SQL query to load a managed table from the HDFS incremental data. MergeStagingAndMasterData: It runs a Hive "merge into" statement to reconcile staging data with master data. DropStagingHiveData: It runs a drop table command on Hive to drop the managed table loaded in the second step.
  20. Questions