SlideShare una empresa de Scribd logo
1 de 32
Descargar para leer sin conexión
© 2014 VMware Inc. All rights reserved.
Real-time Data Loading from Oracle
and MySQL to Data Warehouses
and Oracle to Analytics
MC Brown, Senior Product Line Manager
Continuent Quick Introduction
2
History
 Products
2004
 Continuent established in USA
2009
 3rd Generation Continuent
Tungsten (aka VMware Continuent)
ships
2014
 100+ customers running business-
critical applications
Oct 2014
 Acquisition by VMware: Now part 

of the vCloud Air Business Unit
Oct 2015
 Continuent solutions available
through VMware sales
Industry-leading clustering and
replication for open source DBMS
Clustering – Commercial-grade HA,
performance scaling, and data
management for MySQL
Replication– Flexible, high-
performance data movement
Business-Critical Deployment Examples
High Availability for
MySQL
Largest cluster deployment performs 800M+ transactions/
day on 275 TB of relational data
Business Continuity
Cross-site cluster topologies widely deployed including
primary/DR and multi-master
High Performance
Replication
Largest installations transfer billions of transactions daily
using high speed, parallel replication
Heterogeneous
Integration
Customers replicate from MySQL to Oracle, Hadoop,
Redshift, Vertica, and others
Real-time Analytics
Optimized data loading for data warehouses with
deployments of up to 200 MySQL masters feeding to
Hadoop
Continuent Facts
3
Select Continuent Customers
4
Data Warehouse Integration is Changing
•  Traditional data warehouse usage was based on dump from transactional store,
loads into data warehouse
•  Data warehouse and analytics were done off historical data loaded
•  Data warehouses often use merged data from multiple sources, which was hard to
handled
•  Data warehouses are now frequently sources as well as targets for data, i.e.:
–  Export data to data warehouse
–  Analyze data
–  Feed summary data back to application to display stats to users
Modern Data Warehouse Sequences
How do we cope with that model
•  Traditional Extract-Transform-Load (ETL) methods take too long
•  Data needs to be replicated into a data warehouse in real-time
•  Continuous stream of information
•  Replicate everything
•  Use data warehouse to provide join and analytics
Data Warehouse Choices
•  Hadoop
–  General purpose storage platform
–  Map Reduce for data processing
–  Front-end interfaces for interaction in SQL-like (Hive, HBase, Impala) and non-
SQL (Pig, native, Spark)
–  JDBC/ODBC Interfaces improving
•  Vertica
–  Massive cluster-based column store
–  SQL and ODBC/JDBC Interface
•  Amazon Redshift
–  Highly flexible column store
–  Easy to deploy
9
(software formerly known as Tungsten Replicator)
is a fast,
open source, database
replication engine
Designed for speed and flexibility
GPL V2 license
100% open source
Annual support subscription available
VMware Continuent for Replication/Data Warehouses
10
Master
(Transactions + Metadata)
Slave
Replicator
(Transactions + Metadata)
Replicator
Download
transactions
via network
Apply using
JDBC
Binlog
THL
THL
Continuent Master/Slave in Action (MySQL)
11
Master
(Transactions + Metadata)
Slave
Replicator
(Transactions + Metadata)
Replicator
Download
transactions
via network
Apply using
JDBC
CDC
THL
THL
Continuent Master/Slave in Action (Oracle)
12
Transactional Store Data Warehouse
Dump/Provision
Transactions?
X
Batch
The Data Warehouse Impedance Mismatch
Transactional and Data Warehouse Metadata
•  Replicating data is not just about the data
•  Table structures must be replicated too
•  ddlscan handles the translation
–  Migrates an existing MySQL or Oracle schema into the target schema
–  Template based
–  Handles underlying datatype matches
–  Needs to be executed before replication starts
Replicating into Vertica
Replicator
Replicator
CSV
JS
JDBC
cpimport
staging
base
merge
Vertica Demo
Replicating into Redshift
Replicator
Replicator
CSV
JS
JDBC
s3cmd
staging
base
merge
COPY
Replicating into Hadoop
Replicator
Replicator
CSV
JS
hadoop fs
Initial Materialization within Hadoop
load-reduce-check
Migrate staging/base DDL
Hive
materialization
CSV
Staging
Table
Base
Table
Hadoop Demo 1
Ongoing Materialization within Hadoop
materialize
Hive
materialization
CSV
Staging
Table
Base
Table
Hadoop Demo 2
Provisioning Options (all data warehouses)
•  MySQL
–  Traditional CSV export and import
–  Dump and load through Blackhole engine
–  Use tungsten_provision_thl
•  Oracle
–  Traditional CSV export and import
–  Use parallel extractor
Provisioning Options (Hadoop)
•  MySQL
–  Traditional CSV export and import
–  Dump and load through Blackhole engine
–  Use tungsten_provision_thl
–  Use Sqoop
•  Oracle
–  Traditional CSV export and import
–  Use parallel extractor
–  Use Sqoop
Comparing Loading Methods for Hadoop
Manual via CSV Sqoop Tungsten
Replicator
Process Manual/Scripted Manual/Scripted Fully Automated
Incremental
Loading
Possible with DDL
changes
Requires DDL
changes
Fully Supported
Latency Full-load Intermittent Real-time
Extraction
Requirements
Full table scan Full and partial
table scans
Low-impact CDC/
binlog scan
Hadoop Demo 3
Sqoop and Materialization within Hadoop
Hive
materialization
CSV
Staging
Table
Base
Table
Sqoop
Replicate
27
Op Seqno ID Msg
I 1 1 Hello World!
I 2 2 Meet MC
D 3 1
I 3 1 Goodbye World
Op Seq
no
ID Msg
I 2 2 Meet MC
I 3 1 Goodbye
World
How the Materialization Works
•  Extract from MySQL or Oracle
•  Hadoop Support
– Cloudera (Certified), HortonWorks, MapR, Pivotal, Amazon
EMR, IBM (Certified), Apache
•  Provision using Sqoop or parallel extraction
•  Schema generation for Hive
•  Tools for generating materialized views
•  Parallel CSV file loading
•  Partition loaded data by commit time
•  Schema Change Notification
28
Replication support (Hadoop specific)
29
1 2 3 4 5 6 7 8 9 1
0
1
1
1
2
1
3
1
4
1
5
1
6
1
7
1
8
1
9
2
0
2
1
2
2
2
3
2
4
2
5
2
6
2
7
2
8
2
9
3
0
3
1
3
2
3
3
3
4
3
5
3
6
3
7
3
8
3
9
4
0
4
1
4
2
4
3
4
4
4
5
Monday Wednesday Friday
Data Warehouse Possibilities: Point in Time Tables
30
Op Seqn
o
ID Date Msg
I 1 1 1/6/14 Hello World!
I 2 2 2/6/14 Meet MC
I 3 1 2/6/14 Goodbye World
I 4 1 3/6/14 Hello Tuesday
I 4 2 3/6/14 Ruby
Wednesday
I 5 1 4/6/14 Final Count
ID Date Msg
1 1/6/14 Hello World!
1 2/6/14 Goodbye World
1 3/6/14 Hello Tuesday
1 4/6/14 Final Count
Data Warehouse Possibilities: Time Series Generation
•  Tungsten Replicator builds are available on code.google.com
http://code.google.com/p/tungsten-replicator/
•  Replicator documentation is available on Continuent website
http://docs.continuent.com/tungsten-replicator-3.0/deployment-hadoop.html
•  Tungsten Hadoop tools are available on GitHub
https://github.com/continuent/continuent-tools-hadoop
31
Contact Continuent for support
Getting Started!
For more information, contact us:
Robert Noyes
Alliance Manager, USA & Canada
rnoyes@vmware.com
+1 (650) 575-0958
Philippe Bernard
Alliance Manager, EMEA & APAC
pbernard@vmware.com
+41 79 347 1385
MC Brown
Senior Product Line Manager
mcb@vmware.com



Eero Teerikorpi
Sr. Director, Strategic Alliances
eteerikorpi@vmware.com
+1 (408) 431-3305


Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Set Up & Operate Real-Time Data Loading into Hadoop
Set Up & Operate Real-Time Data Loading into HadoopSet Up & Operate Real-Time Data Loading into Hadoop
Set Up & Operate Real-Time Data Loading into Hadoop
 
Replicate from Oracle to Oracle, Oracle to MySQL, and Oracle to Analytics
Replicate from Oracle to Oracle, Oracle to MySQL, and Oracle to AnalyticsReplicate from Oracle to Oracle, Oracle to MySQL, and Oracle to Analytics
Replicate from Oracle to Oracle, Oracle to MySQL, and Oracle to Analytics
 
Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0
Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0
Real-Time Data Loading from MySQL to Hadoop with New Tungsten Replicator 3.0
 
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
February 2016 HUG: Apache Apex (incubating): Stream Processing Architecture a...
 
TeraCache: Efficient Caching Over Fast Storage Devices
TeraCache: Efficient Caching Over Fast Storage DevicesTeraCache: Efficient Caching Over Fast Storage Devices
TeraCache: Efficient Caching Over Fast Storage Devices
 
Tungsten University: Replicate Between MySQL And Oracle
Tungsten University: Replicate Between MySQL And OracleTungsten University: Replicate Between MySQL And Oracle
Tungsten University: Replicate Between MySQL And Oracle
 
Galera Cluster: Synchronous Multi-Master Replication for MySQL HA
Galera Cluster: Synchronous Multi-Master Replication for MySQL HAGalera Cluster: Synchronous Multi-Master Replication for MySQL HA
Galera Cluster: Synchronous Multi-Master Replication for MySQL HA
 
Connecting Hadoop and Oracle
Connecting Hadoop and OracleConnecting Hadoop and Oracle
Connecting Hadoop and Oracle
 
Oracle 12c PDB insights
Oracle 12c PDB insightsOracle 12c PDB insights
Oracle 12c PDB insights
 
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera ) Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
Wars of MySQL Cluster ( InnoDB Cluster VS Galera )
 
Apache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for HadoopApache Sqoop: A Data Transfer Tool for Hadoop
Apache Sqoop: A Data Transfer Tool for Hadoop
 
Reduce planned database down time with Oracle technology
Reduce planned database down time with Oracle technologyReduce planned database down time with Oracle technology
Reduce planned database down time with Oracle technology
 
ORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, SmallerORC 2015: Faster, Better, Smaller
ORC 2015: Faster, Better, Smaller
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBase
 
Oracle 12c and its pluggable databases
Oracle 12c and its pluggable databasesOracle 12c and its pluggable databases
Oracle 12c and its pluggable databases
 
Intro to Exadata
Intro to ExadataIntro to Exadata
Intro to Exadata
 
HBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table SnapshotsHBaseCon 2013: Apache HBase Table Snapshots
HBaseCon 2013: Apache HBase Table Snapshots
 
Install oracle binaris or clonse oracle home
Install oracle binaris or clonse oracle homeInstall oracle binaris or clonse oracle home
Install oracle binaris or clonse oracle home
 
Oracle 12c Multi Tenant
Oracle 12c Multi TenantOracle 12c Multi Tenant
Oracle 12c Multi Tenant
 
Oracle High Availability
Oracle High AvailabilityOracle High Availability
Oracle High Availability
 

Similar a Replication in real-time from Oracle and MySQL into data warehouses and analytics

Actian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage Hadoop
Actian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage HadoopActian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage Hadoop
Actian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage Hadoop
DataWorks Summit
 

Similar a Replication in real-time from Oracle and MySQL into data warehouses and analytics (20)

How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
 
Replicating in Real-time from MySQL to Amazon Redshift
Replicating in Real-time from MySQL to Amazon RedshiftReplicating in Real-time from MySQL to Amazon Redshift
Replicating in Real-time from MySQL to Amazon Redshift
 
Keeping Analytics Data Fresh in a Streaming Architecture | John Neal, Qlik
Keeping Analytics Data Fresh in a Streaming Architecture | John Neal, QlikKeeping Analytics Data Fresh in a Streaming Architecture | John Neal, Qlik
Keeping Analytics Data Fresh in a Streaming Architecture | John Neal, Qlik
 
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
Virtualized Big Data Platform at VMware Corp IT @ VMWorld 2015
 
VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right VMworld 2013: Virtualizing Databases: Doing IT Right
VMworld 2013: Virtualizing Databases: Doing IT Right
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
 
Windows on AWS
Windows on AWSWindows on AWS
Windows on AWS
 
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
Cloudera Impala - Las Vegas Big Data Meetup Nov 5th 2014
 
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
VMworld 2013: Big Data Platform Building Blocks: Serengeti, Resource Manageme...
 
K.I.S.S In The Cloud with AWS
K.I.S.S In The Cloud with AWSK.I.S.S In The Cloud with AWS
K.I.S.S In The Cloud with AWS
 
Business-critical MySQL with DR in vCloud Air
Business-critical MySQL with DR in vCloud AirBusiness-critical MySQL with DR in vCloud Air
Business-critical MySQL with DR in vCloud Air
 
1. beyond mission critical virtualizing big data and hadoop
1. beyond mission critical   virtualizing big data and hadoop1. beyond mission critical   virtualizing big data and hadoop
1. beyond mission critical virtualizing big data and hadoop
 
Actian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage Hadoop
Actian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage HadoopActian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage Hadoop
Actian Vector on Hadoop: First Industrial-strength DBMS to Truly Leverage Hadoop
 
Introducing Azure SQL Data Warehouse
Introducing Azure SQL Data WarehouseIntroducing Azure SQL Data Warehouse
Introducing Azure SQL Data Warehouse
 
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
Big Data Goes Airborne. Propelling Your Big Data Initiative with Ironcluster ...
 
SQL Saturday San Diego
SQL Saturday San DiegoSQL Saturday San Diego
SQL Saturday San Diego
 
Whats new in Oracle Database 12c release 12.1.0.2
Whats new in Oracle Database 12c release 12.1.0.2Whats new in Oracle Database 12c release 12.1.0.2
Whats new in Oracle Database 12c release 12.1.0.2
 
Continuent webinar 02-19-2015
Continuent webinar 02-19-2015Continuent webinar 02-19-2015
Continuent webinar 02-19-2015
 
SQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for ImpalaSQL Engines for Hadoop - The case for Impala
SQL Engines for Hadoop - The case for Impala
 
MySQL High Availability and Disaster Recovery with Continuent, a VMware company
MySQL High Availability and Disaster Recovery with Continuent, a VMware companyMySQL High Availability and Disaster Recovery with Continuent, a VMware company
MySQL High Availability and Disaster Recovery with Continuent, a VMware company
 

Más de Continuent

Continuent Tungsten Value Proposition Webinar
Continuent Tungsten Value Proposition WebinarContinuent Tungsten Value Proposition Webinar
Continuent Tungsten Value Proposition Webinar
Continuent
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControl
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControlWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControl
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControl
Continuent
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB ClusterWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB Cluster
Continuent
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQLWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
Continuent
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera ClusterWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera Cluster
Continuent
 
Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...
Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...
Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...
Continuent
 
Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...
Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...
Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...
Continuent
 
Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...
Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...
Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...
Continuent
 

Más de Continuent (20)

Tungsten Webinar: v6 & v7 Release Recap, and Beyond
Tungsten Webinar: v6 & v7 Release Recap, and BeyondTungsten Webinar: v6 & v7 Release Recap, and Beyond
Tungsten Webinar: v6 & v7 Release Recap, and Beyond
 
Continuent Tungsten Value Proposition Webinar
Continuent Tungsten Value Proposition WebinarContinuent Tungsten Value Proposition Webinar
Continuent Tungsten Value Proposition Webinar
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControl
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControlWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControl
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #7: ClusterControl
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB ClusterWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #5: Oracle’s InnoDB Cluster
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQLWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #4: MS Azure Database MySQL
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera ClusterWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera Cluster
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #2: Galera Cluster
 
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #1: AWS Aurora
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #1: AWS AuroraWebinar Slides: MySQL HA/DR/Geo-Scale - High Noon #1: AWS Aurora
Webinar Slides: MySQL HA/DR/Geo-Scale - High Noon #1: AWS Aurora
 
Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...
Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...
Webinar Slides: AWS Aurora MySQL Replacement: Break Away From Geo-Limitations...
 
Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...
Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...
Webinar Slides: No Data Loss MySQL: Guaranteed Credit Card Transaction Availa...
 
Webinar Slides: Intelligent Database Proxies: Routing & Transparent Failover
Webinar Slides: Intelligent Database Proxies: Routing & Transparent FailoverWebinar Slides: Intelligent Database Proxies: Routing & Transparent Failover
Webinar Slides: Intelligent Database Proxies: Routing & Transparent Failover
 
Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...
Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...
Webinar Slides: High Volume MySQL HA: SaaS Continuous Operations with Terabyt...
 
Training Slides: 205 - Installing and Configuring Tungsten Dashboard
Training Slides: 205 - Installing and Configuring Tungsten DashboardTraining Slides: 205 - Installing and Configuring Tungsten Dashboard
Training Slides: 205 - Installing and Configuring Tungsten Dashboard
 
Training Slides: 352 - Tungsten Replicator for MongoDB & Kafka
Training Slides: 352 - Tungsten Replicator for MongoDB & KafkaTraining Slides: 352 - Tungsten Replicator for MongoDB & Kafka
Training Slides: 352 - Tungsten Replicator for MongoDB & Kafka
 
Training Slides: 351 - Tungsten Replicator for Data Warehouses
Training Slides: 351 - Tungsten Replicator for Data WarehousesTraining Slides: 351 - Tungsten Replicator for Data Warehouses
Training Slides: 351 - Tungsten Replicator for Data Warehouses
 
Training Slides: 303 - Replicating out of a Cluster
Training Slides: 303 - Replicating out of a ClusterTraining Slides: 303 - Replicating out of a Cluster
Training Slides: 303 - Replicating out of a Cluster
 
Training Slides: 206 - Using the Tungsten Cluster AMI
Training Slides: 206 - Using the Tungsten Cluster AMITraining Slides: 206 - Using the Tungsten Cluster AMI
Training Slides: 206 - Using the Tungsten Cluster AMI
 
Training Slides: 254 - Using the Tungsten Replicator AMI
Training Slides: 254 - Using the Tungsten Replicator AMITraining Slides: 254 - Using the Tungsten Replicator AMI
Training Slides: 254 - Using the Tungsten Replicator AMI
 
Training Slides: 253 - Filter like a Pro
Training Slides: 253 - Filter like a ProTraining Slides: 253 - Filter like a Pro
Training Slides: 253 - Filter like a Pro
 
Training Slides: 252 - Monitoring & Troubleshooting
Training Slides: 252 - Monitoring & TroubleshootingTraining Slides: 252 - Monitoring & Troubleshooting
Training Slides: 252 - Monitoring & Troubleshooting
 
Training Slides: 302 - Securing Your Cluster With SSL
Training Slides: 302 - Securing Your Cluster With SSLTraining Slides: 302 - Securing Your Cluster With SSL
Training Slides: 302 - Securing Your Cluster With SSL
 

Último

%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
VishalKumarJha10
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
mohitmore19
 

Último (20)

The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban%in Durban+277-882-255-28 abortion pills for sale in Durban
%in Durban+277-882-255-28 abortion pills for sale in Durban
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Exploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdfExploring the Best Video Editing App.pdf
Exploring the Best Video Editing App.pdf
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 

Replication in real-time from Oracle and MySQL into data warehouses and analytics

  • 1. © 2014 VMware Inc. All rights reserved. Real-time Data Loading from Oracle and MySQL to Data Warehouses and Oracle to Analytics MC Brown, Senior Product Line Manager
  • 2. Continuent Quick Introduction 2 History Products 2004 Continuent established in USA 2009 3rd Generation Continuent Tungsten (aka VMware Continuent) ships 2014 100+ customers running business- critical applications Oct 2014 Acquisition by VMware: Now part 
 of the vCloud Air Business Unit Oct 2015 Continuent solutions available through VMware sales Industry-leading clustering and replication for open source DBMS Clustering – Commercial-grade HA, performance scaling, and data management for MySQL Replication– Flexible, high- performance data movement
  • 3. Business-Critical Deployment Examples High Availability for MySQL Largest cluster deployment performs 800M+ transactions/ day on 275 TB of relational data Business Continuity Cross-site cluster topologies widely deployed including primary/DR and multi-master High Performance Replication Largest installations transfer billions of transactions daily using high speed, parallel replication Heterogeneous Integration Customers replicate from MySQL to Oracle, Hadoop, Redshift, Vertica, and others Real-time Analytics Optimized data loading for data warehouses with deployments of up to 200 MySQL masters feeding to Hadoop Continuent Facts 3
  • 5. Data Warehouse Integration is Changing •  Traditional data warehouse usage was based on dump from transactional store, loads into data warehouse •  Data warehouse and analytics were done off historical data loaded •  Data warehouses often use merged data from multiple sources, which was hard to handled •  Data warehouses are now frequently sources as well as targets for data, i.e.: –  Export data to data warehouse –  Analyze data –  Feed summary data back to application to display stats to users
  • 7. How do we cope with that model •  Traditional Extract-Transform-Load (ETL) methods take too long •  Data needs to be replicated into a data warehouse in real-time •  Continuous stream of information •  Replicate everything •  Use data warehouse to provide join and analytics
  • 8. Data Warehouse Choices •  Hadoop –  General purpose storage platform –  Map Reduce for data processing –  Front-end interfaces for interaction in SQL-like (Hive, HBase, Impala) and non- SQL (Pig, native, Spark) –  JDBC/ODBC Interfaces improving •  Vertica –  Massive cluster-based column store –  SQL and ODBC/JDBC Interface •  Amazon Redshift –  Highly flexible column store –  Easy to deploy
  • 9. 9 (software formerly known as Tungsten Replicator) is a fast, open source, database replication engine Designed for speed and flexibility GPL V2 license 100% open source Annual support subscription available VMware Continuent for Replication/Data Warehouses
  • 10. 10 Master (Transactions + Metadata) Slave Replicator (Transactions + Metadata) Replicator Download transactions via network Apply using JDBC Binlog THL THL Continuent Master/Slave in Action (MySQL)
  • 11. 11 Master (Transactions + Metadata) Slave Replicator (Transactions + Metadata) Replicator Download transactions via network Apply using JDBC CDC THL THL Continuent Master/Slave in Action (Oracle)
  • 12. 12 Transactional Store Data Warehouse Dump/Provision Transactions? X Batch The Data Warehouse Impedance Mismatch
  • 13. Transactional and Data Warehouse Metadata •  Replicating data is not just about the data •  Table structures must be replicated too •  ddlscan handles the translation –  Migrates an existing MySQL or Oracle schema into the target schema –  Template based –  Handles underlying datatype matches –  Needs to be executed before replication starts
  • 18. Initial Materialization within Hadoop load-reduce-check Migrate staging/base DDL Hive materialization CSV Staging Table Base Table
  • 20. Ongoing Materialization within Hadoop materialize Hive materialization CSV Staging Table Base Table
  • 22. Provisioning Options (all data warehouses) •  MySQL –  Traditional CSV export and import –  Dump and load through Blackhole engine –  Use tungsten_provision_thl •  Oracle –  Traditional CSV export and import –  Use parallel extractor
  • 23. Provisioning Options (Hadoop) •  MySQL –  Traditional CSV export and import –  Dump and load through Blackhole engine –  Use tungsten_provision_thl –  Use Sqoop •  Oracle –  Traditional CSV export and import –  Use parallel extractor –  Use Sqoop
  • 24. Comparing Loading Methods for Hadoop Manual via CSV Sqoop Tungsten Replicator Process Manual/Scripted Manual/Scripted Fully Automated Incremental Loading Possible with DDL changes Requires DDL changes Fully Supported Latency Full-load Intermittent Real-time Extraction Requirements Full table scan Full and partial table scans Low-impact CDC/ binlog scan
  • 26. Sqoop and Materialization within Hadoop Hive materialization CSV Staging Table Base Table Sqoop Replicate
  • 27. 27 Op Seqno ID Msg I 1 1 Hello World! I 2 2 Meet MC D 3 1 I 3 1 Goodbye World Op Seq no ID Msg I 2 2 Meet MC I 3 1 Goodbye World How the Materialization Works
  • 28. •  Extract from MySQL or Oracle •  Hadoop Support – Cloudera (Certified), HortonWorks, MapR, Pivotal, Amazon EMR, IBM (Certified), Apache •  Provision using Sqoop or parallel extraction •  Schema generation for Hive •  Tools for generating materialized views •  Parallel CSV file loading •  Partition loaded data by commit time •  Schema Change Notification 28 Replication support (Hadoop specific)
  • 29. 29 1 2 3 4 5 6 7 8 9 1 0 1 1 1 2 1 3 1 4 1 5 1 6 1 7 1 8 1 9 2 0 2 1 2 2 2 3 2 4 2 5 2 6 2 7 2 8 2 9 3 0 3 1 3 2 3 3 3 4 3 5 3 6 3 7 3 8 3 9 4 0 4 1 4 2 4 3 4 4 4 5 Monday Wednesday Friday Data Warehouse Possibilities: Point in Time Tables
  • 30. 30 Op Seqn o ID Date Msg I 1 1 1/6/14 Hello World! I 2 2 2/6/14 Meet MC I 3 1 2/6/14 Goodbye World I 4 1 3/6/14 Hello Tuesday I 4 2 3/6/14 Ruby Wednesday I 5 1 4/6/14 Final Count ID Date Msg 1 1/6/14 Hello World! 1 2/6/14 Goodbye World 1 3/6/14 Hello Tuesday 1 4/6/14 Final Count Data Warehouse Possibilities: Time Series Generation
  • 31. •  Tungsten Replicator builds are available on code.google.com http://code.google.com/p/tungsten-replicator/ •  Replicator documentation is available on Continuent website http://docs.continuent.com/tungsten-replicator-3.0/deployment-hadoop.html •  Tungsten Hadoop tools are available on GitHub https://github.com/continuent/continuent-tools-hadoop 31 Contact Continuent for support Getting Started!
  • 32. For more information, contact us: Robert Noyes Alliance Manager, USA & Canada rnoyes@vmware.com +1 (650) 575-0958 Philippe Bernard Alliance Manager, EMEA & APAC pbernard@vmware.com +41 79 347 1385 MC Brown Senior Product Line Manager mcb@vmware.com Eero Teerikorpi Sr. Director, Strategic Alliances eteerikorpi@vmware.com +1 (408) 431-3305