SlideShare una empresa de Scribd logo
1 de 24
Yabin Meng
An Effective Approach to Migrate Thrift to CQL
About Pythian
• Founded in 1997, a global leader in Enterprise IT transformation and operational excellence
•www.pythian.com
•https://www.linkedin.com/company/pythian
• Love your Data
•6000+ database under management
•Over 400 DBA’s, in 35 countries
•Top 5% of DBA workforce – 9 Oracle ACE’s, 2 Microsoft MVP’s, 1 Cassandra MVP
•Partners with Amazon, Cloudera, Oracle, Microsoft, DataStax, and a lot more
• Specialities: Databases, Cloud, DevOps, Big Data Infrastructure, Advanced Analytics, Infrastructure Management
• Industry Recognition
•DBTA1
100 2016 - The Companies That Matter Most in Data
•IAOP2
- GLOBAL OUTSOURCING 100® RISING STAR FOR 2016
•Top 25 Canadian ICT3
Professional Services Companies
•… …
DBTA1
: Database Trend and Applications (http://www.dbta.com/)
IAOP2
: International Association of Outsourcing Professionals (https://www.iaop.org/)
ICT3
: Information & Communication Technology
© Pythian, All Rights Reserved. 2
About Myself
• 15+ years IT experience - relational database/data warehousing, business intelligence and
analytics (BIA), NoSQL database/Big Data
• Started my career as a RDBMS client application developer (PowerBuilder, Java)
• 5 years as IBM DB2 Kernel Developer, New Development (DB2 v9.5, 9.7, 9.8, 10)
• Multi-years data warehousing and BIA experience in insurance and capital market sectors as
technical lead/architect (IBM/Cognos, SAS/SAS 9.x, Information Builders/WebFocus)
• DataStax Certified Developer, Administrator, and Partner Architect
• https://www.linkedin.com/in/yabinmeng
• @yabinmeng
© Pythian, All Rights Reserved. 3
1 Problem Overview
2 Background Discussion
3 Thrift to CQL Transition
4 C* Storage Engine (pre 3.0)
5 Data Migration
6 Conclusion
4© Pythian, All Rights Reserved.
Problem Overview
• C* cluster upgrade from version 1.2 to 2.1
• Thrift based application (Hector)
•Plan to convert to CQL, but not done yet
• Data copied from one C* cluster to another
•C* COPY command
• Application failed in DEV/TEST
•Cannot retrieve some required data
•cqlsh output shows NO difference
• Schema
• Data query result
© Pythian, All Rights Reserved. 5
Background Disccsion, COPY commands
• COPY commands
•Syntax: COPY <table_name> (column1, column2, …) TO/FROM <file_name> WITH options
•Since 1.1.3 (PostgresSQL COPY commands)
•Simple yet with limitation. Continuous improvement
•Small datasets (max. a few million rows)
•CSV format
•No primary key check
•New options and better performance in cqlsh copy
– 2.1.13, 2.2.5, 3.0.3, 3.2
– ex (common): NUMPROCESSES, CONFIGFILE
– ex (TO): BEGINTOKEN / ENDTOKEN
– ex (FROM): INGESTRATE (rows/second)
•Predefined schema
© Pythian, All Rights Reserved. 6
Background Discussion, Schema-less?
• Schema-less or Schema
•Many NoSQL DBs is marked as “schema-less”
• A lot of debate
•C* is schema optional (Thrift)
•C* is schema required (CQL)
© Pythian, All Rights Reserved. 7
Background Discussion, C* Utilities
•cassandra-cli utility
•Command line interface since very early years of C*
•Thrift-based counterpart of “cqlsh”
•C* data engine format (pre 3.0)
•Deprecated since C* 2.1 and will be removed in 3.0
•Comparator and Validator
•Validator: row key, column value
•Comparator: column name + column sorting order
•sstable2json utility
•Deprecated and will be removed in 3.0
SET users['key_val']['column1_name']='some value';
SET users['key_val']['column2_name']='some value';
...
LIST users;
GET users['key_val']['column1_name']
Thrift to CQL Transition
• What is Thrift and CQL?
•Thrift is a general framework (interface definition language and binary communication protocol)
• "scalable cross-language services development"
•APIs to interact with C*
•CQL is also a query language
© Pythian, All Rights Reserved. 9
Thrift Thrift
CQL
Binary
CQL query
language
CQL query
language
N/A *
C* evolvement
Protocol
Layer
Access
Abstraction
Layer
C* Data Engine
Storage
Layer
Thrift to CQL Transition, continued
• Thrift in C*
•Since the beginning as a communication protocol
•Low level API
•Deprecated
• CQL in C*
•0.8, CQL as a language
•1.2, CQL as a language, version 3 (CQL3)
•1.2, CQL as a communication protocol (CQL3 only)
•Continuously evolving
• Thrift vs. CQL as protocol
•Thrift: RPC based; Synchronous
•CQL: Server notification (event registration); Asynchronous
• CQL as a language
•Feature rich: UDF, lightweight transaction, JSON support, materialized view, …
• Thrift vs. CQL as client driver
•Thrift driver is deprecated and not supported
© Pythian, All Rights Reserved. 10
Thrift to CQL Transition, continued
• What is needed for the transition?
•Application
•Data Model
•Data itself
•Understanding C* storage engine *
* Pre-3.0 C*
© Pythian, All Rights Reserved. 11
C* Storage Engine (pre 3.0)
• C* ColumnFamily (Thrift) / Table (CQL) data structure
•Map of sorted map:
Map<RowKey, SortedMap<ColumnKey, ColumnValue>>
CREATE TABLE song_tags (
id uuid,
tag_name text,
PRIMARY KEY (id, tag_name)
) WITH COMPACT STORAGE
© Pythian, All Rights Reserved. 12
C* Storage Engine – Thrift vs. CQL Terminology
• Table
–Thrift: Column Family
–CQL: Table
• Row
–Thrift: Storage Row / Partition
–CQL: Logic Representation / Part of a
partition
• Column
–Thrift: Cell (column key + column value)
–CQL: Metadata / column key or column
value
© Pythian, All Rights Reserved. 13
C* Storage Engine – Compact Storage
CREATE TABLE avg_grade2 (
student_id int,
class_id int,
grade double,
PRIMARY KEY (student_id, class_id)
) WITH COMPACT STORAGE
© Pythian, All Rights Reserved. 14
• Storage space vs. table metadata
– Backward compatibility
– Thrift: always compact
– CQL: optional
• Caveats:
– No adding / dropping column
– Single non-key column if compound primary key
– Non-frozen collection type
C* Storage Engine – Static Table Definition
create column family avg_grade
with key_validation_class = Int32Type
and comparator = UTF8Type
and column_metadata = [
{column_name: class_id, validation_class: Int32Type},
{column_name: grade, validation_class: FloatType}
]
create table avg_grade (
key int primary key,
class_id int,
grade float
) with compact storage
•Equivalent definition
•Thrift: no name for key -- default name used in CQL:
“key”
© Pythian, All Rights Reserved. 15
Thrift / cassandra_cli
C* Storage Engine – Dynamic Table Definition
Thrift / cassandra_cli
create column family sensor_data
with key_validation_class = Int32Type
and comparator = TimeUUIDType
and default_validation_class = DecimalType;
-------------------
RowKey: 1
=> (name=1171d7e0-14d2-11e6-858b-5f3b22f4f11c, value=21.5, timestamp=1462680314974000)
=> (name=23371940-14d2-11e6-858b-5f3b22f4f11c, value=32.1, timestamp=1462680344792000)
-------------------
RowKey: 2
=> (name=7537fcf0-14d2-11e6-858b-5f3b22f4f11c, value=11.0, timestamp=1462680482367000)
• Thrift only
• No “column_metadata” section
© Pythian, All Rights Reserved. 16
CREATE TABLE sensor_data (
key int,
column1 timeuuid,
value decimal,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE
•Default “key” for key name
•Default “column1” for column name/key
•Default “value” for column value
•Alter Table
C* Storage Engine – Mixed Table Definition
Thrift / cassandra_cli
create column family blogs
with key_validation_class = Int32Type
and comparator = UTF8Type
and column_metadata = [
{column_name: author, validation_class: UTF8Type}
]
© Pythian, All Rights Reserved. 17
CREATE TABLE blogs (
key int PRIMARY KEY,
author text
) WITH COMPACT STORAGE
key | author
-----+--------
1 | Donald
2 | John
-------------------
RowKey: 1
=> (name=author, value=Donald, timestamp=1462720696717000)
=> (name=tags:category, value=music, timestamp=1462720526053000)
=> (name=tags:rating, value=A, timestamp=1462720564590000)
=> (name=tags:recommend, value=Yes, timestamp=1462720640817000)
-------------------
RowKey: 2
=> (name=author, value=John, timestamp=1462720690911000)
Migrate Thrift Data – Overview
• Do not fully trust C* schema
•“show schema” / “describe keyspace”
• Sample your data
•“cassandra-cli”, yes
•“cqlsh”, no
• Review application design and code
•Schema-less / schema -optional
• Determine CQL schema
• Data Migration Method
© Pythian, All Rights Reserved. 18
Migrate Thrift Data – Determine CQL Schema
Thrift / cassandra-cli
create column family some_tbl_name
with key_validation_class = BytesType
and comparator = UTF8Type
and column_metadata = [
{column_name: StaticColName1, validation_class: xxx1},
{column_name: StaticColName2, validation_class: xxx2}
{column_name: StaticColName3, validation_class: xxx3}
… …
]
StaticColNameSet = {StaticColName1, StaticColName2, StaticColName3, …}
DynamicColNameSet = {DynamicColName1, DynmaicColName2, DyanmicColName3, …}
CQL / cqlsh schema
• Application specific
• Could it be a little bit more generic?
CREATE TABLE some_tbl_name (
key blob,
StaticColName1 yyy1,
StaticColName2 yyy2,
StaticColName3 yyy3,
… …,
DynamicColMapName map<text, blob>;
PRIMARY KEY (key)
)
© Pythian, All Rights Reserved. 19
Migrate Thrift Data - ETL Method
• ETL
•COPY command
•Commercial product (jasper soft, pentaho, …)
•Self developed
• Natural Approach
•Read every record from the source table and write it into the target table
•Slow
• C* load utility
•sstableloader
© Pythian, All Rights Reserved. 20
Migrate Thrift Data – Methods, continued
writer = CQLSSTableWriter.build()
thrift_rows = thirft_conn.fetchRows()
for each row in thrift_rows() {
keyValue = row.getKey()
dynColMap = new Map()
thirft_columns = row.getColumns()
for each column in thirft_columns {
colName = column.getName()
if colName is in DynamicColNameSet {
dynColMap.addKeyValuePair(column.name, column.value)
}
else {
staticColVal1 = column.getValue(StaticColName1)
staticColVal2 = column.getValue(StaticColName2)
… … … } } }
writer.addRow(keyValue, staticColVal1, staticColVal2, …, dynColMap)
}
writer.close()
• Read source data
• Thrift API
• Writing SSTables Directly
• SSTableSimpleUnsortedWriter
• CQLSSTableWriter
Migrate Thrift Data - Result
Fetching result from source table and generate SSTable ...
-------------------------
> batch_run: 1; batch_size: 20001; batch_runtime: PT5.594S
> batch_run: 2; batch_size: 20000; batch_runtime: PT4.368S
> batch_run: 3; batch_size: 20000; batch_runtime: PT3.937S
> batch_run: 4; batch_size: 20000; batch_runtime: PT3.838S
> batch_run: 5; batch_size: 20000; batch_runtime: PT2.015S
… … …
> batch_run: 59; batch_size: 20000; batch_runtime: PT1.941S
> batch_run: 60; batch_size: 12581; batch_runtime: PT2.82S
#### Total Records Processed: 1192582; Total Run Time: PT2M35.178S
$ sstableloader -d <node_ip>
<sstable_root>/<keyspace_name>/<target_cql_table_name>/
… … …
Summary statistics:
Connections per host: : 1
Total files transferred: : 15
Total bytes transferred: : 511566477
Total duration (ms): : 47451
Average transfer rate (MB/s): : 10
Peak transfer rate (MB/s): : 20
© Pythian, All Rights Reserved. 22
Conclusion
• C* is moving from Thrift to CQL
–Do it !
• C* schema may not tell you all
• Effective C* data migration is not easy
•C* does have “load” utility (framework)
• C* storage engine changes a lot in 3.0
• Pythian Blog Space:
•https://www.pythian.com/blog/
•An Effective Approach to Migrate Dynamic Thrift Data to CQL (Part 1, Part 2, Part 3)
© DataStax, All Rights Reserved. 23
© DataStax, All Rights Reserved. 24

Más contenido relacionado

La actualidad más candente

Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Julien Le Dem
 
Presto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesPresto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesDatabricks
 
More mastering the art of indexing
More mastering the art of indexingMore mastering the art of indexing
More mastering the art of indexingYoshinori Matsunobu
 
Query Compilation in Impala
Query Compilation in ImpalaQuery Compilation in Impala
Query Compilation in ImpalaCloudera, Inc.
 
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016DataStax
 
Cassandra nice use cases and worst anti patterns
Cassandra nice use cases and worst anti patternsCassandra nice use cases and worst anti patterns
Cassandra nice use cases and worst anti patternsDuyhai Doan
 
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and BeyondScylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and BeyondScyllaDB
 
Sizing Your Scylla Cluster
Sizing Your Scylla ClusterSizing Your Scylla Cluster
Sizing Your Scylla ClusterScyllaDB
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouseAltinity Ltd
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovAltinity Ltd
 
Comparison of-foss-distributed-storage
Comparison of-foss-distributed-storageComparison of-foss-distributed-storage
Comparison of-foss-distributed-storageMarian Marinov
 
ETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk LoadingETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk Loadingalex_araujo
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterScyllaDB
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsBrendan Gregg
 
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...StreamNative
 
PostgreSQL 12 New Features with Examples (English) GA
PostgreSQL 12 New Features with Examples (English) GAPostgreSQL 12 New Features with Examples (English) GA
PostgreSQL 12 New Features with Examples (English) GANoriyoshi Shinoda
 
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...DataStax
 
High throughput data replication over RAFT
High throughput data replication over RAFTHigh throughput data replication over RAFT
High throughput data replication over RAFTDataWorks Summit
 
Altinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Ltd
 

La actualidad más candente (20)

Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013Parquet Hadoop Summit 2013
Parquet Hadoop Summit 2013
 
Presto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesPresto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation Engines
 
More mastering the art of indexing
More mastering the art of indexingMore mastering the art of indexing
More mastering the art of indexing
 
Query Compilation in Impala
Query Compilation in ImpalaQuery Compilation in Impala
Query Compilation in Impala
 
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
Myths of Big Partitions (Robert Stupp, DataStax) | Cassandra Summit 2016
 
Cassandra nice use cases and worst anti patterns
Cassandra nice use cases and worst anti patternsCassandra nice use cases and worst anti patterns
Cassandra nice use cases and worst anti patterns
 
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and BeyondScylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
Scylla Summit 2022: The Future of Consensus in ScyllaDB 5.0 and Beyond
 
Sizing Your Scylla Cluster
Sizing Your Scylla ClusterSizing Your Scylla Cluster
Sizing Your Scylla Cluster
 
Your first ClickHouse data warehouse
Your first ClickHouse data warehouseYour first ClickHouse data warehouse
Your first ClickHouse data warehouse
 
ClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei MilovidovClickHouse Deep Dive, by Aleksei Milovidov
ClickHouse Deep Dive, by Aleksei Milovidov
 
Comparison of-foss-distributed-storage
Comparison of-foss-distributed-storageComparison of-foss-distributed-storage
Comparison of-foss-distributed-storage
 
Apache Kafka Best Practices
Apache Kafka Best PracticesApache Kafka Best Practices
Apache Kafka Best Practices
 
ETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk LoadingETL With Cassandra Streaming Bulk Loading
ETL With Cassandra Streaming Bulk Loading
 
Performance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla ClusterPerformance Monitoring: Understanding Your Scylla Cluster
Performance Monitoring: Understanding Your Scylla Cluster
 
Linux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old SecretsLinux Performance Analysis: New Tools and Old Secrets
Linux Performance Analysis: New Tools and Old Secrets
 
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
MoP(MQTT on Pulsar) - a Powerful Tool for Apache Pulsar in IoT - Pulsar Summi...
 
PostgreSQL 12 New Features with Examples (English) GA
PostgreSQL 12 New Features with Examples (English) GAPostgreSQL 12 New Features with Examples (English) GA
PostgreSQL 12 New Features with Examples (English) GA
 
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
Deletes Without Tombstones or TTLs (Eric Stevens, ProtectWise) | Cassandra Su...
 
High throughput data replication over RAFT
High throughput data replication over RAFTHigh throughput data replication over RAFT
High throughput data replication over RAFT
 
Altinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouseAltinity Quickstart for ClickHouse
Altinity Quickstart for ClickHouse
 

Similar a An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian) | C* Summit 2016

Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightDataStax Academy
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteChris Baynes
 
How to make data available for analytics ASAP
How to make data available for analytics ASAPHow to make data available for analytics ASAP
How to make data available for analytics ASAPMariaDB plc
 
Cassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingCassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingVassilis Bekiaris
 
Exploiting GPU's for Columnar DataFrrames by Kiran Lonikar
Exploiting GPU's for Columnar DataFrrames by Kiran LonikarExploiting GPU's for Columnar DataFrrames by Kiran Lonikar
Exploiting GPU's for Columnar DataFrrames by Kiran LonikarSpark Summit
 
Is SQLcl the Next Generation of SQL*Plus?
Is SQLcl the Next Generation of SQL*Plus?Is SQLcl the Next Generation of SQL*Plus?
Is SQLcl the Next Generation of SQL*Plus?Zohar Elkayam
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...ScyllaDB
 
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftBest Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftSnapLogic
 
Observer, a "real life" time series application
Observer, a "real life" time series applicationObserver, a "real life" time series application
Observer, a "real life" time series applicationKévin LOVATO
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®confluent
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward
 
Presentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12cPresentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12cRonald Francisco Vargas Quesada
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...DataStax
 
Big Data, Bigger Analytics
Big Data, Bigger AnalyticsBig Data, Bigger Analytics
Big Data, Bigger AnalyticsItzhak Kameli
 
Practical C++ Generative Programming
Practical C++ Generative ProgrammingPractical C++ Generative Programming
Practical C++ Generative ProgrammingSchalk Cronjé
 
In memory databases presentation
In memory databases presentationIn memory databases presentation
In memory databases presentationMichael Keane
 
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...Cloudera, Inc.
 
Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...
Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...
Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...Lucas Jellema
 
PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)Stratebi
 

Similar a An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian) | C* Summit 2016 (20)

Macy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-FlightMacy's: Changing Engines in Mid-Flight
Macy's: Changing Engines in Mid-Flight
 
Fast federated SQL with Apache Calcite
Fast federated SQL with Apache CalciteFast federated SQL with Apache Calcite
Fast federated SQL with Apache Calcite
 
How to make data available for analytics ASAP
How to make data available for analytics ASAPHow to make data available for analytics ASAP
How to make data available for analytics ASAP
 
Cassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series ModelingCassandra Basics, Counters and Time Series Modeling
Cassandra Basics, Counters and Time Series Modeling
 
Exploiting GPU's for Columnar DataFrrames by Kiran Lonikar
Exploiting GPU's for Columnar DataFrrames by Kiran LonikarExploiting GPU's for Columnar DataFrrames by Kiran Lonikar
Exploiting GPU's for Columnar DataFrrames by Kiran Lonikar
 
Is SQLcl the Next Generation of SQL*Plus?
Is SQLcl the Next Generation of SQL*Plus?Is SQLcl the Next Generation of SQL*Plus?
Is SQLcl the Next Generation of SQL*Plus?
 
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
Scylla Summit 2022: Building Zeotap's Privacy Compliant Customer Data Platfor...
 
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon RedshiftBest Practices for Supercharging Cloud Analytics on Amazon Redshift
Best Practices for Supercharging Cloud Analytics on Amazon Redshift
 
Observer, a "real life" time series application
Observer, a "real life" time series applicationObserver, a "real life" time series application
Observer, a "real life" time series application
 
Cassandra training
Cassandra trainingCassandra training
Cassandra training
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
 
Presentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12cPresentación Oracle Database Migración consideraciones 10g/11g/12c
Presentación Oracle Database Migración consideraciones 10g/11g/12c
 
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
Building a Pluggable Analytics Stack with Cassandra (Jim Peregord, Element Co...
 
Big Data, Bigger Analytics
Big Data, Bigger AnalyticsBig Data, Bigger Analytics
Big Data, Bigger Analytics
 
Practical C++ Generative Programming
Practical C++ Generative ProgrammingPractical C++ Generative Programming
Practical C++ Generative Programming
 
In memory databases presentation
In memory databases presentationIn memory databases presentation
In memory databases presentation
 
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
Friction-free ETL: Automating data transformation with Impala | Strata + Hado...
 
Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...
Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...
Triple C - Centralize, Cloudify and Consolidate Dozens of Oracle Databases (O...
 
PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)PCM18 (Big Data Analytics)
PCM18 (Big Data Analytics)
 

Más de DataStax

Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?DataStax
 
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...DataStax
 
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsRunning DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsDataStax
 
Best Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphBest Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphDataStax
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyDataStax
 
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...DataStax
 
Webinar | Better Together: Apache Cassandra and Apache Kafka
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache KafkaDataStax
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseTop 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseDataStax
 
Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0DataStax
 
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...DataStax
 
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesWebinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesDataStax
 
Designing a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDesigning a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDataStax
 
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudHow to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudDataStax
 
How to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceHow to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceDataStax
 
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...DataStax
 
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...DataStax
 
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...DataStax
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)DataStax
 
An Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsAn Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsDataStax
 
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingBecoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingDataStax
 

Más de DataStax (20)

Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?Is Your Enterprise Ready to Shine This Holiday Season?
Is Your Enterprise Ready to Shine This Holiday Season?
 
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
Designing Fault-Tolerant Applications with DataStax Enterprise and Apache Cas...
 
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid EnvironmentsRunning DataStax Enterprise in VMware Cloud and Hybrid Environments
Running DataStax Enterprise in VMware Cloud and Hybrid Environments
 
Best Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise GraphBest Practices for Getting to Production with DataStax Enterprise Graph
Best Practices for Getting to Production with DataStax Enterprise Graph
 
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step JourneyWebinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
Webinar | Data Management for Hybrid and Multi-Cloud: A Four-Step Journey
 
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...Webinar  |  How to Understand Apache Cassandra™ Performance Through Read/Writ...
Webinar | How to Understand Apache Cassandra™ Performance Through Read/Writ...
 
Webinar | Better Together: Apache Cassandra and Apache Kafka
Webinar  |  Better Together: Apache Cassandra and Apache KafkaWebinar  |  Better Together: Apache Cassandra and Apache Kafka
Webinar | Better Together: Apache Cassandra and Apache Kafka
 
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax EnterpriseTop 10 Best Practices for Apache Cassandra and DataStax Enterprise
Top 10 Best Practices for Apache Cassandra and DataStax Enterprise
 
Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0Introduction to Apache Cassandra™ + What’s New in 4.0
Introduction to Apache Cassandra™ + What’s New in 4.0
 
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
Webinar: How Active Everywhere Database Architecture Accelerates Hybrid Cloud...
 
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud RealitiesWebinar  |  Aligning GDPR Requirements with Today's Hybrid Cloud Realities
Webinar | Aligning GDPR Requirements with Today's Hybrid Cloud Realities
 
Designing a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for DummiesDesigning a Distributed Cloud Database for Dummies
Designing a Distributed Cloud Database for Dummies
 
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid CloudHow to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
How to Power Innovation with Geo-Distributed Data Management in Hybrid Cloud
 
How to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerceHow to Evaluate Cloud Databases for eCommerce
How to Evaluate Cloud Databases for eCommerce
 
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
Webinar: DataStax Enterprise 6: 10 Ways to Multiply the Power of Apache Cassa...
 
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
Webinar: DataStax and Microsoft Azure: Empowering the Right-Now Enterprise wi...
 
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
Webinar - Real-Time Customer Experience for the Right-Now Enterprise featurin...
 
Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)Datastax - The Architect's guide to customer experience (CX)
Datastax - The Architect's guide to customer experience (CX)
 
An Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking ApplicationsAn Operational Data Layer is Critical for Transformative Banking Applications
An Operational Data Layer is Critical for Transformative Banking Applications
 
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design ThinkingBecoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
Becoming a Customer-Centric Enterprise Via Real-Time Data and Design Thinking
 

Último

Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxRTS corp
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsChristian Birchler
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptxVinzoCenzo
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolsosttopstonverter
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfRTS corp
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Rob Geurden
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfDrew Moseley
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsJean Silva
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesKrzysztofKkol1
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...OnePlan Solutions
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesVictoriaMetrics
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonApplitools
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITmanoharjgpsolutions
 

Último (20)

Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptxReal-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
Real-time Tracking and Monitoring with Cargo Cloud Solutions.pptx
 
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving CarsSensoDat: Simulation-based Sensor Dataset of Self-driving Cars
SensoDat: Simulation-based Sensor Dataset of Self-driving Cars
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
Osi security architecture in network.pptx
Osi security architecture in network.pptxOsi security architecture in network.pptx
Osi security architecture in network.pptx
 
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News UpdateVictoriaMetrics Q1 Meet Up '24 - Community & News Update
VictoriaMetrics Q1 Meet Up '24 - Community & News Update
 
eSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration toolseSoftTools IMAP Backup Software and migration tools
eSoftTools IMAP Backup Software and migration tools
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdfEnhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
Enhancing Supply Chain Visibility with Cargo Cloud Solutions.pdf
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...Simplifying Microservices & Apps - The art of effortless development - Meetup...
Simplifying Microservices & Apps - The art of effortless development - Meetup...
 
Comparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdfComparing Linux OS Image Update Models - EOSS 2024.pdf
Comparing Linux OS Image Update Models - EOSS 2024.pdf
 
Strategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero resultsStrategies for using alternative queries to mitigate zero results
Strategies for using alternative queries to mitigate zero results
 
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilitiesAmazon Bedrock in Action - presentation of the Bedrock's capabilities
Amazon Bedrock in Action - presentation of the Bedrock's capabilities
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + KobitonLeveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
Leveraging AI for Mobile App Testing on Real Devices | Applitools + Kobiton
 
Best Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh ITBest Angular 17 Classroom & Online training - Naresh IT
Best Angular 17 Classroom & Online training - Naresh IT
 

An Effective Approach to Migrate Cassandra Thrift to CQL (Yabin Meng, Pythian) | C* Summit 2016

  • 1. Yabin Meng An Effective Approach to Migrate Thrift to CQL
  • 2. About Pythian • Founded in 1997, a global leader in Enterprise IT transformation and operational excellence •www.pythian.com •https://www.linkedin.com/company/pythian • Love your Data •6000+ database under management •Over 400 DBA’s, in 35 countries •Top 5% of DBA workforce – 9 Oracle ACE’s, 2 Microsoft MVP’s, 1 Cassandra MVP •Partners with Amazon, Cloudera, Oracle, Microsoft, DataStax, and a lot more • Specialities: Databases, Cloud, DevOps, Big Data Infrastructure, Advanced Analytics, Infrastructure Management • Industry Recognition •DBTA1 100 2016 - The Companies That Matter Most in Data •IAOP2 - GLOBAL OUTSOURCING 100® RISING STAR FOR 2016 •Top 25 Canadian ICT3 Professional Services Companies •… … DBTA1 : Database Trend and Applications (http://www.dbta.com/) IAOP2 : International Association of Outsourcing Professionals (https://www.iaop.org/) ICT3 : Information & Communication Technology © Pythian, All Rights Reserved. 2
  • 3. About Myself • 15+ years IT experience - relational database/data warehousing, business intelligence and analytics (BIA), NoSQL database/Big Data • Started my career as a RDBMS client application developer (PowerBuilder, Java) • 5 years as IBM DB2 Kernel Developer, New Development (DB2 v9.5, 9.7, 9.8, 10) • Multi-years data warehousing and BIA experience in insurance and capital market sectors as technical lead/architect (IBM/Cognos, SAS/SAS 9.x, Information Builders/WebFocus) • DataStax Certified Developer, Administrator, and Partner Architect • https://www.linkedin.com/in/yabinmeng • @yabinmeng © Pythian, All Rights Reserved. 3
  • 4. 1 Problem Overview 2 Background Discussion 3 Thrift to CQL Transition 4 C* Storage Engine (pre 3.0) 5 Data Migration 6 Conclusion 4© Pythian, All Rights Reserved.
  • 5. Problem Overview • C* cluster upgrade from version 1.2 to 2.1 • Thrift based application (Hector) •Plan to convert to CQL, but not done yet • Data copied from one C* cluster to another •C* COPY command • Application failed in DEV/TEST •Cannot retrieve some required data •cqlsh output shows NO difference • Schema • Data query result © Pythian, All Rights Reserved. 5
  • 6. Background Disccsion, COPY commands • COPY commands •Syntax: COPY <table_name> (column1, column2, …) TO/FROM <file_name> WITH options •Since 1.1.3 (PostgresSQL COPY commands) •Simple yet with limitation. Continuous improvement •Small datasets (max. a few million rows) •CSV format •No primary key check •New options and better performance in cqlsh copy – 2.1.13, 2.2.5, 3.0.3, 3.2 – ex (common): NUMPROCESSES, CONFIGFILE – ex (TO): BEGINTOKEN / ENDTOKEN – ex (FROM): INGESTRATE (rows/second) •Predefined schema © Pythian, All Rights Reserved. 6
  • 7. Background Discussion, Schema-less? • Schema-less or Schema •Many NoSQL DBs is marked as “schema-less” • A lot of debate •C* is schema optional (Thrift) •C* is schema required (CQL) © Pythian, All Rights Reserved. 7
  • 8. Background Discussion, C* Utilities •cassandra-cli utility •Command line interface since very early years of C* •Thrift-based counterpart of “cqlsh” •C* data engine format (pre 3.0) •Deprecated since C* 2.1 and will be removed in 3.0 •Comparator and Validator •Validator: row key, column value •Comparator: column name + column sorting order •sstable2json utility •Deprecated and will be removed in 3.0 SET users['key_val']['column1_name']='some value'; SET users['key_val']['column2_name']='some value'; ... LIST users; GET users['key_val']['column1_name']
  • 9. Thrift to CQL Transition • What is Thrift and CQL? •Thrift is a general framework (interface definition language and binary communication protocol) • "scalable cross-language services development" •APIs to interact with C* •CQL is also a query language © Pythian, All Rights Reserved. 9 Thrift Thrift CQL Binary CQL query language CQL query language N/A * C* evolvement Protocol Layer Access Abstraction Layer C* Data Engine Storage Layer
  • 10. Thrift to CQL Transition, continued • Thrift in C* •Since the beginning as a communication protocol •Low level API •Deprecated • CQL in C* •0.8, CQL as a language •1.2, CQL as a language, version 3 (CQL3) •1.2, CQL as a communication protocol (CQL3 only) •Continuously evolving • Thrift vs. CQL as protocol •Thrift: RPC based; Synchronous •CQL: Server notification (event registration); Asynchronous • CQL as a language •Feature rich: UDF, lightweight transaction, JSON support, materialized view, … • Thrift vs. CQL as client driver •Thrift driver is deprecated and not supported © Pythian, All Rights Reserved. 10
  • 11. Thrift to CQL Transition, continued • What is needed for the transition? •Application •Data Model •Data itself •Understanding C* storage engine * * Pre-3.0 C* © Pythian, All Rights Reserved. 11
  • 12. C* Storage Engine (pre 3.0) • C* ColumnFamily (Thrift) / Table (CQL) data structure •Map of sorted map: Map<RowKey, SortedMap<ColumnKey, ColumnValue>> CREATE TABLE song_tags ( id uuid, tag_name text, PRIMARY KEY (id, tag_name) ) WITH COMPACT STORAGE © Pythian, All Rights Reserved. 12
  • 13. C* Storage Engine – Thrift vs. CQL Terminology • Table –Thrift: Column Family –CQL: Table • Row –Thrift: Storage Row / Partition –CQL: Logic Representation / Part of a partition • Column –Thrift: Cell (column key + column value) –CQL: Metadata / column key or column value © Pythian, All Rights Reserved. 13
  • 14. C* Storage Engine – Compact Storage CREATE TABLE avg_grade2 ( student_id int, class_id int, grade double, PRIMARY KEY (student_id, class_id) ) WITH COMPACT STORAGE © Pythian, All Rights Reserved. 14 • Storage space vs. table metadata – Backward compatibility – Thrift: always compact – CQL: optional • Caveats: – No adding / dropping column – Single non-key column if compound primary key – Non-frozen collection type
  • 15. C* Storage Engine – Static Table Definition create column family avg_grade with key_validation_class = Int32Type and comparator = UTF8Type and column_metadata = [ {column_name: class_id, validation_class: Int32Type}, {column_name: grade, validation_class: FloatType} ] create table avg_grade ( key int primary key, class_id int, grade float ) with compact storage •Equivalent definition •Thrift: no name for key -- default name used in CQL: “key” © Pythian, All Rights Reserved. 15 Thrift / cassandra_cli
  • 16. C* Storage Engine – Dynamic Table Definition Thrift / cassandra_cli create column family sensor_data with key_validation_class = Int32Type and comparator = TimeUUIDType and default_validation_class = DecimalType; ------------------- RowKey: 1 => (name=1171d7e0-14d2-11e6-858b-5f3b22f4f11c, value=21.5, timestamp=1462680314974000) => (name=23371940-14d2-11e6-858b-5f3b22f4f11c, value=32.1, timestamp=1462680344792000) ------------------- RowKey: 2 => (name=7537fcf0-14d2-11e6-858b-5f3b22f4f11c, value=11.0, timestamp=1462680482367000) • Thrift only • No “column_metadata” section © Pythian, All Rights Reserved. 16 CREATE TABLE sensor_data ( key int, column1 timeuuid, value decimal, PRIMARY KEY (key, column1) ) WITH COMPACT STORAGE •Default “key” for key name •Default “column1” for column name/key •Default “value” for column value •Alter Table
  • 17. C* Storage Engine – Mixed Table Definition Thrift / cassandra_cli create column family blogs with key_validation_class = Int32Type and comparator = UTF8Type and column_metadata = [ {column_name: author, validation_class: UTF8Type} ] © Pythian, All Rights Reserved. 17 CREATE TABLE blogs ( key int PRIMARY KEY, author text ) WITH COMPACT STORAGE key | author -----+-------- 1 | Donald 2 | John ------------------- RowKey: 1 => (name=author, value=Donald, timestamp=1462720696717000) => (name=tags:category, value=music, timestamp=1462720526053000) => (name=tags:rating, value=A, timestamp=1462720564590000) => (name=tags:recommend, value=Yes, timestamp=1462720640817000) ------------------- RowKey: 2 => (name=author, value=John, timestamp=1462720690911000)
  • 18. Migrate Thrift Data – Overview • Do not fully trust C* schema •“show schema” / “describe keyspace” • Sample your data •“cassandra-cli”, yes •“cqlsh”, no • Review application design and code •Schema-less / schema -optional • Determine CQL schema • Data Migration Method © Pythian, All Rights Reserved. 18
  • 19. Migrate Thrift Data – Determine CQL Schema Thrift / cassandra-cli create column family some_tbl_name with key_validation_class = BytesType and comparator = UTF8Type and column_metadata = [ {column_name: StaticColName1, validation_class: xxx1}, {column_name: StaticColName2, validation_class: xxx2} {column_name: StaticColName3, validation_class: xxx3} … … ] StaticColNameSet = {StaticColName1, StaticColName2, StaticColName3, …} DynamicColNameSet = {DynamicColName1, DynmaicColName2, DyanmicColName3, …} CQL / cqlsh schema • Application specific • Could it be a little bit more generic? CREATE TABLE some_tbl_name ( key blob, StaticColName1 yyy1, StaticColName2 yyy2, StaticColName3 yyy3, … …, DynamicColMapName map<text, blob>; PRIMARY KEY (key) ) © Pythian, All Rights Reserved. 19
  • 20. Migrate Thrift Data - ETL Method • ETL •COPY command •Commercial product (jasper soft, pentaho, …) •Self developed • Natural Approach •Read every record from the source table and write it into the target table •Slow • C* load utility •sstableloader © Pythian, All Rights Reserved. 20
  • 21. Migrate Thrift Data – Methods, continued writer = CQLSSTableWriter.build() thrift_rows = thirft_conn.fetchRows() for each row in thrift_rows() { keyValue = row.getKey() dynColMap = new Map() thirft_columns = row.getColumns() for each column in thirft_columns { colName = column.getName() if colName is in DynamicColNameSet { dynColMap.addKeyValuePair(column.name, column.value) } else { staticColVal1 = column.getValue(StaticColName1) staticColVal2 = column.getValue(StaticColName2) … … … } } } writer.addRow(keyValue, staticColVal1, staticColVal2, …, dynColMap) } writer.close() • Read source data • Thrift API • Writing SSTables Directly • SSTableSimpleUnsortedWriter • CQLSSTableWriter
  • 22. Migrate Thrift Data - Result Fetching result from source table and generate SSTable ... ------------------------- > batch_run: 1; batch_size: 20001; batch_runtime: PT5.594S > batch_run: 2; batch_size: 20000; batch_runtime: PT4.368S > batch_run: 3; batch_size: 20000; batch_runtime: PT3.937S > batch_run: 4; batch_size: 20000; batch_runtime: PT3.838S > batch_run: 5; batch_size: 20000; batch_runtime: PT2.015S … … … > batch_run: 59; batch_size: 20000; batch_runtime: PT1.941S > batch_run: 60; batch_size: 12581; batch_runtime: PT2.82S #### Total Records Processed: 1192582; Total Run Time: PT2M35.178S $ sstableloader -d <node_ip> <sstable_root>/<keyspace_name>/<target_cql_table_name>/ … … … Summary statistics: Connections per host: : 1 Total files transferred: : 15 Total bytes transferred: : 511566477 Total duration (ms): : 47451 Average transfer rate (MB/s): : 10 Peak transfer rate (MB/s): : 20 © Pythian, All Rights Reserved. 22
  • 23. Conclusion • C* is moving from Thrift to CQL –Do it ! • C* schema may not tell you all • Effective C* data migration is not easy •C* does have “load” utility (framework) • C* storage engine changes a lot in 3.0 • Pythian Blog Space: •https://www.pythian.com/blog/ •An Effective Approach to Migrate Dynamic Thrift Data to CQL (Part 1, Part 2, Part 3) © DataStax, All Rights Reserved. 23
  • 24. © DataStax, All Rights Reserved. 24