Presenter: Duy Hai Doan, Technical Advocate at DataStax
Libon is a messaging service designed to improve mobile communications through free calls, chat, and voicemail, regardless of operator or Internet access provider. As a mobile communications application, Libon processes billions of messages and calls while backing up billions of contacts. Join this webinar to learn best practices and pitfalls to avoid when migrating from a relational database (RDBMS) to Cassandra, and how Libon is now able to ingest massive volumes of high-velocity data with read and write latencies below 10 milliseconds.
11. #Cassandra @doanduyhai
Project Context
• Application grew over the years
• Already using Cassandra to handle events
• messaging / file sharing / SMS / notifications
• Cassandra R/W latencies ≈ 0.4 ms
• server response time under 10 ms
16. #Cassandra @doanduyhai
Project Context
• About contacts …
• stored as relational model in RDBMS (Oracle)
• 1 user ≈ 300 contacts
• with millions of users ☞ billions of contacts to handle
• query latency unpredictable
24. #Cassandra @doanduyhai
Fixing the problem
• Tune the RDBMS
• indices
• partitioning
• fewer joins, simplified relational model
• increased hardware capacity
That worked
but …
30. #Cassandra @doanduyhai
Next Challenges
• High Availability (DB failure, site failure …)
• Predictable performance at scale
• Going multi-data-center
☞ Cassandra, what else?
40. #Cassandra @doanduyhai
Strategy
• 4 phases
• Write contacts to both data stores
• Migrate old contacts
• Switch to Cassandra (but keep the RDBMS in case of…)
• Remove the RDBMS code
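The phase-1 double-write above can be sketched as follows. This is a minimal illustration, not Libon's actual code: class and store names are hypothetical, and the RDBMS stays the source of truth until the switch.

```python
# Sketch of phase 1: every contact write goes to both stores,
# while reads still come from the RDBMS (authoritative until phase 3).

class MemStore:
    """Stand-in for either data store."""
    def __init__(self):
        self.data = {}

    def save(self, user_id, contact):
        self.data.setdefault(user_id, []).append(contact)

    def load(self, user_id):
        return self.data.get(user_id, [])


class DualWriteContactService:
    def __init__(self, rdbms, cassandra):
        self.rdbms = rdbms          # legacy store, still authoritative
        self.cassandra = cassandra  # new store, being warmed up

    def save_contact(self, user_id, contact):
        self.rdbms.save(user_id, contact)  # must succeed: source of truth
        try:
            self.cassandra.save(user_id, contact)  # best effort in phase 1
        except Exception:
            pass  # a miss here is repaired later by the phase-2 migration batch

    def get_contacts(self, user_id):
        return self.rdbms.load(user_id)  # reads switch over only later
```

A failed Cassandra write is tolerable here precisely because phase 2 re-migrates any contact still missing its `contact_uuid`.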
44. #Cassandra @doanduyhai
Migration Phase 2
• On live production, migrate old contacts
• For each batch of users, read from the RDBMS the old contacts created before phase 1:
SELECT * FROM contacts
WHERE user_id = …
AND contact_uuid IS NULL
• Write them to the Cassandra cluster as logged batches, with a timestamp set in the past:
INSERT INTO contacts(..)
VALUES(…)
USING TIMESTAMP now() - 1 week
48. #Cassandra @doanduyhai
Migration Phase 2
• During data migration …
• … concurrent writes from the migration batch …
• … and updates from production for the same contact
49. #Cassandra @doanduyhai
Migration Phase 2
• Two concurrent writes to the same contact's name column:
• the migration batch inserts "Johny" with timestamp now - 1 week (a write "to the past")
• the production update writes "Johnny" with the current timestamp
51. #Cassandra @doanduyhai
Last Write Win in action
• Case 1: batch writes "Johny" (past timestamp) at t1, production writes "Johnny" (current timestamp) at t2 ☞ the read at t3 returns "Johnny"
• Case 2: the two writes arrive in the opposite order ☞ the read at t3 still returns "Johnny"
• The production value wins either way: last-write-wins resolves on timestamps, not on arrival order
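The timestamp trick can be simulated with a toy last-write-wins cell. This is a sketch of the resolution rule only, not Cassandra's storage code; the timestamp values are arbitrary microsecond counts.

```python
ONE_WEEK_US = 7 * 24 * 3600 * 1_000_000  # Cassandra timestamps are in microseconds


class LwwCell:
    """One column value resolved with last-write-wins on the write timestamp."""

    def __init__(self):
        self.value, self.timestamp = None, -1

    def write(self, value, timestamp):
        if timestamp > self.timestamp:  # the newer timestamp wins
            self.value, self.timestamp = value, timestamp

    def read(self):
        return self.value


now = 1_700_000_000_000_000  # arbitrary "current" timestamp, in microseconds

# Case 1: migration batch writes first, production update arrives later
cell = LwwCell()
cell.write("Johny", now - ONE_WEEK_US)  # batch insert USING TIMESTAMP now - 1 week
cell.write("Johnny", now)               # concurrent production update
assert cell.read() == "Johnny"

# Case 2: production updates first, batch insert arrives later
cell = LwwCell()
cell.write("Johnny", now)
cell.write("Johny", now - ONE_WEEK_US)  # batch write loses despite arriving last
assert cell.read() == "Johnny"
```

Because the batch deliberately writes one week into the past, any production update of the same contact always carries the newer timestamp and survives.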
59. #Cassandra @doanduyhai
Code Inventory
• Written for RDBMS
• Lots of joins (no surprise)
• Designed around transactions
• Spring @Transactional everywhere
81. #Cassandra @doanduyhai
Outcome
• 5 months of work for two developers
• Many iterations to fix bugs (thanks to IT)
• Lots of performance benchmarks using Gatling
☞ data model & code validation
• … we are almost there for production
84. #Cassandra @doanduyhai
Denormalization, the bad
• Updating mutable data can be a nightmare
• Data model bound by the existing client-facing API
• Update paths very error-prone without tests
86. #Cassandra @doanduyhai
Data model in detail
Contacts_by_id
Contacts_by_identifiers
Contacts_in_profiles
Contacts_by_modification_date
Contacts_by_firstname_lastname
Contacts_linked_user
☞ user_id is always a component of the partition key
89. #Cassandra @doanduyhai
Bloom filters in action
• For some tables, partition key = (user_id, contact_id)
☞ fast look-up, leverages Bloom filters
☞ touches 1 SSTable most of the time
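A toy Bloom filter illustrates why this look-up is cheap: for an absent partition key the filter says "definitely not here" without touching the SSTable, and it never produces false negatives. A minimal sketch only; Cassandra sizes its real filters per SSTable.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: 'definitely absent' or 'maybe present'."""

    def __init__(self, size_bits=1024, hashes=3):
        self.size, self.hashes = size_bits, hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        # Derive k bit positions from salted SHA-256 digests of the key
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


bf = BloomFilter()
bf.add((42, "contact-1"))          # partition key (user_id, contact_id)
assert bf.might_contain((42, "contact-1"))  # present keys are always found
# absent keys are overwhelmingly likely to be rejected without any disk read
```

With a fine-grained partition key like (user_id, contact_id), most reads are rejected by every SSTable's filter but one, which is why a look-up typically touches a single SSTable.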
90. #Cassandra @doanduyhai
Data model in detail
Contacts_by_id
Contacts_by_identifiers
Contacts_in_profiles
Contacts_by_modification_date
Contacts_by_firstname_lastname
Contacts_linked_user
Wide partition
95. #Cassandra @doanduyhai
Data model summary
• 7 tables for denormalization
• Some tables kept normalized because they are rarely accessed
• Read-before-write in most update scenarios 😟
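The read-before-write pattern falls out of denormalization whenever the updated field is part of another table's key. A sketch with two of the tables above as plain dicts (function names and fields are illustrative, not Libon's code):

```python
# Two denormalized "tables" keyed differently on the same contact data
contacts_by_id = {}                  # (user_id, contact_id) -> contact fields
contacts_by_firstname_lastname = {}  # (user_id, firstname, lastname) -> contact_id


def create_contact(user_id, contact_id, firstname, lastname):
    contacts_by_id[(user_id, contact_id)] = {
        "firstname": firstname, "lastname": lastname,
    }
    contacts_by_firstname_lastname[(user_id, firstname, lastname)] = contact_id


def rename_contact(user_id, contact_id, firstname, lastname):
    # Read before write: the old name row can only be deleted if we know
    # the old name, and only contacts_by_id can tell us what it was.
    old = contacts_by_id[(user_id, contact_id)]
    del contacts_by_firstname_lastname[
        (user_id, old["firstname"], old["lastname"])
    ]
    contacts_by_id[(user_id, contact_id)] = {
        "firstname": firstname, "lastname": lastname,
    }
    contacts_by_firstname_lastname[(user_id, firstname, lastname)] = contact_id
```

Skipping the read would leave an orphaned row under the old name in the by-name table, which is exactly the class of update-path bug the slide warns is error-prone without tests.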
96. #Cassandra @doanduyhai
Notes on contact_id
• In SQL, an auto-generated long using a sequence
• In Cassandra, an auto-generated timeuuid
99. #Cassandra @doanduyhai
Notes on contact_id
• How to store both types?
• As text? ☞ easy solution …
• … but a waste of space!
• because encoded as UTF-8 or ASCII in Cassandra
107. #Cassandra @doanduyhai
Notes on contact_id
• ☞ just save the contact id as byte[]
• Achilles @TypeTransformer for automatic conversion
(see later)
• Use blobAsBigint() or blobAsUuid() to view the data
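The space argument can be checked with a quick sketch (Python here for illustration; the project's real code is Java with Achilles): a legacy long fits in 8 raw bytes and a timeuuid in 16, versus 36 ASCII characters for the textual UUID form.

```python
import struct
import uuid


def long_to_bytes(contact_id: int) -> bytes:
    """Legacy RDBMS id: 8-byte big-endian long, matching bigint's binary form."""
    return struct.pack(">q", contact_id)


def uuid_to_bytes(contact_id: uuid.UUID) -> bytes:
    """New timeuuid id: its 16 raw bytes."""
    return contact_id.bytes


legacy = long_to_bytes(123456789)
fresh = uuid_to_bytes(uuid.uuid1())
assert len(legacy) == 8 and len(fresh) == 16

# versus text: the canonical UUID string is 36 ASCII characters,
# more than twice the raw binary form
assert len(str(uuid.uuid1())) == 36
```

Since both encodings land in the same blob column, functions like blobAsBigint() and blobAsUuid() recover a readable value when querying by hand.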
114. #Cassandra @doanduyhai
Achilles
• Are you going to manually write 56+ prepared statements for all possible updates?
• Or just use dynamic plain-string statements and pay a performance penalty?
129. #Cassandra @doanduyhai
Achilles
• Dynamic logging in action
2014-12-01 14:25:20,554 Bound statement : [INSERT INTO
contacts.contacts_by_modification_date(user_id,month_bucket,modification_date,...) VALUES
(:user_id,:month_bucket,:modification_date,...) USING TTL :ttl;] with CONSISTENCY LEVEL [LOCAL_QUORUM]
2014-12-01 14:25:20,554 bound values : [222130151, 2014-12, e13d0d50-7965-11e4-af38-90b11c2549e0, ...]
2014-12-01 14:25:20,701 Bound statement : [SELECT birthday,middlename,avatar_size,... FROM
contacts.contacts_by_modification_date WHERE user_id=:user_id AND month_bucket=:month_bucket AND
(modification_date)>=(:modification_date) ORDER BY modification_date ASC;] with CONSISTENCY LEVEL
[LOCAL_QUORUM]
2014-12-01 14:25:20,701 bound values : [222130151, 2014-10, be6bc010-6109-11e4-b385-000038377ead]
130. #Cassandra @doanduyhai
Achilles
• Dynamic logging
• runtime activation
• no need to recompile/redeploy
• saved us hours of debugging
• TRACE log level ☞ query tracing
136. #Cassandra @doanduyhai
Conditions for success
• Data modeling is crucial
• Double-run strategy & timestamp trick FTW
• Data type conversion can be tricky
• Benchmark!
• Mindset shifts for the team