Presenter: Duy Hai Doan, Technical Advocate at DataStax
Libon is a messaging service designed to improve mobile communications through free calls, chat, and voicemail, regardless of operator or Internet access provider. As a mobile communications application, Libon processes billions of messages and calls while backing up billions of contacts. Join this webinar to learn best practices and pitfalls to avoid when migrating from a relational database (RDBMS) to Cassandra, and how Libon is now able to ingest massive volumes of high-velocity data with read and write latencies below 10 milliseconds.
11. #Cassandra @doanduyhai
Project Context
• Application grew over the years
• Already using Cassandra to handle events
• messaging / file sharing / SMS / notifications
• Cassandra R/W latencies ≈ 0.4 ms
• server response time under 10 ms
16. #Cassandra @doanduyhai
Project Context
• About contacts …
• stored as relational model in RDBMS (Oracle)
• 1 user ≈ 300 contacts
• with millions of users ☞ billions of contacts to handle
• query latency unpredictable
24. #Cassandra @doanduyhai
Fixing the problem
• Tune the RDBMS
• indices
• partitioning
• fewer joins, simplified relational model
• increased hardware capacity
That worked
but …
30. #Cassandra @doanduyhai
Next Challenges
• High Availability (DB failure, site failure …)
• Predictable performance at scale
• Going multi-data-center
☞ Cassandra, what else?
40. #Cassandra @doanduyhai
Strategy
• 4 phases
• Write contacts to both data stores
• Migrate old contacts
• Switch to Cassandra (but keep the RDBMS in case of…)
• Remove the RDBMS code
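The phase-1 double-write above can be sketched as follows. This is a minimal illustration, not Libon's actual code: class and store names are hypothetical, and the RDBMS stays the source of truth until the switch.

```python
# Sketch of phase 1: every contact write goes to both stores,
# while reads still come from the RDBMS (authoritative until phase 3).

class MemStore:
    """Stand-in for either data store."""
    def __init__(self):
        self.data = {}

    def save(self, user_id, contact):
        self.data.setdefault(user_id, []).append(contact)

    def load(self, user_id):
        return self.data.get(user_id, [])


class DualWriteContactService:
    def __init__(self, rdbms, cassandra):
        self.rdbms = rdbms          # legacy store, still authoritative
        self.cassandra = cassandra  # new store, being warmed up

    def save_contact(self, user_id, contact):
        self.rdbms.save(user_id, contact)  # must succeed: source of truth
        try:
            self.cassandra.save(user_id, contact)  # best effort in phase 1
        except Exception:
            pass  # a miss here is repaired later by the phase-2 migration batch

    def get_contacts(self, user_id):
        return self.rdbms.load(user_id)  # reads switch over only later
```

A failed Cassandra write is tolerable here precisely because phase 2 re-migrates any contact still missing its `contact_uuid`.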
44. #Cassandra @doanduyhai
Migration Phase 2
• On live production, migrate old contacts
• For each batch of users, read from the RDBMS the old contacts created before phase 1:
SELECT * FROM contacts
WHERE user_id = …
AND contact_uuid IS NULL
• Write them to the Cassandra cluster as logged batches, with a timestamp set in the past:
INSERT INTO contacts(..)
VALUES(…)
USING TIMESTAMP now() - 1 week
48. #Cassandra @doanduyhai
Migration Phase 2
• During data migration …
• … concurrent writes from the migration batch …
• … and updates from production for the same contact
49. #Cassandra @doanduyhai
Migration Phase 2
• Two concurrent writes to the same contact's name column:
• the migration batch inserts "Johny" with timestamp now - 1 week (a write "to the past")
• the production update writes "Johnny" with the current timestamp
51. #Cassandra @doanduyhai
Last Write Win in action
• Case 1: batch writes "Johny" (past timestamp) at t1, production writes "Johnny" (current timestamp) at t2 ☞ the read at t3 returns "Johnny"
• Case 2: the two writes arrive in the opposite order ☞ the read at t3 still returns "Johnny"
• The production value wins either way: last-write-wins resolves on timestamps, not on arrival order
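The timestamp trick can be simulated with a toy last-write-wins cell. This is a sketch of the resolution rule only, not Cassandra's storage code; the timestamp values are arbitrary microsecond counts.

```python
ONE_WEEK_US = 7 * 24 * 3600 * 1_000_000  # Cassandra timestamps are in microseconds


class LwwCell:
    """One column value resolved with last-write-wins on the write timestamp."""

    def __init__(self):
        self.value, self.timestamp = None, -1

    def write(self, value, timestamp):
        if timestamp > self.timestamp:  # the newer timestamp wins
            self.value, self.timestamp = value, timestamp

    def read(self):
        return self.value


now = 1_700_000_000_000_000  # arbitrary "current" timestamp, in microseconds

# Case 1: migration batch writes first, production update arrives later
cell = LwwCell()
cell.write("Johny", now - ONE_WEEK_US)  # batch insert USING TIMESTAMP now - 1 week
cell.write("Johnny", now)               # concurrent production update
assert cell.read() == "Johnny"

# Case 2: production updates first, batch insert arrives later
cell = LwwCell()
cell.write("Johnny", now)
cell.write("Johny", now - ONE_WEEK_US)  # batch write loses despite arriving last
assert cell.read() == "Johnny"
```

Because the batch deliberately writes one week into the past, any production update of the same contact always carries the newer timestamp and survives.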
59. #Cassandra @doanduyhai
Code Inventory
• Written for RDBMS
• Lots of joins (no surprise)
• Designed around transactions
• Spring @Transactional everywhere
81. #Cassandra @doanduyhai
Outcome
• 5 months of work for two developers
• Many iterations to fix bugs (thanks to IT)
• Lots of performance benchmarks using Gatling
☞ data model & code validation
• … we are almost there for production
84. #Cassandra @doanduyhai
Denormalization, the bad
• Updating mutable data can be a nightmare
• Data model bound by the existing client-facing API
• Update paths very error-prone without tests
86. #Cassandra @doanduyhai
Data model in detail
Contacts_by_id
Contacts_by_identifiers
Contacts_in_profiles
Contacts_by_modification_date
Contacts_by_firstname_lastname
Contacts_linked_user
☞ user_id is always a component of the partition key
89. #Cassandra @doanduyhai
Bloom filters in action
• For some tables, partition key = (user_id, contact_id)
☞ fast look-up, leverages Bloom filters
☞ touches 1 SSTable most of the time
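A toy Bloom filter illustrates why this look-up is cheap: for an absent partition key the filter says "definitely not here" without touching the SSTable, and it never produces false negatives. A minimal sketch only; Cassandra sizes its real filters per SSTable.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: 'definitely absent' or 'maybe present'."""

    def __init__(self, size_bits=1024, hashes=3):
        self.size, self.hashes = size_bits, hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key):
        # Derive k bit positions from salted SHA-256 digests of the key
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


bf = BloomFilter()
bf.add((42, "contact-1"))          # partition key (user_id, contact_id)
assert bf.might_contain((42, "contact-1"))  # present keys are always found
# absent keys are overwhelmingly likely to be rejected without any disk read
```

With a fine-grained partition key like (user_id, contact_id), most reads are rejected by every SSTable's filter but one, which is why a look-up typically touches a single SSTable.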
90. #Cassandra @doanduyhai
Data model in detail
Contacts_by_id
Contacts_by_identifiers
Contacts_in_profiles
Contacts_by_modification_date
Contacts_by_firstname_lastname
Contacts_linked_user
Wide partition
95. #Cassandra @doanduyhai
Data model summary
• 7 tables for denormalization
• Some tables kept normalized because they are rarely accessed
• Read-before-write in most update scenarios 😟
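The read-before-write pattern falls out of denormalization whenever the updated field is part of another table's key. A sketch with two of the tables above as plain dicts (function names and fields are illustrative, not Libon's code):

```python
# Two denormalized "tables" keyed differently on the same contact data
contacts_by_id = {}                  # (user_id, contact_id) -> contact fields
contacts_by_firstname_lastname = {}  # (user_id, firstname, lastname) -> contact_id


def create_contact(user_id, contact_id, firstname, lastname):
    contacts_by_id[(user_id, contact_id)] = {
        "firstname": firstname, "lastname": lastname,
    }
    contacts_by_firstname_lastname[(user_id, firstname, lastname)] = contact_id


def rename_contact(user_id, contact_id, firstname, lastname):
    # Read before write: the old name row can only be deleted if we know
    # the old name, and only contacts_by_id can tell us what it was.
    old = contacts_by_id[(user_id, contact_id)]
    del contacts_by_firstname_lastname[
        (user_id, old["firstname"], old["lastname"])
    ]
    contacts_by_id[(user_id, contact_id)] = {
        "firstname": firstname, "lastname": lastname,
    }
    contacts_by_firstname_lastname[(user_id, firstname, lastname)] = contact_id
```

Skipping the read would leave an orphaned row under the old name in the by-name table, which is exactly the class of update-path bug the slide warns is error-prone without tests.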
96. #Cassandra @doanduyhai
Notes on contact_id
• In SQL, an auto-generated long using a sequence
• In Cassandra, an auto-generated timeuuid
99. #Cassandra @doanduyhai
Notes on contact_id
• How to store both types?
• As text? ☞ easy solution …
• … but a waste of space!
• because encoded as UTF-8 or ASCII in Cassandra
107. #Cassandra @doanduyhai
Notes on contact_id
• ☞ just save the contact id as byte[]
• Achilles @TypeTransformer for automatic conversion
(see later)
• Use blobAsBigint() or blobAsUuid() to view the data
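The space argument can be checked with a quick sketch (Python here for illustration; the project's real code is Java with Achilles): a legacy long fits in 8 raw bytes and a timeuuid in 16, versus 36 ASCII characters for the textual UUID form.

```python
import struct
import uuid


def long_to_bytes(contact_id: int) -> bytes:
    """Legacy RDBMS id: 8-byte big-endian long, matching bigint's binary form."""
    return struct.pack(">q", contact_id)


def uuid_to_bytes(contact_id: uuid.UUID) -> bytes:
    """New timeuuid id: its 16 raw bytes."""
    return contact_id.bytes


legacy = long_to_bytes(123456789)
fresh = uuid_to_bytes(uuid.uuid1())
assert len(legacy) == 8 and len(fresh) == 16

# versus text: the canonical UUID string is 36 ASCII characters,
# more than twice the raw binary form
assert len(str(uuid.uuid1())) == 36
```

Since both encodings land in the same blob column, functions like blobAsBigint() and blobAsUuid() recover a readable value when querying by hand.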
114. #Cassandra @doanduyhai
Achilles
• Are you going to manually write 56+ prepared statements for all possible updates?
• Or just use dynamic plain-string statements and pay a performance penalty?
129. #Cassandra @doanduyhai
Achilles
• Dynamic logging in action
2014-12-01 14:25:20,554 Bound statement : [INSERT INTO
contacts.contacts_by_modification_date(user_id,month_bucket,modification_date,...) VALUES
(:user_id,:month_bucket,:modification_date,...) USING TTL :ttl;] with CONSISTENCY LEVEL [LOCAL_QUORUM]
2014-12-01 14:25:20,554 bound values : [222130151, 2014-12, e13d0d50-7965-11e4-af38-90b11c2549e0, ...]
2014-12-01 14:25:20,701 Bound statement : [SELECT birthday,middlename,avatar_size,... FROM
contacts.contacts_by_modification_date WHERE user_id=:user_id AND month_bucket=:month_bucket AND
(modification_date)>=(:modification_date) ORDER BY modification_date ASC;] with CONSISTENCY LEVEL
[LOCAL_QUORUM]
2014-12-01 14:25:20,701 bound values : [222130151, 2014-10, be6bc010-6109-11e4-b385-000038377ead]
130. #Cassandra @doanduyhai
Achilles
• Dynamic logging
• runtime activation
• no need to recompile/redeploy
• saved us hours of debugging
• TRACE log level ☞ query tracing
136. #Cassandra @doanduyhai
Conditions for success
• Data modeling is crucial
• Double-run strategy & timestamp trick FTW
• Data type conversion can be tricky
• Benchmark!
• Mindset shifts for the team