Speaker: Eric Zoerner, Senior Software Developer at eBuddy
Video: http://www.youtube.com/watch?v=fwgCJ2MzakA&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=12
In this session you'll learn about the design and implementation of a new open source general-purpose Java library that supports storing structured data in Cassandra. Instead of mapping the data to multiple tables like an ORM would or embedding data using serialization, this approach decomposes structured data of arbitrary complexity into separate columns of simple values, allowing the data to be retrieved or updated in parts using hierarchical paths. Implementations are included for Cassandra using both the Thrift and CQL3 APIs. In addition, Eric's experiences are shared regarding the challenges of using CQL3 vs. Thrift for schema-less data.
#CASSANDRAEU
CASSANDRASUMMITEU
8. Cassandra in eBuddy Messaging Platform
• User Data Service
• User Discovery Service
• Persistent Session Store
• Message History
• Location-based Discovery
9. Some Statistics
• Current size of data
– 1.4 TB total (replication of 3x); 467 GB actual data
• 12 million sessions (11 million users plus groups)
• Almost a billion rows in one column family (inverse social graph)
16. C* Path
Open Source Java Library for decomposing complex objects into Path-Value pairs and storing them in Cassandra
https://github.com/ebuddy/c-star-path
Artifacts available at Maven Central.
19. C* Path: Decomposition
• Easy to Use
• Simple API
• Good for Cassandra because:
– Structural Access: Write parts of objects without reading first
– Good for denormalizing data: can read or write large complex objects with one read or write operation
20. How does it work?
23. API Example - Write to a Path
StructuredDataSupport<UUID> dao = … ;
UUID rowKey = … ;
Pojo pojo = … ;

// Build a hierarchical path from its elements
Path path = dao.createPath("some", "path", "to", "my", "pojo");

// Decompose the POJO and write it under that path
dao.writeToPath(rowKey, path, pojo);
25. API Example - Read from a Path
Path path = dao.createPath("some", "path", "to", "my", "pojo");

// The TypeReference tells Jackson which type to rebuild
Pojo pojo = dao.readFromPath(rowKey, path, new TypeReference<Pojo>() { });
26. API Example - Delete
dao.deletePath(rowKey, path);
29. Read or write at any level of a path
Person person = …;

Path path = dao.createPath("x");
dao.writeToPath(rowKey, path, person);

// Read back only the name, one level below the object root
Path pathToName = path.withElements("name");
String name = dao.readFromPath(rowKey, pathToName, stringTypeReference);
32. Write Implementation: Decomposition
• Step 1:
– Convert the domain object into a basic structure of Maps, Lists, and simple values. The library uses Jackson (FasterXML) for this and honors the Jackson annotations.
• Step 2:
– Decompose this basic structure into a map of paths to simple values (i.e. String, Number, Boolean), done by the Decomposer
• Step 3:
– Write this map as key-value pairs in the database
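Step 2 can be sketched in a few lines of plain Java. This is only an illustration of the idea, not the library's Decomposer: the class name DecomposerSketch and its method are hypothetical, step 1 is assumed to have already produced the Map/List structure, and the URL encoding of path elements is left out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of step 2: flatten nested Maps and Lists into path-value pairs. */
final class DecomposerSketch {

    /** Returns a map from slash-delimited paths (starting with prefix) to simple values. */
    static Map<String, Object> decompose(String prefix, Object structure) {
        Map<String, Object> out = new LinkedHashMap<>();
        walk(prefix, structure, out);
        return out;
    }

    private static void walk(String path, Object node, Map<String, Object> out) {
        if (node instanceof Map) {
            // One path element per map key
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                walk(path + e.getKey() + "/", e.getValue(), out);
            }
        } else if (node instanceof List) {
            // List positions become @0, @1, ... path elements
            List<?> list = (List<?>) node;
            for (int i = 0; i < list.size(); i++) {
                walk(path + "@" + i + "/", list.get(i), out);
            }
        } else {
            // Simple value (String, Number, Boolean): store it under the full path
            out.put(path, node);
        }
    }
}
```

Each entry of the returned map corresponds to one column (Thrift) or one row in the partition (CQL) in step 3.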
33. Example Decomposition - step 1
Simplify structure into regular Maps, Lists, and simple values
[Class diagram: Person (name: String, birthdate: Date, nickname: String) with 1-to-* associations to Address (street: String, city: String, province: String, postalCode: String, countryCode: String) and Phone (name: String, number: String)]
38. Read implementation: Composition
• Step 1:
– Read path-value pairs from database
• Step 2:
– “Merge” path-value maps back into basic structure (Maps, Lists, simple values), done by Composer
• Step 3:
– Use Jackson to convert basic structure back into domain object using a TypeReference
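The merge in step 2 could look roughly like the following. This is a sketch under assumptions, not the library's Composer: the class name ComposerSketch is hypothetical, paths are assumed to be slash-delimited with @N elements for list indices, and URL decoding is omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of read step 2: merge path-value pairs back into Maps and Lists. */
final class ComposerSketch {

    /** Builds a nested structure from paths such as "x/address/street/". The root is always a Map. */
    @SuppressWarnings("unchecked")
    static Map<String, Object> compose(Map<String, Object> pathValues) {
        Map<String, Object> root = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : pathValues.entrySet()) {
            // split() drops the empty element after the trailing slash
            String[] elements = entry.getKey().split("/");
            Map<String, Object> node = root;
            for (int i = 0; i < elements.length - 1; i++) {
                Object child = node.get(elements[i]);
                if (!(child instanceof Map)) {
                    child = new LinkedHashMap<String, Object>();
                    node.put(elements[i], child);
                }
                node = (Map<String, Object>) child;
            }
            node.put(elements[elements.length - 1], entry.getValue());
        }
        convertLists(root);
        return root;
    }

    /** Replaces every nested map whose keys are all @0, @1, ... with a List ordered by index. */
    @SuppressWarnings("unchecked")
    private static Object convertLists(Object node) {
        if (!(node instanceof Map)) {
            return node;
        }
        Map<String, Object> map = (Map<String, Object>) node;
        boolean allIndices = !map.isEmpty();
        for (Map.Entry<String, Object> entry : map.entrySet()) {
            entry.setValue(convertLists(entry.getValue()));
            allIndices &= entry.getKey().matches("@\\d+");
        }
        if (!allIndices) {
            return map;
        }
        TreeMap<Integer, Object> byIndex = new TreeMap<>();
        for (Map.Entry<String, Object> entry : map.entrySet()) {
            byIndex.put(Integer.parseInt(entry.getKey().substring(1)), entry.getValue());
        }
        return new ArrayList<>(byIndex.values());
    }
}
```

The resulting structure is what step 3 would hand to Jackson for conversion into the domain object.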
40. Path Encoding
• Paths stored as strings
• Forward slashes in paths (but hidden by Path API)
• Path elements are internally URL encoded, allowing use of special characters in the implementation
• Special characters: @ for list indices (@0, @1, @2, ...)
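The encoding rules above can be illustrated with the JDK's URLEncoder. The class PathEncodingSketch is hypothetical, and the convention that an Integer element denotes a list index is an assumption of this sketch; the library's Path API may differ. Because URL encoding escapes both "/" and "@" inside ordinary elements, the delimiter and the list-index marker stay unambiguous.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical sketch of the path encoding: URL-encoded elements joined by slashes. */
final class PathEncodingSketch {

    /** Integer elements are treated as list indices (@N); everything else is URL encoded. */
    static String encode(Object... elements) {
        StringBuilder sb = new StringBuilder();
        for (Object element : elements) {
            if (element instanceof Integer) {
                sb.append('@').append(element); // list index marker, written unencoded
            } else {
                sb.append(urlEncode(String.valueOf(element)));
            }
            sb.append('/');
        }
        return sb.toString();
    }

    private static String urlEncode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

For example, an element containing a slash, such as "a/b", is stored as a%2Fb and cannot be mistaken for two path elements.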
45. Challenge: “Shrinking Lists”
✔ Solution: The implementation writes a list terminator value.
Unfortunately, this is only a partial solution, because it is still possible to read “stale” list elements using a positional index in the path.
This can be avoided by doing a delete before a write, but for performance reasons the library will not do that automatically.
Conclusion: The user must know what they are doing and understand the implementation.
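A small in-memory simulation shows both the fix and its limit. Everything here is an assumption made for illustration: the row is modelled as a sorted map, the terminator is a made-up sentinel string, and ShrinkingListSketch is not part of the library. Overwriting a three-element list with a one-element list leaves the old columns in place; the terminator lets a list read stop early, but a direct read of an old index still returns the stale value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical simulation of the list terminator, with a Map standing in for a row. */
final class ShrinkingListSketch {

    /** Made-up sentinel; the value the library actually writes is not shown in the slides. */
    static final String TERMINATOR = "\uFFFF";

    /** Writes each element under path/@i/ and a terminator after the last one. No delete is done. */
    static void writeList(Map<String, String> row, String path, List<String> values) {
        for (int i = 0; i < values.size(); i++) {
            row.put(path + "@" + i + "/", values.get(i));
        }
        row.put(path + "@" + values.size() + "/", TERMINATOR);
    }

    /** Reads elements in index order until the terminator (or a missing index) is reached. */
    static List<String> readList(Map<String, String> row, String path) {
        List<String> result = new ArrayList<>();
        for (int i = 0; ; i++) {
            String value = row.get(path + "@" + i + "/");
            if (value == null || TERMINATOR.equals(value)) {
                return result;
            }
            result.add(value);
        }
    }
}
```

After writing a, b, c and then d to the same path, readList returns only d, while the column at index 2 still holds c.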
47. Challenge: Inconsistent Updates
Because objects can be updated at any path, there is no protection against a write “corrupting” an object structure

Path path = dao.createPath("x");
dao.writeToPath(key, path, person1);

x/address/street/        “Singel 45”
x/name/                  “John”

✘
path = dao.createPath("x", "name");
dao.writeToPath(key, path, person1);

x/address/street/        “Singel 45”
x/name/                  “John”
x/name/address/street/   “Singel 45”
x/name/name/             “John”
48. Challenge: Inconsistent Updates
✔ Solution: Don’t do that!
If it does happen, the implementation provides a way to still get the “corrupted” data as simple structures, but an attempt to convert it to a now-incompatible POJO will fail.
Conclusion: The user must know what they are doing and understand the implementation.
51. Issue: Sorting
Question: What about sorting path elements as something other than strings, such as numerical or time-based UUID elements?
Instead of storing paths as strings, the implementation could have used DynamicComposite.
We tried it.
52. Issue: Sorting
It can work. CQL supports it as a user-defined type.
Unfortunately, it causes cqlsh to crash, making it difficult to “browse” the data.
53. Issue: Sorting
Using DynamicComposite for paths in a future version is still under consideration.
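The limitation behind this question is easy to reproduce. The sketch below sorts a few string paths the way a text comparator would; the class name PathSortingSketch is hypothetical, and Java's String ordering is used as a stand-in for Cassandra's text ordering, which agrees with it for ASCII paths like these.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical demo: paths stored as strings sort lexicographically, not numerically. */
final class PathSortingSketch {

    /** Returns the paths in plain string order. */
    static List<String> sortedAsStrings(List<String> paths) {
        return new ArrayList<>(new TreeSet<>(paths));
    }
}
```

Given x/@1/, x/@2/ and x/@10/, the string order places x/@10/ before x/@2/, which is why numerical or time-based elements would need a typed comparator.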
55. Thrift
Column family:
row key    column name          column value
<UUID>     x/address/street/    “Singel 45”
           x/name               “John”
           …                    …
- OR -
Super column family (coming soon):
row key    super column name    column name        column value
<UUID>     x                    address/street/    “Singel 45”
                                name               “John”
                                …                  …
56. Thrift
Thrift implementation relies on the Hector client.

ColumnFamilyOperations<K,String,Object> operations =
    new ColumnFamilyTemplate<K,String,Object>(
        keyspace, KeySerializer, StringSerializer, StructureSerializer);

StructuredDataSupport<K> dao = new ThriftStructuredDataSupport<K>(operations);
57. CQL
CREATE TABLE person (
    key text,
    path text,
    value text,
    PRIMARY KEY (key, path)
)
• Cannot use the path itself as a column name because it is “dynamic”
• Dynamic column family
58. CQL: Data Model Constraints
• Need to do a range (“slice”) query on the path, so the path must be a clustering key
• Also, the path must be the first clustering key, since otherwise we would have to provide an equals condition on the preceding clustering keys in a query.
• One might try putting a secondary index on the path instead of making it a clustering key, but this doesn’t work, since Cassandra indexes only work with equals conditions:
Bad Request: No indexed columns present in by-columns clause with Equal operator
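One way to express such a slice is a prefix range on the clustering column. The sketch below builds the CQL text and the two bounds for a path prefix; it is an assumption about how the range could be formed, not the query the library issues, and PathRangeSketch and its methods are hypothetical names. The upper bound is obtained by incrementing the last character of the prefix, so "x/" gives "x0".

```java
/** Hypothetical sketch: a prefix ("slice") range query on the path clustering column. */
final class PathRangeSketch {

    /** CQL with bind markers for the partition key, inclusive start and exclusive end. */
    static String sliceQuery(String table, String keyColumn, String pathColumn, String valueColumn) {
        return "SELECT " + pathColumn + ", " + valueColumn
                + " FROM " + table
                + " WHERE " + keyColumn + " = ?"
                + " AND " + pathColumn + " >= ?"
                + " AND " + pathColumn + " < ?";
    }

    /** Inclusive lower bound: the prefix itself. */
    static String rangeStart(String pathPrefix) {
        return pathPrefix;
    }

    /** Exclusive upper bound: the non-empty prefix with its last character incremented. */
    static String rangeEnd(String pathPrefix) {
        int last = pathPrefix.length() - 1;
        return pathPrefix.substring(0, last) + (char) (pathPrefix.charAt(last) + 1);
    }
}
```

With the person table above, the statement would be bound to the row key, "x/" and "x0" to fetch every path stored under x.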
59. CQL
CQL implementation relies on the DataStax Java driver.

// Constructor parameters shown with their types
StructuredDataSupport<K> dao =
    new CqlStructuredDataSupport<K>(String tableName,
                                    String partitionKeyColumnName,
                                    String pathColumnName,
                                    String valueColumnName,
                                    Session session);
61. Planned Features
• Sets with simple values: element values stored in path
• DynamicComposites?
• Multiple row reads and writes
• Slice queries on path ranges
62. Credits and Acknowledgements
• Thanks to Joost van de Wijgerd at eBuddy for his ideas and feedback
• Jackson JSON Processor, which is core to the C* Path implementation: http://wiki.fasterxml.com/JacksonHome
• Image credits:
Slide              image name    author        link
Some Strategies    binary        noegranado    http://www.flickr.com/photos/43360884@N04/6949896929/