3. Our Problem
Good, bad doctors? Dead doctors?
Prescriber eligibility and remediation.
4. The World-Wide, Globally Scalable Naughty List!
How about a Naughty and Nice list for Santa?
1.9 billion children
That will fit in a single row!
Queries to support:
Children can log in and check their standing.
Santa can find nice children by country, state, or zip.
6. Installation
As easy as…
Download
http://cassandra.apache.org/download/
Uncompress
tar -xvzf apache-cassandra-1.2.0-beta3-bin.tar.gz
Run
bin/cassandra -f
(-f puts it in foreground)
7. Configuration
conf/cassandra.yaml
start_native_transport: true # CHANGE THIS TO TRUE
commitlog_directory: /var/lib/cassandra/commitlog
conf/log4j-server.properties
log4j.appender.R.File=/var/log/cassandra/system.log
8. Data Model
Schema (a.k.a. Keyspace)
Table (a.k.a. Column Family)
Row
Can have an arbitrary number of columns
Validator for row keys (e.g. UTF8Type)
Column
Validator for values
Comparator for column names (e.g. DateType or BYOC)
(http://www.youtube.com/watch?v=bKfND4woylw)
9. Distributed Architecture
Nodes form a token ring.
Nodes partition the ring by initial token
initial_token: (in cassandra.yaml)
Partitioners map row keys to tokens.
Usually randomly, to evenly distribute the data
All columns for a row are stored together on disk, in sorted order.
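To make "partitioners map row keys to tokens" concrete, here is a minimal sketch of an MD5-based random partitioner in the spirit of Cassandra's RandomPartitioner. The class and method names are hypothetical and this is not Cassandra's source; it only shows how hashing spreads keys and discards their ordering.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical random partitioner sketch: the token is derived from an MD5
// hash of the row key, so similar keys end up far apart on the ring.
class TokenSketch {
    static BigInteger token(String rowKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(rowKey.getBytes(StandardCharsets.UTF_8));
            // Read the 128-bit digest as a signed integer and take its absolute
            // value, giving a token between 0 and 2^127.
            return new BigInteger(digest).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        for (String key : new String[] {"nice:USA", "nice:IRL", "naughty:USA"}) {
            System.out.println(key + " -> " + token(key));
        }
    }
}
```

Because neighboring keys such as "nice:USA" and "nice:IRL" hash to unrelated tokens, their rows tend to land on different nodes. That evens out the data, and it is also why scanning a range of rows is expensive.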
10. Visually
Token ring over hash range 0-99; one node's range is labeled (1-33)
Row    Hash
Alice  50
Bob    3
Eve    15
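The figure's numbers can be replayed with a sorted map of initial tokens. This sketch assumes three nodes with hypothetical initial tokens 33, 66 and 99 (only the (1-33) range appears on the slide). Each node owns the hashes after the previous node's token up to and including its own, wrapping around at the end of the ring.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical token ring: initial token -> node name.
class RingSketch {
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(int initialToken, String node) {
        ring.put(initialToken, node);
    }

    String ownerOf(int hash) {
        // The owner is the first node whose token is >= the hash; past the
        // highest token, wrap around to the node with the lowest token.
        Map.Entry<Integer, String> e = ring.ceilingEntry(hash);
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        RingSketch ring = new RingSketch();
        ring.addNode(33, "node1"); // assumed token
        ring.addNode(66, "node2"); // assumed token
        ring.addNode(99, "node3"); // assumed token
        System.out.println("Alice(50) -> " + ring.ownerOf(50)); // node2
        System.out.println("Bob(3)    -> " + ring.ownerOf(3));  // node1
        System.out.println("Eve(15)   -> " + ring.ownerOf(15)); // node1
    }
}
```

Under these assumed tokens, Bob and Eve share the node that owns (1-33), while Alice lands on the next node.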
11. Java Interpretation
Each table is a Distributed HashMap
Each row is a SortedMap.
Cassandra provides a massively scalable version of:
HashMap<rowKey, SortedMap<columnKey, columnValue>>
Implications:
Direct row fetch is fast.
Searching a range of rows can be costly.
Searching a range of columns is cheap.
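A single-JVM sketch of that nested map, with hypothetical names, showing the three access patterns listed above. Cassandra distributes and persists the outer map; this only mirrors the shape of each operation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical in-memory model of HashMap<rowKey, SortedMap<columnKey, columnValue>>.
class TableSketch {
    private final Map<String, SortedMap<String, String>> rows = new HashMap<>();

    void put(String rowKey, String columnKey, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(columnKey, value);
    }

    // Direct row fetch: a single hash lookup.
    SortedMap<String, String> getRow(String rowKey) {
        SortedMap<String, String> row = rows.get(rowKey);
        return row != null ? row : new TreeMap<String, String>();
    }

    // Column range within one row: a view over an already sorted map.
    SortedMap<String, String> sliceColumns(String rowKey, String fromInclusive, String toExclusive) {
        return getRow(rowKey).subMap(fromInclusive, toExclusive);
    }

    // Row range: row keys are hashed rather than kept in order, so every
    // row key has to be checked.
    int countRowsBetween(String fromInclusive, String toExclusive) {
        int count = 0;
        for (String key : rows.keySet()) {
            if (key.compareTo(fromInclusive) >= 0 && key.compareTo(toExclusive) < 0) {
                count++;
            }
        }
        return count;
    }
}
```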
13. Two Tables
Children Table
Store all the children in the world.
One row per child.
One column per attribute.
NaughtyOrNice Table
Supports the queries we anticipate
Wide-Row Strategy
14. Details of the NaughtyOrNice List
One row per standing:country
Ensures all children in a country are grouped together on disk.
One column per child using a compound key
Ensures the columns are sorted to support our search at varying levels of granularity
○ e.g. All nice children in the US.
○ e.g. All naughty children in PA.
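A sketch of the key layout, assuming colon-joined strings like the ones the CLI prints and assuming no component contains ':'. The helper names are hypothetical. A query such as "all nice children in the US" reads the whole Nice:USA row, while "all naughty children in PA" is a prefix slice of the Naughty:USA row.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical helpers for the NaughtyOrNice layout: the row key is
// standing:country and each child is one column named state:zip:childId.
class NaughtyOrNiceSketch {
    static String rowKey(String standing, String country) {
        return standing + ":" + country;
    }

    static String columnName(String state, String zip, String childId) {
        return state + ":" + zip + ":" + childId;
    }

    // All columns starting with "prefix:". ';' is the character right after ':',
    // so the half-open range [prefix + ":", prefix + ";") covers exactly that prefix.
    static NavigableSet<String> slice(NavigableSet<String> row, String prefix) {
        return row.subSet(prefix + ":", true, prefix + ";", false);
    }

    public static void main(String[] args) {
        NavigableSet<String> naughtyUsa = new TreeSet<>();
        naughtyUsa.add(columnName("CA", "94111", "bart.simpson"));
        naughtyUsa.add(columnName("CA", "94222", "dennis.menace"));
        naughtyUsa.add(columnName("PA", "18964", "michael.myers"));

        System.out.println(rowKey("naughty", "USA")); // naughty:USA
        System.out.println(slice(naughtyUsa, "PA"));  // [PA:18964:michael.myers]
    }
}
```

Narrowing further to a zip code is the same call with a longer prefix, e.g. "CA:94111".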
15. Visually
Node 1: row Nice:USA
  CA:94333:johny.b.good
  CA:94333:richie.rich
Node 2: row Nice:IRL
  D:EI33:collin.oneill
  D:EI33:owen.oneill
Node 3: row Naughty:USA
  CA:94111:bart.simpson
  CA:94222:dennis.menace
  PA:18964:michael.myers
To query: (1) Go to the row. (2) Get the column slice.
Watch out for:
• Hot spotting
• Unbalanced Clusters
16. Our Schema
bin/cqlsh -3
CREATE KEYSPACE northpole WITH replication = {'class':'SimpleStrategy',
'replication_factor':1};
create table children ( childId varchar, firstName varchar, lastName varchar, timezone varchar,
country varchar, state varchar, zip varchar, primary key (childId ) ) WITH COMPACT STORAGE;
create table naughtyOrNiceList ( standingByCountry varchar, state varchar, zip varchar,
childId varchar, primary key (standingByCountry, state, zip, childId) );
bin/cassandra-cli
(the “old school” interface)
17. The CQL -> Data Model Rules
The first component of the primary key becomes the row key.
The remaining components of the primary key form a composite column name.
One column is then written for each non-primary-key column.
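The rules above, written out as a hypothetical Java helper that turns one CQL row's primary key values into a storage row key and column names. Composite names are shown colon-joined, as cassandra-cli prints them. The single column with an empty last component for tables without non-primary-key columns is an assumption based on the trailing ':' visible in the CLI output for naughtyornicelist.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the CQL -> storage mapping rules.
class CqlMappingSketch {
    // Rule 1: the first primary key component becomes the row key.
    static String rowKey(List<String> primaryKeyValues) {
        return primaryKeyValues.get(0);
    }

    // Rules 2 and 3: the remaining primary key components form a composite
    // prefix, and one column is written per non-primary-key column.
    static List<String> columnNames(List<String> primaryKeyValues, List<String> nonKeyColumns) {
        String joined = String.join(":", primaryKeyValues.subList(1, primaryKeyValues.size()));
        String prefix = joined.isEmpty() ? "" : joined + ":";
        List<String> names = new ArrayList<>();
        if (nonKeyColumns.isEmpty()) {
            // Assumed marker column whose last component is empty (the trailing ':').
            names.add(prefix);
        }
        for (String column : nonKeyColumns) {
            names.add(prefix + column);
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> pk = Arrays.asList("naughty:USA", "CA", "94111", "bart.simpson");
        System.out.println(rowKey(pk));                               // naughty:USA
        System.out.println(columnNames(pk, new ArrayList<String>())); // [CA:94111:bart.simpson:]
    }
}
```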
18. CQL View
cqlsh:northpole> select * from naughtyornicelist ;
standingbycountry | state | zip | childid
-------------------+-------+-------+---------------
naughty:USA | CA | 94111 | bart.simpson
naughty:USA | CA | 94222 | dennis.menace
nice:IRL | D | EI33 | collin.oneill
nice:IRL | D | EI33 | owen.oneill
nice:USA | CA | 94333 | johny.b.good
nice:USA | CA | 94333 | richie.rich
19. CLI View
[default@northpole] list naughtyornicelist;
Using default limit of 100
Using default column limit of 100
-------------------
RowKey: naughty:USA
=> (column=CA:94111:bart.simpson:, value=, timestamp=1355168971612000)
=> (column=CA:94222:dennis.menace:, value=, timestamp=1355168971614000)
-------------------
RowKey: nice:IRL
=> (column=D:EI33:collin.oneill:, value=, timestamp=1355168971604000)
=> (column=D:EI33:owen.oneill:, value=, timestamp=1355168971601000)
-------------------
RowKey: nice:USA
=> (column=CA:94333:johny.b.good:, value=, timestamp=1355168971610000)
=> (column=CA:94333:richie.rich:, value=, timestamp=1355168971606000)
20. Data Model Implications
select * from children where childid='owen.oneill';
select * from naughtyornicelist where childid='owen.oneill';
Bad Request:
select * from naughtyornicelist where
standingbycountry='nice:IRL' and state='D' and zip='EI33'
and childid='owen.oneill';
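The difference between the rejected query and the accepted one can be seen in the same nested-map terms. In this hypothetical model of naughtyornicelist (names are illustrative), a fully specified key costs two lookups, while a bare childId forces a read of every column in every row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical model: row key (standing:country) -> sorted column names (state:zip:childId).
class QuerySketch {
    final Map<String, NavigableSet<String>> rows = new HashMap<>();

    void add(String rowKey, String state, String zip, String childId) {
        rows.computeIfAbsent(rowKey, k -> new TreeSet<>()).add(state + ":" + zip + ":" + childId);
    }

    // Full key given: one row lookup plus one lookup in a sorted set.
    boolean exists(String rowKey, String state, String zip, String childId) {
        NavigableSet<String> row = rows.get(rowKey);
        return row != null && row.contains(state + ":" + zip + ":" + childId);
    }

    // Only childId given: nothing narrows the search, so every column of
    // every row must be read.
    List<String> rowsContainingChild(String childId) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, NavigableSet<String>> row : rows.entrySet()) {
            for (String column : row.getValue()) {
                if (column.endsWith(":" + childId)) {
                    found.add(row.getKey());
                }
            }
        }
        return found;
    }
}
```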
22. No, seriously. Let's code!
What API should we use?
API             Production-Readiness  Potential  Momentum
Thrift          10                    -1         -1
Hector          10                    8          8
Astyanax        8                     9          10
Kundera (JPA)   6                     9          9
Pelops          7                     6          7
Firebrand       8                     10         8
PlayORM         5                     8          7
GORA            6                     9          7
CQL Driver      ?                     ?          ?
Astyanax FTW!
23. Connect
this.astyanaxContext = new AstyanaxContext.Builder()
    .forCluster("ClusterName")
    .forKeyspace(keyspace)
    .withAstyanaxConfiguration(…)
    .withConnectionPoolConfiguration(…)
    .buildKeyspace(ThriftFamilyFactory.getInstance());
this.astyanaxContext.start(); // start the context (and its connection pool) before use
Specify:
Cluster Name (arbitrary identifier)
Keyspace
Node Discovery Method
Connection Pool Information
24. Write/Update
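MutationBatch mutation = keyspace.prepareMutationBatch();
ColumnFamily<String, String> columnFamily = new ColumnFamily<String, String>(columnFamilyName,
    StringSerializer.get(), StringSerializer.get());
ColumnListMutation<String> row = mutation.withRow(columnFamily, rowKey);
for (Map.Entry<String, String> entry : columns.entrySet()) { // columns: hypothetical Map of name -> value
    row.putColumn(entry.getKey(), entry.getValue(), null); // null TTL: the column does not expire
}
mutation.execute(); // may throw ConnectionException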
Process:
Create a mutation
Specify the Column Family with Serializers
Put your columns.
Execute
25. Composite Types
Composite (a.k.a. Compound)
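import com.netflix.astyanax.annotations.Component;

// Composite column name for the NaughtyOrNice list; ordinals set the sort order.
public class ListEntry {
    @Component(ordinal = 0)
    public String state;
    @Component(ordinal = 1)
    public String zip;
    @Component(ordinal = 2)
    public String childId;
}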
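Without Astyanax on the classpath, the ordering that the @Component ordinals imply can be sketched with a plain Comparable. This is a hypothetical stand-in, not the annotated class: it compares state first, then zip, then childId, the way a composite column name is compared component by component (for ASCII data, String.compareTo is a reasonable proxy for the per-component comparison).

```java
import java.util.TreeSet;

// Hypothetical stand-in for ListEntry: ordered component by component.
// equals/hashCode are omitted because it is only used in sorted collections here.
class CompositeKey implements Comparable<CompositeKey> {
    final String state;   // ordinal 0
    final String zip;     // ordinal 1
    final String childId; // ordinal 2

    CompositeKey(String state, String zip, String childId) {
        this.state = state;
        this.zip = zip;
        this.childId = childId;
    }

    @Override
    public int compareTo(CompositeKey other) {
        int c = state.compareTo(other.state);
        if (c != 0) {
            return c;
        }
        c = zip.compareTo(other.zip);
        if (c != 0) {
            return c;
        }
        return childId.compareTo(other.childId);
    }

    @Override
    public String toString() {
        return state + ":" + zip + ":" + childId;
    }

    public static void main(String[] args) {
        TreeSet<CompositeKey> row = new TreeSet<>();
        row.add(new CompositeKey("PA", "18964", "michael.myers"));
        row.add(new CompositeKey("CA", "94222", "dennis.menace"));
        row.add(new CompositeKey("CA", "94111", "bart.simpson"));
        System.out.println(row); // [CA:94111:bart.simpson, CA:94222:dennis.menace, PA:18964:michael.myers]
    }
}
```

Component-wise ordering is not the same as ordering colon-joined strings: zip "9411" sorts before "94111" as a component, yet "CA:9411:x" sorts after "CA:94111:y" as a plain string, because ':' follows '1'.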
26. Range Builders
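range = entitySerializer.buildRange()
    .withPrefix(state)          // first component (state) must match exactly
    .greaterThanEquals("")      // second component (zip) from ""...
    .lessThanEquals("99999");   // ...up to and including "99999"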
Then...
.withColumnRange(range).execute();
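A JDK-only model of what that range selects, with hypothetical names: fix the state prefix, bound the zip between "" and "99999" inclusive, and accept any childId. The slice is a seek to the first matching column followed by a forward scan, which is what keeps column ranges within a row cheap.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical model of a composite range query over one row.
class RangeSketch {
    static final class Col {
        final String state, zip, childId;

        Col(String state, String zip, String childId) {
            this.state = state;
            this.zip = zip;
            this.childId = childId;
        }

        @Override
        public String toString() {
            return state + ":" + zip + ":" + childId;
        }
    }

    // Component-by-component ordering, like a composite column comparator.
    static final Comparator<Col> ORDER = Comparator
            .comparing((Col c) -> c.state)
            .thenComparing(c -> c.zip)
            .thenComparing(c -> c.childId);

    static List<Col> slice(NavigableSet<Col> row, String state, String zipFrom, String zipTo) {
        List<Col> out = new ArrayList<>();
        // Seek: "" is the smallest string, so this is the first possible
        // column with the given state and zip >= zipFrom.
        for (Col c : row.tailSet(new Col(state, zipFrom, ""), true)) {
            // Scan forward until the prefix changes or zip passes the inclusive bound.
            if (!c.state.equals(state) || c.zip.compareTo(zipTo) > 0) {
                break;
            }
            out.add(c);
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<Col> row = new TreeSet<>(ORDER);
        row.add(new Col("CA", "94111", "bart.simpson"));
        row.add(new Col("CA", "94222", "dennis.menace"));
        row.add(new Col("PA", "18964", "michael.myers"));
        System.out.println(slice(row, "CA", "", "99999")); // [CA:94111:bart.simpson, CA:94222:dennis.menace]
    }
}
```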
28. CQL Collections!
http://www.datastax.com/dev/blog/cql3_collections
Set
UPDATE users SET emails = emails + {'fb@friendsofmordor.org'} WHERE
user_id = 'frodo';
List
UPDATE users SET top_places = [ 'the shire' ] + top_places WHERE
user_id = 'frodo';
Maps
UPDATE users SET todo['2012-10-2 12:10'] = 'die' WHERE user_id =
'frodo';
30. Let's get back to cranking…
Recreate the schema (to be CQL friendly)
UPDATE children SET toys = toys + [ 'legos' ] WHERE childId = 'owen.oneill';
Crank out a DAO layer that uses CQL collection operations.
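A minimal, hypothetical starting point for such a DAO that only builds the CQL text for the list operations, assuming the recreated children table has a list<varchar> column named toys. The class and method names are illustrative; a real DAO would more likely pass values as bind parameters through the driver than concatenate strings.

```java
// Hypothetical DAO helper that builds CQL statements for the toys list.
class ChildrenDao {
    // CQL string literals are single-quoted; an embedded quote is escaped by doubling it.
    static String literal(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    // Appends one toy to the end of the toys list.
    static String addToyStatement(String childId, String toy) {
        return "UPDATE children SET toys = toys + [ " + literal(toy) + " ] WHERE childId = "
                + literal(childId) + ";";
    }

    // Removes every occurrence of the toy from the toys list.
    static String removeToyStatement(String childId, String toy) {
        return "UPDATE children SET toys = toys - [ " + literal(toy) + " ] WHERE childId = "
                + literal(childId) + ";";
    }

    public static void main(String[] args) {
        System.out.println(addToyStatement("owen.oneill", "legos"));
        // UPDATE children SET toys = toys + [ 'legos' ] WHERE childId = 'owen.oneill';
    }
}
```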
31. Shameless Shoutout(s)
Virgil
https://github.com/boneill42/virgil
REST interface for Cassandra
https://github.com/boneill42/storm-cassandra
Distributed Processing on Cassandra
(Webinar in January)