For the last several years Cassandra has been the heavyweight in the NoSQL space. But its massive scalability was accompanied by a bare-bones feature set, a substantial learning curve, and a Thrift-based RPC mechanism that left newbies bewildered by a sea of potential client libraries, all with their own fragmented semantics. Over the last year that’s all changed, culminating in the recently unveiled Cassandra 2.0. In this talk I’ll bring you up to speed on Cassandra Query Language, cursors, the new native libraries, lightweight transactions, virtual nodes, and loads of other new goodies. Whether you’re completely new to Cassandra or a seasoned veteran who wants the latest scoop, this talk has something for you.
Big Data Grows Up - A (re)introduction to Cassandra
1. Big Data Grows Up
A (re)introduction to Cassandra
Robbie Strickland
2. Who am I?
Robbie Strickland
Software Development Manager
The Weather Channel
rostrickland@gmail.com
@dont_use_twitter
3. Who am I?
● Cassandra user/contributor since 2010
● … it was at release 0.5 back then
● 4 years? Oracle DBAs aren’t impressed
● Done lots of dumb stuff with Cassandra
● … and some really awesome stuff too
13. What’s still the same?
● Still not an RDBMS
● Still no joins (see above)
● Still no ad-hoc queries (see above again)
● Still requires a denormalized data model (^^)
● Still need to know what the heck you’re doing
15. The old way
● 1 token per node
● Assigned manually
● Adding nodes == reassignment of all tokens
● Node rebuild heavily taxes a few nodes
[Diagram: ring labeled A through F, captioned "cluster with no vnodes"]
16. … enter Vnodes
● n tokens per node
● Assigned magically
● Adding nodes == painless
● Node rebuild distributed across many nodes
[Diagram: ring labeled A through N, captioned "cluster with vnodes"]
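To make the rebuild difference concrete, here is a minimal Scala sketch of a token ring. It is an illustration under stated assumptions, not Cassandra's actual implementation: `VnodeSketch`, `buildRing`, `owner`, and `streamingSources` are hypothetical names, tokens are plain random 64-bit values, and replication is ignored (as if the replication factor were 1).

```scala
import scala.util.Random

object VnodeSketch {
  // Build a ring: every node picks `tokensPerNode` random tokens.
  def buildRing(nodes: Seq[String], tokensPerNode: Int, rnd: Random): Map[Long, String] =
    (for (n <- nodes; _ <- 1 to tokensPerNode) yield (rnd.nextLong(), n)).toMap

  // The owner of a position is the node holding the first ring token >= it,
  // wrapping around to the smallest ring token.
  def owner(ring: Map[Long, String], token: Long): String = {
    val sorted = ring.keys.toSeq.sorted
    val t = sorted.find(_ >= token).getOrElse(sorted.head)
    ring(t)
  }

  // Nodes that must hand data to a joining node: the current owners of
  // the positions where the newcomer's tokens land.
  def streamingSources(ring: Map[Long, String], newTokens: Seq[Long]): Set[String] =
    newTokens.map(t => owner(ring, t)).toSet
}
```

With one token per node, the newcomer's single token lands in exactly one existing range, so one node does all the streaming. With many tokens per node, the newcomer's tokens scatter around the ring and the work is shared.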
23. Death of a (Thrift) Salesman
Or, how to build a killer data store without a crappy interface
24. Reasons not to ditch Thrift
● Lots of client libraries still use it
● You finally got it installed
● You didn’t know there was another choice
● It sucks less than many alternatives
25. … in spite of all those benefits, you really should ditch Thrift because:
● It requires your entire result set to fit into RAM on both client and server
● The native protocol is better, faster, and supports all the new features
● Thrift-based client libraries are always a step behind
● It’s going away eventually
26. … and did I mention ...
It requires your entire result set to fit into RAM on both client and server!!!
29. Native protocol
● It’s binary, making it lighter weight
● It supports cursors (FTW!)
● It supports prepared statements
● Cluster awareness built-in
● Either synchronous or asynchronous ops
● Only supports CQL-based operations
● Can be used side-by-side with Thrift
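Cursors are what lift the "entire result set in RAM" restriction. The idea can be sketched in plain Scala with no driver at all. Everything here is hypothetical: `PagingSketch`, `FetchPage`, and `paged` are illustrative names, and the offset-based `fetch` function merely stands in for one server round trip (a real driver tracks an opaque paging state for you).

```scala
object PagingSketch {
  // Hypothetical stand-in for one server round trip: returns up to `size`
  // rows starting at `offset`.
  type FetchPage[A] = (Int, Int) => Seq[A]

  // Lazily walks the whole result set, fetching the next page only when
  // the previous one has been consumed.
  def paged[A](fetch: FetchPage[A], pageSize: Int): Iterator[A] =
    Iterator
      .iterate(0)(_ + pageSize)               // offsets: 0, pageSize, 2 * pageSize, ...
      .map(offset => fetch(offset, pageSize)) // one round trip per page
      .takeWhile(page => page.nonEmpty)       // an empty page means no more rows
      .flatMap(page => page.iterator)
}
```

The caller just iterates; neither side ever needs to materialize more than one page at a time.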
31. Native query example
import com.datastax.driver.core.Cluster

// Connect first: the builder needs build() before connect()
val cluster = Cluster.builder().addContactPoints(host1, host2, host3).build()
val session = cluster.connect()

// Prepare once, then bind and execute as often as needed
val insert = session.prepare("INSERT INTO myKsp.myTable (myKey, col1, col2) VALUES (?,?,?)")
val select = session.prepare("SELECT * FROM myKsp.myTable WHERE myKey = ?")

session.execute(insert.bind(myKey, col1, col2))
val result = session.execute(select.bind(myKey))
32. Wait, was that SQL?!!
Or, how to make Cassandra more awesome
while simultaneously irritating early adopters
33. Introducing CQL3
● Because the first two attempts sucked
● Stands for “Cassandra Query Language”
● Looks a heck of a lot like SQL
● … but isn’t
● Substantially lowers the learning curve
● … but also makes it easier to screw up
● An abstraction over the storage rows
34. Storage rows
[default@unknown] create keyspace Library;
[default@unknown] use Library;
[default@Library] create column family Books
... with comparator=UTF8Type
... and key_validation_class=UTF8Type
... and default_validation_class=UTF8Type;
[default@Library] set Books['Patriot Games']['author'] = 'Tom Clancy';
[default@Library] set Books['Patriot Games']['year'] = '1987';
[default@Library] list Books;
RowKey: Patriot Games
=> (name=author, value=Tom Clancy, timestamp=1393102991499000)
=> (name=year, value=1987, timestamp=1393103015955000)
35. Storage rows - composites
[default@Library] create column family Authors
... with key_validation_class=UTF8Type
... and comparator='CompositeType(LongType,UTF8Type,UTF8Type)'
... and default_validation_class=UTF8Type;
[default@Library] set Authors['Tom Clancy']['1987:Patriot Games:publisher'] = 'Putnam';
[default@Library] set Authors['Tom Clancy']['1987:Patriot Games:ISBN'] = '0-399-13241-4';
[default@Library] set Authors['Tom Clancy']['1993:Without Remorse:publisher'] = 'Putnam';
[default@Library] set Authors['Tom Clancy']['1993:Without Remorse:ISBN'] = '0-399-13825-0';
[default@Library] list Authors;
RowKey: Tom Clancy
=> (name=1987:Patriot Games:ISBN, value=0-399-13241-4, timestamp=1393104011458000)
=> (name=1987:Patriot Games:publisher, value=Putnam, timestamp=1393103948577000)
=> (name=1993:Without Remorse:ISBN, value=0-399-13825-0, timestamp=1393104109214000)
=> (name=1993:Without Remorse:publisher, value=Putnam, timestamp=1393104083773000)
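CQL3 being "an abstraction over the storage rows" can be pictured as a pivot over composite column names like the ones listed above. The following Scala sketch is a rough model, not Cassandra's code: `StorageRowSketch`, `Cell`, `LogicalRow`, and `toLogicalRows` are hypothetical names, and it assumes the logical table is keyed by (author, year, title).

```scala
object StorageRowSketch {
  // One storage cell: a composite column name (year, title, field) plus a value.
  case class Cell(year: Long, title: String, field: String, value: String)

  // One logical row, roughly as a CQL client would see it.
  case class LogicalRow(author: String, year: Long, title: String, columns: Map[String, String])

  // Pivot the cells of one wide storage row into logical rows: the leading
  // name components identify the row, the last component names the column.
  def toLogicalRows(rowKey: String, cells: Seq[Cell]): Seq[LogicalRow] =
    cells
      .groupBy(c => (c.year, c.title))
      .toSeq
      .sortBy(_._1)
      .map { case ((year, title), cs) =>
        LogicalRow(rowKey, year, title, cs.map(c => c.field -> c.value).toMap)
      }
}
```

Feeding in the four 'Tom Clancy' cells yields two logical rows, one per book, each with `ISBN` and `publisher` columns.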
40. Keys and Filters
● Ad hoc queries are NOT supported
● Query by key
● Key must include all potential filter columns
● Must include partition key in filter
● Subsequent filters must be in order
● Only last filter can be a range
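The rules above can be written down as a small checker. This is a simplified sketch of the rules as stated on the slide, not the real CQL validator: `FilterRules`, `Filter`, and `check` are hypothetical names, and secondary indexes are deliberately ignored.

```scala
object FilterRules {
  case class Filter(column: String, isRange: Boolean = false)

  // Returns None if the WHERE clause is allowed, or Some(reason) if not.
  def check(partitionKey: Seq[String], clustering: Seq[String], filters: Seq[Filter]): Option[String] = {
    val byColumn = filters.map(f => f.column -> f).toMap
    val keyColumns = partitionKey ++ clustering
    val unknown = filters.map(_.column).filterNot(c => keyColumns.contains(c))
    // Clustering columns must be restricted as an unbroken prefix, in key order.
    val restricted = clustering.takeWhile(c => byColumn.contains(c))
    val skipped = clustering.drop(restricted.size).filter(c => byColumn.contains(c))
    // A range is only allowed on the last restricted clustering column.
    val badRange = (partitionKey ++ restricted.dropRight(1))
      .filter(c => byColumn.get(c).exists(_.isRange))

    if (unknown.nonEmpty) Some("not part of the key: " + unknown.mkString(", "))
    else if (!partitionKey.forall(c => byColumn.contains(c))) Some("partition key must be in the filter")
    else if (skipped.nonEmpty) Some("filters out of order: " + skipped.mkString(", "))
    else if (badRange.nonEmpty) Some("only the last filter can be a range: " + badRange.mkString(", "))
    else None
  }
}
```

For a table with PRIMARY KEY (author, year), filtering on `author` alone or on `author` plus a range on `year` passes, while filtering on `year` alone or on `title` is rejected.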
41. Example - Books table
CREATE TABLE Books (
title varchar,
author varchar,
year int,
PRIMARY KEY (title)
)
42. Example - Books table
CREATE TABLE Books (
title varchar,
author varchar,
year int,
PRIMARY KEY (author, title)
)
43. Example - Books table
CREATE TABLE Books (
title varchar,
author varchar,
year int,
PRIMARY KEY (author, year)
)
44. Example - Books table
CREATE TABLE Books (
title varchar,
author varchar,
year int,
PRIMARY KEY (year, author)
)
46. Example - Books table
CREATE TABLE Books (
title varchar,
author varchar,
year int,
PRIMARY KEY (author)
)
CREATE INDEX Books_year ON Books(year)
47. Composite Partition Keys
● PRIMARY KEY((year, author), title)
● Creates a more granular shard key
● Can be useful to make certain queries more efficient, or to better distribute data
● Updates sharing a partition key are atomic and isolated
48. Example - Books table
CREATE TABLE Books (
title varchar,
author varchar,
year int,
PRIMARY KEY ((year, author), title)
)
49. Example - Books table
CREATE TABLE Books (
title varchar,
author varchar,
year int,
PRIMARY KEY (year, author, title)
)
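The only difference between the last two definitions is the extra pair of parentheses, which changes how rows are grouped into partitions. A minimal Scala sketch of that grouping follows. `PartitionSketch` and its methods are hypothetical names, and in-memory maps stand in for partitions.

```scala
object PartitionSketch {
  case class Book(title: String, author: String, year: Int)

  // PRIMARY KEY ((year, author), title): one partition per (year, author) pair,
  // with rows ordered by title inside it.
  def compositePartitions(books: Seq[Book]): Map[(Int, String), Seq[String]] =
    books.groupBy(b => (b.year, b.author)).map { case (k, bs) => k -> bs.map(_.title).sorted }

  // PRIMARY KEY (year, author, title): one partition per year,
  // with rows ordered by (author, title) inside it.
  def singlePartitions(books: Seq[Book]): Map[Int, Seq[(String, String)]] =
    books.groupBy(_.year).map { case (k, bs) => k -> bs.map(b => (b.author, b.title)).sorted }
}
```

With the composite key, every (year, author) pair becomes its own shard. With the plain key, all books from one year share a partition, which keeps them together for range scans by author but concentrates more data on one replica set.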
62. Antipattern
CREATE TABLE WorkQueue (
name varchar,
time bigint,
workItem varchar,
PRIMARY KEY (name, time)
)
… do a bunch of inserts ...
SELECT * FROM WorkQueue WHERE name='ToDo' ORDER BY time ASC;
DELETE FROM WorkQueue WHERE name='ToDo' AND time=[some_time];
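The slide does not spell out why this hurts. The commonly cited reason is that a delete in Cassandra writes a tombstone rather than removing data immediately, so reads at the head of the queue have to step over everything already deleted. The Scala sketch below is a toy model of that effect under this assumption: `QueueTombstoneSketch`, `Cell`, `delete`, and `head` are hypothetical names, and tombstone cleanup by compaction is not modeled.

```scala
object QueueTombstoneSketch {
  // A cell in the 'ToDo' partition: live work, or a tombstone (None) left by DELETE.
  case class Cell(time: Long, workItem: Option[String])

  // In this model DELETE does not remove the cell; it marks it as a tombstone.
  def delete(partition: Vector[Cell], time: Long): Vector[Cell] =
    partition.map(c => if (c.time == time) c.copy(workItem = None) else c)

  // Reading the head of the queue scans in time order and must step over
  // every tombstone before reaching the first live item.
  def head(partition: Vector[Cell]): (Option[Cell], Int) = {
    val skipped = partition.takeWhile(_.workItem.isEmpty).size
    (partition.drop(skipped).headOption, skipped)
  }
}
```

After 999 of 1,000 items have been processed and deleted, fetching the next item still walks past 999 dead cells in this model.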
67. Primer
● Supports basic Compare-and-Set ops
● Provides linearizable consistency
● … aka serial isolation
● Uses “Paxos light” under the hood
● Still expensive -- four round trips!
● For most cases quorum reads/writes will be sufficient
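The arithmetic behind "quorum reads/writes will be sufficient" is short enough to write out. This Scala sketch uses the standard majority formula; `QuorumSketch`, `quorum`, and `overlaps` are hypothetical helper names.

```scala
object QuorumSketch {
  // A quorum is a majority of the replicas.
  def quorum(replicationFactor: Int): Int = replicationFactor / 2 + 1

  // A read is guaranteed to touch at least one replica that saw the latest
  // write when the read and write replica counts add up to more than RF.
  def overlaps(reads: Int, writes: Int, replicationFactor: Int): Boolean =
    reads + writes > replicationFactor
}
```

With a replication factor of 3, a quorum is 2, and 2 + 2 > 3, so quorum reads see quorum writes. What quorum alone does not give you is an atomic check-then-write, which is the gap compare-and-set fills.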
68. Usage
INSERT INTO Users (login, name)
VALUES ('rs_atl', 'Robbie Strickland')
IF NOT EXISTS;
UPDATE Users
SET password='super_secure_password'
WHERE login='rs_atl'
IF reset_token='some_reset_token';
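The semantics of the two statements above can be modeled in a few lines of Scala. This is a single-threaded sketch of the conditional logic only. It says nothing about how Paxos reaches agreement between replicas, and `CasSketch`, `insertIfNotExists`, and `updateIf` are hypothetical names. Each function returns whether the change was applied, along with the resulting table.

```scala
object CasSketch {
  type Row = Map[String, String]

  // INSERT ... IF NOT EXISTS: applies only when no row has this key.
  def insertIfNotExists(table: Map[String, Row], key: String, row: Row): (Boolean, Map[String, Row]) =
    if (table.contains(key)) (false, table) else (true, table + (key -> row))

  // UPDATE ... IF column = expected: applies only when the current value matches.
  def updateIf(table: Map[String, Row], key: String, set: Row,
               column: String, expected: String): (Boolean, Map[String, Row]) =
    table.get(key) match {
      case Some(row) if row.get(column).contains(expected) =>
        (true, table + (key -> (row ++ set)))
      case _ => (false, table)
    }
}
```

A second insert for the same login, or an update with the wrong reset token, comes back with `false` and leaves the table untouched.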