2. • Cassandra -- What makes it different?
• Who’s using it, and for what?
• DIY Real Time Analytics on Cassandra
• The Easy Option -- Acunu Analytics
21. e.g. "Show me the number of mentions of 'Acunu' per day, between May and November 2011, on Twitter."

A batch (Hadoop) approach would require processing ~30 billion tweets, or ~4.2 TB of data.
http://blog.twitter.com/2011/03/numbers.html

The Cassandra approach: for each tweet, increment a bunch of counters, such that answering a query is as easy as reading some counters back.
Analytics
23. Tweets arrive as a time-ordered stream:

12:32:15  I like #trafficlights
12:33:43  Nobody expects...
12:33:49  I ate a #bee; woe is...
12:34:04  Man, @acunu rocks!

Each tweet increments one counter per (time bucket, term):

[1234, man]   +1
[1234, acunu] +1
[1234, rock]  +1

The row key is the 'big' time bucket; the column key is the 'small' time bucket:

Key               | 00:01 | 00:02 | ...
[01/05/11, acunu] |   3   |   5   | ...
[02/05/11, acunu] |  12   |   4   | ...
...               |  ...  |  ...  | ...
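The counter scheme above can be sketched in a few lines of Python. This is a minimal in-memory illustration, with a dict standing in for Cassandra's counter columns; the tokenisation and bucket formats are simplified assumptions, not the real ingest pipeline:

```python
from collections import defaultdict

# In-memory stand-in for Cassandra counter columns:
# row key = (day, term), column key = minute-of-day, value = count.
counters = defaultdict(lambda: defaultdict(int))

def ingest(day, hhmmss, text):
    """Increment one counter per term in the tweet's minute bucket."""
    minute = hhmmss[:5]                      # 'small' bucket, e.g. "12:34"
    for term in text.lower().replace("#", "").replace("@", "").split():
        counters[(day, term)][minute] += 1   # 'big' bucket lives in the row key

def mentions_per_day(term, days):
    """Answering the query is just reading some counters back."""
    return {day: sum(counters[(day, term)].values()) for day in days}

ingest("01/05/11", "12:34:04", "Man, @acunu rocks!")
ingest("01/05/11", "12:35:10", "#acunu again")
print(mentions_per_day("acunu", ["01/05/11"]))   # → {'01/05/11': 2}
```

Note that the write path does all the work; the read path never touches the raw tweets, which is what makes the query cheap regardless of tweet volume.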
24. Each existing solution comes with a con:
• Scalability
• $$$
• Not real time
• Spartan query semantics: complex, DIY solutions
25. Acunu Analytics

High-velocity event streams (HTTP JSON, MQ, Flume). As events are ingested:
■ Update real-time views
■ Refresh dashboards
■ Preserve the original event data

[Diagram: streams of raw event data flowing into Acunu Analytics]

Dashboards and API deliver pre-computed results:
■ Roll-ups
■ Drilldowns
■ Trends

Provide definitions and query real-time views via the RESTful HTTP API, command-line tools, or the UI query builder:

create table foo (
  x long,
  y string,
  t time(hour, min),
  z path('/')
);

create view select sum(x) from foo where y group by z;
create view select count from foo where x, t group by t;
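To make the view definitions concrete, here is a hedged sketch of how a view like `select sum(x) from foo where y group by z` can be maintained incrementally as JSON events arrive. Plain Python dicts stand in for the real storage; this is an illustration of the idea, not Acunu's actual implementation:

```python
from collections import defaultdict

# Incremental maintenance of:
#   create view select sum(x) from foo where y group by z
# For each (y, z) pair we keep a running sum of x, so querying is a lookup.
sum_x_by_y_z = defaultdict(lambda: defaultdict(int))

def ingest_event(event):
    """Called once per incoming JSON event; updates the view in real time."""
    sum_x_by_y_z[event["y"]][event["z"]] += event["x"]

def query(y):
    """select sum(x) from foo where y = ? group by z"""
    return dict(sum_x_by_y_z[y])

ingest_event({"x": 3, "y": "uk", "z": "/home",  "t": "12:00"})
ingest_event({"x": 4, "y": "uk", "z": "/home",  "t": "12:05"})
ingest_event({"x": 5, "y": "uk", "z": "/about", "t": "12:07"})
print(query("uk"))   # → {'/home': 7, '/about': 5}
```

The `where y` clause becomes a row key and the `group by z` clause becomes the column keys, mirroring the big-bucket/small-bucket layout from the earlier slides.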
38-45. The cube is just rows of counters; each query is answered by reading from one row:

Row key    | Counters
21:00      | all→1345   :00→45     :01→62     :02→87    ...
22:00      | all→3222   :00→22     :01→19     :02→105   ...
...        |
UK         | all→229    user01→2   user14→12  user99→7  ...
US         | all→354    user01→4   user04→8   user56→17 ...
...        |
UK, 22:00  | all→1905   ...
∅          | all→87315  UK→239     US→354     ...

Example queries, each served from a single row:
• count(*) where time 21:00-22:00 → the 'all' counter of row [21:00]
• count(*) where time 22:00-23:00, group by minute → the minute columns of row [22:00]
• count(*) where geography=UK, group all by user → the user columns of row [UK]
• count all → the 'all' counter of row [∅]
• count all, group all by geo → the geo columns of row [∅]

Analytics
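A minimal sketch of this cube of counters: every event increments several rows at once, one per filter combination, and each query then reads a single row. The field names and bucket formats here are illustrative assumptions:

```python
from collections import defaultdict

# Cube of counters: one row per filter combination, one column per group key.
cube = defaultdict(lambda: defaultdict(int))

def ingest(hour, minute, geo, user):
    """Each event fans out to every row it contributes to."""
    cube[(hour,)]["all"] += 1       # row [21:00]: total for that hour
    cube[(hour,)][minute] += 1      # ...plus a per-minute column
    cube[(geo,)]["all"] += 1        # row [UK]: total for that geography
    cube[(geo,)][user] += 1         # ...plus a per-user column
    cube[(geo, hour)]["all"] += 1   # row [UK, 22:00]: combined filter
    cube[()]["all"] += 1            # row [∅]: global total
    cube[()][geo] += 1              # ...plus a per-geo column

ingest("22:00", ":00", "UK", "user01")
ingest("22:00", ":01", "UK", "user14")
ingest("22:00", ":01", "US", "user04")

# count(*) where time 22:00-23:00, group by minute: read one row's columns.
print(dict(cube[("22:00",)]))   # → {'all': 3, ':00': 1, ':01': 2}
```

The fan-out cost is paid once per event on the write path; every query in the table above is then a constant-size read, which is the whole point of the design.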
46. APPROXIMATE AGGREGATES
Fast probabilistic data structures for COUNT UNIQUE and TOP n, to trade accuracy for performance - predictably.
[Chart: accuracy vs. performance trade-off]

DRILLDOWN TO ORIGINAL EVENTS
Identify the root causes of aggregate results.

TRENDING AND CORRELATION
Proactively identify deviation from baseline and breaks from trends.

HIERARCHICAL AGGREGATES
Automatic handling of paths, timestamps and geospatial queries.
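One classic way to trade accuracy for performance on COUNT UNIQUE is a k-minimum-values (KMV) sketch, which estimates distinct counts in fixed memory. The version below is a generic textbook illustration, not Acunu's implementation; larger `k` buys more accuracy, predictably:

```python
import hashlib

class KMV:
    """k-minimum-values sketch: estimate COUNT UNIQUE in O(k) memory."""
    def __init__(self, k=256):
        self.k = k
        self.mins = []   # the k smallest normalised hashes seen so far

    def _hash(self, item):
        # Map each item to a pseudo-uniform value in [0, 1).
        h = hashlib.sha1(str(item).encode()).hexdigest()
        return int(h[:15], 16) / float(16 ** 15)

    def add(self, item):
        x = self._hash(item)
        if x in self.mins:       # duplicates hash identically; ignore them
            return
        self.mins.append(x)
        self.mins.sort()
        del self.mins[self.k:]   # keep only the k smallest

    def estimate(self):
        if len(self.mins) < self.k:
            return len(self.mins)           # exact: saw < k distinct items
        return int((self.k - 1) / self.mins[-1])

sketch = KMV(k=64)
for i in range(10000):
    sketch.add("user%d" % (i % 1000))       # 1000 distinct users, 10x over
print(sketch.estimate())                    # ≈ 1000; error shrinks with k
```

The sketch never stores the items themselves, so memory stays constant no matter how many events stream through, which is exactly the accuracy-for-performance trade the slide describes.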
48. Shameless plug

REAL-TIME BIG DATA ANALYTICS, POWERED BY NOSQL
■ Roll up and transform cubes in real time
■ Leverage NoSQL for write-optimization, schema freedom, and horizontal scalability

CASSANDRA ENHANCED FOR HIGHER DENSITY, LOWER TCO
■ Enhanced Cassandra for higher density, better scalability, simpler management
■ 'Single pane of glass' management UI

STORAGE CRAFTED FOR BIG DATA
■ Castle: an in-kernel storage engine designed and optimised for NoSQL databases

[Stack diagram: Dashboards UI / JSON APIs → Acunu Analytics → Enhanced Cassandra → Castle storage engine → commodity HW or cloud]
50. THANK YOU

@acunu
@timmoreton

Apache, Apache Cassandra, Cassandra, Hadoop, and the eye and elephant logos are trademarks of the Apache Software Foundation.