Learn how to model data beyond traditional direct access in Apache Cassandra, using the DataStax platform to harness the power of Spark and Solr for search, analytics, and complex operations performed in place on your Cassandra data!
13. Multi-Data Center Replication
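[Figure: an Application connected to two data centers, each drawn as a token ring with replication factor = 3. Data Center 1 nodes hold tokens 10, 20, 30, 40, 50, 60, 70, 80; Data Center 2 nodes hold tokens 11, 21, 31, 41, 51, 61, 71, 81. A row whose key hashes to token 43, hash(key) => token(43), is stored on three nodes in each data center.]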
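Each data center places its own replicas by walking its ring clockwise from the key's token. Below is a minimal sketch of that walk in plain Scala. It assumes one token per node (no vnodes) and no rack awareness, and it assumes that a node with token T owns the range that ends at T. `replicasFor` is a hypothetical helper, not Cassandra's NetworkTopologyStrategy code.

```scala
// Simplified replica placement on one data center's token ring.
// Assumptions: one token per node, no racks, node with token T owns (prev, T].
def replicasFor(token: Long, ring: Seq[Long], rf: Int): Seq[Long] = {
  val sorted = ring.sorted
  // First node whose token is >= the key's token; wrap around to the start if none.
  val start = sorted.indexWhere(_ >= token) match {
    case -1 => 0
    case i  => i
  }
  // Take rf consecutive nodes clockwise (capped at the ring size).
  (0 until math.min(rf, sorted.size)).map(k => sorted((start + k) % sorted.size))
}

val dc1 = Seq(10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L)
val dc2 = Seq(11L, 21L, 31L, 41L, 51L, 61L, 71L, 81L)

// Replication factor 3 in each data center, placed independently per ring.
val placement = Map(
  "DC1" -> replicasFor(43L, dc1, 3),
  "DC2" -> replicasFor(43L, dc2, 3)
)
placement.foreach { case (dc, nodes) => println(s"$dc -> ${nodes.mkString(", ")}") }
```

With the figure's tokens, token 43 lands on nodes 50, 60, 70 in Data Center 1 and on nodes 51, 61, 71 in Data Center 2.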
14. How does DSE integrate Solr?
[Figure: C* nodes serving transactional workloads alongside C*/Solr nodes serving search.]
16. SELECT *
FROM killrvideo.videos
WHERE solr_query='{
"q": "{!edismax qf=\"name^2 tags^1 description\"}datastax"
}';
SELECT id, value
FROM keyspace.table
WHERE token(id) >= -3074457345618258601
AND token(id) <= 3074457345618258603
AND solr_query='id:*'
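Token-range predicates let a client read one slice of the ring at a time, which is how a full scan can be split across workers. Below is a minimal sketch that assumes the default Murmur3Partitioner (tokens from -2^63 to 2^63 - 1). `splitRing` and `scanQuery` are hypothetical helpers, and they reuse the slide's placeholder names `keyspace.table`, `id`, and `value`. The slice boundaries the sketch computes need not match the literal values on the slide.

```scala
// Full Murmur3Partitioner token range.
val MinToken = BigInt(Long.MinValue)
val MaxToken = BigInt(Long.MaxValue)

// Split the ring into n contiguous, non-overlapping [lo, hi] slices.
// BigInt avoids overflow; the last slice absorbs any remainder.
def splitRing(n: Int): Seq[(Long, Long)] = {
  require(n >= 1, "need at least one slice")
  val width = (MaxToken - MinToken + 1) / n
  (0 until n).map { i =>
    val lo = MinToken + width * i
    val hi = if (i == n - 1) MaxToken else lo + width - 1
    (lo.toLong, hi.toLong)
  }
}

// Render the slide's query for one slice.
def scanQuery(lo: Long, hi: Long): String =
  "SELECT id, value FROM keyspace.table " +
    s"WHERE token(id) >= $lo AND token(id) <= $hi AND solr_query='id:*'"

splitRing(3).foreach { case (lo, hi) => println(scanQuery(lo, hi)) }
```

Running every slice's query covers each token exactly once, so the slices can run in parallel without duplicating or missing rows.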
39. Behind the scenes…
// Videos by id
CREATE TABLE videos (
videoid uuid,
userid uuid,
name text,
description text,
location text,
location_type int,
preview_image_location text,
tags set<text>,
added_date timestamp,
PRIMARY KEY (videoid)
);
// Index for tag keywords
CREATE TABLE videos_by_tag (
tag text,
videoid uuid,
added_date timestamp,
userid uuid,
name text,
preview_image_location text,
tagged_date timestamp,
PRIMARY KEY (tag, videoid)
);
Not a great idea
Possible Index
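Cassandra does not maintain `videos_by_tag` automatically. The application writes one row per tag, copying the columns that a "videos by tag" page needs. Below is a minimal sketch in plain Scala. The case classes `Video` and `VideoByTag` are hypothetical mirrors of a subset of the two tables' columns, and `tagRows` is a hypothetical helper.

```scala
import java.time.Instant
import java.util.UUID

// Hypothetical in-memory mirrors of (a subset of) the videos and videos_by_tag columns.
final case class Video(videoid: UUID, userid: UUID, name: String,
                       previewImageLocation: String, tags: Set[String],
                       addedDate: Instant)

final case class VideoByTag(tag: String, videoid: UUID, addedDate: Instant,
                            userid: UUID, name: String,
                            previewImageLocation: String, taggedDate: Instant)

// Fan-out: one videos_by_tag row per tag, so "videos with tag X" becomes
// a single-partition read (WHERE tag = ?). Tags are sorted for stable output.
def tagRows(v: Video, taggedAt: Instant): Seq[VideoByTag] =
  v.tags.toSeq.sorted.map { t =>
    VideoByTag(t, v.videoid, v.addedDate, v.userid, v.name,
               v.previewImageLocation, taggedAt)
  }
```

The cost of this design is write amplification and keeping the copies in sync. The benefit is that each tag lookup reads one partition instead of scanning `videos`.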
40. // Videos by id
CREATE TABLE videos (
videoid uuid,
userid uuid,
name text,
description text,
location text,
location_type int,
preview_image_location text,
tags set<text>,
added_date timestamp,
PRIMARY KEY (videoid)
);
And this? This? This?
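42. 1) Spin up a new C* cluster with search enabled using the DSE installer. On a tarball install, start each node in search mode with -s (package installs instead set SOLR_ENABLED=1 in /etc/default/dse and run sudo service dse start).
$ dse cassandra -s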
2) Run your schema DDL to create the C* keyspace and tables.
3) Run dsetool on the videos table to create a Solr core for it.
$ dsetool create_core killrvideo.videos generateResources=true
4) Use the Solr Admin UI to sanity-check that the core exists.
5) Write a CQL query with a Solr search in it (? stands for the search terms).
SELECT * FROM killrvideo.videos
WHERE solr_query='{ "q": "{!edismax qf=\"name^2 tags^1 description\"}?" }';
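Building this query by hand is error-prone because the edismax parameters need double quotes inside a JSON string, which itself sits inside a CQL string literal. Below is a minimal sketch of a hypothetical helper, `videoSearchCql`, that applies both layers of escaping. It assumes the JSON form of `solr_query` shown on the slides. It only escapes backslashes and double quotes for JSON and doubles single quotes for CQL. It leaves Solr query syntax (such as `*`) untouched so that wildcards still work.

```scala
// Minimal JSON string escaping: backslash and double quote only
// (control characters are not handled in this sketch).
def jsonEscape(s: String): String =
  s.flatMap {
    case '\\' => "\\\\"
    case '"'  => "\\\""
    case c    => c.toString
  }

def videoSearchCql(term: String): String = {
  // Solr query string: edismax local params, then the user's search terms.
  val q = "{!edismax qf=\"name^2 tags^1 description\"}" + term
  // Wrap it in the JSON object accepted by solr_query.
  val json = "{\"q\": \"" + jsonEscape(q) + "\"}"
  // Double single quotes so the JSON fits inside a CQL string literal.
  "SELECT * FROM killrvideo.videos WHERE solr_query='" + json.replace("'", "''") + "';"
}

println(videoSearchCql("datastax"))
```

For `datastax`, this produces the same statement as the query on slide 16, written on a single line.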
43. Now you get this!
SELECT name
FROM videos
WHERE solr_query = 'tags:crime*';
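`tags:crime*` is a Solr wildcard query on the multi-valued `tags` field: a video matches when any of its tags begins with `crime`. Below is a toy model of that semantics in plain Scala. It assumes tags are indexed as whole, case-sensitive values; a real Solr field may also be tokenized or lowercased by its analyzer. `VideoDoc` and `wildcardPrefix` are hypothetical names, and the sample documents are illustrative only.

```scala
// Toy model of the Solr query tags:crime*, not Solr itself: a document
// matches when at least one value in its tags field starts with the prefix.
final case class VideoDoc(name: String, tags: Set[String])

def wildcardPrefix(videos: Seq[VideoDoc], prefix: String): Seq[String] =
  videos.filter(_.tags.exists(_.startsWith(prefix))).map(_.name)

val docs = Seq(
  VideoDoc("Heist Night", Set("crime", "thriller")),
  VideoDoc("Courtroom Live", Set("crimes", "law")),
  VideoDoc("Cat Video", Set("cats"))
)
wildcardPrefix(docs, "crime").foreach(println)
```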
44. Attaching to Spark and Cassandra
// Import Spark core classes plus the connector's implicits, which add
// Cassandra-specific functions to SparkContext and RDD objects
import org.apache.spark.{SparkContext, SparkConf}
import com.datastax.spark.connector._
/** setMaster("local[*]") lets us run & test the job right in our IDE */
val conf = new SparkConf(true)
  .set("spark.cassandra.connection.host", "127.0.0.1")
  .setMaster("local[*]")
  .setAppName(getClass.getName)
  // Optionally, when Cassandra authentication is enabled
  .set("spark.cassandra.auth.username", "cassandra")
  .set("spark.cassandra.auth.password", "cassandra")
val sc = new SparkContext(conf)
45. Comment table example
CREATE TABLE comments_by_video (
videoid uuid,
commentid timeuuid,
userid uuid,
comment text,
PRIMARY KEY (videoid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
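`CLUSTERING ORDER BY (commentid DESC)` keeps each video's comments newest first, because Cassandra orders timeuuid values by the timestamp embedded in them. Below is a small sketch using `java.util.UUID`. It relies on a hypothetical helper, `timeUuid`, that packs a 60-bit timestamp into the RFC 4122 version-1 layout; real code would generate timeuuids with a driver utility. `UUID.timestamp()` reads the timestamp back.

```scala
import java.util.UUID

// Hypothetical helper: a version-1 (time-based) UUID from a 60-bit timestamp
// (100 ns units since 1582-10-15), laid out as time_low | time_mid | version+time_hi.
def timeUuid(ts: Long, node: Long = 0x1L): UUID = {
  val timeLow = ts & 0xFFFFFFFFL
  val timeMid = (ts >>> 32) & 0xFFFFL
  val timeHi  = (ts >>> 48) & 0x0FFFL
  val msb = (timeLow << 32) | (timeMid << 16) | 0x1000L | timeHi // version 1
  val lsb = Long.MinValue | node // sets the RFC 4122 variant bit
  new UUID(msb, lsb)
}

// Newest first, mirroring CLUSTERING ORDER BY (commentid DESC).
def newestFirst(ids: Seq[UUID]): Seq[UUID] =
  ids.sortBy(_.timestamp())(Ordering[Long].reverse)
```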
46. Simple example
/** keyspace & table */
val tableRDD = sc.cassandraTable("killrvideo", "comments_by_video")
/** get a simple count of all the rows in the comments_by_video table */
val rowCount = tableRDD.count()
println(s"Total Rows in Comments Table: $rowCount")
sc.stop()
47. Simple example
/** keyspace & table */
val tableRDD = sc.cassandraTable("killrvideo", "comments_by_video")
/** get a simple count of all the rows in the comments_by_video table */
val rowCount = tableRDD.count()
println(s"Total Rows in Comments Table: $rowCount")
sc.stop()
[Figure: a Spark executor uses the Spark Connector to run SELECT * FROM killrvideo.comments_by_video; the rows it returns become a Spark partition of the Spark RDD.]
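The picture can be modeled without Spark: each partition covers a slice of the token ring, and `count()` counts each partition where it lives before the driver adds up the partial counts. This is a conceptual sketch only. The real connector groups Cassandra token ranges into partitions using size estimates, whereas this sketch assumes n equal-width slices. `Row`, `partitionIndex`, and `distributedCount` are hypothetical names.

```scala
// Conceptual model of a distributed count over a Cassandra table.
// Assumption: n equal-width Spark partitions over the Murmur3 token range.
final case class Row(token: Long, videoid: String)

// Map a token in [-2^63, 2^63 - 1] to one of n equal-width slices.
def partitionIndex(token: Long, n: Int): Int = {
  val offset = BigInt(token) - BigInt(Long.MinValue) // 0 .. 2^64 - 1
  (offset * n / BigInt(2).pow(64)).toInt
}

// Each "task" counts its own partition; the driver sums the partial counts.
def distributedCount(rows: Seq[Row], n: Int): (Map[Int, Long], Long) = {
  val perPartition: Map[Int, Long] =
    rows.groupBy(r => partitionIndex(r.token, n)).map { case (p, rs) => p -> rs.size.toLong }
  (perPartition, perPartition.values.sum)
}
```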
48. Using CQL
SELECT userid
FROM comments_by_video
WHERE videoid = 01860584-de45-018f-12be-5f81704e8033
val cqlRDD = sc.cassandraTable("killrvideo", "comments_by_video")
  .select("userid")
  .where("videoid = ?",
    java.util.UUID.fromString("01860584-de45-018f-12be-5f81704e8033"))
49. spark-sql> SELECT cast(videoid as String) videoid, count(*) c
FROM comments_by_video
GROUP BY cast(videoid as String)
ORDER BY c DESC limit 10;
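The spark-sql query counts comments per video and keeps the ten busiest. Below is the same group, count, sort, take pipeline on a plain Scala collection, sketched with a hypothetical `Comment` type. The sketch keeps `videoid` as a String, matching the cast in the query. Ties are broken by videoid so that the result is deterministic.

```scala
// Hypothetical minimal comment record; videoid kept as a String like the cast.
final case class Comment(videoid: String, comment: String)

// GROUP BY videoid, count(*), ORDER BY count DESC, LIMIT n.
def topCommented(comments: Seq[Comment], n: Int): Seq[(String, Int)] =
  comments.groupBy(_.videoid)
    .map { case (vid, cs) => (vid, cs.size) }
    .toSeq
    .sortBy { case (vid, c) => (-c, vid) }
    .take(n)
```

In Spark, the same aggregation runs in parallel across partitions, and only the per-video counts are shuffled.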
50. Saving back to Cassandra
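// Create insert data: (videoid, commentid, comment, userid)
// commentid is a clustering column, so every row needs one; UUIDs.timeBased()
// (DataStax Java driver 3.x) generates a timeuuid
import com.datastax.driver.core.utils.UUIDs
val collection = sc.parallelize(Seq(
  ("01860584-de45-018f-12be-5f81704e8033", UUIDs.timeBased(), "Great video", "cdaf6bd5-8914-29e0-f0b6-8b0bc6156777"),
  ("01860584-de45-018f-12be-5f81704e8033", UUIDs.timeBased(), "Hated it", "cdaf6bd5-8914-29e0-f0b6-8b0bc6156777")))
// Insert data into table
collection.saveToCassandra("killrvideo", "comments_by_video",
  SomeColumns("videoid", "commentid", "comment", "userid"))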
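Every row needs a `commentid` because a CQL write is an upsert keyed by the full primary key. Two rows with the same `(videoid, commentid)` collapse into one, and a row without its clustering column cannot be written at all. Below is a toy model of the upsert part in plain Scala. It uses a Long as a stand-in for the timeuuid, and it assumes that later writes win (Cassandra actually resolves conflicts by write timestamp). `CommentRow` and `upsertAll` are hypothetical.

```scala
// Toy model of CQL upsert semantics for comments_by_video: rows are keyed by
// the full primary key (videoid, commentid). commentid is a Long stand-in.
final case class CommentRow(videoid: String, commentid: Long, comment: String, userid: String)

def upsertAll(table: Map[(String, Long), CommentRow],
              rows: Seq[CommentRow]): Map[(String, Long), CommentRow] =
  // Writing an existing key replaces the row instead of adding a new one.
  rows.foldLeft(table)((t, r) => t.updated((r.videoid, r.commentid), r))
```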
51.
val solrQueryRDD = sc.cassandraTable("killrvideo", "videos")
  .select("name").where("solr_query='tags:crime*'")
solrQueryRDD.collect().foreach(row => println(row.getString("name")))