Building Your First Application with
Cassandra
Luke Tillman (@LukeTillman)
Language Evangelist at DataStax
Who are you?!
• Evangelist with a focus on the .NET Community
• Long-time .NET Developer
• Recently presented at Cassandra Summit 2014 with Microsoft
KillrVideo, a Video Sharing Site
• Think a YouTube competitor
– Users add videos, rate them, comment on them, etc.
– Can search for videos by tag
See the Live Demo, Get the Code
• Live demo available at http://www.killrvideo.com
– Written in C#
– Live Demo running in Azure
– Open source: https://github.com/luketillman/killrvideo-csharp
• Interesting use case because of different data modeling
challenges and the scale of something like YouTube
– More than 1 billion unique users visit YouTube each month
– 100 hours of video are uploaded to YouTube every minute
1 Think Before You Model
2 A Data Model for Cat Videos
3 Phase 2: Build the Application
4 Software Architecture, A Love Story
5 The Future
Think Before You Model
Or how to keep doing what you’re already doing
Getting to Know Your Data
• What things do I have in the system?
• What are the relationships between them?
• This is your conceptual data model
• You already do this in the RDBMS world
Some of the Entities and Relationships in KillrVideo
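Entities
• User: id, firstname, lastname, email, password
• Video: id, name, description, location, preview_image, tags
• Comment: id, comment
Relationships
• User adds Video (1:n, with a timestamp)
• User posts Comment (1:n, with a timestamp)
• Video features Comment (1:n)
• User rates Video (n:m, with a rating)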
Getting to Know Your Queries
• What are your application’s workflows?
• How will I access the data?
• Knowing your queries in advance is NOT optional
• Different from RDBMS because I can’t just JOIN or create new
indexes to support new queries
Some Application Workflows in KillrVideo
• User logs into site
• Show basic information about user
• Show videos added by a user
• Show comments posted by a user
• Search for a video by tag
• Show latest videos added to the site
• Show comments for a video
• Show ratings for a video
• Show video and its details
Some Queries in KillrVideo to Support Workflows
Users
• User logs into site → Find user by email address
• Show basic information about user → Find user by id
Comments
• Show comments for a video → Find comments by video (latest first)
• Show comments posted by a user → Find comments by user (latest first)
Ratings
• Show ratings for a video → Find ratings by video
Some Queries in KillrVideo to Support Workflows
Videos
• Search for a video by tag → Find video by tag
• Show latest videos added to the site → Find videos by date (latest first)
• Show video and its details → Find video by id
• Show videos added by a user → Find videos by user (latest first)
A Data Model for Cat Videos
Because the Internet loves ‘em some cat videos
Just How Popular are Cats on the Internet?
http://mashable.com/2013/07/08/cats-bacon-rule-internet/
Data Modeling Refresher
• Cassandra limits us to queries that can scale across many nodes
– Include value for Partition Key and optionally, Clustering Column(s)
• We know our queries, so we build tables to answer them
• Denormalize at write time to do as few reads as possible
• Many times we end up with a “table per query”
– Similar to materialized views from the RDBMS world
Users – The Relational Way
• Single Users table with all user data and an Id Primary Key
• Add an index on email address to allow queries by email
User logs into site → Find user by email address
Show basic information about user → Find user by id
Users – The Cassandra Way
User logs into site → Find user by email address
Show basic information about user → Find user by id
CREATE TABLE user_credentials (
    email text,
    password text,
    userid uuid,
    PRIMARY KEY (email)
);
CREATE TABLE users (
    userid uuid,
    firstname text,
    lastname text,
    email text,
    created_date timestamp,
    PRIMARY KEY (userid)
);
Videos Everywhere!
Show video and its details → Find video by id
Show videos added by a user → Find videos by user (latest first)
CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name text,
    description text,
    location text,
    location_type int,
    preview_image_location text,
    tags set<text>,
    added_date timestamp,
    PRIMARY KEY (videoid)
);
CREATE TABLE user_videos (
    userid uuid,
    added_date timestamp,
    videoid uuid,
    name text,
    preview_image_location text,
    PRIMARY KEY (userid, added_date, videoid)
) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);
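To make "denormalize at write time" concrete, here is a minimal in-memory sketch in Python. It is not a Cassandra client: the two dictionaries merely stand in for the videos and user_videos tables, and the function names are made up for illustration. The point is that one logical write fans out to both tables so that each query is answered by a single read.

```python
from datetime import datetime, timezone
from uuid import uuid4

# In-memory stand-ins for the two tables (assumption: illustration only).
videos = {}        # videoid -> full row, like the videos table
user_videos = {}   # userid -> rows of one partition, like user_videos


def add_video(userid, name, preview_image_location, added_date):
    """Write the same video to both 'tables' so each query needs one read."""
    videoid = uuid4()
    videos[videoid] = {
        "videoid": videoid, "userid": userid, "name": name,
        "preview_image_location": preview_image_location,
        "added_date": added_date,
    }
    partition = user_videos.setdefault(userid, [])
    partition.append({
        "added_date": added_date, "videoid": videoid, "name": name,
        "preview_image_location": preview_image_location,
    })
    # Mimic CLUSTERING ORDER BY (added_date DESC, videoid ASC):
    # two stable sorts, the secondary key first.
    partition.sort(key=lambda r: r["videoid"].int)
    partition.sort(key=lambda r: r["added_date"], reverse=True)
    return videoid


def find_video_by_id(videoid):
    """Answers 'Find video by id' from the videos stand-in."""
    return videos.get(videoid)


def find_videos_by_user(userid, limit=10):
    """Answers 'Find videos by user (latest first)' from one partition."""
    return user_videos.get(userid, [])[:limit]
```

The name and preview image are stored twice on purpose, which is exactly why the considerations about duplicated data matter.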
Videos Everywhere!
Considerations When Duplicating Data
• Can the data change?
• How likely is it to change or how frequently will it change?
• Do I have all the information I need to update duplicates and
maintain consistency?
Search for a video by tag → Find video by tag
Show latest videos added to the site → Find videos by date (latest first)
Modeling Relationships – Collection Types
• Cassandra doesn’t support JOINs, but your data will still have
relationships (and you can still model that in Cassandra)
• One tool available is CQL collection types
CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name text,
    description text,
    location text,
    location_type int,
    preview_image_location text,
    tags set<text>,
    added_date timestamp,
    PRIMARY KEY (videoid)
);
Modeling Relationships – Client Side Joins
CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name text,
    description text,
    location text,
    location_type int,
    preview_image_location text,
    tags set<text>,
    added_date timestamp,
    PRIMARY KEY (videoid)
);
CREATE TABLE users (
    userid uuid,
    firstname text,
    lastname text,
    email text,
    created_date timestamp,
    PRIMARY KEY (userid)
);
Currently requires query for video,
followed by query for user by id based
on results of first query
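A client-side join can be sketched as two dependent lookups. The Python below uses plain dictionaries in place of the videos and users tables, and the function name and the "author" key are assumptions made for illustration. What it shows is that the second query cannot start until the first has returned.

```python
def find_video_with_author(videoid, videos, users):
    """Join a video to its author in application code.

    `videos` and `users` are dict stand-ins for the two tables,
    keyed by videoid and userid respectively.
    """
    video = videos.get(videoid)            # query 1: videos by videoid
    if video is None:
        return None
    author = users.get(video["userid"])    # query 2: users by userid
    return {**video, "author": author}


# Example data (made up for illustration).
users = {"u1": {"userid": "u1", "firstname": "Luke", "lastname": "Tillman"}}
videos = {"v1": {"videoid": "v1", "userid": "u1", "name": "Cat video"}}
result = find_video_with_author("v1", videos, users)
```

Every page that shows a video with its author pays for both round trips, which is the cost the next slide asks about.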
Modeling Relationships – Client Side Joins
• What is the cost? Might be OK in small situations
• Does NOT scale
• Avoid when possible
Modeling Relationships – Client Side Joins
CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name text,
    description text,
    ...
    user_firstname text,
    user_lastname text,
    user_email text,
    PRIMARY KEY (videoid)
);
or
CREATE TABLE users_by_video (
    videoid uuid,
    userid uuid,
    firstname text,
    lastname text,
    email text,
    PRIMARY KEY (videoid)
);
Modeling Relationships – Client Side Joins
• Remember the considerations when you duplicate data
• What happens if a user changes their name or email address?
• Can I update the duplicated data?
Cassandra Rules Can Impact Your Design
• Video Ratings – use counters to track sum of all ratings and
count of ratings
• Counters are a good example of something with special rules
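-- Not allowed: counter columns cannot share a table with regular (non-key) columns
CREATE TABLE videos (
    videoid uuid,
    userid uuid,
    name text,
    description text,
    ...
    rating_counter counter,
    rating_total counter,
    PRIMARY KEY (videoid)
);
-- Allowed: a dedicated table that holds only the counters
CREATE TABLE video_ratings (
    videoid uuid,
    rating_counter counter,
    rating_total counter,
    PRIMARY KEY (videoid)
);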
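With the two counters in video_ratings, the average rating is derived at read time as rating_total divided by rating_counter. The Python sketch below uses a dictionary in place of the counter table; the function names are hypothetical and the in-memory increments only stand in for counter updates.

```python
# In-memory stand-in for the video_ratings counter table (illustration only).
video_ratings = {}  # videoid -> {"rating_counter": int, "rating_total": int}


def rate_video(videoid, rating):
    """Increment both counters, as a counter UPDATE would:
    rating_counter = rating_counter + 1, rating_total = rating_total + rating.
    """
    row = video_ratings.setdefault(
        videoid, {"rating_counter": 0, "rating_total": 0})
    row["rating_counter"] += 1
    row["rating_total"] += rating


def average_rating(rating_counter, rating_total):
    """Return the mean rating, or None when nobody has rated the video yet."""
    if rating_counter == 0:
        return None
    return rating_total / rating_counter
```

Storing the sum and the count, rather than the average itself, keeps every write a pure increment.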
Single Nodes Have Limits Too
• Latest videos are bucketed by day
• Means all reads/writes to latest videos are going to same partition (and thus the same nodes)
• Could create a hotspot
Show latest videos added to the site → Find videos by date (latest first)
CREATE TABLE latest_videos (
    yyyymmdd text,
    added_date timestamp,
    videoid uuid,
    name text,
    preview_image_location text,
    PRIMARY KEY (yyyymmdd, added_date, videoid)
) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);
Single Nodes Have Limits Too
• Mitigate by adding data to the Partition Key to spread load
• Data that’s already naturally a part of the domain
– Latest videos by category?
• Arbitrary data, like a bucket number
– Round robin at the app level
Show latest videos added to the site → Find videos by date (latest first)
CREATE TABLE latest_videos (
    yyyymmdd text,
    bucket_number int,
    added_date timestamp,
    videoid uuid,
    name text,
    preview_image_location text,
    PRIMARY KEY (
        (yyyymmdd, bucket_number),
        added_date, videoid)
) ...
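Round-robin bucketing at the app level could look roughly like the Python sketch below. The bucket count of 4, the function names, and the dictionary standing in for the latest_videos table are all assumptions for illustration. Writes rotate through the buckets, and a read for one day has to visit every bucket and merge the results.

```python
import heapq
import itertools
from datetime import datetime

NUM_BUCKETS = 4  # assumed value; pick one that suits the write load
_next_bucket = itertools.cycle(range(NUM_BUCKETS))

# (yyyymmdd, bucket_number) -> rows sorted newest first (table stand-in)
latest_videos = {}


def add_latest_video(added_date, videoid, name):
    """Write to the next bucket in round-robin order; returns the partition key."""
    key = (added_date.strftime("%Y%m%d"), next(_next_bucket))
    rows = latest_videos.setdefault(key, [])
    rows.append((added_date, videoid, name))
    rows.sort(key=lambda r: r[0], reverse=True)  # added_date DESC
    return key


def find_latest_videos(yyyymmdd, limit=10):
    """Read every bucket for the day and merge them, newest first."""
    partitions = [latest_videos.get((yyyymmdd, b), [])
                  for b in range(NUM_BUCKETS)]
    merged = heapq.merge(*partitions, key=lambda r: r[0], reverse=True)
    return list(itertools.islice(merged, limit))
```

The trade-off is visible in the read path: spreading writes over several partitions means one query per bucket when reading the day back.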
Phase 2: Build the Application
Phase 3: Profit
Phase 1: Data Model
The DataStax Drivers for Cassandra
• Currently Available
– C# (.NET)
– Python
– Java
– NodeJS
– Ruby
– C++
• Will Probably Happen
– PHP
– Scala
– JDBC
• Early Discussions
– Go
– Rust
• Open source, Apache 2 licensed, available on GitHub
– https://github.com/datastax/
The DataStax Drivers for Cassandra
Language Bootstrapping Code
C#
Cluster cluster = Cluster.Builder().AddContactPoint("127.0.0.1").Build();
ISession session = cluster.Connect("killrvideo");
Python
from cassandra.cluster import Cluster
cluster = Cluster(contact_points=['127.0.0.1'])
session = cluster.connect('killrvideo')
Java
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect("killrvideo");
NodeJS
var cassandra = require('cassandra-driver');
var client = new cassandra.Client({
contactPoints: ['127.0.0.1'], keyspace: 'killrvideo'
});
Use Prepared Statements
• Performance optimization for queries you run repeatedly
• Pay the cost of preparing once (causes roundtrip to Cassandra)
• KillrVideo: looking a user’s credentials up by email address
• Save and reuse the PreparedStatement instance after preparing
PreparedStatement prepared = session.Prepare(
"SELECT * FROM user_credentials WHERE email = ?");
Use Prepared Statements
• Bind variable values when ready to execute
• Execution only has to send variable values over the wire
• Cassandra doesn’t have to reparse the CQL string each time
• Remember: Prepare once, bind and execute many
BoundStatement bound = prepared.Bind("luke.tillman@datastax.com");
RowSet rows = await _session.ExecuteAsync(bound);
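"Prepare once, bind and execute many" can be sketched in Python as a small data access class. The class name is hypothetical, and the session argument is assumed to be a Session from the DataStax Python driver like the one created in the bootstrapping code, with its prepare and execute methods. The statement is prepared in the constructor and reused on every lookup.

```python
class UserCredentialsDao:
    """Prepares the credentials lookup once and reuses it for every call."""

    CQL = "SELECT * FROM user_credentials WHERE email = ?"

    def __init__(self, session):
        # `session` is assumed to be a connected driver Session.
        self._session = session
        # One round trip to Cassandra, paid once at construction time.
        self._prepared = session.prepare(self.CQL)

    def find_by_email(self, email):
        """Bind the email and execute; returns the first row or None."""
        # Only the bound value travels over the wire on each execution.
        rows = self._session.execute(self._prepared, (email,))
        return rows.one()
```

Creating one instance at application startup and sharing it keeps the prepare cost out of the request path.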
Batch Statements: Use and Misuse
• You can mix and match Simple/Bound statements in a batch
• Batches are Logged (atomic) by default
• Use when you want a group of mutations (statements) to all
succeed or all fail (denormalizing at write time)
• Large batches are an anti-pattern (Cassandra will warn you)
• Not a performance optimization for bulk-loading data
KillrVideo: Update a Video’s Name with a Batch
using System.Threading.Tasks;
using Cassandra;

public class VideoCatalogDataAccess
{
    private readonly ISession _session;
    private readonly PreparedStatement _prepared;

    public VideoCatalogDataAccess(ISession session)
    {
        _session = session;
        // Prepare once; an UPDATE on user_videos must name every primary key column
        _prepared = _session.Prepare(
            "UPDATE user_videos SET name = ? WHERE userid = ? AND added_date = ? AND videoid = ?");
    }

    public async Task UpdateVideoName(UpdateVideoDto video)
    {
        // AddedDate is assumed to be available on the DTO
        BoundStatement bound = _prepared.Bind(
            video.Name, video.UserId, video.AddedDate, video.VideoId);
        var simple = new SimpleStatement("UPDATE videos SET name = ? WHERE videoid = ?",
            video.Name, video.VideoId);

        // Use an atomic batch to send over all the mutations
        var batchStatement = new BatchStatement();
        batchStatement.Add(bound);
        batchStatement.Add(simple);
        await _session.ExecuteAsync(batchStatement);
    }
}
Lightweight Transactions when you need them
• Use when you don’t want writes to step on each other
– Sometimes called Linearizable Consistency
– Similar to the Serializable isolation level from RDBMS
• Essentially a Compare and Set (CAS) operation using Paxos
• Read the fine print: has a latency cost associated with it
• The canonical example: unique user accounts
KillrVideo: LWT to create user accounts
• Returns a column called [applied] indicating success/failure
• Different from relational world where you might expect an
Exception (e.g. PrimaryKeyViolationException or similar)
string cql = "INSERT INTO user_credentials (email, password, userid) " +
    "VALUES (?, ?, ?) IF NOT EXISTS";
var statement = new SimpleStatement(cql, user.Email, hashedPassword, user.UserId);
RowSet rows = await _session.ExecuteAsync(statement);
var userInserted = rows.Single().GetValue<bool>("[applied]");
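The "flag instead of exception" behaviour of IF NOT EXISTS can be illustrated with a small in-memory simulation in Python. This is only a sketch of the result shape: the dictionary stands in for the user_credentials table, the function name is hypothetical, and nothing here reproduces the Paxos round or its latency.

```python
# In-memory stand-in for the user_credentials table (illustration only).
user_credentials = {}  # email -> row


def insert_if_not_exists(email, password_hash, userid):
    """Mimic INSERT ... IF NOT EXISTS.

    Returns a row with an "[applied]" flag. When the insert is rejected,
    the existing values come back alongside the flag instead of an error.
    """
    existing = user_credentials.get(email)
    if existing is not None:
        return {"[applied]": False, **existing}
    user_credentials[email] = {
        "email": email, "password": password_hash, "userid": userid,
    }
    return {"[applied]": True}
```

Calling code checks the flag and, on failure, tells the user the email address is already taken rather than catching an exception.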
Software Architecture, A Love Story
Disclaimer: I am not paid to be a software architect
KillrVideo Logical Architecture
Browser
• Web UI: HTML5 / JavaScript
Server
• KillrVideo MVC App: serves up Web UI HTML and handles JSON requests from Web UI
Services
• Comments: tracks comments on videos by users
• Uploads: handles processing, storing, and encoding uploaded videos
• Video Catalog: tracks the catalog of available videos
• User Management: user accounts, login credentials, profiles
Infrastructure
• Cassandra Cluster (DSE): app data storage for services (e.g. users, comments)
• DataStax OpsCenter: management, provisioning, and monitoring
• Azure Media Services: video encoding, thumbnail generation
• Azure Storage (Blob, Queue): video file and thumbnail image storage
• Azure Service Bus: published events from services for interactions
Inside a Simple Service: Video Catalog
Video Catalog: tracks the catalog of available videos
• Cassandra Cluster (DSE): stores metadata about videos in Cassandra (e.g. name, description, location, thumbnail location, etc.)
• Azure Service Bus: publishes events about interesting things that happen (e.g. YouTubeVideoAdded, UploadedVideoAccepted, etc.)
Inside a More Complicated Service: Uploads
Uploads: handles processing, storing, and encoding uploaded videos
• Cassandra Cluster (DSE): stores data about uploaded video file locations, encoding jobs, job status, etc.
• Azure Storage (Blob, Queue): stores original and re-encoded video file assets, as well as thumbnail preview images generated
• Azure Media Services: re-encodes uploaded videos to format suitable for the web, generates thumbnail image previews
• Azure Service Bus: publishes events about interesting things that happen (e.g. UploadedVideoPublished, etc.)
Event Driven Architecture
• Only the application(s) give commands
• Decoupled: Pub-sub messaging to tell other parts of the system something interesting happened
• Services could be deployed, scaled, and versioned independently (AKA microservices)
Services connected through Azure Service Bus: User Management, Comments, Video Ratings, Sample Data, Search, Statistics, Suggested Videos, Uploads, Video Catalog
Example
• Video Catalog: "Hey, I added this new YouTube video to the catalog!"
• Suggested Videos: "Time to figure out what videos to suggest for that new video."
• Search: "Better index that new video so it shows up in search results."
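The pub-sub decoupling can be sketched with a tiny in-process event bus in Python. This is a stand-in for illustration, not Azure Service Bus: the EventBus class and the handler functions are hypothetical, and only the YouTubeVideoAdded event name comes from the slides. The publisher does not know who is listening.

```python
class EventBus:
    """Minimal in-process publish/subscribe hub (illustration only)."""

    def __init__(self):
        self._subscribers = {}  # event type -> list of handlers

    def subscribe(self, event_type, handler):
        """Register a handler to be called for each event of this type."""
        self._subscribers.setdefault(event_type, []).append(handler)

    def publish(self, event_type, payload):
        """Deliver the payload to every handler subscribed to the event type."""
        for handler in self._subscribers.get(event_type, []):
            handler(payload)


bus = EventBus()
search_index = []   # what the Search service would index
suggest_queue = []  # what the Suggested Videos service would process

# Search and Suggested Videos react to the same event independently.
bus.subscribe("YouTubeVideoAdded", lambda e: search_index.append(e["name"]))
bus.subscribe("YouTubeVideoAdded", lambda e: suggest_queue.append(e["videoid"]))

# Video Catalog announces what happened; it gives no commands.
bus.publish("YouTubeVideoAdded", {"videoid": "v1", "name": "Cat video"})
```

Adding another subscriber, such as Statistics, requires no change to the Video Catalog code that publishes the event.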
The Future
In the year 3,000…
The Future, Conan?
Where do we go with KillrVideo from here?
• Spark or AzureML for video suggestions
• Video search via Solr
• Actors that store state in C* (Akka.NET or Orleans)
• Storing file data (thumbnails, profile pics) in C* using pithos
Questions?
Follow me on Twitter for updates or to ask questions later: @LukeTillman

 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkPixlogix Infotech
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Nikki Chapple
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentMahmoud Rabie
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integrationmarketing932765
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Jeffrey Haguewood
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 

Último (20)

[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
4. Cobus Valentine- Cybersecurity Threats and Solutions for the Public Sector
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
All These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDFAll These Sophisticated Attacks, Can We Really Detect Them - PDF
All These Sophisticated Attacks, Can We Really Detect Them - PDF
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
Microsoft 365 Copilot: How to boost your productivity with AI – Part two: Dat...
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#Microservices, Docker deploy and Microservices source code in C#
Microservices, Docker deploy and Microservices source code in C#
 
React Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App FrameworkReact Native vs Ionic - The Best Mobile App Framework
React Native vs Ionic - The Best Mobile App Framework
 
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
Microsoft 365 Copilot: How to boost your productivity with AI – Part one: Ado...
 
Digital Tools & AI in Career Development
Digital Tools & AI in Career DevelopmentDigital Tools & AI in Career Development
Digital Tools & AI in Career Development
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS:  6 Ways to Automate Your Data IntegrationBridging Between CAD & GIS:  6 Ways to Automate Your Data Integration
Bridging Between CAD & GIS: 6 Ways to Automate Your Data Integration
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
Email Marketing Automation for Bonterra Impact Management (fka Social Solutio...
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 

Cassandra Day Atlanta 2015: Building Your First Application with Apache Cassandra

  • 1. Building Your First Application with Cassandra Luke Tillman (@LukeTillman) Language Evangelist at DataStax
  • 2. Who are you?! • Evangelist with a focus on the .NET Community • Long-time .NET Developer • Recently presented at Cassandra Summit 2014 with Microsoft 2
  • 3. KillrVideo, a Video Sharing Site • Think a YouTube competitor – Users add videos, rate them, comment on them, etc. – Can search for videos by tag
  • 4. See the Live Demo, Get the Code • Live demo available at http://www.killrvideo.com – Written in C# – Live Demo running in Azure – Open source: https://github.com/luketillman/killrvideo-csharp • Interesting use case because of different data modeling challenges and the scale of something like YouTube – More than 1 billion unique users visit YouTube each month – 100 hours of video are uploaded to YouTube every minute 4
  • 5. 1 Think Before You Model 2 A Data Model for Cat Videos 3 Phase 2: Build the Application 4 Software Architecture, A Love Story 5 The Future 5
  • 6. Think Before You Model Or how to keep doing what you’re already doing 6
  • 7. Getting to Know Your Data • What things do I have in the system? • What are the relationships between them? • This is your conceptual data model • You already do this in the RDBMS world
  • 8. Some of the Entities and Relationships in KillrVideo 8 User id firstname lastname email password Video id name description location preview_image tags features Comment comment id adds timestamp posts timestamp 1 n n 1 1 n n m rates rating
  • 9. Getting to Know Your Queries • What are your application’s workflows? • How will I access the data? • Knowing your queries in advance is NOT optional • Different from RDBMS because I can’t just JOIN or create new indexes to support new queries 9
  • 10. Some Application Workflows in KillrVideo 10 User Logs into site Show basic information about user Show videos added by a user Show comments posted by a user Search for a video by tag Show latest videos added to the site Show comments for a video Show ratings for a video Show video and its details
  • 11. Some Queries in KillrVideo to Support Workflows 11 Users User Logs into site Find user by email address Show basic information about user Find user by id Comments Show comments for a video Find comments by video (latest first) Show comments posted by a user Find comments by user (latest first) Ratings Show ratings for a video Find ratings by video
  • 12. Some Queries in KillrVideo to Support Workflows 12 Videos Search for a video by tag Find video by tag Show latest videos added to the site Find videos by date (latest first) Show video and its details Find video by id Show videos added by a user Find videos by user (latest first)
  • 13. A Data Model for Cat Videos Because the Internet loves ‘em some cat videos 13
  • 14. Just How Popular are Cats on the Internet? 14 http://mashable.com/2013/07/08/cats-bacon-rule-internet/
  • 16. Data Modeling Refresher • Cassandra limits us to queries that can scale across many nodes – Include value for Partition Key and optionally, Clustering Column(s) • We know our queries, so we build tables to answer them • Denormalize at write time to do as few reads as possible • Many times we end up with a “table per query” – Similar to materialized views from the RDBMS world 16
  • 17. Users – The Relational Way • Single Users table with all user data and an Id Primary Key • Add an index on email address to allow queries by email User Logs into site Find user by email address Show basic information about user Find user by id
  • 18. Users – The Cassandra Way User Logs into site Find user by email address Show basic information about user Find user by id CREATE TABLE user_credentials ( email text, password text, userid uuid, PRIMARY KEY (email) ); CREATE TABLE users ( userid uuid, firstname text, lastname text, email text, created_date timestamp, PRIMARY KEY (userid) );
  • 19. Videos Everywhere! 19 Show video and its details Find video by id Show videos added by a user Find videos by user (latest first) CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, location text, location_type int, preview_image_location text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) ); CREATE TABLE user_videos ( userid uuid, added_date timestamp, videoid uuid, name text, preview_image_location text, PRIMARY KEY (userid, added_date, videoid) ) WITH CLUSTERING ORDER BY ( added_date DESC, videoid ASC);
  • 20. Videos Everywhere! Considerations When Duplicating Data • Can the data change? • How likely is it to change or how frequently will it change? • Do I have all the information I need to update duplicates and maintain consistency? 20 Search for a video by tag Find video by tag Show latest videos added to the site Find videos by date (latest first)
  • 21. Modeling Relationships – Collection Types • Cassandra doesn’t support JOINs, but your data will still have relationships (and you can still model that in Cassandra) • One tool available is CQL collection types CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, location text, location_type int, preview_image_location text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) );
  • 22. Modeling Relationships – Client Side Joins 22 CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, location text, location_type int, preview_image_location text, tags set<text>, added_date timestamp, PRIMARY KEY (videoid) ); CREATE TABLE users ( userid uuid, firstname text, lastname text, email text, created_date timestamp, PRIMARY KEY (userid) ); Currently requires query for video, followed by query for user by id based on results of first query
  • 23. Modeling Relationships – Client Side Joins • What is the cost? Might be OK for small result sets • Does NOT scale • Avoid when possible 23
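To make the cost of a client-side join concrete, here is a minimal Python sketch. It does not talk to Cassandra: `CountingTable` is a hypothetical in-memory stand-in for a table keyed by its partition key, and the sample rows are invented for illustration. It only counts how many single-partition reads a page of results needs.

```python
class CountingTable:
    """Hypothetical stand-in for a Cassandra table: a dict keyed by partition
    key that counts how many single-partition reads were issued against it."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self._rows.get(key)


def videos_with_authors(video_ids, videos, users):
    """Client-side join: read each video, then read its author by userid."""
    joined = []
    for video_id in video_ids:
        video = videos.get(video_id)         # first query: video by id
        if video is None:
            continue
        author = users.get(video["userid"])  # second query, driven by the first
        joined.append({
            "name": video["name"],
            "author": author["firstname"] if author else None,
        })
    return joined


# Invented sample data, for illustration only.
videos = CountingTable({
    "v1": {"name": "Grumpy cat", "userid": "u1"},
    "v2": {"name": "Keyboard cat", "userid": "u2"},
    "v3": {"name": "Nyan cat", "userid": "u1"},
})
users = CountingTable({
    "u1": {"firstname": "Luke"},
    "u2": {"firstname": "Ada"},
})

page = videos_with_authors(["v1", "v2", "v3"], videos, users)
total_reads = videos.reads + users.reads  # one video read plus one user read per row
```

Every row on the page costs two dependent round trips, so the read count grows with the page size. A denormalized table that already carries the author's name would answer the same page with one read per row.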
  • 24. Modeling Relationships – Client Side Joins 24 CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, ... user_firstname text, user_lastname text, user_email text, PRIMARY KEY (videoid) ); or CREATE TABLE users_by_video ( videoid uuid, userid uuid, firstname text, lastname text, email text, PRIMARY KEY (videoid) );
  • 25. Modeling Relationships – Client Side Joins • Remember the considerations when you duplicate data • What happens if a user changes their name or email address? • Can I update the duplicated data? 25
  • 26. Cassandra Rules Can Impact Your Design • Video Ratings – use counters to track sum of all ratings and count of ratings • Counters are a good example of something with special rules CREATE TABLE videos ( videoid uuid, userid uuid, name text, description text, ... rating_counter counter, rating_total counter, PRIMARY KEY (videoid) ); CREATE TABLE video_ratings ( videoid uuid, rating_counter counter, rating_total counter, PRIMARY KEY (videoid) );
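The special rule at work here: Cassandra does not allow counter columns to share a table with regular (non-key) columns, which is why the ratings move into their own `video_ratings` table. The sketch below shows one way the two counters might be used from Python. The CQL string and the `average_rating` helper are illustrative assumptions, not code from KillrVideo.

```python
# Counter columns can only be incremented or decremented, never set directly.
# One rating adds 1 to the count and the star value to the running total.
RATE_VIDEO_CQL = (
    "UPDATE video_ratings "
    "SET rating_counter = rating_counter + 1, rating_total = rating_total + ? "
    "WHERE videoid = ?"
)


def average_rating(rating_counter, rating_total):
    """Derive the average on the client from the two counters.

    Counters that were never incremented read back as null (None in Python),
    so a video with no ratings is reported as 0.0 here.
    """
    if not rating_counter:
        return 0.0
    return rating_total / rating_counter
```

For example, four ratings with a total of 18 stars give `average_rating(4, 18) == 4.5`.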
  • 27. Single Nodes Have Limits Too • Latest videos are bucketed by day • Means all reads/writes to latest videos are going to same partition (and thus the same nodes) • Could create a hotspot 27 Show latest videos added to the site Find videos by date (latest first) CREATE TABLE latest_videos ( yyyymmdd text, added_date timestamp, videoid uuid, name text, preview_image_location text, PRIMARY KEY (yyyymmdd, added_date, videoid) ) WITH CLUSTERING ORDER BY ( added_date DESC, videoid ASC );
  • 28. Single Nodes Have Limits Too • Mitigate by adding data to the Partition Key to spread load • Data that’s already naturally a part of the domain – Latest videos by category? • Arbitrary data, like a bucket number – Round robin at the app level 28 Show latest videos added to the site Find videos by date (latest first) CREATE TABLE latest_videos ( yyyymmdd text, bucket_number int, added_date timestamp, videoid uuid, name text, preview_image_location text, PRIMARY KEY ( (yyyymmdd, bucket_number), added_date, videoid) ) ...
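A rough Python sketch of the "round robin at the app level" idea follows. `NUM_BUCKETS` and both helper names are assumptions made for illustration; the slides do not say how many buckets KillrVideo would use.

```python
import itertools
from datetime import date

NUM_BUCKETS = 4  # assumption: tune to the write volume of the site

# Shared cycle so consecutive writes land in consecutive buckets.
_next_bucket = itertools.cycle(range(NUM_BUCKETS))


def write_partition_key(added_on):
    """Pick the (yyyymmdd, bucket_number) partition for a newly added video."""
    return (added_on.strftime("%Y%m%d"), next(_next_bucket))


def read_partition_keys(day):
    """List every partition that must be read to show one day's latest videos."""
    yyyymmdd = day.strftime("%Y%m%d")
    return [(yyyymmdd, bucket) for bucket in range(NUM_BUCKETS)]
```

The trade-off is on the read side: writes for a day spread over `NUM_BUCKETS` partitions, but showing the latest videos now means querying each bucket and merging the rows by `added_date` in the application.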
  • 29. Phase 2: Build the Application Phase 3: Profit 29 Phase 1: Data Model
  • 30. The DataStax Drivers for Cassandra • Currently Available – C# (.NET) – Python – Java – NodeJS – Ruby – C++ • Will Probably Happen – PHP – Scala – JDBC • Early Discussions – Go – Rust 30 • Open source, Apache 2 licensed, available on GitHub – https://github.com/datastax/
  • 31. The DataStax Drivers for Cassandra Language Bootstrapping Code C# Cluster cluster = Cluster.Builder().AddContactPoint("127.0.0.1").Build(); ISession session = cluster.Connect("killrvideo"); Python from cassandra.cluster import Cluster cluster = Cluster(contact_points=['127.0.0.1']) session = cluster.connect('killrvideo') Java Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build(); Session session = cluster.connect("killrvideo"); NodeJS var cassandra = require('cassandra-driver'); var client = new cassandra.Client({ contactPoints: ['127.0.0.1'], keyspace: 'killrvideo' });
  • 32. Use Prepared Statements • Performance optimization for queries you run repeatedly • Pay the cost of preparing once (causes roundtrip to Cassandra) • KillrVideo: looking up a user’s credentials by email address • Save and reuse the PreparedStatement instance after preparing 32 PreparedStatement prepared = session.Prepare( "SELECT * FROM user_credentials WHERE email = ?");
  • 33. Use Prepared Statements • Bind variable values when ready to execute • Execution only has to send variable values over the wire • Cassandra doesn’t have to reparse the CQL string each time • Remember: Prepare once, bind and execute many 33 BoundStatement bound = prepared.Bind("luke.tillman@datastax.com"); RowSet rows = await _session.ExecuteAsync(bound);
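"Prepare once, bind and execute many" can be sketched in Python as a small cache keyed by the CQL string. `PreparedStatementCache` is a hypothetical helper, not part of any driver. It only assumes the session object exposes `prepare(cql)` and `execute(statement, parameters)`, as the DataStax Python driver's `Session` does.

```python
class PreparedStatementCache:
    """Hypothetical helper: prepares each distinct CQL string once and
    reuses the prepared statement for every later execution."""

    def __init__(self, session):
        self._session = session
        self._prepared = {}

    def get(self, cql):
        # Pay the prepare round trip only the first time a CQL string is seen.
        if cql not in self._prepared:
            self._prepared[cql] = self._session.prepare(cql)
        return self._prepared[cql]

    def execute(self, cql, params):
        # Later calls send only the statement handle and the bound values.
        return self._session.execute(self.get(cql), params)
```

With a connected session this might be used as `cache.execute("SELECT * FROM user_credentials WHERE email = ?", ["luke.tillman@datastax.com"])`. Keeping one cache per session for the lifetime of the application is what makes the optimization pay off.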
  • 34. Batch Statements: Use and Misuse • You can mix and match Simple/Bound statements in a batch • Batches are Logged (atomic) by default • Use when you want a group of mutations (statements) to all succeed or all fail (denormalizing at write time) • Large batches are an anti-pattern (Cassandra will warn you) • Not a performance optimization for bulk-loading data 34
  • 35. KillrVideo: Update a Video’s Name with a Batch 35 public class VideoCatalogDataAccess { private readonly ISession _session; private readonly PreparedStatement _prepared; public VideoCatalogDataAccess(ISession session) { _session = session; _prepared = _session.Prepare( "UPDATE user_videos SET name = ? WHERE userid = ? AND videoid = ?"); } public async Task UpdateVideoName(UpdateVideoDto video) { BoundStatement bound = _prepared.Bind(video.Name, video.UserId, video.VideoId); var simple = new SimpleStatement("UPDATE videos SET name = ? WHERE videoid = ?", video.Name, video.VideoId); // Use an atomic batch to send over all the mutations var batchStatement = new BatchStatement(); batchStatement.Add(bound); batchStatement.Add(simple); RowSet rows = await _session.ExecuteAsync(batchStatement); } }
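The same "denormalize at write time" idea can be sketched in Python as a function that lists every mutation one logical change fans out to. `video_name_mutations` is a hypothetical helper. It assumes the `videos` and `user_videos` schemas from slide 19, where `user_videos` has the primary key `(userid, added_date, videoid)`. Because a CQL `UPDATE` must name every primary key column, the sketch passes `added_date` as well.

```python
def video_name_mutations(userid, added_date, videoid, new_name):
    """Return the (cql, params) pairs needed to rename a video everywhere
    its name is duplicated. Placeholders use the %s style that the Python
    driver accepts for non-prepared statements."""
    return [
        (
            "UPDATE videos SET name = %s WHERE videoid = %s",
            (new_name, videoid),
        ),
        (
            "UPDATE user_videos SET name = %s "
            "WHERE userid = %s AND added_date = %s AND videoid = %s",
            (new_name, userid, added_date, videoid),
        ),
    ]
```

With the Python driver, each pair could then be added to a logged `BatchStatement` so that both updates succeed or fail together. Keeping the list of mutations in one function also makes it obvious which tables hold a copy of the name.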
  • 36. Lightweight Transactions when you need them • Use when you don’t want writes to step on each other – Sometimes called Linearizable Consistency – Similar to the Serializable isolation level in an RDBMS • Essentially a Check and Set (CAS) operation using Paxos • Read the fine print: has a latency cost associated with it • The canonical example: unique user accounts 36
  • 37. KillrVideo: LWT to create user accounts • Returns a column called [applied] indicating success/failure • Different from relational world where you might expect an Exception (e.g. PrimaryKeyViolationException or similar) 37 string cql = "INSERT INTO user_credentials (email, password, userid) " + "VALUES (?, ?, ?) IF NOT EXISTS"; var statement = new SimpleStatement(cql, user.Email, hashedPassword, user.UserId); RowSet rows = await _session.ExecuteAsync(statement); var userInserted = rows.Single().GetValue<bool>("[applied]");
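For comparison, a minimal Python sketch of the same check. `create_user_credentials` is a hypothetical function. It assumes a session whose `execute` returns a result exposing `was_applied`, which is how the DataStax Python driver's `ResultSet` reports the `[applied]` column for lightweight transactions.

```python
def create_user_credentials(session, email, password_hash, userid):
    """Insert the credentials only if the email is not already taken.

    Returns True when the row was written and False when another account
    already owns the email. No exception is raised for the conflict.
    """
    result = session.execute(
        "INSERT INTO user_credentials (email, password, userid) "
        "VALUES (%s, %s, %s) IF NOT EXISTS",
        (email, password_hash, userid),
    )
    return result.was_applied
```

The caller has to branch on the return value, for example to show an "email already registered" message, since a rejected insert looks like a normal response rather than an error.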
  • 38. Software Architecture, A Love Story Disclaimer: I am not paid to be a software architect 38
  • 39. KillrVideo Logical Architecture Web UI HTML5 / JavaScript KillrVideo MVC App Serves up Web UI HTML and handles JSON requests from Web UI Comments Tracks comments on videos by users Uploads Handles processing, storing, and encoding uploaded videos Video Catalog Tracks the catalog of available videos User Management User accounts, login credentials, profiles Cassandra Cluster (DSE) App data storage for services (e.g. users, comments) DataStax OpsCenter Management, provisioning, and monitoring Azure Media Services Video encoding, thumbnail generation Azure Storage (Blob, Queue) Video file and thumbnail image storage Azure Service Bus Published events from services for interactions Browser Server Services Infrastructure
  • 40. Inside a Simple Service: Video Catalog Video Catalog Tracks the catalog of available videos Cassandra Cluster (DSE) App data storage for services (e.g. users, comments) Azure Service Bus Published events from services for interactions
  • 41. Inside a Simple Service: Video Catalog Video Catalog Tracks the catalog of available videos Cassandra Cluster (DSE) App data storage for services (e.g. users, comments) • Stores metadata about videos in Cassandra (e.g. name, description, location, thumbnail location, etc.)
  • 42. Inside a Simple Service: Video Catalog Video Catalog Tracks the catalog of available videos Azure Service Bus Published events from services for interactions • Publishes events about interesting things that happen (e.g. YouTubeVideoAdded, UploadedVideoAccepted, etc.)
  • 43. Inside a More Complicated Service: Uploads Uploads Handles processing, storing, and encoding uploaded videos Cassandra Cluster (DSE) App data storage for services (e.g. users, comments) Azure Storage (Blob, Queue) Video file and thumbnail image storage Azure Media Services Video encoding, thumbnail generation Azure Service Bus Published events from services for interactions
  • 44. Inside a More Complicated Service: Uploads Uploads Handles processing, storing, and encoding uploaded videos Cassandra Cluster (DSE) App data storage for services (e.g. users, comments) • Stores data about uploaded video file locations, encoding jobs, job status, etc.
  • 45. Inside a More Complicated Service: Uploads Uploads Handles processing, storing, and encoding uploaded videos Azure Storage (Blob, Queue) Video file and thumbnail image storage • Stores original and re-encoded video file assets, as well as thumbnail preview images generated
  • 46. Inside a More Complicated Service: Uploads Uploads Handles processing, storing, and encoding uploaded videos Azure Media Services Video encoding, thumbnail generation • Re-encodes uploaded videos to format suitable for the web, generates thumbnail image previews
  • 47. Inside a More Complicated Service: Uploads Uploads Handles processing, storing, and encoding uploaded videos Azure Service Bus Published events from services for interactions • Publishes events about interesting things that happen (e.g. UploadedVideoPublished, etc.)
  • 48. Event Driven Architecture • Only the application(s) give commands • Decoupled: Pub-sub messaging to tell other parts of the system something interesting happened • Services could be deployed, scaled, and versioned independently (AKA microservices) Azure Service Bus User Management Comments Video Ratings Sample Data Search Statistics Suggested Videos Uploads Video Catalog
  • 49. Event Driven Architecture • Only the application(s) give commands • Decoupled: Pub-sub messaging to tell other parts of the system something interesting happened • Services could be deployed, scaled, and versioned independently (AKA microservices) Azure Service Bus Search Suggested Videos Video Catalog Hey, I added this new YouTube video to the catalog!
  • 50. Event Driven Architecture • Only the application(s) give commands • Decoupled: Pub-sub messaging to tell other parts of the system something interesting happened • Services could be deployed, scaled, and versioned independently (AKA microservices) Azure Service Bus Search Suggested Videos Video Catalog Hey, I added this new YouTube video to the catalog! Time to figure out what videos to suggest for that new video. Better index that new video so it shows up in search results.
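The pub-sub flow on these slides can be sketched with a tiny in-memory bus in Python. `InMemoryBus` is a hypothetical stand-in for Azure Service Bus, and the two handlers only append to lists where the real Search and Suggested Videos services would do their work. The event name `YouTubeVideoAdded` comes from the Video Catalog slide; the payload shape is an assumption.

```python
from collections import defaultdict


class InMemoryBus:
    """Hypothetical stand-in for a message bus: handlers subscribe to an
    event type, and publishers never know who is listening."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Deliver the event to every subscriber of this event type.
        for handler in self._handlers[event_type]:
            handler(payload)


search_index = []      # what the Search service would index
suggestion_queue = []  # what the Suggested Videos service would process

bus = InMemoryBus()
bus.subscribe("YouTubeVideoAdded", lambda event: search_index.append(event["videoid"]))
bus.subscribe("YouTubeVideoAdded", lambda event: suggestion_queue.append(event["videoid"]))

# The Video Catalog service announces what happened; it gives no commands.
bus.publish("YouTubeVideoAdded", {"videoid": "v1", "name": "Keyboard cat"})
```

Adding a third consumer, such as Statistics, would be one more `subscribe` call and no change to the Video Catalog code, which is the decoupling the slides describe.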
  • 51. The Future In the year 3,000… 51
  • 53. Where do we go with KillrVideo from here? • Spark or AzureML for video suggestions • Video search via Solr • Actors that store state in C* (Akka.NET or Orleans) • Storing file data (thumbnails, profile pics) in C* using pithos
  • 54. Questions? 54 Follow me on Twitter for updates or to ask questions later: @LukeTillman