SlideShare una empresa de Scribd logo
1 de 27
Descargar para leer sin conexión
Getting to know
by Michelle Darling
mdarlingcmt@gmail.com
August 2013
Agenda:
● What is Cassandra?
● Installation, CQL3
● Data Modelling
● Summary
Only 15 min to cover these, so
please hold questions til the
end, or email me :-) and I’ll
summarize Q&A for everyone.
Unfortunately, no time for:
● DB Admin
○ Detailed Architecture
○ Partitioning /
Consistent Hashing
○ Consistency Tuning
○ Data Distribution &
Replication
○ System Tables
● App Development
○ Using Python, Ruby etc
to access Cassandra
○ Using Hadoop to
stream data into
Cassandra
What is Cassandra?
“Fortuneteller of Doom”
from Greek Mythology. Tried to
warn others about future disasters,
but no one listened. Unfortunately,
she was 100% accurate.
NoSQL Distributed DB
● Consistency - A__ID
● Availability - High
● Point of Failure - none
● Good for Event
Tracking & Analysis
○ Time series data
○ Sensor device data
○ Social media analytics
○ Risk Analysis
○ Failure Prediction
Rackspace: “Which servers
are under heavy load
and are about to crash?”
The Evolution of Cassandra
2008: Open-Source Release / 2013: Enterprise & Community Editions
Data Model
● Wide rows, sparse arrays
● High performance through very
fast write throughput.
Infrastructure
● Peer-Peer Gossip
● Key-Value Pairs
● Tunable Consistency
2006
2005
● Originally for Inbox Search
● But now used for Instagram
Other NoSQL vs. Cassandra
NoSQL Taxonomy:
● Key-Value Pairs
○ Dynamo, Riak, Redis
● Column-Based
○ BigTable, HBase,
Cassandra
● Document-Based
○ MongoDB, Couchbase
● Graph
○ Neo4J
Big Data Capable
C* Differentiators:
● Production-proven at
Netflix, eBay, Twitter,
20 of Fortune 100
● “Clear Winner” in
Scalability,
Performance,
Availability
-- DataStax
Architecture
● Cluster (ring)
● Nodes (circles)
● Peer-to-Peer Model
● Gossip Protocol
Partitioner:
Consistent Hashing
Netflix
Streaming Video
● Personalized
Recommendations per
family member
● Built on Amazon Web
Services (AWS) +
Cassandra
Cloud installation using
● Amazon Web Services (AWS)
● Elastic Compute Cloud (EC2)
○ Free for the 1st year! Then pay only for what you use.
○ Sign up for AWS EC2 account: Big Data University Video 4:34 minutes,
● Amazon Machine Image (AMI)
○ Preconfigured installation template
○ Choose: “DataStax AMI for Cassandra
Community Edition”
○ Follow these *very good* step-by-step
instructions from DataStax.
○ AMIs also available for CouchBase, MongoDB
(make sure you pick the free tier community versions to avoid
monthly charge$$!!!).
AWS EC2 Dashboard
DataStax AMI Setup
DataStax AMI Setup
--clustername Michelle
--totalnodes 1
--version community
“Roll your Own” Installation
DataStax Community Edition
● Install instructions
For Linux, Windows,
MacOS:
http://www.datastax.com/2012/01/getting-
started-with-cassandra
● Video: “Set up a 4-
node Cassandra
cluster in under 2
minutes”
http://www.screenr.com/5G6
Invoke CQLSH, CREATE KEYSPACE
./bin/cqlsh
cqlsh> CREATE KEYSPACE big_data
… with strategy_class = ‘org.apache.cassandra.
locator.SimpleStrategy’
… with strategy_options:replication_factor=‘1’;
cqlsh> use big_data;
cqlsh:big_data>
Tip: Skip Thrift -- use CQL3
Thrift RPC
// Your Column
Column col = new Column(ByteBuffer.wrap("name".
getBytes()));
col.setValue(ByteBuffer.wrap("value".getBytes()));
col.setTimestamp(System.currentTimeMillis());
// Don't ask
ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
cosc.setColumn(col);
// Prepare to be amazed
Mutation mutation = new Mutation();
mutation.setColumnOrSuperColumn(cosc);
List<Mutation> mutations = new ArrayList<Mutation>();
mutations.add(mutation);
Map mutations_map = new HashMap<ByteBuffer, Map<String,
List<Mutation>>>();
Map cf_map = new HashMap<String, List<Mutation>>();
cf_map.set("Standard1", mutations);
mutations_map.put(ByteBuffer.wrap("key".getBytes()),
cf_map);
cassandra.batch_mutate(mutations_map,
consistency_level);
CQL3
- Uses cqlsh
- “SQL-like” language
- Runs on top of Thrift RPC
- Much more user-friendly.
Thrift code on left
equals this in CQL3:
INSERT INTO (id, name)
VALUES ('key',
'value');
CREATE TABLE
cqlsh:big_data> create table user_tags (
… user_id varchar,
… tag varchar,
… value counter,
… primary key (user_id, tag)
…):
● TABLE user_tags: “How many times has a user
mentioned a hashtag?”
● COUNTER datatype - Computes & stores counter value
at the time data is written. This optimizes query
performance.
UPDATE TABLE
SELECT FROM TABLE
cqlsh:big_data> UPDATE user_tags SET
value=value+1 WHERE user_id = ‘paul’ AND tag =
‘cassandra’
cqlsh:big_data> SELECT * FROM user_tags
user_id | tag | value
--------+-----------+----------
paul | cassandra | 1
DATA MODELING
A Major Paradigm Shift!
RDBMS Cassandra
Structured Data, Fixed Schema Unstructured Data, Flexible Schema
“Array of Arrays”
2D: ROW x COLUMN
“Nested Key-Value Pairs”
3D: ROW Key x COLUMN key x COLUMN values
DATABASE KEYSPACE
TABLE TABLE a.k.a COLUMN FAMILY
ROW ROW a.k.a PARTITION. Unit of replication.
COLUMN COLUMN [Name, Value, Timestamp]. a.k.a CLUSTER. Unit
of storage. Up to 2 billion columns per row.
FOREIGN KEYS, JOINS,
ACID Consistency
Referential Integrity not enforced, so A_CID.
BUT relationships represented using COLLECTIONS.
Cassandra
3D+: Nested Objects
RDBMS
2D: Rows
x columns
Example:
“Twissandra” Web App
Twitter-Inspired
sample application
written in Python +
Cassandra.
● Play with the app:
twissandra.com
● Examine & learn
from the code on
GitHub.
Features/Queries:
● Sign In, Sign Up
● Post Tweet
● Userline (User’s tweets)
● Timeline (All tweets)
● Following (Users being
followed by user)
● Followers (Users
following this user)
Twissandra.com vs Twitter.com
Twissandra - RDBMS Version
Entities
● USER, TWEET
● FOLLOWER, FOLLOWING
● FRIENDS
Relationships:
● USER has many TWEETs.
● USER is a FOLLOWER of many
USERs.
● Many USERs are FOLLOWING
USER.
Twissandra - Cassandra Version
Tip: Model tables to mirror queries.
TABLES or CFs
● TWEET
● USER, USERNAME
● FOLLOWERS, FOLLOWING
● USERLINE, TIMELINE
Notes:
● Extra tables mirror queries.
● Denormalized tables are
“pre-formed”for faster
performance.
TABLE
Tip: Remember,
Skip Thrift -- use CQL3
What does C* data look like?
TABLE Userline
“List all of user’s Tweets”
*************
Row Key: user_id
Columns
● Column Key: tweet_id
● “at” Timestamp
● TTL (Time to Live) -
seconds til expiration
date.
*************
Cassandra Data Model = LEGOs?
FlexibleSchema
Summary:
● Go straight from SQL
to CQL3; skip Thrift, Column
Families, SuperColumns, etc
● Denormalize tables to
mirror important queries.
Roughly 1 table per impt query.
● Choose wisely:
○ Partition Keys
○ Cluster Keys
○ Indexes
○ TTL
○ Counters
○ Collections
See DataStax Music Service
Example
● Consider hybrid
approach:
○ 20% - RDBMS for highly
structured, OLTP, ACID
requirements.
○ 80% - Scale Cassandra to
handle the rest of data.
Remember:
● Cheap: storage,
servers, OpenSource
software.
● Precious: User AND
Developer Happiness.
Resources
C* Summit 2013:
● Slides
● Cassandra at eBay Scale (slides)
● Data Modelers Still Have Jobs -
Adjusting For the NoSQL
Environment (Slides)
● Real-time Analytics using
Cassandra, Spark and Shark
slides
● Cassandra By Example: Data
Modelling with CQL3 Slides
● DATASTAX C*OLLEGE CREDIT:
DATA MODELLING FOR APACHE
CASSANDRA slides
I wish I found these 1st:
● How do I Cassandra?
slides
● Mobile version of
DataStax web docs
(link)

Más contenido relacionado

La actualidad más candente

Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & FeaturesDataStax Academy
 
Cassandra an overview
Cassandra an overviewCassandra an overview
Cassandra an overviewPritamKathar
 
Under the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureUnder the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureScyllaDB
 
Unique ID generation in distributed systems
Unique ID generation in distributed systemsUnique ID generation in distributed systems
Unique ID generation in distributed systemsDave Gardner
 
NOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraNOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraFolio3 Software
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcachedJurriaan Persyn
 
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...DataStax
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to CassandraGokhan Atil
 
Storing time series data with Apache Cassandra
Storing time series data with Apache CassandraStoring time series data with Apache Cassandra
Storing time series data with Apache CassandraPatrick McFadin
 
KafkaとAWS Kinesisの比較
KafkaとAWS Kinesisの比較KafkaとAWS Kinesisの比較
KafkaとAWS Kinesisの比較Yoshiyasu SAEKI
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013mumrah
 
[Cloud OnAir] Google Cloud における RDBMS の運用パターン 2020年11月19日 放送
[Cloud OnAir] Google Cloud における RDBMS の運用パターン 2020年11月19日 放送[Cloud OnAir] Google Cloud における RDBMS の運用パターン 2020年11月19日 放送
[Cloud OnAir] Google Cloud における RDBMS の運用パターン 2020年11月19日 放送Google Cloud Platform - Japan
 
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion RecordsScylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion RecordsScyllaDB
 
Introduction to Apache Cassandra
Introduction to Apache Cassandra Introduction to Apache Cassandra
Introduction to Apache Cassandra Knoldus Inc.
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandraNguyen Quang
 

La actualidad más candente (20)

Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Cassandra Introduction & Features
Cassandra Introduction & FeaturesCassandra Introduction & Features
Cassandra Introduction & Features
 
Cassandra an overview
Cassandra an overviewCassandra an overview
Cassandra an overview
 
Under the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database ArchitectureUnder the Hood of a Shard-per-Core Database Architecture
Under the Hood of a Shard-per-Core Database Architecture
 
Unique ID generation in distributed systems
Unique ID generation in distributed systemsUnique ID generation in distributed systems
Unique ID generation in distributed systems
 
Cassandra Database
Cassandra DatabaseCassandra Database
Cassandra Database
 
Cassandra 101
Cassandra 101Cassandra 101
Cassandra 101
 
NOSQL Database: Apache Cassandra
NOSQL Database: Apache CassandraNOSQL Database: Apache Cassandra
NOSQL Database: Apache Cassandra
 
Introduction to memcached
Introduction to memcachedIntroduction to memcached
Introduction to memcached
 
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
Replication and Consistency in Cassandra... What Does it All Mean? (Christoph...
 
Introduction to Cassandra
Introduction to CassandraIntroduction to Cassandra
Introduction to Cassandra
 
Storing time series data with Apache Cassandra
Storing time series data with Apache CassandraStoring time series data with Apache Cassandra
Storing time series data with Apache Cassandra
 
KafkaとAWS Kinesisの比較
KafkaとAWS Kinesisの比較KafkaとAWS Kinesisの比較
KafkaとAWS Kinesisの比較
 
Apache Spark Architecture
Apache Spark ArchitectureApache Spark Architecture
Apache Spark Architecture
 
Cassandra ppt 1
Cassandra ppt 1Cassandra ppt 1
Cassandra ppt 1
 
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013
 
[Cloud OnAir] Google Cloud における RDBMS の運用パターン 2020年11月19日 放送
[Cloud OnAir] Google Cloud における RDBMS の運用パターン 2020年11月19日 放送[Cloud OnAir] Google Cloud における RDBMS の運用パターン 2020年11月19日 放送
[Cloud OnAir] Google Cloud における RDBMS の運用パターン 2020年11月19日 放送
 
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion RecordsScylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
Scylla Summit 2022: How to Migrate a Counter Table for 68 Billion Records
 
Introduction to Apache Cassandra
Introduction to Apache Cassandra Introduction to Apache Cassandra
Introduction to Apache Cassandra
 
Introduction to cassandra
Introduction to cassandraIntroduction to cassandra
Introduction to cassandra
 

Similar a Cassandra NoSQL Tutorial

Cassandra Talk: Austin JUG
Cassandra Talk: Austin JUGCassandra Talk: Austin JUG
Cassandra Talk: Austin JUGStu Hood
 
Introduction to cloud and openstack
Introduction to cloud and openstackIntroduction to cloud and openstack
Introduction to cloud and openstackShivaling Sannalli
 
GumGum: Multi-Region Cassandra in AWS
GumGum: Multi-Region Cassandra in AWSGumGum: Multi-Region Cassandra in AWS
GumGum: Multi-Region Cassandra in AWSDataStax Academy
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1Ruslan Meshenberg
 
Avoiding Pitfalls for Cassandra.pdf
Avoiding Pitfalls for Cassandra.pdfAvoiding Pitfalls for Cassandra.pdf
Avoiding Pitfalls for Cassandra.pdfCédrick Lunven
 
On Rails with Apache Cassandra
On Rails with Apache CassandraOn Rails with Apache Cassandra
On Rails with Apache CassandraStu Hood
 
Apache Cassandra Lunch #64: Cassandra for .NET Developers
Apache Cassandra Lunch #64: Cassandra for .NET DevelopersApache Cassandra Lunch #64: Cassandra for .NET Developers
Apache Cassandra Lunch #64: Cassandra for .NET DevelopersAnant Corporation
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databasesjbellis
 
Apache Cassandra introduction
Apache Cassandra introductionApache Cassandra introduction
Apache Cassandra introductionfardinjamshidi
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Robbie Strickland
 
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightHow Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightScyllaDB
 
Cassandra To Infinity And Beyond
Cassandra To Infinity And BeyondCassandra To Infinity And Beyond
Cassandra To Infinity And BeyondRomain Hardouin
 
A Microservices approach with Cassandra and Quarkus | DevNation Tech Talk
A Microservices approach with Cassandra and Quarkus | DevNation Tech TalkA Microservices approach with Cassandra and Quarkus | DevNation Tech Talk
A Microservices approach with Cassandra and Quarkus | DevNation Tech TalkRed Hat Developers
 
Cassandra REST API with Pagination TEAM 15
Cassandra REST API with Pagination TEAM 15Cassandra REST API with Pagination TEAM 15
Cassandra REST API with Pagination TEAM 15Akash Kant
 
Global Cluster Topologies in MongoDB Atlas
Global Cluster Topologies in MongoDB AtlasGlobal Cluster Topologies in MongoDB Atlas
Global Cluster Topologies in MongoDB AtlasMongoDB
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideMohammed Fazuluddin
 
Cassandra's Odyssey @ Netflix
Cassandra's Odyssey @ NetflixCassandra's Odyssey @ Netflix
Cassandra's Odyssey @ NetflixRoopa Tangirala
 

Similar a Cassandra NoSQL Tutorial (20)

Multi-cluster k8ssandra
Multi-cluster k8ssandraMulti-cluster k8ssandra
Multi-cluster k8ssandra
 
Cassandra Talk: Austin JUG
Cassandra Talk: Austin JUGCassandra Talk: Austin JUG
Cassandra Talk: Austin JUG
 
Introduction to cloud and openstack
Introduction to cloud and openstackIntroduction to cloud and openstack
Introduction to cloud and openstack
 
GumGum: Multi-Region Cassandra in AWS
GumGum: Multi-Region Cassandra in AWSGumGum: Multi-Region Cassandra in AWS
GumGum: Multi-Region Cassandra in AWS
 
NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1NetflixOSS Meetup season 3 episode 1
NetflixOSS Meetup season 3 episode 1
 
Avoiding Pitfalls for Cassandra.pdf
Avoiding Pitfalls for Cassandra.pdfAvoiding Pitfalls for Cassandra.pdf
Avoiding Pitfalls for Cassandra.pdf
 
On Rails with Apache Cassandra
On Rails with Apache CassandraOn Rails with Apache Cassandra
On Rails with Apache Cassandra
 
Running Cassandra in AWS
Running Cassandra in AWSRunning Cassandra in AWS
Running Cassandra in AWS
 
Apache Cassandra Lunch #64: Cassandra for .NET Developers
Apache Cassandra Lunch #64: Cassandra for .NET DevelopersApache Cassandra Lunch #64: Cassandra for .NET Developers
Apache Cassandra Lunch #64: Cassandra for .NET Developers
 
Five Lessons in Distributed Databases
Five Lessons  in Distributed DatabasesFive Lessons  in Distributed Databases
Five Lessons in Distributed Databases
 
Apache Cassandra introduction
Apache Cassandra introductionApache Cassandra introduction
Apache Cassandra introduction
 
Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015Lambda at Weather Scale - Cassandra Summit 2015
Lambda at Weather Scale - Cassandra Summit 2015
 
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at NightHow Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
How Opera Syncs Tens of Millions of Browsers and Sleeps Well at Night
 
Cassandra To Infinity And Beyond
Cassandra To Infinity And BeyondCassandra To Infinity And Beyond
Cassandra To Infinity And Beyond
 
A Microservices approach with Cassandra and Quarkus | DevNation Tech Talk
A Microservices approach with Cassandra and Quarkus | DevNation Tech TalkA Microservices approach with Cassandra and Quarkus | DevNation Tech Talk
A Microservices approach with Cassandra and Quarkus | DevNation Tech Talk
 
Cassandra REST API with Pagination TEAM 15
Cassandra REST API with Pagination TEAM 15Cassandra REST API with Pagination TEAM 15
Cassandra REST API with Pagination TEAM 15
 
Global Cluster Topologies in MongoDB Atlas
Global Cluster Topologies in MongoDB AtlasGlobal Cluster Topologies in MongoDB Atlas
Global Cluster Topologies in MongoDB Atlas
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction Guide
 
Cassandra's Odyssey @ Netflix
Cassandra's Odyssey @ NetflixCassandra's Odyssey @ Netflix
Cassandra's Odyssey @ Netflix
 
DataStax TechDay - Munich 2014
DataStax TechDay - Munich 2014DataStax TechDay - Munich 2014
DataStax TechDay - Munich 2014
 

Más de Michelle Darling

Más de Michelle Darling (8)

Family pics2august014
Family pics2august014Family pics2august014
Family pics2august014
 
Final pink panthers_03_31
Final pink panthers_03_31Final pink panthers_03_31
Final pink panthers_03_31
 
Final pink panthers_03_30
Final pink panthers_03_30Final pink panthers_03_30
Final pink panthers_03_30
 
Php summary
Php summaryPhp summary
Php summary
 
Rsplit apply combine
Rsplit apply combineRsplit apply combine
Rsplit apply combine
 
College day pressie
College day pressieCollege day pressie
College day pressie
 
R learning by examples
R learning by examplesR learning by examples
R learning by examples
 
V3 gamingcasestudy
V3 gamingcasestudyV3 gamingcasestudy
V3 gamingcasestudy
 

Último

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...apidays
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard37
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 

Último (20)

Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
JohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptxJohnPollard-hybrid-app-RailsConf2024.pptx
JohnPollard-hybrid-app-RailsConf2024.pptx
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 

Cassandra NoSQL Tutorial

  • 1. Getting to know by Michelle Darling mdarlingcmt@gmail.com August 2013
  • 2. Agenda: ● What is Cassandra? ● Installation, CQL3 ● Data Modelling ● Summary Only 15 min to cover these, so please hold questions til the end, or email me :-) and I’ll summarize Q&A for everyone. Unfortunately, no time for: ● DB Admin ○ Detailed Architecture ○ Partitioning / Consistent Hashing ○ Consistency Tuning ○ Data Distribution & Replication ○ System Tables ● App Development ○ Using Python, Ruby etc to access Cassandra ○ Using Hadoop to stream data into Cassandra
  • 3. What is Cassandra? “Fortuneteller of Doom” from Greek Mythology. Tried to warn others about future disasters, but no one listened. Unfortunately, she was 100% accurate. NoSQL Distributed DB ● Consistency - A__ID ● Availability - High ● Point of Failure - none ● Good for Event Tracking & Analysis ○ Time series data ○ Sensor device data ○ Social media analytics ○ Risk Analysis ○ Failure Prediction Rackspace: “Which servers are under heavy load and are about to crash?”
  • 4. The Evolution of Cassandra 2008: Open-Source Release / 2013: Enterprise & Community Editions Data Model ● Wide rows, sparse arrays ● High performance through very fast write throughput. Infrastructure ● Peer-Peer Gossip ● Key-Value Pairs ● Tunable Consistency 2006 2005 ● Originally for Inbox Search ● But now used for Instagram
  • 5. Other NoSQL vs. Cassandra NoSQL Taxonomy: ● Key-Value Pairs ○ Dynamo, Riak, Redis ● Column-Based ○ BigTable, HBase, Cassandra ● Document-Based ○ MongoDB, Couchbase ● Graph ○ Neo4J Big Data Capable C* Differentiators: ● Production-proven at Netflix, eBay, Twitter, 20 of Fortune 100 ● “Clear Winner” in Scalability, Performance, Availability -- DataStax
  • 6. Architecture ● Cluster (ring) ● Nodes (circles) ● Peer-to-Peer Model ● Gossip Protocol Partitioner: Consistent Hashing
  • 7. Netflix Streaming Video ● Personalized Recommendations per family member ● Built on Amazon Web Services (AWS) + Cassandra
  • 8. Cloud installation using ● Amazon Web Services (AWS) ● Elastic Compute Cloud (EC2) ○ Free for the 1st year! Then pay only for what you use. ○ Sign up for AWS EC2 account: Big Data University Video 4:34 minutes, ● Amazon Machine Image (AMI) ○ Preconfigured installation template ○ Choose: “DataStax AMI for Cassandra Community Edition” ○ Follow these *very good* step-by-step instructions from DataStax. ○ AMIs also available for CouchBase, MongoDB (make sure you pick the free tier community versions to avoid monthly charge$$!!!).
  • 11. DataStax AMI Setup --clustername Michelle --totalnodes 1 --version community
  • 12. “Roll your Own” Installation DataStax Community Edition ● Install instructions For Linux, Windows, MacOS: http://www.datastax.com/2012/01/getting- started-with-cassandra ● Video: “Set up a 4- node Cassandra cluster in under 2 minutes” http://www.screenr.com/5G6
  • 13. Invoke CQLSH, CREATE KEYSPACE ./bin/cqlsh cqlsh> CREATE KEYSPACE big_data … with strategy_class = ‘org.apache.cassandra. locator.SimpleStrategy’ … with strategy_options:replication_factor=‘1’; cqlsh> use big_data; cqlsh:big_data>
  • 14. Tip: Skip Thrift -- use CQL3 Thrift RPC // Your Column Column col = new Column(ByteBuffer.wrap("name". getBytes())); col.setValue(ByteBuffer.wrap("value".getBytes())); col.setTimestamp(System.currentTimeMillis()); // Don't ask ColumnOrSuperColumn cosc = new ColumnOrSuperColumn(); cosc.setColumn(col); // Prepare to be amazed Mutation mutation = new Mutation(); mutation.setColumnOrSuperColumn(cosc); List<Mutation> mutations = new ArrayList<Mutation>(); mutations.add(mutation); Map mutations_map = new HashMap<ByteBuffer, Map<String, List<Mutation>>>(); Map cf_map = new HashMap<String, List<Mutation>>(); cf_map.set("Standard1", mutations); mutations_map.put(ByteBuffer.wrap("key".getBytes()), cf_map); cassandra.batch_mutate(mutations_map, consistency_level); CQL3 - Uses cqlsh - “SQL-like” language - Runs on top of Thrift RPC - Much more user-friendly. Thrift code on left equals this in CQL3: INSERT INTO (id, name) VALUES ('key', 'value');
  • 15. CREATE TABLE cqlsh:big_data> create table user_tags ( … user_id varchar, … tag varchar, … value counter, … primary key (user_id, tag) …): ● TABLE user_tags: “How many times has a user mentioned a hashtag?” ● COUNTER datatype - Computes & stores counter value at the time data is written. This optimizes query performance.
  • 16. UPDATE TABLE SELECT FROM TABLE cqlsh:big_data> UPDATE user_tags SET value=value+1 WHERE user_id = ‘paul’ AND tag = ‘cassandra’ cqlsh:big_data> SELECT * FROM user_tags user_id | tag | value --------+-----------+---------- paul | cassandra | 1
  • 17. DATA MODELING A Major Paradigm Shift! RDBMS Cassandra Structured Data, Fixed Schema Unstructured Data, Flexible Schema “Array of Arrays” 2D: ROW x COLUMN “Nested Key-Value Pairs” 3D: ROW Key x COLUMN key x COLUMN values DATABASE KEYSPACE TABLE TABLE a.k.a COLUMN FAMILY ROW ROW a.k.a PARTITION. Unit of replication. COLUMN COLUMN [Name, Value, Timestamp]. a.k.a CLUSTER. Unit of storage. Up to 2 billion columns per row. FOREIGN KEYS, JOINS, ACID Consistency Referential Integrity not enforced, so A_CID. BUT relationships represented using COLLECTIONS.
  • 19. Example: “Twissandra” Web App Twitter-Inspired sample application written in Python + Cassandra. ● Play with the app: twissandra.com ● Examine & learn from the code on GitHub. Features/Queries: ● Sign In, Sign Up ● Post Tweet ● Userline (User’s tweets) ● Timeline (All tweets) ● Following (Users being followed by user) ● Followers (Users following this user)
  • 21. Twissandra - RDBMS Version Entities ● USER, TWEET ● FOLLOWER, FOLLOWING ● FRIENDS Relationships: ● USER has many TWEETs. ● USER is a FOLLOWER of many USERs. ● Many USERs are FOLLOWING USER.
  • 22. Twissandra - Cassandra Version Tip: Model tables to mirror queries. TABLES or CFs ● TWEET ● USER, USERNAME ● FOLLOWERS, FOLLOWING ● USERLINE, TIMELINE Notes: ● Extra tables mirror queries. ● Denormalized tables are “pre-formed”for faster performance.
  • 24. What does C* data look like? TABLE Userline “List all of user’s Tweets” ************* Row Key: user_id Columns ● Column Key: tweet_id ● “at” Timestamp ● TTL (Time to Live) - seconds til expiration date. *************
  • 25. Cassandra Data Model = LEGOs? FlexibleSchema
  • 26. Summary: ● Go straight from SQL to CQL3; skip Thrift, Column Families, SuperColumns, etc ● Denormalize tables to mirror important queries. Roughly 1 table per impt query. ● Choose wisely: ○ Partition Keys ○ Cluster Keys ○ Indexes ○ TTL ○ Counters ○ Collections See DataStax Music Service Example ● Consider hybrid approach: ○ 20% - RDBMS for highly structured, OLTP, ACID requirements. ○ 80% - Scale Cassandra to handle the rest of data. Remember: ● Cheap: storage, servers, OpenSource software. ● Precious: User AND Developer Happiness.
  • 27. Resources C* Summit 2013: ● Slides ● Cassandra at eBay Scale (slides) ● Data Modelers Still Have Jobs - Adjusting For the NoSQL Environment (Slides) ● Real-time Analytics using Cassandra, Spark and Shark slides ● Cassandra By Example: Data Modelling with CQL3 Slides ● DATASTAX C*OLLEGE CREDIT: DATA MODELLING FOR APACHE CASSANDRA slides I wish I found these 1st: ● How do I Cassandra? slides ● Mobile version of DataStax web docs (link)