An Introduction to Apache Cassandra
Aaron Ploetz
What is Cassandra? 
● Cassandra is a non-relational, partitioned row store. 
● Rows are organized into column families (tables) with a 
required primary key. 
● Data is distributed across multiple master-less nodes in 
an application-transparent manner. 
● DataStax oversees the development of the Apache 
Cassandra open-source project, provides support to 
companies using Cassandra, and provides an enterprise-ready 
version of Cassandra.
$ whoami 
Aaron Ploetz 
@APloetz 
● Lead Database Engineer 
● B.S.-MCS UW-Whitewater 
● M.S.-SED Regis University 
● Using Cassandra since version 0.8 
● Contributor to the Cassandra tag on StackOverflow 
● Contributor to the Apache Cassandra project 
● 2014/15 DataStax MVP for Apache Cassandra
(short) History of Cassandra and DataStax 
● Developed at Facebook, open sourced in 2008. 
● Design influenced by Google BigTable and Amazon Dynamo. 
● Graduated to Apache “Top-Level Project” status in Feb 2010. 
● DataStax founded by Jonathan Ellis and Matt Pfeil in late 2010, 
offering enterprise Cassandra support. 
● Secured $190 million in VC funding. 
● Started with eight people, now employs more than 350. 
● 400+ Customers, including 25 of the Fortune 100.
Key Features 
● Current release is Cassandra 2.1 (Sept 10). 
● Distributed, decentralized storage; no SPOF. 
● Scalable. 
● High-availability, Fault-tolerance. 
● Tunable Consistency. 
● High-performance. 
● Data center awareness.
Distributed, Decentralized Storage 
[Diagram: nodes arranged in two data centers, DC1 and DC2] 
● Peer-to-peer, master-less replication. 
● Any node can handle a read or write operation. 
● Supports local read/write ops via “logical” data centers. 
● Gossip protocol allows nodes to be aware of each other. 
● Snitch ensures that data is replicated appropriately.
Scalability 
● Cassandra allows you to easily add nodes to scale your 
application back-end. 
● Netflix benchmark from 2011: 
– 48 node cluster could handle 174,373 writes/sec. 
– 288 node cluster could handle 1,099,837 writes/sec. 
– Indicates that Cassandra scales linearly. 
● Throughput of N nodes = T. 
● Throughput of Nx2 nodes = Tx2.
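The linear-scaling claim can be sanity-checked with a little arithmetic on the two benchmark figures quoted above. The sketch below is only an illustration: the function names are made up for this example, and two data points are not enough to prove linearity in general.

```python
# Figures quoted in the benchmark bullets: cluster size -> writes per second.
BENCHMARK = {48: 174_373, 288: 1_099_837}


def per_node_throughput(nodes: int, writes_per_sec: int) -> float:
    """Average writes per second handled by each node."""
    return writes_per_sec / nodes


def scaling_efficiency(small: tuple, large: tuple) -> float:
    """Observed speedup divided by the ideal (linear) speedup.

    A value close to 1.0 means throughput grew in proportion to node count.
    """
    (n1, t1), (n2, t2) = small, large
    return (t2 / t1) / (n2 / n1)


for nodes, writes in sorted(BENCHMARK.items()):
    print(f"{nodes:>3} nodes: {per_node_throughput(nodes, writes):,.0f} writes/sec/node")

efficiency = scaling_efficiency((48, 174_373), (288, 1_099_837))
print(f"scaling efficiency: {efficiency:.2f}")
```

Per-node throughput stays roughly constant between the two cluster sizes, which is what "Throughput of Nx2 nodes = Tx2" predicts.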
High Availability 
[Diagram: nodes in DC1 and DC2 progressively marked as failed (X)] 
● Cassandra was designed under the premise that 
hardware failures can and do occur. 
● Gossip protocol keeps live nodes informed of failures. 
● Cassandra 2.0.2 implemented Rapid Read Protection, which 
redirects read operations to live nodes.
Tunable Consistency 
● Cassandra allows you to alter your consistency level on a 
per-operation basis. 
● Also allows configuration for data center locality (for example, LOCAL_QUORUM). 
● Consistency levels range from ALL (strong consistency) through QUORUM 
to ONE (high availability / eventual consistency). 
● Quorum == (replicas / 2) + 1, using integer division.
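The quorum formula is easy to misread, so here is a minimal sketch of it. It assumes the count in the formula is the number of replicas of a given row (the replication factor) and that the division is integer division. The `is_strongly_consistent` helper expresses the general rule that reads see the latest write when read and write replica counts overlap; that rule is an addition for illustration, not something stated on the slide.

```python
def quorum(replicas: int) -> int:
    """Number of replicas that must respond for a QUORUM operation."""
    return replicas // 2 + 1


def tolerated_failures(replicas: int) -> int:
    """How many replicas can be down while QUORUM still succeeds."""
    return replicas - quorum(replicas)


def is_strongly_consistent(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set."""
    return read_replicas + write_replicas > replication_factor


for rf in (1, 2, 3, 5, 6):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerated failures={tolerated_failures(rf)}")
```

With a replication factor of 3, QUORUM reads plus QUORUM writes (2 + 2 > 3) overlap, while ONE plus ONE (1 + 1) does not, which is why ONE sits at the eventual-consistency end of the range.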
Eventual Consistency != Hopeful Consistency 
● Netflix experiment on consistency: 
– Created a C* 1.1.7 cluster across two data centers, with 48 nodes in 
each data center. 
– Wrote 1,000,000 records at CL1 in one data center. 
– Read same 1,000,000 records at CL1 in other data center. 
– All records read successfully! 
– “Eventually consistent does not mean a day, minute or 
even a second from now… in most cases, it is 
milliseconds!” - Christos Kalantzis
High Performance 
● Cassandra is optimized from the ground up for 
performance. 
[Performance chart not reproduced. Source: DataStax.com] 
● All disk writes are sequential, append-only 
operations. 
● No reading before writing. 
● Cassandra is optimized for threading on multi-core, 
multi-processor machines.
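The two write-path bullets (append-only writes, no read before write) can be illustrated with a toy model. This is not Cassandra's storage engine; the class and method names are hypothetical. It only shows the idea that a write appends a timestamped cell without looking at existing data, and that the read side reconciles by picking the newest timestamp.

```python
import time


class AppendOnlyTable:
    """Toy last-write-wins store: writes append, reads reconcile."""

    def __init__(self):
        # (key, value, timestamp) tuples in arrival order; never modified in place.
        self._log = []

    def write(self, key, value, timestamp=None):
        """Append a cell. Nothing is read or updated in place."""
        if timestamp is None:
            timestamp = time.time_ns() // 1000  # microseconds since the epoch
        self._log.append((key, value, timestamp))

    def read(self, key):
        """Return the value with the highest timestamp for key, or None."""
        newest = None
        for k, value, ts in self._log:
            if k == key and (newest is None or ts >= newest[1]):
                newest = (value, ts)
        return None if newest is None else newest[0]


table = AppendOnlyTable()
table.write("Mal", "111-555-1234", timestamp=1)
table.write("Mal", "111-555-0000", timestamp=2)  # an "update" is just another append
print(table.read("Mal"))
```

The cost of this design moves to the read side and to background cleanup, which is one reason delete-heavy workloads are a poor fit.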
Potential Drawbacks? 
● Some use cases are not appropriate (transient data 
or delete-heavy patterns). 
● Developer learning curve: CQL != SQL 
● Simple queries only. No JOINs or sub-queries. 
● Optimal performance is achieved through denormalization 
and query-based data modeling.
Cassandra moves beyond disco-era data modeling 
● Everything MUST be normalized!!! 
● Redundant data == “bad” 
● Relational database theory originated when disk space was expensive. In 
1975 some vendors were selling disk space at $11k per MB. 
● By 1980 prices “dropped” so that you could finally buy 1GB of storage for 
under $1 million. 
● Today I can buy a 1TB disk for $60.
Cassandra Storage Structures 
● Keyspace == Database (in the RDBMS world) 
CREATE KEYSPACE products WITH replication = { 
    'class': 'NetworkTopologyStrategy', 
    'RFD': '2', 'MKE': '4'}; 
● Column Family == Table 
CREATE TABLE hierarchy ( 
    category text, 
    subcategory text, 
    classification text, 
    skumap map<uuid, text>, 
    PRIMARY KEY (category, subcategory, classification));
Cassandra Primary Keys 
● Primary Keys are unique. 
● Single Primary Key: 
PRIMARY KEY (keyColumn) 
● Composite Primary Key: 
PRIMARY KEY (myPartitionKey, my1stClusteringKey, my2ndClusteringKey) 
● Composite Partitioning Key: 
PRIMARY KEY ((my1stPartitionKey, my2ndPartitionKey), myClusteringKey)
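The roles of the key parts above can be sketched in a few lines. The model below is a simplification under stated assumptions: the node names are hypothetical, and an MD5 hash modulo the node count stands in for Cassandra's real partitioner and token ring. The point is only that the partition key decides where a row lives, the clustering keys decide its sort order inside the partition, and writing the same full primary key twice overwrites the row.

```python
import hashlib
from collections import defaultdict

NODES = ["node1", "node2", "node3"]  # hypothetical cluster members


def node_for(partition_key: tuple) -> str:
    """Pick a node from a hash of the partition key (simplified stand-in)."""
    digest = hashlib.md5(repr(partition_key).encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


class Table:
    def __init__(self, partition_cols, clustering_cols):
        self.partition_cols = partition_cols
        self.clustering_cols = clustering_cols
        # partition key -> {clustering key -> row}
        self.partitions = defaultdict(dict)

    def insert(self, row: dict) -> None:
        pk = tuple(row[c] for c in self.partition_cols)
        ck = tuple(row[c] for c in self.clustering_cols)
        self.partitions[pk][ck] = row  # same primary key: overwrite (upsert)

    def select(self, *partition_values) -> list:
        """Return all rows of one partition, ordered by clustering key."""
        rows = self.partitions.get(tuple(partition_values), {})
        return [rows[ck] for ck in sorted(rows)]


# Mirrors PRIMARY KEY (category, subcategory, classification); values are made up.
hierarchy = Table(["category"], ["subcategory", "classification"])
hierarchy.insert({"category": "tools", "subcategory": "power", "classification": "saws"})
hierarchy.insert({"category": "tools", "subcategory": "hand", "classification": "hammers"})
print(node_for(("tools",)), [r["subcategory"] for r in hierarchy.select("tools")])
```

A composite partitioning key would simply list more than one column in `partition_cols`, so that the combination of values is hashed together.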
Cassandra Secondary Indexes 
● Cassandra does allow secondary indexes. 
CREATE INDEX myIndex ON myTable(myNonKeyColumn); 
● Designed for query convenience, not for performance. 
● Does not perform well on high-cardinality columns, because you filter a 
huge volume of records for a small number of results. Extremely low 
cardinality is also not a good idea (ex: customer address [state == good, 
phone == bad, gender == bad]). 
● Works best on a table having many rows that contain the indexed value; 
middle-of-the-road cardinality.
Serenity “crew” 
● Create a table to store data for the crew of “Serenity” from “Firefly.” 
CREATE TABLE crew ( 
    crewname TEXT, 
    firstname TEXT, 
    lastname TEXT, 
    phone TEXT, 
    PRIMARY KEY (crewname)); 

 crewname | firstname | lastname | phone 
----------+-----------+----------+-------------- 
 Mal      | Malcolm   | Reynolds | 111-555-1234 
 Jayne    | Jayne     | Cobb     | 111-555-3464 
 Sheppard | Derial    | Book     | 111-555-2349 
 Simon    | Simon     | Tam      | 111-555-8899
Serenity “crew” under the hood 
RowKey:Mal 
=> (column=, value=, timestamp=1374546754299000) 
=> (column=firstname, value=Malcolm, timestamp=1374546754299000) 
=> (column=lastname, value=Reynolds, timestamp=1374546754299000) 
=> (column=phone, value=111-555-1234, timestamp=1374546754299000) 
RowKey:Jayne 
=> (column=, value=, timestamp=1374546757815000) 
=> (column=firstname, value=Jayne, timestamp=1374546757815000) 
=> (column=lastname, value=Cobb, timestamp=1374546757815000) 
=> (column=phone, value=111-555-3464, timestamp=1374546757815000)
Serenity “crewbyphone” 
● To support querying crew members by phone, create a second table keyed on phone: 
CREATE TABLE crewbyphone ( 
    crewname TEXT, 
    firstname TEXT, 
    lastname TEXT, 
    phone TEXT, 
    PRIMARY KEY (phone, crewname)); 

 crewname | firstname | lastname  | phone 
----------+-----------+-----------+-------------- 
 Mal      | Malcolm   | Reynolds  | 111-555-1234 
 Wash     | Hoban     | Washburne | 111-555-1212 
 Zoey     | Zoey      | Washburne | 111-555-1212 
 Jayne    | Jayne     | Cobb      | 111-555-3464
Serenity “crewbyphone” under the hood 
RowKey:111-555-1234 
=> (column=Mal, value=, timestamp=1374546754299000) 
=> (column=Mal:firstname, value=Malcolm, timestamp=... 
=> (column=Mal:lastname, value=Reynolds, timestamp=... 
RowKey:111-555-1212 
=> (column=Wash, value=, timestamp=1374546754299000) 
=> (column=Wash:firstname, value=Hoban, timestamp=... 
=> (column=Wash:lastname, value=Washburne, timestamp=... 
=> (column=Zoey, value=, timestamp=1374546754299000) 
=> (column=Zoey:firstname, value=Zoey, timestamp=... 
=> (column=Zoey:lastname, value=Washburne, timestamp=...
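The mapping from CQL rows to the storage rows listed above can be sketched as a small transformation. This is a rough model of the layout shown on the slide, not the actual storage format: the function name and the use of plain dictionaries are assumptions made for illustration, and timestamps are left out.

```python
def to_storage_rows(rows, partition_col, clustering_col):
    """Group CQL-style rows into one wide storage row per partition key.

    Each CQL row contributes a marker cell named after its clustering value,
    plus one cell per remaining column named "<clustering value>:<column>".
    """
    storage = {}
    for row in rows:
        cells = storage.setdefault(row[partition_col], {})
        prefix = row[clustering_col]
        cells[prefix] = ""  # row marker cell with an empty value
        for col, value in row.items():
            if col not in (partition_col, clustering_col):
                cells[f"{prefix}:{col}"] = value
    # Cells inside a storage row are kept sorted by cell name.
    return {key: dict(sorted(cells.items())) for key, cells in storage.items()}


CREW = [
    {"crewname": "Zoey", "firstname": "Zoey", "lastname": "Washburne", "phone": "111-555-1212"},
    {"crewname": "Mal", "firstname": "Malcolm", "lastname": "Reynolds", "phone": "111-555-1234"},
    {"crewname": "Wash", "firstname": "Hoban", "lastname": "Washburne", "phone": "111-555-1212"},
]

STORAGE = to_storage_rows(CREW, partition_col="phone", clustering_col="crewname")
for row_key, cells in STORAGE.items():
    print(f"RowKey:{row_key}")
    for name, value in cells.items():
        print(f"  => (column={name}, value={value})")
```

Because Wash and Zoey share a phone number, both end up in the same storage row, sorted by crewname, so a query by phone reads a single partition.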
Who Uses Cassandra?
Who else Uses Cassandra?
Cassandra Large Deployments 
● 100+ nodes, 250TB of data, cluster sizes vary from 6 to 32 nodes. 
● 2,500+ nodes, 420TB of data, 4 DCs, handles 1 trillion operations per day. 
● 75,000+ nodes, 10s of PB of data, largest cluster 1000+ nodes.
Additional Reading 
● Amazon Dynamo paper 
● Facebook Cassandra paper 
● Harvest, Yield, and Scalable, Tolerant Systems - Brewer, Fox, 1999 
● DataStax grabs $106M to achieve big-dog status in database country 
● http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html 
● http://planetcassandra.org/blog/a-netflix-experiment-eventual-consistency-hopeful-consistency- 
● DataStax Documentation 
● KillrVideo.com
Getting Started 
● Community site: http://planetcassandra.org 
● http://datastax.com 
● DataStax community edition: 
http://planetcassandra.org/cassandra 
● DataStax startup program: 
http://www.datastax.com/what-we-offer/products-services/datastax-enterprise/ 
● Apache Cassandra project site: 
http://cassandra.apache.org/
Questions?
Demo
Want to work at AccuLynx? 
We're hiring! 
http://careers.stackoverflow.com/company/acculynx
