SlideShare una empresa de Scribd logo
1 de 21
Descargar para leer sin conexión
Cassandra
Introduction & Key Features
Meetup Vienna Cassandra Users
13th of January 2014
philipp.potisk@geroba.com
Definition
Apache Cassandra is an open source, distributed,
decentralized, elastically scalable, highly available,
fault-tolerant, tuneably consistent, column-oriented
database that bases its distribution design on Amazon’s
Dynamo and its data model on Google’s Bigtable.
Created at Facebook, it is now used at some of the most
popular sites on the Web [The Definitive Guide, Eben
Hewitt, 2010]
13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

2
History
Dynamo, 2007

Bigtable, 2006

OpenSource, 2008

13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

3
Key Features

Distributed
and
Decentralized
High Performance

CQL – A SQL
like query
interface

Elastic
Scalability

Cassandra

Columnoriented
Key-Value
store
13/01/2014

High
Availability
and Fault
Tolerance

Tuneable
Consistency

Cassandra Introduction & Key Features by Philipp Potisk

4
Distributed and Decentralized
Datacenter 1

• Distributed: Capable of running
on multiple machines
• Decentralized: No single point of
failure
No master-slave issues due to
peer-to-peer architecture
(protocol "gossip")
Single Cassandra cluster may run
across geographically dispersed
data centers
13/01/2014

Datacenter 2

1

7

6

2

5

3

4

12

8

11

9
10

Read- and writerequests to any node

Cassandra Introduction & Key Features by Philipp Potisk

5
Elastic Scalability

1
8

1

• Cassandra scales horizontally,
adding more machines that have
all or some of the data on
• Adding of nodes increase
performance throughput linearly
• De-/ and increasing the
nodecount happen seamlessly

4 Performance
2
throughput = N
3

2

Performance
throughput = N x 2

7

4

6
5

Linearly scales to
terabytes and
petabytes of data
13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

3

6
Scaling Benchmark By Netflix*
48, 96, 144 and 288
instances, with 10, 20,
30 and 60 clients
respectively. Each client
generated ~20.000w/s
having 400byte in size

Cassandra scales linearly far
beyond our current capacity
requirements, and very
rapid deployment
automation makes it easy to
manage. In particular,
benchmarking in the cloud
is fast, cheap and scalable,

*http://techblog.netflix.com/201
1/11/benchmarking-cassandrascalability-on.html
13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

7
High Availability and Fault Tolerance
• High Availability?
Multiple networked computers
operating in a cluster
Facility for recognizing node
failures
Forward failing over requests to
another part of the system

1
6

2

5

3
4

• Cassandra has High Availability

No single point of failure
due to the peer-to-peer
architecture
13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

8
Tunable Consistency
• Choose between strong and eventual
consistency
• Adjustable for read- and writeoperations separately
• Conflicts are solved during reads, as
focus lies on write-performance

TUNABLE

Available

Consistency

Use case dependent
level of consistency
13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

9
When do we have strong consistency?
• Simple Formula:

jsmith

(nodes_written + nodes_read) >
replication_factor
jsmith

t1
t2

NW: 2
NR: 2
RF: 3

t1
t2

jsmith

t1

• Ensures that a read always
reflects the most recent write
• If not: Weak consistency
 Eventually consistent
jsmith

13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

t2
10
Column-oriented Key-Value Store
Row Key1

Column
Key1
Column
Value1

Column
Key2
Column
Value2

Column
Key3
Column
Value3

…
…

…

• Data is stored in sparse
multidimensional hash tables
• A row can have multiple columns –
not necessarily the same amount of
columns for each row
• Each row has a unique key, which
also determines partitioning
• No relations!

Stored sorted by row key *

Stored sorted by column key/value

Map<RowKey, SortedMap<ColumnKey, ColumnValue>>
* Row keys (partition keys) should be hashed, in order to distribute data across the cluster evenly
13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

11
CQL – An SQL-like query interface
• “CQL 3 is the default and primary interface into the Cassandra DBMS” *
• Familiar SQL-like syntax that maps to Cassandras storage engine and
simplifies data modelling
CRETE TABLE songs (
id uuid PRIMARY KEY,
title text,
album text,
artist text,
data blob,
tags set<text>
);

INSERT INTO songs
(id, title, artist,
album, tags)
VALUES(
'a3e64f8f...',
'La Grange',
'ZZ Top',
'Tres Hombres'‚
{'cool', 'hot'});

SELECT *
FROM songs
WHERE id = 'a3e64f8f...';

“SQL-like” but NOT
relational SQL

* http://www.datastax.com/documentation/cql/3.0/pdf/cql30.pdf
13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

12
High Performance
• Optimized from the ground up
for high throughput
• All disk writes are sequential,
append only operations
• No reading before writing
• Cassandra`s threading-concept is
optimized for running on
multiprocessor/ multicore
machines
13/01/2014

Optimized for writing,
but fast reads are
possible as well

Cassandra Introduction & Key Features by Philipp Potisk

13
Benchmark from 2011 (Cassandra 0.7.4)*
ops
Cassandra showed
outstanding throughput in
“INSERT-only” with 20,000
ops

Insert: Enter 50 million 1K-sized records
Read: Search key for a one hour period + optional update
Hardware: Nehalem 6 Core x 2 CPU, 16GB Memory
13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

*NoSql Benchmarking by Curbit
http://www.cubrid.org/blog/de
v-platform/nosqlbenchmarking/
14
Benchmark from 2013 (Cassandra 1.1.6)*

* Benchmarking Top NoSQL Databases by End Point Corporation,
http://www.datastax.com/wp-content/uploads/2013/02/WP-Benchmarking-Top-NoSQL-Databases.pdf
Yahoo! Cloud Serving Benchmark: https://github.com/brianfrankcooper/YCSB
13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

15
When do we need these features?
Lots of
Writes,
Statistics, and
Analysis

Geographical
Distribution

Large
Deployments

13/01/2014

Evolving
Applications

Cassandra Introduction & Key Features by Philipp Potisk

16
Who is using Cassandra?

13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

17
ebay Data Infrastructure*
•
•
•
•
•
•

Thousands of nodes
> 2K sharded logical host
> 16K tables
> 27K indexes
> 140 billion SQLs/day
> 5 PB provisioned

• 10+ clusters
• 100+ nodes
• > 250 TB provisioned
(local HDD + shared SSD)
• > 9 billion writes/day
• > 5 billion reads/day

• Hundreds of nodes
• Persistent & in-memory
• > 40 billion SQLs/day

Not replacing RDMBS but
complementing!

Hundreds of nodes
> 50 TB
> 2 billion ops/day

• Thousands of nodes
• The world largest cluster
with 2K+ nodes

*by Jay Patel, Cassandra Summit June 2013 San Francisco
13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

18
Cassandra Use Case at Ebay
Application/Use Case
• Time-series data and real-time insights
• Fraud detection & prevention
• Quality Click Pricing for affiliates
• Order & Shipment Tracking
•…
• Server metrics collection
• Taste graph-based next-gen recommendation
system
• Social Signals on eBay Product & Item pages
13/01/2014

Why Cassandra?
• Multi-Datacenter (active-active)
• No SPOF
• Easy to scale
• Write performance
• Distributed Counters

Cassandra Introduction & Key Features by Philipp Potisk

19
Cassandra/Hadoop Deployment

13/01/2014

Cassandra Introduction & Key Features by Philipp Potisk

20
Summary
• History
• Key features of Cassandra
•
•
•
•
•
•
•

Distributed and Decentralized
Elastic Scalability
High Availability and Fault Tolerance
Tunable Consistency
Column-oriented key-value store
CQL interface
High Performance

• Ebay Use Case
13/01/2014

Apache project: http://cassandra.apache.org

Community portal: http://planetcassandra.org

Documentation: http://www.datastax.com/docs

Cassandra Introduction & Key Features by Philipp Potisk

21

Más contenido relacionado

La actualidad más candente

Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Kai Sasaki
 
Technical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASTechnical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASAshnikbiz
 
Cassandra Summit 2014: Deploying Cassandra for Call of Duty
Cassandra Summit 2014: Deploying Cassandra for Call of DutyCassandra Summit 2014: Deploying Cassandra for Call of Duty
Cassandra Summit 2014: Deploying Cassandra for Call of DutyDataStax Academy
 
Change Data Capture using Kafka
Change Data Capture using KafkaChange Data Capture using Kafka
Change Data Capture using KafkaAkash Vacher
 
Spark streaming with apache kafka
Spark streaming with apache kafkaSpark streaming with apache kafka
Spark streaming with apache kafkapunesparkmeetup
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBasedave_revell
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Data Con LA
 
Scaling MySQL using Fabric
Scaling MySQL using FabricScaling MySQL using Fabric
Scaling MySQL using FabricKarthik .P.R
 
PaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at YelpPaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at YelpNathan Handler
 
Large Scale Data Analytics with Spark and Cassandra on the DSE Platform
Large Scale Data Analytics with Spark and Cassandra on the DSE PlatformLarge Scale Data Analytics with Spark and Cassandra on the DSE Platform
Large Scale Data Analytics with Spark and Cassandra on the DSE PlatformDataStax Academy
 
Migration from Redshift to Spark
Migration from Redshift to SparkMigration from Redshift to Spark
Migration from Redshift to SparkSky Yin
 
Make 2016 your year of SMACK talk
Make 2016 your year of SMACK talkMake 2016 your year of SMACK talk
Make 2016 your year of SMACK talkDataStax Academy
 
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesApache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesShivji Kumar Jha
 
Mining AWR V2 - Trend Analysis
Mining AWR V2 - Trend AnalysisMining AWR V2 - Trend Analysis
Mining AWR V2 - Trend AnalysisMaris Elsins
 
Real time dashboards with Kafka and Druid
Real time dashboards with Kafka and DruidReal time dashboards with Kafka and Druid
Real time dashboards with Kafka and DruidVenu Ryali
 
From 100s to 100s of Millions
From 100s to 100s of MillionsFrom 100s to 100s of Millions
From 100s to 100s of MillionsErik Onnen
 
Actor-based concurrency in a modern Java Enterprise
Actor-based concurrency in a modern Java EnterpriseActor-based concurrency in a modern Java Enterprise
Actor-based concurrency in a modern Java EnterpriseAlexander Lukyanchikov
 
MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)Karthik .P.R
 
Redis Labs and SQL Server
Redis Labs and SQL ServerRedis Labs and SQL Server
Redis Labs and SQL ServerLynn Langit
 

La actualidad más candente (20)

Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0Managing multi tenant resource toward Hive 2.0
Managing multi tenant resource toward Hive 2.0
 
Technical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASTechnical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPAS
 
Cassandra Summit 2014: Deploying Cassandra for Call of Duty
Cassandra Summit 2014: Deploying Cassandra for Call of DutyCassandra Summit 2014: Deploying Cassandra for Call of Duty
Cassandra Summit 2014: Deploying Cassandra for Call of Duty
 
Change Data Capture using Kafka
Change Data Capture using KafkaChange Data Capture using Kafka
Change Data Capture using Kafka
 
Spark streaming with apache kafka
Spark streaming with apache kafkaSpark streaming with apache kafka
Spark streaming with apache kafka
 
Near-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBaseNear-realtime analytics with Kafka and HBase
Near-realtime analytics with Kafka and HBase
 
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
Big Data Day LA 2016/ NoSQL track - Apache Kudu: Fast Analytics on Fast Data,...
 
Scaling MySQL using Fabric
Scaling MySQL using FabricScaling MySQL using Fabric
Scaling MySQL using Fabric
 
PaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at YelpPaaSTA: Autoscaling at Yelp
PaaSTA: Autoscaling at Yelp
 
Large Scale Data Analytics with Spark and Cassandra on the DSE Platform
Large Scale Data Analytics with Spark and Cassandra on the DSE PlatformLarge Scale Data Analytics with Spark and Cassandra on the DSE Platform
Large Scale Data Analytics with Spark and Cassandra on the DSE Platform
 
Migration from Redshift to Spark
Migration from Redshift to SparkMigration from Redshift to Spark
Migration from Redshift to Spark
 
Make 2016 your year of SMACK talk
Make 2016 your year of SMACK talkMake 2016 your year of SMACK talk
Make 2016 your year of SMACK talk
 
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use casesApache Con 2021 : Apache Bookkeeper Key Value Store and use cases
Apache Con 2021 : Apache Bookkeeper Key Value Store and use cases
 
Mining AWR V2 - Trend Analysis
Mining AWR V2 - Trend AnalysisMining AWR V2 - Trend Analysis
Mining AWR V2 - Trend Analysis
 
Real time dashboards with Kafka and Druid
Real time dashboards with Kafka and DruidReal time dashboards with Kafka and Druid
Real time dashboards with Kafka and Druid
 
From 100s to 100s of Millions
From 100s to 100s of MillionsFrom 100s to 100s of Millions
From 100s to 100s of Millions
 
Kafka - Linkedin's messaging backbone
Kafka - Linkedin's messaging backboneKafka - Linkedin's messaging backbone
Kafka - Linkedin's messaging backbone
 
Actor-based concurrency in a modern Java Enterprise
Actor-based concurrency in a modern Java EnterpriseActor-based concurrency in a modern Java Enterprise
Actor-based concurrency in a modern Java Enterprise
 
MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)MySQL Query Optimization (Basics)
MySQL Query Optimization (Basics)
 
Redis Labs and SQL Server
Redis Labs and SQL ServerRedis Labs and SQL Server
Redis Labs and SQL Server
 

Destacado

System Center 2012 - January Licensing Update
System Center 2012 - January Licensing UpdateSystem Center 2012 - January Licensing Update
System Center 2012 - January Licensing UpdateSoftchoice Corporation
 
SQL Server 2012 ile Gelen Yeni Özellikler
SQL Server 2012 ile Gelen Yeni ÖzelliklerSQL Server 2012 ile Gelen Yeni Özellikler
SQL Server 2012 ile Gelen Yeni Özelliklerturgaysahtiyan
 
Softchoice Webinar Series: VMware vSphere 5.1 Changes
Softchoice Webinar Series: VMware vSphere 5.1 ChangesSoftchoice Webinar Series: VMware vSphere 5.1 Changes
Softchoice Webinar Series: VMware vSphere 5.1 ChangesSoftchoice Corporation
 
You voiced your concerns. VMware listened: Major Adjustments to vSphere 5 lic...
You voiced your concerns. VMware listened: Major Adjustments to vSphere 5 lic...You voiced your concerns. VMware listened: Major Adjustments to vSphere 5 lic...
You voiced your concerns. VMware listened: Major Adjustments to vSphere 5 lic...Softchoice Corporation
 
Nordic VMUG User Conference 2014 - Design VMware vCenter Server
Nordic VMUG User Conference 2014 - Design VMware vCenter ServerNordic VMUG User Conference 2014 - Design VMware vCenter Server
Nordic VMUG User Conference 2014 - Design VMware vCenter ServerAndrea Mauro
 
Limewood Event - VMware
Limewood Event - VMware Limewood Event - VMware
Limewood Event - VMware BlueChipICT
 
VMUGIT Meeting Pisa 2015 - SDS secondo VMware: VSAN e VVOL
VMUGIT Meeting Pisa 2015 - SDS secondo VMware: VSAN e VVOLVMUGIT Meeting Pisa 2015 - SDS secondo VMware: VSAN e VVOL
VMUGIT Meeting Pisa 2015 - SDS secondo VMware: VSAN e VVOLgguglie
 
Findability Day 2015 Mattias Ellison - Findwise - Enterprise Search and fin...
Findability Day 2015   Mattias Ellison - Findwise - Enterprise Search and fin...Findability Day 2015   Mattias Ellison - Findwise - Enterprise Search and fin...
Findability Day 2015 Mattias Ellison - Findwise - Enterprise Search and fin...Findwise
 
Site Recovery Manager - Una visione architetturale
Site Recovery Manager - Una visione architetturaleSite Recovery Manager - Una visione architetturale
Site Recovery Manager - Una visione architetturalegguglie
 
SQL Server Performans İpuçları
SQL Server Performans İpuçlarıSQL Server Performans İpuçları
SQL Server Performans İpuçlarıturgaysahtiyan
 
Docker at Djangocon 2013 | Talk by Ken Cochrane
Docker at Djangocon 2013 | Talk by Ken CochraneDocker at Djangocon 2013 | Talk by Ken Cochrane
Docker at Djangocon 2013 | Talk by Ken CochranedotCloud
 
vCenter and ESXi network port communications
vCenter and ESXi network port communicationsvCenter and ESXi network port communications
vCenter and ESXi network port communicationsAnimesh Dixit
 
Virtual Space Race: How IT with The Right Stuff Creates a Competitive Advantage
Virtual Space Race: How IT with The Right Stuff Creates a Competitive AdvantageVirtual Space Race: How IT with The Right Stuff Creates a Competitive Advantage
Virtual Space Race: How IT with The Right Stuff Creates a Competitive AdvantageSoftchoice Corporation
 
VMworld 2014: Site Recovery Manager and vSphere Replication
VMworld 2014: Site Recovery Manager and vSphere ReplicationVMworld 2014: Site Recovery Manager and vSphere Replication
VMworld 2014: Site Recovery Manager and vSphere ReplicationVMworld
 
vmware_site_recovery_manager_and_net_app_fas_v-series_se_technical_presentati...
vmware_site_recovery_manager_and_net_app_fas_v-series_se_technical_presentati...vmware_site_recovery_manager_and_net_app_fas_v-series_se_technical_presentati...
vmware_site_recovery_manager_and_net_app_fas_v-series_se_technical_presentati...Vinh Nguyen
 
Creating 3rd Generation Web APIs with Hydra
Creating 3rd Generation Web APIs with HydraCreating 3rd Generation Web APIs with Hydra
Creating 3rd Generation Web APIs with HydraMarkus Lanthaler
 
Getting secure in a mobile-first world with EMS
Getting secure in a mobile-first world with EMSGetting secure in a mobile-first world with EMS
Getting secure in a mobile-first world with EMSSoftchoice Corporation
 
How to hack VMware vCenter server in 60 seconds
How to hack VMware vCenter server in 60 secondsHow to hack VMware vCenter server in 60 seconds
How to hack VMware vCenter server in 60 secondsPositive Hack Days
 
InfoGrid Core Ideas
InfoGrid Core IdeasInfoGrid Core Ideas
InfoGrid Core IdeasInfoGrid.org
 

Destacado (20)

System Center 2012 - January Licensing Update
System Center 2012 - January Licensing UpdateSystem Center 2012 - January Licensing Update
System Center 2012 - January Licensing Update
 
SQL Server 2012 ile Gelen Yeni Özellikler
SQL Server 2012 ile Gelen Yeni ÖzelliklerSQL Server 2012 ile Gelen Yeni Özellikler
SQL Server 2012 ile Gelen Yeni Özellikler
 
Softchoice Webinar Series: VMware vSphere 5.1 Changes
Softchoice Webinar Series: VMware vSphere 5.1 ChangesSoftchoice Webinar Series: VMware vSphere 5.1 Changes
Softchoice Webinar Series: VMware vSphere 5.1 Changes
 
You voiced your concerns. VMware listened: Major Adjustments to vSphere 5 lic...
You voiced your concerns. VMware listened: Major Adjustments to vSphere 5 lic...You voiced your concerns. VMware listened: Major Adjustments to vSphere 5 lic...
You voiced your concerns. VMware listened: Major Adjustments to vSphere 5 lic...
 
Nordic VMUG User Conference 2014 - Design VMware vCenter Server
Nordic VMUG User Conference 2014 - Design VMware vCenter ServerNordic VMUG User Conference 2014 - Design VMware vCenter Server
Nordic VMUG User Conference 2014 - Design VMware vCenter Server
 
Limewood Event - VMware
Limewood Event - VMware Limewood Event - VMware
Limewood Event - VMware
 
VMUGIT Meeting Pisa 2015 - SDS secondo VMware: VSAN e VVOL
VMUGIT Meeting Pisa 2015 - SDS secondo VMware: VSAN e VVOLVMUGIT Meeting Pisa 2015 - SDS secondo VMware: VSAN e VVOL
VMUGIT Meeting Pisa 2015 - SDS secondo VMware: VSAN e VVOL
 
Findability Day 2015 Mattias Ellison - Findwise - Enterprise Search and fin...
Findability Day 2015   Mattias Ellison - Findwise - Enterprise Search and fin...Findability Day 2015   Mattias Ellison - Findwise - Enterprise Search and fin...
Findability Day 2015 Mattias Ellison - Findwise - Enterprise Search and fin...
 
Site Recovery Manager - Una visione architetturale
Site Recovery Manager - Una visione architetturaleSite Recovery Manager - Una visione architetturale
Site Recovery Manager - Una visione architetturale
 
SQL Server Performans İpuçları
SQL Server Performans İpuçlarıSQL Server Performans İpuçları
SQL Server Performans İpuçları
 
Docker at Djangocon 2013 | Talk by Ken Cochrane
Docker at Djangocon 2013 | Talk by Ken CochraneDocker at Djangocon 2013 | Talk by Ken Cochrane
Docker at Djangocon 2013 | Talk by Ken Cochrane
 
vCenter and ESXi network port communications
vCenter and ESXi network port communicationsvCenter and ESXi network port communications
vCenter and ESXi network port communications
 
Virtual Space Race: How IT with The Right Stuff Creates a Competitive Advantage
Virtual Space Race: How IT with The Right Stuff Creates a Competitive AdvantageVirtual Space Race: How IT with The Right Stuff Creates a Competitive Advantage
Virtual Space Race: How IT with The Right Stuff Creates a Competitive Advantage
 
VMworld 2014: Site Recovery Manager and vSphere Replication
VMworld 2014: Site Recovery Manager and vSphere ReplicationVMworld 2014: Site Recovery Manager and vSphere Replication
VMworld 2014: Site Recovery Manager and vSphere Replication
 
Working Hard or Hardly Networked?
Working Hard or Hardly Networked?Working Hard or Hardly Networked?
Working Hard or Hardly Networked?
 
vmware_site_recovery_manager_and_net_app_fas_v-series_se_technical_presentati...
vmware_site_recovery_manager_and_net_app_fas_v-series_se_technical_presentati...vmware_site_recovery_manager_and_net_app_fas_v-series_se_technical_presentati...
vmware_site_recovery_manager_and_net_app_fas_v-series_se_technical_presentati...
 
Creating 3rd Generation Web APIs with Hydra
Creating 3rd Generation Web APIs with HydraCreating 3rd Generation Web APIs with Hydra
Creating 3rd Generation Web APIs with Hydra
 
Getting secure in a mobile-first world with EMS
Getting secure in a mobile-first world with EMSGetting secure in a mobile-first world with EMS
Getting secure in a mobile-first world with EMS
 
How to hack VMware vCenter server in 60 seconds
How to hack VMware vCenter server in 60 secondsHow to hack VMware vCenter server in 60 seconds
How to hack VMware vCenter server in 60 seconds
 
InfoGrid Core Ideas
InfoGrid Core IdeasInfoGrid Core Ideas
InfoGrid Core Ideas
 

Similar a Cassandra Intro & Key Features

5 Factors When Selecting a High Performance, Low Latency Database
5 Factors When Selecting a High Performance, Low Latency Database5 Factors When Selecting a High Performance, Low Latency Database
5 Factors When Selecting a High Performance, Low Latency DatabaseScyllaDB
 
Sa introduction to big data pipelining with cassandra &amp; spark west mins...
Sa introduction to big data pipelining with cassandra &amp; spark   west mins...Sa introduction to big data pipelining with cassandra &amp; spark   west mins...
Sa introduction to big data pipelining with cassandra &amp; spark west mins...Simon Ambridge
 
Database as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance PlatformDatabase as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance PlatformMaris Elsins
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra nehabsairam
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Fwdays
 
Cassandra for mission critical data
Cassandra for mission critical dataCassandra for mission critical data
Cassandra for mission critical dataOleksandr Semenov
 
DBaaS - The Next generation of database infrastructure
DBaaS - The Next generation of database infrastructureDBaaS - The Next generation of database infrastructure
DBaaS - The Next generation of database infrastructureEmiliano Fusaglia
 
Aujourd’hui la consolidation de bases de données Oracle c’est quoi ?
Aujourd’hui la consolidation de bases de données Oracle c’est quoi ? Aujourd’hui la consolidation de bases de données Oracle c’est quoi ?
Aujourd’hui la consolidation de bases de données Oracle c’est quoi ? Swiss Data Forum Swiss Data Forum
 
NoSQL Intro with cassandra
NoSQL Intro with cassandraNoSQL Intro with cassandra
NoSQL Intro with cassandraBrian Enochson
 
cassandra_presentation_final
cassandra_presentation_finalcassandra_presentation_final
cassandra_presentation_finalSergioBruno21
 
Cassandra
Cassandra Cassandra
Cassandra Pooja GV
 
TupleJump: Breakthrough OLAP performance on Cassandra and Spark
TupleJump: Breakthrough OLAP performance on Cassandra and SparkTupleJump: Breakthrough OLAP performance on Cassandra and Spark
TupleJump: Breakthrough OLAP performance on Cassandra and SparkDataStax Academy
 
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and SparkFiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and SparkEvan Chan
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideMohammed Fazuluddin
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkEvan Chan
 
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraVictor Coustenoble
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun JeongSpark Summit
 
Big Telco Real-Time Network Analytics
Big Telco Real-Time Network AnalyticsBig Telco Real-Time Network Analytics
Big Telco Real-Time Network AnalyticsYousun Jeong
 

Similar a Cassandra Intro & Key Features (20)

NoSQL_Night
NoSQL_NightNoSQL_Night
NoSQL_Night
 
BigData Developers MeetUp
BigData Developers MeetUpBigData Developers MeetUp
BigData Developers MeetUp
 
5 Factors When Selecting a High Performance, Low Latency Database
5 Factors When Selecting a High Performance, Low Latency Database5 Factors When Selecting a High Performance, Low Latency Database
5 Factors When Selecting a High Performance, Low Latency Database
 
Sa introduction to big data pipelining with cassandra &amp; spark west mins...
Sa introduction to big data pipelining with cassandra &amp; spark   west mins...Sa introduction to big data pipelining with cassandra &amp; spark   west mins...
Sa introduction to big data pipelining with cassandra &amp; spark west mins...
 
Database as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance PlatformDatabase as a Service on the Oracle Database Appliance Platform
Database as a Service on the Oracle Database Appliance Platform
 
Appache Cassandra
Appache Cassandra  Appache Cassandra
Appache Cassandra
 
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
Виталий Бондаренко "Fast Data Platform for Real-Time Analytics. Architecture ...
 
Cassandra for mission critical data
Cassandra for mission critical dataCassandra for mission critical data
Cassandra for mission critical data
 
DBaaS - The Next generation of database infrastructure
DBaaS - The Next generation of database infrastructureDBaaS - The Next generation of database infrastructure
DBaaS - The Next generation of database infrastructure
 
Aujourd’hui la consolidation de bases de données Oracle c’est quoi ?
Aujourd’hui la consolidation de bases de données Oracle c’est quoi ? Aujourd’hui la consolidation de bases de données Oracle c’est quoi ?
Aujourd’hui la consolidation de bases de données Oracle c’est quoi ?
 
NoSQL Intro with cassandra
NoSQL Intro with cassandraNoSQL Intro with cassandra
NoSQL Intro with cassandra
 
cassandra_presentation_final
cassandra_presentation_finalcassandra_presentation_final
cassandra_presentation_final
 
Cassandra
Cassandra Cassandra
Cassandra
 
TupleJump: Breakthrough OLAP performance on Cassandra and Spark
TupleJump: Breakthrough OLAP performance on Cassandra and SparkTupleJump: Breakthrough OLAP performance on Cassandra and Spark
TupleJump: Breakthrough OLAP performance on Cassandra and Spark
 
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and SparkFiloDB - Breakthrough OLAP Performance with Cassandra and Spark
FiloDB - Breakthrough OLAP Performance with Cassandra and Spark
 
Cassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction GuideCassandra - A Basic Introduction Guide
Cassandra - A Basic Introduction Guide
 
Breakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and SparkBreakthrough OLAP performance with Cassandra and Spark
Breakthrough OLAP performance with Cassandra and Spark
 
BI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache CassandraBI, Reporting and Analytics on Apache Cassandra
BI, Reporting and Analytics on Apache Cassandra
 
Big Telco - Yousun Jeong
Big Telco - Yousun JeongBig Telco - Yousun Jeong
Big Telco - Yousun Jeong
 
Big Telco Real-Time Network Analytics
Big Telco Real-Time Network AnalyticsBig Telco Real-Time Network Analytics
Big Telco Real-Time Network Analytics
 

Último

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 

Último (20)

How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 

Cassandra Intro & Key Features

  • 1. Cassandra Introduction & Key Features Meetup Vienna Cassandra Users 13th of January 2014 philipp.potisk@geroba.com
  • 2. Definition Apache Cassandra is an open source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tuneably consistent, column-oriented database that bases its distribution design on Amazon’s Dynamo and its data model on Google’s Bigtable. Created at Facebook, it is now used at some of the most popular sites on the Web [The Definitive Guide, Eben Hewitt, 2010] 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk 2
  • 3. History Dynamo, 2007 Bigtable, 2006 OpenSource, 2008 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk 3
  • 4. Key Features Distributed and Decentralized High Performance CQL – A SQL like query interface Elastic Scalability Cassandra Columnoriented Key-Value store 13/01/2014 High Availability and Fault Tolerance Tuneable Consistency Cassandra Introduction & Key Features by Philipp Potisk 4
  • 5. Distributed and Decentralized Datacenter 1 • Distributed: Capable of running on multiple machines • Decentralized: No single point of failure No master-slave issues due to peer-to-peer architecture (protocol "gossip") Single Cassandra cluster may run across geographically dispersed data centers 13/01/2014 Datacenter 2 1 7 6 2 5 3 4 12 8 11 9 10 Read- and writerequests to any node Cassandra Introduction & Key Features by Philipp Potisk 5
  • 6. Elastic Scalability 1 8 1 • Cassandra scales horizontally, adding more machines that have all or some of the data on • Adding of nodes increase performance throughput linearly • De-/ and increasing the nodecount happen seamlessly 4 Performance 2 throughput = N 3 2 Performance throughput = N x 2 7 4 6 5 Linearly scales to terabytes and petabytes of data 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk 3 6
  • 7. Scaling Benchmark By Netflix* 48, 96, 144 and 288 instances, with 10, 20, 30 and 60 clients respectively. Each client generated ~20.000w/s having 400byte in size Cassandra scales linearly far beyond our current capacity requirements, and very rapid deployment automation makes it easy to manage. In particular, benchmarking in the cloud is fast, cheap and scalable, *http://techblog.netflix.com/201 1/11/benchmarking-cassandrascalability-on.html 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk 7
  • 8. High Availability and Fault Tolerance • High Availability? Multiple networked computers operating in a cluster Facility for recognizing node failures Forward failing over requests to another part of the system 1 6 2 5 3 4 • Cassandra has High Availability No single point of failure due to the peer-to-peer architecture 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk 8
  • 9. Tunable Consistency • Choose between strong and eventual consistency • Adjustable for read- and writeoperations separately • Conflicts are solved during reads, as focus lies on write-performance TUNABLE Available Consistency Use case dependent level of consistency 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk 9
  • 10. When do we have strong consistency? • Simple Formula: jsmith (nodes_written + nodes_read) > replication_factor jsmith t1 t2 NW: 2 NR: 2 RF: 3 t1 t2 jsmith t1 • Ensures that a read always reflects the most recent write • If not: Weak consistency  Eventually consistent jsmith 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk t2 10
  • 11. Column-oriented Key-Value Store Row Key1 Column Key1 Column Value1 Column Key2 Column Value2 Column Key3 Column Value3 … … … • Data is stored in sparse multidimensional hash tables • A row can have multiple columns – not necessarily the same amount of columns for each row • Each row has a unique key, which also determines partitioning • No relations! Stored sorted by row key * Stored sorted by column key/value Map<RowKey, SortedMap<ColumnKey, ColumnValue>> * Row keys (partition keys) should be hashed, in order to distribute data across the cluster evenly 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk 11
  • 12. CQL – An SQL-like query interface • “CQL 3 is the default and primary interface into the Cassandra DBMS” * • Familiar SQL-like syntax that maps to Cassandras storage engine and simplifies data modelling CRETE TABLE songs ( id uuid PRIMARY KEY, title text, album text, artist text, data blob, tags set<text> ); INSERT INTO songs (id, title, artist, album, tags) VALUES( 'a3e64f8f...', 'La Grange', 'ZZ Top', 'Tres Hombres'‚ {'cool', 'hot'}); SELECT * FROM songs WHERE id = 'a3e64f8f...'; “SQL-like” but NOT relational SQL * http://www.datastax.com/documentation/cql/3.0/pdf/cql30.pdf 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk 12
  • 13. High Performance • Optimized from the ground up for high throughput • All disk writes are sequential, append only operations • No reading before writing • Cassandra`s threading-concept is optimized for running on multiprocessor/ multicore machines 13/01/2014 Optimized for writing, but fast reads are possible as well Cassandra Introduction & Key Features by Philipp Potisk 13
  • 14. Benchmark from 2011 (Cassandra 0.7.4)* ops Cassandra showed outstanding throughput in “INSERT-only” with 20,000 ops Insert: Enter 50 million 1K-sized records Read: Search key for a one hour period + optional update Hardware: Nehalem 6 Core x 2 CPU, 16GB Memory 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk *NoSql Benchmarking by Curbit http://www.cubrid.org/blog/de v-platform/nosqlbenchmarking/ 14
  • 15. Benchmark from 2013 (Cassandra 1.1.6)* * Benchmarking Top NoSQL Databases by End Point Corporation, http://www.datastax.com/wp-content/uploads/2013/02/WP-Benchmarking-Top-NoSQL-Databases.pdf Yahoo! Cloud Serving Benchmark: https://github.com/brianfrankcooper/YCSB 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk 15
  • 16. When do we need these features? Lots of Writes, Statistics, and Analysis Geographical Distribution Large Deployments 13/01/2014 Evolving Applications Cassandra Introduction & Key Features by Philipp Potisk 16
  • 17. Who is using Cassandra? 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk 17
  • 18. ebay Data Infrastructure* • • • • • • Thousands of nodes > 2K sharded logical host > 16K tables > 27K indexes > 140 billion SQLs/day > 5 PB provisioned • 10+ clusters • 100+ nodes • > 250 TB provisioned (local HDD + shared SSD) • > 9 billion writes/day • > 5 billion reads/day • Hundreds of nodes • Persistent & in-memory • > 40 billion SQLs/day Not replacing RDMBS but complementing! Hundreds of nodes > 50 TB > 2 billion ops/day • Thousands of nodes • The world largest cluster with 2K+ nodes *by Jay Patel, Cassandra Summit June 2013 San Francisco 13/01/2014 Cassandra Introduction & Key Features by Philipp Potisk 18
  • 19. Cassandra Use Case at Ebay Application/Use Case • Time-series data and real-time insights • Fraud detection & prevention • Quality Click Pricing for affiliates • Order & Shipment Tracking •… • Server metrics collection • Taste graph-based next-gen recommendation system • Social Signals on eBay Product & Item pages 13/01/2014 Why Cassandra? • Multi-Datacenter (active-active) • No SPOF • Easy to scale • Write performance • Distributed Counters Cassandra Introduction & Key Features by Philipp Potisk 19
  • 21. Summary • History • Key features of Cassandra • • • • • • • Distributed and Decentralized Elastic Scalability High Availability and Fault Tolerance Tunable Consistency Column-oriented key-value store CQL interface High Performance • Ebay Use Case 13/01/2014 Apache project: http://cassandra.apache.org Community portal: http://planetcassandra.org Documentation: http://www.datastax.com/docs Cassandra Introduction & Key Features by Philipp Potisk 21