Evaluating Apache Cassandra as a Cloud Database

Overview of DataStax
• Founded in April 2010
• Commercial leader in Apache Cassandra™, the popular open-source “big data” database
• 100+ customers
• 30+ employees
• Home to the Apache Cassandra project chair and most committers
• Headquartered in the San Francisco Bay Area
• Secured $11M in Series B funding in September 2011

Why DataStax?
DataStax delivers database products and services
based on Apache Cassandra from experts who
are at the forefront of today's data revolution.

Database Software & Tools
• DataStax Enterprise
• DataStax Community
• DataStax OpsCenter
• Drivers & Connectors

Support & Services
• Production Support
• Consultative Help
• Professional Training
• Online Documentation

What Constitutes a Cloud Database?

What a Cloud Database is not
A Cloud database is not simply taking a traditional RDBMS
and running it in a Cloud provider’s environment.

Key Attributes of a Cloud Database
• Transparent elasticity – nodes can be added and removed online, with load balancing
• Transparent scalability – adding nodes increases both (1) performance throughput and (2) the ability to handle Big Data while maintaining high performance
• High availability – always up; no single point of failure
• Multi-geography/zone aware – able to span multiple geographies, data centers, and cloud provider zones; reads and writes can go to any node
• Data redundancy – data is protected via multiple copies held at different physical locations
• Dynamic schema – able to manage structured, semi-structured, and unstructured data
• Simple manageability – easy to administer a logical database across many nodes
• Software support – supports popular public and private Cloud providers
• Low cost – won’t break the bank

How Does Apache Cassandra Stack Up?

What is Cassandra?
Apache Cassandra™ is a free, distributed, high-performance, extremely scalable, and fault-tolerant (i.e., no single point of failure) post-relational database solution. Cassandra can serve both as a real-time datastore for online/transactional applications and as a read-intensive database for business intelligence systems.

The History of Cassandra
[Figure: Cassandra's lineage, drawing on Google Bigtable and Amazon Dynamo]

Cassandra Technical Advantages
Key technical attributes of Cassandra include:
• Big Data scalability
• Fast, linear-scale performance
• No single point of failure
• Enterprise / multi-data center / Cloud data distribution
• Read/write-anywhere capability
• Flexible schema
• Tunable data consistency
• Data compression
• Familiar SQL-like language (CQL)
• Easy setup
• No special hardware needed
• No special caching layer needed
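Tunable consistency comes down to simple arithmetic on replica counts. The Python sketch below assumes N replicas per row, a read that waits for R of them, and a write that waits for W acknowledgements; when R + W > N, every read replica set overlaps the latest successful write. A majority of N replicas (N // 2 + 1) is what Cassandra's QUORUM level requires within a single data center. The function names are hypothetical, not part of any Cassandra driver.

```python
# Hypothetical helpers illustrating quorum arithmetic; not a Cassandra driver API.


def quorum(replication_factor: int) -> int:
    """Majority of replicas, as used by the QUORUM consistency level."""
    return replication_factor // 2 + 1


def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """True when every read replica set must overlap every write replica set."""
    return r + w > n


def tolerated_failures(n: int, required: int) -> int:
    """Replicas that can be down while requests needing `required` acks still succeed."""
    return n - required


if __name__ == "__main__":
    n = 3
    q = quorum(n)
    print(f"QUORUM with N={n}: {q} replicas")
    print("ONE reads + ONE writes overlap:", is_strongly_consistent(n, 1, 1))
    print("QUORUM reads + QUORUM writes overlap:", is_strongly_consistent(n, q, q))
    print("Replicas that may be down at QUORUM:", tolerated_failures(n, q))
```

With N = 3, QUORUM reads and writes each need 2 replicas, so reads see the latest write while one replica is down; choosing ONE for both trades that guarantee for lower latency.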

Cassandra Architecture Overview
• Cassandra was designed with the understanding that system and hardware failures can and do occur
• Peer-to-peer, distributed system
• All nodes are the same
• Data is partitioned among all nodes in the cluster
• Configurable data replication ensures fault tolerance
• Read/write-anywhere design
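To make "data is partitioned among all nodes" concrete, here is a minimal Python sketch of a token ring. It assumes MD5-based tokens (similar in spirit to Cassandra's RandomPartitioner), evenly spaced node tokens, and SimpleStrategy-style placement, where the key's owner plus the next nodes clockwise hold the replicas. `TokenRing`, `token_for`, and `evenly_spaced` are hypothetical names, not Cassandra code.

```python
import bisect
import hashlib

# Hypothetical, simplified token ring; not Cassandra's implementation.
RING_SIZE = 2 ** 127  # assumed token space for this sketch


def token_for(key: str) -> int:
    """Hash a row key onto the ring (MD5 is an assumption made for this sketch)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big") % RING_SIZE


def evenly_spaced(nodes: list[str]) -> dict[str, int]:
    """Give each node a balanced token so every node owns an equal slice of the ring."""
    step = RING_SIZE // len(nodes)
    return {node: i * step for i, node in enumerate(nodes)}


class TokenRing:
    def __init__(self, node_tokens: dict[str, int]):
        # Keep (token, node) pairs sorted so ownership can be found by binary search.
        self.entries = sorted((token, node) for node, token in node_tokens.items())
        self.tokens = [token for token, _ in self.entries]

    def replicas(self, key: str, replication_factor: int) -> list[str]:
        """Owner is the first node whose token is >= the key's token (wrapping
        past the end of the ring); the next nodes clockwise hold the other copies."""
        start = bisect.bisect_left(self.tokens, token_for(key)) % len(self.entries)
        count = min(replication_factor, len(self.entries))
        return [self.entries[(start + i) % len(self.entries)][1] for i in range(count)]


if __name__ == "__main__":
    ring = TokenRing(evenly_spaced([f"node{i}" for i in range(1, 7)]))
    print(ring.replicas("customer:42", replication_factor=3))
```

Because every node can compute this mapping from the same token list, any node can act as the coordinator for a read or write on any key, which is what makes a read/write-anywhere design possible.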

• Nodes communicate with each other through the gossip protocol, which exchanges state information across the cluster every second
• Each node uses a commit log to capture write activity, which ensures data durability
• Data is also written to an in-memory structure (a memtable) and then flushed to disk (as an SSTable) once the memtable is full
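The write path described above can be sketched as a tiny single-node model: append to the commit log, update the memtable, and flush to an immutable sorted table when the memtable is full. This is a hypothetical illustration, not Cassandra's code; it assumes a memtable is "full" after a fixed number of entries (Cassandra sizes memtables by memory) and uses Python lists in place of files on disk.

```python
from typing import Optional

# Hypothetical single-node model of the write path; not Cassandra's code.


class Node:
    def __init__(self, memtable_limit: int = 3):
        self.commit_log = []  # append-only record of every write, for durability
        self.memtable = {}    # (row key, column) -> value, held in memory
        self.sstables = []    # immutable, sorted snapshots flushed from the memtable
        self.memtable_limit = memtable_limit

    def write(self, row_key: str, column: str, value: str) -> None:
        self.commit_log.append((row_key, column, value))  # 1. log the write first
        self.memtable[(row_key, column)] = value          # 2. then update memory
        if len(self.memtable) >= self.memtable_limit:     # 3. flush when full
            self.flush()

    def flush(self) -> None:
        # Write the memtable out as a sorted table that is never modified afterwards.
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        # Simplification: flushed writes no longer need replaying after a crash,
        # so this sketch discards the whole log.
        self.commit_log.clear()

    def read(self, row_key: str, column: str) -> Optional[str]:
        # Newest data wins: check the memtable, then SSTables from newest to oldest.
        if (row_key, column) in self.memtable:
            return self.memtable[(row_key, column)]
        for table in reversed(self.sstables):
            for key, value in table:
                if key == (row_key, column):
                    return value
        return None


if __name__ == "__main__":
    node = Node(memtable_limit=3)
    node.write("1001", "Name", "A. Smith")
    node.write("1001", "DOB", "1970-01-01")
    node.write("1002", "Name", "B. Jones")  # third entry triggers a flush
    node.write("1001", "Name", "A. Smith-Lee")
    print(node.read("1001", "Name"), len(node.sstables))  # A. Smith-Lee 1
```

Logging before touching memory is the key ordering: if the node crashes before a flush, the commit log still holds every acknowledged write and can be replayed to rebuild the memtable.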

• The schema used in Cassandra is modeled after Google's Bigtable. It is a row-oriented, column structure that can store structured, semi-structured, and unstructured data
• A keyspace is akin to a database in the RDBMS world
• A column family is similar to an RDBMS table but is more flexible and dynamic
• A row in a column family is indexed by its key; other columns may be indexed as well

[Figure: Portfolio keyspace containing a Customer column family with columns ID, Name, SSN, DOB]
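One way to picture the keyspace / column family / row structure is as nested maps. The sketch below is a hypothetical in-memory model, not Cassandra's storage format: it assumes the ID column serves as the row key, lets each row carry only the columns it needs, and uses a hypothetical `build_index` helper to mimic indexing a non-key column. All values are made up.

```python
# Hypothetical nested-map model of a keyspace; not Cassandra's storage format.
# Assumption: the ID column from the example serves as the row key.

portfolio = {  # keyspace
    "Customer": {  # column family
        "1001": {"Name": "A. Smith", "SSN": "000-00-0000", "DOB": "1970-01-01"},
        # Rows need not share columns: this one has no SSN/DOB but adds Email.
        "1002": {"Name": "B. Jones", "Email": "b.jones@example.com"},
    }
}


def get_column(keyspace: dict, column_family: str, row_key: str, column: str):
    """Look up one column of one row; a missing column is simply absent (None here)."""
    return keyspace.get(column_family, {}).get(row_key, {}).get(column)


def build_index(keyspace: dict, column_family: str, column: str) -> dict:
    """Map each value of a non-key column to the row keys holding it (a secondary index)."""
    index = {}
    for row_key, columns in keyspace[column_family].items():
        if column in columns:
            index.setdefault(columns[column], []).append(row_key)
    return index


if __name__ == "__main__":
    print(get_column(portfolio, "Customer", "1002", "Email"))  # b.jones@example.com
    print(build_index(portfolio, "Customer", "Name"))
```

The second row shows the flexibility a fixed RDBMS table lacks: adding an Email column to one row requires no schema change for the others.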

Transparent Elasticity
Nodes can be added to and removed from a Cassandra cluster online, with no downtime.

[Figure: a six-node ring (nodes 1–6) expanded online to a twelve-node ring (nodes 1–12)]
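Adding a node stays cheap because, on a token ring, the newcomer takes over only part of one neighbour's range, so only the keys in that slice move. The Python sketch below is hypothetical: it assumes MD5-based tokens on a 2**127 ring and six evenly spaced nodes, then has a seventh node join halfway between node1 and node2. Roughly one twelfth of the keys should change owner, all of them moving from node2 to node7.

```python
import bisect
import hashlib

# Hypothetical sketch; the MD5 tokens and 2**127 ring size are assumptions.
RING_SIZE = 2 ** 127


def token_for(key: str) -> int:
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest(), "big") % RING_SIZE


def owner(ring: dict[int, str], key: str) -> str:
    """The owner is the first node token >= the key's token, wrapping around."""
    tokens = sorted(ring)
    i = bisect.bisect_left(tokens, token_for(key)) % len(tokens)
    return ring[tokens[i]]


def keys_moved_by_join(ring: dict[int, str], new_token: int, new_node: str,
                       keys: list[str]) -> list[str]:
    """Return the keys whose owner changes when new_node joins at new_token."""
    grown = {**ring, new_token: new_node}
    return [k for k in keys if owner(ring, k) != owner(grown, k)]


if __name__ == "__main__":
    step = RING_SIZE // 6
    ring = {i * step: f"node{i + 1}" for i in range(6)}  # token -> node
    keys = [f"row{i}" for i in range(10_000)]
    # node7 joins halfway between node1 and node2, splitting node2's range.
    moved = keys_moved_by_join(ring, step // 2, "node7", keys)
    print(f"{len(moved)} of {len(keys)} keys moved")
```

Every other key keeps its owner, so the rest of the cluster keeps serving traffic while the new node streams in its share.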

Transparent Scalability
Adding Cassandra nodes increases performance linearly, along with the ability to manage terabytes to petabytes of data.

[Figure: a six-node ring with performance throughput = N expands to a twelve-node ring with performance throughput = N x 2]

Transparent Scalability

Over 1 million writes/sec!

http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html

High Availability
Cassandra, with its peer-to-peer architecture, has no single point of failure.

Multi-Geography/Zone Aware
Cassandra allows a single logical database to span 1-N geographically dispersed data centers. It also supports hybrid on-premises/Cloud deployments.
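When one logical database spans several data centers, each data center is typically given its own replica count, and consistency levels decide which data centers a request waits on. The Python sketch below assumes per-data-center replica counts (the idea behind Cassandra's NetworkTopologyStrategy) and covers three levels: LOCAL_QUORUM (a majority in the coordinator's data center), EACH_QUORUM (a majority in every data center), and QUORUM (a majority of all replicas). The data center names, counts, and the `acks_required` helper are hypothetical.

```python
# Hypothetical sketch, assuming per-data-center replica counts (the idea behind
# Cassandra's NetworkTopologyStrategy). Data center names are made up.


def quorum(replicas: int) -> int:
    return replicas // 2 + 1


def acks_required(dc_replicas: dict[str, int], level: str, local_dc: str) -> dict[str, int]:
    """Replica acknowledgements a request needs, grouped by data center."""
    if level == "LOCAL_QUORUM":
        # Wait only on the coordinator's own data center, avoiding cross-region latency.
        return {local_dc: quorum(dc_replicas[local_dc])}
    if level == "EACH_QUORUM":
        # A majority in every data center.
        return {dc: quorum(n) for dc, n in dc_replicas.items()}
    if level == "QUORUM":
        # A majority of all replicas, counted across every data center.
        return {"any data center": quorum(sum(dc_replicas.values()))}
    raise ValueError(f"level not covered by this sketch: {level}")


if __name__ == "__main__":
    # A hybrid deployment: two cloud regions plus an on-premises data center.
    replication = {"cloud-east": 3, "cloud-west": 3, "on-prem": 2}
    for level in ("LOCAL_QUORUM", "EACH_QUORUM", "QUORUM"):
        print(level, acks_required(replication, level, local_dc="cloud-east"))
```

LOCAL_QUORUM keeps latency low for clients near one data center, while EACH_QUORUM and QUORUM trade latency for stronger cross-site guarantees.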

Data Redundancy
Cassandra allows for configurable data redundancy so that data remains protected when nodes fail. It also supports rack awareness (data can be replicated across different racks to guard against machine and rack failures).
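Rack awareness can be sketched as a placement rule: walk the ring clockwise from the key's owner and prefer nodes on racks that do not yet hold a replica, reusing racks only when there are fewer racks than replicas. This hypothetical Python sketch mirrors the idea behind Cassandra's rack-aware placement, not its exact algorithm; the node and rack names are made up.

```python
# Hypothetical sketch of rack-aware replica placement; not Cassandra's algorithm.


def rack_aware_replicas(ring: list[tuple[str, str]], start: int, rf: int) -> list[str]:
    """ring: (node, rack) pairs in token order; start: index of the key's owner."""
    ordered = [ring[(start + i) % len(ring)] for i in range(len(ring))]
    chosen, used_racks = [], set()
    # First pass: at most one replica per rack, in ring order.
    for node, rack in ordered:
        if len(chosen) == rf:
            break
        if rack not in used_racks:
            chosen.append(node)
            used_racks.add(rack)
    # Second pass: if racks ran out, fill remaining slots with the next unused nodes.
    for node, _ in ordered:
        if len(chosen) == rf:
            break
        if node not in chosen:
            chosen.append(node)
    return chosen


if __name__ == "__main__":
    ring = [("n1", "rack1"), ("n2", "rack1"), ("n3", "rack2"),
            ("n4", "rack2"), ("n5", "rack3"), ("n6", "rack3")]
    print(rack_aware_replicas(ring, start=0, rf=3))  # ['n1', 'n3', 'n5']
```

Skipping n2 (same rack as n1) means losing an entire rack still leaves two of the three copies available.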

Dynamic Schema
Cassandra’s data model – based on Google’s Bigtable –
allows a user to store structured, semi-structured, and
unstructured data with ease.

Simple Manageability
AMI installers install and configure an entire multi-node Cloud implementation in minutes, and the whole cluster can be managed and monitored via a Web-based console.

Cloud Provider/Software Support
Cassandra is supported on popular Cloud provider
platforms and operating systems.

Low Cost
Cassandra is open source software and is freely
available. Commercial/advanced versions of Cassandra
are available from DataStax along with support and
other services.

How Does Cassandra Stack Up?
Cloud Database Attribute | How Cassandra Meets It
Transparent elasticity | Nodes can be added/removed online with automatic load balancing
Transparent scalability | Performance increases linearly with node additions; Big Data capable
High availability | No single point of failure; offers a high degree of availability
Multi-geo/zone | Supports multiple data centers, geographies, and Cloud zones; read/write anywhere
Data redundancy | Customizable data replication/redundancy
Dynamic schema | Able to manage all key types of data
Simple manageability | Easy to install and set up; managed via a Web console
Cloud provider/software support | Support for all key providers and operating systems
Low cost | Free if using DataStax Community; very low cost if using DataStax for advanced functionality and/or support

Next Steps
Download Cassandra and try it in your own
environment or on your Cloud provider’s platform.

• Go to www.datastax.com/download
• Downloads are available both for on-premises Cassandra installs and as an AMI for Amazon EC2
