2. Overview of DataStax
Founded in April 2010
Commercial leader in Apache Cassandra™,
the popular open-source “big data” database
100+ customers
30+ employees
Home to Apache Cassandra Chair & most
committers
Headquartered in San Francisco Bay area
Secured $11M in Series B funding in Sep 2011
3. Why DataStax?
DataStax delivers database products and services
based on Apache Cassandra from experts who
are at the forefront of today's data revolution.
Database Software & Tools | Support & Services
DataStax Enterprise | Production Support
DataStax Community | Consultative Help
DataStax OpsCenter | Professional Training
Drivers & Connectors | Online Documentation
6. What a Cloud Database is not
A Cloud database is not simply taking a traditional RDBMS
and running it in a Cloud provider’s environment.
7. Key Attributes of a Cloud Database
Transparent elasticity – can add and subtract nodes online with load
balancing
Transparent scalability – adding nodes increases both (1)
performance throughput and (2) the ability to handle Big Data
while maintaining high performance
High availability – always up; no single point of failure
Multi-geography/zone aware – able to span multiple geographies, data
centers, and cloud provider zones. Can read/write to any node
Data redundancy – data is protected via multiple copies held at
different physical locations
Dynamic schema – able to manage structured, semi-structured, and
unstructured data
Simple manageability – easy to administer a logical database across
many nodes
Software support – supports popular public and private Cloud providers
Low cost – won’t break the bank
9. What is Cassandra?
Apache Cassandra™ is a free, distributed, high-performance,
extremely scalable, fault-tolerant (i.e. no single point of failure)
post-relational database solution. Cassandra can serve
as both a real-time datastore for online/transactional
applications and a read-intensive database for
business intelligence systems.
11. Cassandra Technical Advantages
Key technical attributes of Cassandra
include:
Big Data scalability
Fast, linearly scaling performance
No single point of failure
Enterprise / multi-data center / Cloud data distribution
Read/Write Anywhere capable
Flexible schema
Tunable data consistency
Data compression
Familiar SQL-Like language – CQL
Easy setup
No special hardware needed
No special caching layer needed
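Tunable data consistency can be illustrated with a little arithmetic. The sketch below is not Cassandra code; the function names are hypothetical. It relies on the standard quorum rule: with replication factor RF, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas whenever R + W > RF.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def read_overlaps_write(read_replicas: int, write_replicas: int,
                        replication_factor: int) -> bool:
    """True when every read set must share a replica with every write set."""
    return read_replicas + write_replicas > replication_factor


rf = 3
print(quorum(rf))                                        # 2
print(read_overlaps_write(quorum(rf), quorum(rf), rf))   # True
print(read_overlaps_write(1, 1, rf))                     # False
```

Reading and writing at quorum trades some latency for the overlap guarantee. Reading and writing a single replica is faster but may return stale data until replicas converge.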
12. Cassandra Architecture Overview
Cassandra was designed with the understanding that
system/hardware failures can and do occur
Peer-to-peer, distributed system
All nodes the same
Data partitioned among all nodes in the cluster
Custom data replication to ensure fault tolerance
Read/Write-anywhere design
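A minimal sketch of how data can be partitioned among peer nodes and replicated for fault tolerance. This is an illustration, not Cassandra's implementation: the Ring class, the node names, and the use of MD5 as the hash are assumptions made for the example. Each node owns one position (token) on a ring, a row key hashes to a token, and the row is stored on the first node at or after that token plus the next RF - 1 nodes clockwise.

```python
import bisect
import hashlib


def token(name: str) -> int:
    """Hash a string to a position on the ring (MD5 chosen only for illustration)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = min(replication_factor, len(nodes))
        # One token per node, sorted so the ring can be walked clockwise.
        self.entries = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, row_key: str):
        """Nodes that hold the row: the owner and its clockwise successors."""
        start = bisect.bisect_left(self.tokens, token(row_key)) % len(self.entries)
        return [self.entries[(start + i) % len(self.entries)][1]
                for i in range(self.replication_factor)]


ring = Ring([f"node{i}" for i in range(1, 7)], replication_factor=3)
print(ring.replicas("customer:42"))  # three distinct node names
```

Because every node runs the same lookup, any node can accept a read or write and forward it to the replicas, which is what makes the read/write-anywhere design possible.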
13. Cassandra Architecture Overview
Nodes communicate with one another through the
gossip protocol, which exchanges state information across the
cluster every second
A commit log on each node captures write
activity to ensure data durability
Data is also written to an in-memory structure (memtable)
and then flushed to disk as an SSTable once the memory
structure is full
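The write path described above can be sketched in a few lines. This is a toy model under stated assumptions, not Cassandra's storage engine: the Node class and memtable_limit parameter are hypothetical, "disk" is a Python list, and commit log cleanup and compaction are left out.

```python
class Node:
    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory structure holding recent writes
        self.sstables = []     # immutable, key-sorted tables standing in for disk files
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))   # log first, for durability
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        """Write the full memtable out as a sorted table and start a new one."""
        self.sstables.append(sorted(self.memtable.items(), key=lambda kv: kv[0]))
        self.memtable = {}

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):   # newest table first
            for k, v in table:
                if k == key:
                    return v
        return None


node = Node(memtable_limit=2)
node.write("a", 1)
node.write("b", 2)   # memtable is full, so it is flushed
node.write("a", 3)   # newer value stays in the memtable
print(node.read("a"), node.read("b"), len(node.sstables))
```

Reads check the memtable before older tables so that the most recent value for a key wins.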
14. Cassandra Architecture Overview
The schema used in Cassandra is modeled on Google
Bigtable. It is a row-oriented, column-based structure that can
store structured, semi-structured, and unstructured data
A keyspace is akin to a database in the RDBMS world
A column family is similar to an RDBMS table but is more
flexible/dynamic
A row in a column family is indexed by its key. Other
columns may be indexed as well
[Figure: Portfolio keyspace containing a Customer column family with columns ID, Name, SSN, DOB]
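The Portfolio keyspace and Customer column family can be modeled as nested maps to show why a column family is more flexible than an RDBMS table. This is a rough sketch, not a Cassandra API: the insert and index_by helpers, the row keys, and all sample values (including the extra Email column) are hypothetical.

```python
# keyspace -> column family -> row key -> {column name: value}
portfolio = {"Customer": {}}


def insert(keyspace, column_family, row_key, columns):
    """Add or update columns on a row; rows need not share the same columns."""
    keyspace.setdefault(column_family, {}).setdefault(row_key, {}).update(columns)


def index_by(column_family, column):
    """Build a simple secondary index: column value -> list of row keys."""
    index = {}
    for row_key, columns in column_family.items():
        if column in columns:
            index.setdefault(columns[column], []).append(row_key)
    return index


insert(portfolio, "Customer", "1001", {"Name": "Ann Lee", "DOB": "1980-02-14"})
insert(portfolio, "Customer", "1002",
       {"Name": "Raj Patel", "SSN": "000-00-0000", "Email": "raj@example.com"})

print(portfolio["Customer"]["1001"])             # looked up by row key
print(index_by(portfolio["Customer"], "Name"))   # looked up by another column
```

Row 1001 has no SSN and row 1002 has an Email column that no other row carries, which a fixed relational schema would not allow without a table change.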
15. Transparent Elasticity
Nodes can be added to and removed from Cassandra
online, with no downtime.
[Figure: two Cassandra rings, one with 6 nodes and one with 12 nodes]
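One reason nodes can join a ring online is that, with ring-based (consistent) hashing, a new node takes over only part of one neighbor's range rather than forcing every row to move. The sketch below measures that on a toy ring. It is an illustration under assumptions: the node names, the MD5 hash, and the one-token-per-node layout are invented for the example and do not reproduce Cassandra's token assignment or load balancing.

```python
import bisect
import hashlib


def token(name: str) -> int:
    """Hash a string to a ring position (MD5 chosen only for illustration)."""
    return int(hashlib.md5(name.encode("utf-8")).hexdigest(), 16)


def owner(key: str, nodes) -> str:
    """Node whose token is the first at or after the key's token, wrapping around."""
    ring = sorted((token(n), n) for n in nodes)
    tokens = [t for t, _ in ring]
    return ring[bisect.bisect_left(tokens, token(key)) % len(ring)][1]


keys = [f"row-{i}" for i in range(10_000)]
before = [f"node{i}" for i in range(1, 7)]
after = before + ["node7"]

moved = [k for k in keys if owner(k, before) != owner(k, after)]
print(f"{len(moved)} of {len(keys)} rows changed owner")
print(all(owner(k, after) == "node7" for k in moved))  # True
```

Every row that changes owner lands on the new node, and the other nodes keep serving the rows they already hold while the transfer happens.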
16. Transparent Scalability
Adding Cassandra nodes increases performance linearly,
along with the ability to manage terabytes to petabytes of data.
[Figure: a 6-node ring with performance throughput = N and a 12-node ring with performance throughput = N x 2]
17. Transparent Scalability
Over 1 million writes/sec!
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
19. Multi-Geography/Zone Aware
Cassandra allows a single logical database to span one or more
geographically dispersed data centers. It also
supports hybrid on-premises/Cloud implementations.
20. Data Redundancy
Cassandra allows for customizable data redundancy so
that data is protected by multiple copies. It also supports rack
awareness (data can be replicated across different
racks to guard against machine/rack failures).
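Rack awareness can be sketched as a placement rule: walk the ring from the row's owner and prefer nodes on racks that do not hold a copy yet. The function below is a simplified illustration, not Cassandra's actual replication strategy; place_replicas, the node names, and the rack labels are hypothetical.

```python
def place_replicas(ring_order, racks, start, replication_factor):
    """Choose replica nodes, spreading copies across racks where possible.

    ring_order: node names in clockwise ring order
    racks:      mapping of node name -> rack name
    start:      index in ring_order of the node that owns the row
    """
    n = len(ring_order)
    walk = [ring_order[(start + i) % n] for i in range(n)]
    chosen, seen_racks = [], set()
    # First pass: take at most one node per rack.
    for node in walk:
        if len(chosen) == replication_factor:
            break
        if racks[node] not in seen_racks:
            chosen.append(node)
            seen_racks.add(racks[node])
    # Second pass: with fewer racks than replicas, fill from the remaining nodes.
    for node in walk:
        if len(chosen) == replication_factor:
            break
        if node not in chosen:
            chosen.append(node)
    return chosen


ring_order = ["a", "b", "c", "d", "e", "f"]
racks = {"a": "rack1", "b": "rack1", "c": "rack2",
         "d": "rack2", "e": "rack3", "f": "rack3"}
print(place_replicas(ring_order, racks, start=0, replication_factor=3))
# ['a', 'c', 'e']
```

With three replicas on three racks, losing any single machine or rack still leaves two copies of the row available.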
21. Dynamic Schema
Cassandra’s data model – based on Google’s Bigtable –
allows a user to store structured, semi-structured, and
unstructured data with ease.
[Figure: Portfolio keyspace containing a Customer column family with columns ID, Name, SSN, DOB]
22. Simple Manageability
AMI installers install and configure an entire multi-node
Cloud implementation in minutes. Everything can be managed
and monitored via a Web-based console.
24. Low Cost
Cassandra is open source software and is freely
available. Commercial/advanced versions of Cassandra
are available from DataStax along with support and
other services.
25. How Does Cassandra Stack Up?
Cloud Database Attribute | Meet? | Info
Transparent elasticity | Yes | Nodes can be added/removed online with auto load balancing
Transparent scalability | Yes | Performance increases linearly with node additions. Big Data capable
High availability | Yes | No single point of failure. Offers high degree of availability
Multi-geo/zone | Yes | Supports multiple data centers, geos, Cloud zones, read-write anywhere
Data redundancy | Yes | Customizable data replication / redundancy
Dynamic schema | Yes | Able to manage all key types of data
Simple manageability | Yes | Easy install and setup; managed via Web console
Cloud provider/software support | Yes | Support for all key providers and operating systems
Low cost | Yes | Free with the community edition; very low cost if using DataStax for advanced functionality and/or support
26. Next Steps
Download Cassandra and try it in your own
environment or on your Cloud provider’s platform.
Go to www.datastax.com/download
Downloads are available for both on-premises Cassandra
installs and an AMI for Amazon EC2