Learn Cassandra at edureka!

What are we going to learn today?
• New problems that traditional RDBMSs can’t handle
• Trade-off between Consistency, Availability and Partition Tolerance (CAP theorem)
• What are the different solutions available?
• What is Cassandra?
• Use cases for Cassandra
• Cassandra features: Tunable Consistency, P2P Architecture, Elastic Scalability, Column Orientation
• Demo application using Cassandra

Twitter – Massive Scale, High Availability

Travel Booking – Scale and Availability

Movie Booking – Consistency and Scale

Facebook Graph Search – Fast, Complex Querying

Facebook Messenger – Consistency and Scale

So, What Is Common?
• Huge Data
• Fast Random Access
• Variable Schema
• Need for Compression
• High Availability
• Need for Consistency
• Need for Distribution (Sharding)

Brewer’s CAP Theorem
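Source: http://www.w3resource.com/mongodb/nosql.php
(Figure: triangle with vertices Consistency, Availability and Partition Tolerance; each system is placed on the edge between the two guarantees it favours.)
• CA: RDBMS
• CP: MongoDB, HBase, Redis
• AP: CouchDB, Cassandra, DynamoDB, Riak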

NoSQL Landscape
(Figure: NoSQL categories positioned along the axes Scalability & Speed, Query and Navigational Complexity, and Performance.)
• Key-Value Stores: Dynamo (Amazon), Voldemort (LinkedIn), Citrusleaf, Membase, Riak, Tokyo Cabinet
• BigTable Clones: BigTable (Google), Cassandra, HBase, Hypertable
• Document Databases: CouchOne, MongoDB, Terrastore, OrientDB
• Graph Databases: FlockDB (Twitter), AllegroGraph, DEX, InfoGrid, Neo4j, Sones

Cassandra Use Case – Deep Dive
(Figure: a web application backed by a caching layer and RDBMS1, with other applications changing data in RDBMS1. Annotated loads: 5000 TPS, 1000 TPS, 300–500 SQL transactions, 100–200 SQL transactions. "Elastic Scale" is marked as a requirement.)

Using Cassandra
(Figure: the same web application and applications changing data, now backed by an elastically scaled Cassandra cluster. Annotated loads: 5000 TPS, 1000 TPS, 300–500 SQL transactions, 100–200 SQL transactions.)

• eCommerce (Travel Portal)
• Both B2B & B2C Consumers
• High volume of shopping transactions (> 500 million visits/day)
• High volume of supply changes (manual & system) generated
• Huge inventory database (millions of hotels)
• High read/write load (thousands of reads & writes per second)
• Application has to be 99.99% available
• Fault tolerant & reliable
• Fast & quick shopping experience
• Elastic scale
• Innovative recommendations & algorithms
• Should be fast to adopt new changes
• Should be cost-effective to maintain
• Development approaches
  • Legacy way (pure RDBMS)
  • Augmented (RDBMS + caching, heavy database hardware)
  • Using Cassandra
Cassandra Use Case – Summary
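The 99.99% availability requirement in the summary is easier to judge as a downtime budget. The short calculation below converts an availability target into the maximum downtime it allows per year; it is plain arithmetic assuming a 365-day year, and the function name is illustrative.

```python
def allowed_downtime_minutes(availability_pct: float, days: float = 365.0) -> float:
    """Maximum downtime (minutes) allowed over `days` days at a given availability %."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)

for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min/year")
```

At 99.99% the budget is about 52.6 minutes per year, so node failures, upgrades and restarts cannot be allowed to take the whole application offline.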

Apache Cassandra is an open-source, distributed, decentralized, elastically scalable, highly available, fault-tolerant, tunably consistent, column-oriented database.
What Is Apache Cassandra?
Cassandra Features
• Open Source
• Distributed
• Decentralized
• Elastically Scalable
• Highly Available
• Fault Tolerant
• Tunably Consistent
• Column Oriented

Distributed And Decentralised
(Figure: post office analogy. Centralised: a single post office handles currency (CCY) exchange, stationery and letters/couriers. Decentralised: several identical post offices, each offering CCY, stationery and letters/couriers.)

• Every node is identical.
• Peer-to-peer architecture; nodes use the gossip protocol to keep the list of cluster members in sync.
• No single point of failure.
• No special host to coordinate activities.
• Easier to operate and maintain because all nodes are the same.
Distributed And Decentralised
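The gossip bullet can be illustrated with a toy simulation. In each round, every node picks one random peer and the two merge their views of the cluster, keeping the highest heartbeat version seen for each node. This is a simplified push-pull gossip sketch, not Cassandra's actual Gossiper (which exchanges digests and versioned endpoint state); node names and the round structure are assumptions.

```python
import random

def gossip_round(views: dict[str, dict[str, int]], rng: random.Random) -> None:
    """One round: every node exchanges state with one randomly chosen peer.

    views[node] maps each known node name to the highest heartbeat
    version this node has seen for it.
    """
    nodes = list(views)
    for node in nodes:
        peer = rng.choice([n for n in nodes if n != node])
        # Push-pull merge: both sides end up with the newest version of every entry.
        merged = dict(views[node])
        for name, version in views[peer].items():
            merged[name] = max(merged.get(name, -1), version)
        views[node] = dict(merged)
        views[peer] = dict(merged)

def rounds_until_converged(num_nodes: int, seed: int = 0) -> int:
    """Count rounds until every node knows about every other node (num_nodes >= 2)."""
    rng = random.Random(seed)
    # Initially each node only knows itself (heartbeat version 1).
    views = {f"node{i}": {f"node{i}": 1} for i in range(num_nodes)}
    rounds = 0
    while any(len(v) < num_nodes for v in views.values()):
        gossip_round(views, rng)
        rounds += 1
    return rounds

print(rounds_until_converged(16))
```

Because each round roughly doubles how far a piece of information has spread, the number of rounds grows roughly logarithmically with cluster size, so no central coordinator is needed to keep membership in sync.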

• Types of scalability
  • Vertical scalability: adding more resources (CPU, memory, disk) to a single machine
  • Horizontal scalability: adding more machines to the cluster
• What is elastic scalability?
  • It is a special property of horizontal scalability.
  • The cluster can seamlessly scale up and scale back down without major disruption.
Elastic Scalability

• The cluster must accept new nodes without major disruption or reconfiguration.
ADD A NODE AND MOVE ON!
• Existing processes should not have to be restarted
• No application changes are required
• No need to rebalance data manually
Elastic Scalability
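Adding a node is cheap because Cassandra places data on a token ring using consistent hashing: each node owns the range of tokens between the previous node's token and its own, and a partition key is hashed to a token. The sketch below assumes a simplified ring with one token per node and an MD5-based hash (similar in spirit to the older RandomPartitioner; current versions default to Murmur3 and virtual nodes). Node names and token values are hypothetical. It counts how many keys change owner when a fourth node joins, compared with naive `hash % N` placement.

```python
import bisect
import hashlib

RING_SIZE = 2**127  # token space, loosely modelled on the older RandomPartitioner

def token(key: str) -> int:
    """Hash a partition key to a position on the ring (simplified MD5-based hash)."""
    return int.from_bytes(hashlib.md5(key.encode()).digest(), "big") % RING_SIZE

def build_ring(node_tokens: dict[str, int]) -> list[tuple[int, str]]:
    """Return (token, node) pairs sorted by token."""
    return sorted((t, node) for node, t in node_tokens.items())

def owner(ring: list[tuple[int, str]], key: str) -> str:
    """A node owns (previous token, its token], so a key belongs to the first
    node whose token is >= the key's token, wrapping around the ring."""
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token(key))
    return ring[i % len(ring)][1]

keys = [f"user:{i}" for i in range(10_000)]
# Hypothetical initial tokens for three evenly spaced nodes.
nodes = {"A": RING_SIZE // 3, "B": 2 * RING_SIZE // 3, "C": RING_SIZE - 1}
before = build_ring(nodes)
# Node D joins halfway between A and B, taking over part of B's range.
after = build_ring({**nodes, "D": RING_SIZE // 2})

moved = [k for k in keys if owner(before, k) != owner(after, k)]
# Naive placement for comparison: hash % number_of_nodes, going from 3 to 4 nodes.
naive_moved = sum(token(k) % 3 != token(k) % 4 for k in keys)
print(f"token ring moved {len(moved)} keys; hash % N would move {naive_moved}")
```

Only keys in the range D takes over (about one sixth here) move, all of them from B to D, while `hash % N` reshuffles roughly three quarters of the keys. With virtual nodes, each node holds many small token ranges, so a joining node takes a little data from every existing node instead of from a single neighbour.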

• Highly available
• No downtime
High Availability And Fault Tolerance
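High availability comes from replication: with a replication factor (RF) of 3, each row is stored on three nodes, so a request can still succeed while a replica is down. The sketch below mimics the idea behind SimpleStrategy (start at the node owning the key's token and walk the ring clockwise, taking the next RF distinct nodes). The toy token space, node names and tokens are hypothetical, and rack/datacenter awareness (NetworkTopologyStrategy, snitches) is ignored.

```python
import bisect

def replicas(ring: list[tuple[int, str]], key_token: int, rf: int) -> list[str]:
    """Nodes storing a key, SimpleStrategy-style: start at the first node whose
    token is >= key_token and walk clockwise until rf distinct nodes are found."""
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, key_token)
    result: list[str] = []
    for step in range(len(ring)):
        node = ring[(start + step) % len(ring)][1]
        if node not in result:
            result.append(node)
        if len(result) == rf:
            break
    return result

def is_available(replica_nodes: list[str], down: set[str], required: int) -> bool:
    """True if at least `required` replicas are up (e.g. 2 for QUORUM at RF = 3)."""
    return sum(n not in down for n in replica_nodes) >= required

ring = sorted([(0, "n1"), (25, "n2"), (50, "n3"), (75, "n4")])  # toy token space 0..99
owners = replicas(ring, key_token=30, rf=3)
print(owners)                                           # ['n3', 'n4', 'n1']
print(is_available(owners, down={"n3"}, required=2))    # True
```

With one of the three replicas down, two remain, which is still enough for a request that needs two responses.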

Tunable Consistency
(Figure: a spectrum ranging from Strong Consistency to Eventual Consistency.)
Cassandra lets us tune consistency to the application’s requirements, down to the level of individual reads and writes.
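Tunable consistency is usually reasoned about with one rule: if the number of replicas a read must hear from (R) plus the number a write must be acknowledged by (W) exceeds the replication factor (RF), every read overlaps at least one replica that holds the latest write. The helper below applies that rule to the ONE, TWO, THREE, QUORUM and ALL levels in a single datacenter; ANY and the LOCAL_/EACH_ variants are left out, and the function names are illustrative.

```python
def replicas_required(level: str, rf: int) -> int:
    """Number of replica responses a consistency level waits for (single datacenter)."""
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level == "QUORUM":
        return rf // 2 + 1  # a strict majority of replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")

def is_strongly_consistent(read_level: str, write_level: str, rf: int) -> bool:
    """True when R + W > RF, i.e. the read and write replica sets must overlap."""
    return replicas_required(read_level, rf) + replicas_required(write_level, rf) > rf

for read, write in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read, write, is_strongly_consistent(read, write, rf=3))
```

QUORUM reads with QUORUM writes is a common middle ground: at RF = 3 it still satisfies R + W > RF while tolerating one replica being down, whereas ONE/ONE favours latency and availability over consistency.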

• Cassandra was designed from the ground up to take full advantage of multiprocessor/multicore machines and to run across many dozens of these machines housed in multiple data centres.
• It scales consistently and seamlessly to hundreds of terabytes.
• It shows exceptional performance under heavy loads.
• It consistently delivers very high write throughput (writes per second), even on a basic commodity workstation.
High Performance
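Much of Cassandra's write throughput comes from its write path: a write is appended to a commit log and applied to an in-memory memtable, and memtables are later flushed to immutable, sorted SSTables, so writes never update data in place on disk. The toy class below sketches that structure in memory only; a Python list stands in for the commit log, and `flush_threshold` is a hypothetical trigger. Durability, timestamps and compaction are omitted.

```python
import bisect
from typing import Optional

class ToyLSMStore:
    """In-memory sketch of a log-structured write path: commit log -> memtable -> SSTables."""

    def __init__(self, flush_threshold: int = 3):
        self.flush_threshold = flush_threshold           # hypothetical flush trigger (keys)
        self.commit_log: list[tuple[str, str]] = []      # append-only record of writes
        self.memtable: dict[str, str] = {}               # latest value per key, in memory
        self.sstables: list[list[tuple[str, str]]] = []  # immutable sorted runs, oldest first

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # sequential append, no in-place update
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Write the memtable out as a new sorted, immutable SSTable."""
        if self.memtable:
            self.sstables.append(sorted(self.memtable.items()))
            self.memtable = {}
            self.commit_log.clear()  # simplification: flushed writes need no replay

    def read(self, key: str) -> Optional[str]:
        if key in self.memtable:  # freshest data first
            return self.memtable[key]
        for table in reversed(self.sstables):  # then newest SSTable to oldest
            keys = [k for k, _ in table]
            i = bisect.bisect_left(keys, key)
            if i < len(table) and table[i][0] == key:
                return table[i][1]
        return None

store = ToyLSMStore(flush_threshold=2)
store.write("hotel:1", "v1")
store.write("hotel:2", "v1")  # memtable reaches 2 keys -> flushed to the first SSTable
store.write("hotel:1", "v2")  # the newer value stays in the memtable
print(store.read("hotel:1"), store.read("hotel:2"), len(store.sstables))  # v2 v1 1
```

Real Cassandra resolves conflicting versions by write timestamp rather than by which structure is newest, and merges SSTables in the background through compaction.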

Cassandra Terminology
Cluster / Server (Datacenters, Racks, Nodes & Virtual Nodes)
Client (Thrift, CQL)
Data Model
• Keyspaces
• Column Families / Super Column Families / System Keyspaces
• Primary & Secondary Indexes
Fault Tolerance / High Availability
• Replication (SimpleStrategy, NetworkTopologyStrategy)
• Partitioning (Token Ring, Token Ranges, Random, Ordered, Murmur3)
• Snitches (Simple, EC2, etc.)
• Cluster Communications (Gossip, Seed Nodes)
Consistency & Reliability
• ANY, ONE, TWO, THREE, QUORUM, Hinted Handoff
• Strong Consistency (Read vs Write)
• Anti-Entropy / Read Repairs & Hinted Handoffs
• HeadLog, Bloom Filter, MemTable, SSTable
• Compaction (SSTable, Snappy)
• Tombstones, Row & Key Caches
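Of the terms above, the Bloom filter fits in a few lines: Cassandra keeps one per SSTable so that a read can skip SSTables that definitely do not contain the requested partition key (its false-positive rate is tunable per table via the `bloom_filter_fp_chance` option). The sketch below is a generic Bloom filter, not Cassandra's implementation; the bit-array size, the number of hash functions and the salted SHA-256 hashing are arbitrary illustrative choices.

```python
import hashlib

class BloomFilter:
    """Probabilistic set membership: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for simplicity

    def _positions(self, key: str):
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent", so the SSTable can be skipped.
        return all(self.bits[pos] for pos in self._positions(key))

sstable_filter = BloomFilter()
for key in ("hotel:1", "hotel:2", "hotel:3"):
    sstable_filter.add(key)
print(sstable_filter.might_contain("hotel:2"))    # True
print(sstable_filter.might_contain("hotel:999"))  # False unless a rare false positive
```

A larger bit array lowers the false-positive rate at the cost of memory, which is the same trade-off the per-table setting controls.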

Use it if your application has:
• Big Data (Billions of Records, Rows & Columns)
• Very High Velocity Random Reads & Writes
• Flexible Sparse / Wide Column Requirements
• No Need for Multiple Secondary Indexes
• Low Latency Requirements
Use Cases
• eCommerce Inventory Cache Use Cases
• Time Series / Events Use Cases
• Feed-Based Activities / Use Cases
Where to Use Cassandra

Where NOT to Use Cassandra
Don’t use it if your application has:
• Secondary Indexes.
• Relational Data.
• Transactional (Rollback, Commit)
• Primary & Financial Records.
• Stringent Security & Authorization Needs On Data
• Dynamic Queries on Columns.
• Searching Column Data
• Low Latency

Cassandra Installation & Configuration
• conf/cassandra.yaml
• Tools
Keyspace Setup
Column Family / Data Model Setup
• Key
• Columns & Data Types
• Indexes (Primary & Secondary)
• Programmatic Consistency
Thrift Hector API
CQL3 API
Application Demo
