The document provides an overview of Apache Cassandra, an open-source distributed database management system. It discusses Cassandra's peer-to-peer architecture that allows for scalability and availability. The key concepts covered include Cassandra's data model using columns, rows, column families and its distribution across nodes using consistent hashing of row keys. The document also briefly outlines Cassandra's basic read and write operations and how it handles replication and failure recovery.
5. OUR WORLD
Traditional DBMSs are still very valuable
Storage (and memory) and compute resources are cheaper than before
But we face a new set of problems
Big data
(Near) real time
Complex and varied requirements
Recommendation
Finding FOAF (friend of a friend)
…
Event-driven triggering
User sessions
…
6. OUR WORLD (CONT)
Complex applications combine different types of problems
Different languages -> more productivity
e.g. functional languages, languages optimized for multiprocessing
Polyglot persistence layer
Performance vs durability?
Reliability?
…
7. TRADITIONAL DBMS
Relational model
Well-defined schema
Access with selection/projection
Derived data from joining/grouping/aggregating (counting …)
Small, refined data
…
But
Painful data model changes
Hard to scale out
Ineffective at handling large volumes of data
Not designed around modern hardware
…
8. TRADITIONAL DBMS (CONT)
Many constraints to uphold ACID
PK/FK checking
Domain/type checking
… checking, checking
Lots of IO / processing
OODBMS, ORDBMS
Good, but … even more checking / processing
Do not play well with disk IO
10. NOSQL (CONT)
Benefits
Higher performance
Higher scalability
Flexible data model
More effective for some cases
Less administrative overhead
Drawbacks
Limited Transactions
Relaxed Consistency
Unconstrained data
Limited ad-hoc query capabilities
Limited administrative tooling
11. CAP
Brewer’s theorem
We can pick only two of:
Consistency
Availability
Partition tolerance
(diagram: CAP triangle)
AP — Amazon Dynamo derivatives: Cassandra, Voldemort, CouchDB, Riak
CP — Neo4j, Bigtable and Bigtable derivatives: MongoDB, HBase, Hypertable, Redis
CA — Relational: MySQL, MSSQL, Postgres
12. DYNAMO + BIGTABLE = CASSANDRA
Dynamo (architecture) + BigTable (data model) = Cassandra
(Apache) Cassandra is a free, open-source, highly scalable,
distributed database system for managing large amounts of data
Written in Java
Runs on the JVM
References :
BigTable (http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//archive/bigtable-osdi06.pdf)
Dynamo (http://web.archive.org/web/20120129154946/http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dynamo-sosp2007.pdf)
13. DESIGN GOALS
Simple key/value (column) store
Limited only by storage
No support for anything (aggregating, grouping …) beyond the basic operations
(CRUD, range access)
But extendable
Hadoop (MR, HDFS, Pig, Hive …)
ESP
Distributed processing interfaces (e.g. BSP, MR)
Baas.io
…
14. DESIGN GOALS (CONT)
High availability
Decentralized
Every node can accept reads and writes
Replication and replica access
Multi-DC support
Eventual consistency
Less write complexity
Audit and repair on read
Tunable -> trade-offs between consistency, durability and latency
15. DESIGN GOALS (CONT)
Incremental scalability
Equal members
Linear scalability
Unlimited space
Write/read throughput increases linearly as nodes (members) are added
Low total cost
Minimal administrative work
Automatic partitioning
Flush / compaction
Data balancing / moving
Virtual nodes (since v1.2)
Mid-powered nodes deliver good performance
Nodes working together yield powerful performance and huge space
16. FOUNDER & HISTORY
Founder
Avinash Lakshman (one of the authors of Amazon's Dynamo)
Prashant Malik ( Facebook Engineer )
Developers
About 50
History
Open sourced by Facebook in July 2008
Became an Apache Incubator project in March 2009
Graduated to a top-level project in Feb 2010
0.6 released (added support for integrated caching, and Apache Hadoop MapReduce) in Apr 2010
0.7 released (added secondary indexes and online schema change) in Jan 2011
0.8 released (added the Cassandra Query Language (CQL), self-tuning memtables, and support for zero-downtime upgrades) in Jun 2011
1.0 released (added integrated compression, leveled compaction, and improved read performance) in Oct 2011
1.1 released (added self-tuning caches, row-level isolation, and support for mixed ssd/spinning disk deployments) in Apr 2012
1.2 released (added clustering across virtual nodes, inter-node communication, atomic batches, and request tracing) in Jan 2013
17. PROMINENT USERS
User | Cluster size | Node count | Usage | Now
Facebook | >200 | ? | Inbox search | Abandoned, moved to HBase
Cisco WebEx | ? | ? | User feed, activity | OK
Netflix | ? | ? | Backend | OK
Formspring | ? (26 million accounts, 10 M responses per day) | ? | Social-graph data | OK
Also: Urban Airship, Rackspace, OpenX, Twitter (preparing to move)
19. P2P ARCHITECTURE
All nodes are equal
No single point of failure / decentralized
Compare with:
MongoDB
Broker structures (CUBRID …)
Master / slave
…
20. P2P ARCHITECTURE
Drives linear scalability
References :
http://dev.kthcorp.com/2011/12/07/cassandra-on-aws-100-million-writ/
22. COLUMN
The basic, primitive type (the smallest increment of data)
A tuple containing a name, a value and a timestamp
The timestamp is important
Provided by the client
Determines the most recent version
On collision, the DBMS chooses the one with the latest timestamp
(diagram: column = name | value | timestamp)
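The collision rule above (latest client-supplied timestamp wins) can be sketched as follows; a minimal illustration, with names of my own choosing, not Cassandra's internal types:

```python
# Minimal sketch of last-write-wins conflict resolution for a column:
# a column is a (name, value, timestamp) tuple; on collision the
# version with the highest client-supplied timestamp wins.
from collections import namedtuple

Column = namedtuple("Column", ["name", "value", "timestamp"])

def resolve(a: Column, b: Column) -> Column:
    """Return the most recent version of the same column."""
    assert a.name == b.name
    return a if a.timestamp >= b.timestamp else b

old = Column("fullname", "smith", 100)
new = Column("fullname", "mike", 200)
assert resolve(old, new).value == "mike"   # newer timestamp wins
```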
23. COLUMN (CONT)
Types
Standard: a column with a simple name (UUID or UTF8 …)
Composite: a column with a composite name (UUID+UTF8 …)
Expiring: marked with a TTL
Counter: has only a name and a value; the timestamp is managed by the server
Super: used to manage wide rows; inferior to composite
columns (DO NOT USE — all sub-columns are serialized together)
(diagram: counter column = name | value; super column = name | {sub-columns, each name | value | timestamp})
24. COLUMN (CONT)
Types (CQL3 based)
Standard: has one primary key.
Composite: has more than one primary key column;
recommended for managing wide rows.
Expiring: gets deleted during compaction.
Counter: counts occurrences of an event.
Super: used to manage wide rows; inferior to composite
columns (DO NOT USE — all sub-columns are serialized together)
DDL:
CREATE TABLE test (
  user_id varchar,
  article_id uuid,
  content varchar,
  PRIMARY KEY (user_id, article_id)
);
<Logical>
user_id | article_id | content
Smith | <uuid1> | Blah1..
Smith | <uuid2> | Blah2..
<Physical> (one wide row, key Smith)
Smith -> {uuid1,content} = Blah1… (timestamp) | {uuid2,content} = Blah2… (timestamp)
SELECT user_id, article_id FROM test
ORDER BY article_id DESC LIMIT 1;
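The logical-to-physical mapping above can be sketched in Python; an illustration of the idea only, not Cassandra's storage engine:

```python
# Sketch: two logical CQL3 rows with PRIMARY KEY (user_id, article_id)
# collapse into one physical wide row keyed by user_id, where each cell
# gets a composite column name (clustering key, CQL column name).
logical_rows = [
    {"user_id": "Smith", "article_id": "uuid1", "content": "Blah1.."},
    {"user_id": "Smith", "article_id": "uuid2", "content": "Blah2.."},
]

physical = {}
for row in logical_rows:
    partition = physical.setdefault(row["user_id"], {})
    # composite column name: (clustering value, non-key column name)
    partition[(row["article_id"], "content")] = row["content"]

assert physical == {
    "Smith": {("uuid1", "content"): "Blah1..", ("uuid2", "content"): "Blah2.."}
}
```

Both logical rows live in the same physical row, which is why wide rows and range scans over the clustering key are cheap.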
25. ROWS
A row contains a row key and a set of columns
A row key must be unique (usually a UUID)
Supports up to 2 billion columns per (physical) row
Columns are sorted by name (the column name is indexed)
Primitive
Secondary index
Direct column access
(diagram: row = row key -> [name | value | timestamp] x N)
26. COLUMN FAMILY
A container for columns and rows
No fixed schema
Each row is uniquely identified by its row key
Each row can have a different set of columns
Rows are sorted by row key
Comparator / validator
Static/dynamic CF
If the column type is super column, the CF is called a "Super Column Family"
Like a "table" in the relational world
(diagram: CF = multiple rows, each row key -> columns [name | value | timestamp])
28. TOKEN RING
A node is an instance (typically one per server)
The ring maps each row to a node
Token range: 0 to 2^127 - 1
A token is associated with each row key
Node
Assigned a unique token (e.g. token 5 to Node 5)
A node's range runs from the previous node's token (exclusive) to its own token (inclusive)
token 4 < Node 5's range <= token 5
(diagram: ring of Nodes 1-8; Node 5 owns (Token 4, Token 5])
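The ownership rule above can be sketched with a tiny token ring; the tokens are small illustrative integers, real tokens span 0 to 2^127 - 1:

```python
# Sketch of token-ring ownership: the owner of a row token is the
# first node whose token is >= the row token, wrapping around at the
# end of the ring (so Node 1 owns the range above the highest token).
import bisect

ring = {10: "Node 1", 30: "Node 2", 50: "Node 3", 70: "Node 4"}
tokens = sorted(ring)

def owner(row_token: int) -> str:
    i = bisect.bisect_left(tokens, row_token)
    return ring[tokens[0]] if i == len(tokens) else ring[tokens[i]]

assert owner(35) == "Node 3"   # 35 falls in (30, 50], owned by Node 3
assert owner(30) == "Node 2"   # a token equal to a node token belongs to it
assert owner(99) == "Node 1"   # wraps around the ring
```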
30. REPLICATION
Whichever node the client contacts for a read/write acts as the
coordinator node
A locator (replica placement strategy) determines where replicas are located
Replicas are used for
Consistency checks
Repair
Ensure W + R > N for consistency
Local cache (row cache)
With a replication factor of 4, N-1 additional copies are replicated
The simple locator treats ring order as proximity
(diagram: 8-node ring; the coordinator locates the first (original)
replica, then the simple locator places the remaining replicas clockwise)
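The simple locator described above (ring order as proximity) can be sketched as: place the original on the node owning the token range, then the remaining replicas on the next nodes clockwise. An illustration, not the actual SimpleStrategy code:

```python
# Sketch of simple replica placement: the primary node stores the
# original, and the next RF-1 nodes clockwise hold the replicas.
def replicas(ring_nodes, primary_index, rf):
    n = len(ring_nodes)
    return [ring_nodes[(primary_index + k) % n] for k in range(rf)]

nodes = [f"Node {i}" for i in range(1, 9)]
# Replication factor 4: the original plus 3 clockwise replicas
assert replicas(nodes, 6, 4) == ["Node 7", "Node 8", "Node 1", "Node 2"]
```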
31. REPLICATION (CONT)
Multi-DC support
Allows specifying how many replicas go in each DC
Within a DC, replicas are placed on different racks
Relies on a snitch to place replicas
Strategies (provided by the snitch)
Simple (single DC)
RackInferringSnitch
PropertyFileSnitch
EC2Snitch
EC2MultiRegionSnitch
(diagram: replicas spread across DC1 and DC2)
32. ADD / REMOVE NODE
Data transfer between nodes is called "streaming"
If node 5 is added,
nodes 3, 4 and 1 (assuming RF is 2) are involved in streaming
If node 2 is removed,
node 3 (which holds the next higher token and node 2's replicas) serves instead
(diagrams: 4-node ring -> 5-node ring after adding Node 5;
4-node ring -> 3-node ring after removing Node 2)
33. VIRTUAL NODES
Supported since v1.2
Real-time migration support?
Shuffle utility
One node has many tokens
=> one node owns many ranges
(diagram: cluster with num_tokens = 4; Node 1 and Node 2 each own four scattered ranges)
34. VIRTUAL NODES (CONT)
Less administrative work
Saves cost
When adding/removing a node,
many nodes co-operate
No need to determine tokens by hand
Shuffle to re-balance
Less time spent on changes
Smart balancing
No need to balance manually
(the number of tokens should be sufficiently high)
(diagram: cluster with num_tokens = 4; adding Node 3 takes small ranges from both Node 1 and Node 2)
36. CLUSTER
The total amount of data managed by the cluster is represented as a
ring
A cluster of nodes
Has one or more keyspaces
Partitioning strategy defined
Authentication
37. GOSSIP
A gossip protocol is used for cluster membership
Failure detection at the service level (alive or not)
Every node in the system learns every other node's status
Implemented as
Sync -> Ack -> Ack2
Information: status, load, bootstrapping
Basic statuses: Alive / Dead / Join
Runs every second
Status disseminates in O(log N) rounds (N is the number of nodes)
Seed nodes
PHI (accrual failure detection) decides dead vs alive over a time window
(threshold 5 -> detection in 15~16 s)
Data structure:
HeartBeat < ApplicationState < EndpointState < EndpointStateMap
(diagram: nodes N1-N6 gossiping)
39. WRITE / UPDATE
CommitLog
Abstracted mmapped type
File & memory sync -> on system failure, this is your guardian angel ^^
Java NIO
Uses the C-heap (= native heap)
Logs data (a write then a delete? Both entries still exist in the log)
Rolling segment structure
Memtable
In-memory buffer and workspace
Sorted by row key
On reaching a threshold or a periodic point, written to disk as a persistent
table structure (SSTable)
40. WRITE / UPDATE (LOCAL LEVEL)
Write
1. Write to the commit log
2. Write/update the memtable
3. Write to disk (flush) -> SSTable
CommitLog:
Write : “1”:{“name”:”fullname”,”value”:”smith”}
Write : “2”:{“name”:”fullname”,”value”:”mike”}
Delete : “1”
Write : “3”:{“name”:”fullname”,”value”:”osang”}
…
Memtable:
Key | Name | Value
1 | fullname | smith
2 | fullname | mike
3 | fullname | osang
… | … | …
SSTable | SSTable | SSTable
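The three steps above can be sketched as follows; the class, names and flush threshold are illustrative, not Cassandra internals:

```python
# Sketch of the local write path: append to the commit log first
# (durability), then apply to the memtable; when a threshold is
# reached, flush the memtable to an immutable, sorted SSTable.
class Node:
    def __init__(self, flush_threshold=2):
        self.commitlog = []        # durable, append-only
        self.memtable = {}         # in-memory workspace
        self.sstables = []         # immutable on-disk tables
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commitlog.append((key, value))   # 1. commit log
        self.memtable[key] = value            # 2. memtable
        if len(self.memtable) >= self.flush_threshold:
            self.flush()                      # 3. flush to SSTable

    def flush(self):
        # SSTables are written sorted by row key
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}

n = Node()
n.write("1", "smith")
n.write("2", "mike")      # reaches the threshold, triggers a flush
assert n.sstables == [{"1": "smith", "2": "mike"}]
assert n.memtable == {}
assert len(n.commitlog) == 2
```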
41. SSTABLE
SSTable = Sorted String Table
Best suited to a log-structured DB
Stores large numbers of key-value pairs
Immutable
Created by a "flush"
Merged by (major/minor) compaction
May hold columns that exist in different versions (timestamps)
The most recent one is chosen
42. READ (LOCAL LEVEL)
Read
Memtable:
Key | Name | Value
2 | fullname | mike
3 | fullname | osang
… | … | …
SSTables, each with a Bloom filter (BF) and index (IDX)
(diagram: a read consults the memtable and the matching SSTables)
43. READ (CLUSTER LEVEL, +READ REPAIR)
Read operation -> coordinator -> locator
1. Data is transferred from the original/replica nodes (per the consistency level)
2. Digests are compared across the replicas
(original: right, replica: right, replica: wrong)
3. If digests differ, the right (most recent) one is chosen
and the stale replica is recovered
44. DELETE
Adds a tombstone (a special type of column)
Garbage collected during compaction
GC grace seconds: 864000 (default, 10 days)
Issue
If a faulty node recovers after GCGraceSeconds, the deleted data can
be resurrected
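The grace-period rule above can be stated in a few lines; a sketch of the timing logic, not Cassandra's compaction code:

```python
# Sketch of tombstone purging: a delete writes a tombstone, and
# compaction may only purge it after gc_grace_seconds have passed.
# A replica that was down longer than that never sees the tombstone,
# which is how deleted data can be resurrected.
GC_GRACE_SECONDS = 864_000  # default: 10 days

def purgeable(tombstone_ts: int, now: int) -> bool:
    return now - tombstone_ts > GC_GRACE_SECONDS

deleted_at = 1_000_000
assert not purgeable(deleted_at, deleted_at + 3_600)             # too soon
assert purgeable(deleted_at, deleted_at + GC_GRACE_SECONDS + 1)  # safe to purge
```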
46. DETECTION
Dynamic threshold for marking nodes down
An accrual detection mechanism calculates a per-node threshold
Automatically takes network conditions, workload and
other factors affecting the perceived heartbeat rate into account
From 3rd-party clients
Hector
Failover
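The accrual (PHI) idea above can be sketched under a simplified exponential-arrival assumption; Cassandra's actual detector models heartbeat inter-arrival times more carefully, so the exact times differ:

```python
# Sketch of an accrual failure detector: phi grows with the time
# since the last heartbeat, scaled by the observed mean interval;
# the node is convicted once phi crosses a threshold.
import math

def phi(time_since_last: float, mean_interval: float) -> float:
    # Exponential arrival assumption: P(no heartbeat for t) = e^(-t/mean)
    return -math.log10(math.exp(-time_since_last / mean_interval))

# With 1 s heartbeats, phi ~ t / (mean * ln 10): a threshold of 5
# fires after roughly 11-12 s under this simplified model.
assert phi(1.0, 1.0) < 5
assert phi(12.0, 1.0) > 5
```

The appeal of the accrual approach is that the threshold adapts: if heartbeats naturally arrive more slowly (slow network, heavy load), the mean interval grows and phi rises more slowly too.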
47. HINTED-HANDOFF
The coordinator stores a hint if a node is down or fails to
acknowledge a write
A hint consists of the target replica and the mutation (column
object) to be replayed
Uses the Java heap (may move off-heap in a later release)
Hints are only saved for a limited time (default 1 hour) after a replica fails
When the failed node comes back up, the missed
writes are streamed to it
49. READ REPAIR
Background work
Configured per CF
If replicas are inconsistent, chooses the most recently written value and
replaces the stale ones
50. ANTI-ENTROPY REPAIR
Ensures all data on the replicas is made consistent
Uses a Merkle tree
A tree of hashes over data blocks
Verifies inconsistency
The repairing node requests a Merkle hash (over a piece of a CF)
from the replicas and compares; if inconsistent, it streams data from a
replica and does a read-repair
(diagram: CF split into blocks 1, 2, 3, …; leaf hashes combined pairwise up to a root hash)
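The tree-of-hashes comparison can be sketched as follows; an illustration of the idea, not Cassandra's repair implementation:

```python
# Sketch of a Merkle tree: hash the data blocks, then hash pairs of
# hashes up to a single root. Equal roots mean equal data; differing
# roots let repair descend to find exactly which blocks differ.
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(blocks):
    level = [h(b) for b in blocks]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])          # duplicate an odd leaf
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

a = [b"block1", b"block2", b"block3"]
b = [b"block1", b"blockX", b"block3"]
assert merkle_root(a) == merkle_root(list(a))   # same data, same root
assert merkle_root(a) != merkle_root(b)         # any difference changes the root
```

Exchanging one root hash per CF piece is what makes anti-entropy cheap: full data only streams for subtrees whose hashes disagree.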
52. BASIC
Full ACID compliance in a distributed system is a bad idea
(network, …)
Single-row updates are atomic (including internal indexes);
everything else is not
Relaxing consistency does not equal data corruption
Tunable consistency
Speed vs precision
Every read and write operation decides (from the client) how consistent
the requested data should be
54. CONDITION (CONT)
N is 3
Operations
1. Write 3
2. Write 5
3. Write 1
Final written values across the three replicas: 3, 5, 1
Worst case replica states after the writes:
W is 1 -> 1, 5, 1
W is 2 -> 3, 1, 1 or 1, 1, 1
Possible read results:
R is 1 -> 3, 5 or 1
R is 2 -> 1
R is 3 -> 1
(W + R) > N ensures that at least one latest value can be selected
This is eventual consistency
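The W + R > N guarantee above can be checked exhaustively for N = 3; a small verification sketch, not production code:

```python
# Sketch of why W + R > N guarantees overlap: any set of W replicas
# that acknowledged the write and any set of R replicas answering the
# read must intersect, so a read always sees at least one replica
# holding the latest value.
from itertools import combinations

N = 3
for W in range(1, N + 1):
    for R in range(1, N + 1):
        overlap_always = all(
            set(w) & set(r)
            for w in combinations(range(N), W)
            for r in combinations(range(N), R)
        )
        assert overlap_always == (W + R > N)

# QUORUM on both sides (RF // 2 + 1) always satisfies W + R > N
quorum = N // 2 + 1
assert quorum + quorum > N
```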
55. READ CONSISTENCY LEVELS
One
Two
Three
Quorum
Local Quorum
Each Quorum
All
Specifies how many replicas must respond
before a result is returned to the client
Quorum: (replication factor / 2) + 1,
rounded down to a whole number
Local Quorum / Each Quorum are used in multi-DC setups
(once satisfied, the result returns right away)
56. WRITE CONSISTENCY LEVELS
ANY
One
Two
Three
Quorum
Local Quorum
Each Quorum
All
Specifies how many replicas must succeed
before an acknowledgement is returned to the client
Quorum: (replication factor / 2) + 1,
rounded down to a whole number
Local Quorum / Each Quorum are used in multi-DC setups
The ANY level counts hinted handoff as a success
(once satisfied, the acknowledgement returns right away)
58. CACHE
The key/row caches can persist their data to files
Key cache
For frequently accessed keys
Holds the locations of keys (pointing to columns)
In memory, on the JVM heap
Row cache
Optional
Holds the entire set of columns of the row
In memory, off-heap (since v1.1) or on the JVM heap
If you have huge rows, this can cause an OOME (OutOfMemoryError)
59. CACHE
Mmapped disk access
On a 64-bit JVM, used for data and the index summary (default)
Provides virtual mmapped space in memory for SSTables
On the C-heap (native heap)
GC makes this behave like a cache
Frequently accessed data lives a long period; otherwise GC purges it
If the data is already in memory, it is returned directly (= cache)
(Problem) the C-heap is only reclaimed when it is full
(Problem) with open SSTables, Cassandra can map the entire size
of the open SSTables; otherwise a native OOME
If you want efficient key/row/mmapped caches, add
sufficient nodes to the cluster
60. BLOOM FILTERS
Each SSTable has one
Used to check whether a requested row key exists in the SSTable before
doing any (disk) seeks
Per row key, several hashes are generated and the corresponding buckets
are marked
On lookup, each bucket for the key's hashes is checked; if any is empty the key
does not exist
False positives are possible, but false negatives are not
(diagram: Key 1 and Key 2 hashed by Hash A, B, C into marked buckets;
different keys can share buckets, hence false positives)
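The bucket-marking scheme above fits in a few lines; a minimal sketch with arbitrarily chosen sizes, not Cassandra's filter implementation:

```python
# Minimal Bloom filter sketch: adding a key sets several hash-derived
# buckets; a lookup that finds any empty bucket is a definite miss,
# while finding all buckets set is only a probable hit.
import hashlib

class BloomFilter:
    def __init__(self, size=64, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, 0

    def _buckets(self, key: str):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str):
        for b in self._buckets(key):
            self.bits |= 1 << b

    def might_contain(self, key: str) -> bool:
        return all(self.bits >> b & 1 for b in self._buckets(key))

bf = BloomFilter()
bf.add("row-key-1")
assert bf.might_contain("row-key-1")   # never a false negative
# absent keys usually report False; false positives remain possible
```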
61. INDEX
Primary index
Per CF
An index over the CF's row keys
Efficient access via the index summary (1 row key out of every 128 is
sampled)
In memory, on the JVM heap (moving off-heap in a later release)
(read path: Bloom filter -> key cache -> index summary -> primary
index -> offset calculation -> SSTable)
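The 1-in-128 sampling above can be sketched as a binary search over the in-memory sample; an illustration of the idea, with made-up key names:

```python
# Sketch of the primary-index summary: keep every 128th row key in
# memory, then binary-search the sample to find which on-disk index
# segment to scan for a requested key.
import bisect

SAMPLE_RATE = 128
index_keys = [f"key{i:06d}" for i in range(100_000)]  # sorted row keys
summary = index_keys[::SAMPLE_RATE]                   # in-memory sample

def segment_start(key: str) -> int:
    """Offset (in index entries) where the disk scan for `key` begins."""
    i = bisect.bisect_right(summary, key) - 1
    return max(i, 0) * SAMPLE_RATE

start = segment_start("key000300")
assert start <= 300 < start + SAMPLE_RATE   # scan at most 128 entries
```

The trade-off: memory proportional to 1/128 of the keys, at the cost of scanning at most one 128-entry segment of the full index on disk.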
62. INDEX (CONT)
Secondary index
Over column values
Supports composite types
Implemented as a hidden CF
keyed by the indexed column value,
whose columns point back to the rows
Write/update/delete operations on it are atomic
Good for values shared across many rows;
conversely, poor for unique values (-> use a dynamic CF for
indexing those)
63. COMPACTION
Combines data from SSTables
Merges row fragments
Rebuilds primary and secondary indexes
Removes expired columns marked with tombstones
Deletes the old SSTables on completion
"Minor" compactions merge only SSTables of similar size; "major" compactions
merge all SSTables in a given CF
Size-tiered compaction
Leveled compaction
Since v1.0
Based on LevelDB
Temporarily uses up to twice the space and causes spikes in disk IO
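The "similar size" grouping used by size-tiered compaction can be sketched as follows; the bucketing ratio is an illustrative parameter, not the exact Cassandra heuristic:

```python
# Sketch of size-tiered compaction's trigger: group SSTables into
# buckets of similar size; a bucket that accumulates enough tables
# (min_threshold, default 4 in Cassandra) is ripe for a minor compaction.
def buckets(sstable_sizes, bucket_ratio=2.0):
    groups = []
    for size in sorted(sstable_sizes):
        for g in groups:
            if size <= g[0] * bucket_ratio:   # similar size -> same bucket
                g.append(size)
                break
        else:
            groups.append([size])
    return groups

sizes = [10, 11, 12, 13, 100, 110, 1000]
tiers = buckets(sizes)
assert [10, 11, 12, 13] in tiers   # four similar tables: compaction candidate
assert [100, 110] in tiers
```

Merging similar-sized tables is what keeps each row's fragments from spreading across wildly different generations of SSTables.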
64. ARCHITECTURE
Write: no race conditions, not bound by disk IO
Read: slower than write, but fast (DHT, caches …)
Load balancing
Virtual nodes
Replication
Multi-DC
73. RESOURCE
Memory
Off-heap & Heap
OOME Problem
CPU
GC
Hashing
Compression / Compaction
Network Handling
Context Switching
Lazy Problem
IO
Bottleneck for everything
74. MEMORY
Heap (GC management)
Permanent (-XX:PermSize, -XX:MaxPermSize)
JVM Heap (-Xmx, -Xms, -Xmn)
C-Heap (=Native Heap)
OS Shared
Thread Stack (-Xss)
Objects accessed via JNI
Off-Heap
OS Shared
GC managed by Cassandra
75. MEMORY (CONT)
Heap
Permanent
JVM Heap
Memtable
KeyCache
IndexSummary (moving off-heap in a later
release)
Buffer
Transport
Socket
Disk
C-Heap
Thread Stack
File Memory Map (Virtual space)
Data / Index buffer (default)
CommitLog
v1.2
Off-Heap (OS shared)
RowCache
BloomFilter
Index -> CompressionMetaData -> ChunkOffset
76. MEMORY (CONT)
Memtable
Managed
Total size (default 1/3 of the JVM heap; the largest memtable per CF is flushed when reached)
Emergency: if heap usage stays above a fraction of the max after a full GC (CMS), flush the
largest memtable (each time) -> prevents full GC / OOME
KeyCache
Managed
Total size (100 MB or 5% of the max heap)
Emergency: if heap usage stays above a fraction of the max after a full GC (CMS),
reduce the max cache size -> prevents full GC / OOME
RowCache / CommitLog
Managed
Total size (disabled by default) -> prevents OOME
77. MEMORY (CONT)
Thread stacks
Not managed
But -Xss is set to 180k (default)
Check Thrift's (transport-level RPC server) serving type (sync,
hsha, async (has bugs))
Set min/max threads for connections (default unlimited)
v1.2
78. MEMORY (CONT)
Transport buffer
Thrift
Supports many languages and cross-language use
Provides server/client interfaces and serialization
An Apache project, created by Facebook
Framed buffer (default max 16 MB, variable size)
4k, 16k, 32k, … 16 MB
Determined by the client
Per connection
Adjust the max frame buffer size (client, server)
Set min/max threads for connections (default unlimited)
v1.2
(diagram: client <-> Thrift <-> data service)
79. MEMORY (LAST)
C-Heap / Off-Heap
Shared with the OS -> other applications can cause problems
File memory map (virtual space)
Reclaimed on full GC
0 <= total size <= the size of the open SSTables
If allocation fails -> native OOME
But
Generally only a limited portion of each SSTable is accessed
GC makes space
Worst case (if an OOME occurs):
yaml -> disk_access_mode: standard (restart required)
Add sufficient nodes
yaml -> disk_access_mode: auto after joining
v1.2
80. CPU
GC
CMS
Marking phase: low thread priority -> but a high usage rate (not a problem)
CMSInitiatingOccupancyFraction is 75 (default)
UseCMSInitiatingOccupancyOnly
Full GC
Frequency is what matters -> may indicate a problem (e.g. the Thrift transport buffer)
Add nodes, or analyze memory usage and adjust the configuration
Minor GC
It's OK
Compaction
Running slowly is fine,
so lower its priority with "-XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Dcassandra.compaction.priority=1"
Sustained high CPU load? -> that is when you need to add nodes
81. SWAPPING
Swapping is a big problem for real-time applications
IO blocks -> threads block -> gossip/compaction/flush … delayed ->
causing further problems
Disable swapping or keep it to a minimum
Disable the swap partition
Or enable JNA + kernel configuration
JNA: mlockall (keeps heap memory in physical memory)
Kernel
vm.swappiness=0 (under memory pressure, swapping is still possible)
vm.overcommit_memory=1
Or vm.overcommit_memory=2 (overcommit managed)
vm.overcommit_ratio=? (e.g. 0.75)
Max memory = swap partition size + ratio * physical memory size
E.g.: 8G = 2G + 0.75 * 8G
82. MONITORING
System Monitoring
CPU / Memory / Disk
Nagios, Ganglia, Cacti, Zabbix
Network Monitoring
Per Client
NfSen (network flow monitoring, see:
http://nfsen.sourceforge.net/#mozTocId376385)
Cluster Monitoring / Maintaining
OpsCenter
83. CHECK THREAD
Use the “top” command
Press “H” to show the per-thread view
Press “P” to sort by CPU usage rate
Pick the PID of the heaviest thread
Convert the PID to hex (http://www.binaryhexconverter.com/decimal-to-hex-converter)
Run “jstack <parent PID> > filename.log” to save the Java stack to a file
Search for the PID in hex, e.g.
313C
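The decimal-to-hex step above can also be done without a web converter; a quick Python check (jstack prints thread ids as lowercase hex "nid" values):

```python
# Convert a thread PID to hex for matching against jstack output.
pid = 12604
assert format(pid, "x") == "313c"    # the hex value to search for
assert int("313c", 16) == 12604      # and back again
```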
84. CHECK HEAP
Use a heap dump file produced by “jmap” or at OOME
Use “jhat” or another tool to analyze it
Check [B (byte arrays)
and the objects referencing them