This presentation from Cassandra Summit 2013 covers:
1) The Cassandra platform and how it balances consistency, availability, and partition tolerance through replication and a tunable, eventually consistent model.
2) The Cassandra storage engine, which is optimized for writes using memtables, SSTables, and compaction.
3) Tools for monitoring and troubleshooting Cassandra, including logging, GC logging, and nodetool commands for viewing cluster information and statistics.
Cassandra SF 2013 - In Case Of Emergency Break Glass
1. CASSANDRA SUMMIT 2013
IN CASE OF EMERGENCY
BREAK GLASS
Aaron Morton
@aaronmorton
www.thelastpickle.com
#Cassandra13
Licensed under a Creative Commons Attribution-NonCommercial 3.0 New Zealand License
39. ParNew GC Starting
{Heap before GC invocations=224115 (full 111):
par new generation total 873856K, used 717289K ...)
eden space 699136K, 100% used ...)
from space 174720K, 10% used ...)
to space 174720K, 0% used ...)
#Cassandra13
40. Tenuring Distribution
240217.053: [ParNew
Desired survivor size 89456640 bytes, new threshold 4 (max 4)
- age 1: 22575936 bytes, 22575936 total
- age 2: 350616 bytes, 22926552 total
- age 3: 4380888 bytes, 27307440 total
- age 4: 1155104 bytes, 28462544 total
#Cassandra13
41. ParNew GC Finishing
Heap after GC invocations=224116 (full 111):
par new generation total 873856K, used 31291K ...)
eden space 699136K, 0% used ...)
from space 174720K, 17% used ...)
to space 174720K, 0% used ...)
#Cassandra13
42. nodetool info
Token : 0
Gossip active : true
Load : 130.64 GB
Generation No : 1369334297
Uptime (seconds) : 29438
Heap Memory (MB) : 3744.27 / 8025.38
Data Center : east
Rack : rack1
Exceptions : 0
Key Cache : size 104857584 (bytes), capacity 104857584
(bytes), 25364985 hits, 34874180 requests, 0.734 recent hit
rate, 14400 save period in seconds
Row Cache : size 0 (bytes), capacity 0...
#Cassandra13
43. nodetool ring
Note: Ownership information does not include topology, please specify a keyspace.
Address DC Rack Status State Load Owns Token
10.1.64.11 east rack1 Up Normal 130.64 GB 12.50% 0
10.1.65.8 west rack1 Up Normal 88.79 GB 0.00% 1
10.1.64.78 east rack1 Up Normal 52.66 GB 12.50% 212...216
10.1.65.181 west rack1 Up Normal 65.99 GB 0.00% 212...217
10.1.66.8 east rack1 Up Normal 64.38 GB 12.50% 425...432
10.1.65.178 west rack1 Up Normal 77.94 GB 0.00% 425...433
10.1.64.201 east rack1 Up Normal 56.42 GB 12.50% 638...648
10.1.65.59 west rack1 Up Normal 74.5 GB 0.00% 638...649
10.1.64.235 east rack1 Up Normal 79.68 GB 12.50% 850...864
10.1.65.16 west rack1 Up Normal 62.05 GB 0.00% 850...865
10.1.66.227 east rack1 Up Normal 106.73 GB 12.50% 106...080
10.1.65.226 west rack1 Up Normal 79.26 GB 0.00% 106...081
10.1.66.247 east rack1 Up Normal 66.68 GB 12.50% 127...295
10.1.65.19 west rack1 Up Normal 102.45 GB 0.00% 127...297
10.1.66.141 east rack1 Up Normal 53.72 GB 12.50% 148...512
10.1.65.253 west rack1 Up Normal 54.25 GB 0.00% 148...513
#Cassandra13
44. nodetool ring KS1
Address DC Rack Status State Load Effective-Ownership Token
10.1.64.11 east rack1 Up Normal 130.72 GB 12.50% 0
10.1.65.8 west rack1 Up Normal 88.81 GB 12.50% 1
10.1.64.78 east rack1 Up Normal 52.68 GB 12.50% 212...216
10.1.65.181 west rack1 Up Normal 66.01 GB 12.50% 212...217
10.1.66.8 east rack1 Up Normal 64.4 GB 12.50% 425...432
10.1.65.178 west rack1 Up Normal 77.96 GB 12.50% 425...433
10.1.64.201 east rack1 Up Normal 56.44 GB 12.50% 638...648
10.1.65.59 west rack1 Up Normal 74.57 GB 12.50% 638...649
10.1.64.235 east rack1 Up Normal 79.72 GB 12.50% 850...864
10.1.65.16 west rack1 Up Normal 62.12 GB 12.50% 850...865
10.1.66.227 east rack1 Up Normal 106.72 GB 12.50% 106...080
10.1.65.226 west rack1 Up Normal 79.28 GB 12.50% 106...081
10.1.66.247 east rack1 Up Normal 66.73 GB 12.50% 127...295
10.1.65.19 west rack1 Up Normal 102.47 GB 12.50% 127...297
10.1.66.141 east rack1 Up Normal 53.75 GB 12.50% 148...512
10.1.65.253 west rack1 Up Normal 54.24 GB 12.50% 148...513
#Cassandra13
45. nodetool status
$ nodetool status
Datacenter: ams01 (Replication Factor 3)
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.70.48.23 38.38 GB 256 19.0% 7c5fdfad-63c6-4f37-bb9f-a66271aa3423 RAC1
UN 10.70.6.78 58.13 GB 256 18.3% 94e7f48f-d902-4d4a-9b87-81ccd6aa9e65 RAC1
UN 10.70.47.126 53.89 GB 256 19.4% f36f1f8c-1956-4850-8040-b58273277d83 RAC1
Datacenter: wdc01 (Replication Factor 3)
=================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 10.24.116.66 65.81 GB 256 22.1% f9dba004-8c3d-4670-94a0-d301a9b775a8 RAC1
UN 10.55.104.90 63.31 GB 256 21.2% 4746f1bd-85e1-4071-ae5e-9c5baac79469 RAC1
UN 10.55.104.27 62.71 GB 256 21.2% 1a55cfd4-bb30-4250-b868-a9ae13d81ae1 RAC1
#Cassandra13
46. nodetool cfstats
Keyspace: KS1
Column Family: CF1
SSTable count: 11
Space used (live): 32769179336
Space used (total): 32769179336
Number of Keys (estimate): 73728
Memtable Columns Count: 1069137
Memtable Data Size: 216442624
Memtable Switch Count: 3
Read Count: 95
Read Latency: NaN ms.
Write Count: 1039417
Write Latency: 0.068 ms.
Bloom Filter False Positives: 345
Bloom Filter False Ratio: 0.00000
Bloom Filter Space Used: 230096
Compacted row minimum size: 150
Compacted row maximum size: 322381140
Compacted row mean size: 2072156
#Cassandra13
56. Compaction Error
ERROR [CompactionExecutor:36] 2013-04-29 07:50:49,060 AbstractCassandraDaemon.java
(line 132) Exception in thread Thread[CompactionExecutor:36,1,main]
java.lang.RuntimeException: Last written key
DecoratedKey(138024912283272996716128964353306009224, 6138633035613062
2d616666362d376330612d666531662d373738616630636265396535) >= current key
DecoratedKey(127065377405949402743383718901402082101,
64323962636163652d646561372d333039322d386166322d663064346132363963386131) writing
into *-tmp-hf-7372-Data.db
at org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:134)
at org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:153)
at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:160)
at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
at org.apache.cassandra.db.compaction.CompactionManager$2.runMayThrow(CompactionManager.java:164)
#Cassandra13
68. jmx-term
$ java -jar jmxterm-1.0-alpha-4-uber.jar
Welcome to JMX terminal. Type "help" for available commands.
$>open localhost:7199
#Connection to localhost:7199 is opened
$>bean org.apache.cassandra.db:columnfamily=CF2,keyspace=KS2,type=ColumnFamilies
#bean is set to
org.apache.cassandra.db:columnfamily=CF2,keyspace=KS2,type=ColumnFamilies
$>get BloomFilterFalseRatio
#mbean =
org.apache.cassandra.db:columnfamily=CF2,keyspace=KS2,type=ColumnFamilies:
BloomFilterFalseRatio = 0.5693801541828607;
#Cassandra13
69. Back to cfstats
Column Family: page_views
Read Count: 270075
Bloom Filter False Positives: 131294
#Cassandra13
71. Fix
Changed read queries to select by column
name to limit the SSTables read per query.
Long term, migrate to Cassandra v1.2 for
off-heap Bloom Filters.
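A rough sketch of the idea (CQL shown for illustration; only the page_views CF name comes from the deck, the column names are hypothetical):
SELECT * FROM page_views WHERE key = 'abc';                -- slice read, may touch every SSTable holding the row
SELECT hits, last_view FROM page_views WHERE key = 'abc';  -- named columns can stop early
With named columns the read path can stop consulting older SSTables once it has found recent-enough values for every requested column, so fewer files are opened per read.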
#Cassandra13
73. WARN
WARN [ScheduledTasks:1] 2013-03-29 18:40:48,158
GCInspector.java (line 145) Heap is 0.9355130159566108 full.
You may need to reduce memtable and/or cache sizes.
INFO [ScheduledTasks:1] 2013-03-26 16:36:06,383
GCInspector.java (line 122) GC for ConcurrentMarkSweep: 207 ms
for 1 collections, 10105891032 used; max is 13591642112
INFO [ScheduledTasks:1] 2013-03-28 22:18:17,113
GCInspector.java (line 122) GC for ParNew: 256 ms for 1
collections, 6504905688 used; max is 13591642112
#Cassandra13
74. Serious GC Problems
INFO [ScheduledTasks:1] 2013-04-30 23:21:11,959
GCInspector.java (line 122) GC for ParNew: 1115 ms for 1
collections, 9355247296 used; max is 12801015808
#Cassandra13
75. Flapping Node
INFO [GossipTasks:1] 2013-03-28 17:42:07,944 Gossiper.java
(line 830) InetAddress /10.1.20.144 is now dead.
INFO [GossipStage:1] 2013-03-28 17:42:54,740 Gossiper.java
(line 816) InetAddress /10.1.20.144 is now UP
INFO [GossipTasks:1] 2013-03-28 17:46:00,585 Gossiper.java
(line 830) InetAddress /10.1.20.144 is now dead.
INFO [GossipStage:1] 2013-03-28 17:46:13,855 Gossiper.java
(line 816) InetAddress /10.1.20.144 is now UP
INFO [GossipStage:1] 2013-03-28 17:48:48,966 Gossiper.java
(line 830) InetAddress /10.1.20.144 is now dead.
#Cassandra13
76. “GC Problems are the result
of workload and
configuration.”
Aaron Morton, Just Now.
#Cassandra13
78. Compaction Correlation?
Slow down Compaction to improve stability.
concurrent_compactors: 2
compaction_throughput_mb_per_sec: 8
in_memory_compaction_limit_in_mb: 32
(Monitor and reverse when resolved.)
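concurrent_compactors and in_memory_compaction_limit_in_mb are picked up at restart, but the throughput cap can also be changed on a live node, which makes the temporary slow-down (and the later reversal) easier; a sketch:
$ nodetool setcompactionthroughput 8     # throttle while unstable
$ nodetool setcompactionthroughput 16    # back to the 16 MB/s default once recovered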
#Cassandra13
79. GC Logging Insights
Slow down rate of tenuring and enable full
GC logging.
HEAP_NEWSIZE="1200M"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=4"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=4"
#Cassandra13
80. GC’ing Objects in ParNew
{Heap before GC invocations=7937 (full 205):
par new generation total 1024000K, used 830755K ...)
eden space 819200K, 100% used ...)
from space 204800K, 5% used ...)
to space 204800K, 0% used ...)
Desired survivor size 104857600 bytes, new threshold 4 (max 4)
- age 1: 8090240 bytes, 8090240 total
- age 2: 565016 bytes, 8655256 total
- age 3: 330152 bytes, 8985408 total
- age 4: 657840 bytes, 9643248 total
#Cassandra13
81. GC’ing Objects in ParNew
{Heap before GC invocations=7938 (full 205):
par new generation total 1024000K, used 835015K ...)
eden space 819200K, 100% used ...)
from space 204800K, 7% used ...)
to space 204800K, 0% used ...)
Desired survivor size 104857600 bytes, new threshold 4 (max 4)
- age 1: 1315072 bytes, 1315072 total
- age 2: 541072 bytes, 1856144 total
- age 3: 499432 bytes, 2355576 total
- age 4: 316808 bytes, 2672384 total
#Cassandra13
82. Cause
Nodes had wide rows, 1.3+
billion rows, and 3+ GB of
Bloom Filters.
(Using the older bloom_filter_fp_chance of 0.000744.)
#Cassandra13
83. Fix
Increased FP chance to 0.1 on
one CF and 0.01 on the others.
(One CF reduced from 770MB to 170MB of Bloom Filters.)
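A sketch of how the change might be applied (keyspace and table names are illustrative; CQL3 form shown, the same setting is available through cassandra-cli on 1.1):
cqlsh> ALTER TABLE KS1.CF1 WITH bloom_filter_fp_chance = 0.1;
New filters are only built when SSTables are rewritten, so the memory comes back as compaction, scrub or upgradesstables rewrites the data.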
#Cassandra13
86. Anatomy of a Partition.
(From a 1.0 cluster)
#Cassandra13
87. Node 23 Was Up
cassandra23# bin/nodetool -h localhost info
Token : 28356863910078205288614550619314017621
Gossip active : true
Load : 275.44 GB
Generation No : 1762556151
Uptime (seconds) : 67548
Heap Memory (MB) : 2926.44 / 8032.00
Data Center : DC1
Rack : RAC_unknown
Exceptions : 0
#Cassandra13
88. Other Nodes Saw It Down
cassandra20# nodetool -h localhost ring
Address DC Rack Status State Load
10.37.114.8 DC1 RAC20 Up Normal 285.86 GB
10.29.60.10 DC2 RAC23 Down Normal 277.86 GB
10.6.130.70 DC1 RAC21 Up Normal 244.9 GB
10.29.60.14 DC2 RAC24 Up Normal 296.85 GB
10.37.114.10 DC1 RAC22 Up Normal 255.81 GB
10.29.60.12 DC2 RAC25 Up Normal 316.88 GB
#Cassandra13
89. And Node 23 Saw Them Up
cassandra23# nodetool -h localhost ring
Address DC Rack Status State Load
10.37.114.8 DC1 RAC20 Up Normal 285.86 GB
10.29.60.10 DC2 RAC23 Up Normal 277.86 GB
10.6.130.70 DC1 RAC21 Up Normal 244.9 GB
10.29.60.14 DC2 RAC24 Up Normal 296.85 GB
10.37.114.10 DC1 RAC22 Up Normal 255.81 GB
10.29.60.12 DC2 RAC25 Up Normal 316.88 GB
#Cassandra13
90. Still Available
Node 23 could serve requests at
LOCAL_QUORUM, QUORUM and ALL
Consistency.
Other nodes could serve requests at
LOCAL_QUORUM and QUORUM but not ALL
Consistency.
#Cassandra13
93. Gossip Logs On Node 20?
log4j.logger.org.apache.cassandra.gms.Gossiper=TRACE
TRACE [GossipStage:1] 2011-12-13 00:58:49,636 Gossiper.java
(line 647) local heartbeat version 526912 greater than 7951
for /10.29.60.10
#Cassandra13
94. More Gossip Logs On Node 20?
log4j.logger.org.apache.cassandra.gms.GossipDigestSynVerbHandler=TRACE
log4j.logger.org.apache.cassandra.gms.FailureDetector=TRACE
TRACE [GossipStage:1] 2011-12-13 02:14:37,033 GossipDigestSynVerbHandler.java
(line 46) Received a GossipDigestSynMessage from /10.29.60.10
TRACE [GossipStage:1] 2011-12-13 02:14:37,033 GossipDigestSynVerbHandler.java
(line 76) Gossip syn digests are : /10.29.60.10:1762556151:12552 /
10.29.60.14:1323732392:10208 /10.37.114.8:1323731527:11082 /
10.37.114.10:1323736718:5830 /10.6.130.70:1323732220:10379 /
10.29.60.12:1323733099:9493
//Expected call to the FailureDetector
TRACE [GossipStage:1] 2011-12-13 02:14:37,033 GossipDigestSynVerbHandler.java
(line 90) Sending a GossipDigestAckMessage to /10.29.60.10
#Cassandra13
95. Cause.
Generation is initialised at bootstrap to
seconds past the Epoch.
1762556151 is Fri, 07 Nov 2025 22:55:51 GMT,
a date far in the future: the node's clock was
wrong when it last started.
cassandra23# bin/nodetool -h localhost info
Generation No : 1762556151
TRACE [GossipStage:1] 2011-12-13 02:14:37,033 GossipDigestSynVerbHandler.java
(line 76) Gossip syn digests are : /10.29.60.10:1762556151:12552 /
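A quick way to decode a generation number (GNU date shown; BSD/OS X uses date -u -r):
$ date -u -d @1762556151
Fri Nov  7 22:55:51 UTC 2025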
#Cassandra13
103. Changing the Snitch
Do Not change the DC or
Rack for an existing node.
(Cassandra will not be able to find your data.)
#Cassandra13
104. Moving to the GossipingPropertyFileSnitch
Update cassandra-topology.properties
on existing nodes with existing DC/Rack
settings for all existing nodes.
Set default to new DC.
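A sketch of the file on an existing node, assuming the old DC/rack is datacenter1/rack1 and the new DC is new_dc (addresses and names are illustrative):
# cassandra-topology.properties
10.1.64.11=datacenter1:rack1
10.1.64.78=datacenter1:rack1
10.1.66.8=datacenter1:rack1
# nodes not listed above (the new DC) fall through to the default
default=new_dc:rack1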
#Cassandra13
105. Moving to the GossipingPropertyFileSnitch
Update cassandra-rackdc.properties
on existing nodes with the existing DC/Rack
for the node.
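And the matching cassandra-rackdc.properties sketch for an existing node (values are illustrative and must match what the node already reports):
dc=datacenter1
rack=rack1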
#Cassandra13
106. Moving to the GossipingPropertyFileSnitch
Use a rolling restart to upgrade existing nodes
to GossipingPropertyFileSnitch
#Cassandra13
107. Expand to Multi DC
Update Snitch
Update Replication Strategy
Add Nodes
Update Replication Factor
Rebuild
#Cassandra13
108. Got NTS ?
Must use
NetworkTopologyStrategy
for Multi DC deployments.
#Cassandra13
111. NetworkTopologyStrategy
Order Token Ranges in the DC.
Start with range that contains the Row Key.
Add first unselected Token Range from each
Rack.
Repeat until RF selected.
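As a rough worked example: with RF=3 in a DC whose ring alternates rack1 and rack2 nodes, a key owned by a rack1 node gets that node first, then the next node on the ring (rack2, a new rack), and finally the next unselected node once every rack is represented, so the three replicas span both racks.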
#Cassandra13
114. Changing the Replication Strategy
Be careful if the existing
configuration has multiple
Racks.
(Cassandra may not be able to find your data.)
#Cassandra13
115. Changing the Replication Strategy
Update Keyspace configuration to use
NetworkTopologyStrategy with
datacenter1:3 and new_dc:0.
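A sketch of that change (keyspace name is illustrative; CQL3 form shown, cassandra-cli offers the same options on 1.1):
cqlsh> ALTER KEYSPACE KS1 WITH replication = {'class': 'NetworkTopologyStrategy', 'datacenter1': 3, 'new_dc': 0};
The same statement is repeated later with new_dc raised to 3.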
#Cassandra13
117. Expand to Multi DC
Update Snitch
Update Replication Strategy
Add Nodes
Update Replication Factor
Rebuild
#Cassandra13
118. Configuring New Nodes
Add auto_bootstrap: false to
cassandra.yaml.
Use GossipingPropertyFileSnitch.
Three Seeds from each DC.
(Use cluster_name as a safety.)
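A minimal cassandra.yaml sketch for a new node, assuming illustrative names and addresses:
cluster_name: 'Production'    # must match the existing cluster exactly
auto_bootstrap: false         # join the ring without streaming data yet
endpoint_snitch: GossipingPropertyFileSnitch
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.1.64.11,10.1.64.78,10.1.66.8,10.2.0.11,10.2.0.12,10.2.0.13"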
#Cassandra13
119. Configuring New Nodes
Update cassandra-rackdc.properties
on new nodes with the new DC/Rack for the
node.
(Ignore cassandra-topology.properties)
#Cassandra13
120. Start The New Nodes
New Nodes in the Ring in the
new DC without data or
traffic.
#Cassandra13
121. Expand to Multi DC
Update Snitch
Update Replication Strategy
Add Nodes
Update Replication Factor
Rebuild
#Cassandra13
122. Change the Replication Factor
Update Keyspace configuration to use
NetworkTopologyStrategy with
datacenter1:3 and new_dc:3.
#Cassandra13
123. Change the Replication Factor
New DC nodes will start
receiving writes from old DC
coordinators.
#Cassandra13
124. Expand to Multi DC
Update Snitch
Update Replication Strategy
Add Nodes
Update Replication Factor
Rebuild
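The rebuild step is typically nodetool rebuild run on each new node, naming an existing DC to stream the data from (DC name illustrative):
$ nodetool rebuild datacenter1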
#Cassandra13