Cassandra and OpsCenter have a range of backup and restore topics. I will start with a basic overview of Cassandra backup/restore, walking through the operational steps to provide the understanding required to perform an on-disk backup and restore. Expanding on this overview, I'll cover the limitations (including schema requirements) and their impact on the restore process. Further, I'll discuss commit log archiving and point-in-time restore operations. After covering the underlying operations, I'll wrap up with a discussion of how OpsCenter automates this process and leverages S3.
3. Snapshots
Nodetool Snapshot Basics
Performs a flush, then hard links sstables to
<data_file_directories>/<ks>/<table>/snapshots/<snapshot-name>/
Under the hood, mbeans:
org.apache.cassandra.db -> StorageService -> takeSnapshot
More at
http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsSnapShot.html
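The hard-link step can be illustrated with a short sketch. This is not Cassandra's implementation; it only mimics the on-disk effect for a single table directory. The function name `snapshot_table` is hypothetical, and the flush that precedes a real snapshot is not modeled.

```python
import os
from pathlib import Path


def snapshot_table(table_dir: Path, snapshot_name: str) -> Path:
    """Hard-link every regular file in table_dir into snapshots/<snapshot_name>/.

    Illustration only: a real snapshot flushes memtables first and is
    triggered through nodetool or the StorageService mbean.
    """
    target = table_dir / "snapshots" / snapshot_name
    # Refuse to reuse an existing snapshot name.
    target.mkdir(parents=True, exist_ok=False)
    # sorted() materializes the listing before any links are created.
    for entry in sorted(table_dir.iterdir()):
        if entry.is_file():
            # A hard link shares the same inode, so no data is copied.
            os.link(entry, target / entry.name)
    return target
```

Because the links share inodes with the live sstables, the snapshot is cheap to take, but the disk space is only released once both the live file and the snapshot link are removed.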
4. Snapshots in OpsCenter
Under Services -> Backup
Displays backup history, allows backup and restore.
Advanced settings we'll cover later
Backup Service is an Enterprise Feature
More at
http://docs.datastax.com/en/opscenter/5.2/opsc/online_help/services/opscBackupService.html
5. Snapshots in OpsCenter
Schedule repeated backups or create an ad hoc backup
Select keyspaces
Set location (on server vs S3)
Uses the mbean to perform the snapshot rather than shelling out.
Coordinates the snapshot on all nodes.
Backs up the schema to schema.json
Keeps a log for audit
7. Remote Snapshots
OpsCenter can also back up to S3
Specify S3 bucket name, AWS credentials
Optional transfer throttle and compression
Not all SSTables need to be backed up: because they are immutable, only part of the data may require it.
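Since SSTables are immutable, an incremental upload can be reduced to a set difference. The sketch below is an assumption about how such a selection could work, not OpsCenter's actual logic; `sstables_to_upload` is a hypothetical helper, and it assumes a file name identifies the same content for a given node and table.

```python
from typing import Iterable, List, Set


def sstables_to_upload(local_files: Iterable[str], already_uploaded: Set[str]) -> List[str]:
    """Return the snapshot files that are not yet present in the remote location.

    Assumption: within one node and one table, an sstable file name always
    refers to the same immutable content, so a name match means "already done".
    """
    return sorted(name for name in local_files if name not in already_uploaded)
```

For example, with `["a-2-Data.db", "a-1-Data.db"]` on disk and `{"a-1-Data.db"}` already uploaded, only `a-2-Data.db` would be selected.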
8. SSTables need to be stored per node to avoid name collisions.
However, dropping and recreating a table can also lead to a naming collision; OpsCenter can attach a timestamp.
If your data is encrypted, make sure that the encryption key is also put somewhere safe.
OpsCenter backs up schemas.
Topologies change over time (more on this in restore).
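One way to picture the per-node and per-timestamp separation is as a key layout in the remote store. The layout below is purely hypothetical (it is not OpsCenter's actual bucket structure); it only shows how a node identifier and a table creation timestamp in the key keep otherwise identical file names apart.

```python
def remote_key(node_id: str, keyspace: str, table: str, table_created: str, filename: str) -> str:
    """Build a storage key that is unique per node and per table incarnation.

    Hypothetical layout: snapshots/<node>/<keyspace>/<table>-<created>/<file>.
    table_created is any stable timestamp string for when the table was created.
    """
    return "/".join(["snapshots", node_id, keyspace, f"{table}-{table_created}", filename])
```

Two nodes holding a file with the same name get different keys because of `node_id`, and a table that was dropped and recreated gets a different key because of `table_created`.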
9. Restore Operations
SSTableloader Basics
Expects the schema to already exist for the sstables.
Expects a directory structure different from that created by the snapshot, specifically <Keyspace>/<Table>/<files>
Can stream data to other nodes; it doesn't just move files into place.
Leaves files in place as they are restored, so there is a possible disk penalty.
More at
http://docs.datastax.com/en/cassandra/2.1/cassandra/tools/toolsBulkloader_t.html
11. Restore Operations
Attempts to recreate the schema or do a schema comparison. The latter is extremely difficult with Thrift.
Creates symbolic links in a temporary directory to match what SSTableloader expects.
Logs/audit trail to follow.
Uses SSTableloader
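The symbolic-link staging step can be sketched as follows. This is a minimal illustration under stated assumptions, not OpsCenter's code: `stage_for_sstableloader` and `sstableloader_command` are hypothetical helpers, the host list is a placeholder, and the command is only built, not executed. The `-d` option is sstableloader's list of initial hosts.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def stage_for_sstableloader(snapshot_dir: Path, keyspace: str, table: str) -> Path:
    """Symlink snapshot files into a temporary <Keyspace>/<Table>/ directory."""
    staging_root = Path(tempfile.mkdtemp(prefix="restore-"))
    table_dir = staging_root / keyspace / table
    table_dir.mkdir(parents=True)
    for entry in sorted(snapshot_dir.iterdir()):
        if entry.is_file():
            # Link to the absolute path so the link stays valid from the staging dir.
            os.symlink(entry.resolve(), table_dir / entry.name)
    return table_dir


def sstableloader_command(table_dir: Path, hosts: List[str]) -> List[str]:
    """Build (but do not run) an sstableloader invocation for the staged directory."""
    return ["sstableloader", "-d", ",".join(hosts), str(table_dir)]
```

Symlinks avoid copying the snapshot data a second time just to satisfy the expected directory layout; the caller is responsible for removing the temporary directory afterwards.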
12. Remote Restore
Topologies change over time.
When topologies shrink, multiple nodes' worth of data will have to be sent to a single node (which can cause sstable naming collisions).
13. Remote Restore
When topologies grow, some nodes may be idle during a restore.
Replacement nodes will have a different host ID and will need to be matched to the host ID of the snapshot.
OpsCenter handles all of these cases.
14. Commit Log Archiving
Cassandra can execute a script when writing commit log segments
Set in commitlog_archiving.properties
http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configLogArchive_t.html
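As an illustration of what such a script might do, the sketch below copies one commit log segment into an archive directory. It is an assumption-laden example: `archive_segment` and the archive directory are hypothetical, and a small wrapper would be needed to call it from `archive_command` in commitlog_archiving.properties, where Cassandra substitutes `%path` (full path of the segment) and `%name` (segment file name).

```python
import shutil
from pathlib import Path


def archive_segment(segment_path: str, segment_name: str, archive_dir: str) -> Path:
    """Copy one commit log segment into archive_dir and return the destination.

    segment_path and segment_name correspond to the %path and %name
    placeholders of archive_command; archive_dir is a hypothetical location.
    """
    dest_dir = Path(archive_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / segment_name
    # copy2 preserves the modification time along with the contents.
    shutil.copy2(segment_path, dest)
    return dest
```

A copy is used here rather than a hard link so the archive can live on a different filesystem from the commit log directory.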
15. Commit Log Archiving
OpsCenter can also enable this under Services -> Backup Service -> Settings
OpsCenter can also send the archived commit logs to S3.
http://docs.datastax.com/en/cassandra/2.1/cassandra/configuration/configLogArchive_t.html
16. Point in Time Restore
Two-step operation: restore a snapshot, then replay commit logs.
Find the nearest snapshot taken prior to the desired point in time and restore it.
Update commitlog_archiving.properties with the location of the commit logs as well as the point in time to restore.
Restart Cassandra.
More at
http://docs.datastax.com/en//cassandra/2.0/cassandra/configuration/configLogArchive_t.html
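The first steps above can be sketched in a few lines: pick the latest snapshot at or before the target time, then render the two commitlog_archiving.properties entries that name the commit log location and the restore point. The helper names and the snapshot catalog are hypothetical. `restore_directories` and `restore_point_in_time` are the property names used by Cassandra, with the timestamp written as `yyyy:MM:dd HH:mm:ss` in GMT; the linked documentation also describes a `restore_command` entry, which is omitted here.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def nearest_snapshot_before(snapshots: Dict[str, datetime], target: datetime) -> Optional[str]:
    """Return the name of the latest snapshot taken at or before target, if any."""
    candidates = {name: ts for name, ts in snapshots.items() if ts <= target}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def archiving_properties(commitlog_dir: str, target: datetime) -> str:
    """Render the restore entries for commitlog_archiving.properties.

    target must be timezone-aware; it is converted to UTC before formatting.
    """
    stamp = target.astimezone(timezone.utc).strftime("%Y:%m:%d %H:%M:%S")
    return (
        f"restore_directories={commitlog_dir}\n"
        f"restore_point_in_time={stamp}\n"
    )
```

If no snapshot predates the target, the function returns `None`, which corresponds to a point in time that cannot be restored from the available backups.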
17. PiT in OpsCenter
OpsCenter can automate the PiT restore process
Set time (in UTC)
OpsCenter will verify that it is capable of restoring to that point in time.
Commit logs or snapshots can be local or on S3
18. PiT Restore Challenges
Commit log replays don't stream data around the ring, which makes topology changes difficult to handle.
Comparing schemas can be tricky if the replay contains schema changes.