Castle is an open-source project that provides an alternative to the lower layers of the storage stack -- RAID and POSIX filesystems -- for big data workloads and distributed data stores such as Apache Cassandra.
This presentation from Berlin Buzzwords 2012 provides a high-level overview of Castle and how it is used with Cassandra to improve performance and predictability.
24. Big Data
write optimizing
• 7,500 - 10,000 RPM
• 5ms - 9ms seeks
• ~150MB/s (sequential)
• 75-150 random IOPS
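The figures on this slide can be related to each other with a little arithmetic. The sketch below assumes a simple model that is not stated on the slide: each random I/O costs one average seek plus half a platter rotation. The function names and the 1,000-byte record size are illustrative assumptions.

```python
def random_iops(seek_ms: float, rpm: float) -> float:
    """Estimate random IOPS as 1 / (average seek + average rotational latency)."""
    # On average the platter has to turn half a revolution before the sector arrives.
    half_rotation_ms = 0.5 * 60_000.0 / rpm
    return 1000.0 / (seek_ms + half_rotation_ms)


def random_write_mb_per_s(iops: float, record_bytes: int) -> float:
    """Throughput achieved when every record costs one random I/O."""
    return iops * record_bytes / 1e6


fast = random_iops(seek_ms=5.0, rpm=10_000)  # 5 ms seek + 3 ms rotation = 8 ms per I/O
slow = random_iops(seek_ms=9.0, rpm=7_500)   # 9 ms seek + 4 ms rotation = 13 ms per I/O

RECORD_BYTES = 1_000       # assumed record size, not from the slide
SEQUENTIAL_MB_PER_S = 150  # from the slide

random_mb = random_write_mb_per_s(fast, RECORD_BYTES)
print(f"random IOPS: {slow:.0f} to {fast:.0f}")
print(f"random writes: {random_mb:.3f} MB/s, sequential: {SEQUENTIAL_MB_PER_S} MB/s")
print(f"sequential advantage: {SEQUENTIAL_MB_PER_S / random_mb:.0f}x")
```

Under these assumptions the model gives roughly 77 to 125 random IOPS, in line with the 75-150 range quoted above, and shows why turning small random writes into sequential ones matters so much on rotational disks.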
25. Big Data
write optimizing
A G
A C G K
A B D E G H K L
26. Big Data
write optimizing
A G
query(K)
A C G K
A B D E G H K L
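The query shown on this slide can be sketched in a few lines. The sketch assumes the three rows are independent sorted arrays of doubling size (2, 4 and 8 keys) and that a lookup binary-searches each one in the order drawn. Both the interpretation and the names `LEVELS` and `query` are assumptions for illustration, not Castle's actual code.

```python
from bisect import bisect_left

# The three rows from the slide, read as sorted arrays of doubling size.
LEVELS = [
    ["A", "G"],
    ["A", "C", "G", "K"],
    ["A", "B", "D", "E", "G", "H", "K", "L"],
]


def query(levels, key):
    """Return the index of the first level that holds `key`, or None if absent."""
    for depth, level in enumerate(levels):
        i = bisect_left(level, key)  # binary search within one sorted level
        if i < len(level) and level[i] == key:
            return depth
    return None


print(query(LEVELS, "K"))  # K is missing from the first row and found in the second
print(query(LEVELS, "D"))  # D only appears in the last row
print(query(LEVELS, "Z"))  # Z is in no row
```

Each level costs one binary search, so a lookup touches a number of levels that grows with the logarithm of the data size, while inserts only ever write whole sorted arrays.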
27. Big Data
write optimizing
A G
A C G K
A B D E G H K L
28. Big Data
write optimizing
Memory
Disk
S1 S2 S3 S4 S5
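The Memory/Disk picture can be read as a write buffer that is periodically flushed to disk as sorted runs (S1 to S5). The sketch below illustrates that general pattern under stated assumptions: the class name, the tiny flush threshold, and the use of in-process lists to stand in for on-disk runs are all hypothetical, and no merging of runs is shown.

```python
from bisect import bisect_left


class WriteBufferedStore:
    """Buffers writes in memory and flushes them as immutable sorted runs."""

    def __init__(self, flush_threshold=4):
        self.flush_threshold = flush_threshold
        self.memory = {}  # in-memory buffer, absorbs random writes
        self.runs = []    # stand-ins for on-disk sorted runs, oldest first

    def put(self, key, value):
        self.memory[key] = value
        if len(self.memory) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the buffer out as one sorted run (one sequential write on disk)."""
        if self.memory:
            self.runs.append(sorted(self.memory.items()))
            self.memory = {}

    def get(self, key):
        """Check memory first, then runs from newest to oldest."""
        if key in self.memory:
            return self.memory[key]
        for run in reversed(self.runs):
            i = bisect_left(run, (key,))  # (key,) sorts just before (key, value)
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None


store = WriteBufferedStore(flush_threshold=2)
for i, key in enumerate("ABCDEFGHIJ"):
    store.put(key, i)   # ten writes with a threshold of two produce five runs
store.put("A", 99)      # a newer value for A, still in memory

print(len(store.runs), store.get("A"), store.get("J"))
```

Writes stay cheap because each flush is a single sequential write, but reads may have to consult every run, which is the trade-off that write-optimised structures then work to contain.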
30. Two Revolutions
2010
Distributed, shared-nothing databases
Write-optimised indexes Write-optimised indexes
BTree file systems BTree file systems
RAID ... RAID
New hardware New hardware
Monday, 6 February 2012
31. Bridging the Gap
2011
Distributed, shared-nothing databases
Castle Castle
...
New hardware New hardware
33. Castle is...
• Filesystem (no, not really)
• Key-value store for the Linux kernel
• Write-optimized
  • for rotational disks
  • for SSDs
• Versioned (clones, snapshots)
• Disk aggregation
  • for redundancy
  • for performance
• FLOSS!
53. Small random inserts
Inserting 3 billion rows
[Chart: Acunu powered Cassandra vs. ‘standard’ Cassandra]
When thresholds are hit, live memtables are “switched out”, queued for flushing, and then left to the JVM’s garbage collector for cleanup.
This has garbage collection ramifications.
Replacing CFS with a Castle backend: memory is (de)allocated by Castle in kernel space.
Cassandra holds all Bloom filters and indexes in memory; Castle does not have this requirement.
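For readers unfamiliar with the Bloom filters mentioned in these notes, the following is a minimal sketch of the data structure in general. It is not Cassandra's implementation: the class name, the bit-array size, the number of hashes and the use of SHA-256 are all assumptions chosen for brevity.

```python
import hashlib


class BloomFilter:
    """Compact set summary: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # memory cost is fixed up front

    def _positions(self, key):
        # Derive several bit positions per key by salting one hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


bloom = BloomFilter()
for row_key in ("row-1", "row-2", "row-3"):
    bloom.add(row_key)

print(bloom.might_contain("row-2"))  # an added key is always reported as possibly present
print(len(bloom.bits), "bytes held in memory")
```

A filter like this lets a store skip reading files that cannot contain a key, at the price of keeping the bit array resident, which is the memory requirement the notes contrast with Castle.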