Squeezing the Most Out of
the Storage Engine with
State of the Art Compaction
Raphael S. Carvalho, Software Engineer
Raphael Carvalho
■ Syslinux, suite of bootloaders
■ OSv, an operating system for the cloud
■ Seastar, the framework powering ScyllaDB
■ ScyllaDB, the best database in the world
“In order to make good use of the computer
resources, one must organize files intelligently,
making the retrieval process efficient.”
The Ubiquitous B-Tree paper, 1979
■ Short & precise definition from aforementioned paper:
■ “allow users to store, update, and recall”
Storage Engines
■ Two approaches for handling updates
■ In-place structure (ex: B+-tree)
Storage Engines
Before update: (k1,v1)(k2,v2)
Update: (k1,v3)
After update: (k1,v3)(k2,v2)
■ Two approaches for handling updates
■ Out-of-place structure (ex: LSM-tree)
Storage Engines
Existing data: (k1,v1)(k2,v2)
Update written separately: (k1,v3)
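A minimal Python sketch of the two update approaches, using the keys and values from the slides. The class names InPlaceStore and OutOfPlaceStore are hypothetical; a dict and a list only stand in for a B+-tree and an LSM-tree.

```python
class InPlaceStore:
    """Overwrites the existing entry, as an in-place structure would."""

    def __init__(self):
        self.entries = {}

    def put(self, key, value):
        self.entries[key] = value  # the old value is destroyed

    def get(self, key):
        return self.entries.get(key)


class OutOfPlaceStore:
    """Appends every update; older versions stay until they are merged away."""

    def __init__(self):
        self.log = []  # oldest entry first

    def put(self, key, value):
        self.log.append((key, value))

    def get(self, key):
        # The newest entry wins, so scan from the end of the log.
        for logged_key, value in reversed(self.log):
            if logged_key == key:
                return value
        return None


in_place = InPlaceStore()
out_of_place = OutOfPlaceStore()
for store in (in_place, out_of_place):
    store.put("k1", "v1")
    store.put("k2", "v2")
    store.put("k1", "v3")
```

Both stores answer a read of k1 with v3, but only the out-of-place store still holds the stale (k1,v1), which is the data compaction later has to clean up.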
■ Out-of-place update isn’t new.
■ 1976 paper “Differential files” shows its applicability in the real world
■ “shown to be an efficient method for storing a large and changing
database”
Storage Engines
■ A good analogy is presented in the paper
Storage Engines
■ The Log-Structured Merge-Tree (LSM-Tree)
paper is then published in 1996
Storage Engines
THE LSM-TREE
[Figure: writes enter C0 in MEMORY; C1, C2, ..., Ck are on DISK; components are combined by merge sort]
C1 is T times bigger than C0.
C(K) is T times bigger than C(K-1).
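A rough sketch of the two ideas on this slide: component sizes that grow by a factor of T, and a merge sort that combines a newer component into an older one. The function names, T = 10 and the unit-less sizes are assumptions for illustration only.

```python
import heapq


def level_capacities(c0_size, size_ratio, last_level):
    """Capacities of C0..Ck when each component is T times the previous one."""
    return [c0_size * size_ratio ** k for k in range(last_level + 1)]


def merge_components(newer, older):
    """Merge-sort two key-sorted runs; on duplicate keys the newer value wins."""
    # Tag each entry with an age (0 = newer) so equal keys sort newer-first.
    tagged = heapq.merge(
        ((key, 0, value) for key, value in newer),
        ((key, 1, value) for key, value in older),
    )
    merged = []
    for key, _age, value in tagged:
        if not merged or merged[-1][0] != key:
            merged.append((key, value))
    return merged


capacities = level_capacities(c0_size=1, size_ratio=10, last_level=3)
merged = merge_components([("k1", "v3")], [("k1", "v1"), ("k2", "v2")])
```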
■ Immutability of LSM tree components (ex: SSTables) simplifies
■ Concurrency control
■ Recovery
Storage Engines
Query on LSM Tree
[Figure: a query for k1 reads across MEMORY and DISK, where both (k1, v2) and (k1, v1) are stored]
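A small sketch of how such a query could be resolved, assuming the newer (k1, v2) sits in memory and the older (k1, v1) on disk. The function lsm_get and the dict-based components are hypothetical.

```python
def lsm_get(key, components_newest_first):
    """Return the value from the newest component that contains the key."""
    for component in components_newest_first:
        if key in component:
            return component[key]
    return None


memory_component = {"k1": "v2"}  # assumed to be the newer version
disk_component = {"k1": "v1"}    # assumed to be the older version
result = lsm_get("k1", [memory_component, disk_component])
```

The read stops at the first hit, so the more components a query has to walk through, the more it costs. That is the pressure compaction relieves.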
■ A compaction policy (or strategy) defines the shape of LSM tree
■ Any policy is composed of 4 primitives
■ Trigger (when to compact)
■ File picking policy (which data to compact)
■ Granularity (how much data at once)
■ Layout (how data is laid out)
LSM-tree compaction policy
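One way to picture the four primitives is as the fields of a policy object. This is only a sketch: SSTable, CompactionPolicy, next_job and the toy thresholds are all hypothetical names and values, not ScyllaDB's API.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SSTable:
    level: int
    size: int


@dataclass
class CompactionPolicy:
    trigger: Callable[[List[SSTable]], bool]              # when to compact
    pick_files: Callable[[List[SSTable]], List[SSTable]]  # which data to compact
    granularity: int                                      # how many files at once
    output_level: Callable[[List[SSTable]], int]          # layout of the output

    def next_job(self, sstables):
        """Return (inputs, target level), or None if nothing should run."""
        if not self.trigger(sstables):
            return None
        inputs = self.pick_files(sstables)[: self.granularity]
        return inputs, self.output_level(inputs)


# Toy policy: once level 0 has 4 files, merge 4 of them into level 1.
toy_policy = CompactionPolicy(
    trigger=lambda ssts: sum(s.level == 0 for s in ssts) >= 4,
    pick_files=lambda ssts: [s for s in ssts if s.level == 0],
    granularity=4,
    output_level=lambda inputs: inputs[0].level + 1,
)
tables = [SSTable(level=0, size=10) for _ in range(5)]
job = toy_policy.next_job(tables)
```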
Pure Leveled in Original LSM Design
ONLY 1 COMPONENT PER LEVEL!
[Figure: C0 in MEMORY; C1, C2, ..., Ck on DISK; levels combined by merge sort]
Flexible Leveled in Modern LSM Design
[Figure: MEMORY above DISK levels L0 and L1]
■ Partitions the LSM-tree components into (usually fixed-size) fragments
■ Subset of a level can be merged into the next one (partial merge)
■ Bounds:
■ compaction operation time
■ temporary disk space during compaction lifetime
Partitioning Optimization for Leveled
[Figure: DISK levels L1 and L2 drawn along the KEY RANGE axis, each level split into several SST fragments]
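A sketch of a partial merge under partitioning: one fragment from a level is merged only with the fragments of the next level whose key ranges overlap it. Fragment, overlapping and the integer key ranges are illustrative assumptions.

```python
from typing import List, NamedTuple


class Fragment(NamedTuple):
    first_key: int
    last_key: int


def overlapping(candidate: Fragment, next_level: List[Fragment]) -> List[Fragment]:
    """Fragments of the next level whose key range intersects the candidate's."""
    return [
        f for f in next_level
        if f.first_key <= candidate.last_key and candidate.first_key <= f.last_key
    ]


l1 = [Fragment(0, 49), Fragment(50, 99)]
l2 = [Fragment(0, 19), Fragment(20, 39), Fragment(40, 59),
      Fragment(60, 79), Fragment(80, 99)]

# Partial merge: one L1 fragment plus only the L2 fragments it overlaps.
partial_merge_inputs = [l1[0]] + overlapping(l1[0], l2)
untouched = [f for f in l2 if f not in partial_merge_inputs]
```

Only 4 of the 7 fragments take part in this merge, which is how partitioning bounds both the operation time and the temporary disk space.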
Leveled Policy - Cost Analysis
■ Let T be the size ratio between adjacent levels
■ Let L be the number of levels for a given LSM tree
■ Write amplification: O(T * L)
■ Space amplification: O((T + 1) / T) = ~1.1
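A quick numeric check, reading the slide as write amplification O(T * L) and space amplification O((T + 1) / T). The values T = 10 and L = 4 are assumptions chosen for illustration. The results are orders of growth with constants dropped, not measurements.

```python
def leveled_write_amplification(size_ratio, num_levels):
    """O(T * L): the slide's bound, evaluated with constants dropped."""
    return size_ratio * num_levels


def leveled_space_amplification(size_ratio):
    """O((T + 1) / T): the slide's bound, evaluated with constants dropped."""
    return (size_ratio + 1) / size_ratio


write_amp = leveled_write_amplification(size_ratio=10, num_levels=4)
space_amp = leveled_space_amplification(size_ratio=10)
```

With T = 10 the space figure comes out at 1.1, which matches the ~1.1 on the slide.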
Stepped-Merge Algorithm
■ 1997 paper “Incremental Organization for Data Recording and Warehousing” -> a new approach to LSM tree layout
■ “Our goal is to design a technique that supports both insertion and
queries with reasonable efficiency, and without the delays of periodic
batch processing.”
■ Gives birth to the tiered compaction policy
Tiered Compaction Policy
[Figure: MEMORY and DISK levels L0 and L1, with SSTs placed along the FILE SIZE axis]
Tiered Policy - Cost Analysis
■ Let T be the size ratio between adjacent levels
■ Let L be the number of levels for a given LSM tree
■ Write amplification: O(L)
■ Space amplification: O(T * L)
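The same kind of numeric check for tiered, reading the slide as write amplification O(L) and space amplification O(T * L). T = 10 and L = 4 are again assumed values, and the results are orders of growth rather than measured numbers.

```python
def tiered_write_amplification(num_levels):
    """O(L): the slide's bound, evaluated with constants dropped."""
    return num_levels


def tiered_space_amplification(size_ratio, num_levels):
    """O(T * L): the slide's bound, evaluated with constants dropped."""
    return size_ratio * num_levels


tiered_write_amp = tiered_write_amplification(num_levels=4)
tiered_space_amp = tiered_space_amplification(size_ratio=10, num_levels=4)
```

Under these assumptions tiered writes far less than leveled would, and pays for it in space.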
Now the ScyllaDB journey begins
The database inherited all the LSM-tree
improvements described so far…
But they weren’t enough
Tiered - Temporary Space Problem!
[Figure: on DISK (levels L0 and L1, FILE SIZE axis), two input SSTs are compacted into a new output SST]
100% TEMP SPACE OVERHEAD
Partitioning Optimization for Tiered
[Figure: on DISK (levels L0 and L1, FILE SIZE axis), the SSTs are split into fragments; the input fragments shrink as output fragments are written]
Tiered Policy - Partitioning Optimization
■ Bounds temporary space overhead significantly
■ Allows disk space utilization to go from 50% to 80% and beyond.
■ Available in ScyllaDB as Incremental Compaction Strategy (ICS)
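A simplified simulation of why fragments bound the temporary space. It assumes the worst case for space (no overlapping keys, so output size equals input size) and assumes an input fragment can be deleted as soon as its data has been written out. The function name and the 200 unit-sized fragments are hypothetical.

```python
def peak_temp_space(input_fragments, release_inputs_incrementally):
    """Peak extra disk space used during a compaction, on top of the inputs.

    input_fragments: sizes of the fragments that make up the input.
    """
    total_input = sum(input_fragments)
    live_input = total_input
    written = 0
    peak_extra = 0
    for size in input_fragments:
        written += size  # one output fragment is sealed
        peak_extra = max(peak_extra, live_input + written - total_input)
        if release_inputs_incrementally:
            live_input -= size  # the exhausted input fragment is deleted
    return peak_extra


fragments = [1] * 200
full_overhead = peak_temp_space(fragments, release_inputs_incrementally=False)
incremental_overhead = peak_temp_space(fragments, release_inputs_incrementally=True)
```

Without incremental release the peak equals the whole input, which is the 100% overhead case. With it the peak is about one fragment.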
LSM tree - Efficiency Space
[Figure: a spectrum from SPACE OPTIMIZED to WRITE OPTIMIZED, with PURE LEVELED at the space-optimized end and PURE TIERED at the write-optimized end]
But the world is not only black and white
There are shades of gray in between…
Hybrid LSM-tree data layout
■ Largest level is space optimized
■ Other levels are write optimized
■ Addresses the O(K) space amplification of tiered under overwrite workloads
■ Where K = number of components per level
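An illustrative calculation of the O(K) effect, assuming an overwrite workload where each of the K = 4 components in the largest tier holds a full copy of the live data. All sizes are made up for the example.

```python
def space_amplification(component_sizes, live_data_size):
    """Disk space used divided by the size of the live data."""
    return sum(component_sizes) / live_data_size


live_data = 100

# Pure tiered: K = 4 components in the largest tier, each a full copy.
pure_tiered = space_amplification([100, 100, 100, 100], live_data)

# Hybrid: one component in the largest level, small write-optimized
# components above it.
hybrid = space_amplification([100, 10, 10, 10], live_data)
```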
Hybrid LSM-tree data layout
[Figure: levels L0, L1 and L2 along the FILE SIZE axis; the smaller levels are the WRITE OPTIMIZED LEVELS and the largest is the SPACE OPTIMIZED LEVEL]
Hybrid LSM - Efficiency Space
[Figure: a spectrum from SPACE OPTIMIZED to WRITE OPTIMIZED, with HYBRID placed between PURE LEVELED and PURE TIERED]
Hybrid LSM-tree data layout
■ Reduces space amplification in overwrite-intensive workloads
■ = less space amplification
■ = increased storage density per node
■ = more money in your pocket.
■ Available as the space amplification goal (SAG) option of the Incremental
Compaction Strategy.
LSM-tree & tombstones
[Figure: KEY A is stored on DISK (levels L0 and L1, FILE SIZE axis); a TOMBSTONE for KEY A is written later, so two entries for KEY A coexist]
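A simplified sketch of how a tombstone interacts with compaction: it shadows older data for the same key, and it can only be dropped once no older data can remain outside the compaction. The TOMBSTONE marker and the compact function are hypothetical. Real systems apply further conditions, such as a grace period, which are ignored here.

```python
TOMBSTONE = object()  # marker meaning "this key was deleted"


def compact(runs_newest_first, covers_all_older_data):
    """Merge runs; the newest version of each key wins."""
    merged = {}
    for run in runs_newest_first:
        for key, value in run.items():
            merged.setdefault(key, value)
    if covers_all_older_data:
        # Nothing older can resurface, so the tombstones can be purged too.
        merged = {k: v for k, v in merged.items() if v is not TOMBSTONE}
    return merged


newer_run = {"A": TOMBSTONE}
older_run = {"A": "data", "B": "b"}

purged = compact([newer_run, older_run], covers_all_older_data=True)
kept = compact([newer_run], covers_all_older_data=False)
```

Until both entries for KEY A meet in the same compaction, the deleted data and its tombstone both keep occupying disk space.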
Suboptimal LSM-tree tombstone handling
[Figure: the two KEY A entries on DISK (levels L0 and L1, FILE SIZE axis) and their GARBAGE COLLECTION]
Efficient LSM-tree tombstone handling
[Figure: the two KEY A entries on DISK (levels L0 and L1, FILE SIZE axis) and their GARBAGE COLLECTION]
■ Piggyback on incremental compaction, to bound temporary disk
space.
■ Triggers (avoids write amplification issues):
■ File staleness
■ Tombstone density threshold
■ Available in Incremental Compaction Strategy (ICS) by default.
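A sketch of how the two triggers could gate tombstone garbage collection. Requiring both conditions at once, the field names, and the default values (one day, 20%) are assumptions for illustration and not ICS internals.

```python
from dataclasses import dataclass


@dataclass
class SSTableStats:
    age_seconds: float                # how long since the file was written
    droppable_tombstone_ratio: float  # estimated share of droppable tombstones


def should_garbage_collect(stats, min_age_seconds=86400, density_threshold=0.2):
    """Compact for tombstones only if the file is stale AND tombstone-dense."""
    is_stale = stats.age_seconds >= min_age_seconds
    is_dense = stats.droppable_tombstone_ratio >= density_threshold
    return is_stale and is_dense


old_and_dense = SSTableStats(age_seconds=2 * 86400, droppable_tombstone_ratio=0.5)
fresh_and_dense = SSTableStats(age_seconds=60, droppable_tombstone_ratio=0.5)
old_and_sparse = SSTableStats(age_seconds=2 * 86400, droppable_tombstone_ratio=0.05)
```

Gating on both conditions keeps a file from being rewritten over and over for little gain, which is the write amplification issue the triggers are meant to avoid.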
Thank You
Stay in Touch
Raphael Carvalho
raphaelsc@scylladb.com
@raphael_scarv
raphaelsc

Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdfChristopherTHyatt
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 

Recently uploaded (20)

Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

Scaling ScyllaDB Storage Engine with State-of-Art Compaction

  • 1. Squeezing the Most Out of the Storage Engine with State of the Art Compaction Raphael S. Carvalho, Software Engineer
  • 2. Raphael Carvalho ■ Syslinux, suite of bootloaders ■ OSv, an operating system for the cloud ■ Seastar, the framework powering ScyllaDB ■ ScyllaDB, the best database in the world
  • 3. “In order to make good use of the computer resources, one must organize files intelligently, making the retrieval process efficient.” The Ubiquitous B-Tree paper, 1979
  • 4. Storage Engines ■ Short & precise definition from the aforementioned paper: ■ “allow users to store, update, and recall”
  • 5. Storage Engines ■ Two approaches for handling updates ■ In-place structure (e.g. B+-tree)
  • 6. Storage Engines ■ Two approaches for handling updates ■ In-place structure (e.g. B+-tree) [Diagram: stored entries (k1,v1)(k2,v2)]
  • 7. Storage Engines ■ Two approaches for handling updates ■ In-place structure (e.g. B+-tree) [Diagram: stored entries (k1,v1)(k2,v2); incoming update (k1,v3)]
  • 8. Storage Engines ■ Two approaches for handling updates ■ In-place structure (e.g. B+-tree) [Diagram: stored entries (k1,v3)(k2,v2) after the update]
  • 9. Storage Engines ■ Two approaches for handling updates ■ Out-of-place structure (e.g. LSM-tree)
  • 10. Storage Engines ■ Two approaches for handling updates ■ Out-of-place structure (e.g. LSM-tree) [Diagram: stored entries (k1,v1)(k2,v2)]
  • 11. Storage Engines ■ Two approaches for handling updates ■ Out-of-place structure (e.g. LSM-tree) [Diagram: stored entries (k1,v1)(k2,v2); update (k1,v3) shown alongside them]
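The two update approaches on the slides above can be sketched in a few lines. This is a toy illustration only: a plain dict stands in for an in-place structure and a list of immutable tuples stands in for out-of-place components. The names `in_place`, `components`, and `latest_view` are hypothetical and not taken from the talk.

```python
# In-place: one mutable structure; the old value of k1 is overwritten.
in_place = {"k1": "v1", "k2": "v2"}
in_place["k1"] = "v3"

# Out-of-place: every batch of writes becomes a new immutable component,
# and older components are never modified.
components = [
    (("k1", "v1"), ("k2", "v2")),  # oldest component
]
components.append((("k1", "v3"),))  # the update lands in a new component


def latest_view(components):
    """Replay components from oldest to newest so that newer values win."""
    view = {}
    for component in components:
        for key, value in component:
            view[key] = value
    return view
```

Both approaches end with the same logical contents, but the out-of-place variant still holds the stale (k1,v1) entry until something merges the components, which is the job of compaction.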
  • 15. Storage Engines ■ Out-of-place update isn’t new. ■ The 1976 paper “Differential files” shows its applicability in the real world ■ “shown to be an efficient method for storing a large and changing database”
  • 16. Storage Engines ■ A good analogy is presented in the paper
  • 17. Storage Engines ■ The Log-Structured Merge-Tree (LSM-Tree) paper is then published in 1996
  • 19. Storage Engines: the LSM-tree ■ C1 is T times bigger than C0. ■ C(K) is T times bigger than C(K-1). [Diagram: C0 in MEMORY; C1, C2, …, Ck on DISK; merge sort between components]
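The size relationship between components can be written out directly. A minimal sketch, assuming an arbitrary unit for the size of C0; the function name `component_sizes` is hypothetical.

```python
def component_sizes(c0_size, size_ratio, num_disk_components):
    """Sizes of C0..Ck, where each component is size_ratio (T) times the previous one."""
    return [c0_size * size_ratio ** i for i in range(num_disk_components + 1)]
```

For example, `component_sizes(1, 10, 3)` describes C0 through C3 with T = 10, so each step down the tree is ten times larger than the one before.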
  • 20. Storage Engines ■ Immutability of LSM-tree components (e.g. SSTables) simplifies: ■ Concurrency control ■ Recovery
  • 21. Query on LSM Tree [Diagram: query for k1; (k1, v2) in MEMORY, (k1, v1) on DISK]
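A point query over such a structure can be sketched as a newest-first search. This assumes the in-memory component holds the most recent writes and that disk components are supplied newest first; the function name `query` and the dict-based components are illustrative only.

```python
def query(key, memtable, disk_components):
    """Look up key in memory first, then in disk components from newest to oldest."""
    if key in memtable:
        return memtable[key]
    for component in disk_components:  # assumed to be ordered newest first
        if key in component:
            return component[key]
    return None  # key is not present in any component
```

With `(k1, v2)` in memory and `(k1, v1)` on disk, the lookup returns `v2` and never touches the disk component.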
  • 22. LSM-tree compaction policy ■ A compaction policy (or strategy) defines the shape of the LSM tree ■ Any policy is composed of 4 primitives ■ Trigger (when to compact) ■ File picking policy (which data to compact) ■ Granularity (how much data at once) ■ Layout (how data is laid out)
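One way to picture the four primitives is as the fields of a small record. This is a hypothetical sketch, not an interface from ScyllaDB or any other engine: `CompactionPolicy`, `toy_policy`, and the threshold of four files are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class CompactionPolicy:
    trigger: Callable[[List[int]], bool]          # when to compact
    pick_files: Callable[[List[int]], List[int]]  # which data to compact
    granularity: str                              # how much data at once
    layout: str                                   # how data is laid out


# A toy policy over a list of file sizes: compact once four files exist,
# picking the four smallest files.
toy_policy = CompactionPolicy(
    trigger=lambda sizes: len(sizes) >= 4,
    pick_files=lambda sizes: sorted(sizes)[:4],
    granularity="whole files",
    layout="tiered",
)
```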
  • 23. Pure Leveled in Original LSM Design ■ Only 1 component per level! [Diagram: C0 in MEMORY; C1, C2, …, Ck on DISK; merge sort between components]
  • 24. Flexible Leveled in Modern LSM Design [Diagram: MEMORY and DISK, with levels L0 and L1]
  • 27. Partitioning Optimization for Leveled ■ Partitions the LSM-tree components into (usually fixed-size) fragments ■ A subset of a level can be merged into the next one (partial merge) ■ Bounds: ■ compaction operation time ■ temporary disk space during compaction lifetime
  • 28. Partitioning Optimization for Leveled [Diagram: MEMORY and DISK; levels L1 and L2 along the KEY RANGE axis, each made of several SSTs]
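A partial merge needs to know which fragments of the next level overlap the fragment being pushed down. A minimal sketch, assuming each fragment is described by a closed `(first_key, last_key)` range; the names `overlaps` and `partial_merge_inputs` are hypothetical.

```python
def overlaps(a, b):
    """True when the closed key ranges a and b, each (first, last), intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def partial_merge_inputs(fragment, next_level):
    """Fragments of the next level whose key range overlaps the chosen fragment."""
    return [candidate for candidate in next_level if overlaps(fragment, candidate)]
```

Only the overlapping fragments take part in the merge, which is what keeps both the operation time and the temporary disk space bounded.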
  • 30. Leveled Policy - Cost Analysis ■ Let T be the size ratio between adjacent levels ■ Let L be the number of levels for a given LSM tree ■ Write amplification: O(T * L) ■ Space amplification: O((T + 1) / T) = ~1.1
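The leveled figures can be checked numerically. The sketch below assumes T = 10, which is the value that makes (T + 1) / T come out at 1.1; the slide itself only shows the result. The function names are hypothetical, and the write-amplification function returns the growth term of the bound with constants dropped, not a measured value.

```python
def leveled_space_amplification(size_ratio):
    """(T + 1) / T, the space amplification figure for leveled compaction."""
    return (size_ratio + 1) / size_ratio


def leveled_write_amplification_bound(size_ratio, num_levels):
    """T * L, the growth term of the O(T * L) write amplification bound."""
    return size_ratio * num_levels
```

Note how the two costs move in opposite directions: a larger T pushes space amplification toward 1 but raises the write-amplification term.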
  • 31. Stepped-Merge Algorithm ■ The 1997 paper “Incremental Organization for Data Recording and Warehousing” introduces a new approach to LSM-tree layout ■ “Our goal is to design a technique that supports both insertion and queries with reasonable efficiency, and without the delays of periodic batch processing.” ■ Gives birth to the tiered compaction policy
  • 35. Tiered Policy - Cost Analysis ■ Let T be the size ratio between adjacent levels ■ Let L be the number of levels for a given LSM tree ■ Write amplification: O(L) ■ Space amplification: O(T * L)
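The tiered layout can be illustrated with a toy model in which each run is represented only by its size. The sketch assumes the commonly described rule that a level is merged into a single run on the next level once it holds T runs; `tiered_insert` is a hypothetical name, and real implementations differ in many details.

```python
def tiered_insert(levels, run_size, size_ratio):
    """Add a run to level 0; whenever a level holds size_ratio runs,
    merge them into one run on the next level."""
    levels = [list(level) for level in levels]  # work on a copy
    if not levels:
        levels.append([])
    levels[0].append(run_size)
    i = 0
    while i < len(levels) and len(levels[i]) >= size_ratio:
        merged = sum(levels[i])  # the merge rewrites every run of this level once
        levels[i] = []
        if i + 1 == len(levels):
            levels.append([])
        levels[i + 1].append(merged)
        i += 1
    return levels
```

In this model a piece of data is rewritten at most once per level it passes through, which is the intuition behind the O(L) write amplification figure.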
  • 36. Now the ScyllaDB journey begins. The database inherited all the LSM-tree improvements described so far… But they weren’t enough.
  • 37. Tiered - Temporary Space Problem! [Diagram: MEMORY and DISK; levels L0 and L1 along the FILE SIZE axis; two SSTs]
  • 38. Tiered - Temporary Space Problem! [Diagram: same layout, with a third SST and the label 100% TEMP SPACE OVERHEAD]
  • 39-41. Partitioning Optimization for Tiered [Diagram sequence: MEMORY and DISK; levels L0 and L1 along the FILE SIZE axis; SSTs]
  • 42. Tiered Policy - Partitioning Optimization ■ Bounds temporary space overhead significantly ■ Allows disk space utilization to go from 50% to 80% and beyond. ■ Available in ScyllaDB as Incremental Compaction Strategy (ICS)
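The effect of partitioning on temporary space can be seen in a small simulation. This sketch makes two simplifying assumptions: the output is as large as the input (no overwrites or deletes are dropped), and an input fragment is deleted as soon as the matching output fragment has been written. The function name `peak_temp_overhead` is hypothetical.

```python
def peak_temp_overhead(total_input, fragment_size):
    """Peak extra disk space used while compacting total_input units of data,
    when inputs are freed one fragment at a time."""
    written = 0
    freed = 0
    peak = 0
    while written < total_input:
        step = min(fragment_size, total_input - written)
        written += step                    # a new output fragment is sealed
        peak = max(peak, written - freed)  # extra space held before inputs are released
        freed += step                      # the exhausted input fragment is deleted
    return peak
```

Setting `fragment_size` equal to `total_input` models the unpartitioned case, where the inputs can only be deleted at the end and the overhead reaches 100% of the input size.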
  • 43. LSM tree - Efficiency Space [Diagram: spectrum between SPACE OPTIMIZED and WRITE OPTIMIZED]
  • 44. LSM tree - Efficiency Space [Diagram: same spectrum, with PURE LEVELED marked]
  • 45. LSM tree - Efficiency Space [Diagram: same spectrum, with PURE LEVELED and PURE TIERED marked]
  • 46. But the world is not only black and white. There are shades of gray in between…
  • 47. Hybrid LSM-tree data layout ■ Largest level is space optimized ■ Other levels are write optimized ■ Addresses the O(K) space amplification of tiered compaction in overwrite workloads ■ Where K = number of components per level
  • 48. Hybrid LSM-tree data layout [Diagram: levels L0, L1, L2 along the FILE SIZE axis; four SSTs; WRITE OPTIMIZED LEVELS and a SPACE OPTIMIZED LEVEL]
  • 49. Hybrid LSM-tree data layout [Diagram: same layout with three SSTs]
  • 50. Hybrid LSM - Efficiency Space [Diagram: spectrum between SPACE OPTIMIZED and WRITE OPTIMIZED, with PURE TIERED, PURE LEVELED and HYBRID marked]
  • 52. Hybrid LSM-tree data layout ■ Reduces space amplification in overwrite-intensive workloads ■ = less space amplification ■ = increased storage density per node ■ = more money in your pocket. ■ Available as the space amplification goal (SAG) option of Incremental Compaction Strategy (ICS).
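A space amplification goal can be thought of as a threshold on an estimate of how much extra space the tree occupies. The sketch below is an assumption made for illustration, not ScyllaDB's actual computation: it estimates space amplification as total size divided by the size of the largest tier. The names `space_amplification` and `exceeds_goal` are hypothetical.

```python
def space_amplification(tier_sizes):
    """Rough estimate: total bytes on disk divided by the size of the largest tier."""
    return sum(tier_sizes) / max(tier_sizes)


def exceeds_goal(tier_sizes, goal):
    """True when the estimate is above the configured goal, i.e. when a
    space-reclaiming compaction into the largest tier would be worth triggering."""
    return space_amplification(tier_sizes) > goal
```

With a goal of 1.5, tiers of sizes 10, 10 and 80 stay under the goal, while tiers of sizes 40, 40 and 80 exceed it.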
  • 54. LSM-tree & tombstones [Diagram: MEMORY and DISK; levels L0 and L1 along the FILE SIZE axis; KEY A and a TOMBSTONE for KEY A]
  • 56. Suboptimal LSM-tree tombstone handling [Diagram: same layout; two entries for KEY A; GARBAGE COLLECTION]
  • 57. Efficient LSM-tree tombstone handling [Diagram: same layout; two entries for KEY A; GARBAGE COLLECTION]
  • 58. Efficient LSM-tree tombstone handling ■ Piggyback on incremental compaction, to bound temporary disk space. ■ Triggers (avoids write amplification issues): ■ File staleness ■ Tombstone density threshold ■ Available in Incremental Compaction Strategy (ICS) by default.
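The two triggers can be combined into a single eligibility check. This is a hypothetical sketch: the slide names the triggers but not how they are combined, so requiring both at once is an assumption, as are the names `SSTableStats` and `should_garbage_collect` and the parameters `min_age_seconds` and `density_threshold`.

```python
from dataclasses import dataclass


@dataclass
class SSTableStats:
    age_seconds: float  # time since the file was written
    tombstones: int     # number of tombstones in the file
    total_cells: int    # number of cells in the file, tombstones included


def should_garbage_collect(stats, min_age_seconds, density_threshold):
    """True when the file is stale enough and dense enough in tombstones."""
    if stats.total_cells == 0:
        return False
    density = stats.tombstones / stats.total_cells
    return stats.age_seconds >= min_age_seconds and density >= density_threshold
```

Gating on both conditions avoids rewriting a file that is young or that would free very little space, which is how such triggers keep write amplification in check.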
  • 59. Thank You Stay in Touch Raphael Carvalho raphaelsc@scylladb.com @raphael_scarv raphaelsc