Making the Most Out of ScyllaDB's Awesome Concurrency at Optimizely
1. Making the Most Out of ScyllaDB's Awesome Concurrency at Optimizely
Brian Taylor, Principal Engineer
2. Brian Taylor
■ I am married with 3 young children
■ I have created 2 programming languages and 2 databases for legitimate business reasons
■ I love discovering some property in the solution space that, when maintained, makes everything simpler
3. Presentation Agenda
■ ScyllaDB Loves Concurrency
■ To a point
■ How to keep the wheels on the bus
■ Mommy, where does concurrency come from?
■ Easy way
■ Good way
6. Why Does Concurrency Matter?
[Timing diagram: four overlapping requests, each with request time R and service time S, completing in a total time Rtotal = R + 3S]
Reff = Rtotal / 4 = (R + 3S) / 4
It lets us hide round trip latency!
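The arithmetic above can be checked with assumed numbers (S = 1 ms, R = 2 ms; neither value is from the talk): four pipelined requests finish in R + 3S, so the effective per-request time is a quarter of that.

```python
# Illustrative numbers, not measurements from this talk.
S = 1.0   # service time in ms, as reported by ScyllaDB
R = 2.0   # request time in ms, as seen by the client (S + round trip)

# Pure sequential: 4 requests take 4 * R.
sequential_total = 4 * R            # 8.0 ms

# Pipelined with concurrency 4: round trips overlap, so the wall
# time is R + 3S and the effective per-request time drops.
pipelined_total = R + 3 * S         # 5.0 ms
R_eff = pipelined_total / 4         # 1.25 ms, vs 2.0 ms sequential

print(sequential_total, pipelined_total, R_eff)
```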
7. Definitions
Throughput - X - measured in things per second (what you probably care about)
Concurrency - N - number of independent requests happening at the same time (the tool of this talk)
Service Time - S - how long ScyllaDB says a thing took
Request Time - R - how long something takes from the client's perspective
R = S + round trip time
X = 1 / R (pure sequential)
X = N / R (in the linear region)
X ≈ maxX (in the saturation region)
X = 💩 (in the retrograde region)
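A minimal sketch of these definitions as an idealized throughput model; it ignores the retrograde region, and the numbers are assumed rather than measurements from this talk.

```python
def throughput(n, r, max_x):
    """Idealized throughput: X = N / R until the cluster saturates
    at maxX, after which X stays flat (retrograde effects ignored)."""
    return min(n / r, max_x)

# Assumed values: R = 2 ms per request, saturation at 40 kops/s.
r = 0.002        # request time in seconds
max_x = 40_000   # ops/s at saturation

print(throughput(1, r, max_x))    # 500.0   -> pure sequential, X = 1/R
print(throughput(20, r, max_x))   # 10000.0 -> linear region, X = N/R
print(throughput(200, r, max_x))  # 40000   -> saturation region, X = maxX
```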
8. Classic Closed Loop Load Testing
[Diagram: multiple Users each sending a Request to the SUT and waiting for its Response]
■ Users can have up to one request in flight at any time
■ Users must receive the response to their request before issuing the next request
■ Not directly useful for modern capacity planning
9. Universal Scaling Law
A generalization of Amdahl's Law discovered by Dr. Neil Gunther. As the number of users (N) increases, the system throughput (X) will:
■ Enjoy a period of near linear scaling
■ Eventually saturate some resource such that increasing N doesn't increase X. This defines maxX
■ Possibly encounter a coordination cost that drives down X with further increasing N
[Plot: scylla-bench on 3x i3.large, average throughput vs concurrency, showing the linear region, the saturation region at maxX, and the retrograde region]
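The USL has a standard closed form, X(N) = λN / (1 + σ(N−1) + κN(N−1)). The sketch below evaluates it with illustrative coefficients; they are not fitted to the cluster in this talk.

```python
import math

def usl_throughput(n, lam=1000.0, sigma=0.02, kappa=0.0002):
    """Gunther's Universal Scaling Law.
    lam:   single-user throughput (slope of the linear region)
    sigma: contention (serialization) coefficient -> saturation
    kappa: coherency (crosstalk) coefficient -> retrograde region
    Coefficient values are illustrative, not measured."""
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))

# With kappa > 0 the curve peaks at N* = sqrt((1 - sigma) / kappa);
# past that point, adding users *reduces* throughput.
n_peak = math.sqrt((1 - 0.02) / 0.0002)
print(round(n_peak))   # 70
```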
10. Modern Open Loop Load Testing
[Diagram: a Constant Throughput Source driving the SUT]
Does not model users or think times. Instead models load as a constant throughput load source. A good match for capacity planning internet-connected systems, where we typically know requests per second but don't really care how many users are behind that load.
■ The start time of every request in the test is pre-ordained and does not depend on how the SUT is handling previous requests
■ Concurrency is theoretically unbounded
https://github.com/optimizely/scylla-bench-crate
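The "pre-ordained start time" property can be sketched in a few lines; the function name and numbers are illustrative, and this is not the linked crate's actual code.

```python
# Open-loop load source sketch: request start times are computed up
# front from the target throughput, independent of how the SUT
# responds to earlier requests.
def schedule(target_ops_per_sec, duration_sec):
    interval = 1.0 / target_ops_per_sec
    n = int(target_ops_per_sec * duration_sec)
    return [i * interval for i in range(n)]

starts = schedule(4, 2.0)   # 4 ops/s for 2 s -> 8 pre-ordained starts
print(starts)               # [0.0, 0.25, 0.5, ..., 1.75]
```

If the SUT falls behind, the source keeps firing on schedule anyway, which is exactly why concurrency is theoretically unbounded in this model.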
11. Relating the Testing Models
Closed-loop testing: choose concurrency (N)
Open-loop testing: choose throughput (X)
■ X in the linear region will imply a bounded N (and R is very stable)
■ X in the saturation region can have unbounded N and R is very chaotic
■ When X exceeds saturation, N and R are undefined because the system will not keep up
■ The USL is not a single-valued function of X: this has interesting implications as X approaches saturation
12. Linear Region
[Network execution diagram: the client-observed request time R spans queueing (Q) on the network and in execution, around the service time S]
4 kops/s: R99 = 1.487 ms, 2 <= N <= 22
40 kops/s: R99 = 2.527 ms, 23 <= N <= 255
S99 = 0.634 ms, independent of kops/s
13. Linear Region
Throughput is directly proportional to concurrency
■ The size of the cluster (in shards) and its aggregate SSD throughput will determine how large the linear region is
■ You should engineer your system to take full advantage of the linear region
[Plot: SSD throughput allocation at 4 kops/s vs 40 kops/s]
15. Saturation Region
Throughput is approximately constant, regardless of concurrency
■ At this point, assuming a well tuned cluster and workload, we are writing to disk as fast as the disks can take it
■ Concurrency is no longer a mechanism to increase throughput and is now becoming a risk to stability
■ You should engineer your system to stay out of the saturation region
[Plot: throughput vs concurrency at 100 kops/s, marking the linear, saturation, and retrograde regions]
17. Retrograde Region
[Network execution diagram at 100 kops/s: R99 = 55.8 s, 75 <= N <= 4096]
Once something bounces us into the retrograde region, S99 becomes 1000x worse than linear
18. Retrograde Region
Increasing concurrency now decreases throughput. A system that enters this region is likely to get stuck here until demand declines
■ The harder we push, the less throughput we get, and the more demand builds, which makes us want to push harder
■ "Pushing harder" consumes more client resources (threads, futures, ScyllaDB driver state). The road to hell will terminate with an OOM unless there's some other limiting factor
[Plot: throughput vs concurrency at 100 kops/s; the "road to hell" descends through the retrograde region]
19. What Have We Learned?
Stay in the linear region and you'll enjoy consistent latencies and bounded concurrency. Stray into saturation and you're likely to get stuck in the retrograde region until load subsides.
■ Scale ScyllaDB such that you're "always" going to be operating in the linear region for your expected load
■ Design concurrency limiting mechanisms that keep you out of the retrograde region during unexpected spikes in load
■ If you have work to do and can do it in the linear region: DO IT
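One concurrency-limiting mechanism of the kind described above can be sketched with a client-side semaphore: it caps in-flight requests at a value chosen inside the linear region, so an unexpected spike queues at the client instead of pushing ScyllaDB toward the retrograde region. MAX_IN_FLIGHT is an assumed tuning knob, not a number from the talk.

```python
import asyncio

MAX_IN_FLIGHT = 64   # assumed: pick a value inside your linear region

async def run_limited(requests):
    """Run request coroutines with at most MAX_IN_FLIGHT in flight."""
    limiter = asyncio.Semaphore(MAX_IN_FLIGHT)

    async def one(req):
        async with limiter:    # waits while 64 requests are in flight
            return await req()

    return await asyncio.gather(*[one(r) for r in requests])
```

The key property: client resource use (tasks, driver state) stays bounded no matter how hard the load source pushes, cutting off the "road to hell".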
21. Mechanisms of Concurrency
These are the boring code wonk answers.
■ Threads
■ Cheap end: 8 KB per goroutine. Low thousands is reasonable
■ Expensive end: 1 MB per Java thread. Low hundreds is reasonable
■ Reactors
■ Rust tokio, Java futures, Seastar, Node.js callbacks: typically <1 KB per instance. Tens of thousands is reasonable
■ Nodes
■ $$ limited
22. Mother of Concurrency
Data dependency is the mother of sound concurrency
■ Easy: no dependency: logging facts at independent keys. Write only
■ Medium: partitionable dependency. As long as we process each independent stream sequentially, everything will be fine: maintain latest state at a single key
■ Hard: arbitrary "happens-before" relationships: add a relationship between two nodes in a graph
23. Definitions
■ Command: represents an atomic unit of work. Contains IOPs. Always concludes with a write, may contain reads.
■ IOP: IO operation. A unit of work for the database.
■ Batch: a group of IOPs that may be executed concurrently. Write IOPs within a batch may literally be combined into a batch operation to ScyllaDB. Batches execute sequentially with respect to other batches
■ Slot: a cubby for data. Has a name. Can be read or written (partition + clustering key in ScyllaDB)
■ Concurrency strategy: how we group IOPs into batches such that the final slot state is consistent with the commands all having been executed sequentially
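The vocabulary above can be sketched as a toy data model; the names and fields are illustrative, not the actual Optimizely code.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Iop:
    """A unit of work for the database."""
    kind: str   # "read" or "write"
    slot: str   # names a slot: partition + clustering key in ScyllaDB

@dataclass
class Command:
    """An atomic unit of work: zero or more reads, ending in a write."""
    iops: list

@dataclass
class Batch:
    """IOPs that may run concurrently; batches run sequentially."""
    iops: list = field(default_factory=list)

cmd = Command([Iop("read", "A"), Iop("write", "B")])
assert cmd.iops[-1].kind == "write"   # a command always ends with a write
```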
24. No Dependency
[Diagram: Commands 1-5, each a single write to a distinct slot (A-E); writes A, B, C form Batch 1 and writes D, E form Batch 2]
When commands contain no reads and always write to different slots there can be no data dependency. The decision about when to switch from batch 1 to batch 2 can be arbitrary, or driven by a desire to minimize latency, or to work within the ScyllaDB batch size constraint
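A sketch of this no-dependency strategy: since any grouping is sound, batches can be cut purely on a size cap. The cap here is an assumed stand-in for the ScyllaDB batch size constraint.

```python
def cut_batches(writes, max_batch_size):
    """Group independent write IOPs into batches of bounded size.
    Any grouping is correct because the writes share no slots."""
    batches, current = [], []
    for w in writes:
        current.append(w)
        if len(current) == max_batch_size:
            batches.append(current)
            current = []
    if current:
        batches.append(current)
    return batches

print(cut_batches(["A", "B", "C", "D", "E"], 3))
# [['A', 'B', 'C'], ['D', 'E']]
```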
25. Read/Modify/Write and "Happens Before"
For a read/modify/write (RMW) operation to yield correct results, reads must be able to observe the writes of prior RMWs.
Most streaming platforms (Storm, Flink, Kafka Streams, Spark) trivially solve this by partitioning commands into guaranteed independent streams. This means that:
■ Every command has a happens-before relationship with every following command for the same partition key
■ Cross command concurrency is impossible within a partition
■ Concurrency is limited by the cardinality of the partition key
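The partitioning approach can be sketched as grouping commands by partition key; the tenant names are illustrative.

```python
from collections import defaultdict

def partition(commands):
    """Group (key, command) pairs into per-key streams. Each stream
    must run sequentially; distinct streams may run concurrently."""
    streams = defaultdict(list)
    for key, cmd in commands:
        streams[key].append(cmd)   # order within a key is preserved
    return dict(streams)

cmds = [("A", "read C"), ("B", "read A"), ("A", "write B")]
print(partition(cmds))
# {'A': ['read C', 'write B'], 'B': ['read A']}
```

Note that the number of concurrently runnable streams, and therefore the available concurrency, is exactly the cardinality of the partition key.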
26. Partitioned Concurrency
[Diagram: commands for Tenants A and B, each containing reads (A, C, D) and writes (B, D), spread across Batches 1-4, with happens-before edges between a tenant's consecutive batches]
The work for each tenant is executed strictly in the order it was received. This guarantees that reads will always see prior writes but misses opportunities for greater concurrency by being ignorant of non-interacting slot usage
27. Golden Rules of Data Dependency
■ Reads of a slot must be able to observe any writes to that slot that came before them.
■ Writes create a happens-before relationship with any reads that follow them
■ The final value in a slot must reflect the last write in the sequence
■ Writes create a happens-before relationship with any writes that follow them
28. Golden Rule Happens Before
[Diagram: the same tenant commands, now scheduled into three batches by examining slot usage, with happens-before edges only where the golden rules require them]
By examining slot usage and applying the golden rules we can eliminate a batch and get concurrency within a partition
29. Simplification Rules of Data Dependency
Each of the golden rules implies a simplification rule that we can use to further compress command execution
■ Reads of a slot must be able to observe any writes to that slot that came before them.
■ Reads do not have to observe prior writes by literally reading the database
■ The final value in a slot must reflect the last write in the sequence
■ If a prior write is not read from the database, it can be omitted as long as the final write happens
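Both simplification rules can be sketched as a single pass over a command sequence's IOPs; this is a toy model, not the production implementation. Reads are served from a pending-write buffer instead of the database, and overwritten writes are dropped so only the final write per slot survives.

```python
def simplify(iops):
    """Apply the simplification rules to a sequence of IOPs.
    Returns (slots that must be read from the database,
             final write IOPs, one per slot)."""
    pending = {}     # slot -> latest pending write value
    db_reads = []
    for op in iops:
        kind, slot, *val = op
        if kind == "write":
            pending[slot] = val[0]     # later write overwrites earlier
        elif slot not in pending:
            db_reads.append(slot)      # must literally read the database
        # else: read is satisfied from the pending write; no IOP needed
    final_writes = [("write", s, v) for s, v in pending.items()]
    return db_reads, final_writes

reads, writes = simplify([("write", "B", 1), ("read", "B"),
                          ("read", "D"), ("write", "D", 2),
                          ("write", "B", 3)])
print(reads)    # ['D'] -> only D must be read from ScyllaDB
print(writes)   # [('write', 'B', 3), ('write', 'D', 2)]
```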
30. Data Dependency Simplification
[Diagram: the same tenant commands compressed into two batches; the read of B is served from the prior write and the overwritten write is dropped]
Reads can directly read prior writes. Overwritten writes can be skipped.
31. What's the Point?
Data dependency is the mother of sound concurrency. If you can find enough sound concurrency in your problem then you can exploit the full linear region of ScyllaDB's awesome concurrency.
■ Case 1: 64 cores, thread and partition concurrency. Maximum throughput about 8 kcommands/s. Very sensitive to "data shape", aka partition cardinality.
■ Case 2: 15 cores, reactor and happens-before concurrency. Maximum throughput about 30 kcommands/s. Insensitive to most practical "data shape" issues.
32. Thank You
Stay in Touch
Brian Taylor
brian.taylor@optimizely.com
@netguy204
netguy204
www.linkedin.com/in/brian-ttaylor