Having a database that’s capable of high concurrency is one thing, but actually tapping all that potential concurrency is another. Fortunately, Optimizely Engineering has developed practical strategies that can help other teams.
Learn how Optimizely Engineering takes full advantage of the high concurrency that’s possible with their NoSQL database, ScyllaDB – while also guaranteeing correctness and protecting the quality of service. Brian Taylor, Principal Software Engineer, will offer a technical deep dive on:
- Understanding concurrency and its impact on throughput and latency
- Closed loop load testing, open loop load testing & the Universal Scaling Law
- The type of load testing you should be performing for capacity planning
- How to identify the region where your database can make the best use of concurrency
- Strategies for optimizing sound concurrency based on your data dependencies
Optimizely Safely Maximizes Database Concurrency
1. How Optimizely (Safely) Maximizes Database Concurrency
Brian Taylor, Principal Software Engineer at Optimizely
Felipe Mendes, Solution Architect at ScyllaDB
4. The Database for Gamechangers
+ For data-intensive applications that require high throughput and predictable low latencies
+ Close-to-the-metal design takes full advantage of modern infrastructure
+ >5x higher throughput
+ >20x lower latency
+ >75% TCO savings
+ Compatible with Apache Cassandra and Amazon DynamoDB
+ DBaaS/Cloud, Enterprise and Open Source solutions
“ScyllaDB stands apart...It’s the rare product
that exceeds my expectations.”
– Martin Heller, InfoWorld contributing editor and reviewer
“For 99.9% of applications, ScyllaDB delivers all the
power a customer will ever need, on workloads that other
databases can’t touch – and at a fraction of the cost of
an in-memory solution.”
– Adrian Bridgewater, Forbes senior contributor
5. +400 Gamechangers Leverage ScyllaDB
Seamless experiences across content + devices
Digital experiences at massive scale
Corporate fleet management
Real-time analytics
2,000,000 SKU e-commerce management
Video recommendation management
Threat intelligence service using JanusGraph
Real-time fraud detection across 6M transactions/day
Uber scale, mission-critical chat & messaging app
Network security threat detection
Power ~50M X1 DVRs with billions of reqs/day
Precision healthcare via Edison AI
Inventory hub for retail operations
Property listings and updates
Unified ML feature store across the business
Cryptocurrency exchange app
Geography-based recommendations
Global operations: Avon, Body Shop + more
Predictable performance for on-sale surges
GPS-based exercise tracking
Serving dynamic live streams at scale
Powering India's top social media platform
Personalized advertising to players
Distribution of game assets in Unreal Engine
6. Introductions
Felipe Mendes, Solution Architect at ScyllaDB
+ Years of experience with Linux and other distributed systems
+ An open source enthusiast
+ Passion towards helping businesses to achieve their most challenging
goals
Brian Taylor, Principal Software Engineer at Optimizely
+ I am married with 3 young children
+ I have created 2 programming languages and 2 databases for
legitimate business reasons
+ I love discovering some property in the solution space that, when
maintained, makes everything simpler
7. Agenda
■ ScyllaDB Loves Concurrency
• To a point
• How to keep the wheels on the
bus
■ Mommy, where does concurrency
come from?
• Easy way
• Good way
10. Why Does Concurrency Matter?
[Diagram: overlapping requests, labeled with service time S, request time R, and total time Rtotal]
Reff = Rtotal / 4 = (R + 3S) / 4
It lets us hide round trip latency!
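To make the arithmetic concrete, here is a minimal sketch that generalizes the four-request case to n requests. It assumes (the slide does not state this) that the first response arrives after R and each further response arrives one service time S later, so n overlapped requests finish in R + (n - 1)S. The function name and the sample values are illustrative only.

```python
def effective_request_time(r: float, s: float, n: int) -> float:
    """Average time per request when n requests overlap.

    Assumes the first response arrives after r and each further
    response arrives one service time s later, so n = 4 gives
    (r + 3s) / 4 as on the slide.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    total = r + (n - 1) * s
    return total / n


# Illustrative values in milliseconds (not measurements from the talk).
r_ms, s_ms = 1.5, 0.5
sequential = effective_request_time(r_ms, s_ms, 1)  # one at a time
overlapped = effective_request_time(r_ms, s_ms, 4)  # four in flight
```

With these made-up numbers the per-request cost halves, because three of the four round trips are hidden behind the first one.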
11. Definitions
Throughput - X - measured in things per second (what you probably care about)
Concurrency - N - number of independent requests happening at the same time (the tool of this talk)
Service Time - S - how long ScyllaDB says a thing took
Request Time - R - how long something takes from the client’s perspective
One at a time:
R = S + round trip time
X = 1 / R (pure sequential)
With concurrency:
X = N / R (in the linear region)
X = maxX (in the saturation region)
X = 💩 (in the retrograde region)
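The definitions above can be turned into a tiny throughput model. This is only a sketch: `modeled_throughput` is a hypothetical helper, it treats R as constant across the linear region, and it deliberately leaves the retrograde region unmodeled. The sample request time and ceiling are made up.

```python
def modeled_throughput(n: int, r: float, max_x: float) -> float:
    """Throughput X for concurrency n, request time r (seconds), ceiling max_x.

    Linear region: X = n / r. Saturation region: X = max_x.
    The retrograde region is deliberately not modeled here.
    """
    if n < 1 or r <= 0:
        raise ValueError("n must be >= 1 and r must be positive")
    return min(n / r, max_x)


# Made-up operating point: 2 ms request time, 100k ops/s ceiling.
for n in (1, 10, 100, 1000):
    print(n, modeled_throughput(n, r=0.002, max_x=100_000))
```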
12. Classic Closed Loop Load Testing
[Diagram: several Users, each sending a Request to the SUT and receiving a Response]
● Users can have up to one request in flight at any time
● Users must receive the response to their request before issuing the next request
● Not directly useful for modern capacity planning
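A closed-loop test with the two properties listed above might look like the following asyncio sketch. `fake_sut` is a stand-in for the real system under test, and every name here is hypothetical. The point to notice is that concurrency can never exceed the number of users.

```python
import asyncio


async def fake_sut(in_flight: dict) -> None:
    """Stand-in for the system under test; tracks concurrent requests."""
    in_flight["now"] += 1
    in_flight["peak"] = max(in_flight["peak"], in_flight["now"])
    await asyncio.sleep(0.001)  # pretend request time
    in_flight["now"] -= 1


async def user(requests: int, in_flight: dict) -> int:
    """One closed-loop user: wait for each response before sending the next."""
    done = 0
    for _ in range(requests):
        await fake_sut(in_flight)
        done += 1
    return done


async def closed_loop_test(users: int, requests_per_user: int) -> tuple[int, int]:
    """Run all users concurrently; return (completed requests, peak in flight)."""
    in_flight = {"now": 0, "peak": 0}
    counts = await asyncio.gather(
        *(user(requests_per_user, in_flight) for _ in range(users))
    )
    return sum(counts), in_flight["peak"]


completed, peak = asyncio.run(closed_loop_test(users=8, requests_per_user=5))
```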
13. Universal Scaling Law
Generalization of Amdahl’s Law discovered by Dr. Neil Gunther. As the number of users (N) increases, the system throughput (X) will:
● Enjoy a period of near-linear scaling
● Eventually saturate some resource such that increasing N doesn’t increase X. This defines maxX
● Possibly encounter a coordination cost that drives down X with further increasing N
[Figure: ScyllaDB-bench, 3x i3.large, average throughput vs concurrency, with the Linear, Saturation, and Retrograde Regions and maxX marked]
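For reference, the usual closed form of the Universal Scaling Law is X(N) = λN / (1 + σ(N − 1) + κN(N − 1)), where σ captures contention and κ captures coordination cost. The sketch below evaluates it with made-up coefficients that were not fitted to the benchmark in the talk. They are chosen only so that all three regions show up.

```python
import math


def usl_throughput(n: float, lam: float, sigma: float, kappa: float) -> float:
    """Gunther's Universal Scaling Law.

    lam   - throughput at n = 1
    sigma - contention coefficient (drives saturation)
    kappa - coordination coefficient (drives retrograde behavior)
    """
    return lam * n / (1 + sigma * (n - 1) + kappa * n * (n - 1))


def usl_peak_concurrency(sigma: float, kappa: float) -> float:
    """Concurrency at which modeled throughput peaks (requires kappa > 0)."""
    return math.sqrt((1 - sigma) / kappa)


# Made-up coefficients, chosen only to show the three regions.
lam, sigma, kappa = 1000.0, 0.02, 0.0001
n_peak = usl_peak_concurrency(sigma, kappa)
for n in (1, 10, 50, round(n_peak), 200, 400):
    print(n, round(usl_throughput(n, lam, sigma, kappa)))
```

Past `n_peak` the modeled throughput falls as concurrency rises, which is the retrograde behavior described above.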
14. Modern Open Loop Load Testing
[Diagram: a Constant Throughput Source sending load to the SUT]
Does not model users or think times. Instead models load as a constant throughput load source. Good match for capacity planning internet-connected systems where we typically know requests per second but don’t really care how many users are behind that load.
● The start time of every request in the test is pre-ordained and does not depend on how the SUT is handling previous requests
● Concurrency is theoretically unbounded
https://github.com/optimizely/scylla-bench-crate
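The open-loop idea can be sketched in a few lines of asyncio. This is not the code from the repository linked above. It is a minimal illustration with hypothetical names and a fake SUT, and it measures latency from each request's scheduled start so that a slow SUT cannot hide its own delays.

```python
import asyncio
import time


def start_offsets(rate_per_s: float, count: int) -> list[float]:
    """Pre-ordained start times (seconds from test start) for each request."""
    return [i / rate_per_s for i in range(count)]


async def fake_sut() -> None:
    await asyncio.sleep(0.002)  # pretend request time


async def timed_request(t0: float, offset: float) -> float:
    """Wait for the scheduled start, issue the request, return its latency."""
    delay = t0 + offset - time.monotonic()
    if delay > 0:
        await asyncio.sleep(delay)
    await fake_sut()
    # Latency is measured from the scheduled start, not the actual send time.
    return time.monotonic() - (t0 + offset)


async def open_loop_test(rate_per_s: float, count: int) -> list[float]:
    t0 = time.monotonic()
    # Every request is scheduled up front; none waits for an earlier response.
    tasks = [
        asyncio.create_task(timed_request(t0, offset))
        for offset in start_offsets(rate_per_s, count)
    ]
    return await asyncio.gather(*tasks)


latencies = asyncio.run(open_loop_test(rate_per_s=500, count=50))
```

Nothing in this loop limits how many requests are in flight, so a slow SUT simply accumulates concurrency.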
15. Relating the Testing Models
Closed-loop testing: choose concurrency (N)
Open-loop testing: choose throughput (X)
● X in the linear region will imply a bounded N (and R is very stable)
● X in the saturation region can have unbounded N and R is very chaotic
● When X exceeds saturation, N and R are undefined because the system will not keep up
● The USL is not a single-valued function of X. This has interesting implications as X approaches saturation
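One way to connect the two models in the linear region is Little's Law (average requests in flight = throughput × average request time). The slide does not name it, so treat this as an added assumption. The sketch below uses hypothetical operating points and a hypothetical helper name.

```python
def implied_concurrency(throughput_per_s: float, request_time_s: float) -> float:
    """Little's Law: average requests in flight = throughput x request time."""
    if throughput_per_s < 0 or request_time_s < 0:
        raise ValueError("inputs must be non-negative")
    return throughput_per_s * request_time_s


# Hypothetical operating points, not figures from the talk.
low = implied_concurrency(5_000, 0.001)    # about 5 requests in flight
high = implied_concurrency(50_000, 0.002)  # about 100 requests in flight
```

This only holds while the system keeps up. Once X exceeds saturation, R grows without bound and so does the implied N.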
16. Linear Region
[Diagram: request path through Network, Q, and Execution stages, with S and R marked]
● 4 kops/s: R99 = 1.487 ms, 2 <= N <= 22
● 40 kops/s: R99 = 2.527 ms, 23 <= N <= 255
● S99 = 0.634 ms, independent of kops/s
17. Linear Region
Throughput is directly proportional to concurrency
● The size of the cluster (in shards) and its aggregate SSD throughput will determine how large the linear region is
● You should engineer your system to take full advantage of the linear region
[Figure: SSD Throughput Allocation at 4 kops/s and 40 kops/s]
19. Saturation Region
Throughput is approximately constant, regardless of concurrency
● At this point, assuming a well-tuned cluster and workload, we are writing to disk as fast as the disks can take it
● Concurrency is no longer a mechanism to increase throughput and is now becoming a risk to stability
● You should engineer your system to stay out of the saturation region
[Figure: throughput vs concurrency showing the Linear, Saturation, and Retrograde Regions, annotated at 100 kops/s]
21. Retrograde Region
[Diagram: request path through Network, Q, and Execution stages, with S and R marked]
● 100 kops/s: R99 = 55.8 s, 75 <= N <= 4096
Once something bounces us into the retrograde region, S99 becomes 1000x worse than linear
22. Retrograde Region
Increasing concurrency now decreases throughput. A system that enters this region is likely to get stuck here until demand declines
● The harder we push, the less throughput we get and the more demand builds, which makes us want to push harder
● “Pushing harder” consumes more client resources (threads, futures, ScyllaDB driver state). The road to hell will terminate with an OOM unless there’s some other limiting factor
[Figure: throughput vs concurrency showing the Linear, Saturation, and Retrograde Regions, annotated at 100 kops/s and labeled “Road to hell”]
23. What Have We Learned?
Stay in the linear region and you’ll enjoy consistent latencies and bounded concurrency. Stray into saturation and you’re likely to get stuck in the retrograde region until load subsides.
● Scale ScyllaDB such that you’re “always” going to be operating in the linear region for your expected load
● Design concurrency limiting mechanisms that keep you out of the retrograde region during unexpected spikes in load
● If you have work to do and can do it in the linear region: DO IT
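A concurrency limiting mechanism can be as small as a semaphore around every database call. The sketch below is one possible shape, not the mechanism Optimizely uses. `ConcurrencyLimiter`, `fake_query`, and the limit of 16 are all hypothetical, and in practice the limit would be chosen from load tests that locate the end of the linear region.

```python
import asyncio


class ConcurrencyLimiter:
    """Caps in-flight requests; extra callers wait instead of piling on."""

    def __init__(self, limit: int) -> None:
        self._sem = asyncio.Semaphore(limit)
        self.in_flight = 0
        self.peak = 0

    async def run(self, request):
        """Await `request()` once a slot is free."""
        async with self._sem:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await request()
            finally:
                self.in_flight -= 1


async def fake_query() -> str:
    await asyncio.sleep(0.001)  # stand-in for a database call
    return "ok"


async def main() -> tuple[list[str], int]:
    limiter = ConcurrencyLimiter(limit=16)
    results = await asyncio.gather(*(limiter.run(fake_query) for _ in range(200)))
    return results, limiter.peak


results, peak = asyncio.run(main())
```

This caps only what reaches the database. A real system would also need to bound or shed the work waiting behind the semaphore, since that backlog still consumes client memory.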
25. Mechanisms of Concurrency
These are the boring code wonk answers.
● Threads
○ Cheap end: 8 KB per goroutine. Low thousands is reasonable
○ Expensive end: 1 MB per Java thread. Low hundreds is reasonable
● Reactors
○ Rust Tokio, Java futures, Seastar, Node.js callbacks: typically <1 KB / instance. Tens of thousands is reasonable
● Nodes
○ $$ limited
26. Mother of Concurrency
Data dependency is the mother of sound concurrency
● Easy: No dependency. Example: logging facts at independent keys. Write only
● Medium: Partitionable dependency. As long as we process each independent stream sequentially, everything will be fine. Example: maintaining the latest state at a single key
● Hard: Arbitrary “happens-before” relationships. Example: adding a relationship between two nodes in a graph
27. Definitions
● Command: Represents an atomic unit of work. Contains IOPs. Always concludes with a write, may contain reads.
● IOP: IO Operation. A unit of work for the database.
● Batch: A group of IOPs that may be executed concurrently. Write IOPs within a batch may literally be combined into a batch operation to ScyllaDB. Batches execute sequentially with other batches
● Slot: A cubby for data. Has a name. Can be read or written (partition + clustering key in ScyllaDB)
● Concurrency strategy: How we group IOPs into batches such that the final slot state is consistent with commands having all been executed sequentially
28. No Dependency
[Diagram: Commands 1 to 5 with Write A to Write E, grouped into Batch 1 and Batch 2]
When commands contain no reads and always write to different slots, there can be no data dependency. The decision about when to switch from batch 1 to 2 can be arbitrary, or driven by a desire to minimize latency, or to work within the ScyllaDB batch size constraint
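In the no-dependency case, batching reduces to chunking. The sketch below splits writes by a maximum batch size. The helper name, the (slot, value) representation, and the limit of 3 are illustrative assumptions, not ScyllaDB's actual batch constraint.

```python
def batch_independent_writes(
    writes: list[tuple[str, str]], max_batch: int
) -> list[list[tuple[str, str]]]:
    """Split (slot, value) writes to distinct slots into batches of at most max_batch."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    slots = [slot for slot, _ in writes]
    if len(set(slots)) != len(slots):
        raise ValueError("writes must target distinct slots to be dependency-free")
    return [writes[i:i + max_batch] for i in range(0, len(writes), max_batch)]


writes = [("A", "a1"), ("B", "b1"), ("C", "c1"), ("D", "d1"), ("E", "e1")]
batches = batch_independent_writes(writes, max_batch=3)
```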
29. Read/Modify/Write and “Happens Before”
For a read/modify/write (RMW) operation to yield correct results, reads must be able to observe the writes of prior RMWs.
Most streaming platforms (Storm, Flink, Kafka Streams, Spark) trivially solve this by partitioning commands into guaranteed independent streams. This means that:
● Every command has a happens-before relationship with every following command for the partition key
● Cross-command concurrency is impossible within a partition
● Concurrency is limited by the cardinality of the partition key
30. Partitioned Concurrency
[Diagram: Command 1 / Tenant B, Command 1 / Tenant A, and Command 2 / Tenant A, with IOPs Read A, Write B, Read C, Write B, Read D, and Write D spread across Batches 1 to 4 and linked by two happens-before arrows]
The work for each tenant is executed strictly in the order it was received. This guarantees that reads will always see prior writes but misses opportunities for greater concurrency by being ignorant of non-interacting slot usage
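Partitioned concurrency can be sketched as one sequential worker per partition key, with the workers running concurrently. The tenant names, the string commands, and the `asyncio.sleep(0)` stand-in for real I/O are all hypothetical.

```python
import asyncio
from collections import defaultdict


async def run_partition(
    key: str, commands: list[str], log: list[tuple[str, str]]
) -> None:
    """Execute one partition's commands strictly in arrival order."""
    for command in commands:
        await asyncio.sleep(0)  # stand-in for the command's reads and write
        log.append((key, command))


async def run_partitioned(stream: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Partitions run concurrently; commands within a partition do not."""
    partitions: dict[str, list[str]] = defaultdict(list)
    for key, command in stream:
        partitions[key].append(command)
    log: list[tuple[str, str]] = []
    await asyncio.gather(
        *(run_partition(key, cmds, log) for key, cmds in partitions.items())
    )
    return log


stream = [("tenant-a", "a1"), ("tenant-b", "b1"), ("tenant-a", "a2"), ("tenant-b", "b2")]
log = asyncio.run(run_partitioned(stream))
```

With only two tenants there are never more than two commands in flight, however many cores are available. That is the cardinality limit described above.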
31. Golden Rules of Data Dependency
● Reads of a slot must be able to observe any writes to that slot that came before them.
○ Writes create a happens-before relationship with any reads that follow them
● The final value in a slot must reflect the last write in the sequence
○ Writes create a happens-before relationship with any writes that follow them
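The two rules can be read as a recipe for deriving happens-before edges from an ordered list of IOPs. The sketch below encodes only the rules as stated. It does not model ordering inside a command (for example, that a command's write follows its own reads), and the (kind, slot) representation and the sample sequence are assumptions made for illustration.

```python
def happens_before_edges(iops: list[tuple[str, str]]) -> list[tuple[int, int]]:
    """Return (earlier, later) index pairs required by the two golden rules.

    iops is an ordered list of (kind, slot) with kind "read" or "write".
    Each read or write of a slot must follow the most recent prior write
    to that slot; ordering against older writes follows transitively.
    """
    last_write: dict[str, int] = {}
    edges: list[tuple[int, int]] = []
    for i, (kind, slot) in enumerate(iops):
        if kind not in ("read", "write"):
            raise ValueError(f"unknown IOP kind: {kind}")
        if slot in last_write:
            edges.append((last_write[slot], i))
        if kind == "write":
            last_write[slot] = i
    return edges


iops = [
    ("read", "A"), ("write", "B"), ("read", "C"), ("write", "B"),
    ("read", "D"), ("write", "D"), ("read", "B"),
]
edges = happens_before_edges(iops)
```

IOPs with no edge between them are candidates for the same batch, whichever command or partition they came from.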
32. Golden Rule Happens Before
[Diagram: Command 1 / Tenant B, Command 1 / Tenant A, and Command 2 / Tenant A, with IOPs Read A, Write B, Read C, Write B, Read D, and Write D in Batches 1 to 3 and one happens-before arrow]
By examining slot usage and applying the golden rules we can eliminate a batch and get concurrency within a partition
33. Simplification Rules of Data Dependency
Each of the golden rules implies a simplification rule that we can use to further compress command execution
● Reads of a slot must be able to observe any writes to that slot that came before them.
○ Reads do not have to observe prior writes by literally reading the database
● The final value in a slot must reflect the last write in the sequence
○ If a prior write is not read from the database, it can be omitted as long as the final write happens
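Both simplification rules can be shown with an in-memory overlay: reads are served from pending writes when possible, and only the final value of each slot is written back. This is a sketch under stated assumptions. The dict stands in for the database, `increment` is a hypothetical RMW command, and nothing else is assumed to write these slots while the commands run.

```python
from typing import Callable, Optional

Read = Callable[[str], Optional[int]]
Write = Callable[[str, int], None]
Command = Callable[[Read, Write], None]


def execute_simplified(commands: list[Command], db: dict[str, int]) -> dict[str, int]:
    """Run commands in order against an in-memory overlay, then flush once.

    Returns counts of database reads and writes actually performed.
    """
    overlay: dict[str, Optional[int]] = {}  # slots already read or written
    dirty: set[str] = set()                 # slots whose final value must be written
    stats = {"db_reads": 0, "db_writes": 0}

    def read(slot: str) -> Optional[int]:
        if slot not in overlay:             # only the first touch hits the database
            overlay[slot] = db.get(slot)
            stats["db_reads"] += 1
        return overlay[slot]

    def write(slot: str, value: int) -> None:
        overlay[slot] = value               # later reads see this without a db read
        dirty.add(slot)

    for command in commands:
        command(read, write)

    for slot in dirty:                      # overwritten intermediate writes are skipped
        db[slot] = overlay[slot]
        stats["db_writes"] += 1
    return stats


def increment(slot: str) -> Command:
    """Hypothetical RMW command: read a counter, add one, write it back."""
    def run(read: Read, write: Write) -> None:
        write(slot, (read(slot) or 0) + 1)
    return run


db = {"B": 10}
stats = execute_simplified([increment("B"), increment("B"), increment("D")], db)
```

Three commands that would naively cost three reads and three writes end up costing two of each, and the final slot state matches sequential execution.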
34. Data Dependency Simplification
[Diagram: Command 1 / Tenant B, Command 1 / Tenant A, and Command 2 / Tenant A, with IOPs Read A, Write B, Read C, Write B, Read D, and Write D in Batches 1 and 2 and one happens-before arrow]
Reads can directly read prior writes. Overwritten writes can be skipped.
35. What’s the Point?
Data dependency is the mother of sound concurrency. If you can find enough sound concurrency in your problem then you can exploit the full linear region of ScyllaDB’s awesome concurrency.
● Case 1: 64 cores, thread and partition concurrency. Maximum throughput about 8 kcommands/s. Very sensitive to “data-shape” aka partition cardinality.
● Case 2: 15 cores, reactor and happens-before concurrency. Maximum throughput about 30 kcommands/s. Insensitive to most practical “data-shape” issues.