Should I use more, smaller instances, or fewer, bigger instances? Is 1Gbps enough for my network cards? Should I use batches? Can I have a collection with 3GB in size? Those are just some of the many questions we see users asking themselves on a daily basis over our mailing list, slack, and corporate ticket requests. In this talk, I will explore the answers to these common questions and help you make sure that your deployment is up to the highest standards.
How to Troubleshoot Apps for the Modern Connected Worker
How to be Successful with Scylla
1. How to be successful
with Scylla
Glauber Costa, VP Field Engineering
2. Presenter
Glauber Costa, VP Field Engineering
Glauber Costa is VP of Field Engineering at ScyllaDB. He shares
his time between the engineering department working on
upcoming Scylla features and helping customers succeed.
Before ScyllaDB, Glauber worked with Virtualization in the Linux
Kernel for 10 years, with contributions ranging from the Xen
Hypervisor to all sorts of guest functionality and containers.
4. Welcome Cassandra users
What to remember, what to forget?
■ Remember: Data model and consistency issues.
■ Forget: Operational best practices
● Users insist on tuning the system in the exact way as before
● Many times a change was done to work around an issue.
● Issue may not exist in Scylla
■ Example: compactions being too slow, compactions being too fast, etc.
5. Corollary: comparing databases
Wrong way to compare databases:
■ I will now run Scylla the same way I ran Cassandra for the past 5 years
● It will work.
● It will be suboptimal.
Right way to compare databases:
■ This is the work that I need to do, and this is how much it costs me
● Run each offering in their operational sweet spot.
8. Use an updated version
■ Policy is that only two versions receive updates.
■ For enterprise:
● 2019.1 and 2018.1 supported
■ For Open Source: 3.1 is released:
● 3.0 and 3.1 are supported
● 2.3 is EOLd
■ It’s fine to be conservative, but:
● Running 3.0 is conservative
● Running 2.3 is dangerous.
■ Patchlevel updates are very safe, do them.
● but don’t stream between minor versions.
10. Hardware Selection
What is your bottleneck?
■ CPU
● Understand the per-core capacity of your workload
■ Storage
● Latency: NVMe
● Throughput: SSD
● Forget HDDs.
■ Network
● Forget anything below 1Gbps.
11. Storage layout
How to best organize many disks
■ RAID0
● Database is replicated, why mirror disks?
■ LVM (in striping mode)
Split commitlog and data?
■ Generally not worth it
● Can maybe help with overwrite heavy workloads
■ If you have super fast disks lying around that’s fine
12. Hardware Sizing
If you knew you’d need 1,000 USD in a trip, would you take 1,000 USD?
■ See Eyal’s presentation on how to size
● But then remember to add spare!
● Test your performance under bootstrap and decommission
● Make sure you know how long does it take to bootstrap and decommission
■ Asking us and estimates are not good enough: very data-model dependent
14. Packing the iron
In which situations should I run more, smaller nodes?
■ Never
Time to ingest. Dataset grows 2x as machine grows 2x.
15. Packing the iron
In which situations should I run more, smaller nodes?
■ Never - if choice is the same amount of resources
■ In practice, ok to smooth out expansion
32 cores
32 cores
32 cores
16 cores
16 cores
16 cores
16 cores
total 64
cores.
total 48 cores.
expansion
expansion
17. Rack awareness
Run as many racks as you have replicas.
■ RF=3 and 3 Racks:
● Scylla will place a full copy in each rack. Perfect balancing, perfect resiliency
Rack 1
Rack 2
Rack 3
Replica 1
Replica2
Replica 3
Data
18. Rack awareness
Run as many racks as you have replicas.
■ RF=3 and 3 Racks:
● Scylla will place a full copy in each rack. Perfect balancing, perfect resiliency
rack failure:
1 copy gone.
QUORUM maintained.
Rack 1
Rack 2
Rack 3
Replica 1
Replica2
Replica 3
Data
19. Rack awareness
Run as many racks as you have replicas.
■ RF=3 and 3 Racks:
● Scylla will place a full copy in each rack. Perfect balancing, perfect resiliency
■ RF=3 and 2 Racks:
● Scylla will place a full copy in each rack and a third copy spread in both racks
■ Rack failure can lead to decreased HA: needed quorum is down
■ Rack failure can lead to data loss: two copies are lost.
Rack 1
Rack 2
Replica 1
Replica 2
Replica 3
Data
20. Rack awareness
Run as many racks as you have replicas.
■ RF=3 and 3 Racks:
● Scylla will place a full copy in each rack. Perfect balancing, perfect resiliency
■ RF=3 and 2 Racks:
● Scylla will place a full copy in each rack and a third copy spread in both racks
■ Rack failure can lead to decreased HA: needed quorum is down
■ Rack failure can lead to data loss: two copies are lost.
Rack failure, Scenario 1:
QUORUM maintained
Rack 1
Rack 2
Replica 1
Replica 2
Replica 3
Data
21. Rack awareness
Run as many racks as you have replicas.
■ RF=3 and 3 Racks:
● Scylla will place a full copy in each rack. Perfect balancing, perfect resiliency
■ RF=3 and 2 Racks:
● Scylla will place a full copy in each rack and a third copy spread in both racks
■ Rack failure can lead to decreased HA: needed quorum is down
■ Rack failure can lead to data loss: two copies are lost.
Rack 1
Rack 2
Replica 1
Replica 2
Replica 3
Data
Rack failure, Scenario 2:
You may have lost your
job.
22. Rack awareness
Run as many racks as you have replicas.
■ RF=3 and 3 Racks:
● Scylla will place a full copy in each rack. Perfect balancing, perfect resiliency
■ RF=3 and 2 Racks:
● Scylla will place a full copy in each rack and a third copy spread in both racks
■ Rack failure can lead to decreased HA: needed quorum is down
■ Rack failure can lead to data loss: two copies are lost.
■ All conditioned to NetworkTopologyStrategy
■ When expanding the cluster, add 3 nodes
23. Run the setup tool
scylla_setup is constantly updated with knowledge of what’s important
■ If I had to choose one configuration to always enforce:
● SET_NIC_AND_DISKS=yes
● SET_NIC in older versions
■ What does that do:
Scylla
CPU time
Linux
SoftIRQ
time
Scylla
CPU time
Scylla
CPU time OR
24. Run the setup tool
scylla_setup is constantly updated with knowledge of what’s important
■ If I had to choose one configuration to always enforce:
● SET_NIC_AND_DISKS=yes
● SET_NIC in older versions
■ Not yet in setup:
● Taking timestamps
■ TSC clocksource: 26ns
■ Xen clocksource: 100ns
$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
xen
27. Partition sizing
Large partitions per se are (almost) not a problem anymore.
■ SELECT * from table where pk = ? and ck = ? OK
■ SELECT * from table where pk = ? That’s the issue
28. Partition distribution
Bad partition distribution are not a unique Scylla issue
■ But they can show up sooner in Scylla due to shards.
● That’s a good thing
29. Understand the caching basics
■ Cache is LRU on rows
● Use BYPASS CACHE for analytical workloads
■ Careful with moving range queries
● SELECT * from time_series where pk = ? and time >= past and time >= now; BAD
● SELECT * from time_series where pk = ? and time >= past;
GOOD
30. Control parallelism
■ Low parallelism hurts Scylla
● Fewer units will be working, database will not be efficient
■ Is there such a thing as too high?
31. Control parallelism
■ Infinite parallelism is asking for trouble
● I don’t mean ∞
● 4 trillion concurrent requests is infinite in the real world
■ No need to guess:
● C = T x L
● Example: 200,000 requests/s at 10ms average latency:
■ C = 200,000 * 0.001
■ C = 2,000.
■ Driver settings:
● Number of connections x maximum requests per connection
● Remaining requests will be queued in the application side.
32. Be very careful with retries
■ If client timeout < server timeout:
● effectively increase parallelism
● Know the difference between reads, range reads, writes, etc.
■ Be very careful with speculative retry
● Remember it will be retried in the same replica set that just took long to respond
● Now it will take even longer because of the extra request
■ And more speculative retries.
33. Batching writes
Should writes be batched?
■ Latency of the operation is the latency of the batch
● Therefore it may fail.
■ Scylla is token aware, shard aware
● Batches may break that
● Summary: only batch updates to the same partition
34. Thank you Stay in touch
Any questions?
Glauber Costa
glauber@scylladb.com
@glcst