Distributed systems are usually optimized with particular workloads in mind. At the same time, a system should still behave sanely when those workload assumptions do not hold; notably, a single user should not be able to ruin performance for the whole system. Buggy components can be a source of overload as well, so it is worth considering overload protection on a per-component basis. For example, ScyllaDB's shared-nothing architecture gives it great scalability, but also makes it prone to the "hot partition" problem: a single partition accessed with disproportionate frequency can ruin performance for other requests handled by the same shard. This talk will describe how we implemented per-partition rate limiting, which reduces the performance impact in such cases, and how we reduced the CPU cost of handling failed requests such as timeouts (spoiler: it's about C++ exceptions).
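To make the idea concrete, here is a minimal sketch of per-partition rate limiting using a token bucket per partition key. This is an illustrative assumption, not ScyllaDB's actual implementation; the class and method names (`PartitionRateLimiter`, `try_admit`) are hypothetical. The point is that an overloaded partition is shed early, while requests for other partitions on the same shard are still admitted.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>

// Hypothetical sketch (not ScyllaDB's real code): one token bucket per
// partition key. A partition receiving disproportionate traffic drains
// its own bucket and gets rejected, without affecting other partitions.
class PartitionRateLimiter {
public:
    PartitionRateLimiter(double tokens_per_sec, double burst)
        : rate_(tokens_per_sec), burst_(burst) {}

    // Returns true if a request for `partition` is admitted at time `now_sec`.
    bool try_admit(const std::string& partition, double now_sec) {
        auto& b = buckets_[partition];
        if (b.last < 0) {            // first request for this partition
            b.tokens = burst_;
            b.last = now_sec;
        }
        // Refill proportionally to elapsed time, capped at the burst size.
        b.tokens = std::min(burst_, b.tokens + (now_sec - b.last) * rate_);
        b.last = now_sec;
        if (b.tokens >= 1.0) {
            b.tokens -= 1.0;
            return true;
        }
        return false;                // hot partition: shed this request
    }

private:
    struct Bucket {
        double tokens = 0.0;
        double last = -1.0;          // last refill time; -1 means "unseen"
    };
    double rate_;
    double burst_;
    std::unordered_map<std::string, Bucket> buckets_;
};
```

A real implementation in a sharded database would live on the replica side and use cheaper per-shard data structures, but the admission logic follows the same shape: refill, check, reject.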