Understanding Microservice Performance

Understanding Microservice Performance
Rob Harrop

The performance of a distributed
system is the combined performance of
its collaborating services and their
communication links

Services
and
aggregations of services

Who am I?
▸ CTO @ Skipjaq
▸ ML-driven performance optimisation
▸ Co-founder of SpringSource
▸ Once upon a time I…
▸ Contributed to Spring Framework
▸ Wrote a book about Spring
▸ Talked a lot about Spring

Who am I?
▸ I’m on Twitter:
▸ @robertharrop
▸ I’m on Github:
▸ github.com/robharrop
▸ I write about maths and performance
▸ https://robharrop.github.io
If you have questions after the session, {grab, tweet} me.

After this talk you will know how to:
▸ Measure performance correctly
▸ Find potential performance disasters
▸ Identify the best candidates for optimisation
▸ Model complex micro services systems
▸ Forecast system scalability

How fast can I do a thing,
and
how many things can I do every period?
What do we mean by performance?

What measures performance?
Latency (how fast)
and
Throughput (how many)

Throughput
▸ The rate of processing: x per y
▸ Requests per second
▸ Records per minute
▸ Messages per second
▸ Tasks per day

Latency
▸ Time taken… for something
▸ Service time?
▸ First byte?
▸ First response complete?
▸ Last byte?
▸ Render?
▸ Moral of the story: define what you mean by latency

This is where everything goes wrong

Crib Sheet
▸ Record timestamped requests with observed latency and success/error
▸ Throughput
▸ Min, max, mean
▸ Varying time windows (10s, 30s, 1m, 5m, …)
▸ Latency
▸ Min, max, 95th, 99th, 99.9th and other tail percentiles
▸ Mean just means meaningless

We need to talk about latency
▸ Latency isn’t exponentially-distributed
▸ And it certainly isn’t normally-distributed
▸ Latency distributions have heavy tails
▸ Latency distributions are multi-modal
▸ Customers see tail latencies way more than you think
▸ Don’t let percentiles trick you
▸ Understand what latency means to your business

Is my customer getting good service?

What does this mean in reality?

Is my customer getting good service?
95th percentile
42 requests

Most customers will see tail latencies

Are latency and throughput useful when
considered in isolation?

Attribution: http://www.cowboyjedi.com/comics/2010-03-10-i-made-the-kessel.gif

Queueing Theory - Little’s Law

What can we conclude?
▸ Latency and throughput work in tandem
▸ At high throughput, latency degrades considerably

High utilisation
is an
early-warning sign

Service latencies stack
▸ For simple cases (feed-forward networks), latencies are additive
▸ Analytical models are available
▸ http://robharrop.github.io/maths/performance/2016/03/15/queue-networks.html
▸ For most interesting cases this cannot be assumed
▸ Simulation is the best option
▸ Pretty Damn Quick (PDQ) is a great tool, but requires a chunk of effort
▸ Guesstimate is great for quick and dirty models

How much improvement can we get from
an optimisation?

Amdahl’s Law
▸ Theoretical improvement in latency given a fixed workload
Theoretical max
system speedup
Speedup of part under optimisation
Percentage of execution time in part
under optimisation

Amdahl’s Law in the Limit
▸ Theoretical max is limited by parts of the system not under improvement
Theoretical max
system speedup Percentage of execution time in part
under optimisation
Speedup of part under optimisation

Maximum theoretical system speedup
[1.11]
[1.67]
[1.25]
[1.43]

Drive optimisation choices by
service utilisation

Can increasing capacity reduce
performance?

Crosstalk
overhead
Universal Scalability Law
Contention
overhead
Relative capacity
Number of users

Measure crosstalk and target it for
optimisation

Summary
▸ Measurements are critical
▸ Garbage in, garbage out
▸ Monitor utilisation for early-warning of disaster
▸ Little’s Law
▸ Monitor latency per-user, not just per-request
▸ Select optimisation targets carefully
▸ Amdahl’s Law
▸ Monitor crosstalk to forecast scalability
▸ Universal Scalability Law

Reading List and Q&A
▸ Release It! - Michael Nygard
▸ Systems Performance - Brendan Gregg
▸ Guerrilla Capacity Planning - Dr. Neil Gunther
▸ Practical Scalability Analysis - Baron Schwartz

Understanding Microservice Performance

Recomendados

Recomendados

Más contenido relacionado

Destacado

Destacado (18)

Similar a Understanding Microservice Performance

Similar a Understanding Microservice Performance (20)

Último

Último (20)

Understanding Microservice Performance