The Prometheus monitoring system collects and stores time series data to provide valuable insight into hosts, containers, and applications. Its storage engine was designed to be multiple orders of magnitude faster and more space-efficient than, say, RRD or SQL storage. However, with the rise of orchestration systems such as Docker Swarm and Kubernetes, and their extensive use of techniques like rolling updates and auto-scaling, environments are becoming increasingly dynamic, which increases the strain on metrics collection systems. To address these challenges, a new storage engine has been developed from scratch, bringing a sharp increase in performance and enabling new features.
This talk describes the new storage engine, its architecture, and its data structures, and explains why and how it is well suited to gracefully handle high turnover rates of monitoring targets while providing consistent query performance.
12. What you expose:
requests_total{path="/status", method="GET"}
requests_total{path="/status", method="POST"}
requests_total{path="/", method="GET"}
What Prometheus scrapes:
requests_total{path="/status", method="GET", instance="10.0.0.1:80"}
requests_total{path="/status", method="POST", instance="10.0.0.1:80"}
requests_total{path="/", method="GET", instance="10.0.0.1:80"}
13. Scale
5 million active time series
30 second scrape interval
1 month of retention
166,000 samples/second
432 billion samples
8 byte timestamp + 8 byte value ⇒ 7 TB on disk
3,000 - 15,000 microservice instances
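These figures follow directly from the inputs above; a quick sanity check of the arithmetic, sketched in Go (constants taken from the slide, a month approximated as 30 days):

package main

import "fmt"

func main() {
    const (
        activeSeries   = 5_000_000      // from the slide
        scrapeInterval = 30             // seconds
        retention      = 30 * 24 * 3600 // ~1 month, in seconds
        bytesPerSample = 16             // 8 byte timestamp + 8 byte value
    )

    samplesPerSecond := int64(activeSeries / scrapeInterval) // ≈ 166,000
    totalSamples := samplesPerSecond * retention             // ≈ 432 billion
    diskTB := float64(totalSamples*bytesPerSample) / 1e12    // ≈ 7 TB

    fmt.Printf("%d samples/s, %d samples, %.1f TB\n", samplesPerSecond, totalSamples, diskTB)
}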
14. Scale
5 million active time series
30 second scrape interval
6 months of retention
166,000 samples/second
2.6 trillion samples
8 byte timestamp + 8 byte value ⇒ 42 TB on disk
3,000 - 15,000 microservice instances
20. Scale
5 million active time series
30 second scrape interval
1 month of retention
166,000 samples/second
432 billion samples
8 byte timestamp + 8 byte value, compressed to ~1.4 bytes/sample ⇒ 600 GB on disk
3,000 - 15,000 microservice instances
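The drop from ~7 TB to ~600 GB (roughly 1.4 bytes per sample) comes from Gorilla-style chunk compression: timestamps are stored as delta-of-deltas and values as XORs against the previous value. A minimal sketch of the timestamp side of that idea, not the actual Prometheus chunk encoding:

package main

import "fmt"

// Delta-of-delta encoding: with a fixed scrape interval the gap between
// consecutive timestamps is nearly constant, so the second-order delta is
// almost always 0 and can be stored in very few bits.
func deltaOfDeltas(timestamps []int64) []int64 {
    out := make([]int64, 0, len(timestamps))
    var prev, prevDelta int64
    for i, t := range timestamps {
        switch i {
        case 0:
            out = append(out, t) // first timestamp stored verbatim
        case 1:
            prevDelta = t - prev
            out = append(out, prevDelta)
        default:
            delta := t - prev
            out = append(out, delta-prevDelta) // usually 0
            prevDelta = delta
        }
        prev = t
    }
    return out
}

func main() {
    // 30 second scrape interval with a little jitter.
    ts := []int64{1000, 1030, 1060, 1091, 1121}
    fmt.Println(deltaOfDeltas(ts)) // [1000 30 0 1 -1]
}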
22. What you expose:
requests_total{path="/status", method="GET"}
requests_total{path="/status", method="POST"}
requests_total{path="/", method="GET"}
What Prometheus scrapes:
requests_total{path="/status", method="GET", instance="10.0.0.1:80"}
requests_total{path="/status", method="POST", instance="10.0.0.1:80"}
requests_total{path="/", method="GET", instance="10.0.0.1:80"}
23. The new era
● Docker Swarm and Kubernetes
● New IP every time a service is updated
● Dynamic scaling!
● Which means old series break off and new series appear (churn)
24. Scale
5 million active time series
150 million total time series
30 second scrape interval
1 month of retention
166,000 samples/second
432 billion samples
8 byte timestamp + 8 byte value, compressed to ~1.4 bytes/sample ⇒ 600 GB on disk
3,000 - 15,000 microservice instances
29. Querying
1. Get series labels
2. Calculate Series ID
3. Record the ID against each label
{
  __name__="requests_total",
  pod="nginx-34534242-abc723",
  job="nginx",
  path="/api/v1/status",
  status="200",
  method="GET",
}
Series ID: 3300
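Step 3 builds an inverted index: each label pair maps to a sorted postings list of series IDs, and a query intersects the lists of its matchers. A minimal sketch of that structure (names and types are illustrative, not the actual tsdb API):

package main

import "fmt"

type SeriesID uint64

// Postings maps a label pair ("name=value") to the sorted IDs of all
// series that carry it.
type Postings map[string][]SeriesID

func (p Postings) Add(id SeriesID, labels map[string]string) {
    for name, value := range labels {
        key := name + "=" + value
        p[key] = append(p[key], id) // stays sorted if IDs are assigned in order
    }
}

// Intersect returns the IDs present in both sorted lists.
func Intersect(a, b []SeriesID) []SeriesID {
    var out []SeriesID
    for i, j := 0, 0; i < len(a) && j < len(b); {
        switch {
        case a[i] < b[j]:
            i++
        case a[i] > b[j]:
            j++
        default:
            out = append(out, a[i])
            i++
            j++
        }
    }
    return out
}

func main() {
    idx := Postings{}
    idx.Add(3300, map[string]string{"__name__": "requests_total", "job": "nginx", "method": "GET"})
    idx.Add(3301, map[string]string{"__name__": "requests_total", "job": "nginx", "method": "POST"})

    // requests_total{method="GET"} -> [3300]
    fmt.Println(Intersect(idx["__name__=requests_total"], idx["method=GET"]))
}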
58. Benchmarks
Kubernetes cluster + dedicated Prometheus nodes
800 microservice instances + Kubernetes components
120,000 samples/second
300,000 active time series
Swap out 50% of pods every 10 minutes
63. CPU: Cores Used
● Assembly-accelerated compression (@beorn7, @dgryski)
64. CPU: Cores Used
● Series ---> Series ID cache
Prometheus 1.x:
1. id = hash(series)
2. ingest(id, t, v)
● Hashing millions of times a minute!
65. CPU: Cores Used
● Series ---> Series ID cache
Prometheus 2.0:
C := map[string]seriesID{}
id, ok := C[`series{labels}`]
if ok {
    ingest(id, t, v) // cache hit: no hashing
}
// on a miss: hash once, store the ID in C, then ingest
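A self-contained version of the same idea, hashing only on a cache miss (FNV here is an assumption for illustration; the real series-ID scheme differs):

package main

import (
    "fmt"
    "hash/fnv"
)

type seriesID uint64

var cache = map[string]seriesID{}

// getID returns the cached ID for a series, hashing only on a cache miss.
func getID(series string) seriesID {
    if id, ok := cache[series]; ok {
        return id // hot path: no hashing
    }
    h := fnv.New64a()
    h.Write([]byte(series))
    id := seriesID(h.Sum64())
    cache[series] = id
    return id
}

func main() {
    id := getID(`requests_total{method="GET"}`)
    fmt.Println(id)
    fmt.Println(getID(`requests_total{method="GET"}`) == id) // cache hit: true
}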
69. On Disk Size: GB
● An empty file still takes 4 KB (one filesystem block)!
● We have many small files, which adds per-file storage overhead.
● Not a concern for real environments.
82. Granular deletes
Series-Ref ---> Deleted Ranges
{
190: [{100, 200}, {300, 600}],
250: [{100, 5000}],
}
When the querier runs, it picks up these ranges. If data has been deleted, we skip that range during the query.
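A minimal sketch of that skip logic (the range representation and names are illustrative, not the actual tombstone format):

package main

import "fmt"

type trange struct{ mint, maxt int64 }

// tombstones maps a series reference to the time ranges deleted from it.
var tombstones = map[uint64][]trange{
    190: {{100, 200}, {300, 600}},
    250: {{100, 5000}},
}

// deleted reports whether timestamp t of series ref falls in a deleted
// range, in which case the querier skips the sample.
func deleted(ref uint64, t int64) bool {
    for _, r := range tombstones[ref] {
        if r.mint <= t && t <= r.maxt {
            return true
        }
    }
    return false
}

func main() {
    fmt.Println(deleted(190, 150)) // true: inside {100, 200}
    fmt.Println(deleted(190, 250)) // false: between the deleted ranges
}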