The Prometheus monitoring system collects and stores time series data to provide valuable insight into hosts, containers, and applications. Its storage engine was designed to be multiple orders of magnitude faster and more space-efficient than, say, RRD or SQL storage. However, with the rise of orchestration systems such as Docker Swarm and Kubernetes, and their extensive use of techniques like rolling updates and auto-scaling, environments are becoming increasingly dynamic, which increases the strain on metrics collection systems. To address these challenges, a new storage engine has been developed from scratch, bringing a sharp increase in performance and enabling new features.
This talk describes the new storage engine, its architecture, and its data structures, and explains why and how it is well suited to gracefully handle high turnover rates of monitoring targets while providing consistent query performance.
12. What you expose:
requests_total{path="/status", method="GET"}
requests_total{path="/status", method="POST"}
requests_total{path="/", method="GET"}
What Prometheus scrapes:
requests_total{path="/status", method="GET", instance="10.0.0.1:80"}
requests_total{path="/status", method="POST", instance="10.0.0.1:80"}
requests_total{path="/", method="GET", instance="10.0.0.1:80"}
13. Scale
5 million active time series
30 second scrape interval
1 month of retention
166,000 samples/second
432 billion samples
8 byte timestamp + 8 byte value ⇒ 7 TB on disk
3,000 - 15,000 microservice instances
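These figures follow directly from the inputs above; a quick sanity check of the arithmetic, sketched in Go (constants taken from the slide, a month approximated as 30 days):

package main

import "fmt"

func main() {
    const (
        activeSeries   = 5_000_000      // from the slide
        scrapeInterval = 30             // seconds
        retention      = 30 * 24 * 3600 // ~1 month, in seconds
        bytesPerSample = 16             // 8 byte timestamp + 8 byte value
    )

    samplesPerSecond := int64(activeSeries / scrapeInterval) // ≈ 166,000
    totalSamples := samplesPerSecond * retention             // ≈ 432 billion
    diskTB := float64(totalSamples*bytesPerSample) / 1e12    // ≈ 7 TB

    fmt.Printf("%d samples/s, %d samples, %.1f TB\n", samplesPerSecond, totalSamples, diskTB)
}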
14. Scale
5 million active time series
30 second scrape interval
6 months of retention
166,000 samples/second
2.6 trillion samples
8 byte timestamp + 8 byte value ⇒ 42 TB on disk
3,000 - 15,000 microservice instances
20. Scale
5 million active time series
30 second scrape interval
1 month of retention
166,000 samples/second
432 billion samples
8 byte timestamp + 8 byte value, compressed to ~1.4 bytes/sample ⇒ 600 GB on disk
3,000 - 15,000 microservice instances
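The drop from ~7 TB to ~600 GB (roughly 1.4 bytes per sample) comes from Gorilla-style chunk compression: timestamps are stored as delta-of-deltas and values as XORs against the previous value. A minimal sketch of the timestamp side of that idea, not the actual Prometheus chunk encoding:

package main

import "fmt"

// Delta-of-delta encoding: with a fixed scrape interval the gap between
// consecutive timestamps is nearly constant, so the second-order delta is
// almost always 0 and can be stored in very few bits.
func deltaOfDeltas(timestamps []int64) []int64 {
    out := make([]int64, 0, len(timestamps))
    var prev, prevDelta int64
    for i, t := range timestamps {
        switch i {
        case 0:
            out = append(out, t) // first timestamp stored verbatim
        case 1:
            prevDelta = t - prev
            out = append(out, prevDelta)
        default:
            delta := t - prev
            out = append(out, delta-prevDelta) // usually 0
            prevDelta = delta
        }
        prev = t
    }
    return out
}

func main() {
    // 30 second scrape interval with a little jitter.
    ts := []int64{1000, 1030, 1060, 1091, 1121}
    fmt.Println(deltaOfDeltas(ts)) // [1000 30 0 1 -1]
}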
22. What you expose:
requests_total{path="/status", method="GET"}
requests_total{path="/status", method="POST"}
requests_total{path="/", method="GET"}
What Prometheus scrapes:
requests_total{path="/status", method="GET", instance="10.0.0.1:80"}
requests_total{path="/status", method="POST", instance="10.0.0.1:80"}
requests_total{path="/", method="GET", instance="10.0.0.1:80"}
23. The new era
● Docker Swarm and Kubernetes
● New IP every time a service is updated
● Dynamic scaling!
● Which means old series break off and new series appear (churn)
24. Scale
5 million active time series
150 million total time series
30 second scrape interval
1 month of retention
166,000 samples/second
432 billion samples
8 byte timestamp + 8 byte value, compressed to ~1.4 bytes/sample ⇒ 600 GB on disk
3,000 - 15,000 microservice instances
29. Querying
1. Get series labels
2. Calculate Series ID
3. Record the ID against each label
{
  __name__="requests_total",
  pod="nginx-34534242-abc723",
  job="nginx",
  path="/api/v1/status",
  status="200",
  method="GET",
}
Series ID: 3300
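Step 3 builds an inverted index: each label pair maps to a sorted postings list of series IDs, and a query intersects the lists of its matchers. A minimal sketch of that structure (names and types are illustrative, not the actual tsdb API):

package main

import "fmt"

type SeriesID uint64

// Postings maps a label pair ("name=value") to the sorted IDs of all
// series that carry it.
type Postings map[string][]SeriesID

func (p Postings) Add(id SeriesID, labels map[string]string) {
    for name, value := range labels {
        key := name + "=" + value
        p[key] = append(p[key], id) // stays sorted if IDs are assigned in order
    }
}

// Intersect returns the IDs present in both sorted lists.
func Intersect(a, b []SeriesID) []SeriesID {
    var out []SeriesID
    for i, j := 0, 0; i < len(a) && j < len(b); {
        switch {
        case a[i] < b[j]:
            i++
        case a[i] > b[j]:
            j++
        default:
            out = append(out, a[i])
            i++
            j++
        }
    }
    return out
}

func main() {
    idx := Postings{}
    idx.Add(3300, map[string]string{"__name__": "requests_total", "job": "nginx", "method": "GET"})
    idx.Add(3301, map[string]string{"__name__": "requests_total", "job": "nginx", "method": "POST"})

    // requests_total{method="GET"} -> [3300]
    fmt.Println(Intersect(idx["__name__=requests_total"], idx["method=GET"]))
}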
58. Benchmarks
Kubernetes cluster + dedicated Prometheus nodes
800 microservice instances + Kubernetes components
120,000 samples/second
300,000 active time series
Swap out 50% of pods every 10 minutes
63. CPU: Cores Used
● Assembly-accelerated compression (@beorn7, @dgryski)
64. CPU: Cores Used
● Series ---> Series ID cache
Prometheus 1.x:
1. id = hash(series)
2. ingest(id, t, v)
● Hashing millions of times a minute!
65. CPU: Cores Used
● Series ---> Series ID cache
Prometheus 2.0:
C := map[string]seriesID{}
id, ok := C[`series{labels}`]
if ok {
    ingest(id, t, v) // cache hit: no hashing
}
// on a miss: hash once, store the ID in C, then ingest
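A self-contained version of the same idea, hashing only on a cache miss (FNV here is an assumption for illustration; the real series-ID scheme differs):

package main

import (
    "fmt"
    "hash/fnv"
)

type seriesID uint64

var cache = map[string]seriesID{}

// getID returns the cached ID for a series, hashing only on a cache miss.
func getID(series string) seriesID {
    if id, ok := cache[series]; ok {
        return id // hot path: no hashing
    }
    h := fnv.New64a()
    h.Write([]byte(series))
    id := seriesID(h.Sum64())
    cache[series] = id
    return id
}

func main() {
    id := getID(`requests_total{method="GET"}`)
    fmt.Println(id)
    fmt.Println(getID(`requests_total{method="GET"}`) == id) // cache hit: true
}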
69. On Disk Size: GB
● An empty file still takes 4 KB (one filesystem block)!
● We have many small files, which adds per-file storage overhead.
● Not a concern for real environments.
82. Granular deletes
Series-Ref ---> Deleted Ranges
{
190: [{100, 200}, {300, 600}],
250: [{100, 5000}],
}
When the querier runs, it picks up these ranges. If data has been deleted, we skip that range during the query.
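A minimal sketch of that skip logic (the range representation and names are illustrative, not the actual tombstone format):

package main

import "fmt"

type trange struct{ mint, maxt int64 }

// tombstones maps a series reference to the time ranges deleted from it.
var tombstones = map[uint64][]trange{
    190: {{100, 200}, {300, 600}},
    250: {{100, 5000}},
}

// deleted reports whether timestamp t of series ref falls in a deleted
// range, in which case the querier skips the sample.
func deleted(ref uint64, t int64) bool {
    for _, r := range tombstones[ref] {
        if r.mint <= t && t <= r.maxt {
            return true
        }
    }
    return false
}

func main() {
    fmt.Println(deleted(190, 150)) // true: inside {100, 200}
    fmt.Println(deleted(190, 250)) // false: between the deleted ranges
}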