Long-Term Storage for Prometheus
A long time ago… in every datacentre
Larger averaging interval = less detail
CONTENTS
- Introduction
- Long-Term Storage Overview
- Thanos Architecture and Resources Usage
- VictoriaMetrics Architecture and Resources Usage
- Price comparison
INTRODUCTION
Thanos is a set of components that can be composed into a highly available metric system with
unlimited storage capacity, which can be added seamlessly on top of existing Prometheus
deployments.
The current release, 0.5.0, is designed to store old metrics (those that have reached the
retention period on the Prometheus nodes) in S3-like object storage for the long term.
Collected metrics can be reviewed via Grafana. The Prometheus query dashboard shows only
the data stored on the Prometheus instances.
VictoriaMetrics is a fast, cost-effective, and scalable time-series database. It can be used as
long-term remote storage for Prometheus. It uses its own data compression, which lets it store
more data in the same amount of disk space.
Cortex provides horizontally scalable, highly available, multi-tenant, long term storage for
Prometheus.
[Diagram: monitoring setup. Labels: Monitored service 1 ... N; Prometheus node with Storage; Alerts sent to Alertmanager; Long-Term Storage ("Store data after retention is reached"); Grafana with DataSource 1 and DataSource 2]
LONG-TERM STORAGE OVERVIEW
- Why do we need long-term storage?
  - To store historical data about your workloads
  - To review incidents
  - To plan scaling based on seasonal load
  - To find infrastructure bottlenecks during continuous run/load
- What solutions can be used to store long-term historical time series?
  - Cortex, InfluxDB, Kafka, Graphite, …, Thanos, VictoriaMetrics *
* https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage
THANOS ARCHITECTURE
[Diagram: pods and their containers]
- Prometheus POD: Prometheus, Prometheus config reloader, Configmap reloader, Thanos sidecar
- Thanos query POD: Thanos query
- Thanos compact: Thanos compact
- Thanos Store Gateway POD: Thanos store gateway
* https://thanos.io/getting-started.md/
Object storage backends for Thanos:
(stable)
- Google Cloud Storage
- AWS S3
- Azure Storage Account
(beta)
- OpenStack Swift
- Tencent COS
[Diagram: query path. Labels: Grafana or Thanos UI; Thanos query POD (Thanos query); Prometheus 1; Prometheus 2; Thanos Store Gateway POD (Thanos store gateway); Bucket]
ADVANTAGES AND DISADVANTAGES
- Unlimited retention without reconfiguring storage
- Collected data remains available even if the infrastructure is recreated (the data is in the bucket)
- Global query view over data collected from multiple Prometheus instances and the bucket
- Horizontal scalability
- Metrics compaction
- Full monitoring stack
- Complicated infrastructure
HOW IT WAS TESTED
[Diagram: 500 nodes (NODE_0 ... NODE_499), each exposing 1000 metrics (METRIC_0 ... METRIC_999), scraped every 15 seconds for 24 hours]
Scroll Bar (500 reporters)
500 nodes × 1000 metrics × 4 scrapes per minute × 24 hours = 2 880 000 000 points
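The point count for the test load can be checked with a few lines of Python. This is only a sketch of the arithmetic from the slide; the constant and function names are illustrative and do not come from the test setup itself.

```python
# Test-load arithmetic from the slide; all names here are illustrative.
NODES = 500               # NODE_0 ... NODE_499
METRICS_PER_NODE = 1000   # METRIC_0 ... METRIC_999
SCRAPE_INTERVAL_S = 15    # one scrape every 15 seconds (4 per minute)
DURATION_H = 24


def total_points(nodes, metrics_per_node, interval_s, duration_h):
    """Number of samples collected over the whole test run."""
    scrapes = duration_h * 3600 // interval_s
    return nodes * metrics_per_node * scrapes


samples_per_scrape = NODES * METRICS_PER_NODE
points = total_points(NODES, METRICS_PER_NODE, SCRAPE_INTERVAL_S, DURATION_H)

print(f"{samples_per_scrape:,} samples per scrape")
print(f"{points:,} points in {DURATION_H} hours")
```

The 1000 metrics per node are part of the product; without them the figure of 2 880 000 000 points does not follow from the node count and scrape rate alone.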
LOAD ON CLUSTER NODES
QUERIES VIA THANOS FROM BUCKET
MEMORY USAGE STABILIZATION ON CLUSTER NODES
SCRAPE DURATION
GKE CLUSTER DETAILS
(pay attention to the allocation)
A 30-second window contains two 15-second scrape intervals,
so within it Prometheus scrapes 1 000 000 metrics, which takes about 4.37 s.
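The scrape figures can be turned into a rough throughput estimate. The sketch below assumes the test load of 500 nodes with 1000 metrics each; the derived values (samples per second, share of the window spent scraping) are my own arithmetic and were not reported in the talk.

```python
# Rough scrape-throughput estimate; derived values are not from the slides.
SAMPLES_PER_SCRAPE = 500 * 1000   # 500 nodes x 1000 metrics (assumed test load)
WINDOW_S = 30                     # window discussed on the slide
SCRAPE_INTERVAL_S = 15
SCRAPE_TIME_S = 4.37              # time to scrape 1 000 000 metrics

scrapes_per_window = WINDOW_S // SCRAPE_INTERVAL_S
samples_per_window = SAMPLES_PER_SCRAPE * scrapes_per_window

# Samples ingested per second of actual scraping work.
throughput = samples_per_window / SCRAPE_TIME_S
# Share of the 30-second window that is spent scraping.
busy_fraction = SCRAPE_TIME_S / WINDOW_S

print(f"{samples_per_window:,} samples per {WINDOW_S} s window")
print(f"~{throughput:,.0f} samples per second of scrape time")
print(f"~{busy_fraction:.1%} of the window spent scraping")
```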
VICTORIAMETRICS ARCHITECTURE
[Diagram: cluster components]
- VM-select_1, VM-select_2, VM-select_3 (STATELESS), behind LB/ClusterIP: READ OPERATIONS
- VM-storage_1, VM-storage_2, VM-storage_3 (STATEFUL)
- VM-insert_1, VM-insert_2, VM-insert_3 (STATELESS), behind LB/ClusterIP: WRITE OPERATIONS
ADVANTAGES AND DISADVANTAGES
- Unlimited retention, but storage has to be reconfigured
- Global query view over data collected from storage
- Horizontal scalability
- Metrics compaction (several times better; floating-point values are converted to integers)
- Simple infrastructure
- No integration with Alertmanager
- Cloud storage is not supported yet
https://github.com/VictoriaMetrics/VictoriaMetrics/issues/129
- More load on hosts
Test load: 500 nodes × 1000 metrics × 4 scrapes per minute × 24 hours = 2 880 000 000 points
THANOS
- 12-15 GiB of metrics per day (2.88 billion points)
- 16 GiB of memory used on nodes
- 2.1-2.4 CPU cores used on nodes
Storage price (Cloud Storage*):
15 * 365 = 5475, ~5500 GiB
Storage total: $126.50 per month; ~$1500 per year
* Based on retention, data can be moved to the Coldline storage class

VICTORIAMETRICS
- 2.8-3 GiB of metrics per day on each storage node (2.88 billion points)
- 16 GiB of memory used on nodes
- 2.8-4 CPU cores used on nodes
Storage price (Persistent Disk Standard):
3 * 365 = 1095, ~1100 GiB
$52.80 per month * Number_of_Storages
Storage total: 52.8 * 3 = $158.40 per month
* https://github.com/VictoriaMetrics/VictoriaMetrics/issues/134
If one of the storage nodes is lost, part of the data becomes unavailable
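The storage estimates above can be reproduced with a short calculation. This is a sketch under stated assumptions: the per-GiB prices are not quoted on the slide, they are derived here from the monthly totals, and the monthly totals describe the state after a full year of data has accumulated. All names are illustrative.

```python
import math


def round_up(value, step):
    """Round value up to the next multiple of step."""
    return math.ceil(value / step) * step


# One year of data at the daily volumes from the slide.
thanos_gib = round_up(15 * 365, 100)   # 5475 -> 5500 GiB in the bucket
vm_gib = round_up(3 * 365, 100)        # 1095 -> 1100 GiB per storage node

THANOS_MONTHLY_USD = 126.50            # Cloud Storage total from the slide
VM_MONTHLY_PER_NODE_USD = 52.80        # Persistent Disk Standard, per node
VM_STORAGE_NODES = 3

# Implied unit prices (derived, not quoted on the slide).
thanos_usd_per_gib = THANOS_MONTHLY_USD / thanos_gib
vm_usd_per_gib = VM_MONTHLY_PER_NODE_USD / vm_gib

vm_monthly_total = VM_MONTHLY_PER_NODE_USD * VM_STORAGE_NODES
thanos_yearly = THANOS_MONTHLY_USD * 12

print(f"Thanos: {thanos_gib} GiB, ~${thanos_usd_per_gib:.3f}/GiB, ${thanos_yearly:.0f}/year")
print(f"VictoriaMetrics: {vm_gib} GiB per node, ~${vm_usd_per_gib:.3f}/GiB, "
      f"${vm_monthly_total:.2f}/month for {VM_STORAGE_NODES} nodes")
```

The two setups store very different volumes (about 5500 GiB in one bucket versus about 1100 GiB on each of three disks), so the monthly totals are close mainly because the implied disk price per GiB is roughly twice the implied bucket price.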
PRICE COMPARISON
Instance price: 3 * N-standard-4 (4 vCPU, 15 GB memory), $97.49 monthly estimate each; 3 * 97.49 = $292
Standard Provisioned Space: 1,500 GB - $60

                                Thanos           VictoriaMetrics
CPU usage                       50%              65%
Memory usage                    16GB             16GB
Metrics per day                 15GB             9GB
Metrics per minute              2 000 000        2 000 000
Metrics per one day             2 880 000 000    2 880 000 000
Scrape duration (1M metrics)    4.373 s          4.553 s
Historical data access          303-525 ms       179-492 ms
(500 time series)
STORAGE PRICING
Q & A
To produce downsampled data, the Compactor continuously aggregates series down to five
minute and one hour resolutions. For each raw chunk, encoded with TSDB’s XOR
compression, it stores different types of aggregations, e.g. min, max, or sum in a single block.
This allows Querier to automatically choose the aggregate that is appropriate for a given
PromQL query.
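The idea of downsampling can be illustrated with a toy aggregation. This is a minimal sketch, not Thanos code: it groups raw samples into fixed five-minute windows and keeps a few aggregates per window. The function name, the in-memory data layout, and the extra `count` aggregate (useful for computing averages from `sum`) are assumptions made for the example.

```python
from collections import defaultdict


def downsample(samples, window_s=300):
    """Aggregate (timestamp_seconds, value) pairs into fixed windows.

    Returns {window_start: {"count", "sum", "min", "max"}}.
    """
    buckets = defaultdict(list)
    for ts, value in samples:
        # Align each sample to the start of its window.
        buckets[ts - ts % window_s].append(value)
    return {
        start: {
            "count": len(values),
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
        }
        for start, values in sorted(buckets.items())
    }


# Ten minutes of raw samples at a 15-second scrape interval.
raw = [(t, float(t % 7)) for t in range(0, 600, 15)]
five_min = downsample(raw, window_s=300)

for start, agg in five_min.items():
    print(start, agg)
```

Keeping several aggregates per window is what lets a query pick the right one later: `max_over_time` can read `max`, while an average can be rebuilt from `sum` and `count`.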
VM Gorilla compression analysis
The only problem is the result may exceed 64 bits — default integer size used in modern computers.
How to deal with it? Normalize the integer by dividing by 10^M where M is the minimum value that
allows fitting all the time series values into 64 bits and removing common trailing decimal zeros.
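The normalization step can be sketched as follows. This is an illustration only, not VictoriaMetrics code: it assumes, in line with the "floating to integer" note earlier, that values are first scaled by a power of ten until they are all integers, and it uses Python's `decimal` module and hypothetical function names for clarity.

```python
from decimal import Decimal

INT64_MAX = 2**63 - 1


def _trunc_div10(i):
    """Divide by 10, truncating toward zero (also for negative values)."""
    return i // 10 if i >= 0 else -((-i) // 10)


def to_scaled_ints(values):
    """Return (ints, exponent) with ints[i] * 10**exponent ~= values[i]."""
    decimals = [Decimal(str(v)) for v in values]
    # Scale by 10^N so that every value becomes an integer.
    exponent = min(0, min(d.as_tuple().exponent for d in decimals))
    ints = [int(d.scaleb(-exponent)) for d in decimals]
    # Divide by 10 until everything fits into 64 bits (this step is lossy).
    while any(abs(i) > INT64_MAX for i in ints):
        ints = [_trunc_div10(i) for i in ints]
        exponent += 1
    # Remove trailing decimal zeros shared by all values.
    while any(ints) and all(i % 10 == 0 for i in ints):
        ints = [i // 10 for i in ints]
        exponent += 1
    return ints, exponent


print(to_scaled_ints([1.5, 2.25, 100.0]))
print(to_scaled_ints([1000.0, 2000.0, 30000.0]))
print(to_scaled_ints([1e22, 0.5]))
```

The last call shows the trade-off: when one value is very large, small values in the same series lose their low-order digits, because the whole series shares a single exponent.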

ДЕНИС КЛЕПIКОВ «Long Term storage for Prometheus» Lviv DevOps Conference 2019

Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 

ДЕНИС КЛЕПIКОВ «Long Term storage for Prometheus» Lviv DevOps Conference 2019

  • 2. A long time ago... in all datacentres: bigger average = less detail
  • 3. CONTENT
    - Introduction
    - Long-Term Storage Overview
    - Thanos Architecture and Resources Usage
    - VictoriaMetrics Architecture and Resources Usage
    - Price comparison
  • 4. INTRODUCTION
    Thanos is a set of components that can be composed into a highly available metric system with unlimited storage capacity, which can be added seamlessly on top of existing Prometheus deployments. The current release, 0.5.0, is designed to move old metrics (those that have reached the retention period on the Prometheus nodes) to S3-like storage for the long term. Collected metrics can be reviewed via Grafana. The Prometheus query dashboard shows only data stored on the Prometheus instances.
    VictoriaMetrics is a fast, cost-effective and scalable time-series database. It can be used as long-term remote storage for Prometheus. It uses its own data compression, which allows more data to be stored on the same disk size.
    Cortex provides horizontally scalable, highly available, multi-tenant, long-term storage for Prometheus.
  • 5. [Diagram] Components shown: Prometheus node; monitored services 1 to N; Storage; Long-Term Storage ("store data after retention is reached"); Grafana with DataSource 1 and DataSource 2; Alerts; Alertmanager.
  • 6. LONG-TERM STORAGE OVERVIEW
    Why do we need long-term storage:
    - To store historical data about your workloads
    - To review incidents
    - To plan scaling based on seasonal load
    - To find bottlenecks in the infrastructure during continuous run/load
    What solutions can be used for storing long-term historical time series:
    - Cortex, InfluxDB, Kafka, Graphite, …, Thanos, VictoriaMetrics *
    * https://prometheus.io/docs/operating/integrations/#remote-endpoints-and-storage
  • 7. THANOS ARCHITECTURE
    [Diagram] Prometheus pod: Prometheus, Prometheus config reloader, configmap reloader, Thanos sidecar. Thanos query pod: Thanos query. Thanos compact: Thanos compact. Thanos Store Gateway pod: Thanos store gateway.
  • 8. Storages for Thanos *
    (stable)
    - Google Cloud Storage
    - AWS S3
    - Azure Storage Account
    (beta)
    - OpenStack Swift
    - Tencent COS
    * https://thanos.io/getting-started.md/
  • 9. [Diagram] Components shown: Grafana or Thanos UI; Thanos query pod (Thanos query); Prometheus 1; Prometheus 2; Thanos Store Gateway pod (Thanos store gateway); Bucket.
  • 10. ADVANTAGES AND DISADVANTAGES
    - Unlimited retention without reconfiguring storage
    - Collected data remains available even if the infrastructure is recreated (the data is in the bucket)
    - Global query view over data collected from multiple Prometheus instances and the bucket
    - Horizontal scalability
    - Metrics compaction
    - Full monitoring stack
    - Complicated infrastructure
  • 11. HOW IT WAS TESTED
    500 nodes (NODE_0 ... NODE_499), 1000 metrics per node (METRIC_0 ... METRIC_999), each metric reported every 15 seconds.
  • 12. 24 Hours Scroll Bar (500 reporters): 500 nodes × 1000 metrics per node, 4 times per minute, for 24 hours = 2 880 000 000 points
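The point count on this slide follows from the test setup on slide 11 (500 nodes, 1000 metrics per node, one sample every 15 seconds). A minimal Python sketch of that arithmetic; the constant names are illustrative and not taken from the talk:

```python
# Test setup as stated on the "HOW IT WAS TESTED" slide.
NODES = 500
METRICS_PER_NODE = 1000
SAMPLES_PER_MINUTE = 4          # one sample every 15 seconds
MINUTES_PER_DAY = 24 * 60

samples_per_minute = NODES * METRICS_PER_NODE * SAMPLES_PER_MINUTE
samples_per_day = samples_per_minute * MINUTES_PER_DAY

print(f"{samples_per_minute:,} samples per minute")  # 2,000,000
print(f"{samples_per_day:,} samples per day")        # 2,880,000,000
```

The per-minute figure matches the "Metrics per minute 2 000 000" row of the comparison slide.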
  • 15. QUERIES VIA THANOS FROM BUCKET
  • 17. [Panels] Memory usage stabilization on cluster nodes; scrape duration; GKE cluster details (pay attention to the allocation).
    The interval between scrapes is 30 seconds, which covers two 15-second reporting intervals, so Prometheus needs 4.37 seconds to scrape 1 000 000 metrics.
  • 18. VICTORIAMETRICS ARCHITECTURE
    [Diagram] Read operations: LB/ClusterIP, then VM-select_1, VM-select_2, VM-select_3 (stateless). Storage: VM-storage_1, VM-storage_2, VM-storage_3 (stateful). Write operations: LB/ClusterIP, then VM-insert_1, VM-insert_2, VM-insert_3 (stateless).
  • 21. ADVANTAGES AND DISADVANTAGES
    - Unlimited retention, but only by reconfiguring storage
    - Global query view over data collected from storage
    - Horizontal scalability
    - Metrics compaction (several times better; floating-point values are converted to integers)
    - Simple infrastructure
    - No integration with Alertmanager
    - Cloud storages are not supported yet: https://github.com/VictoriaMetrics/VictoriaMetrics/issues/129
    - More load on hosts
  • 23. 24 Hours Scroll Bar (500 reporters): 500 nodes × 1000 metrics per node, 4 times per minute, for 24 hours = 2 880 000 000 points
  • 26. PRICE COMPARISON
    THANOS
    - 12-15 GiB of metrics per day (2.88 billion points)
    - 16 GiB of memory used on nodes
    - 2.1-2.4 CPU cores used on nodes
    Storage price (Cloud Storage *): 15 * 365 = 5475, ~5500 GiB
    Storage total: $126.50 per month; ~$1500 per year
    * Depending on retention, data can be moved to the Coldline storage class
    VICTORIAMETRICS
    - 2.8-3 GiB of metrics per day on each storage node (2.88 billion points)
    - 16 GiB of memory used on nodes
    - 2.8-4 CPU cores used on nodes
    Storage price (Persistent Disk Standard): 3 * 365 = 1095, ~1100 GiB; $52.80 per month * number_of_storages
    Storage total *: 52.8 * 3 = $158.40 per month
    * https://github.com/VictoriaMetrics/VictoriaMetrics/issues/134: if one of the storage nodes is lost, part of the data becomes unavailable
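The storage totals on this slide can be reproduced with a short Python sketch. The per-GiB prices below are assumptions: they are not quoted from any price list but back-calculated from the slide's own totals ($126.50 for ~5500 GiB and $52.80 for ~1100 GiB), and all names are illustrative.

```python
# Volumes after one year of retention, rounded as on the slide.
THANOS_BUCKET_GIB = 5500        # 15 GiB/day * 365 = 5475, rounded up
VM_DISK_GIB_PER_NODE = 1100     # 3 GiB/day * 365 = 1095, rounded up
VM_STORAGE_NODES = 3

# Assumed unit prices in USD per GiB per month, derived from the slide totals.
OBJECT_STORAGE_PRICE = 0.023    # 126.50 / 5500
PERSISTENT_DISK_PRICE = 0.048   # 52.80 / 1100


def monthly_cost(volume_gib, price_per_gib, copies=1):
    """Monthly storage cost for `copies` volumes of `volume_gib` each."""
    return volume_gib * price_per_gib * copies


thanos_monthly = monthly_cost(THANOS_BUCKET_GIB, OBJECT_STORAGE_PRICE)
vm_monthly = monthly_cost(VM_DISK_GIB_PER_NODE, PERSISTENT_DISK_PRICE,
                          copies=VM_STORAGE_NODES)

print(f"Thanos bucket:   ${thanos_monthly:.2f} per month")
print(f"VictoriaMetrics: ${vm_monthly:.2f} per month")
```

Both figures describe the bill once a full year of data is stored; during the first year the stored volume, and therefore the monthly cost, grows gradually.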
  • 27. Thanos vs. VictoriaMetrics
    Instance price: 3 × n1-standard-4 (4 vCPU, 15 GB memory), $97.49 monthly estimate each; 3 * 97.49 = $292
    Standard provisioned space: 1,500 GB, $60
    CPU usage: Thanos 50%; VictoriaMetrics 65%
    Memory usage: Thanos 16 GB; VictoriaMetrics 16 GB
    Metrics stored per day: Thanos 15 GB; VictoriaMetrics 9 GB
    Metrics per minute: Thanos 2 000 000; VictoriaMetrics 2 000 000
    Metrics per day: Thanos 2 880 000 000; VictoriaMetrics 2 880 000 000
    Scrape duration (1M metrics): Thanos 4.373 s; VictoriaMetrics 4.553 s
    Historical data access (500 time series): Thanos 303-525 ms; VictoriaMetrics 179-492 ms
  • 29. Q & A
  • 31. To produce downsampled data, the Compactor continuously aggregates series down to five-minute and one-hour resolutions. For each raw chunk, encoded with TSDB's XOR compression, it stores different types of aggregations (e.g. min, max, or sum) in a single block. This allows the Querier to automatically choose the aggregate that is appropriate for a given PromQL query.
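The idea of keeping several aggregates per downsampling window can be illustrated with a small Python sketch. This is a simplified illustration, not the Thanos Compactor's code: the function name, the in-memory dictionaries and the sample data are assumptions, and real blocks store encoded chunks rather than Python objects.

```python
from collections import defaultdict


def downsample(samples, window_s=300):
    """Group (timestamp_s, value) samples into fixed windows and keep
    several aggregates per window, so that a query can later pick the
    one it needs (max for max_over_time, sum/count for an average)."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % window_s].append(value)
    return {
        start: {
            "min": min(values),
            "max": max(values),
            "sum": sum(values),
            "count": len(values),
        }
        for start, values in sorted(buckets.items())
    }


# Ten minutes of raw data at a 15-second interval: values 0.0, 1.0, ..., 39.0.
raw = [(t, float(t // 15)) for t in range(0, 600, 15)]
five_minute = downsample(raw, window_s=300)

for start, aggregates in five_minute.items():
    print(start, aggregates)
```

Forty raw samples collapse into two five-minute windows; an average over a window is recovered as `sum / count`, which is why both values are kept instead of a precomputed mean.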
  • 34. VM Gorilla compression analysis
    The only problem is that the result of converting floating-point values to integers may exceed 64 bits, the default integer size used in modern computers. How to deal with it? Normalize the integer by dividing by 10^M, where M is the minimum value that allows fitting all the time series values into 64 bits, and remove common trailing decimal zeros.
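The normalization described on this slide can be sketched in Python as follows. This is an illustration under stated assumptions, not VictoriaMetrics' actual implementation: the function name, the use of `decimal.Decimal`, and the signed 64-bit limit are choices made for this example, and NaN or infinite values are not handled.

```python
from decimal import Decimal

INT64_LIMIT = 2 ** 63  # assumed signed 64-bit range


def to_scaled_ints(values):
    """Return (ints, exponent) with value ~= int * 10**exponent for all
    values, using one shared exponent for the whole series."""
    decimals = [Decimal(str(v)) for v in values]
    # Shared exponent: the smallest decimal exponent in the series.
    exponent = min(d.as_tuple().exponent for d in decimals)
    ints = [int(d.scaleb(-exponent)) for d in decimals]

    def div10(i):
        # Truncate toward zero for both positive and negative integers.
        return -(-i // 10) if i < 0 else i // 10

    # Divide by 10 until every value fits into 64 bits (this loses precision).
    while any(abs(i) >= INT64_LIMIT for i in ints):
        ints = [div10(i) for i in ints]
        exponent += 1
    # Remove trailing decimal zeros common to all values (lossless).
    while any(ints) and all(i % 10 == 0 for i in ints):
        ints = [i // 10 for i in ints]
        exponent += 1
    return ints, exponent


print(to_scaled_ints([1.5, 2.25, 100.0]))
print(to_scaled_ints([1000.0, 2000.0]))
```

The first call keeps two decimal places for every value because 2.25 needs them; the second drops the common trailing zeros and records them in the shared exponent instead, which keeps the integers small before any delta encoding is applied.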