Monitoring and scaling postgres at datadog

•Descargar como PPTX, PDF•

2 recomendaciones•320 vistas

An overview of how Datadog has built and scaled our Postgres clusters to support the ingestion of trillions of metric data points per day, by Seth Rosenblum, Lead Data Reliability Engineer.

Software

Monitoring and Scaling
Postgres at Datadog
Seth Rosenblum, Datadog @SethRosenblum

Seth Rosenblum
Data Reliability Engineering Lead
‣ Kafka
‣ Elasticsearch
‣ Cassandra
‣ Postgres!
@SethRosenblum

@datadoghq
SaaS-based monitoring
Trillions of data points per day
We’re hiring!
https://jobs.datadoghq.com

Collecting data is cheap; not
having it when you need it can
be expensive

Collecting data is cheap; not
having it when you need it can
be expensive
...So instrument all the things!

Metrics
connections
commits
rollbacks
disk_read
buffer_hit
rows_returned
rows_fetched
rows_inserted
rows_updated
rows_deleted
database_size
deadlocks
temp_bytes
What metrics do we gather?
temp_files
bgwriter.checkpoints_timed
bgwriter.checkpoints_requested
bgwriter.buffers_checkpoint
bgwriter.buffers_clean
bgwriter.maxwritten_clean
bgwriter.buffers_backend
bgwriter.buffers_alloc
bgwriter.buffers_backend_fsync
bgwriter.write_time
bgwriter.sync_time
locks
seq_scans
seq_rows_read
index_scans
index_rows_fetched
rows_hot_updated
live_rows
dead_rows
index_rows_read
table_size
index_size
total_size
table.count
max_connections
percent_usage_connections
replication_delay
replication_delay_bytes
heap_blocks_read
heap_blocks_hit
index_blocks_read
index_blocks_hit
toast_blocks_read
toast_blocks_hit
toast_index_blocks_read
toast_index_blocks_hit

“Postgres performance-optimizes a lot better
when it has a consistent workload”
Josh Berkus

How we do it
Requirements
▸Write master is writeable, read replicas are
readable!

How we do it
Requirements
▸Write master is writeable, read replicas are
readable!
▸Read replicas are up to date and don’t lag

How we do it
Requirements
▸Write master is writeable, read replicas are
readable!
▸Read replicas are up to date and don’t lag
▸Additional read replicas can be provisioned
quickly

How we do it
Solutions
▸PostgreSQL!
▸http://bit.ly/pg-repl-docs
▸WAL-E
▸https://github.com/wal-e/wal-e

How we do it
Solutions
▸PostgreSQL!
▸http://bit.ly/pg-repl-docs
▸WAL-E WAL-G
▸https://github.com/wal-g/wal-g

What are we alerting on?
▸Write master is writeable, read replicas are
readable!
▸Up/Down checks
▸Latency

What are we alerting on?
▸Read replicas are up to date and don’t lag
▸Write master standby availability
▸Write master standby replication lag
▸Read replica lag

What are we alerting on?
▸Additional read replicas can be provisioned
quickly
▸Base backups are functioning properly

When do we care that a backup has failed?

“Aside from shared_buffers, the most
important memory-allocation parameter
is work_mem… Raising this value can
dramatically improve the performance of certain
queries…”
Robert Haas

Latency vs Potential
EXPLAIN ANALYZE
http://bit.ly/pg-explain
‣ Explain displays the execution plan

Latency vs Potential
EXPLAIN ANALYZE
http://bit.ly/pg-explain
‣ Explain displays the execution plan
‣ Analyze runs it and gathers stats

Latency vs Potential
EXPLAIN ANALYZE
Merge Right Join (cost=25870.55..31017.51 rows=229367 width=92) (actual time=2884.501..5147.047 rows=354834 loops=1)
Merge Cond: (a.uid = b.uid)
-> Index Scan using foo on bar a (cost=0.00..537.29 rows=9246 width=27) (actual time=0.049..41.782 rows=9246 loops=1)
-> Materialize (cost=25870.49..27204.80 rows=106745 width=81) (actual time=2884.413..3804.537 rows=354834 loops=1)
-> Sort (cost=25870.49..26137.35 rows=106745 width=81) (actual time=2884.406..3099.732 rows=111878 loops=1)
Sort Key: b.uid
Sort Method: external merge Disk: 8928kB
…
Total runtime: 5588.105 ms
(14 rows)
http://bit.ly/pg-auto-explain

Summary
1. Collect as many metrics as you can, before
you need them
2. If the metrics that you have aren’t providing
the right value, build ones that do
3. Be aggressive in monitoring slow queries,
catch them while they’re easy to find

Resources
‣ http://dtdg.co/monitor-postgres
‣ https://dtdg.co/gcp-sql
‣ https://dtdg.co/postgresql-vacuums

Questions?
Seth Rosenblum @SethRosenblum
seth@datadoghq.com

Más contenido relacionado

La actualidad más candente

In this session, we’ll follow the flow of data through an end-to-end system built to handle tens of terabytes an hour of event-oriented data, providing real-time streaming, in-memory, SQL, and batch access to this data. We’ll go into detail on how open source systems such as Hadoop, Kafka, Solr, and Impala/Hive can be stitched together to form the base platform; describe how and where to perform data transformation and aggregation; provide a simple and pragmatic way of managing event metadata; and talk about how applications built on top of this platform get access to data and extend its functionality. Finally, a brief demo of Rocana Ops, an application for large scale data center operations, will be given, along with an explanation about how it uses the underlying platform.

Building a system for machine and event-oriented data with Rocana

Treasure Data, Inc.

DataEngConf SF16 - High cardinality time series search

Hakka Labs

Clickhouse MeetUp@ContentSquare - ContentSquare's Experience Sharing

Vianney FOUCAULT

Data can be viewed as the exhaust of online activity. With the rise of cloud-based data platforms, barriers to data storage and transfer have crumbled. The demand for creative applications and learning from those datasets has accelerated. Rapid acceleration can quickly accrue disorder, and disorderly data design can turn the deepest data lake into an impenetrable swamp. In this talk, I will discuss the evolution of the data science workflow at Expedia with a special emphasis on Learning to Rank problems. From the heroic early days of ad-hoc Spark exploration to our first production sort model on the cloud, we will explore the process of industrializing the workflow. Layered over our story, I will share some best practices and suggestions on how to keep your data productive, or even pull your organization out of the data swamp.

Rental Cars and Industrialized Learning to Rank with Sean Downes

Databricks

Video and slides synchronized, mp3 and slide download available at URL http://bit.ly/2l2Rr6L. Doug Daniels discusses the cloud-based platform they have built at DataDog and how it differs from a traditional datacenter-based analytics stack. He walks through the decisions they have made at each layer, covers the pros and cons of these decisions and discusses the tooling they have built. Filmed at qconsf.com. Doug Daniels is a Director of Engineering at Datadog, where he works on high-scale data systems for monitoring, data science, and analytics. Prior to joining Datadog, he was CTO at Mortar Data and an architect and developer at Wireless Generation, where he designed data systems to serve more than 4 million students in 49 states.

Elastic Data Analytics Platform @Datadog

C4Media

Streaming data for real time analysis

Amazon Web Services

This presentation recounts the story of Macys.com and Bloomingdales.com's migration from legacy RDBMS to NoSQL Cassandra in partnership with DataStax. One thing that differentiates this talk from others on Cassandra is Macy's philosophy of "doing more with less." You will see why we emphasize the performance tuning aspects of iterative development when you see how much processing we can support on relatively small configurations. This session will cover: 1) The process that led to our decision to use Cassandra 2) The approach we used for migrating from DB2 & Coherence to Cassandra without disrupting the production environment 3) The various schema options that we tried and how we settled on the current one. We'll show you a selection of some of our extensive performance tuning benchmarks, as well as how these performance results figured into our final schema designs. 4) Our lessons learned and next steps

Macy's: Changing Engines in Mid-Flight

DataStax Academy

Scaling graphite for application metrics

Jim Plush

Scaling monitoring with Datadog

alexismidon

netflix-real-time-data-strata-talk

Danny Yuan

empirical analysis modeling of power dissipation control in internet data ce...

saadjamil31

DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ

Hakka Labs

Spark Summit EU talk by Sebastian Schroeder and Ralf Sigmund

Spark Summit

Our product presents an aggregated view of metadata collected for billions of objects (files, emails, sharepoint objects etc.). We used Cassandra to store those billions of objects along with aggregated view of that metadata. Customers can analyse the corpus of data in real time by searching in completely flexible way i.e. be able to get summary aggregates for many billions of objects, and then be able to further drill down to items by filtering using various facets of the metadata. We achieve this using a combination of Cassandra and ElasticSearch. This presentation will talk about various data modelling techniques we use to aggregate and then further summarise all that metadata and be able to search the summary in real t

Symantec: Cassandra Data Modelling techniques in action

DataStax Academy

Analyzing and comparing your energy consumption with that of other consumers provides healthy peer pressure and useful insight leading to energy conservation and impacting the bottom line. We helped GridPocket (http://www.gridpocket.com/), a smart grid company developing energy management applications for electricity water and gas utilities, implement high scale anonymized energy comparison queries with an order of magnitude lower cost and higher performance than was previously possible. IoT use cases like that of GridPocket are swamping our planet with data, and drive demand for analytics on extremely scalable and low cost storage. Enter Spark SQL over Object Storage: highly scalable and low cost storage which provides RESTful APIs to store and retrieve objects and their metadata. Key performance indicators (KPIs) of query performance and cost are the number of bytes shipped from Object Storage to Spark and the number of incurred REST requests. We propose Pluggable Spark SQL Filters, which extend the existing Spark SQL partitioning mechanism with an ability to dynamically filter irrelevant objects during query execution. Our approach handles any data format supported by Spark SQL (Parquet, JSON, csv etc.), and unlike pushdown compatible formats such as Parquet which require touching each object to determine its relevance, it avoids accessing irrelevant objects altogether. We developed a pluggable interface for developing and deploying Filters, and implemented GridPocket’s filter which screens objects according to their metadata, for example geo-spatial bounding boxes which describe the area covered by an object’s data points. This leads to drastically lower KPIs since there is no need to ship the entire dataset from Object Storage to Spark if you are only comparing yourself with your neighborhood. We demonstrate GridPocket analytics notebooks, report on our implementation and resulting 10-20x speedups, explain how to implement a Pluggable File Filter, and how we applied this to other use cases.

Using Pluggable Apache Spark SQL Filters to Help GridPocket Users Keep Up wit...

Spark Summit

Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber

confluent

La collecte de données au sein d'un DataLake sans impacter les systèmes opérationnels est un challenge pour de nombreuses entreprises. Lors du meetup Paris Data Engineers du 26 mars 2019, Dimitri Capitaine nous a présenté Data Collector qui est un outil de Change Data Capture (CDC) développé en interne chez OVH. Data Collector est capable d'assurer une réplication fiable et performante des bases de données jusqu'au DataLake. Hugo Larcher nous a alors présenté un cas d'utilisation autour de l'exploitation de données aéronautiques avec une touche d'IoT et de DataViz.

Change Data Capture with Data Collector @OVH

Paris Data Engineers !

Elastic Stack roadmap deep dive

Elasticsearch

As a leader in the financial industry, Capital One applications generate huge amounts of data that require fast and accurate handling, storage and analysis. We are transforming how we report operational data to our internal users so that they can make quick and precise business decisions to serve our customers. As part of this transformation, we are building a new Go-based data processing framework that will enable us to transfer data from multiple data stores (RDBMS, files, etc.) to a single NoSQL database - Cassandra. This new NoSQL store will act as a reporting database that will receive data on a near real-time basis and serve the data through scorecards and reports. We would like to share our experience in defining this fast data platform and the methodologies used to model financial data in Cassandra.

Capital One: Using Cassandra In Building A Reporting Platform

DataStax Academy

HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...

Altinity Ltd

La actualidad más candente (20)

Building a system for machine and event-oriented data with Rocana

DataEngConf SF16 - High cardinality time series search

Clickhouse MeetUp@ContentSquare - ContentSquare's Experience Sharing

Rental Cars and Industrialized Learning to Rank with Sean Downes

Elastic Data Analytics Platform @Datadog

Streaming data for real time analysis

Macy's: Changing Engines in Mid-Flight

Scaling graphite for application metrics

Scaling monitoring with Datadog

netflix-real-time-data-strata-talk

empirical analysis modeling of power dissipation control in internet data ce...

DataEngConf SF16 - BYOMQ: Why We [re]Built IronMQ

Spark Summit EU talk by Sebastian Schroeder and Ralf Sigmund

Symantec: Cassandra Data Modelling techniques in action

Using Pluggable Apache Spark SQL Filters to Help GridPocket Users Keep Up wit...

Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber

Change Data Capture with Data Collector @OVH

Elastic Stack roadmap deep dive

Capital One: Using Cassandra In Building A Reporting Platform

HTTP Analytics for 6M requests per second using ClickHouse, by Alexander Boc...

Último

8257 interfacing 2 in microprocessor for btech students

HimanshiGarg82

Craft an AI & Machine Learning Pitch with our Editable Professional PowerPoint Template. Ignite your AI & Machine Learning pitch with our cutting-edge PowerPoint template tailored for the industry. Perfect for AI conferences, investor presentations, sales pitches to tech-focused companies, training sessions, and educational programs. - 20+ editable slides: Get a variety of options to choose from for your presentation. - Time-saving solution: Download, replace text/images with a few clicks. - User-friendly customization: Easy to use and personalize. - Modern and attractive design: Captivating visuals, sleek layout. - Tailored to your requirements: Fully alterable for customization. - Well-organized slides: Complete control over content. - Thematic specificity: Reflects healthcare industry with relevant graphics. - Showcase your business idea: Communicate value proposition effectively.

AI & Machine Learning Presentation Template

Presentation.STUDIO

+971565801893 Mtp-Kit (500MG) Prices » Dubai [(+971565801893**)] Abortion Pills For Sale In Dubai, UAE, Mifepristone and Misoprostol Tablets Available In Dubai, UAE CONTACT DR.Leen Whatsapp +971565801893 We Have Abortion Pills / Cytotec Tablets /Mifegest Kit Available in Dubai, Sharjah, Abudhabi, Ajman, Alain, Fujairah, Ras Al Khaimah, Umm Al Quwain, UAE, Buy cytotec in Dubai +971565801893''''Abortion Pills near me DUBAI | ABU DHABI|UAE. Price of Misoprostol, Cytotec” +971565801893' Dr.DEEM ''BUY ABORTION PILLS MIFEGEST KIT, MISOPROTONE, CYTOTEC PILLS IN DUBAI, ABU DHABI,UAE'' Contact me now via What's App…… abortion Pills Cytotec also available Oman Qatar Doha Saudi Arabia Bahrain Above all, Cytotec Abortion Pills are Available In Dubai / UAE, you will be very happy to do abortion in Dubai we are providing cytotec 200mg abortion pill in Dubai, UAE. Medication abortion offers an alternative to Surgical Abortion for women in the early weeks of pregnancy. We only offer abortion pills from 1 week-6 Months. We then advise you to use surgery if its beyond 6 months. Our Abu Dhabi, Ajman, Al Ain, Dubai, Fujairah, Ras Al Khaimah (RAK), Sharjah, Umm Al Quwain (UAQ) United Arab Emirates Abortion Clinic provides the safest and most advanced techniques for providing non-surgical, medical and surgical abortion methods for early through late second trimester, including the Abortion By Pill Procedure (RU 486, Mifeprex, Mifepristone, early options French Abortion Pill), Tamoxifen, Methotrexate and Cytotec (Misoprostol). The Abu Dhabi, United Arab Emirates Abortion Clinic performs Same Day Abortion Procedure using medications that are taken on the first day of the office visit and will cause the abortion to occur generally within 4 to 6 hours (as early as 30 minutes) for patients who are 3 to 12 weeks pregnant. When Mifepristone and Misoprostol are used, 50% of patients complete in 4 to 6 hours; 75% to 80% in 12 hours; and 90% in 24 hours. We use a regimen that allows for completion without the need for surgery 99% of the time. All advanced second trimester and late term pregnancies at our Tampa clinic (17 to 24 weeks or greater) can be completed within 24 hours or less 99% of the time without the need surgery. The procedure is completed with minimal to no complications. Our Women's Health Center located in Abu Dhabi, United Arab Emirates, uses the latest medications for medical abortions (RU-486, Mifeprex, Mifegyne, Mifepristone, early options French abortion pill), Methotrexate and Cytotec (Misoprostol). The safety standards of our Abu Dhabi, United Arab Emirates Abortion Doctors remain unparalleled. They consistently maintain the lowest complication rates throughout the nation. Our Physicians and staff are always available to answer questions and care for women in one of the most difficult times in their lives. The decision to have an abortion at the Abortion Clinic in Abu Dhabi, United Arab Emirates.+971565801893

+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...

Health

WSO2CON2024 - It's time to go Platformless

WSO2

%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein

masabamasaba

%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain

masabamasaba

At the recent Microsoft Ignite 2023 conference, Microsoft unveiled a groundbreaking strategy that will redefine enterprise work management. The plan involves integrating Microsoft’s key planning tools, Microsoft To Do, Microsoft Planner, and Microsoft Project for the web into a unified experience called “Microsoft Planner.” What does this new strategy from Microsoft mean for current users? Join us and learn how best to take advantage of this announcement while gaining a clear path on how to elevate the current state of Microsoft Planner from a basic task manager to a comprehensive tool for Enterprise Work Management using OnePlan. Learn how OnePlan’s integration with Microsoft Planner allows for strategic alignment with business goals through advanced features like strategic planning, portfolio management, resource management, financial management, and more!

Introducing Microsoft’s new Enterprise Work Management (EWM) Solution

OnePlan Solutions

%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein

masabamasaba

We specialize in Psychic Readings, Psychic Love Spells, Binding Love Spells, Obsession Spells, Voodoo Spells, Lottery Spells, Marriage Spells, Black Magic Spells, Palm Readings & much more. Are you depressed? We perform this come-to-me love spell that works instantly with the aim of bringing back the victim to the person performing the magic. Have you lost your lover? We perform this come-to-me love spell that works instantly with the aim of bringing back the victim to the person performing the magic. Have you lost your lover? Do u need to solve any relationship problem? Contact the powerful spells caster chief kule with love spells that work overnight and love spells that really work. Have you found yourself infatuated with a special someone you think could be the one? Are you looking for a spell to provide them with a nudge in the right direction? Or maybe the spell you cast didn’t achieve the results you were hoping for? Whether you’re new or versed in the ways of spell casting, we’re here to help. Today we’re going to provide you with a detailed guide on the types of love spells to cast. Not only that but there’s something for those who wish to find outside advice from more advanced spell casters. We’re also going to provide you with the top sites available to help you with your dilemma. Let’s begin our journey by educating ourselves on love magic and what a real love caster looks like. Love Magic and Love Casters Love magic made its first appearance back in Ancient Egypt and has been an active practice since. This type of magic is a branch of traditional magic and can be practiced in various ways. Typically the more common use of love magic is through the work of spells, but other methods look like Charms Rituals-LOVE Potions-Dolls and even Amulets If you are interested in becoming a love caster, be prepared for what’s to come. A genuine love caster knows that the art of love casting is no easy feat and shouldn’t be done casually. You should know that not only does it require you to be gifted spiritually, but you must be ready to serve others. Someone who is considered a real love caster has experience in all manner of spells, no matter the difficulty. Training yourself in attraction, commitment, and marriage spells is an excellent place to start. But this by no means will make you a professional. Practice your craft and expand your knowledge; understand that you will possess the ability to help others in time truly. Types of Love Spells What better way to start broadening your experiences with love spells than by learning more about them? These spells work like just about any other spell. Simply apply your intention, use a medium (sigils, mantras, candles, or charm bags), and top it off with establishing the belief that you will receive what you want. So what kind of spells are available and which ones suit your needs the best? Let’s take a look at the many options you have at your disposal.

%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...

masabamasaba

Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...

SelfMade bd

tonesoftg

lanshi9

Define the academic and professional writing..pdf

PearlKirahMaeRagusta1

%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...

masabamasaba

In today's dynamic e-commerce landscape, the payment gateway emerges as a linchpin, ensuring smooth and secure transactions between buyers and sellers. In this discourse, we delve into the meticulous process of devising test cases tailored for scrutinizing payment gateways. Crafting precise test cases for payment gateways is a quintessential responsibility for testers operating within the service industry. This article meticulously explores pivotal scenarios integral to how to test payment gateways, coupled with essential guidelines for drafting effective test cases.

Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf

kalichargn70th171

OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...

Shane Coughlan

Investing in AI transformation today The modern business advantage: Uncovering deep insights with AI Organizations around the world have come to recognize AI as the transformative technology that enables them to gain real business advantage. AI’s ability to organize vast quantities of data allows those who implement it to uncover deep business insights, augment human expertise, drive operational efficiency, transform their products, and better serve their customers

Microsoft AI Transformation Partner Playbook.pdf

Willy Marroquin (WillyDevNET)

%in kempton park+277-882-255-28 abortion pills for sale in kempton park

masabamasaba

Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...

Bert Jan Schrijver

WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation

WSO2

VTU technical seminar 8Th Sem on Scikit-learn

AmarnathKambale

Monitoring and scaling postgres at datadog

1. Monitoring and Scaling Postgres at Datadog Seth Rosenblum, Datadog @SethRosenblum

2. Seth Rosenblum Data Reliability Engineering Lead ‣ Kafka ‣ Elasticsearch ‣ Cassandra ‣ Postgres! @SethRosenblum

3. @datadoghq SaaS-based monitoring Trillions of data points per day We’re hiring! https://jobs.datadoghq.com

4. Collecting data is cheap; not having it when you need it can be expensive

5. Collecting data is cheap; not having it when you need it can be expensive ...So instrument all the things!

6. Metrics connections commits rollbacks disk_read buffer_hit rows_returned rows_fetched rows_inserted rows_updated rows_deleted database_size deadlocks temp_bytes What metrics do we gather? temp_files bgwriter.checkpoints_timed bgwriter.checkpoints_requested bgwriter.buffers_checkpoint bgwriter.buffers_clean bgwriter.maxwritten_clean bgwriter.buffers_backend bgwriter.buffers_alloc bgwriter.buffers_backend_fsync bgwriter.write_time bgwriter.sync_time locks seq_scans seq_rows_read index_scans index_rows_fetched rows_hot_updated live_rows dead_rows index_rows_read table_size index_size total_size table.count max_connections percent_usage_connections replication_delay replication_delay_bytes heap_blocks_read heap_blocks_hit index_blocks_read index_blocks_hit toast_blocks_read toast_blocks_hit toast_index_blocks_read toast_index_blocks_hit

7. Scaling PostgreSQL at Datadog

8. Moar Resources!

9. Moar Instances!

10. Writes Repl Reads

11. Writes Repl Reads

12. “Postgres performance-optimizes a lot better when it has a consistent workload” Josh Berkus

13. Writes Repl Standbys

14. How we do it Requirements ▸Write master is writeable, read replicas are readable!

15. How we do it Requirements ▸Write master is writeable, read replicas are readable! ▸Read replicas are up to date and don’t lag

16. How we do it Requirements ▸Write master is writeable, read replicas are readable! ▸Read replicas are up to date and don’t lag ▸Additional read replicas can be provisioned quickly

17. How we do it Solutions ▸PostgreSQL! ▸http://bit.ly/pg-repl-docs ▸WAL-E ▸https://github.com/wal-e/wal-e

18. How we do it Solutions ▸PostgreSQL! ▸http://bit.ly/pg-repl-docs ▸WAL-E WAL-G ▸https://github.com/wal-g/wal-g

19. How we do it Requirements ▸Write master is writeable, read replicas are readable! ▸Read replicas are up to date and don’t lag ▸Additional read replicas can be provisioned quickly

20. What are we alerting on? ▸Write master is writeable, read replicas are readable! ▸Up/Down checks ▸Latency

21. What are we alerting on? ▸Read replicas are up to date and don’t lag ▸Write master standby availability ▸Write master standby replication lag ▸Read replica lag

22.

23. What are we alerting on? ▸Additional read replicas can be provisioned quickly ▸Base backups are functioning properly

24.

25. When do we care that a backup has failed?

26.

27.

28. Monitoring to Improve Performance

29. Slow Queries

30. Slow Queries

31. Performance: RAM vs Disk

32. “Aside from shared_buffers, the most important memory-allocation parameter is work_mem… Raising this value can dramatically improve the performance of certain queries…” Robert Haas

33. Finding **Inefficient** Queries

34. Finding **Inefficient** Queries

35. Latency vs Potential EXPLAIN ANALYZE http://bit.ly/pg-explain ‣ Explain displays the execution plan

36. Latency vs Potential EXPLAIN ANALYZE http://bit.ly/pg-explain ‣ Explain displays the execution plan ‣ Analyze runs it and gathers stats

37. Latency vs Potential EXPLAIN ANALYZE Merge Right Join (cost=25870.55..31017.51 rows=229367 width=92) (actual time=2884.501..5147.047 rows=354834 loops=1) Merge Cond: (a.uid = b.uid) -> Index Scan using foo on bar a (cost=0.00..537.29 rows=9246 width=27) (actual time=0.049..41.782 rows=9246 loops=1) -> Materialize (cost=25870.49..27204.80 rows=106745 width=81) (actual time=2884.413..3804.537 rows=354834 loops=1) -> Sort (cost=25870.49..26137.35 rows=106745 width=81) (actual time=2884.406..3099.732 rows=111878 loops=1) Sort Key: b.uid Sort Method: external merge Disk: 8928kB … Total runtime: 5588.105 ms (14 rows) http://bit.ly/pg-auto-explain

38. “Aside from shared_buffers, the most important memory-allocation parameter is work_mem… Raising this value can dramatically improve the performance of certain queries, but it's important not to overdo it.” Robert Haas

39. Track Connections

40. Connections By Application

41. Summary 1. Collect as many metrics as you can, before you need them 2. If the metrics that you have aren’t providing the right value, build ones that do 3. Be aggressive in monitoring slow queries, catch them while they’re easy to find

42. Resources ‣ http://dtdg.co/monitor-postgres ‣ https://dtdg.co/gcp-sql ‣ https://dtdg.co/postgresql-vacuums

43. Questions? Seth Rosenblum @SethRosenblum seth@datadoghq.com

Monitoring and scaling postgres at datadog

Recomendados

Recomendados

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Similar a Monitoring and scaling postgres at datadog

Similar a Monitoring and scaling postgres at datadog (20)

Último

Último (20)

Monitoring and scaling postgres at datadog