SlideShare una empresa de Scribd logo
1 de 34
Descargar para leer sin conexión
Practical
Partitioning in
Production with
Postgres
Jimmy Angelakos
Senior PostgreSQL Architect
Postgres London 12/05/2021
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
2
We’ll be looking at:
• Intro to Partitioning in PostgreSQL
• Why?
• How?
• Practical Example
Introduction to
Partitioning in
PostgreSQL
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
4
• RDBMS context: division of a table into distinct independent tables
• Horizontal partitioning (by row) – different rows in different tables
• Why?
– Easier to manage
– Performance
What is partitioning?
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
5
• Has had partitioning for quite some time now PG 8.1 (2005)
…
– Inheritance-based
– Why haven’t I heard of this before?
– It’s not great tbh...
• Declarative Partitioning: PG 10 (2017)
– Massive improvement
Partitioning in PostgreSQL
HISTORY
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
6
Declarative Partitioning
( PG 10+ )
Specification of: By declaring a table (DDL):
• Partitioning method
• Partition key
– Column(s) or expression(s)
– Value determines data routing
• Partition boundaries
CREATE TABLE cust (id INT, name TEXT)
PARTITION BY RANGE (id);
CREATE TABLE cust_1000
PARTITION OF cust FOR VALUES FROM
(1) TO (1000);
• Partitions may be partitioned
themselves (sub-partitioning)
Why?
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
8
• Database size: unlimited ✅
• Tables per database: 1.4 billion ✅
• Table size: 32 TB 😐
– Default block size: 8192 bytes
• Rows per table: depends
– As many as can fit onto 4.2 billion blocks 😐
PostgreSQL limits
(Hard limits, hard to reach)
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
9
• Disk size limitations
– You can put partitions on different tablespaces
• Performance
– Partition pruning
– Table scans
– Index scans
– Hidden pitfalls of very large tables*
What partitioning can help with (i)
(Very large tables)
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
10
• Maintenance
– Deletions (some filesystems are bad at deleting large numbers of files)
🤭
– DROP TABLE cust_1000;
– ALTER TABLE cust DETACH PARTITION cust_1000;
• VACUUM
– Bloat
– Freezing → xid wraparound
What partitioning can help with (ii)
(Very large tables)
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
11
• Magic bullet
– No substitute for rational database design
• Sharding
– Not about putting part of the data on different nodes
• Performance tuning
– Unless you have one of the mentioned issues
What partitioning is not
How?
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
13
• Get your calculator out
– Data ingestion rate (both rows and size in bytes)
– Projected increases (e.g. 25 locations projected to be 200 by end of year)
– Data retention requirements
• Will inform choice of partitioning method and key
• For instance: 1440 measurements/day from each of 1000 sensors – extrapolate per year
• Keep checking if this is valid and be prepared to revise
Dimensioning
Plan ahead!
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
14
• Range: For key column(s) e.g. ranges of dates, identifiers, etc.
– Lower end: inclusive, upper end: exclusive
• List: Explicit key values stated for each partition
• Hash (PG 11+): If you have a column with values close to unique
– Define Modulus ( & remainder ) for number of almost-evenly-sized partitions
Partitioning method
Dimensioning usually makes this clearer
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
15
• Analysis
– Determine main keys used for retrieval from queries
– Proper key selection enables partition pruning
– Can use multiple columns for higher granularity (more partitions)
• Desirable
– High enough cardinality (range of values) for the number of partitions needed
– A column that doesn’t change often, to avoid moving rows among partitions
Partition Key selection
Choose wisely - know your data!
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
16
• Automatic creation of partitions
– Create in advance
– Use a cronjob
• Imperative merging/splitting of partitions
– Move rows manually
• Sharding to different nodes
– You may have to configure FDW manually
What Postgres does not do
^core
Practical
Example
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
18
• Is your table too large to handle?
• Can partitioning help?
• What if it’s in constant use?
Partitioning a live production system
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
19
• OLTP workload, transactions keep flowing in
– Table keeps increasing in size
• VACUUM never ends
– Has been running for a full month already…
• Queries are getting slower
– Not just because of sheer number of rows...
The situation
Huge 20 TB table
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
20
• Postgres has 1GB segment size
– Can only be changed at
compilation time
– 20 TB table = 20000 segments
(files on disk)
• Why is this a problem?
– md.c →
* Hidden performance pitfall (i)
For VERY large tables
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
21
●
This loops 20000 times every time you
want to access a table page
– Linked list of segments
●
Code from PG 9.6
●
It has been heavily optimised recently
(caching, etc).
●
Still needs to run a lot of times
* Hidden performance pitfall (ii)
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
22
• Need to partition the huge table
– Dimensioning
– Partition method
– Partition key
• Make sure we’re on the latest version (PG 13)
– Get latest features & performance enhancements
So what do we do?
Next steps
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
23
• Dimensioning
– One partition per month will be about 30GB of data, so acceptable size
• Method, Key
– Candidate key is transaction date, which we can partition by range
– Check that there are no data errors (e.g. dates in the future when they shouldn’t be)
• Partition sizes don’t have to be equal
– We can partition older, less often accessed data by year
What is our table like?
It holds daily transaction totals for each point of sales
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
24
• Lock the table totally (ACCESS EXCLUSIVE) or prevent writes
– People will start yelling, and they will be right
• Cause excessive load on the system (e.g. I/O) or cause excessive disk space usage
– Can’t copy whole 20 TB table into empty partitioned table
– See above about yelling
• Present an inconsistent or incomplete view of the data
Problems
What things you cannot do in production
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
25
• Rename the huge table and its indices
• Create an empty partitioned table with the old huge table’s name
• Create the required indices on the new partitioned table
– They will be created automatically for each new partition
• Create first new partition for new incoming data
• Attach the old table as a partition of the new table so it can be used normally*
• Move data out of the old table incrementally at our own pace
The plan
Take it step by step
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
26
-- Do this all in one transaction
BEGIN;
ALTER TABLE dailytotals RENAME TO dailytotals_legacy;
ALTER INDEX dailytotals_batchid RENAME TO dailytotals_legacy_batchid;
ALTER INDEX …
…
Rename the huge table and its indices
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
27
CREATE TABLE dailytotals (
totalid BIGINT NOT NULL DEFAULT nextval('dailytotals_totalid_seq')
, totaldate DATE NOT NULL
, totalsum BIGINT
…
, batchid BIGINT NOT NULL
)
PARTITION BY RANGE (totaldate);
CREATE INDEX dailytotals_batchid ON dailytotals (batchid);
…
Create empty partitioned table & indices
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
28
CREATE TABLE dailytotals_202106
PARTITION OF dailytotals
FOR VALUES FROM ('2021-06-01') TO ('2021-07-01');
Create partition for new incoming data
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
29
DO $$
DECLARE earliest DATE;
DECLARE latest DATE;
BEGIN
-- Set boundaries
SELECT min(totaldate) INTO earliest FROM dailytotals_legacy;
latest := '2021-06-01'::DATE;
Attach old table as a partition (i)
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
30
-- HACK HACK HACK (only because we know and trust our data)
ALTER TABLE dailytotals_legacy
ADD CONSTRAINT dailytotals_legacy_totaldate
CHECK (totaldate >= earliest AND totaldate < latest)
NOT VALID;
-- You should not touch pg_catalog directly 😕
UPDATE pg_constraint
SET convalidated = true
WHERE conname = 'dailytotals_legacy_totaldate';
Attach old table as a partition (ii)
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
31
ALTER TABLE dailytotals
ATTACH PARTITION dailytotals_legacy
FOR VALUES FROM (earliest) TO (latest);
END;
$$ LANGUAGE PLPGSQL;
COMMIT;
Attach old table as a partition (iii)
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
32
• For instance, during quiet hours for the system, in scheduled batch jobs, etc.
WITH rows AS (
DELETE FROM dailytotals_legacy d
WHERE (totaldate >= '2020-01-01' AND totaldate < '2021-01-01')
RETURNING d.* )
INSERT INTO dailytotals SELECT * FROM rows;
• Make sure the partition exists!
Move data into new table at our own pace
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
33
• PG13: Logical replication for partitioned tables, improved performance (joins, pruning)
• PG12: Performance (pruning, COPY), FK references for partitioned tables
• PG11: DEFAULT partition, UPDATE on partition key, Hash method, PKs, FKs, Indexes, Triggers
Partitioning improvements
Make sure you’re on the latest release so you have them!
© Copyright EnterpriseDB Corporation, 2021. All rights reserved.
34
• Know your data!
• Upgrade – be on the latest release!
• Partition before you get in deep water!
• Find me on Twitter: @vyruss
To conclude...

Más contenido relacionado

La actualidad más candente

Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
MongoDB sharded cluster. How to design your topology ?
MongoDB sharded cluster. How to design your topology ?MongoDB sharded cluster. How to design your topology ?
MongoDB sharded cluster. How to design your topology ?Mydbops
 
How to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better PerformanceHow to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better Performanceoysteing
 
What is new in PostgreSQL 14?
What is new in PostgreSQL 14?What is new in PostgreSQL 14?
What is new in PostgreSQL 14?Mydbops
 
PGConf.ASIA 2017 Logical Replication Internals (English)
PGConf.ASIA 2017 Logical Replication Internals (English)PGConf.ASIA 2017 Logical Replication Internals (English)
PGConf.ASIA 2017 Logical Replication Internals (English)Noriyoshi Shinoda
 
Indexing with MongoDB
Indexing with MongoDBIndexing with MongoDB
Indexing with MongoDBMongoDB
 
Postgresql database administration volume 1
Postgresql database administration volume 1Postgresql database administration volume 1
Postgresql database administration volume 1Federico Campoli
 
PostgreSQL Database Slides
PostgreSQL Database SlidesPostgreSQL Database Slides
PostgreSQL Database Slidesmetsarin
 
Introduction to PostgreSQL
Introduction to PostgreSQLIntroduction to PostgreSQL
Introduction to PostgreSQLJim Mlodgenski
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Histogram-in-Parallel-universe-of-MySQL-and-MariaDB
Histogram-in-Parallel-universe-of-MySQL-and-MariaDBHistogram-in-Parallel-universe-of-MySQL-and-MariaDB
Histogram-in-Parallel-universe-of-MySQL-and-MariaDBMydbops
 
EDB Postgres DBA Best Practices
EDB Postgres DBA Best PracticesEDB Postgres DBA Best Practices
EDB Postgres DBA Best PracticesEDB
 
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...Databricks
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDatabricks
 
MySQL8.0_performance_schema.pptx
MySQL8.0_performance_schema.pptxMySQL8.0_performance_schema.pptx
MySQL8.0_performance_schema.pptxNeoClova
 
OpenGurukul : Database : PostgreSQL
OpenGurukul : Database : PostgreSQLOpenGurukul : Database : PostgreSQL
OpenGurukul : Database : PostgreSQLOpen Gurukul
 

La actualidad más candente (20)

Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
MongoDB sharded cluster. How to design your topology ?
MongoDB sharded cluster. How to design your topology ?MongoDB sharded cluster. How to design your topology ?
MongoDB sharded cluster. How to design your topology ?
 
PostgreSQL replication
PostgreSQL replicationPostgreSQL replication
PostgreSQL replication
 
Postgresql
PostgresqlPostgresql
Postgresql
 
How to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better PerformanceHow to Analyze and Tune MySQL Queries for Better Performance
How to Analyze and Tune MySQL Queries for Better Performance
 
What is new in PostgreSQL 14?
What is new in PostgreSQL 14?What is new in PostgreSQL 14?
What is new in PostgreSQL 14?
 
PGConf.ASIA 2017 Logical Replication Internals (English)
PGConf.ASIA 2017 Logical Replication Internals (English)PGConf.ASIA 2017 Logical Replication Internals (English)
PGConf.ASIA 2017 Logical Replication Internals (English)
 
Indexing with MongoDB
Indexing with MongoDBIndexing with MongoDB
Indexing with MongoDB
 
Postgresql database administration volume 1
Postgresql database administration volume 1Postgresql database administration volume 1
Postgresql database administration volume 1
 
PostgreSQL Database Slides
PostgreSQL Database SlidesPostgreSQL Database Slides
PostgreSQL Database Slides
 
Introduction to PostgreSQL
Introduction to PostgreSQLIntroduction to PostgreSQL
Introduction to PostgreSQL
 
Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
PostgreSQL
PostgreSQLPostgreSQL
PostgreSQL
 
Histogram-in-Parallel-universe-of-MySQL-and-MariaDB
Histogram-in-Parallel-universe-of-MySQL-and-MariaDBHistogram-in-Parallel-universe-of-MySQL-and-MariaDB
Histogram-in-Parallel-universe-of-MySQL-and-MariaDB
 
EDB Postgres DBA Best Practices
EDB Postgres DBA Best PracticesEDB Postgres DBA Best Practices
EDB Postgres DBA Best Practices
 
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
From Query Plan to Query Performance: Supercharging your Apache Spark Queries...
 
Dynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache SparkDynamic Partition Pruning in Apache Spark
Dynamic Partition Pruning in Apache Spark
 
MySQL8.0_performance_schema.pptx
MySQL8.0_performance_schema.pptxMySQL8.0_performance_schema.pptx
MySQL8.0_performance_schema.pptx
 
Mongo db intro.pptx
Mongo db intro.pptxMongo db intro.pptx
Mongo db intro.pptx
 
OpenGurukul : Database : PostgreSQL
OpenGurukul : Database : PostgreSQLOpenGurukul : Database : PostgreSQL
OpenGurukul : Database : PostgreSQL
 

Similar a Practical Partitioning in Production with Postgres

Practical Partitioning in Production with Postgres
Practical Partitioning in Production with PostgresPractical Partitioning in Production with Postgres
Practical Partitioning in Production with PostgresJimmy Angelakos
 
New and Improved Features in PostgreSQL 13
New and Improved Features in PostgreSQL 13New and Improved Features in PostgreSQL 13
New and Improved Features in PostgreSQL 13EDB
 
Don't Do This [FOSDEM 2023]
Don't Do This [FOSDEM 2023]Don't Do This [FOSDEM 2023]
Don't Do This [FOSDEM 2023]Jimmy Angelakos
 
PostgreSQL 13 is Coming - Find Out What's New!
PostgreSQL 13 is Coming - Find Out What's New!PostgreSQL 13 is Coming - Find Out What's New!
PostgreSQL 13 is Coming - Find Out What's New!EDB
 
Large Table Partitioning with PostgreSQL and Django
 Large Table Partitioning with PostgreSQL and Django Large Table Partitioning with PostgreSQL and Django
Large Table Partitioning with PostgreSQL and DjangoEDB
 
ORACLE 12C-New-Features
ORACLE 12C-New-FeaturesORACLE 12C-New-Features
ORACLE 12C-New-FeaturesNavneet Upneja
 
IDUG NA 2014 / 11 tips for DB2 11 for z/OS
IDUG NA 2014 / 11 tips for DB2 11 for z/OSIDUG NA 2014 / 11 tips for DB2 11 for z/OS
IDUG NA 2014 / 11 tips for DB2 11 for z/OSCuneyt Goksu
 
GLOC 2014 NEOOUG - Oracle Database 12c New Features
GLOC 2014 NEOOUG - Oracle Database 12c New FeaturesGLOC 2014 NEOOUG - Oracle Database 12c New Features
GLOC 2014 NEOOUG - Oracle Database 12c New FeaturesBiju Thomas
 
Google - Bigtable
Google - BigtableGoogle - Bigtable
Google - Bigtable영원 서
 
Powering GIS Application with PostgreSQL and Postgres Plus
Powering GIS Application with PostgreSQL and Postgres Plus Powering GIS Application with PostgreSQL and Postgres Plus
Powering GIS Application with PostgreSQL and Postgres Plus Ashnikbiz
 
PostgreSQL - Decoding Partitions
PostgreSQL - Decoding PartitionsPostgreSQL - Decoding Partitions
PostgreSQL - Decoding PartitionsBeena Emerson
 
Bigdata netezza-ppt-apr2013-bhawani nandan prasad
Bigdata netezza-ppt-apr2013-bhawani nandan prasadBigdata netezza-ppt-apr2013-bhawani nandan prasad
Bigdata netezza-ppt-apr2013-bhawani nandan prasadBhawani N Prasad
 
Oracle 12 c new-features
Oracle 12 c new-featuresOracle 12 c new-features
Oracle 12 c new-featuresNavneet Upneja
 
Challenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopChallenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopDataWorks Summit
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGuang Xu
 
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAATemporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAACuneyt Goksu
 
Redefining tables online without surprises
Redefining tables online without surprisesRedefining tables online without surprises
Redefining tables online without surprisesNelson Calero
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeKent Graziano
 

Similar a Practical Partitioning in Production with Postgres (20)

Practical Partitioning in Production with Postgres
Practical Partitioning in Production with PostgresPractical Partitioning in Production with Postgres
Practical Partitioning in Production with Postgres
 
New and Improved Features in PostgreSQL 13
New and Improved Features in PostgreSQL 13New and Improved Features in PostgreSQL 13
New and Improved Features in PostgreSQL 13
 
Don't Do This [FOSDEM 2023]
Don't Do This [FOSDEM 2023]Don't Do This [FOSDEM 2023]
Don't Do This [FOSDEM 2023]
 
PostgreSQL 13 is Coming - Find Out What's New!
PostgreSQL 13 is Coming - Find Out What's New!PostgreSQL 13 is Coming - Find Out What's New!
PostgreSQL 13 is Coming - Find Out What's New!
 
Large Table Partitioning with PostgreSQL and Django
 Large Table Partitioning with PostgreSQL and Django Large Table Partitioning with PostgreSQL and Django
Large Table Partitioning with PostgreSQL and Django
 
ORACLE 12C-New-Features
ORACLE 12C-New-FeaturesORACLE 12C-New-Features
ORACLE 12C-New-Features
 
IDUG NA 2014 / 11 tips for DB2 11 for z/OS
IDUG NA 2014 / 11 tips for DB2 11 for z/OSIDUG NA 2014 / 11 tips for DB2 11 for z/OS
IDUG NA 2014 / 11 tips for DB2 11 for z/OS
 
GLOC 2014 NEOOUG - Oracle Database 12c New Features
GLOC 2014 NEOOUG - Oracle Database 12c New FeaturesGLOC 2014 NEOOUG - Oracle Database 12c New Features
GLOC 2014 NEOOUG - Oracle Database 12c New Features
 
Google - Bigtable
Google - BigtableGoogle - Bigtable
Google - Bigtable
 
Powering GIS Application with PostgreSQL and Postgres Plus
Powering GIS Application with PostgreSQL and Postgres Plus Powering GIS Application with PostgreSQL and Postgres Plus
Powering GIS Application with PostgreSQL and Postgres Plus
 
PostgreSQL - Decoding Partitions
PostgreSQL - Decoding PartitionsPostgreSQL - Decoding Partitions
PostgreSQL - Decoding Partitions
 
Gcp data engineer
Gcp data engineerGcp data engineer
Gcp data engineer
 
Bigdata netezza-ppt-apr2013-bhawani nandan prasad
Bigdata netezza-ppt-apr2013-bhawani nandan prasadBigdata netezza-ppt-apr2013-bhawani nandan prasad
Bigdata netezza-ppt-apr2013-bhawani nandan prasad
 
Oracle 12 c new-features
Oracle 12 c new-featuresOracle 12 c new-features
Oracle 12 c new-features
 
Challenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on HadoopChallenges of Implementing an Advanced SQL Engine on Hadoop
Challenges of Implementing an Advanced SQL Engine on Hadoop
 
GCP Data Engineer cheatsheet
GCP Data Engineer cheatsheetGCP Data Engineer cheatsheet
GCP Data Engineer cheatsheet
 
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAATemporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
Temporal Tables, Transparent Archiving in DB2 for z/OS and IDAA
 
Redefining tables online without surprises
Redefining tables online without surprisesRedefining tables online without surprises
Redefining tables online without surprises
 
Big table
Big tableBig table
Big table
 
Intro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on SnowflakeIntro to Data Vault 2.0 on Snowflake
Intro to Data Vault 2.0 on Snowflake
 

Más de EDB

Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaSCloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaSEDB
 
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr UnternehmenDie 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr UnternehmenEDB
 
Migre sus bases de datos Oracle a la nube
Migre sus bases de datos Oracle a la nube Migre sus bases de datos Oracle a la nube
Migre sus bases de datos Oracle a la nube EDB
 
EFM Office Hours - APJ - July 29, 2021
EFM Office Hours - APJ - July 29, 2021EFM Office Hours - APJ - July 29, 2021
EFM Office Hours - APJ - July 29, 2021EDB
 
Benchmarking Cloud Native PostgreSQL
Benchmarking Cloud Native PostgreSQLBenchmarking Cloud Native PostgreSQL
Benchmarking Cloud Native PostgreSQLEDB
 
Las Variaciones de la Replicación de PostgreSQL
Las Variaciones de la Replicación de PostgreSQLLas Variaciones de la Replicación de PostgreSQL
Las Variaciones de la Replicación de PostgreSQLEDB
 
NoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQLNoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQLEDB
 
Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?EDB
 
Data Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQLData Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQLEDB
 
A Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINA Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINEDB
 
IOT with PostgreSQL
IOT with PostgreSQLIOT with PostgreSQL
IOT with PostgreSQLEDB
 
A Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQLA Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQLEDB
 
Psql is awesome!
Psql is awesome!Psql is awesome!
Psql is awesome!EDB
 
EDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJEDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJEDB
 
Comment sauvegarder correctement vos données
Comment sauvegarder correctement vos donnéesComment sauvegarder correctement vos données
Comment sauvegarder correctement vos donnéesEDB
 
Cloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - ItalianoCloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - ItalianoEDB
 
New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13EDB
 
Best Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLBest Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLEDB
 
Cloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJCloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJEDB
 
Best Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLBest Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLEDB
 

Más de EDB (20)

Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaSCloud Migration Paths: Kubernetes, IaaS, or DBaaS
Cloud Migration Paths: Kubernetes, IaaS, or DBaaS
 
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr UnternehmenDie 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
Die 10 besten PostgreSQL-Replikationsstrategien für Ihr Unternehmen
 
Migre sus bases de datos Oracle a la nube
Migre sus bases de datos Oracle a la nube Migre sus bases de datos Oracle a la nube
Migre sus bases de datos Oracle a la nube
 
EFM Office Hours - APJ - July 29, 2021
EFM Office Hours - APJ - July 29, 2021EFM Office Hours - APJ - July 29, 2021
EFM Office Hours - APJ - July 29, 2021
 
Benchmarking Cloud Native PostgreSQL
Benchmarking Cloud Native PostgreSQLBenchmarking Cloud Native PostgreSQL
Benchmarking Cloud Native PostgreSQL
 
Las Variaciones de la Replicación de PostgreSQL
Las Variaciones de la Replicación de PostgreSQLLas Variaciones de la Replicación de PostgreSQL
Las Variaciones de la Replicación de PostgreSQL
 
NoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQLNoSQL and Spatial Database Capabilities using PostgreSQL
NoSQL and Spatial Database Capabilities using PostgreSQL
 
Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?Is There Anything PgBouncer Can’t Do?
Is There Anything PgBouncer Can’t Do?
 
Data Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQLData Analysis with TensorFlow in PostgreSQL
Data Analysis with TensorFlow in PostgreSQL
 
A Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAINA Deeper Dive into EXPLAIN
A Deeper Dive into EXPLAIN
 
IOT with PostgreSQL
IOT with PostgreSQLIOT with PostgreSQL
IOT with PostgreSQL
 
A Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQLA Journey from Oracle to PostgreSQL
A Journey from Oracle to PostgreSQL
 
Psql is awesome!
Psql is awesome!Psql is awesome!
Psql is awesome!
 
EDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJEDB 13 - New Enhancements for Security and Usability - APJ
EDB 13 - New Enhancements for Security and Usability - APJ
 
Comment sauvegarder correctement vos données
Comment sauvegarder correctement vos donnéesComment sauvegarder correctement vos données
Comment sauvegarder correctement vos données
 
Cloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - ItalianoCloud Native PostgreSQL - Italiano
Cloud Native PostgreSQL - Italiano
 
New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13
 
Best Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLBest Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQL
 
Cloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJCloud Native PostgreSQL - APJ
Cloud Native PostgreSQL - APJ
 
Best Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQLBest Practices in Security with PostgreSQL
Best Practices in Security with PostgreSQL
 

Último

Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 

Último (20)

Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

Practical Partitioning in Production with Postgres

  • 1. Practical Partitioning in Production with Postgres Jimmy Angelakos Senior PostgreSQL Architect Postgres London 12/05/2021
  • 2. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 2 We’ll be looking at: • Intro to Partitioning in PostgreSQL • Why? • How? • Practical Example
  • 4. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 4 • RDBMS context: division of a table into distinct independent tables • Horizontal partitioning (by row) – different rows in different tables • Why? – Easier to manage – Performance What is partitioning?
  • 5. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 5 • Has had partitioning for quite some time now PG 8.1 (2005) … – Inheritance-based – Why haven’t I heard of this before? – It’s not great tbh... • Declarative Partitioning: PG 10 (2017) – Massive improvement Partitioning in PostgreSQL HISTORY
  • 6. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 6 Declarative Partitioning ( PG 10+ ) Specification of: By declaring a table (DDL): • Partitioning method • Partition key – Column(s) or expression(s) – Value determines data routing • Partition boundaries CREATE TABLE cust (id INT, name TEXT) PARTITION BY RANGE (id); CREATE TABLE cust_1000 PARTITION OF cust FOR VALUES FROM (1) TO (1000); • Partitions may be partitioned themselves (sub-partitioning)
  • 8. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 8 • Database size: unlimited ✅ • Tables per database: 1.4 billion ✅ • Table size: 32 TB 😐 – Default block size: 8192 bytes • Rows per table: depends – As many as can fit onto 4.2 billion blocks 😐 PostgreSQL limits (Hard limits, hard to reach)
  • 9. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 9 • Disk size limitations – You can put partitions on different tablespaces • Performance – Partition pruning – Table scans – Index scans – Hidden pitfalls of very large tables* What partitioning can help with (i) (Very large tables)
  • 10. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 10 • Maintenance – Deletions (some filesystems are bad at deleting large numbers of files) 🤭 – DROP TABLE cust_1000; – ALTER TABLE cust DETACH PARTITION cust_1000; • VACUUM – Bloat – Freezing → xid wraparound What partitioning can help with (ii) (Very large tables)
  • 11. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 11 • Magic bullet – No substitute for rational database design • Sharding – Not about putting part of the data on different nodes • Performance tuning – Unless you have one of the mentioned issues What partitioning is not
  • 12. How?
  • 13. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 13 • Get your calculator out – Data ingestion rate (both rows and size in bytes) – Projected increases (e.g. 25 locations projected to be 200 by end of year) – Data retention requirements • Will inform choice of partitioning method and key • For instance: 1440 measurements/day from each of 1000 sensors – extrapolate per year • Keep checking if this is valid and be prepared to revise Dimensioning Plan ahead!
  • 14. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 14 • Range: For key column(s) e.g. ranges of dates, identifiers, etc. – Lower end: inclusive, upper end: exclusive • List: Explicit key values stated for each partition • Hash (PG 11+): If you have a column with values close to unique – Define Modulus ( & remainder ) for number of almost-evenly-sized partitions Partitioning method Dimensioning usually makes this clearer
  • 15. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 15 • Analysis – Determine main keys used for retrieval from queries – Proper key selection enables partition pruning – Can use multiple columns for higher granularity (more partitions) • Desirable – High enough cardinality (range of values) for the number of partitions needed – A column that doesn’t change often, to avoid moving rows among partitions Partition Key selection Choose wisely - know your data!
  • 16. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 16 • Automatic creation of partitions – Create in advance – Use a cronjob • Imperative merging/splitting of partitions – Move rows manually • Sharding to different nodes – You may have to configure FDW manually What Postgres does not do ^core
  • 18. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 18 • Is your table too large to handle? • Can partitioning help? • What if it’s in constant use? Partitioning a live production system
  • 19. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 19 • OLTP workload, transactions keep flowing in – Table keeps increasing in size • VACUUM never ends – Has been running for a full month already… • Queries are getting slower – Not just because of sheer number of rows... The situation Huge 20 TB table
  • 20. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 20 • Postgres has 1GB segment size – Can only be changed at compilation time – 20 TB table = 20000 segments (files on disk) • Why is this a problem? – md.c → * Hidden performance pitfall (i) For VERY large tables
  • 21. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 21 ● This loops 20000 times every time you want to access a table page – Linked list of segments ● Code from PG 9.6 ● It has been heavily optimised recently (caching, etc). ● Still needs to run a lot of times * Hidden performance pitfall (ii)
  • 22. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 22 • Need to partition the huge table – Dimensioning – Partition method – Partition key • Make sure we’re on the latest version (PG 13) – Get latest features & performance enhancements So what do we do? Next steps
  • 23. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 23 • Dimensioning – One partition per month will be about 30GB of data, so acceptable size • Method, Key – Candidate key is transaction date, which we can partition by range – Check that there are no data errors (e.g. dates in the future when they shouldn’t be) • Partition sizes don’t have to be equal – We can partition older, less often accessed data by year What is our table like? It holds daily transaction totals for each point of sales
  • 24. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 24 • Lock the table totally (ACCESS EXCLUSIVE) or prevent writes – People will start yelling, and they will be right • Cause excessive load on the system (e.g. I/O) or cause excessive disk space usage – Can’t copy whole 20 TB table into empty partitioned table – See above about yelling • Present an inconsistent or incomplete view of the data Problems What things you cannot do in production
  • 25. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 25 • Rename the huge table and its indices • Create an empty partitioned table with the old huge table’s name • Create the required indices on the new partitioned table – They will be created automatically for each new partition • Create first new partition for new incoming data • Attach the old table as a partition of the new table so it can be used normally* • Move data out of the old table incrementally at our own pace The plan Take it step by step
  • 26. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 26 -- Do this all in one transaction BEGIN; ALTER TABLE dailytotals RENAME TO dailytotals_legacy; ALTER INDEX dailytotals_batchid RENAME TO dailytotals_legacy_batchid; ALTER INDEX … … Rename the huge table and its indices
  • 27. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 27 CREATE TABLE dailytotals ( totalid BIGINT NOT NULL DEFAULT nextval('dailytotals_totalid_seq') , totaldate DATE NOT NULL , totalsum BIGINT … , batchid BIGINT NOT NULL ) PARTITION BY RANGE (totaldate); CREATE INDEX dailytotals_batchid ON dailytotals (batchid); … Create empty partitioned table & indices
  • 28. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 28 CREATE TABLE dailytotals_202106 PARTITION OF dailytotals FOR VALUES FROM ('2021-06-01') TO ('2021-07-01'); Create partition for new incoming data
  • 29. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 29 DO $$ DECLARE earliest DATE; DECLARE latest DATE; BEGIN -- Set boundaries SELECT min(totaldate) INTO earliest FROM dailytotals_legacy; latest := '2021-06-01'::DATE; Attach old table as a partition (i)
  • 30. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 30 -- HACK HACK HACK (only because we know and trust our data) ALTER TABLE dailytotals_legacy ADD CONSTRAINT dailytotals_legacy_totaldate CHECK (totaldate >= earliest AND totaldate < latest) NOT VALID; -- You should not touch pg_catalog directly 😕 UPDATE pg_constraint SET convalidated = true WHERE conname = 'dailytotals_legacy_totaldate'; Attach old table as a partition (ii)
  • 31. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 31 ALTER TABLE dailytotals ATTACH PARTITION dailytotals_legacy FOR VALUES FROM (earliest) TO (latest); END; $$ LANGUAGE PLPGSQL; COMMIT; Attach old table as a partition (iii)
  • 32. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 32 • For instance, during quiet hours for the system, in scheduled batch jobs, etc. WITH rows AS ( DELETE FROM dailytotals_legacy d WHERE (totaldate >= '2020-01-01' AND totaldate < '2021-01-01') RETURNING d.* ) INSERT INTO dailytotals SELECT * FROM rows; • Make sure the partition exists! Move data into new table at our own pace
  • 33. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 33 • PG13: Logical replication for partitioned tables, improved performance (joins, pruning) • PG12: Performance (pruning, COPY), FK references for partitioned tables • PG11: DEFAULT partition, UPDATE on partition key, Hash method, PKs, FKs, Indexes, Triggers Partitioning improvements Make sure you’re on the latest release so you have them!
  • 34. © Copyright EnterpriseDB Corporation, 2021. All rights reserved. 34 • Know your data! • Upgrade – be on the latest release! • Partition before you get in deep water! • Find me on Twitter: @vyruss To conclude...