This presentation covers a number of the ways that you can tune PostgreSQL to better handle high write workloads. We will cover both application and database tuning methods, as each type can have substantial benefits but can also interact in unexpected ways when you are operating at scale. On the application side we will look at write batching, use of GUIDs, general index structure, the cost of additional indexes, and the impact of working set size. For the database we will see how WAL compression, autovacuum, and checkpoint settings, as well as a number of other configuration parameters, can greatly affect the write performance of your database and application.
8. Insert Test
Test Table
• UUID PK – Random
• ID int – Right Lean Sequence
• VARCHAR(100) – Random
• VARCHAR(50) – Small Set of Words
• INT – Random
• INT – Random (smaller set)
• BOOLEAN – Random (50/50)
• BOOLEAN – Somewhat Random (75/25)
• Timestamp – Right Lean
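As a rough illustration of the column mix above, the sketch below generates rows with the same kinds of distributions. Only `id`, `e` and `last_updated` are column names that appear later in the deck; the other names, the word list and the value ranges are assumptions made for this example, not the benchmark's actual generator.

```python
import itertools
import random
import string
import uuid
from datetime import datetime, timedelta, timezone

WORDS = ["alpha", "bravo", "charlie", "delta", "echo"]  # hypothetical small word set


def make_rows(seed=0):
    """Yield dicts shaped like the test table (most column names are hypothetical)."""
    rng = random.Random(seed)
    start = datetime(2017, 9, 7, tzinfo=timezone.utc)
    for n in itertools.count(1):
        yield {
            # Random UUID: each new key lands at an arbitrary place in its index.
            "pk": uuid.UUID(int=rng.getrandbits(128), version=4),
            # Sequence: always the largest value so far ("right lean").
            "id": n,
            "a": "".join(rng.choices(string.ascii_letters, k=100)),  # random text
            "b": rng.choice(WORDS),                                  # small set of words
            "c": rng.randrange(2**31),                               # random int
            "d": rng.randrange(1000),                                # random int, smaller set
            "e": rng.random() < 0.5,                                 # 50/50 boolean
            "f": rng.random() < 0.75,                                # 75/25 boolean
            # Timestamp that only moves forward ("right lean").
            "last_updated": start + timedelta(milliseconds=n),
        }
```

Taking a few rows with `itertools.islice(make_rows(), 3)` shows the split that matters for the rest of the talk: `id` and `last_updated` always grow, while `pk`, `a` and `c` jump around.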
10. Update Test
Test Table
• UUID PK – Random
• ID int – Right Lean Sequence
• VARCHAR(100) – Random
• VARCHAR(50) – Small Set of Words
• INT – Random
• INT – Random (smaller set)
• BOOLEAN – Random (50/50)
• BOOLEAN – Somewhat Random (75/25)
• Timestamp – Right Lean
UPDATE #1
UPDATE #2
24. Full Page Writes
[Diagram: update t set y = 6; changes a block in memory. PostgreSQL writes the full block to the WAL, which is then archived, and the checkpoint writes the block to the datafile. Size labels: 4K, 4K, 8K.]
During crash recovery PostgreSQL uses the FPW block in the WAL to replace the bad checkpointed block.
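The effect of full page writes on WAL volume can be sketched with a small model: the first change to a block after a checkpoint logs the whole block, and later changes to the same block log only a small record. The 64-byte record size and the rule of a checkpoint every N touches are assumptions chosen for illustration; real record sizes vary.

```python
BLOCK_SIZE = 8192   # default PostgreSQL block size
RECORD_SIZE = 64    # hypothetical size of a small WAL record, illustration only


def wal_bytes(touches, checkpoint_every):
    """Return modelled WAL bytes for a sequence of touched block numbers.

    The first touch of a block after a checkpoint logs the full block
    (a full page write); later touches log only a small record.
    """
    logged_since_checkpoint = set()
    total = 0
    for i, blk in enumerate(touches):
        if i % checkpoint_every == 0:
            logged_since_checkpoint.clear()  # a checkpoint resets the FPW state
        if blk in logged_since_checkpoint:
            total += RECORD_SIZE
        else:
            total += BLOCK_SIZE + RECORD_SIZE
            logged_since_checkpoint.add(blk)
    return total
```

In this model, four updates to one block between checkpoints cost one full block plus four small records, while four updates to four different blocks cost four full blocks.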
29. WAL throughput – Dump of WAL
At the beginning of the run
Btree d: INSERT_LEAF off 184, blkref #0: rel 1663/32772/32779 blk 300
Btree d: INSERT_LEAF off 110, blkref #0: rel 1663/32772/32784 blk 1092
Btree d: INSERT_LEAF off 41, blkref #0: rel 1663/32772/32782 blk 5752
Btree d: INSERT_LEAF off 40, blkref #0: rel 1663/32772/32782 blk 8000
Btree d: INSERT_LEAF off 89, blkref #0: rel 1663/32772/32779 blk 1757
Btree d: INSERT_LEAF off 363, blkref #0: rel 1663/32772/32781 blk 1355
Btree d: INSERT_LEAF off 77, blkref #0: rel 1663/32772/32783 blk 4
Btree d: INSERT_LEAF off 94, blkref #0: rel 1663/32772/32779 blk 2083
Btree d: INSERT_LEAF off 362, blkref #0: rel 1663/32772/32781 blk 1355
Btree d: INSERT_LEAF off 10, blkref #0: rel 1663/32772/32782 blk 7687
Btree d: INSERT_LEAF off 365, blkref #0: rel 1663/32772/32781 blk 1355
Btree d: INSERT_LEAF off 114, blkref #0: rel 1663/32772/32784 blk 791
Btree d: INSERT_LEAF off 2, blkref #0: rel 1663/32772/32783 blk 2213
Btree d: INSERT_LEAF off 2, blkref #0: rel 1663/32772/32785 blk 1639
Btree d: INSERT_LEAF off 209, blkref #0: rel 1663/32772/32784 blk 1433
Transaction d: COMMIT 2017-09-07 01:08:55.354810 UTC
Later in the run
Btree d: INSERT_LEAF off 216, blkref #0: rel 1663/16395/16407 blk 14331 FPW
Btree d: INSERT_LEAF off 123, blkref #0: rel 1663/16395/16406 blk 5
Btree d: INSERT_LEAF off 139, blkref #0: rel 1663/16395/16404 blk 25954
Btree d: INSERT_LEAF off 59, blkref #0: rel 1663/16395/16407 blk 17944 FPW
Btree d: INSERT_LEAF off 45, blkref #0: rel 1663/16395/16408 blk 17
Btree d: INSERT_LEAF off 252, blkref #0: rel 1663/16395/16404 blk 25954
Btree d: INSERT_LEAF off 135, blkref #0: rel 1663/16395/16408 blk 7
Btree d: INSERT_LEAF off 5, blkref #0: rel 1663/16395/16405 blk 131373 FPW
Btree d: INSERT_LEAF off 175, blkref #0: rel 1663/16395/16404 blk 25954
Btree d: INSERT_LEAF off 19, blkref #0: rel 1663/16395/16405 blk 40974 FPW
Btree d: INSERT_LEAF off 2, blkref #0: rel 1663/16395/16409 blk 1
Btree d: INSERT_LEAF off 19, blkref #0: rel 1663/16395/16405 blk 143873 FPW
Btree d: INSERT_LEAF off 123, blkref #0: rel 1663/16395/16406 blk 5
Btree d: INSERT_LEAF off 14, blkref #0: rel 1663/16395/16405 blk 37468 FPW
Btree d: INSERT_LEAF off 2, blkref #0: rel 1663/16395/16409 blk 1
Btree d: INSERT_LEAF off 84, blkref #0: rel 1663/16395/16407 blk 2696
Btree d: INSERT_LEAF off 149, blkref #0: rel 1663/16395/16407 blk 1401 FPW
Btree d: INSERT_LEAF off 2, blkref #0: rel 1663/16395/16406 blk 39718
Btree d: INSERT_LEAF off 2, blkref #0: rel 1663/16395/16410 blk 29411
Btree d: INSERT_LEAF off 2, blkref #0: rel 1663/16395/16408 blk 29370
Btree d: INSERT_LEAF off 123, blkref #0: rel 1663/16395/16406 blk 5
Btree d: INSERT_LEAF off 2, blkref #0: rel 1663/16395/16409 blk 1
Btree d: INSERT_LEAF off 24, blkref #0: rel 1663/16395/16405 blk 69991 FPW
Transaction d: COMMIT 2017-09-07 01:04:32.650362 UTC
1K vs 48K of data
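One way to quantify the difference between the two dumps is to count how many block-referencing records carry the FPW marker. The sketch below works on lines in the abbreviated format shown on the slide; the helper name `summarize` is made up for this example, and the sample lists hold only a few of the lines above.

```python
def summarize(dump_lines):
    """Return (records with a block reference, records flagged FPW)."""
    records = [ln for ln in dump_lines if "blkref" in ln]
    fpw = [ln for ln in records if ln.rstrip().endswith("FPW")]
    return len(records), len(fpw)


# A few lines copied from the dump at the beginning of the run.
early = [
    "Btree d: INSERT_LEAF off 184, blkref #0: rel 1663/32772/32779 blk 300",
    "Btree d: INSERT_LEAF off 110, blkref #0: rel 1663/32772/32784 blk 1092",
    "Transaction d: COMMIT 2017-09-07 01:08:55.354810 UTC",
]

# A few lines copied from the dump later in the run.
late = [
    "Btree d: INSERT_LEAF off 216, blkref #0: rel 1663/16395/16407 blk 14331 FPW",
    "Btree d: INSERT_LEAF off 123, blkref #0: rel 1663/16395/16406 blk 5",
    "Btree d: INSERT_LEAF off 59, blkref #0: rel 1663/16395/16407 blk 17944 FPW",
]

print(summarize(early))  # records, FPW count for the early sample
print(summarize(late))   # records, FPW count for the late sample
```

The COMMIT line has no block reference, so it is not counted as a record here.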
38. WAL Compression
[Diagram: update t set y = 6; changes a block in memory. The checkpoint writes the block on disk to the datafile; a compressed block is written to the WAL and to the archive.]
Lot of Random Values = Poor Compression
41. Why didn’t WAL compression help
Regular – Average 5.7KB per FPW
Compressed – Average 5.2KB per FPW
Random Data does not compress well
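The point about random data can be checked with any general-purpose compressor. The sketch below uses Python's zlib as a stand-in, which is an assumption for illustration: it is not the compressor PostgreSQL uses for WAL, but the contrast between random and repetitive content holds for compressors in general.

```python
import os
import zlib

BLOCK_SIZE = 8192  # default PostgreSQL block size


def compressed_size(block):
    """Return the zlib-compressed size of a block in bytes."""
    return len(zlib.compress(block))


# A block full of random bytes, like pages holding random UUIDs and strings.
random_block = os.urandom(BLOCK_SIZE)

# A block full of repeated content, like a column with a small set of words.
repetitive_block = (b"word " * 2000)[:BLOCK_SIZE]

print("random:    ", compressed_size(random_block), "bytes")
print("repetitive:", compressed_size(repetitive_block), "bytes")
```

The random block comes out at roughly its original size, while the repetitive block shrinks to a small fraction of it.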
44. Insert Workload – PostgreSQL 9.6
[Chart: inserts per second (0 to 30,000) over minutes (1 to 271). Series: BASE, WAL Compression, 16GB Max WAL]
max_wal_size=16GB
More Blocks + Random Inserts = More FPW
45. Randomness and Size of the Data
Assume 10K Random Updates per Second
Assume a checkpoint every 60 seconds
10K x 60 = 600K block touches between checkpoints
1GB table is ~130K blocks – every block is touched about 4 times
100GB table is ~13M blocks – low chance of touching the same block twice
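The arithmetic above can be worked through directly. The sketch below assumes 8KB blocks and that every update touches one block chosen uniformly at random, which is a simplification; the function name is made up for this example.

```python
BLOCK_SIZE = 8 * 1024  # bytes per block (PostgreSQL default)
GB = 1024 ** 3


def touched_blocks(table_bytes, touches):
    """Return (blocks in table, expected number of distinct blocks touched).

    Assumes each touch picks a block uniformly at random, so the expected
    number of distinct blocks is N * (1 - (1 - 1/N) ** touches).
    """
    blocks = table_bytes // BLOCK_SIZE
    expected_distinct = blocks * (1 - (1 - 1 / blocks) ** touches)
    return blocks, expected_distinct


touches = 10_000 * 60  # 10K updates per second, checkpoint every 60 seconds

for size_gb in (1, 100):
    blocks, distinct = touched_blocks(size_gb * GB, touches)
    print(f"{size_gb}GB: {blocks:,} blocks, "
          f"{touches / blocks:.2f} touches per block, "
          f"{distinct / touches:.0%} of touches hit a block for the first time")
```

Under these assumptions the 1GB table caps the number of full page writes at its block count, because the same blocks are hit repeatedly, while in the 100GB table nearly every update lands on a block not yet touched since the checkpoint and so produces a full page write.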
61. Remember those 9 Indexes – Cut it to 6
Test Table
• UUID PK – Random
• ID int – Right Lean Sequence
• VARCHAR(100) – Random
• VARCHAR(50) – Small Set of Words
• INT – Random
• INT – Random (smaller set)
• BOOLEAN – Random (50/50)
• BOOLEAN – Somewhat Random (75/25)
• Timestamp – Right Lean
Remove Index
Remove Index
Remove Index – Allows HOT Update
67. HOT (Heap-Only-Tuple) Update
Not HOT (the update changes indexed column f to f1)
Indexes on a, b, f
Heap before: a, b, c, d, e, f
Heap after: a, b, c, d, e, f1
New index entries: a’, b’, f1
HOT (the update changes non-indexed column c to c1)
Indexes on a, b, f
Heap before: a, b, c, d, e, f
Heap after: a, b, c1, d, e, f
No new index entries
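The rule the diagram illustrates can be written as a small check: an update can be HOT only if it changes no indexed column and the new row version fits in the same block. This is a simplified sketch; the function name and the `page_has_free_space` flag are made up for the example.

```python
def is_hot_candidate(changed_columns, indexed_columns, page_has_free_space=True):
    """Return True if an update could be a heap-only-tuple (HOT) update.

    Simplified rule: no changed column is covered by an index, and the
    new row version fits in the same heap block as the old one.
    """
    touches_index = bool(set(changed_columns) & set(indexed_columns))
    return page_has_free_space and not touches_index


indexed = {"a", "b", "f"}  # the indexes from the diagram

print(is_hot_candidate({"c"}, indexed))  # c is not indexed
print(is_hot_candidate({"f"}, indexed))  # f is indexed
```

This is why dropping an index on a frequently updated column can turn that column's updates into HOT updates.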
70. HOT Updates – Looking at FPW in the logs
HOT Updated
Heap 14/ 68, , d: HOT_UPDATE off 19 xmax 2327993188 ; new off 3 xmax 0, blkref #0: rel 1663/41083/41086 blk 28
XLOG 0/ 3368, , d: FPI_FOR_HINT , blkref #0: rel 1663/41083/41092 blk 1492899 FPW
Transaction 8/ 34, , d: COMMIT 2017-09-07 00:07:17.532647 UTC
Non HOT Update
Heap 14/ 75, , d: UPDATE off 67 xmax 2327993195 ; new off 7 xmax 0, blkref #0: rel 1663/41083/41086 blk 285
XLOG 0/ 2774, , d: FPI_FOR_HINT , blkref #0: rel 1663/41083/41090 blk 7039952 FPW
Btree 2/ 120, , d: INSERT_LEAF off 17, blkref #0: rel 1663/41083/41090 blk 7039952
XLOG 0/ 3150, , d: FPI_FOR_HINT , blkref #0: rel 1663/41083/41092 blk 29 FPW
Btree 2/ 64, , d: INSERT_LEAF off 205, blkref #0: rel 1663/41083/41092 blk 29
Btree 2/ 2639, , d: INSERT_LEAF off 73, blkref #0: rel 1663/41083/41093 blk 4 FPW
Btree 2/ 3148, , d: INSERT_LEAF off 2, blkref #0: rel 1663/41083/41094 blk 1 FPW
Btree 2/ 5099, , d: INSERT_LEAF off 364, blkref #0: rel 1663/41083/41095 blk 4237904 FPW
Transaction 8/ 34, , d: COMMIT 2017-09-07 00:24:29.427017 UTC
WAL per transaction: 3.4K (HOT) vs 16.7K (non HOT)
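The two totals can be reproduced by adding up the record lengths in the log lines above. The sketch assumes that the pair of numbers after the resource manager name is the record length and total length, as in the dump tool's "len (rec/tot)" field, and sums the second number; the helper name is made up for this example.

```python
import re

# Matches e.g. "Heap 14/ 68, , d: ..." and captures the total length (68).
LEN_RE = re.compile(r"^(\w+)\s+(\d+)/\s*(\d+),")


def total_wal_bytes(lines):
    """Sum the total record length over all lines that match the format."""
    total = 0
    for ln in lines:
        m = LEN_RE.match(ln)
        if m:
            total += int(m.group(3))
    return total


hot = [
    "Heap 14/ 68, , d: HOT_UPDATE off 19 xmax 2327993188 ; new off 3 xmax 0, blkref #0: rel 1663/41083/41086 blk 28",
    "XLOG 0/ 3368, , d: FPI_FOR_HINT , blkref #0: rel 1663/41083/41092 blk 1492899 FPW",
    "Transaction 8/ 34, , d: COMMIT 2017-09-07 00:07:17.532647 UTC",
]

non_hot = [
    "Heap 14/ 75, , d: UPDATE off 67 xmax 2327993195 ; new off 7 xmax 0, blkref #0: rel 1663/41083/41086 blk 285",
    "XLOG 0/ 2774, , d: FPI_FOR_HINT , blkref #0: rel 1663/41083/41090 blk 7039952 FPW",
    "Btree 2/ 120, , d: INSERT_LEAF off 17, blkref #0: rel 1663/41083/41090 blk 7039952",
    "XLOG 0/ 3150, , d: FPI_FOR_HINT , blkref #0: rel 1663/41083/41092 blk 29 FPW",
    "Btree 2/ 64, , d: INSERT_LEAF off 205, blkref #0: rel 1663/41083/41092 blk 29",
    "Btree 2/ 2639, , d: INSERT_LEAF off 73, blkref #0: rel 1663/41083/41093 blk 4 FPW",
    "Btree 2/ 3148, , d: INSERT_LEAF off 2, blkref #0: rel 1663/41083/41094 blk 1 FPW",
    "Btree 2/ 5099, , d: INSERT_LEAF off 364, blkref #0: rel 1663/41083/41095 blk 4237904 FPW",
    "Transaction 8/ 34, , d: COMMIT 2017-09-07 00:24:29.427017 UTC",
]

for name, lines in (("HOT", hot), ("non HOT", non_hot)):
    size = total_wal_bytes(lines)
    print(f"{name}: {size} bytes = {size / 1024:.1f}K")
```

Most of the extra volume in the non HOT case comes from the index records, several of which are full page writes.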
71. HOT Updates – How to Track
sfo=> select n_tup_upd, n_tup_hot_upd from pg_stat_all_tables where relname = 'benchmark_uuid';
n_tup_upd | n_tup_hot_upd
-----------+---------------
0 | 0
sfo=> update benchmark_uuid set e=cast(0 as boolean) where id = 1000;
UPDATE 1
sfo=> select n_tup_upd, n_tup_hot_upd from pg_stat_all_tables where relname = 'benchmark_uuid';
n_tup_upd | n_tup_hot_upd
-----------+---------------
1 | 1
sfo=> update benchmark_uuid set last_updated=CURRENT_TIMESTAMP where id=1001;
UPDATE 1
sfo=> select n_tup_upd, n_tup_hot_upd from pg_stat_all_tables where relname = 'benchmark_uuid';
n_tup_upd | n_tup_hot_upd
-----------+---------------
2 | 1
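First update (column e): n_tup_upd = n_tup_hot_upd, so it was a HOT update
Second update (column last_updated): n_tup_upd != n_tup_hot_upd, so it was not a HOT update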
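For ongoing tracking, the two counters are often turned into a ratio. A minimal sketch, using the counter values from the psql session above; the function name is made up for this example.

```python
def hot_update_ratio(n_tup_upd, n_tup_hot_upd):
    """Return the fraction of updates that were HOT, or None if no updates yet."""
    if n_tup_upd == 0:
        return None
    return n_tup_hot_upd / n_tup_upd


# Counter values from the three queries in the session above.
for upd, hot in ((0, 0), (1, 1), (2, 1)):
    print(upd, hot, hot_update_ratio(upd, hot))
```

A ratio that stays low on a heavily updated table suggests that most updates touch an indexed column or that blocks have no free space for the new row version.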
82. Constraining Random Values
Prefix UUID with a date
• 550e8400-e29b-41d4-a716-446655440000
• YYYYMMDDHH24-UUID
• Example
• 2010022712-550e8400-e29b-41d4-a716-446655440000
• 2010022713-550e8400-e29b-41d4-a716-446655440000
• Balance the number of hot blocks vs contention
• More date precision
• Less random to the b-tree = fewer blocks touched
• Possibly more contention on the leaf blocks
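A key in the YYYYMMDDHH24-UUID form can be generated on the application side. The sketch below is one possible way to do it, not the generator used in the benchmark; the function name and the `precision` parameter are made up for the example, and the result is a string rather than a native uuid value.

```python
import uuid
from datetime import datetime, timezone


def date_prefixed_id(now=None, precision="%Y%m%d%H"):
    """Return a key like '2010022712-550e8400-e29b-41d4-a716-446655440000'.

    The date prefix keeps keys created in the same hour next to each other
    in the b-tree; the random UUID part keeps them unique.
    """
    now = now or datetime.now(timezone.utc)
    return f"{now.strftime(precision)}-{uuid.uuid4()}"


print(date_prefixed_id(datetime(2010, 2, 27, 12)))
print(date_prefixed_id(datetime(2010, 2, 27, 13)))
```

Passing a finer format such as `"%Y%m%d%H%M"` narrows the set of hot leaf blocks further, at the price of more sessions competing for the same blocks.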
83. Change PK to UUID-like and remove 3 indexes
Test Table
• UUID PK – Random
• ID int – Right Lean Sequence
• VARCHAR(100) – Random
• VARCHAR(50) – Small Set of Words
• INT – Random
• INT – Random (smaller set)
• BOOLEAN – Random (50/50)
• BOOLEAN – Somewhat Random (75/25)
• Timestamp – Right Lean
Remove Index
Remove Index
Remove Index – Allows HOT Update
Remove Index
Remove Index
Make non random – right lean
84. Insert Workload – PostgreSQL 9.6
[Chart: inserts per second (0 to 30,000) over minutes (1 to 271). Series: BASE, WAL Compression, 16GB Max WAL, Reduced Indexes, Non Random GUID]
115. Vacuum Freeze in Memory
[Diagram labels: Block in Memory, Frozen, Checkpoint, Datafile, WAL, Archive]
116. Vacuum in Memory Continued
• Increase checkpoint_timeout
• alter table X set (vacuum settings);
• Manual Test
• Vacuum in Memory before checkpoint – 3.5 seconds
• Vacuum in Memory after checkpoint – 84.5 seconds
• Vacuum not in Memory – 165.8 seconds
141. Aurora – Writing Less
[Diagram, two panels. PostgreSQL: update t set y = 6; changes a block in memory; the full block is written to the WAL and archived, and the checkpoint writes the block to the datafile. Aurora: update t set y = 6; changes a block in memory and the change goes to Aurora Storage; no checkpoint, datafile, full block, or archive step appears on the Aurora side.]
148. Insert Workload – PostgreSQL 9.6
[Chart: inserts per second (0 to 30,000) over minutes (1 to 271). Series: BASE, WAL Compression, 16GB Max WAL, Reduced Indexes, Non Random GUID, Aurora PostgreSQL]