We will begin with a quick overview of the Amazon RDS service and how it achieves durability and high availability. Then we will dive deep into recently released features, including PostgreSQL 9.6 support, snapshot sharing, and enhancements to encryption, vacuum, and replication. We will also explore lessons learned managing a large fleet of PostgreSQL instances, including important tunables and possible gotchas around pg_upgrade, and briefly cover the newly announced Aurora PostgreSQL-compatible edition. We will wrap up the session with benchmarks of the new RDS instance classes and the value proposition of these new instance types.
2. Amazon Aurora with PostgreSQL Compatibility
• PostgreSQL 9.6+
• Cloud Optimized
• Log based
• 6 Copies across 3 Availability Zones
• Up to 15 Read Replicas
• Faster Failover
• Enhanced Scaling
• Autoscaling of storage to 64TB
[Architecture diagram: SQL, Transactions, and Caching remain in the database instance; Logging + Storage are handled by the Aurora storage layer, with backup to Amazon S3]
3. RDS Version Updates
New Major Version – 9.6
New Minor Releases (soon)
• 9.6.2
• 9.5.6
• 9.4.11
• 9.3.16
4. Extension Support Additions
9.6.1 bloom & pg_visibility
9.6.2 log_fdw, pg_hint_plan & pg_freespacemap
rds-postgres-extensions-request@amazon.com
9.3 Original - 32
9.3 Current - 35
9.4 Current - 39
9.5 Current - 44
9.6 Current - 49
Future - ???
5. log_fdw
set log_destination to csvlog
postgres=> create extension log_fdw;
postgres=> CREATE SERVER log_fdw_server FOREIGN DATA WRAPPER log_fdw;
postgres=> select * from list_postgres_log_files();
file_name | file_size_bytes
----------------------------------+-----------------
postgresql.log.2017-03-28-17.csv | 2068
postgres.log | 617
postgres=> select
create_foreign_table_for_log_file('pg_csv_log','log_fdw_server','postgresql.log.2017-03-28-17.csv');
postgres=> select log_time, message from pg_csv_log where message like 'connection%';
log_time | message
----------------------------+--------------------------------------------------------------------------------
2017-03-28 17:50:01.862+00 | connection received: host=ec2-54-174-205.compute-1.amazonaws.com port=45626
2017-03-28 17:50:01.868+00 | connection authorized: user=mike database=postgres
6. log_fdw - continued
Can also be done without csvlog (plain stderr log files)
postgres=> select
create_foreign_table_for_log_file('pg_log','log_fdw_server','postgresql.log.2017-03-28-17');
postgres=> select log_entry from pg_log where log_entry like '%connection%';
log_entry
-----------------------------------------------------------------------------------------------------------------------------
2017-03-28 17:50:01 UTC:ec2-54-174.compute-1.amazonaws.com(45626):[unknown]@[unknown]:[20434]:LOG: connection received: host=ec2-54-174-205..amazonaws.com
2017-03-28 17:50:01 UTC:ec2-54-174.compute-1.amazonaws.com(45626):mike@postgres:[20434]:LOG: connection authorized: user=mike database=postgres
2017-03-28 17:57:44 UTC:ec2-54-174.compute-1.amazonaws.com(45626):mike@postgres:[20434]:ERROR: column "connection" does not exist at character 143
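A client-side sketch of pulling fields out of these stderr-format entries, assuming the prefix layout shown above (timestamp TZ:host(port):user@db:[pid]:LEVEL: message); the `parse_log_entry` helper is hypothetical, not part of log_fdw:

```python
import re

# Hypothetical parser matching the log line layout shown on the slide:
#   timestamp TZ:host(port):user@db:[pid]:LEVEL:  message
LOG_LINE = re.compile(
    r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \w+)'
    r':(?P<host>[^(]*)\((?P<port>\d+)\)'
    r':(?P<user>[^@]*)@(?P<db>[^:]*)'
    r':\[(?P<pid>\d+)\]'
    r':(?P<level>\w+):\s*(?P<message>.*)$'
)

def parse_log_entry(line):
    """Return the log fields as a dict, or None if the line doesn't match."""
    m = LOG_LINE.match(line)
    return m.groupdict() if m else None

sample = ("2017-03-28 17:50:01 UTC:ec2-54-174.compute-1.amazonaws.com(45626)"
          ":mike@postgres:[20434]:LOG:  connection authorized: "
          "user=mike database=postgres")
entry = parse_log_entry(sample)
print(entry["level"], "-", entry["message"])
```

This is only a convenience for ad hoc grepping; the foreign table above gives you the same fields queryable with SQL.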
8. pg_hint_plan - example
postgres=> EXPLAIN SELECT * FROM pgbench_branches b
postgres-> JOIN pgbench_accounts a ON b.bid = a.bid ORDER BY a.aid;
QUERY PLAN
-------------------------------------------------------------------------------------------
Sort (cost=15943073.17..15993073.17 rows=20000000 width=465)
Sort Key: a.aid
-> Hash Join (cost=5.50..802874.50 rows=20000000 width=465)
Hash Cond: (a.bid = b.bid)
-> Seq Scan on pgbench_accounts a (cost=0.00..527869.00 rows=20000000 width=97)
-> Hash (cost=3.00..3.00 rows=200 width=364)
-> Seq Scan on pgbench_branches b (cost=0.00..3.00 rows=200 width=364)
postgres=> /*+ NestLoop(a b) */
postgres-> EXPLAIN SELECT * FROM pgbench_branches b
postgres-> JOIN pgbench_accounts a ON b.bid = a.bid ORDER BY a.aid;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.58..44297240.44 rows=20000000 width=465)
-> Index Scan using pgbench_accounts_pkey on pgbench_accounts a (cost=0.44..847232.44 rows=20000000 width=97)
-> Index Scan using pgbench_branches_pkey on pgbench_branches b (cost=0.14..2.16 rows=1 width=364)
Index Cond: (bid = a.bid)
11-15. Forcing SSL on all connections (build)
[Build diagram: application host connecting over SSL to the DB instance, which sits behind a security group inside a VPC; snapshots, logs, and backups covered by encryption at rest]
• Client side: sslmode=disable requests an unencrypted connection
• Server side: rds.force_ssl=1 (default 0) rejects non-SSL connections
24. HIPAA-eligible service & FedRAMP
• RDS PostgreSQL is now a HIPAA-eligible service
• https://aws.amazon.com/compliance/hipaa-compliance/
• FedRAMP in AWS GovCloud (US) region
• https://aws.amazon.com/compliance/fedramp/
26. Move data to the same or different database engine
Keep your apps running during the migration
Start your first migration in 10 minutes or less
Replicate within, to, or from AWS EC2 or RDS
AWS Database Migration Service (DMS)
29-31. AWS Database Migration Service (build)
[Diagram: application users → customer premises source database → Internet/VPN → AWS Database Migration Service → EC2 or RDS target]
• Start a replication instance
• Connect to source and target databases
• Select tables, schemas, or databases
• Let the AWS Database Migration Service create tables and load data
• Uses change data capture to keep them in sync
• Switch applications over to the target at your convenience
Keep your apps running during the migration
32. AWS Database Migration Service - PostgreSQL
• Source - on-premises or EC2 PostgreSQL (9.4+), or RDS (9.4.9+, 9.5.4+, 9.6.1+)
• Destination can be EC2 or RDS
• Initial bulk copy via consistent select
• Uses PostgreSQL logical replication support to provide
change data capture
https://aws.amazon.com/dms/
33. Schema Conversion Tool - SCT
Downloadable tool (Windows, Mac, Linux Desktop)
Source Database         Target Database on Amazon RDS
Microsoft SQL Server    Amazon Aurora, MySQL, PostgreSQL
MySQL                   PostgreSQL
Oracle                  Amazon Aurora, MySQL, PostgreSQL
PostgreSQL              Amazon Aurora, MySQL
36. Logical Replication Support
• Supported with 9.6.1+, 9.5.4+ and 9.4.9+
• Set rds.logical_replication parameter to 1
• As a user who has the rds_replication & rds_superuser roles
SELECT * FROM pg_create_logical_replication_slot('test_slot', 'test_decoding');
pg_recvlogical -d postgres --slot test_slot -U master --host $rds_hostname -f - --start
• Added support for Event Triggers
47. Vacuum parameters
A table is autovacuumed when its dead tuples exceed
• autovacuum_vacuum_threshold +
autovacuum_vacuum_scale_factor * pg_class.reltuples
How hard autovacuum works
• autovacuum_max_workers
• autovacuum_naptime
• autovacuum_vacuum_cost_limit
• autovacuum_vacuum_cost_delay
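The trigger condition above can be sketched as a quick calculation, using PostgreSQL's default parameter values (threshold 50, scale factor 0.2):

```python
# Sketch of the autovacuum trigger condition: a table is vacuumed once its
# dead tuples exceed
#   autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples
# Defaults shown are PostgreSQL's (50 and 0.2).
def autovacuum_triggers(dead_tuples, reltuples,
                        vacuum_threshold=50, scale_factor=0.2):
    return dead_tuples > vacuum_threshold + scale_factor * reltuples

# A 1,000,000-row table is not vacuumed until ~200,050 dead tuples accumulate
print(autovacuum_triggers(100_000, 1_000_000))   # False
print(autovacuum_triggers(250_000, 1_000_000))   # True
```

The scale factor dominates on large tables, which is why busy big tables often need a per-table lower scale factor to get vacuumed frequently enough.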
48. RDS autovacuum logging (9.4.5+)
log_autovacuum_min_duration = 5000 (i.e. 5 secs)
rds.force_autovacuum_logging_level = LOG
…[14638]:ERROR: canceling autovacuum task
…[14638]:CONTEXT: automatic vacuum of table "postgres.public.pgbench_tellers"
…[14638]:LOG: skipping vacuum of "pgbench_branches" --- lock not available
49. RDS autovacuum visibility (9.3.12, 9.4.7, 9.5.2)
pg_stat_activity
BEFORE
usename | query
----------+-------------------------------------------------------------
rdsadmin | <insufficient privilege>
rdsadmin | <insufficient privilege>
gtest | SELECT c FROM sbtest27 WHERE id BETWEEN 392582 AND 392582+4
gtest | select usename, query from pg_stat_activity
NOW
usename | query
----------+----------------------------------------------
rdsadmin | <insufficient privilege>
gtest | select usename, query from pg_stat_activity
gtest | COMMIT
rdsadmin | autovacuum: ANALYZE public.sbtest16
82-90. Aurora - Writing Less (build)
[Build diagram comparing the write path for the same update t set y = 6;
PostgreSQL: block in memory → full block → WAL → archive, plus checkpoint → datafile
Aurora: block in memory → Aurora storage]
91. Amazon Aurora Loads Data 3x Faster
Database initialization is three times faster than PostgreSQL using the
standard PgBench benchmark
Command: pgbench -i -s 2000 -F 90
92. Amazon Aurora Delivers up to 85x Faster Recovery
SysBench oltp(write-only) 10GiB workload with 250 tables & 150,000 rows
Crash Recovery Time - SysBench 10GB Write Workload

Configuration                    Writes per Second   Recovery Time (seconds)
PostgreSQL, 12.5GB checkpoint    69,620              102.0
PostgreSQL, 8.3GB checkpoint     32,765              52.0
PostgreSQL, 2.1GB checkpoint     16,075              13.0
Amazon Aurora, no checkpoints    92,415              1.2
Transaction-aware storage system recovers almost instantly
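The headline ratios can be checked directly from the chart's data points, comparing Aurora against the best-throughput PostgreSQL configuration (the 12.5GB checkpoint run):

```python
# Ratio check for the recovery benchmark above.
# pg_best = PostgreSQL at its best throughput (12.5GB checkpoint interval).
pg_best = {"writes_per_sec": 69_620, "recovery_s": 102.0}
aurora  = {"writes_per_sec": 92_415, "recovery_s": 1.2}

recovery_speedup = pg_best["recovery_s"] / aurora["recovery_s"]
throughput_ratio = aurora["writes_per_sec"] / pg_best["writes_per_sec"]

print(round(recovery_speedup))      # 85  (the "85x faster recovery" claim)
print(round(throughput_ratio, 2))   # 1.33 (higher throughput at the same time)
```

Note the trade-off the table shows for PostgreSQL: longer checkpoint intervals buy throughput at the cost of recovery time, while Aurora avoids the trade-off by not checkpointing at all.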
93. Amazon Aurora is >=2x Faster on PgBench
pgbench “tpcb-like” workload, scale 2000 (30GiB). All configurations run for 60 minutes
94. Amazon Aurora is 2x-3x Faster on SysBench
Amazon Aurora delivers 2x the absolute peak of PostgreSQL and 3x
PostgreSQL performance at high client counts
SysBench oltp(write-only) workload with 30 GB database with 250 tables and 400,000 initial rows per table
95. Amazon Aurora Gives >2x Faster Response Times
Response time under heavy write load >2x faster than PostgreSQL
(and >10x more consistent)
SysBench oltp(write-only) 23GiB workload with 250 tables and 300,000 initial rows per table. 10-minute warmup.
96. Amazon Aurora Has More Consistent Throughput
While running at load, performance is more than three times
more consistent than PostgreSQL
PgBench “tpcb-like” workload at scale 2000. Amazon Aurora was run with 1280 clients. PostgreSQL was run with
512 clients (the concurrency at which it delivered the best overall throughput)
97. Amazon Aurora is 3x Faster at Large Scale
Scales from 1.5x to 3x faster as database grows from 10 GiB to 100 GiB
SysBench oltp(write-only) – 10GiB with 250 tables & 150,000 rows and 100GiB with 250 tables & 1,500,000 rows
SysBench write-only (writes/sec)

Test Size   PostgreSQL   Amazon Aurora
10GB        75,666       112,390
100GB       27,491       82,714
Line data type
Reg* data types
Open prepared transactions
Add a Key for the encrypted snapshot and then show that it needs to be shared for this to work. Note that this doesn’t work with default keys.
Move data to the same or different database engine
~ Supports Oracle, Microsoft SQL Server, MySQL, PostgreSQL, MariaDB, Amazon Aurora, Amazon Redshift
Keep your apps running during the migration
~ DMS minimizes impact to users by capturing and applying data changes
Start your first migration in 10 minutes or less
~ The AWS Database Migration Service takes care of infrastructure provisioning and allows you to set up your first database migration task in less than 10 minutes
Replicate within, to, or from AWS EC2 or RDS
~ After migrating your database, use the AWS Database Migration Service to replicate data into your Redshift data warehouses, cross-region to other RDS instances, or back to on-premises
Using the AWS Database Migration Service to migrate data to AWS is simple.
(CLICK) Start by spinning up a DMS instance in your AWS environment
(CLICK) Next, from within DMS, connect to both your source and target databases
(CLICK) Choose what data you want to migrate. DMS lets you migrate tables, schemas, or whole databases
Then sit back and let DMS do the rest. (CLICK) It creates the tables, loads the data, and best of all, keeps them synchronized for as long as you need
That replication capability, which keeps the source and target data in sync, allows customers to switch applications (CLICK) over to point to the AWS database at their leisure. DMS eliminates the need for high-stakes extended outages to migrate production data into the cloud. DMS provides a graceful switchover capability.
Who would like to see more decoders supported?
Quorum system for read/write; latency tolerant
Quorum membership changes do not stall writes
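Aurora's published design uses a 4/6 write quorum and 3/6 read quorum over the six copies spread across three Availability Zones; a quick check that this satisfies the classic quorum conditions and the failure modes described here:

```python
# Quorum check for Aurora's 6-copy layout (2 copies in each of 3 AZs).
# Write quorum 4/6 and read quorum 3/6 are from Aurora's published design.
V, V_WRITE, V_READ = 6, 4, 3

# Classic quorum conditions:
# every read overlaps the latest write, and two writes always overlap
assert V_READ + V_WRITE > V
assert 2 * V_WRITE > V

# Losing an entire AZ (2 copies) leaves 4: both reads and writes survive
assert V - 2 >= V_WRITE
# Losing an AZ plus one more node (3 copies) still allows reads, and repair
assert V - 3 >= V_READ

print("quorum conditions hold")
```

This is why the system can tolerate an AZ failure without losing write availability, and an AZ-plus-one failure without losing data.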
Data is replicated 6 times across 3 Availability Zones
Continuous backup to Amazon S3
Continuous monitoring of nodes and disks for repair
10GB segments as unit of repair or hotspot rebalance
Storage volume automatically grows up to 64 TB
(30 seconds): Let’s take a look at data load performance. With the pgbench benchmark, you first have to load the database. We compared the time it takes to load, vacuum, and build indexes for a scale 10000, or 150GB, pgbench database. As you can see, Amazon Aurora can finish the pgbench initialization phase about 3 times faster than PostgreSQL. Most of the performance difference in load times is due to the database-specific storage optimizations that are key to Amazon Aurora storage – we will dive deeper into those optimizations in a few minutes.
(1 minute): In all the tests we have shown you, we have tried to tune PostgreSQL to deliver the best possible performance results. One key part of that tuning is to reduce the number of checkpoints by increasing the duration between checkpoints. One consequence of that is it increases recovery time if there is a database failure. This is because when recovering from a crash, PostgreSQL has to start from the last checkpoint – the last time it wrote all dirty pages from memory to storage – and roll forward through all the Write-Ahead Log – or WAL – records written since the last checkpoint. The more WAL to roll forward, the longer recovery will take. With Amazon Aurora, there are no checkpoints needed, so recovery time is independent of checkpoints, and independent of how many transactions are being processed by the database. As you can see in the graph, as we increased the checkpoint time for PostgreSQL, the overall throughput increased, but so did the recovery time. At the best throughput level for PostgreSQL, the recovery time for Aurora was 85X faster than for PostgreSQL, and Aurora delivered more than 92 thousand writes per second compared with just under 70 thousand writes per second from PostgreSQL.
(1 minute) Let’s first look at some pgbench results. Pgbench is the standard benchmark that is part of the PostgreSQL distribution, and it has several built-in modes. One of those modes is tpcb-like, in which pgbench runs transactions that are very similar to the standard TPC-B transaction. We ran pgbench in tpcb-like mode while increasing the number of concurrent client connections from 256 up to 1,536. We used a 30GB scale 2000 size database, and we ran each test for 60 minutes. As you can see in the graph, PostgreSQL reaches a peak of just under 18 thousand transactions per second at 512 connections, whereas Amazon Aurora continues to scale up as more connections are added, reaching a peak of just over 38 thousand transactions per second at 1,024 connections. The peak-to-peak comparison shows Amazon Aurora delivers more than 2x the throughput of PostgreSQL, and the direct comparison of Amazon Aurora’s peak with the corresponding PostgreSQL result with 1,024 connections shows a ratio of greater than 2 ½ times.
(30 seconds) In this test, we used sysbench, a benchmark utility often used to compare different database engines. We ran the sysbench write-only benchmark, again while increasing the number of client connections, with a 30GB database. PostgreSQL scales up until reaching more than 47 thousand writes per second at 1,024 connections, then the throughput drops as more connections are added; Amazon Aurora scales up to more than 92 thousand writes per second at 1,536 connections, about 2x more throughput when comparing peak to peak. Compared directly with the PostgreSQL throughput with 1,536 connections, the ratio is more than 2 ½ X.
(1 minute) It’s important to measure throughput, but it’s also important to measure response time at scale. So, we looked at sysbench response times, with 1,024 concurrent connections. On the graph you can see very different behavior for Amazon Aurora as compared with PostgreSQL: the response times for Aurora are much steadier, with much less variation. More precisely, based on measuring the standard deviations of the two data sets, Amazon Aurora is more than 10x more consistent than PostgreSQL. Also, the average response time is about 2.9x lower. So, Aurora delivers much faster response times with much less variability. You might wonder what’s going on with the PostgreSQL results. What you see is the impact of database checkpoints, which PostgreSQL does to ensure that dirty pages in memory are periodically written to storage to ensure recovery time from a crash isn’t extended too long. During a checkpoint, PostgreSQL will do a lot of writes, which will slow down user transactions, hence the variability in the PostgreSQL response times.
------------ NOTES ONLY – DO NOT USE ---------------
On the sysbench response time graph: Stddev(POPS) is 96.97ms. Stddev(Manny) is 7.38ms. 13x more consistent (although I prefer “greater than 10x”)
Avg(POPS) is 201ms, avg(Manny) is 69ms. So this is really 2.9x lower response times
(3x would be 207, but who’s counting…)
(30 seconds): Let’s go back to pgbench to look at consistent performance based on throughput. In this graph, higher is better, as we’re showing the throughput while running pgbench in tpcb-like mode. We ran each database at the optimal number of clients to deliver max throughput for that database, and plotted the variability in throughput over time. As you can see, Amazon Aurora is much more consistent, and delivers significantly higher throughput: based on standard deviation, Aurora is about 3x more consistent than PostgreSQL.
[On the pgbench throughput graph: stddev(POPS) is 5080 tps. Stddev(Manny) is 1395 tps. 3.6x more consistent (again, I hate over precision, so “3x” is my preference).]
(1 minute): In this test, we compared how each database scales in terms of throughput as the database size scales, using the sysbench write-only workload. As you can see, with a 10GB database, Aurora delivers about 1.5X better throughput; with a 100GB database Aurora delivers about 3X better throughput. Aurora can handle larger databases and workloads significantly better than PostgreSQL.