PostgreSQL is not just simple data storage: it is a sophisticated transactional system. In this talk, I am going to highlight unusual uses of transactions in Postgres, from profiling user queries and running unit tests to keeping data exports consistent across multiple processes and making sure that a typo in your query doesn't destroy the results of all previous commands. Furthermore, I would like to show cases where transactions do not work as expected, and where the database itself prohibits you from using them (and why). The result should be a better understanding of the enormous benefits (and small limitations) of employing transactions to safeguard your data and keep the application logic sane.
3. WE OFFER A SUCCESSFUL AND CURATED ASSORTMENT
> 300,000 articles from ~ 2,000 international brands
17 private labels
HIGHLY EXPERIENCED category management
> 500 designers & stylists
LOCALIZATION of the assortment
CURATED SHOPPING with Zalon
5. “...we should talk about transactional systems rather than storage when we want to talk about RDBMS and other transaction technologies.”
https://masteringpostgresql.com
7. Every command in PostgreSQL runs as part of a transaction. If you do not start one, an implicit transaction wraps each statement.
Use SQL transaction blocks (BEGIN … COMMIT) to start transactions explicitly, usually to group multiple statements.
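`txid_current()` makes the difference visible. It returns the current transaction's ID and assigns one if necessary. The following is a minimal sketch for an interactive psql session. It assumes psql's default `AUTOCOMMIT` setting of `on`; with it off, psql opens a transaction block for you.

```sql
-- outside a transaction block: each statement runs in its own implicit
-- transaction, so the two IDs differ
SELECT txid_current();
SELECT txid_current();

-- inside an explicit block: both statements share one transaction and one ID
BEGIN;
SELECT txid_current();
SELECT txid_current();
COMMIT;
```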
9. CREATE SCHEMA findmybike;
SET SEARCH_PATH to findmybike, public;
CREATE TABLE bike (
b_id BIGSERIAL PRIMARY KEY,
b_owner BIGINT NOT NULL REFERENCES rider (r_id),
b_description TEXT,
b_photo_path TEXT
);
CREATE TABLE bike_location (
bl_bike_id BIGINT REFERENCES bike (b_id),
bl_location POINT NOT NULL,
bl_last_seen TIMESTAMP WITH TIME ZONE NOT NULL
);
CREATE TABLE rider (
r_id BIGSERIAL PRIMARY KEY,
r_name TEXT NOT NULL,
r_email TEXT UNIQUE NOT NULL,
r_backup_email TEXT UNIQUE NOT NULL,
r_password TEXT NOT NULL
);
10. psql -qh localhost -U owner -d demo -f pgday.amsterdam/findmybike.sql
psql:pgday.amsterdam/findmybike.sql:10: ERROR: relation "rider" does not exist
psql -q -h localhost -U owner -d demo -c "\dt+ findmybike.*"
List of relations
Schema | Name | Type | Owner | Size | Description
------------+-------+-------+-------+------------+-------------
findmybike | rider | table | owner | 8192 bytes |
(1 row)
Non-atomic changes
11. psql -qh localhost -U owner -d demo -1f pgday.amsterdam/findmybike.sql
psql:pgday.amsterdam/findmybike.sql:10: ERROR: relation "rider" does not exist
psql:pgday.amsterdam/findmybike.sql:16: ERROR: current transaction is aborted, commands ignored until end of transaction block
psql -q -h localhost -U owner -d demo -c "\dt+ findmybike.*"
List of relations
Schema | Name | Type | Owner | Size | Description
--------+------+------+-------+------+-------------
(0 rows)
Atomic changes and rollbacks
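With `-1` alone, psql keeps sending the remaining statements to the aborted transaction, which produces the second error above. The final COMMIT then turns into a rollback. The script below is a sketch that stops at the first error and creates the tables in dependency order. It assumes a database that does not yet contain the `findmybike` schema. It also declares `bl_bike_id` as a plain `BIGINT`, since a foreign-key column does not need its own sequence.

```sql
-- pgday.amsterdam/findmybike.sql (sketch): abort on the first error, all-or-nothing
\set ON_ERROR_STOP on
BEGIN;
CREATE SCHEMA findmybike;
SET search_path TO findmybike, public;
-- referenced tables first: rider <- bike <- bike_location
CREATE TABLE rider (
    r_id BIGSERIAL PRIMARY KEY,
    r_name TEXT NOT NULL,
    r_email TEXT UNIQUE NOT NULL,
    r_backup_email TEXT UNIQUE NOT NULL,
    r_password TEXT NOT NULL
);
CREATE TABLE bike (
    b_id BIGSERIAL PRIMARY KEY,
    b_owner BIGINT NOT NULL REFERENCES rider (r_id),
    b_description TEXT,
    b_photo_path TEXT
);
CREATE TABLE bike_location (
    bl_bike_id BIGINT REFERENCES bike (b_id),
    bl_location POINT NOT NULL,
    bl_last_seen TIMESTAMP WITH TIME ZONE NOT NULL
);
COMMIT;
```

If any statement fails, psql exits with status 3. The open transaction is then rolled back when the connection closes, so nothing from the script remains.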
16. SYNCHRONIZED SNAPSHOTS
[Figure: a COORDINATOR holds an EXPORTED SNAPSHOT (xmin: 123, xmax: 135, xip_list: 123,126,127) that is shared by WORKER 1, WORKER 2 and WORKER 3; the diagram also labels TXID 123, TXID 126 and TXID 127.]
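The coordinator/worker setup in the diagram can be reproduced by hand with `pg_export_snapshot()` and `SET TRANSACTION SNAPSHOT`. `pg_dump --jobs` relies on the same mechanism to keep parallel dumps consistent. The sketch below makes three assumptions:

- The snapshot ID is a placeholder for whatever the coordinator's call returns.
- The `b_id % 3` split across three workers is purely illustrative.
- `search_path` includes `findmybike`.

```sql
-- coordinator session: export a snapshot and keep this transaction open
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();  -- returns an ID such as '00000003-0000001B-1'

-- each worker session (here: worker 1 of 3)
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- must come before any query in this transaction; use the ID returned above
SET TRANSACTION SNAPSHOT '00000003-0000001B-1';
COPY (SELECT * FROM bike WHERE b_id % 3 = 0) TO STDOUT;
COMMIT;

-- coordinator: end the exporting transaction only after every worker has imported it
COMMIT;
```

Every worker sees exactly the same rows, regardless of what other sessions commit in the meantime.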
17. BEGIN;
DELETE FROM bike_location
USING bike, rider
WHERE bl_bike_id = b_id AND b_owner = r_id AND r_email = 'nolongervalid@example.com';
DELETE FROM bike
USING rider
WHERE b_owner = r_id AND r_email = 'nolongervalid@example.com';
DELETE FROM rider
WHERE r_email = 'nolongervalid@example.com';
COMMIT;
-- there is a better way of executing those statements at once using a CTE
Atomic data changes
18. BEGIN;
CREATE EXTENSION pg_trgm;
-- note: a plain CREATE INDEX blocks writes to bike until the transaction ends
CREATE INDEX bike_b_description_trgm_idx
ON bike USING gin(b_description gin_trgm_ops);
EXPLAIN ANALYZE SELECT * FROM bike
WHERE b_description LIKE '%orange%';
ROLLBACK;
Transactional DDL: testing indexes
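The index lock is held until ROLLBACK, so it can help to bound how long such an experiment may wait and run. The sketch below uses `SET LOCAL`, whose settings also vanish when the transaction ends. The timeout values are arbitrary examples, and `search_path` is assumed to include `findmybike`.

```sql
BEGIN;
-- give up after 2 s instead of queueing behind other sessions' locks
-- (a waiting CREATE INDEX would also make later writers queue behind it)
SET LOCAL lock_timeout = '2s';
-- cap each statement of the experiment; SET LOCAL values end with the transaction
SET LOCAL statement_timeout = '5min';
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX bike_b_description_trgm_idx
    ON bike USING gin (b_description gin_trgm_ops);
EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM bike
WHERE b_description LIKE '%orange%';
ROLLBACK;  -- the index, the extension (if newly created) and the settings are gone
```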
19. BEGIN;
ALTER TABLE rider ADD COLUMN r_phone TEXT;
CREATE TABLE bike_parking (
bp_id INTEGER PRIMARY KEY,
bp_location POINT NOT NULL,
bp_name TEXT NOT NULL,
bp_description TEXT
);
ALTER TABLE bike ADD COLUMN b_parking_id INTEGER REFERENCES bike_parking (bp_id);
INSERT INTO versioning.changes (name, description, last_modified, modified_by)
VALUES ('FINDMYBIKE-42','Add bike parking and rider phone', now(), current_user);
ROLLBACK; -- or COMMIT
Transactional DDL: tracking changes
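The slide does not define `versioning.changes`. The definition below is a hypothetical one that fits the INSERT. Making `name` the primary key means that re-running a change that was already applied fails on the INSERT. That failure aborts the transaction and discards its DDL too.

```sql
-- hypothetical change-log table; column names taken from the INSERT on the slide
CREATE SCHEMA IF NOT EXISTS versioning;
CREATE TABLE IF NOT EXISTS versioning.changes (
    name          TEXT PRIMARY KEY,  -- ticket ID such as 'FINDMYBIKE-42'
    description   TEXT NOT NULL,
    last_modified TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT now(),
    modified_by   TEXT NOT NULL DEFAULT current_user
);
```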
20. EXPLAIN ANALYZE dangerous queries
BEGIN;
-- EXPLAIN ANALYZE actually executes the DELETEs; the ROLLBACK below undoes them
EXPLAIN ANALYZE WITH
rider_to_delete AS (
DELETE FROM rider
WHERE r_email = 'rider_1@example.com'
RETURNING r_id
),
bike_to_delete AS (
DELETE FROM bike
USING rider_to_delete rtd
WHERE b_owner = rtd.r_id
RETURNING b_id
)
DELETE FROM bike_location
USING bike_to_delete btd
WHERE bl_bike_id = btd.b_id;
ROLLBACK;
21. EXPLAIN ANALYZE dangerous queries
Delete on bike_location (cost=636622.13..636630.17 rows=1 width=38) (actual time=35338.529..35338.529 rows=0 loops=1)
  CTE rider_to_delete
    -> Delete on rider (cost=0.56..8.58 rows=1 width=6) (actual time=1.354..1.358 rows=1 loops=1)
         -> Index Scan using rider_r_email_key on rider (cost=0.56..8.58 rows=1 width=6) (actual time=1.325..1.327 rows=1 loops=1)
              Index Cond: (r_email = 'rider_1@example.com'::text)
  CTE bike_to_delete
    -> Delete on bike (cost=0.03..636613.12 rows=1 width=38) (actual time=2.991..35335.634 rows=3 loops=1)
         -> Hash Join (cost=0.03..636613.12 rows=1 width=38) (actual time=2.985..35335.597 rows=3 loops=1)
              Hash Cond: (bike.b_owner = rtd.r_id)
              -> Seq Scan on bike (cost=0.00..561385.60 rows=20060660 width=14) (actual time=0.014..25716.838 rows=20000008 loops=1)
              -> Hash (cost=0.02..0.02 rows=1 width=40) (actual time=2.953..2.953 rows=1 loops=1)
                   Buckets: 1024 Batches: 1 Memory Usage: 9kB
                   -> CTE Scan on rider_to_delete rtd (cost=0.00..0.02 rows=1 width=40) (actual time=1.359..1.364 rows=1 loops=1)
  -> Nested Loop (cost=0.44..8.48 rows=1 width=38) (actual time=9315.484..35338.516 rows=1 loops=1)
       -> CTE Scan on bike_to_delete btd (cost=0.00..0.02 rows=1 width=40) (actual time=2.996..35335.647 rows=3 loops=1)
       -> Index Scan using bike_location_pkey on bike_location (cost=0.44..8.46 rows=1 width=14) (actual time=0.952..0.952 rows=0 loops=3)
            Index Cond: (bl_bike_id = btd.b_id)
Planning time: 1.057 ms
Trigger for constraint bike_b_owner_fkey on rider: time=26197.948 calls=1
Trigger for constraint bike_location_bl_bike_id_fkey on bike: time=0.914 calls=3
Execution time: 61537.489 ms
22. EXPLAIN ANALYZE dangerous queries
    -> Delete on bike (cost=0.03..636613.12 rows=1 width=38) (actual time=2.991..35335.634 rows=3 loops=1)
         -> Hash Join (cost=0.03..636613.12 rows=1 width=38) (actual time=2.985..35335.597 rows=3 loops=1)
              Hash Cond: (bike.b_owner = rtd.r_id)
              -> Seq Scan on bike (cost=0.00..561385.60 rows=20060660 width=14) (actual time=0.014..25716.838 rows=20000008 loops=1)
              -> Hash (cost=0.02..0.02 rows=1 width=40) (actual time=2.953..2.953 rows=1 loops=1)
                   Buckets: 1024 Batches: 1 Memory Usage: 9kB
                   -> CTE Scan on rider_to_delete rtd (cost=0.00..0.02 rows=1 width=40) (actual time=1.359..1.364 rows=1 loops=1)
...
Planning time: 1.057 ms
Trigger for constraint bike_b_owner_fkey on rider: time=26197.948 calls=1
Trigger for constraint bike_location_bl_bike_id_fkey on bike: time=0.914 calls=3
Execution time: 61537.489 ms
23. CREATE TABLE bike (
b_id BIGSERIAL PRIMARY KEY,
b_owner BIGINT NOT NULL REFERENCES rider (r_id),
b_description TEXT,
b_photo_path TEXT
);
CREATE TABLE rider (
r_id BIGSERIAL PRIMARY KEY,
r_name TEXT NOT NULL,
r_email TEXT UNIQUE NOT NULL,
r_backup_email TEXT UNIQUE NOT NULL,
r_password TEXT NOT NULL
);
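-- new index to speed up the delete query: both the join and the bike_b_owner_fkey check look up bike by b_owner.
-- Note: CREATE INDEX CONCURRENTLY does not block writes, but it cannot run inside a transaction block
CREATE INDEX CONCURRENTLY bike_b_owner_idx ON bike(b_owner);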
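If CREATE INDEX CONCURRENTLY fails or is cancelled, it is not rolled back the way a transactional CREATE INDEX would be. Instead it leaves an INVALID index behind, which queries ignore but writes still have to maintain. The sketch below finds such leftovers and rebuilds the index from this slide. It assumes `search_path` includes `findmybike`.

```sql
-- indexes left INVALID by an interrupted CREATE INDEX CONCURRENTLY
SELECT indexrelid::regclass AS index_name,
       indrelid::regclass   AS table_name
FROM pg_index
WHERE NOT indisvalid;

-- only if bike_b_owner_idx shows up above: drop it and build it again
-- (DROP INDEX CONCURRENTLY cannot run inside a transaction block either)
DROP INDEX CONCURRENTLY IF EXISTS bike_b_owner_idx;
CREATE INDEX CONCURRENTLY bike_b_owner_idx ON bike (b_owner);
```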
24. Delete on bike_location (cost=17.63..25.67 rows=1 width=38) (actual time=4.080..4.080 rows=0 loops=1)
  CTE rider_to_delete
    -> Delete on rider (cost=0.56..8.58 rows=1 width=6) (actual time=1.840..1.843 rows=1 loops=1)
         -> Index Scan using rider_r_email_key on rider (cost=0.56..8.58 rows=1 width=6) (actual time=1.802..1.803 rows=1 loops=1)
              Index Cond: (r_email = 'rider_1@example.com'::text)
  CTE bike_to_delete
    -> Delete on bike (cost=0.56..8.61 rows=1 width=38) (actual time=1.900..1.937 rows=3 loops=1)
         -> Nested Loop (cost=0.56..8.61 rows=1 width=38) (actual time=1.894..1.915 rows=3 loops=1)
              -> CTE Scan on rider_to_delete rtd (cost=0.00..0.02 rows=1 width=40) (actual time=1.857..1.860 rows=1 loops=1)
              -> Index Scan using bike_b_owner_idx on bike (cost=0.56..8.58 rows=1 width=14) (actual time=0.033..0.047 rows=3 loops=1)
                   Index Cond: (b_owner = rtd.r_id)
  -> Nested Loop (cost=0.44..8.48 rows=1 width=38) (actual time=4.061..4.075 rows=1 loops=1)
       -> CTE Scan on bike_to_delete btd (cost=0.00..0.02 rows=1 width=40) (actual time=1.907..1.949 rows=3 loops=1)
       -> Index Scan using bike_location_pkey on bike_location (cost=0.44..8.46 rows=1 width=14) (actual time=0.706..0.706 rows=0 loops=3)
            Index Cond: (bl_bike_id = btd.b_id)
Planning time: 1.382 ms
Trigger for constraint bike_b_owner_fkey on rider: time=0.392 calls=1
Trigger for constraint bike_location_bl_bike_id_fkey on bike: time=0.437 calls=3
Execution time: 4.999 ms
(19 rows)
EXPLAIN ANALYZE dangerous queries
25.  -> Delete on bike (cost=0.56..8.61 rows=1 width=38) (actual time=1.900..1.937 rows=3 loops=1)
         -> Nested Loop (cost=0.56..8.61 rows=1 width=38) (actual time=1.894..1.915 rows=3 loops=1)
              -> CTE Scan on rider_to_delete rtd (cost=0.00..0.02 rows=1 width=40) (actual time=1.857..1.860 rows=1 loops=1)
              -> Index Scan using bike_b_owner_idx on bike (cost=0.56..8.58 rows=1 width=14) (actual time=0.033..0.047 rows=3 loops=1)
                   Index Cond: (b_owner = rtd.r_id)
Planning time: 1.382 ms
Trigger for constraint bike_b_owner_fkey on rider: time=0.392 calls=1
Trigger for constraint bike_location_bl_bike_id_fkey on bike: time=0.437 calls=3
Execution time: 4.999 ms
(19 rows)
EXPLAIN ANALYZE dangerous queries
26. psql -h localhost -U robot_backup -d demo
# BEGIN;
# INSERT INTO bike (b_owner, b_description) VALUES (1, 'test');
INSERT 0 1
# INSERT INTO bike (b_owner, b_description) VALUES (2 'test2');
ERROR: syntax error at or near "'test2'" at character 53
# INSERT INTO bike (b_owner, b_description) VALUES (2, 'test2');
ERROR: current transaction is aborted, commands ignored until end of transaction block
# ROLLBACK;
Correcting errors interactively with ON_ERROR_ROLLBACK
27. psql -h localhost -U robot_backup -d demo -v ON_ERROR_ROLLBACK=interactive
# BEGIN;
# INSERT INTO bike (b_owner, b_description) VALUES (1, 'test');
INSERT 0 1
# INSERT INTO bike (b_owner, b_description) VALUES (2 'test2');
ERROR: syntax error at or near "'test2'" at character 53
# INSERT INTO bike (b_owner, b_description) VALUES (2, 'test2');
INSERT 0 1
# COMMIT;
Correcting errors interactively with ON_ERROR_ROLLBACK
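psql implements ON_ERROR_ROLLBACK by issuing a SAVEPOINT before every statement inside a transaction block. If the statement fails, psql rolls back to that savepoint. To enable this for every interactive session, the setting can go into `~/.psqlrc`. The `interactive` value leaves scripts run with `-f` unaffected, so an error in a script still aborts the surrounding transaction.

```sql
-- ~/.psqlrc: ignore failed statements inside a transaction block,
-- but only when commands are typed interactively
\set ON_ERROR_ROLLBACK interactive
```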
28. psql -h localhost -U robot_backup -d demo
# BEGIN;
# SAVEPOINT statement1;
# INSERT INTO bike (b_owner, b_description) VALUES (1, 'test');
INSERT 0 1
# RELEASE statement1;
# SAVEPOINT statement2;
# INSERT INTO bike (b_owner, b_description) VALUES (2 'test2');
ERROR: syntax error at or near "'test2'" at character 53
# ROLLBACK TO statement2;
# INSERT INTO bike (b_owner, b_description) VALUES (2, 'test2');
INSERT 0 1
# COMMIT;
Correcting errors interactively with subtransactions
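The same subtransaction machinery is available inside PL/pgSQL. A block with an EXCEPTION clause sets up an implicit savepoint, and an error inside it rolls back only that block. The sketch below makes two assumptions:

- A rider with `r_id = 1` exists and none with `r_id = -1`.
- `search_path` includes `findmybike`.

```sql
DO $$
BEGIN
    INSERT INTO bike (b_owner, b_description) VALUES (1, 'kept');
    BEGIN  -- a block with an EXCEPTION clause runs as a subtransaction
        INSERT INTO bike (b_owner, b_description) VALUES (-1, 'unknown owner');
    EXCEPTION
        WHEN foreign_key_violation THEN
            -- only the inner INSERT is rolled back; the 'kept' row survives
            RAISE NOTICE 'skipped bike with unknown owner';
    END;
END
$$;
```

Entering and leaving such a block is noticeably more expensive than a plain block, so it is best kept out of tight loops.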
29. Performing batch updates
-- session 1
BEGIN;
UPDATE bike SET b_photo_path = '/data/'||b_photo_path;
-- session 2: waits until session 1 commits or rolls back, because session 1 holds a row lock on every bike
UPDATE bike SET b_description = 'my awesome bike' WHERE b_id = 1000000;
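While session 2 is stuck, a third session can show who is waiting on whom. The sketch below uses `pg_blocking_pids()`, which is available since PostgreSQL 9.6.

```sql
-- sessions currently waiting for a lock, and the PIDs of the sessions blocking them
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       now() - xact_start    AS xact_age,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```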
30. CREATE OR REPLACE FUNCTION findmybike.batch_change_path(p_new_prefix TEXT, p_batch_size INTEGER)
RETURNS VOID AS
$$
BEGIN
WHILE EXISTS(SELECT 1
FROM bike
WHERE b_photo_path NOT LIKE p_new_prefix || '%') LOOP
WITH keys_to_update AS (
SELECT b_id
FROM bike
WHERE b_photo_path NOT LIKE p_new_prefix || '%'
LIMIT p_batch_size
) UPDATE bike b
SET b_photo_path = p_new_prefix || b_photo_path FROM keys_to_update ktu
WHERE b.b_id = ktu.b_id;
END LOOP;
END;
$$
LANGUAGE plpgsql;
Performing batch updates (via a function)
31. WHILE EXISTS(SELECT 1
FROM bike
WHERE b_photo_path NOT LIKE p_new_prefix || '%') LOOP
WITH keys_to_update AS (
SELECT b_id
FROM bike
WHERE b_photo_path NOT LIKE p_new_prefix || '%'
LIMIT p_batch_size
) UPDATE bike b
SET b_photo_path = p_new_prefix || b_photo_path FROM keys_to_update ktu
WHERE b.b_id = ktu.b_id;
END LOOP;
Performing batch updates (via a function)
32. CREATE INDEX bike_b_photo_path_idx ON bike(b_photo_path);
-- session 1
SELECT findmybike.batch_change_path('/data1', 100);
-- session 2: once a batch has locked this row, it waits until the whole function has finished
UPDATE bike SET b_description = 'my awesome bike' WHERE b_id = 14238019;
A function is always executed in a single transaction
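33. -- a function with the same argument types already exists, and CREATE OR REPLACE cannot turn it into a procedure
DROP FUNCTION IF EXISTS findmybike.batch_change_path(TEXT, INTEGER);
CREATE OR REPLACE PROCEDURE findmybike.batch_change_path(p_new_prefix TEXT, p_batch_size INTEGER)
AS
$$
BEGIN
WHILE EXISTS(SELECT 1
FROM bike
WHERE b_photo_path NOT LIKE p_new_prefix || '%') LOOP
WITH keys_to_update AS (
SELECT b_id
FROM bike
WHERE b_photo_path NOT LIKE p_new_prefix || '%'
LIMIT p_batch_size
) UPDATE bike b
SET b_photo_path = p_new_prefix || b_photo_path FROM keys_to_update ktu
WHERE b.b_id = ktu.b_id;
-- commit each batch; only allowed when the procedure is CALLed outside an explicit transaction block
COMMIT;
END LOOP;
END;
$$
LANGUAGE plpgsql;
Performing batch updates (Postgres 11 procedure)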
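The procedure has to be invoked with CALL, and its COMMITs only work when that CALL is not itself inside an explicit transaction block. The sketch below shows both cases. The second prefix, `/data2`, is there only so that the loop actually runs and reaches COMMIT.

```sql
-- top-level CALL: every COMMIT in the loop ends a transaction and releases its row locks
CALL findmybike.batch_change_path('/data1', 100);

-- inside an explicit transaction block the procedure cannot commit;
-- expected to fail with an error such as "invalid transaction termination"
BEGIN;
CALL findmybike.batch_change_path('/data2', 100);
ROLLBACK;
```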
34. WHILE EXISTS(SELECT 1
FROM bike
WHERE b_photo_path NOT LIKE p_new_prefix || '%') LOOP
WITH keys_to_update AS (
SELECT b_id
FROM bike
WHERE b_photo_path NOT LIKE p_new_prefix || '%'
LIMIT p_batch_size
) UPDATE bike b
SET b_photo_path = p_new_prefix || b_photo_path FROM keys_to_update ktu
WHERE b.b_id = ktu.b_id;
COMMIT;
END LOOP;
Performing batch updates (Postgres 11 procedure)
35. Performing batch updates (Postgres 11 procedure)
select query, backend_xid, xact_start from pg_stat_activity where state = 'active' and pid != (select pg_backend_pid());
-[ RECORD 1 ]---------------------------------------
query | call batch_change_path('/data1', 100);
backend_xid | 117913
xact_start | 2018-07-11 16:01:07.532973+02
select query, backend_xid, xact_start from pg_stat_activity where state = 'active' and pid != (select pg_backend_pid());
-[ RECORD 1 ]---------------------------------------
query | call batch_change_path('/data1', 100);
backend_xid | 118814
xact_start | 2018-07-11 16:01:07.532973+02
36. “...we believe it is better to have application programmers deal with performance problems due to overuse of transactions as bottlenecks arise, rather than always coding around the lack of transactions.”
Spanner: Google’s Globally-Distributed Database paper
37. “HIRE THE BEST PEOPLE YOU CAN, AND GET OUT OF THEIR WAY.”