An overview talk on new features in the PostgreSQL world for the Big Data Minsk User Group meetup, April 29, 2016: https://www.facebook.com/events/120784531655479/
4. CREATING AN INDEX
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ]
[ [ IF NOT EXISTS ] name ] ON table_name [ USING method ]
( { column_name | ( expression ) } [ COLLATE collation ]
[ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ]
[, ...] ) [ WITH ( storage_parameter = value [, ... ] ) ]
[ TABLESPACE tablespace_name ] [ WHERE predicate ]
5. QUERY WITHOUT AN INDEX
meetup_demo=# EXPLAIN ANALYZE SELECT repo_id FROM github_events
WHERE 3488850707 < event_id AND event_id < 3488880707;
------------------------------------------------------------------
Seq Scan on github_events (cost=0.00..265213.33 rows=13185 width=8) (actual time=0.008..495.324 rows=12982 loops=1)
Filter: (('3488850707'::bigint < event_id) AND (event_id < '3488880707'::bigint))
Rows Removed by Filter: 2040200
Planning time: 0.189 ms
Execution time: 504.053 ms
6. SIMPLE INDEX
CREATE UNIQUE INDEX event_id_idx ON github_events(event_id);
meetup_demo=# EXPLAIN ANALYZE SELECT repo_id FROM github_events
WHERE 3488850707 < event_id AND event_id < 3488880707;
------------------------------------------------------------------
Index Scan using event_id_idx on github_events (cost=0.43..1921.28 rows=13187 width=8) (actual time=0.024..12.544 rows=12982 loops=1)
Index Cond: (('3488850707'::bigint < event_id) AND (event_id < '3488880707'::bigint))
Planning time: 0.190 ms
Execution time: 21.130 ms
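The drop from about 500 ms to about 21 ms comes from visiting only the matching keys instead of filtering all of the roughly 2 million rows. Below is a toy Python model of that difference, not PostgreSQL internals. A sorted list searched with `bisect` stands in for the B-tree, although a real B-tree is a paged tree rather than a flat list. `heap_lookup` plays the role of fetching the tuple by TID. The synthetic data and all function names are hypothetical.

```python
import bisect
import random

def seq_scan(rows, lo, hi):
    """Seq Scan + Filter: visit every (event_id, repo_id) row in heap order."""
    visited, result = 0, []
    for event_id, repo_id in rows:
        visited += 1
        if lo < event_id < hi:
            result.append(repo_id)
    return result, visited

def index_range_scan(sorted_keys, heap_lookup, lo, hi):
    """Index Scan: find the first key > lo, walk forward while key < hi,
    and fetch each matching row (heap_lookup stands in for the TID -> tuple step)."""
    start = bisect.bisect_right(sorted_keys, lo)
    end = bisect.bisect_left(sorted_keys, hi)
    result = [heap_lookup[k] for k in sorted_keys[start:end]]
    return result, end - start

if __name__ == "__main__":
    rows = [(event_id, event_id % 97) for event_id in range(100_000)]
    random.Random(0).shuffle(rows)          # heap order is unrelated to event_id
    keys = sorted(event_id for event_id, _ in rows)
    lookup = dict(rows)                     # event_id -> repo_id
    _, seq_visited = seq_scan(rows, 1_000, 2_000)
    _, idx_visited = index_range_scan(keys, lookup, 1_000, 2_000)
    print(seq_visited, idx_visited)         # 100000 999
```

Both scans return the same rows. The difference is how many rows each one has to look at to find them.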
7. REGULAR INDEX
CREATE UNIQUE INDEX event_id_idx ON github_events(event_id);
--------------------------------
Index Scan using event_id_idx on github_events (cost=0.43..1921.28 rows=13187 width=8) (actual time=0.037..12.485 rows=12982 loops=1)
Index Cond: (('3488850707'::bigint < event_id) AND (event_id < '3488880707'::bigint))
Planning time: 0.186 ms
Execution time: 21.222 ms
9. COVERING INDEX
• Smaller index size
• Lower update overhead
• Faster planning and lookups
• Included columns do not need an opclass
• Filtering on included columns
CREATE UNIQUE INDEX event_id_idx2 ON github_events(event_id) INCLUDING (repo_id);
-- In PostgreSQL 11 and later the released syntax is INCLUDE (repo_id)
https://pgconf.ru/media/2016/02/19/4_Lubennikova_B-tree_pgconf.ru_3.0%20(1).pdf
10. COVERING INDEX
meetup_demo=# EXPLAIN ANALYZE SELECT repo_id FROM github_events
WHERE 3488850707 < event_id AND event_id < 3488880707;
---------------------------------------
Index Only Scan using event_id_idx2 on github_events (cost=0.43..23764.29 rows=13187 width=8) (actual time=0.032..12.533 rows=12982 loops=1)
Index Cond: ((event_id > '3488850707'::bigint) AND (event_id < '3488880707'::bigint))
Heap Fetches: 12982
Planning time: 0.178 ms
Execution time: 21.147 ms
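In this plan `Heap Fetches: 12982` equals the number of returned rows, so every match was still checked in the heap. An index-only scan can skip the heap only for pages that the visibility map marks all-visible, and `VACUUM` is what sets those bits. Below is a toy Python model of that rule, not the real executor. `IndexEntry`, the page numbers and the `all_visible_pages` set are hypothetical stand-ins.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IndexEntry:
    key: int        # event_id: the search key of the B-tree
    included: int   # repo_id: carried in the leaf tuple, not used for ordering
    page: int       # heap page of the row version the entry points to

def index_only_scan(entries, lo, hi, all_visible_pages):
    """Return repo_id values for lo < event_id < hi and the number of heap fetches.
    entries must be sorted by key; for brevity this walks the list instead of
    descending a tree. Every row is assumed visible, so a fetch only adds cost."""
    values, heap_fetches = [], 0
    for entry in entries:
        if entry.key <= lo:
            continue
        if entry.key >= hi:
            break
        if entry.page not in all_visible_pages:
            heap_fetches += 1        # visibility unknown: check the heap tuple
        values.append(entry.included)
    return values, heap_fetches

if __name__ == "__main__":
    entries = [IndexEntry(k, k % 97, k // 100) for k in range(1_000)]
    print(index_only_scan(entries, 100, 300, set())[1])    # 199: nothing vacuumed yet
    print(index_only_scan(entries, 100, 300, {1, 2})[1])   # 0: pages 1 and 2 all-visible
```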
11. BRIN INDEX
CREATE INDEX event_id_brin_idx ON github_events USING brin (event_id);
--------------------------------
Bitmap Heap Scan on github_events (cost=175.16..42679.52 rows=13187 width=8) (actual time=0.824..15.489 rows=12982 loops=1)
Recheck Cond: (('3488850707'::bigint < event_id) AND (event_id < '3488880707'::bigint))
Rows Removed by Index Recheck: 13995
Heap Blocks: lossy=3072
-> Bitmap Index Scan on event_id_brin_idx (cost=0.00..171.87 rows=13187 width=0) (actual time=0.698..0.698 rows=30720 loops=1)
Index Cond: (('3488850707'::bigint < event_id) AND (event_id < '3488880707'::bigint))
Planning time: 0.094 ms
Execution time: 24.421 ms
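A minmax BRIN index stores only a (min, max) summary per range of heap pages, 128 pages by default. The bitmap it produces is therefore lossy: every row on a selected page must be rechecked, which is where `Rows Removed by Index Recheck` and `Heap Blocks: lossy` come from. This only pays off when the column correlates with physical order, as an append-only `event_id` does. Below is a toy Python model of the idea. The page layout and the data are hypothetical.

```python
PAGES_PER_RANGE = 128  # BRIN's default pages_per_range

def build_brin(pages, per_range=PAGES_PER_RANGE):
    """Summarise each run of per_range heap pages as (first_page, min, max).
    Assumes every page holds at least one value."""
    summary = []
    for first in range(0, len(pages), per_range):
        values = [v for page in pages[first:first + per_range] for v in page]
        summary.append((first, min(values), max(values)))
    return summary

def brin_bitmap_scan(pages, summary, lo, hi, per_range=PAGES_PER_RANGE):
    """Bitmap Heap Scan driven by a BRIN summary for lo < value < hi."""
    matches, removed_by_recheck, lossy_pages = [], 0, 0
    for first, low, high in summary:
        if high <= lo or low >= hi:
            continue                      # whole range excluded by min/max
        for page in pages[first:first + per_range]:
            lossy_pages += 1              # bitmap knows pages, not rows
            for value in page:
                if lo < value < hi:       # Recheck Cond on every row
                    matches.append(value)
                else:
                    removed_by_recheck += 1
    return matches, removed_by_recheck, lossy_pages

if __name__ == "__main__":
    # 1000 pages of 10 values each, physically ordered like an append-only log
    pages = [list(range(p * 10, p * 10 + 10)) for p in range(1_000)]
    summary = build_brin(pages)
    matches, removed, lossy = brin_bitmap_scan(pages, summary, 5_000, 5_100)
    print(len(summary), len(matches), removed, lossy)   # 8 99 1181 128
```

The whole index here is 8 tuples for 1000 pages, which is why BRIN indexes stay tiny.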
13. CSTORE_FDW
• Inspired by the Optimized Row Columnar (ORC) format developed by Hortonworks.
• Compression: reduces in-memory and on-disk data size by 2-4x. Can be extended to support different codecs.
• Column projections: only reads the column data relevant to the query. Improves performance for I/O-bound queries.
• Skip indexes: stores min/max statistics for row groups and uses them to skip over unrelated rows.
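Column projection and skip indexes can be modelled together in a few lines. In this toy Python sketch, each row group keeps every column in its own list along with per-column min/max statistics. A query reads only the columns it needs, and it skips any group whose min/max on the filter column rules out a match. The group size, the column names and the helper functions are hypothetical, and this does not reproduce cstore_fdw's real stripe/block layout.

```python
def to_row_groups(rows, columns, group_size):
    """Split rows into row groups; store each column separately with min/max stats."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        data = {name: [row[i] for row in chunk] for i, name in enumerate(columns)}
        stats = {name: (min(values), max(values)) for name, values in data.items()}
        groups.append({"data": data, "stats": stats})
    return groups

def scan(groups, project, filter_col, lo, hi):
    """SELECT project FROM t WHERE filter_col BETWEEN lo AND hi.
    Returns matching tuples and how many column values were read."""
    needed = set(project) | {filter_col}   # column projection
    result, values_read = [], 0
    for group in groups:
        low, high = group["stats"][filter_col]
        if high < lo or low > hi:
            continue                       # skip index: no row here can match
        values_read += sum(len(group["data"][name]) for name in needed)
        for i, value in enumerate(group["data"][filter_col]):
            if lo <= value <= hi:
                result.append(tuple(group["data"][name][i] for name in project))
    return result, values_read

if __name__ == "__main__":
    columns = ("created_at", "repo_id", "payload")
    rows = [(t, t % 50, "{...}") for t in range(10_000)]
    groups = to_row_groups(rows, columns, group_size=1_000)
    result, read = scan(groups, ["repo_id"], "created_at", 2_500, 3_499)
    print(len(result), read)   # 1000 4000 (a row store would touch all 30000 values)
```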
14. CSTORE_FDW
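-- cstore_fdw must be listed in shared_preload_libraries in postgresql.conf
CREATE EXTENSION cstore_fdw;
CREATE SERVER cstore_server FOREIGN DATA WRAPPER cstore_fdw;
CREATE FOREIGN TABLE cstored_github_events (
  event_id bigint,
  event_type text,
  event_public boolean,
  repo_id bigint,
  payload jsonb,
  repo jsonb,
  actor jsonb,
  org jsonb,
  created_at timestamp
)
SERVER cstore_server
OPTIONS (compression 'pglz');
-- SELECT * relies on github_events having the same column order
INSERT INTO cstored_github_events SELECT * FROM github_events;
ANALYZE cstored_github_events;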
15. TYPICAL QUERY
meetup_demo=# EXPLAIN ANALYZE SELECT repo_id, count(*) FROM cstored_github_events WHERE created_at BETWEEN timestamp '2016-01-02 01:00:00' AND timestamp '2016-01-02 23:00:00' GROUP BY repo_id ORDER BY 2 DESC;
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------
Sort (cost=75153.59..75221.43 rows=27137 width=8) (actual time=950.085..1030.283 rows=106145 loops=1)
Sort Key: (count(*)) DESC
Sort Method: quicksort Memory: 8048kB
-> HashAggregate (cost=72883.86..73155.23 rows=27137 width=8) (actual time=772.445..861.162 rows=106145 loops=1)
Group Key: repo_id
-> Foreign Scan on cstored_github_events (cost=0.00..70810.84 rows=414603 width=8) (actual time=4.762..382.302 rows=413081 loops=1)
Filter: ((created_at >= '2016-01-02 01:00:00'::timestamp without time zone) AND (created_at <= '2016-01-02 23:00:00'::timestamp without time zone))
Rows Removed by Filter: 46919
CStore File: /var/lib/pgsql/9.5/data/cstore_fdw/18963/1236161
CStore File Size: 1475036725
Planning time: 0.126 ms
Execution time: 1109.248 ms
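Above the Foreign Scan, the plan is a HashAggregate, which keeps one counter per `repo_id`, followed by a Sort on `count(*) DESC`. Below is a plain Python equivalent of those two nodes over `(repo_id, created_at)` pairs, as a sketch of what the executor computes rather than how it does so. The function name and sample data are hypothetical.

```python
from collections import Counter
from datetime import datetime

def top_repos(events, start, end):
    """SELECT repo_id, count(*) ... WHERE created_at BETWEEN start AND end
    GROUP BY repo_id ORDER BY 2 DESC, over (repo_id, created_at) pairs."""
    counts = Counter()                       # HashAggregate: one entry per repo_id
    for repo_id, created_at in events:
        if start <= created_at <= end:       # BETWEEN includes both bounds
            counts[repo_id] += 1
    # Sort on count(*) DESC; ties keep first-seen order here, SQL leaves them unspecified
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

if __name__ == "__main__":
    events = [
        (1, datetime(2016, 1, 2, 5)),
        (2, datetime(2016, 1, 2, 6)),
        (1, datetime(2016, 1, 2, 7)),
        (3, datetime(2016, 1, 3, 0)),    # outside the window
        (2, datetime(2016, 1, 2, 1)),    # exactly on the lower bound
        (1, datetime(2016, 1, 2, 23)),   # exactly on the upper bound
    ]
    print(top_repos(events, datetime(2016, 1, 2, 1), datetime(2016, 1, 2, 23)))
    # [(1, 3), (2, 2)]
```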
16. NOT ALWAYS AS ADVERTISED
SELECT pg_size_pretty(cstore_table_size('cstored_github_events'));
1407 MB
SELECT pg_size_pretty(pg_table_size('github_events'));
2668 MB
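The two sizes above can be turned into a compression ratio with simple arithmetic. Both inputs are the rounded MB figures printed by `pg_size_pretty`, so the result is approximate. The ratio is about 1.9x, just below the 2-4x range claimed on the previous slide.

```python
row_store_mb = 2668   # pg_table_size('github_events')
cstore_mb = 1407      # cstore_table_size('cstored_github_events')

ratio = row_store_mb / cstore_mb
saved = 1 - cstore_mb / row_store_mb
print(f"compression ratio {ratio:.2f}x, {saved:.0%} smaller")
# compression ratio 1.90x, 47% smaller
```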
17. POSTGRESQL 9.5: FOREIGN TABLE INHERITANCE
• Fast INSERTs and look-ups on the current table.
• Periodically move data to an archive table for compression.
• Query both via the main table.
• Combined row-based and columnar store.
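The scheme in these bullets can be modelled as a parent table with two children. One child is a regular row-store table that takes new rows, and the other is a columnar archive, which in PostgreSQL 9.5 can be a foreign table attached with `INHERIT`. A query against the parent scans both children. This is a toy Python model only: the class, its methods and the integer `created_at` values are hypothetical, and nothing here issues real SQL.

```python
class EventsParent:
    """Hypothetical stand-in for a parent table with two children:
    a row-store table for fresh rows and a columnar (cstore_fdw) archive."""

    def __init__(self):
        self.current = []   # regular heap table: fast INSERT and point look-ups
        self.archive = []   # foreign cstore table: compressed, append-only

    def insert(self, row):
        self.current.append(row)

    def archive_older_than(self, cutoff):
        """Periodic job: move rows older than cutoff to the archive child."""
        moved = [r for r in self.current if r["created_at"] < cutoff]
        self.current = [r for r in self.current if r["created_at"] >= cutoff]
        self.archive.extend(moved)
        return len(moved)

    def select(self, predicate):
        """A query on the parent scans both children and unions the results."""
        return [r for r in self.current + self.archive if predicate(r)]

if __name__ == "__main__":
    parent = EventsParent()
    for t in range(10):
        parent.insert({"event_id": t, "created_at": t})
    print(parent.archive_older_than(6))                             # 6
    print(len(parent.select(lambda r: r["created_at"] >= 3)))      # 7
```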
18. CLUSTERING
SELECT retweet_count FROM contest WHERE "user.id" = 13201312;
Time: 120.743 ms
CREATE INDEX user_id_post_id ON contest("user.id" ASC, "id" DESC);
CLUSTER contest USING user_id_post_id;
VACUUM contest;
Time: 4.128 ms
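There is no CLUSTER statement in the SQL standard.
Bloating: CLUSTER is a one-time reordering that takes an ACCESS EXCLUSIVE lock; to remove bloat and re-cluster online, see https://github.com/reorg/pg_repack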
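Clustering helps because a user's rows, which were scattered across the heap, end up on a few adjacent pages. Below is a toy Python model that counts the distinct heap pages holding one user's rows before and after rewriting the heap in `("user.id" ASC, "id" DESC)` order. The page capacity and the random data are hypothetical. The model also ignores the new index itself, which contributes to the real speed-up as well.

```python
import random

ROWS_PER_PAGE = 50  # hypothetical heap page capacity

def pages_touched(heap, user_id):
    """Distinct heap pages that hold rows of one user (position // ROWS_PER_PAGE)."""
    return len({pos // ROWS_PER_PAGE for pos, (uid, _) in enumerate(heap) if uid == user_id})

def cluster(heap):
    """Rewrite the heap in index order: ("user.id" ASC, "id" DESC)."""
    return sorted(heap, key=lambda row: (row[0], -row[1]))

if __name__ == "__main__":
    rng = random.Random(42)
    heap = [(rng.randrange(100), post_id) for post_id in range(10_000)]  # (user.id, id)
    print(pages_touched(heap, 7), pages_touched(cluster(heap), 7))
```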
19. WHAT ELSE?
• UPSERT: INSERT … ON CONFLICT DO NOTHING/UPDATE (9.5)
• Partial indexes (available since 7.2)
• Materialized views (9.3)
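The two UPSERT branches differ only in what happens when the unique key already exists. Below is a toy Python model of that decision on a dict keyed by the unique column. It deliberately leaves out concurrency, which is the main reason to use `ON CONFLICT` instead of a check-then-insert in application code. The `upsert` helper is hypothetical.

```python
def upsert(table, key, row, on_conflict="update"):
    """Toy INSERT ... ON CONFLICT (key) DO NOTHING | DO UPDATE on a dict.
    table maps the unique key to a row dict; concurrency is not modelled."""
    if key not in table:
        table[key] = dict(row)
        return "inserted"
    if on_conflict == "nothing":
        return "skipped"
    # DO UPDATE SET col = EXCLUDED.col for every column in the new row
    table[key].update(row)
    return "updated"

if __name__ == "__main__":
    repos = {}
    print(upsert(repos, 1, {"stars": 10}))              # inserted
    print(upsert(repos, 1, {"stars": 11}, "nothing"))   # skipped
    print(upsert(repos, 1, {"stars": 12}))              # updated
    print(repos)                                        # {1: {'stars': 12}}
```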
20. PROFILING AND DBA
• pg_stat_statements, pg_stat_activity, pg_buffercache
• https://github.com/PostgreSQL-Consulting/pg-utils
• https://github.com/ankane/pghero
• Many useful queries on the PostgreSQL wiki
• https://wiki.postgresql.org/wiki/Show_database_bloat
22. JSONB
CREATE INDEX login_idx ON github_events USING btree((org->>'login'));
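-- jsonb_value_path_ops and jsonb_path_value_ops come from the jsquery extension, not core PostgreSQL
CREATE EXTENSION jsquery;
CREATE INDEX login_idx2 ON github_events USING gin (org jsonb_value_path_ops);
jsonb_path_value_ops: (hash(path_item_1.path_item_2. ... .path_item_n); value)
jsonb_value_path_ops: (value; bloom(path_item_1) | bloom(path_item_2) | ... | bloom(path_item_n))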
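The two key layouts above can be computed for a small document to make them concrete. `jsonb_path_value_ops` keys on a hash of the full path plus the value. `jsonb_value_path_ops` keys on the value plus an OR of per-item Bloom bits, so a path match is only probable and needs a recheck. This is a toy Python model: the MD5-based hash, the 32-bit Bloom width, `#` as the marker for array elements, and all function names are assumptions, not jsquery's actual encoding.

```python
import hashlib

def stable_hash(text):
    """Deterministic 32-bit hash (hypothetical; jsquery uses its own hash)."""
    return int.from_bytes(hashlib.md5(text.encode()).digest()[:4], "big")

def leaves(doc, path=()):
    """Yield (path, scalar) pairs; '#' marks an array element (an assumption)."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from leaves(value, path + (key,))
    elif isinstance(doc, list):
        for value in doc:
            yield from leaves(value, path + ("#",))
    else:
        yield path, doc

def path_value_keys(doc):
    """jsonb_path_value_ops-style keys: (hash of the full path; value)."""
    return {(stable_hash(".".join(path)), value) for path, value in leaves(doc)}

def bloom(item, bits=32):
    return 1 << (stable_hash(item) % bits)

def value_path_keys(doc, bits=32):
    """jsonb_value_path_ops-style keys: (value; OR of per-item Bloom bits)."""
    keys = set()
    for path, value in leaves(doc):
        mask = 0
        for item in path:
            mask |= bloom(item, bits)
        keys.add((value, mask))
    return keys

def may_contain(keys, path, value, bits=32):
    """Candidate check on value_path keys; a hit can be a false positive,
    so the row would still be rechecked, as with any lossy index."""
    need = 0
    for item in path:
        need |= bloom(item, bits)
    return any(v == value and (mask & need) == need for v, mask in keys)

if __name__ == "__main__":
    org = {"login": "github", "id": 9919}
    print((stable_hash("login"), "github") in path_value_keys(org))   # True
    print(may_contain(value_path_keys(org), ("login",), "github"))     # True
```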
23. JSQUERY
CREATE TABLE js (
  id serial,
  data jsonb,
  CHECK (data @@ '
    name IS STRING AND
    similar_ids.#: IS NUMERIC AND
    points.#:(x IS NUMERIC AND y IS NUMERIC)'::jsquery));
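In the CHECK constraint, `#:` means "every element of the array". Below is a Python mirror of the same rules, useful for validating documents on the client before insert. It reflects my reading of the jsquery syntax, not jsquery itself. In particular, treating a missing key or a non-array value as a failure, and accepting empty arrays, are assumptions about jsquery's exact semantics.

```python
def is_numeric(value):
    """JSON number; bool is excluded because Python treats it as an int."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def check_js(data):
    """Python mirror of the jsquery CHECK (missing keys fail: an assumption)."""
    if not isinstance(data, dict):
        return False
    # name IS STRING
    if not isinstance(data.get("name"), str):
        return False
    # similar_ids.#: IS NUMERIC  ('#:' = every array element)
    ids = data.get("similar_ids")
    if not isinstance(ids, list) or not all(is_numeric(i) for i in ids):
        return False
    # points.#:(x IS NUMERIC AND y IS NUMERIC)
    points = data.get("points")
    if not isinstance(points, list):
        return False
    return all(isinstance(p, dict) and is_numeric(p.get("x")) and is_numeric(p.get("y"))
               for p in points)

if __name__ == "__main__":
    print(check_js({"name": "pg", "similar_ids": [1, 2.5], "points": [{"x": 1, "y": 2}]}))  # True
    print(check_js({"name": "pg", "similar_ids": [1, "2"], "points": []}))                  # False
```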