Indexes in PostgreSQL (10)
PGDay.IT 2017 - 11th edition
Milan, October 13th, 2017
Giuseppe Broccolo
g.broccolo.7@gmail.com
Viralize.com
The outline
• Indexes in PostgreSQL
• What’s new in v10:
– Parallelism
– Hash indexing
– New SP-GiST support (inet data)
– Summarization of BRINs
~$ whoami
Giuseppe Broccolo
- data engineer at
- member of
@giubro
gbroccolo7
gbroccolo
gemini__81
g.broccolo.7@gmail.com
PostgreSQL indexes
• AKA Access Methods
– allow concurrent changes (MVCC compliant)
– persist the information (WAL)
– speed up access to data:
• links to data blocks (heap access can sometimes be avoided)
• index blocks live in shared buffers, as well as data blocks
[Figure: 8kB index and data pages in shared buffers, with changes logged to the WAL]
The default AMs – the trees
• binary structure, hierarchically sorted
– nodes (values, links to child nodes, etc.)
– links depend on hierarchical criteria
– allow a search to skip most of the values
• N ~ O(a^n) → n ~ O(log N)
• balanced structures speed up point (equality) searches
• unbalanced ones can be considerably faster for range searches
[Figure: a balanced tree next to an unbalanced tree]
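The N ~ O(a^n) relation above can be made concrete with a rough estimate of how many levels n a tree needs for N values. This is only a sketch: the fan-out a = 100 is an illustrative assumption, not a PostgreSQL constant (real fan-out depends on key width and page fill).

```sql
-- Rough estimate of the tree depth n for N indexed values, from N ~ a^n.
-- The fan-out of 100 is an assumption made for illustration only.
SELECT n_values,
       ceil(ln(n_values::float8) / ln(100::float8)) AS estimated_levels
FROM unnest(ARRAY[1000, 1000000, 1000000000]) AS t(n_values);
```

The point is that a million-fold increase in N adds only a few levels to the search path.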
The default AMs – the hashes
• key/value maps (k: v)
– k: the hash of the search key (the bucket)
– v: the address where the key is stored
– just one kind of search: =
– complexity:
• ~O(1)
– like trees, their sizes are comparable with the indexed dataset
• ~O(N)
[Figure: search key → hashing → k: value; plot of complexity vs. N, ~O(log N) against ~O(1)]
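A minimal sketch of the "just one kind of search" point, using a hypothetical table hash_demo (table and index names are invented for illustration). The actual plans depend on the PostgreSQL version, statistics and settings.

```sql
-- Hypothetical demo table; the names are chosen for illustration only.
CREATE TABLE hash_demo (i integer);
INSERT INTO hash_demo SELECT g FROM generate_series(1, 100000) AS g;
CREATE INDEX hash_demo_idx ON hash_demo USING hash (i);
ANALYZE hash_demo;

-- Equality is the only operator a hash index can serve.
EXPLAIN SELECT * FROM hash_demo WHERE i = 123;

-- A range predicate cannot use the hash index, so another plan is chosen.
EXPLAIN SELECT * FROM hash_demo WHERE i < 123;
```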
The default AMs – the BRINs
• Block Range Indexes:
– Á. Herrera, S. Riggs, H. Linnakangas (PG 9.5)
– Range: summarization of adjacent-on-disk blocks
– complexity:
• ~O(N/K), K~10/100
– really small indexes, faster creation
• ~O(N/K’), K’~1000/10000
– can be used for low-selectivity queries
– low performance for “dynamic” data
[Figure: 8kB heap blocks grouped into block ranges 0 to 7; the summarization of range X covers blk n. xxxxx, blk n. yyyyy, blk n. zzzzz, ...]
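To see the "really small indexes" claim in practice, one can build a BRIN and a B-tree on the same column and compare their sizes. The sketch below assumes a hypothetical table brin_demo; sizes will vary with the data and the PostgreSQL version.

```sql
-- Hypothetical demo table with values stored in physical order.
CREATE TABLE brin_demo AS
SELECT i AS id FROM generate_series(1, 1000000) AS t(i);

-- pages_per_range sets how many adjacent heap blocks one summary covers
-- (128 is the documented default).
CREATE INDEX brin_demo_idx ON brin_demo
    USING brin (id) WITH (pages_per_range = 128);
CREATE INDEX brin_demo_btree_idx ON brin_demo USING btree (id);

-- Compare the on-disk size of the two indexes.
SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relname IN ('brin_demo_idx', 'brin_demo_btree_idx');
```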
The default AMs
• B-tree, GIN, GiST, SP-GiST, Hash, BRIN
• new user-defined access methods can be added
– fully supported since 9.6 (thanks to postgrespro & 2ndQuadrant)
• CREATE ACCESS METHOD
[Figure: tree AMs classified as sortable vs. generalized, and balanced vs. unbalanced]
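The access methods available in a database can be listed from the pg_am catalog. The sketch below also installs the bloom contrib module as an example of an extension-provided access method; this assumes the contrib packages are installed and that the session has the privileges to create extensions.

```sql
-- Index access methods known to this database (amtype exists since 9.6).
SELECT amname, amhandler
FROM pg_am
WHERE amtype = 'i'
ORDER BY amname;

-- The bloom contrib module registers an extra access method through
-- CREATE ACCESS METHOD in its extension script.
CREATE EXTENSION IF NOT EXISTS bloom;
SELECT amname FROM pg_am WHERE amname = 'bloom';
```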
Extend AMs to datatypes: the OpClasses
• access methods use operator classes (opclass)
CREATE INDEX idx_name
ON table USING method (column opclass_name)
WITH (opt=value);
• define:
– operators for the needed types
– support functions depending on the access method
• can be extended to specific datatypes
• CREATE OPERATOR CLASS opclass_name
FOR TYPE datatype
USING method AS
OPERATOR strategy_number operator_name,
[...],
FUNCTION support_number func1(arg_types),
[...]
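As a concrete instance of the CREATE INDEX form above, the sketch below uses the built-in text_pattern_ops operator class on a hypothetical table opclass_demo, then lists the operator classes registered for the inet type. Table and index names are invented for illustration.

```sql
-- Hypothetical demo table.
CREATE TABLE opclass_demo (name text);

-- text_pattern_ops compares strings character by character, so the index
-- can serve prefix searches such as LIKE 'abc%' in any locale.
CREATE INDEX opclass_demo_pattern_idx
    ON opclass_demo USING btree (name text_pattern_ops);

-- Operator classes available for inet, per access method.
SELECT am.amname, opc.opcname, opc.opcdefault
FROM pg_opclass AS opc
JOIN pg_am AS am ON am.oid = opc.opcmethod
WHERE opc.opcintype = 'inet'::regtype
ORDER BY am.amname, opc.opcname;
```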
Execution plans
• IndexScan: needs to inspect data pages for row visibility
• IndexOnlyScan: reads just index pages, using the visibility map (PG 9.2)
• BitmapIndexScan + BitmapHeapScan:
1) reduce the # of accesses using a bitmap
2) used by BRIN to inspect block ranges
[Figure: plot of complexity vs. N, ~O(log N)]
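The three scan types can usually be observed on the same query by switching planner options off one at a time. This is a sketch on a hypothetical table plan_demo; the planner is free to choose differently depending on version, statistics and costs.

```sql
-- Hypothetical demo table.
CREATE TABLE plan_demo AS
SELECT i FROM generate_series(1, 100000) AS t(i);
CREATE INDEX plan_demo_idx ON plan_demo USING btree (i);
VACUUM ANALYZE plan_demo;  -- also sets the visibility map bits

-- Typically an Index Only Scan: only column i is needed.
EXPLAIN SELECT i FROM plan_demo WHERE i = 5;

-- Without index-only scans, typically a plain Index Scan.
SET enable_indexonlyscan = off;
EXPLAIN SELECT i FROM plan_demo WHERE i = 5;

-- Without index scans either, typically Bitmap Index Scan + Bitmap Heap Scan.
SET enable_indexscan = off;
EXPLAIN SELECT i FROM plan_demo WHERE i = 5;

RESET enable_indexonlyscan;
RESET enable_indexscan;
```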
What’s new in PG 10?
Parallelization in index scans
• parallelization is not new in PG (since 9.6), see G. Ciolli's talk later
– parallel B-tree index scans
– parallel BitmapHeapScan (different areas of the heap are processed by parallel workers)
– R. Syed, A. Kapila, R. Haas, R. Sabih, D. Kumar, J. Rouhaud
Parallelization in Index Scans
• for B-tree
– Workers inspect leaf pages in parallel
• for bitmap heap scan
– Workers inspect heap chunks in parallel
[Figure: in both cases, workers #1 ... #N feed a gather node]
Parallelization in Index Scans
• The parameters:
– max_parallel_workers (taken from the max_worker_processes pool)
– max_parallel_workers_per_gather (taken from the max_parallel_workers pool)
– min_parallel_index_scan_size (512kB)
• heuristic for the # of workers: index size > 512kB * 3^(# workers)
– parallel_setup_cost (1000.0)
– parallel_tuple_cost (0.1)
– force_parallel_mode (off)
• tune them based on the underlying HW!
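Before tuning, the current and built-in default values of these parameters can be read from pg_settings. A minimal sketch; note that force_parallel_mode was renamed debug_parallel_query in PostgreSQL 16, so on recent versions that row is simply absent from the result.

```sql
-- Current value, unit and built-in default of the parallel-query parameters.
SELECT name, setting, unit, boot_val
FROM pg_settings
WHERE name IN ('max_worker_processes',
               'max_parallel_workers',
               'max_parallel_workers_per_gather',
               'min_parallel_index_scan_size',
               'parallel_setup_cost',
               'parallel_tuple_cost',
               'force_parallel_mode')
ORDER BY name;
```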
When is parallelization used?
• Ex. IndexOnlyScan on B-tree
• table/B-tree ~O(300MB)
=# CREATE TABLE test AS
=# SELECT i FROM generate_series(1,10000000) t(i);
CREATE
=# CREATE INDEX btree_idx ON test USING btree (i);
CREATE
When parallelization is disabled:
• Ex. IndexOnlyScan on B-tree:
=# EXPLAIN ANALYZE SELECT * FROM test WHERE i=5;
QUERY PLAN
----------------------------------------------------------
Index Only Scan using btree_idx on test
(cost=0.43..8.45 rows=1 width=4)
(actual time=0.433..0.434 rows=1 loops=1)
Index Cond: (i = 5)
Heap Fetches: 1
Planning time: 0.525 ms
Execution time: 0.461 ms
(5 rows)
When is parallelization used?
• Set up parallel execution:
=# SET max_parallel_workers TO 8;
SET
=# SET max_parallel_workers_per_gather TO 8; -- up to 6 workers
SET
• Plan does not change!! Force parallelization...
=# SET force_parallel_mode TO true;
SET
When is parallelization used?
• Ex. IndexOnlyScan on B-tree
=# EXPLAIN ANALYZE SELECT * FROM test WHERE i=5;
QUERY PLAN
----------------------------------------------------------
Gather (cost=1000.43..1008.45 rows=1 width=4)
(actual time=2.523..2.579 rows=1 loops=1)
Workers Planned: 6
Workers Launched: 6
Single Copy: true
-> Index Only Scan using btree_idx on test
(cost=0.43..8.45 rows=1 width=4)
(actual time=0.030..0.032 rows=1 loops=1)
Index Cond: (i = 5)
Heap Fetches: 0
Planning time: 0.063 ms
Execution time: 3.934 ms
(9 rows)
When is parallelization used?
• try to “trick” the planner with lower tuple costs:
=# SET force_parallel_mode TO false;
SET
=# SET parallel_tuple_cost TO 0.01;
SET
• the same plan is obtained, and it is still disadvantageous!
– cost parameters are (almost) always fine
– parallelization costs pay off only with (really) big data
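For contrast with the single-row lookup, a query that has to walk a large part of the index is a more natural candidate for a parallel plan. The sketch below uses a hypothetical table parallel_demo; whether the planner actually picks a parallel index (only) scan under a Gather node depends on version, settings and hardware.

```sql
-- Hypothetical demo table, similar in size to the one used in the talk.
CREATE TABLE parallel_demo AS
SELECT i FROM generate_series(1, 10000000) AS t(i);
CREATE INDEX parallel_demo_idx ON parallel_demo USING btree (i);
VACUUM ANALYZE parallel_demo;

SET max_parallel_workers_per_gather TO 4;

-- Half of the index must be read: a parallel plan may now be cheaper.
EXPLAIN SELECT count(*) FROM parallel_demo WHERE i BETWEEN 1 AND 5000000;

RESET max_parallel_workers_per_gather;
```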
Hash indexes are now logged!
• Hash AMs did not define how index changes had to be logged into the WAL:
– hash index changes were not WAL-logged: not crash safe!
– hash indexes could not be physically replicated
• Hash AMs now include WAL logging (R. Haas, G. Ghosh, A. Kapila, A. Sharma)
[Figure: 8kB hash index pages now logged to the WAL]
Hash indexes are now logged!
• Ex. physical replication, with a hash index created before the 1st base backup:
hot standby
=# \d hash_example
 Table "public.hash_example"
 Column |  Type   | Modifiers
--------+---------+-----------
 i      | integer |
Indexes:
    "hash_idx" hash (i)
master
=# \d hash_example
 Table "public.hash_example"
 Column |  Type   | Modifiers
--------+---------+-----------
 i      | integer |
Indexes:
    "hash_idx" hash (i)
[Figure: WAL streamed from the master to the hot standby]
Hash indexes are now logged!
• pre PostgreSQL 10:
hot standby
=# explain analyze select * from
=# hash_example where i = 123;
ERROR: could not read block 0 in file
"base/16402/458955269": read only 0 of
8192 bytes
=# SET enable_indexscan TO false;
SET
master
=# explain analyze select * from
=# hash_example where i = 123;
QUERY PLAN
-----------------------------------------
Index Scan using hash_idx on hash_example
(cost=0.00..8.02 rows=1 width=21)
(actual time=1.526..1.529 rows=1 loops=1)
[...]
[Figure: WAL streamed from the master to the hot standby]
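To check which indexes in a database are exposed to this problem on a pre-10 cluster, the hash indexes can be listed from the system catalogs. A minimal sketch using only standard catalogs; the output column names are chosen here for readability.

```sql
-- List every hash index in the current database with its table.
SELECT n.nspname AS schema_name,
       c.relname AS index_name,
       i.indrelid::regclass AS table_name
FROM pg_index AS i
JOIN pg_class AS c ON c.oid = i.indexrelid
JOIN pg_namespace AS n ON n.oid = c.relnamespace
JOIN pg_am AS am ON am.oid = c.relam
WHERE am.amname = 'hash'
ORDER BY 1, 2;
```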
SP-GiST support for inet
• Unbalanced indexes perform better in case of inclusion searches:
– Ex. Quad-tree
[Figure: quad-tree partitioning with a bounding box (bbox) && search]
• E. Hasegeli extended the use case to IPv4/IPv6 addresses (inet, 7 Bytes/19 Bytes):
– defined the OpClass for inet to be interfaced with SP-GiST AMs
• inet_ops → && >> >>= > >= <> << <<= < <= =
– important improvement in SP-GiST AM: # of child nodes is limited
SP-GiST support for inet
• Ex.
=# CREATE TABLE network_a AS SELECT ((random() * 255)::int::text || '.' ||
=# (random() * 255)::int::text || '.' ||
=# (random() * 255)::int::text || '.' ||
=# (random() * 255)::int::text || '/' ||
=# (random() * 32)::int::text)::inet as addr
=# FROM generate_series(1, 1000);
CREATE
=# CREATE INDEX gist_idx_a ON network_a USING gist (addr inet_ops);
CREATE
=# CREATE INDEX spgist_idx_a ON network_a USING spgist (addr inet_ops);
CREATE
=# CREATE TABLE network_b AS (
=# SELECT * FROM network_a ORDER BY random() LIMIT 100);
CREATE
SP-GiST support for inet
• Ex. no indexes
=# EXPLAIN ANALYZE SELECT * FROM network_a a JOIN network_b b ON b.addr && a.addr;
QUERY PLAN
-----------------------------------------------------------------------------------
Nested Loop (cost=0.00..15032.50 rows=78724 width=14)
(actual time=0.017..185.134 rows=94973 loops=1)
Join Filter: (a.addr && b.addr)
Rows Removed by Join Filter: 905027
-> Seq Scan on network_a a (cost=0.00..15.00 rows=1000 width=7)
(actual time=0.008..0.187 rows=1000 loops=1)
-> Materialize (cost=0.00..20.00 rows=1000 width=7)
(actual time=0.000..0.061 rows=1000 loops=1000)
-> Seq Scan on network_b b (cost=0.00..15.00 rows=1000 width=7)
(actual time=0.005..0.083 rows=1000 loops=1)
Planning time: 0.522 ms
Execution time: 190.120 ms
(8 rows)
SP-GiST support for inet
• Ex. GiST index
=# EXPLAIN ANALYZE SELECT * FROM network_a a JOIN network_b b ON b.addr && a.addr;
QUERY PLAN
-----------------------------------------------------------------------------------
Nested Loop (cost=0.14..631.40 rows=13600 width=39)
(actual time=0.048..112.023 rows=94973 loops=1)
-> Seq Scan on network_b b (cost=0.00..23.60 rows=1360 width=32)
(actual time=0.016..0.153 rows=1000 loops=1)
-> Index Only Scan using gist_idx_a on network_a a
(cost=0.14..0.35 rows=10 width=7)
(actual time=0.018..0.093 rows=95 loops=1000)
Index Cond: (addr && a.addr)
Heap Fetches: 94973
Planning time: 0.111 ms
Execution time: 119.433 ms
(7 rows)
SP-GiST support for inet
• Ex. SP-GiST index
=# EXPLAIN ANALYZE SELECT * FROM network_a a JOIN network_b b ON b.addr && a.addr;
QUERY PLAN
-----------------------------------------------------------------------------------
Nested Loop (cost=0.14..667.40 rows=13600 width=39)
(actual time=0.034..58.196 rows=94973 loops=1)
-> Seq Scan on network_b b (cost=0.00..23.60 rows=1360 width=32)
(actual time=0.009..0.105 rows=1000 loops=1)
-> Index Only Scan using spgist_idx_a on network_a a
(cost=0.14..0.37 rows=10 width=7)
(actual time=0.008..0.042 rows=95 loops=1000)
Index Cond: (addr && a.addr)
Heap Fetches: 94973
Planning time: 0.109 ms
Execution time: 63.562 ms
(7 rows)
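Besides the && join above, the same operator class serves containment searches. The sketch below (PostgreSQL 10 or later) uses a small hypothetical table inet_demo; sequential scans are disabled only to make the index visible in the plan on such a tiny table.

```sql
-- Hypothetical demo table with a few IPv4 and IPv6 values.
CREATE TABLE inet_demo (addr inet);
INSERT INTO inet_demo VALUES
    ('10.1.2.3'), ('10.20.0.0/16'), ('192.168.1.10'), ('2001:db8::1');
CREATE INDEX inet_demo_spgist_idx
    ON inet_demo USING spgist (addr inet_ops);

SET enable_seqscan = off;  -- the table is tiny, so steer the planner

-- "<<" means "is strictly contained by": both 10.x rows match.
EXPLAIN SELECT * FROM inet_demo WHERE addr << inet '10.0.0.0/8';
SELECT * FROM inet_demo WHERE addr << inet '10.0.0.0/8';

RESET enable_seqscan;
```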
BRIN summarization for new INSERTs
• pre PG 10: perform VACUUM, or call brin_summarize_new_values()
• NOW (Á. Herrera):
– the autovacuum daemon is now able to summarize new data in newly filled ranges:
• CREATE INDEX ON table USING brin (column) WITH (autosummarize=on);
– it is possible to summarize/desummarize the single range containing a given block (bigint):
• brin_summarize_range / brin_desummarize_range
• BRINs are (still) not able to “shrink” summarized data
– if you update/delete boundary data, you need to REINDEX
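The summarization functions mentioned above can be tried on a hypothetical table brin_autosum (PostgreSQL 10 or later). The small pages_per_range value is an arbitrary choice so that the demo table spans several ranges.

```sql
-- Hypothetical demo table and BRIN index with automatic summarization.
CREATE TABLE brin_autosum (id bigint);
CREATE INDEX brin_autosum_idx ON brin_autosum
    USING brin (id) WITH (pages_per_range = 16, autosummarize = on);

INSERT INTO brin_autosum
SELECT i FROM generate_series(1, 100000) AS t(i);

-- Manually summarize every range that is not summarized yet
-- (returns the number of ranges it summarized).
SELECT brin_summarize_new_values('brin_autosum_idx');

-- Drop, then rebuild, the summary of the range containing heap block 0.
SELECT brin_desummarize_range('brin_autosum_idx', 0);
SELECT brin_summarize_range('brin_autosum_idx', 0);
```

With autosummarize=on the manual calls are normally unnecessary; they are shown here to make the effect of each function visible.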
Other features about indexes
• Improve hash index performance
(A. Kapila, M. Cy, A. Sharma)
• Improve accuracy in determining if a BRIN index scan is beneficial
(D. Rowley, E. Hasegeli)
• Allow faster GiST INSERTs/UPDATEs by reusing index space efficiently
(A. Borodin)
• Reduce page locking during vacuuming of GIN indexes
(A. Borodin)
The future of indexes in PostgreSQL
• Allow compression/decompression AM functions in SP-GiST
OpClasses (good for PostGIS!)
• CREATE GLOBAL INDEX
Conclusions
• PostgreSQL has a long tradition in index development
• different types for different goals
• an eye to the future
Creative Commons license
This work is licensed under a Creative Commons
Attribution-ShareAlike 4.0 International License
https://creativecommons.org/licenses/by-nc-sa/4.0/
© 2017 Giuseppe Broccolo, ITPUG – www.itpug.org/
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
convolutional neural network and its applications.pdf
convolutional neural network and its applications.pdfconvolutional neural network and its applications.pdf
convolutional neural network and its applications.pdf
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 

Indexes in PostgreSQL (10)

  • 1. Indexes in PostgreSQL (10)
      PGDay.IT 2017 - 11th edition, Milan, October 13th, 2017
      Giuseppe Broccolo, g.broccolo.7@gmail.com, Viralize.com
  • 2. The outline
      • Indexes in PostgreSQL
      • What’s new in v10:
          – Parallelism
          – Hash indexing
          – New support for SP-GiST (inet data)
          – Summarization of BRINs
  • 3. ~$ whoami
      Giuseppe Broccolo
      - data engineer at [logo]
      - member of [logo]
      @giubro, gbroccolo7, gbroccolo, gemini__81
      g.broccolo.7@gmail.com
  • 4. PostgreSQL indexes
      • AKA Access Methods
  • 5. PostgreSQL indexes
      • AKA Access Methods
          – allow concurrent changes (MVCC compliant)
          – persist the information (WAL)
          – speed up access to data:
              • links to data blocks (following them can sometimes be avoided)
              • index blocks live in shared buffers, as well as data blocks
      [Figure: 8kB blocks in shared buffers, with the WAL]
  • 6. The default AMs – the trees
      • binary structure hierarchically sorted
          – nodes (values, links to pointed nodes, etc.)
          – links depend on hierarchical criteria
          – allow skipping orders of magnitude of values
              • N ~ O(a^n) → n ~ O(log N)
  • 7. The default AMs – the trees
      [Figure: balanced tree]
      • binary structure hierarchically sorted
          – nodes (values, links to pointed nodes, etc.)
          – links depend on hierarchical criteria
          – allow skipping orders of magnitude of values
              • N ~ O(a^n) → n ~ O(log N)
      • balanced structures speed up point (single-value) searches
  • 8. The default AMs – the trees
      [Figure: balanced tree vs. unbalanced tree]
      • binary structure hierarchically sorted
          – nodes (values, links to pointed nodes, etc.)
          – links depend on hierarchical criteria
          – allow skipping orders of magnitude of values
              • N ~ O(a^n) → n ~ O(log N)
      • balanced structures speed up point (single-value) searches
      • unbalanced ones are noticeably faster for range searches
  • 9. The default AMs – the hashes
      • binary maps (k: v)
          – k: the hash of the search key (the bucket)
          – v: the address where the key is stored
          – just one kind of search: =
          – complexity:
              • ~O(1)
          – like trees, their sizes are comparable with the indexed dataset
              • ~O(N)
      [Figure: a search key is hashed to k, which maps to a value; plot of complexity vs. N comparing ~O(log N) with ~O(1)]
  • 10. The default AMs – the BRINs
      • Block Range Indexes:
          – Á. Herrera, S. Riggs, H. Linnakangas (PG 9.5)
          – Range: summarization of adjacent-on-disk blocks
          – complexity:
              • ~O(N/K), K ~ 10 to 100
          – really small indexes, faster creation
              • ~O(N/K’), K’ ~ 1000 to 10000
          – can be used for low-selectivity queries
          – low performance for “dynamic” data
      [Figure: 8kB blocks grouped into block ranges 0 to 7; the summarization of a range X covers its blocks (blk n. xxxxx, blk n. yyyyy, blk n. zzzzz, ...)]
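As a rough illustration of the block-range idea on this slide, the sketch below builds a BRIN index on an append-only table and compares its size with the table's. The table and index names (`events`, `events_ts_brin`), the row count, and the choice of 32 pages per range are assumptions made for the example; `pages_per_range` is the BRIN storage parameter that sets how many adjacent heap blocks share one summary.

```sql
-- Hypothetical append-only table: rows are inserted in timestamp order,
-- so the indexed value is well correlated with the physical block position.
CREATE TABLE events AS
SELECT i AS id,
       timestamp '2017-01-01' + i * interval '1 second' AS ts
FROM generate_series(1, 1000000) AS t(i);

-- One summary (min/max of ts) is stored for every 32 adjacent heap blocks.
CREATE INDEX events_ts_brin ON events USING brin (ts)
    WITH (pages_per_range = 32);

-- Compare the on-disk size of the index with that of the table.
SELECT pg_size_pretty(pg_relation_size('events_ts_brin')) AS brin_size,
       pg_size_pretty(pg_relation_size('events'))         AS table_size;

-- A range predicate on ts can be answered through the block-range summaries.
EXPLAIN SELECT count(*) FROM events
WHERE ts BETWEEN '2017-01-02' AND '2017-01-03';
```

A smaller `pages_per_range` gives finer summaries at the cost of a larger index, which is the trade-off behind the K and K’ factors above.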
  • 11. The default AMs
      • B-tree, GIN, GiST, SP-GiST, Hash, BRIN
      • can add user defined new access methods
          – fully supported since 9.6 (thanks to postgrespro & 2ndQuadrant)
              • CREATE ACCESS METHOD
      [Figure labels on the AM list: sortable, generalized, balanced trees, unbalanced trees]
  • 12. Extend AMs to datatypes: the OpClasses
      • access methods use operator classes (opclass):
          CREATE INDEX idx_name ON table USING method (column opclass_name) WITH (opt=value);
      • opclasses define:
          – operators for the needed types
          – support functions depending on the access method
      • can be extended to specific datatypes:
          CREATE OPERATOR CLASS opclass_name FOR TYPE datatype USING method AS
              OPERATOR strategy_number $$, [...],
              FUNCTION support_number func1(), [...];
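To make the opclass mechanism a little more concrete, here is a minimal sketch that lists the operator classes registered for one access method and then selects a non-default opclass explicitly at index creation. The table and index names (`people`, `people_name_pattern_idx`) are hypothetical; `pg_opclass`, `pg_am` and `text_pattern_ops` are part of a standard PostgreSQL installation.

```sql
-- Which operator classes exist for SP-GiST, and for which indexed types?
SELECT am.amname,
       opc.opcname,
       opc.opcintype::regtype AS indexed_type,
       opc.opcdefault         AS is_default
FROM pg_opclass AS opc
JOIN pg_am      AS am ON am.oid = opc.opcmethod
WHERE am.amname = 'spgist'
ORDER BY opc.opcname;

-- Hypothetical table used only for this example.
CREATE TABLE people (name text);

-- Pick a non-default B-tree opclass explicitly: text_pattern_ops compares
-- strings character by character, which lets LIKE 'abc%' use the index
-- regardless of the database collation.
CREATE INDEX people_name_pattern_idx
    ON people USING btree (name text_pattern_ops);
```

When no opclass is named, the one marked as default for the column's type and the chosen access method is used.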
  • 13. Execution plans
      • IndexScan: needs to inspect data pages for row visibility
      • IndexOnlyScan: reads just index pages, using the visibility map (PG 9.2)
      • BitmapIndexScan + BitmapHeapScan:
          1) reduce the number of accesses using a bitmap
          2) used by BRIN to inspect block ranges
      [Figure: plot of complexity vs. N, ~O(log N)]
  • 14. What’s new in PG 10?
  • 15. Parallelization in index scans
      • parallelization is not new in PG (9.6), see G. Ciolli later
          – parallel B-tree index scans
          – parallel BitmapHeapScan (different areas of the heap are processed by parallel workers)
          – R. Syed, A. Kapila, R. Haas, R. Sabih, D. Kumar, J. Rouhaud
  • 16. Parallelization in Index Scans
      • for B-tree
          – Workers inspect leaf pages in parallel
      • for bitmap heap scan
          – Workers inspect heap chunks in parallel
      [Figure: in both cases a gather node collects rows from worker #1, worker #2, ..., worker #N]
  • 17. Parallelization in Index Scans
      • The parameters:
          – max_parallel_workers (included in max_worker_processes)
          – max_parallel_workers_per_gather (included in max_parallel_workers)
          – min_parallel_index_scan_size (512kB)
              • heuristic (# workers vs. index size): index size > 512kB * 3^(# workers)
          – parallel_setup_cost (1000.0)
          – parallel_tuple_cost (0.1)
          – force_parallel_mode (false)
      • tune them based on the underlying HW!
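Before tuning any of these, it can help to check what the server is currently using. The query below is a small sketch that reads the values from the standard `pg_settings` view; it assumes a PostgreSQL 10 server, where all seven parameters exist under these names (a parameter missing in another release simply yields no row).

```sql
-- Current values of the parallel-query parameters listed on the slide.
SELECT name, setting, unit, short_desc
FROM pg_settings
WHERE name IN ('max_worker_processes',
               'max_parallel_workers',
               'max_parallel_workers_per_gather',
               'min_parallel_index_scan_size',
               'parallel_setup_cost',
               'parallel_tuple_cost',
               'force_parallel_mode')
ORDER BY name;
```

Note that `setting` is expressed in the parameter's base unit (shown in the `unit` column), so `min_parallel_index_scan_size` appears as a number of 8kB blocks rather than as 512kB.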
  • 18. When is parallelization used?
      • Ex. IndexOnlyScan on B-tree
      • table/B-tree ~O(300MB)
          =# CREATE TABLE test AS
          =# SELECT i FROM generate_series(1,10000000) t(i);
          CREATE
          =# CREATE INDEX btree_idx ON test USING btree (i);
          CREATE
  • 19. When parallelization is disabled:
      • Ex. IndexOnlyScan on B-tree:
          =# EXPLAIN ANALYZE SELECT * FROM test WHERE i=5;
                                  QUERY PLAN
          ----------------------------------------------------------
           Index Only Scan using btree_id on test (cost=0.43..8.45 rows=1 width=4) (actual time=0.433..0.434 rows=1 loops=1)
             Index Cond: (i = 5)
             Heap Fetches: 1
           Planning time: 0.525 ms
           Execution time: 0.461 ms
          (5 rows)
  • 20. When is parallelization used?
      • Set up parallel execution:
          =# SET max_parallel_workers TO 8;
          SET
          =# SET max_parallel_workers_per_gather TO 8; -- up to 6 workers
          SET
      • Plan does not change!! Force parallelization...
          =# SET force_parallel_mode TO true;
          SET
  • 21. When is parallelization used?
      • Ex. IndexOnlyScan on B-tree
          =# EXPLAIN ANALYZE SELECT * FROM test WHERE i=5;
                                  QUERY PLAN
          ----------------------------------------------------------
           Gather (cost=1000.43..1008.45 rows=1 width=4) (actual time=2.523..2.579 rows=1 loops=1)
             Workers Planned: 6
             Workers Launched: 6
             Single Copy: true
             -> Index Only Scan using btree_id on test (cost=0.43..8.45 rows=1 width=4) (actual time=0.030..0.032 rows=1 loops=1)
                  Index Cond: (i = 5)
                  Heap Fetches: 0
           Planning time: 0.063 ms
           Execution time: 3.934 ms
          (9 rows)
  • 22. When is parallelization used?
      • try to “trick” the planner with lower tuple costs:
          =# SET force_parallel_mode TO false;
          SET
          =# SET parallel_tuple_cost TO 0.01;
          SET
      • the same plan is obtained
          – and it is still disadvantageous!
          – the default cost parameters are (almost) always fine
          – parallelization costs pay off only in case of (really) big data
  • 23. Hash indexes are now logged!
      • Hash AMs did not define how index changes had to be logged into WALs:
          – Hashes lived just in shared buffers
          – not crash-safe!
          – Hashes could not be physically replicated
      • Hash AMs now include WAL logging (R. Haas, G. Ghosh, A. Kapila, A. Sharma)
      [Figure: 8kB blocks and the WAL]
  • 24. Hash indexes are now logged!
      • Ex. physical replication, with a hash index that already exists before the first base backup:
          hot standby:
              =# \d hash_example
              Table "public.hash_example"
               Column |  Type   | Modifiers
              --------+---------+-----------
               i      | integer |
              Indexes:
                  "hash_idx" hash (i)
          master:
              =# \d hash_example
              Table "public.hash_example"
               Column |  Type   | Modifiers
              --------+---------+-----------
               i      | integer |
              Indexes:
                  "hash_idx" hash (i)
      [Figure: WAL shipped between the two nodes]
  • 25. Hash indexes are now logged!
      • pre PostgreSQL 10:
          master:
              =# explain analyze select * from
              =# hash_example where i = 123;
                              QUERY PLAN
              -----------------------------------------
               Index Scan using hash_idx on hash_example (cost=0.00..8.02 rows=1 width=21) (actual time=1.526..1.529 rows=1 loops=1)
               [...]
          hot standby:
              =# explain analyze select * from
              =# hash_example where i = 123;
              ERROR: could not read block 0 in file "base/16402/458955269": read only 0 of 8192 bytes
              =# SET enable_indexscan TO false;
              SET
      [Figure: WAL shipped between the two nodes]
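The slides do not show how `hash_example` was built, so the following is only a sketch of one way to reproduce a similar setup on a PostgreSQL 10 (or later) primary. The row count is an assumption; the table and index names follow the slides.

```sql
-- Assumed setup: a single integer column filled with 100000 distinct values.
CREATE TABLE hash_example AS
SELECT i FROM generate_series(1, 100000) AS t(i);

-- From PostgreSQL 10 on, changes to this index are WAL-logged, so it
-- survives a crash and is carried over to physical standbys.
CREATE INDEX hash_idx ON hash_example USING hash (i);
ANALYZE hash_example;

-- Equality is the only search a hash index supports...
EXPLAIN SELECT * FROM hash_example WHERE i = 123;

-- ...so a range predicate has to be answered some other way.
EXPLAIN SELECT * FROM hash_example WHERE i < 123;
```

Running the first `EXPLAIN` on a hot standby of the same cluster is a simple way to confirm that the index is usable there too.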
  • 26. SP-GiST support for inet
      • Unbalanced indexes perform better in case of inclusion searches:
          – Ex. Quad-tree && bbox
      • E. Hasegeli extended the use case to IPv4/IPv6 addresses (inet, 7 Bytes/19 Bytes):
          – defined the OpClass for inet to be interfaced with SP-GiST AMs
              • inet_ops → && >> >>= > >= <> << <<= < <= =
          – important improvement in the SP-GiST AM: the number of child nodes is limited
  • 27. SP-GiST support for inet
      • Ex.
          =# CREATE TABLE network_a AS SELECT ((random() * 255)::int::text || '.' ||
          =#     (random() * 255)::int::text || '.' ||
          =#     (random() * 255)::int::text || '.' ||
          =#     (random() * 255)::int::text || '/' ||
          =#     (random() * 32)::int::text)::inet as addr
          =# FROM generate_series(1, 1000);
          CREATE
          =# CREATE INDEX gist_idx_a ON network_a USING gist (addr inet_ops);
          CREATE
          =# CREATE INDEX spgist_idx_a ON network_a USING spgist (addr inet_ops);
          CREATE
          =# CREATE TABLE network_b AS (
          =#     SELECT * FROM network_a ORDER BY random() LIMIT 100);
          CREATE
  • 28. SP-GiST support for inet
      • Ex. no indexes
          =# EXPLAIN ANALYZE SELECT * FROM network_a a JOIN network_b b ON b.addr && a.addr;
                                  QUERY PLAN
          -----------------------------------------------------------------------------------
           Nested Loop (cost=0.00..15032.50 rows=78724 width=14) (actual time=0.017..185.134 rows=94973 loops=1)
             Join Filter: (a.addr && b.addr)
             Rows Removed by Join Filter: 905027
             -> Seq Scan on network_a a (cost=0.00..15.00 rows=1000 width=7) (actual time=0.008..0.187 rows=1000 loops=1)
             -> Materialize (cost=0.00..20.00 rows=1000 width=7) (actual time=0.000..0.061 rows=1000 loops=1000)
                  -> Seq Scan on network_b b (cost=0.00..15.00 rows=1000 width=7) (actual time=0.005..0.083 rows=1000 loops=1)
           Planning time: 0.522 ms
           Execution time: 190.120 ms
          (8 rows)
  • 29. SP-GiST support for inet
      • Ex. GiST index
          =# EXPLAIN ANALYZE SELECT * FROM network_a a JOIN network_b b ON b.addr && a.addr;
                                  QUERY PLAN
          -----------------------------------------------------------------------------------
           Nested Loop (cost=0.14..631.40 rows=13600 width=39) (actual time=0.048..112.023 rows=94973 loops=1)
             -> Seq Scan on network_b b (cost=0.00..23.60 rows=1360 width=32) (actual time=0.016..0.153 rows=1000 loops=1)
             -> Index Only Scan using gist_idx_a on network_a a (cost=0.14..0.35 rows=10 width=7) (actual time=0.018..0.093 rows=95 loops=1000)
                  Index Cond: (addr && a.addr)
                  Heap Fetches: 94973
           Planning time: 0.111 ms
           Execution time: 119.433 ms
          (7 rows)
  • 30. SP-GiST support for inet
      • Ex. SP-GiST index
          =# EXPLAIN ANALYZE SELECT * FROM network_a a JOIN network_b b ON b.addr && a.addr;
                                  QUERY PLAN
          -----------------------------------------------------------------------------------
           Nested Loop (cost=0.14..667.40 rows=13600 width=39) (actual time=0.034..58.196 rows=94973 loops=1)
             -> Seq Scan on network_b b (cost=0.00..23.60 rows=1360 width=32) (actual time=0.009..0.105 rows=1000 loops=1)
             -> Index Only Scan using spgist_idx_a on network_a a (cost=0.14..0.37 rows=10 width=7) (actual time=0.008..0.042 rows=95 loops=1000)
                  Index Cond: (addr && a.addr)
                  Heap Fetches: 94973
           Planning time: 0.109 ms
           Execution time: 63.562 ms
          (7 rows)
  • 31. BRIN summarization for new INSERTs
      • pre PG 10: perform VACUUM, or call brin_summarize_new_values()
      • NOW (Á. Herrera):
          – the autovacuum daemon is now able to summarize new data in the current ranges:
              • CREATE INDEX ON table USING brin (column) WITH (autosummarize=on);
          – it is possible to summarize/desummarize single block ranges (block number as bigint):
              • brin_summarize_range / brin_desummarize_range
      • BRINs are (still) not able to “shrink” summarized data
          – if you update/delete boundary data, you need to REINDEX
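The functions named on this slide can be tried out with a sketch like the one below. The table and index names (`measurements`, `measurements_brin`) and the amount of data are hypothetical; the `autosummarize` option and the three `brin_*` functions are the ones referred to above, as available in PostgreSQL 10.

```sql
-- Hypothetical table and BRIN index with automatic summarization enabled.
CREATE TABLE measurements (id bigint, taken_at timestamptz);

CREATE INDEX measurements_brin ON measurements USING brin (taken_at)
    WITH (autosummarize = on);

-- Rows inserted after index creation land in block ranges that are not
-- summarized yet.
INSERT INTO measurements
SELECT i, timestamptz '2017-10-13 00:00:00+00' + i * interval '1 second'
FROM generate_series(1, 200000) AS t(i);

-- Summarize every range that is still unsummarized
-- (the call that was already available before version 10).
SELECT brin_summarize_new_values('measurements_brin');

-- New in version 10: act on the single range containing a given heap block.
SELECT brin_desummarize_range('measurements_brin', 0);  -- drop that summary
SELECT brin_summarize_range('measurements_brin', 0);    -- rebuild it
```

Desummarizing and re-summarizing a range recomputes its summary from the rows currently stored in those blocks, so it only touches the one range given by the block number.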
  • 32. Other features about indexes
      • Improve hash index performance (A. Kapila, M. Cy, A. Sharma)
      • Improve accuracy in determining if a BRIN index scan is beneficial (D. Rowley, E. Hasegeli)
      • Allow faster GiST INSERTs/UPDATEs by reusing index space efficiently (A. Borodin)
      • Reduce page locking during vacuuming of GIN indexes (A. Borodin)
  • 33. The future of indexes in PostgreSQL
      • Allow compression/decompression AM functions in SP-GiST OpClasses (good for PostGIS!)
      • CREATE GLOBAL INDEX
  • 34. Conclusions
      • PostgreSQL has a long tradition in index development
      • different types for different goals
      • an eye to the future
  • 35. Creative Commons license
      This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License
      https://creativecommons.org/licenses/by-nc-sa/4.0/
      © 2017 Giuseppe Broccolo, ITPUG – www.itpug.org/