Cybertec Schönig & Schönig GmbH
Hans-Jürgen Schönig, www.postgresql-support.de
PostgreSQL Indexing
Dublin, 2013
Hans-Jürgen Schönig
Scope of this session:
- What a basic index does
- The PostgreSQL optimizer (cost model)
- Classical B-tree Indexes
- Partial / functional indexes
- Different types of indexes
- Full-Text-Search
- Fuzzy matching
- Writing your own indexing strategy
1. Basic indexing:
- Generating test data:
- for the purpose of this session we need a
table consisting of two columns:
test=# CREATE TABLE t_test (id serial, name text);
CREATE TABLE
test=# INSERT INTO t_test (name) VALUES ('hans');
INSERT 0 1
test=# INSERT INTO t_test (name) VALUES ('paul');
INSERT 0 1
1. Basic indexing:
- A lot more test data ...
- Let us create some more test data
by repeating the process
test=# INSERT INTO t_test (name) SELECT name FROM t_test;
INSERT 0 2
...
test=# INSERT INTO t_test (name) SELECT name FROM t_test;
INSERT 0 2097152
1. Basic indexing:
- Reading some data:
- Let us see how PostgreSQL executes a simple query:
test=# SELECT count(*) FROM t_test;
count
---------
4194304
(1 row)
Time: 431.192 ms
test=# explain analyze SELECT count(*) FROM t_test;
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
Aggregate (cost=75100.80..75100.81 rows=1 width=0) (actual time=977.865..977.865 rows=1 loops=1)
-> Seq Scan on t_test (cost=0.00..64615.04 rows=4194304 width=0)
(actual time=0.013..531.448 rows=4194304 loops=1)
Total runtime: 977.917 ms
(3 rows)
Time: 1045.065 ms
1. Basic indexing:
- Reading some data:
- Let us add a filter:
test=# SELECT count(*) FROM t_test WHERE id = 421234;
count
-------
1
(1 row)
Time: 476.965 ms
test=# explain analyze SELECT count(*) FROM t_test WHERE id = 421234;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Aggregate (cost=75100.80..75100.81 rows=1 width=0) (actual time=495.134..495.135 rows=1 loops=1)
-> Seq Scan on t_test (cost=0.00..75100.80 rows=1 width=0)
(actual time=53.405..495.126 rows=1 loops=1)
Filter: (id = 421234)
Rows Removed by Filter: 4194303
Total runtime: 495.175 ms
(5 rows)
Time: 520.659 ms
1. Basic indexing:
- Sequentially reading data:
- In case you like reading the phone book sequentially
we are basically done.
- Sequentially reading the phone book is technically ok
=> but socially not accepted
- Defining an index is the desired solution
1. Basic indexing:
- Creating an index
test=# \h CREATE INDEX
Command: CREATE INDEX
Description: define a new index
Syntax:
CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ]
ON table_name [ USING method ]
( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ]
[ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
[ WITH ( storage_parameter = value [, ... ] ) ]
[ TABLESPACE tablespace_name ]
[ WHERE predicate ]
- At the end of the day all clauses will be
covered by this training
1. Basic indexing:
- A typical index:
test=# CREATE INDEX idx_id ON t_test (id);
CREATE INDEX
Time: 7357.663 ms
- This gives us a standard btree index
- PostgreSQL provides “High-Concurrency B-Trees”
(Lehman-Yao, 1981)
- Many people can modify the index at the same time
- Highly efficient B+ tree
1. Basic indexing:
- How a btree works:
[Diagram: the index (8k pages: root node, sorted entries, forward chaining) pointing to the table (8k pages: line pointers "linp", rows)]
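Why sorted keys help, as a rough Python sketch. This is not how PostgreSQL implements its B-tree; it only contrasts checking every row with a binary search over sorted keys. The function names are made up for this example, and the ids are assumed to run from 1 to 4194304 without gaps.

```python
def seq_scan(rows, wanted):
    """Look at every row, the way a sequential scan does."""
    visited = 0
    hits = 0
    for value in rows:
        visited += 1
        if value == wanted:
            hits += 1
    return hits, visited


def sorted_lookup(sorted_keys, wanted):
    """Binary search over sorted keys; counts how many keys were compared."""
    lo, hi, steps = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_keys[mid] < wanted:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_keys) and sorted_keys[lo] == wanted
    return found, steps


ids = range(1, 4194305)          # same row count as t_test (assumed gap-free)
hits, visited = seq_scan(ids, 421234)
found, steps = sorted_lookup(ids, 421234)
print(visited, steps)            # rows visited vs. keys compared
```

The scan touches all 4194304 values, while the sorted lookup needs only about two dozen comparisons.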
1. Basic indexing:
- Indexing is beneficial
test=# explain analyze SELECT count(*)
FROM t_test
WHERE id = 421234;
QUERY PLAN
------------------------------------------------------------------------------
Aggregate (cost=8.73..8.74 rows=1 width=0)
(actual time=0.024..0.024 rows=1 loops=1)
-> Index Only Scan using idx_id on t_test (cost=0.00..8.73 rows=1 width=0)
(actual time=0.019..0.020 rows=1 loops=1)
Index Cond: (id = 421234)
Heap Fetches: 1
Total runtime: 0.057 ms
(5 rows)
Time: 0.395 ms
- A lot faster :).
1. Basic indexing:
- Still slow ...
test=# SELECT count(*) FROM t_test WHERE name = 'hans';
count
---------
2097152
(1 row)
Time: 787.407 ms
- This is still slow. Let us create an index ...
test=# CREATE INDEX idx_name ON t_test (name);
CREATE INDEX
1. Basic indexing:
- The benefit is exactly zero:
test=# SELECT count(*) FROM t_test WHERE name = 'hans';
count
---------
2097152
(1 row)
Time: 782.443 ms
test=# explain SELECT count(*) FROM t_test WHERE name = 'hans';
QUERY PLAN
----------------------------------------------------------------------
Aggregate (cost=80350.32..80350.33 rows=1 width=0)
-> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0)
Filter: (name = 'hans'::text)
(3 rows)
- The index won't be used
- Too many identical values (“not selective”)
1. Basic indexing:
- The cost is far from zero:
test=# SELECT pg_size_pretty(pg_relation_size('t_test'));
pg_size_pretty
----------------
177 MB
(1 row)
test=# SELECT pg_size_pretty(pg_relation_size('idx_id'));
pg_size_pretty
----------------
90 MB
(1 row)
test=# SELECT pg_size_pretty(pg_relation_size('idx_name'));
pg_size_pretty
----------------
90 MB
(1 row)
- Indexes need a fair amount of space
1. Basic indexing:
- Input values DO make a difference:
test=# explain SELECT count(*) FROM t_test WHERE name = 'hans';
QUERY PLAN
----------------------------------------------------------------------
Aggregate (cost=80350.32..80350.33 rows=1 width=0)
-> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0)
Filter: (name = 'hans'::text)
(3 rows)
test=# explain SELECT count(*) FROM t_test WHERE name = 'hans2';
QUERY PLAN
----------------------------------------------------------------------------------
Aggregate (cost=7.74..7.75 rows=1 width=0)
-> Index Only Scan using idx_name on t_test (cost=0.00..7.74 rows=1 width=0)
Index Cond: (name = 'hans2'::text)
(3 rows)
- PostgreSQL will decide depending on the input value
=> cost based optimization
1. Basic indexing:
- Partial indexes:
- In our example the index is only used in case
of rare or non-existing values
- What is the point of an index when its entire
content is totally useless?
=> a more selective strategy is needed
1. Basic indexing:
- Partial indexes:
test=# DROP INDEX idx_name;
DROP INDEX
test=# CREATE INDEX idx_name ON t_test (name)
WHERE name NOT IN ('hans', 'paul');
CREATE INDEX
test=# SELECT pg_size_pretty(pg_relation_size('idx_name'));
pg_size_pretty
----------------
8192 bytes
(1 row)
- A partial index reduces space consumption
- Benefit is still the same
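The space saving can be sketched in Python with a toy table. An "index" is modelled here as a sorted list of (name, id) pairs, which is an assumption made purely for illustration; the row count and the rare value 'josef' are made up as well.

```python
# Toy table: 1000 rows alternating 'hans' / 'paul', plus one rare name.
rows = [(i, "hans" if i % 2 else "paul") for i in range(1, 1001)]
rows.append((1001, "josef"))

# Full index: one entry per row.
full_index = sorted((name, row_id) for row_id, name in rows)

# Partial index: only rows matching the WHERE clause of the index.
partial_index = sorted(
    (name, row_id) for row_id, name in rows if name not in ("hans", "paul")
)

print(len(full_index), len(partial_index))   # 1001 1
```

Lookups for rare names find the same entries in both structures; only the frequent values, which the planner would not look up through the index anyway, are left out.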
1. Basic indexing:
- Equal benefit – lower cost:
test=# explain SELECT count(*) FROM t_test WHERE name = 'hans';
QUERY PLAN
----------------------------------------------------------------------
Aggregate (cost=80350.32..80350.33 rows=1 width=0)
-> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0)
Filter: (name = 'hans'::text)
(3 rows)
test=# explain SELECT count(*) FROM t_test WHERE name = 'hans2';
QUERY PLAN
----------------------------------------------------------------------------------
Aggregate (cost=7.28..7.29 rows=1 width=0)
-> Index Only Scan using idx_name on t_test (cost=0.00..7.28 rows=1 width=0)
Index Cond: (name = 'hans2'::text)
(3 rows)
- This is exactly the same plan as before!
1. Basic indexing:
- What about functions?
test=# CREATE INDEX idx_cos ON t_test ( cos(id) );
CREATE INDEX
Time: 16867.228 ms
test=# explain SELECT count(*) FROM t_test WHERE cos(id) = 17;
QUERY PLAN
----------------------------------------------------------------------------------
Aggregate (cost=23960.99..23961.00 rows=1 width=0)
-> Bitmap Heap Scan on t_test (cost=395.25..23908.56 rows=20972 width=0)
Recheck Cond: (cos((id)::double precision) = 17::double precision)
-> Bitmap Index Scan on idx_cos (cost=0.00..390.01 rows=20972 width=0)
Index Cond: (cos((id)::double precision) = 17::double precision)
(5 rows)
- PostgreSQL provides functional indexes
- VERY nice to avoid additional columns
- Gives a lot of extra flexibility
1. Basic indexing:
- Type of functions allowed
- Functions must be deterministic
=> “immutable”
=> Functions can be written in almost any language
=> This is highly performance sensitive
2. The PostgreSQL cost model
- How does PostgreSQL decide on
index vs. no index?
- PostgreSQL uses statistics to estimate the number of
rows coming back
- Each operation is assigned a cost
=> costs are just numbers used to compare
different options inside the planner
- Cost parameters can be changed at runtime
or globally
=> be careful, it can work against you
2. The PostgreSQL cost model
- pg_stats is your friend:
test=# \d pg_stats
View "pg_catalog.pg_stats"
Column | Type | Modifiers
-------------------------------+-----------+-----------
schemaname | name |
tablename | name |
attname | name |
inherited | boolean |
null_frac | real |
avg_width | integer |
n_distinct | real |
most_common_vals | anyarray |
most_common_freqs | real[] |
histogram_bounds | anyarray |
correlation | real |
most_common_elems | anyarray |
most_common_elem_freqs | real[] |
elem_count_histogram | real[] |
2. The PostgreSQL cost model
- Updating statistics
- System statistics are updated by ANALYZE:
test=# \h ANALYZE
Command: ANALYZE
Description: collect statistics about a database
Syntax:
ANALYZE [ VERBOSE ] [ table_name [ ( column_name [, ...] ) ] ]
- In most setups autovacuum is in charge
of updating pg_statistic
- In most cases statistics are not an issue
2. The PostgreSQL cost model
- How does PostgreSQL estimate costs?
- seq_page_cost = 1
- random_page_cost = 4
- cpu_tuple_cost = 0.01
- cpu_operator_cost = 0.0025
- cpu_index_tuple_cost = 0.005
2. The PostgreSQL cost model
- Let us do the math (1):
test=# explain SELECT count(*) FROM t_test;
QUERY PLAN
----------------------------------------------------------------------
Aggregate (cost=75100.80..75100.81 rows=1 width=0)
-> Seq Scan on t_test (cost=0.00..64615.04 rows=4194304 width=0)
(2 rows)
- total costs are at 75100.81
- costs are composed of I/O and CPU costs
2. The PostgreSQL cost model
- Let us do the math (2):
test=# SELECT pg_relation_size('t_test') / 8192;
?column?
----------
22672
(1 row)
- our table consists of 22672 blocks
- each block is 8 kB in size
2. The PostgreSQL cost model
- Let us do the math (3):
The seq scan:
I/O cost = 22672 * seq_page_cost = 22672.00
CPU cost = 4194304 * cpu_tuple_cost = 41943.04
= 64615.04 for the seq scan
The aggregate:
4194304 * cpu_operator_cost = 10485.76
Total costs => 75100.80 + cpu_tuple_cost = 75100.81
(we have to emit the one result row)
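The arithmetic can be replayed in a few lines of Python. This is a sketch of the calculation only, not planner code: the function names are made up, and the final 0.01 is taken to be one cpu_tuple_cost for the single result row, an assumption that matches the 75100.81 shown in the plan.

```python
# Planner cost constants as listed on the earlier slide.
SEQ_PAGE_COST = 1.0
CPU_TUPLE_COST = 0.01
CPU_OPERATOR_COST = 0.0025


def seq_scan_cost(pages, rows, seq_page_cost=SEQ_PAGE_COST):
    """I/O cost for every page plus CPU cost for every tuple."""
    return pages * seq_page_cost + rows * CPU_TUPLE_COST


def count_star_cost(pages, rows, seq_page_cost=SEQ_PAGE_COST):
    """Return (seq scan total, aggregate startup, aggregate total)."""
    scan = seq_scan_cost(pages, rows, seq_page_cost)
    startup = scan + rows * CPU_OPERATOR_COST   # one operator call per row
    total = startup + CPU_TUPLE_COST            # assumed: one result row
    return round(scan, 2), round(startup, 2), round(total, 2)


print(count_star_cost(22672, 4194304))
# (64615.04, 75100.8, 75100.81)
```

Passing a different `seq_page_cost` shows how a single parameter shifts every estimate built on top of it.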
2. The PostgreSQL cost model
- Inflation at work:
test=# SET seq_page_cost TO 10;
SET
test=# explain SELECT count(*) FROM t_test;
QUERY PLAN
-----------------------------------------------------------------------
Aggregate (cost=279148.80..279148.81 rows=1 width=0)
-> Seq Scan on t_test (cost=0.00..268663.04 rows=4194304 width=0)
(2 rows)
- Costs can be changed at runtime to fine tune
index usage
=> only do this if you are fully aware of what
you are doing. It can have unintended side
effects
2. The PostgreSQL cost model
- Spinning disks vs. SSDs
- Traditional disks are fast at sequential I/O
and pretty bad at random I/O
- SSDs make random I/O much cheaper
=> consider lowering random_page_cost
2. The PostgreSQL cost model
- Abusing tablespaces:
test=# ALTER TABLESPACE pg_default
SET (random_page_cost = 1);
ALTER TABLESPACE
- Allows different cost settings for various
disk subsystems
- It also lets you split “cached” and “uncached”
data -> ugly but useful
2. The PostgreSQL cost model
- Correlation and disk layout
test=# CREATE TABLE t_random AS SELECT *
FROM t_test
ORDER BY random();
SELECT 4194304
test=# CREATE INDEX idx_random ON t_random(id);
CREATE INDEX
test=# ANALYZE t_random;
ANALYZE
- The PostgreSQL optimizer considers the
physical order of rows on disk
- High correlation makes index scans far
more likely, because the optimizer lowers its
estimates for I/O costs.
2. The PostgreSQL cost model
- Correlation and disk layout
test=# explain SELECT count(*) FROM t_test WHERE id < 1000;
QUERY PLAN
-------------------------------------------------------------------------------
Aggregate (cost=75.35..75.36 rows=1 width=0)
-> Index Only Scan using idx_id on t_test
(cost=0.00..72.72 rows=1049 width=0)
Index Cond: (id < 1000)
(3 rows)
test=# explain SELECT count(*) FROM t_random WHERE id < 1000;
QUERY PLAN
-------------------------------------------------------------------------------
Aggregate (cost=950.31..950.32 rows=1 width=0)
-> Index Only Scan using idx_random on t_random
(cost=0.00..947.94 rows=947 width=0)
Index Cond: (id < 1000)
(3 rows)
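What "correlation" means here can be sketched in Python: the statistical correlation between the physical position of a row and its column value. The helper below is a plain Pearson correlation written for this example; the 10000-row sample and the random seed are arbitrary assumptions, and ANALYZE computes its figure from a sample in its own way.

```python
import random


def correlation(values):
    """Pearson correlation between physical position and column value."""
    n = len(values)
    mean_pos = (n - 1) / 2
    mean_val = sum(values) / n
    cov = sum((i - mean_pos) * (v - mean_val) for i, v in enumerate(values))
    var_pos = sum((i - mean_pos) ** 2 for i in range(n))
    var_val = sum((v - mean_val) ** 2 for v in values)
    return cov / (var_pos * var_val) ** 0.5


ordered = list(range(1, 10001))      # like t_test: ids stored in insert order
shuffled = ordered[:]                # like t_random: same ids, random order
random.Random(42).shuffle(shuffled)

print(round(correlation(ordered), 3), round(correlation(shuffled), 3))
```

The ordered list yields a correlation of 1, the shuffled one a value close to 0, which is the difference behind the two cost estimates above.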
2. The PostgreSQL cost model
- Implications:
- This is why different plans can pop up
EVEN if the data is the same
- There is no fixed amount of data making
PostgreSQL switch from index to
sequential scan
- High correlation can improve performance
=> consider clustering the table
test=# \h CLUSTER
Command: CLUSTER
Description: cluster a table according to an index
Syntax:
CLUSTER [VERBOSE] table_name [ USING index_name ]
CLUSTER [VERBOSE]
3. Indexing many columns
- Using OR / AND:
- PostgreSQL can use more than one index per
table per query
- PostgreSQL provides multi-column indexes
- What you might see is a so-called “Bitmap Scan”
=> don't mix it up with Oracle Bitmap Indexes
3. Indexing many columns
- Bitmap scans:
test=# explain SELECT * FROM t_test WHERE id = 2343 OR id = 423423;
QUERY PLAN
---------------------------------------------------------------------------
Bitmap Heap Scan on t_test (cost=9.44..17.41 rows=2 width=9)
Recheck Cond: ((id = 2343) OR (id = 423423))
-> BitmapOr (cost=9.44..9.44 rows=2 width=0)
-> Bitmap Index Scan on idx_id (cost=0.00..4.72 rows=1 width=0)
Index Cond: (id = 2343)
-> Bitmap Index Scan on idx_id (cost=0.00..4.72 rows=1 width=0)
Index Cond: (id = 423423)
(7 rows)
- PostgreSQL will scan the index twice
- PostgreSQL will look for blocks in the underlying table
- The condition has to be re-evaluated
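The three steps can be sketched in Python with a toy heap. This is a block-level simplification written for illustration, not PostgreSQL's actual bitmap implementation; the data layout and all function names are assumptions.

```python
# Toy heap: block number -> rows stored in that block, as (id, name).
heap = {
    0: [(1, "hans"), (2, "paul")],
    1: [(3, "hans"), (4, "paul")],
    2: [(5, "hans"), (6, "paul")],
}

# Toy index on id: key -> block numbers holding a matching row.
index_on_id = {}
for block, block_rows in heap.items():
    for row_id, _name in block_rows:
        index_on_id.setdefault(row_id, set()).add(block)


def bitmap_index_scan(index, key):
    """One pass over the index: collect the blocks that may hold `key`."""
    return set(index.get(key, ()))


def bitmap_heap_scan(heap, blocks, condition):
    """Visit each candidate block once, in physical order, and recheck."""
    result = []
    for block in sorted(blocks):
        for row in heap[block]:
            if condition(row):       # the "Recheck Cond" step
                result.append(row)
    return result


# WHERE id = 2 OR id = 5: two index scans, combined like BitmapOr.
blocks = bitmap_index_scan(index_on_id, 2) | bitmap_index_scan(index_on_id, 5)
found = bitmap_heap_scan(heap, blocks, lambda row: row[0] in (2, 5))
print(found)   # [(2, 'paul'), (5, 'hans')]
```

Because a block can hold rows that do not match, the condition is evaluated again for every row in the visited blocks.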
3. Indexing many columns
- Bitmap scans:
test=# explain SELECT * FROM t_test WHERE id = 2343 AND name = 'josef';
QUERY PLAN
-----------------------------------------------------------------------
Index Scan using idx_name on t_test (cost=0.00..8.27 rows=1 width=9)
Index Cond: (name = 'josef'::text)
Filter: (id = 2343)
(3 rows)
- PostgreSQL does not always use two indexes
when you have 2 quals
- The more selective index might be enough
3. Indexing many columns
- Multicolumn indexes:
test=# DROP INDEX idx_id;
DROP INDEX
test=# CREATE INDEX idx_combined ON t_test (id, name);
CREATE INDEX
test=# explain SELECT * FROM t_test WHERE id = 10;
QUERY PLAN
--------------------------------------------------------------------------------
Index Only Scan using idx_combined on t_test (cost=0.00..8.91 rows=1 width=9)
Index Cond: (id = 10)
(2 rows)
- PostgreSQL can use a subset of the indexed columns IF they
are the leading column(s) of the index
- Imagine a phone book; it is just like a combined index
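The phone book idea, sketched in Python: a combined index is modelled as a sorted list of (id, name) tuples, which is an assumption for illustration only. The sample entries and function names are made up.

```python
from bisect import bisect_left

# Entries sorted by id first, then by name, like an index on (id, name).
entries = sorted([
    (10, "hans"), (10, "paul"), (11, "hans"), (12, "paul"), (12, "zoe"),
])


def lookup_by_id(entries, id_value):
    """Leading column: a binary search finds the matching slice directly."""
    lo = bisect_left(entries, (id_value,))
    hi = bisect_left(entries, (id_value + 1,))
    return entries[lo:hi]


def lookup_by_name(entries, name):
    """Second column only: matches are scattered, so every entry is checked."""
    return [entry for entry in entries if entry[1] == name]


print(lookup_by_id(entries, 10))        # [(10, 'hans'), (10, 'paul')]
print(lookup_by_name(entries, "paul"))  # [(10, 'paul'), (12, 'paul')]
```

Searching by id works like looking up a surname in the phone book; searching by name alone is like looking for everybody called "Paul" regardless of surname.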
3. Indexing many columns
- Many indexes or combined indexes?
- It depends on what you want to query
- If you always use the first conditions in the index
a combined index might be a good idea
- Many indexes are more flexible but maybe not perfect
- Sometimes a mixed-strategy can be useful
4. Indexes to provide order
- B-trees can be used for more than searching
- B-trees keep their entries sorted.
- That order helps to avoid repeated sorting.
test=# explain SELECT * FROM t_test ORDER BY id LIMIT 10;
QUERY PLAN
--------------------------------------------------------------------------------------
Limit (cost=0.00..0.31 rows=10 width=9)
-> Index Scan using idx_id on t_test (cost=0.00..131602.27 rows=4194304 width=9)
(2 rows)
5. Dealing with upper / lowercase
- Upper and lower case searches are common:
- If you want case-insensitive searches, don't use
a functional index
- Consider using “citext”
test=# CREATE EXTENSION citext;
CREATE EXTENSION
test=# SELECT 'ABC'::citext = 'abc'::citext;
?column?
----------
t
(1 row)
6. Different types of indexes
- PostgreSQL supports more than just btrees
- B-Trees are fine if you are interested in things
which can be sorted
- Try to sort polygons => there is no natural sort order
- Geometric data and Full-Text-Search need
different algorithms
NOTE: This is not about which index is faster.
This is about the correct ALGORITHM
6. Different types of indexes
- Index types provided by PostgreSQL
- B-Trees
- Gist: Generalized Search Tree
- Gin: Generalized Inverted Index
- Sp-Gist: Space Partitioned Gist
- Hash
6. Different types of indexes
- Indexes and algorithms
- B-Trees: numbers, text, dates, etc.
- Gist: Generalized Search Tree
- Gin: Generalized Inverted Index
- Sp-Gist: Space Partitioned Gist
- Hash
7. Gist indexes
- Gist operates on different principles
than btree
- it supports “contains”, “left of”, “overlaps”, etc.
- “contains”, etc. are good for
=> Full Text Search
=> Geometric operations (PostGIS, etc.)
=> Finding genome sequences
=> Handling ranges (time, etc.)
=> Fuzzy search
- Gist allows KNN-search
7. Gist indexes
- How it works internally ...
7. GIN indexes
- GIN is a so-called inverted index
- Used for Full Text Search
- If you have 1 million documents containing the word
“house”, do you really want to have “house” inside
the index 1 million times?
=> B-tree for words
=> A document list for each word
=> Classical approach to text search
- FTS is not about “=”, it is about “contains”
=> forget btree
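The "one entry per word, one document list per word" idea can be sketched in Python. A dict stands in for the tree of words, and the three sample documents are made up; this shows the general inverted-index principle, not GIN's on-disk format.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each distinct word to the sorted list of documents containing it."""
    index = defaultdict(list)
    for doc_id, text in sorted(docs.items()):
        for word in sorted(set(text.lower().split())):
            index[word].append(doc_id)
    return dict(index)


docs = {
    1: "the house is red",
    2: "a red car",
    3: "my house",
}
index = build_inverted_index(docs)

print(index["house"])   # [1, 3]
print(index["red"])     # [1, 2]
```

The word "house" is stored once as a key, however many documents contain it; only its document list grows.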
7. GIN indexes
- GIN internal workings:
7. SP-Gist indexes
- SP-Gist is a space partitioned index
- Can be used for a variety of algorithms, which use
space partitioning
=> quad trees
=> suffix trees
=> k-d trees
7. SP-Gist indexes
- Quad trees: A prototype example ...
- We want to insert ... (6, 4) and (2, 8)
8. Full Text Search
- Stemming:
- Before searching, it makes sense to perform
“stemming”
test=# SELECT to_tsvector('english', 'having many cars is better than
to have just one car');
to_tsvector
-----------------------------------------
'better':5 'car':3,11 'mani':2 'one':10
(1 row)
8. Full Text Search
- Stemming is language dependent:
- Stemming works nicely for “roman” languages
=> it is hard to do this for Chinese and so on
test=# SELECT to_tsvector('english', 'i am'),
to_tsvector('german', 'i am'),
to_tsvector('dutch', 'i am');
to_tsvector | to_tsvector | to_tsvector
-------------+-------------+--------------
| 'i':1 | 'am':2 'i':1
(1 row)
8. Full Text Search
- “contains” is your friend:
- The @@ operator matches a tsquery (the search string)
against a so-called tsvector:
test=# SELECT to_tsvector('english', 'having many cars is better
than to have just one car')
@@ to_tsquery('english', 'car');
?column?
----------
t
(1 row)
8. Full Text Search
- Indexing is easy:
- All you need is a functional index
- Alternatively the stemmed content can be
“materialized” in a separate column
CREATE INDEX idx_fti ON t_test
USING gist (to_tsvector('german', name));
8. Full Text Search
- tsvector and tsquery magic
- PostgreSQL allows you to use “and” (&)
and “or” (|)
test=# SELECT to_tsvector('english', 'having many cars is better than
to have just one car')
@@ to_tsquery('english', 'car & truck');
?column?
----------
f
(1 row)
test=# SELECT to_tsvector('english', 'having many cars is better than
to have just one car')
@@ to_tsquery('english', '(car | truck) & many');
?column?
----------
t
(1 row)
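How "&" and "|" combine can be sketched in Python by treating the tsvector as a plain set of lexemes. The set below is copied from the to_tsvector output shown earlier (positions dropped); the helper functions are made up for this example and do no stemming, so the query term "many" is written as its stem "mani".

```python
# Lexemes of 'having many cars is better than to have just one car'.
lexemes = {"better", "car", "mani", "one"}


def has(term):
    """Query node: true if the document contains the lexeme."""
    return lambda doc: term in doc


def and_(left, right):
    """Query node for '&'."""
    return lambda doc: left(doc) and right(doc)


def or_(left, right):
    """Query node for '|'."""
    return lambda doc: left(doc) or right(doc)


car_and_truck = and_(has("car"), has("truck"))                  # car & truck
car_or_truck_and_many = and_(or_(has("car"), has("truck")),     # (car | truck) & many
                             has("mani"))

print(car_and_truck(lexemes))          # False
print(car_or_truck_and_many(lexemes))  # True
```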
8. Full Text Search
- A stupid question: What is a “word”?
- PostgreSQL is NOT limited to textual search
- Remember, it is all about “contains” ...
- Create your own parser:
test=# \h CREATE TEXT SEARCH PARSER
Command: CREATE TEXT SEARCH PARSER
Description: define a new text search parser
Syntax:
CREATE TEXT SEARCH PARSER name (
START = start_function ,
GETTOKEN = gettoken_function ,
END = end_function ,
LEXTYPES = lextypes_function
[, HEADLINE = headline_function ]
)
8. Full Text Search
- Even more flexibility (2):
test=# \h CREATE TEXT SEARCH CONFIGURATION
Command: CREATE TEXT SEARCH CONFIGURATION
Description: define a new text search configuration
Syntax:
CREATE TEXT SEARCH CONFIGURATION name (
PARSER = parser_name |
COPY = source_config
)
8. Full Text Search
- Even more flexibility:
test=# \h CREATE TEXT SEARCH DICTIONARY
Command: CREATE TEXT SEARCH DICTIONARY
Description: define a new text search dictionary
Syntax:
CREATE TEXT SEARCH DICTIONARY name (
TEMPLATE = template
[, option = value [, ... ]]
)
test=# \h CREATE TEXT SEARCH TEMPLATE
Command: CREATE TEXT SEARCH TEMPLATE
Description: define a new text search template
Syntax:
CREATE TEXT SEARCH TEMPLATE name (
[ INIT = init_function , ]
LEXIZE = lexize_function
)
9. Operator classes
- What does it take to organize a btree?
Operator   Strategy number
<          1
<=         2
=          3
>=         4
>          5
9. Operator classes
- Why care?
- The way numbers are treated is pretty “common”
- How about sorting this one?
“2305 09 04 78”
“4353 07 06 77”
=> it seems the sort order is correct as shown
=> it isn't – it is an Austrian social security number
=> 1977 was before 1978, not the other way round
9. Operator classes
- Defining indexing strategies
- We can write our own operators
- Those operators can be assigned to an operator
class, which will tell the index how to “behave”
9. Operator classes
- Writing an operator (1):
test=# CREATE OR REPLACE FUNCTION normalize_si(text)
RETURNS text AS $$
BEGIN
    RETURN substring($1, 9, 2) ||
           substring($1, 7, 2) ||
           substring($1, 5, 2) ||
           substring($1, 1, 4);
END; $$
LANGUAGE plpgsql IMMUTABLE;
CREATE FUNCTION
test=# SELECT normalize_si('2305090478');
normalize_si
--------------
7804092305
(1 row)
9. Operator classes
- Writing an operator (2):
test=# CREATE OR REPLACE FUNCTION si_lt(text, text)
RETURNS boolean AS
$$
BEGIN
    RETURN normalize_si($1) < normalize_si($2);
END;
$$ LANGUAGE plpgsql IMMUTABLE;
CREATE FUNCTION
test=# CREATE OPERATOR <# (
    PROCEDURE=si_lt,
    LEFTARG=text,
    RIGHTARG=text);
CREATE OPERATOR
test=# SELECT '2305090478'::text <# '4353070677'::text;
?column?
----------
f
(1 row)
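The same comparison logic, ported to Python as a sketch for experimenting outside the database. The function names mirror the SQL ones above; the port is an illustration and plays no part in the operator class itself.

```python
def normalize_si(si):
    """Reorder to year, month, day, serial so plain string order is date order.

    SQL substring() counts from 1; Python slices count from 0.
    """
    return si[8:10] + si[6:8] + si[4:6] + si[0:4]


def si_lt(left, right):
    """Counterpart of the <# operator: compare the normalized forms."""
    return normalize_si(left) < normalize_si(right)


numbers = ["2305090478", "4353070677"]

print(normalize_si("2305090478"))               # 7804092305
print(si_lt("2305090478", "4353070677"))        # False
print(sorted(numbers, key=normalize_si))        # ['4353070677', '2305090478']
```

Sorting with the normalized key puts the 1977 number first, which is the order an index built on this operator class would maintain.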
9. Operator classes
- Creating the operator class:
- write operators for all operations needed
- write “support functions” (= “same”, etc.)
- make sure that the most important strategies
have proper operators
test=# \h CREATE OPERATOR CLASS
Command: CREATE OPERATOR CLASS
Description: define a new operator class
Syntax:
CREATE OPERATOR CLASS name [ DEFAULT ] FOR TYPE data_type
USING index_method [ FAMILY family_name ] AS
{ OPERATOR strategy_number operator_name [ ( op_type, op_type ) ]
[ FOR SEARCH | FOR ORDER BY sort_family_name ]
| FUNCTION support_number [ ( op_type [ , op_type ] ) ]
function_name ( argument_type [, ...] )
| STORAGE storage_type
} [, ... ]
10. Available operator classes
- pg_trgm
- Trigrams are perfect to perform fuzzy matching
- Trigrams can be used nicely along with KNN-search
- pg_trgm is available as an extension to PostgreSQL
test=# CREATE EXTENSION pg_trgm;
CREATE EXTENSION
- Problem: What is the proper way to spell the name of this
village?
“gramatneusiedl” vs. “grammatneusiedel”?
10. Available operator classes
- Testing pg_trgm
test=# CREATE TABLE t_search AS
SELECT relname::text
FROM pg_class;
SELECT 303
test=# CREATE INDEX idx_trgm
ON t_search USING gist(relname gist_trgm_ops);
CREATE INDEX
10. Available operator classes
- Testing pg_trgm (2):
test=# SELECT *, 'pgclass' <-> relname
FROM t_search
ORDER BY 'pgclass' <-> relname
LIMIT 10;
relname | ?column?
--------------------------------+----------
pg_class | 0.454545
pg_opclass | 0.538462
pg_class_oid_index | 0.714286
pg_opclass_oid_index | 0.727273
pg_class_relname_nsp_index | 0.793103
pg_opclass_am_name_nsp_index | 0.8
pg_seclabel | 0.823529
pg_am | 0.833333
pg_seclabels | 0.833333
pg_shseclabel | 0.842105
(10 rows)
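Where these distances come from, as a Python sketch of trigram matching. It assumes what the pg_trgm documentation describes: text is lower-cased and split into alphanumeric words, each word is padded with two leading blanks and one trailing blank, and the distance is one minus shared trigrams over all distinct trigrams. The function names are made up, only ASCII input is handled, and the value returned when neither string has trigrams is an arbitrary choice.

```python
import re


def trigrams(text):
    """Set of 3-character groups of every word, padded as described above."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def distance(left, right):
    """1 - (shared trigrams / distinct trigrams in either string)."""
    a, b = trigrams(left), trigrams(right)
    union = a | b
    if not union:
        return 1.0          # assumption: nothing to compare counts as far apart
    return 1.0 - len(a & b) / len(union)


print(round(distance("pgclass", "pg_class"), 6))    # 0.454545
print(round(distance("pgclass", "pg_opclass"), 6))  # 0.538462
print(round(distance("pgclass", "pg_am"), 6))       # 0.833333
```

The three results agree with the first, second and eighth rows of the query output above.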
10. Available operator classes
- KNN in action:
test=# explain SELECT *, 'pgclass' <-> relname
FROM t_search
ORDER BY 'pgclass' <-> relname
LIMIT 10;
QUERY PLAN
-----------------------------------------------------------------------------------
Limit (cost=0.14..1.40 rows=10 width=19)
-> Index Scan using idx_trgm on t_search (cost=0.14..38.20 rows=303 width=19)
Order By: (relname <-> 'pgclass'::text)
(3 rows)
11. Traditional LIKE
- LIKE can be indexed in some cases:
- The PostgreSQL optimizer can rewrite queries featuring LIKE
in a fancy and efficient way
=> The goal is to find the “next character” in line
and query for a range
- This kind of rewrite only works when the next character
is actually known to PostgreSQL
- Special operator classes might be needed
=> varchar_pattern_ops, text_pattern_ops
11. Traditional LIKE
- An example:
test=# CREATE INDEX idx_relname
ON t_search (relname);
CREATE INDEX
test=# SET enable_seqscan TO off;
SET
test=# explain SELECT relname
FROM t_search
WHERE relname LIKE 'abc%';
QUERY PLAN
----------------------------------------------------------------------------------
Index Only Scan using idx_relname on t_search (cost=0.27..8.29 rows=1 width=19)
Index Cond: ((relname >= 'abc'::text) AND (relname < 'abd'::text))
Filter: (relname ~~ 'abc%'::text)
(3 rows)
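The "next character" rewrite visible in the Index Cond can be sketched in Python. This is a simplified illustration with a made-up function name: it assumes plain code-point ordering (which is why special operator classes may be needed), ignores escape characters, and does not handle a prefix ending in the highest possible character.

```python
def like_prefix_range(pattern):
    """Turn LIKE 'abc%' into the range ('abc', 'abd'), or None without a prefix."""
    prefix = ""
    for ch in pattern:
        if ch in "%_":          # first wildcard ends the fixed prefix
            break
        prefix += ch
    if not prefix:
        return None             # e.g. '%abc': no range can be derived
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper


print(like_prefix_range("abc%"))   # ('abc', 'abd')
print(like_prefix_range("%abc"))   # None
```

The range only narrows the search; the original LIKE condition still has to be checked afterwards, as the Filter line in the plan shows.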
12. Indexing MIN / MAX
- An example:
- MIN / MAX works by reading the index from left
and right (backward scan)
test=# explain SELECT min(relname), max(relname) FROM t_search;
QUERY PLAN
----------------------------------------------------------------------------------
Result (cost=0.74..0.75 rows=1 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.27..0.37 rows=1 width=19)
-> Index Only Scan using idx_relname on t_search
(cost=0.27..29.57 rows=303 width=19)
Index Cond: (relname IS NOT NULL)
InitPlan 2 (returns $1)
-> Limit (cost=0.27..0.37 rows=1 width=19)
-> Index Only Scan Backward using idx_relname on
t_search t_search_1 (cost=0.27..29.57 rows=303 width=19)
Index Cond: (relname IS NOT NULL)
(9 rows)
Any questions?
Thank you for your attention
Any question?


PostgreSQL Indexing Strategies for Performance Optimization

  • 1. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de PostgreSQL Indexing Dublin, 2013 Hans-Jürgen Schönig
  • 2. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de Scope of this session: - What a basic index does - The PostgreSQL optimizer (cost model) - Classical B-tree Indexes - Partial / functional indexes - Different types of indexes - Full-Text-Search - Fuzzy matching - Writing your own indexing strategy
  • 3. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Generating test data: - for the purpose of this session we need a table consisting of two columns: test=# CREATE TABLE t_test (id serial, name text); CREATE TABLE test=# INSERT INTO t_test (name) VALUES ('hans'); INSERT 0 1 test=# INSERT INTO t_test (name) VALUES ('paul'); INSERT 0 1
  • 4. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - A lot more test data ... - Let us create some more test data by repeating the process test=# INSERT INTO t_test (name) SELECT name FROM t_test; INSERT 0 2 ... test=# INSERT INTO t_test (name) SELECT name FROM t_test; INSERT 0 2097152
  • 5. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - A lot more test data ... - Let us create some more test data by repeating the process test=# INSERT INTO t_test (name) SELECT name FROM t_test; INSERT 0 2 ... test=# INSERT INTO t_test (name) SELECT name FROM t_test; INSERT 0 2097152
  • 6. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Reading some data: - Let us see, how PostgreSQL executes a simple query: test=# SELECT count(*) FROM t_test; count --------- 4194304 (1 row) Time: 431.192 ms test=# explain analyze SELECT count(*) FROM t_test; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Aggregate (cost=75100.80..75100.81 rows=1 width=0) (actual time=977.865..977.865 rows=1 loops=1) -> Seq Scan on t_test (cost=0.00..64615.04 rows=4194304 width=0) (actual time=0.013..531.448 rows=4194304 loops=1) Total runtime: 977.917 ms (3 rows) Time: 1045.065 ms
  • 7. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Reading some data: - Let us add a filter: test=# SELECT count(*) FROM t_test WHERE id = 421234; count ------- 1 (1 row) Time: 476.965 ms test=# explain analyze SELECT count(*) FROM t_test WHERE id = 421234; QUERY PLAN ------------------------------------------------------------------------------------------------------------- Aggregate (cost=75100.80..75100.81 rows=1 width=0) (actual time=495.134..495.135 rows=1 loops=1) -> Seq Scan on t_test (cost=0.00..75100.80 rows=1 width=0) (actual time=53.405..495.126 rows=1 loops=1) Filter: (id = 421234) Rows Removed by Filter: 4194303 Total runtime: 495.175 ms (5 rows) Time: 520.659 ms
  • 8. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Sequentially reading data: - In case you like reading the phone book sequentially we are basically done. - Sequentially reading the phone book is technically ok => but socially not accepted - Defining an index is the desired solution
  • 9. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Creating an index test=# h CREATE INDEX Command: CREATE INDEX Description: define a new index Syntax: CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name ] ON table_name [ USING method ] ( { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ) [ WITH ( storage_parameter = value [, ... ] ) ] [ TABLESPACE tablespace_name ] [ WHERE predicate ] - At the end of the day all clauses will be covered by this training
  • 10. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - A typical index: test=# CREATE INDEX idx_id ON t_test (id); CREATE INDEX Time: 7357.663 ms - This gives us a standard btree index - PostgreSQL provides “High-Concurrency B-Trees” (Lehman-Yao, 1981) - Many people can modify the index at the same time - Highly efficient B+ tree
  • 11. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - How a btree works: [Figure: btree diagram. Labels: Index, Table, 8k pages, root node, sorted entries, forward chaining, row line pointer (linp)]
  • 12. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Indexing is beneficial test=# explain analyze SELECT count(*) FROM t_test WHERE id = 421234; QUERY PLAN ------------------------------------------------------------------------------ Aggregate (cost=8.73..8.74 rows=1 width=0) (actual time=0.024..0.024 rows=1 loops=1) -> Index Only Scan using idx_id on t_test (cost=0.00..8.73 rows=1 width=0) (actual time=0.019..0.020 rows=1 loops=1) Index Cond: (id = 421234) Heap Fetches: 1 Total runtime: 0.057 ms (5 rows) Time: 0.395 ms - A lot faster :).
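The difference between the two plans (Seq Scan vs. index lookup) can be sketched outside the database. This is only an illustration: a sorted Python list with binary search stands in for the btree, and the function names and the toy data are assumptions, not PostgreSQL internals.

```python
import bisect

def seq_scan_count(row_ids, wanted_id):
    """Sequential scan: every row is inspected, like the Seq Scan plan."""
    return sum(1 for row_id in row_ids if row_id == wanted_id)

def index_count(sorted_ids, wanted_id):
    """Index lookup: binary search in a sorted structure, O(log n) steps."""
    lo = bisect.bisect_left(sorted_ids, wanted_id)
    hi = bisect.bisect_right(sorted_ids, wanted_id)
    return hi - lo

# Toy stand-in for t_test.id (already sorted, like the keys of a btree).
ids = list(range(1, 100_001))

print(seq_scan_count(ids, 42_123))  # touches all 100000 entries
print(index_count(ids, 42_123))     # touches roughly log2(100000) entries
```

Both calls return the same count; only the amount of data touched differs, which is what the drop from about 495 ms to 0.057 ms on the slides reflects.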
  • 13. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Still slow ... test=# SELECT count(*) FROM t_test WHERE name = 'hans'; count --------- 2097152 (1 row) Time: 787.407 ms - This is still slow. Let us create an index ... test=# CREATE INDEX idx_name ON t_test (name); CREATE INDEX
  • 14. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - The benefit is exactly zero: test=# SELECT count(*) FROM t_test WHERE name = 'hans'; count --------- 2097152 (1 row) Time: 782.443 ms test=# explain SELECT count(*) FROM t_test WHERE name = 'hans'; QUERY PLAN ---------------------------------------------------------------------- Aggregate (cost=80350.32..80350.33 rows=1 width=0) -> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0) Filter: (name = 'hans'::text) (3 rows) - The index won't be used - Too many identical values (“not selective”)
  • 15. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - The cost is far from zero: test=# SELECT pg_size_pretty(pg_relation_size('t_test')); pg_size_pretty ---------------- 177 MB (1 row) test=# SELECT pg_size_pretty(pg_relation_size('idx_id')); pg_size_pretty ---------------- 90 MB (1 row) test=# SELECT pg_size_pretty(pg_relation_size('idx_name')); pg_size_pretty ---------------- 90 MB (1 row) - Indexes need a fair amount of space
  • 16. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Input values DO make a difference: test=# explain SELECT count(*) FROM t_test WHERE name = 'hans'; QUERY PLAN ---------------------------------------------------------------------- Aggregate (cost=80350.32..80350.33 rows=1 width=0) -> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0) Filter: (name = 'hans'::text) (3 rows) test=# explain SELECT count(*) FROM t_test WHERE name = 'hans2'; QUERY PLAN ---------------------------------------------------------------------------------- Aggregate (cost=7.74..7.75 rows=1 width=0) -> Index Only Scan using idx_name on t_test (cost=0.00..7.74 rows=1 width=0) Index Cond: (name = 'hans2'::text) (3 rows) - PostgreSQL will decide depending on the input value => cost based optimization
  • 17. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Partial indexes: - In our example the index is only used in case of rare or non-existing values - What is the point of an index when its entire content is totally useless? => a more selective strategy is needed
  • 18. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Partial indexes: test=# DROP INDEX idx_name; DROP INDEX test=# CREATE INDEX idx_name ON t_test (name) WHERE name NOT IN ('hans', 'paul'); CREATE INDEX test=# SELECT pg_size_pretty(pg_relation_size('idx_name')); pg_size_pretty ---------------- 8192 bytes (1 row) - A partial index reduces space consumption - Benefit is still the same
  • 19. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Equal benefit – lower cost: test=# explain SELECT count(*) FROM t_test WHERE name = 'hans'; QUERY PLAN ---------------------------------------------------------------------- Aggregate (cost=80350.32..80350.33 rows=1 width=0) -> Seq Scan on t_test (cost=0.00..75100.80 rows=2099808 width=0) Filter: (name = 'hans'::text) (3 rows) test=# explain SELECT count(*) FROM t_test WHERE name = 'hans2'; QUERY PLAN ---------------------------------------------------------------------------------- Aggregate (cost=7.28..7.29 rows=1 width=0) -> Index Only Scan using idx_name on t_test (cost=0.00..7.28 rows=1 width=0) Index Cond: (name = 'hans2'::text) (3 rows) - This is exactly the same as before !
  • 20. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - What about functions? test=# CREATE INDEX idx_cos ON t_test ( cos(id) ); CREATE INDEX Time: 16867.228 ms test=# explain SELECT count(*) FROM t_test WHERE cos(id) = 17; QUERY PLAN ---------------------------------------------------------------------------------- Aggregate (cost=23960.99..23961.00 rows=1 width=0) -> Bitmap Heap Scan on t_test (cost=395.25..23908.56 rows=20972 width=0) Recheck Cond: (cos((id)::double precision) = 17::double precision) -> Bitmap Index Scan on idx_cos (cost=0.00..390.01 rows=20972 width=0) Index Cond: (cos((id)::double precision) = 17::double precision) (5 rows) - PostgreSQL provides functional indexes - VERY nice to avoid additional columns - Gives a lot of extra flexibility
  • 21. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 1. Basic indexing: - Type of functions allowed - Functions must be deterministic => “immutable” => Functions can be written in almost any language => This is highly performance sensitive
  • 22. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - How does PostgreSQL decide on index vs. no index? - PostgreSQL uses statistics to estimate the number of rows coming back - Each operation is assigned a cost => costs are just numbers used to compare different options inside the planner - Cost parameters can be changed at runtime or globally => be careful, it can work against you
  • 23. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - pg_stats is your friend: test=# \d pg_stats View "pg_catalog.pg_stats" Column | Type | Modifiers -------------------------------+-----------+----------- schemaname | name | tablename | name | attname | name | inherited | boolean | null_frac | real | avg_width | integer | n_distinct | real | most_common_vals | anyarray | most_common_freqs | real[] | histogram_bounds | anyarray | correlation | real | most_common_elems | anyarray | most_common_elem_freqs | real[] | elem_count_histogram | real[] |
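To make the role of most_common_vals, most_common_freqs and n_distinct more concrete, here is a deliberately simplified sketch of how a row estimate for `column = value` could be derived from them. The function `estimate_rows` and the frequencies of exactly 0.5 are assumptions for illustration; the real planner logic is more involved, and real statistics come from a sample (which is why the slides show rows=2099808 rather than exactly half the table).

```python
def estimate_rows(value, most_common_vals, most_common_freqs, n_distinct, reltuples):
    """Toy estimate of rows matching `column = value` (not the planner's code)."""
    # A value listed among the most common values uses its recorded frequency.
    for val, freq in zip(most_common_vals, most_common_freqs):
        if val == value:
            return max(1, round(freq * reltuples))
    # Otherwise the remaining frequency is spread over the remaining distinct values.
    remaining_freq = max(0.0, 1.0 - sum(most_common_freqs))
    remaining_distinct = n_distinct - len(most_common_vals)
    if remaining_distinct <= 0:
        return 1  # nothing left to distribute: assume a single row
    return max(1, round(remaining_freq / remaining_distinct * reltuples))

# Assumed statistics for t_test.name: two values, each covering half the table.
mcv, freqs = ["hans", "paul"], [0.5, 0.5]
print(estimate_rows("hans", mcv, freqs, n_distinct=2, reltuples=4194304))
print(estimate_rows("hans2", mcv, freqs, n_distinct=2, reltuples=4194304))
```

A huge estimate for 'hans' and a one-row estimate for 'hans2' is what lets the planner choose a sequential scan for the first and an index scan for the second.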
  • 24. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Updating statistics - System statistics are updated by ANALYZE: test=# \h ANALYZE Command: ANALYZE Description: collect statistics about a database Syntax: ANALYZE [ VERBOSE ] [ table_name [ ( column_name [, ...] ) ] ] - In most setups autovacuum is in charge of updating pg_statistic - In most cases statistics are not an issue
  • 25. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - How does PostgreSQL estimate costs? - seq_page_cost = 1 - random_page_cost = 4 - cpu_tuple_cost = 0.01 - cpu_operator_cost = 0.0025 - cpu_index_tuple_cost = 0.005
  • 26. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Let us do the math (1): test=# explain SELECT count(*) FROM t_test; QUERY PLAN ---------------------------------------------------------------------- Aggregate (cost=75100.80..75100.81 rows=1 width=0) -> Seq Scan on t_test (cost=0.00..64615.04 rows=4194304 width=0) (2 rows) - total costs are at 75100.81 - costs are composed of I/O and CPU costs
  • 27. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Let us do the math (2): test=# SELECT pg_relation_size('t_test') / 8192; ?column? ---------- 22672 (1 row) - our table consists of 22672 blocks - each block is 8kb in size
  • 28. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Let us do the math (3): The seq scan: I/O cost = 22672 * seq_page_cost = 22672; CPU cost = 4194304 * cpu_tuple_cost = 41943.04; together = 64615.04 for the seq scan. The aggregate: 4194304 * cpu_operator_cost = 10485.76. Total costs => 64615.04 + 10485.76 = 75100.80, plus 0.01 (one cpu_tuple_cost, because the single result row has to be returned) = 75100.81
  • 29. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Inflation at work: test=# SET seq_page_cost TO 10; SET test=# explain SELECT count(*) FROM t_test; QUERY PLAN ----------------------------------------------------------------------- Aggregate (cost=279148.80..279148.81 rows=1 width=0) -> Seq Scan on t_test (cost=0.00..268663.04 rows=4194304 width=0) (2 rows) - Costs can be changed at runtime to fine tune index usage => only do this if you are fully aware of what you are doing. It can have unintended side effects
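The arithmetic behind the count(*) plans can be replayed in a few lines. This is a sketch of the calculation as presented on the slides, not the planner's source; the function name `count_star_cost` is an assumption, while the parameter names and default values are the ones listed earlier.

```python
def count_star_cost(pages, tuples,
                    seq_page_cost=1.0,
                    cpu_tuple_cost=0.01,
                    cpu_operator_cost=0.0025):
    """Return (seq scan total cost, aggregate startup cost) for SELECT count(*)."""
    # Seq scan: read every page, then handle every tuple.
    seq_scan = pages * seq_page_cost + tuples * cpu_tuple_cost
    # Aggregate: one operator call per input tuple on top of the scan.
    aggregate = seq_scan + tuples * cpu_operator_cost
    return round(seq_scan, 2), round(aggregate, 2)

# 22672 blocks and 4194304 rows, as measured on the slides.
print(count_star_cost(22672, 4194304))                    # slides: 64615.04 and 75100.80
print(count_star_cost(22672, 4194304, seq_page_cost=10))  # slides: 268663.04 and 279148.80
```

Raising seq_page_cost to 10 only changes the I/O term (22672 becomes 226720), which is why both plan nodes grow by the same amount.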
  • 30. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Spinning disks vs. SSDs - Traditional disks are fast sequentially and pretty bad when doing random I/O - SSDs fixed the problem. => consider changing random_page_cost
  • 31. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Abusing tablespaces: test=# ALTER TABLESPACE pg_default SET (random_page_cost = 1); ALTER TABLESPACE - Allows different cost settings for various disk subsystems - It also allows splitting “cached” and “uncached” data -> ugly but useful
  • 32. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Correlation and disk layout test=# CREATE TABLE t_random AS SELECT * FROM t_test ORDER BY random(); SELECT 4194304 test=# CREATE INDEX idx_random ON t_random(id); CREATE INDEX test=# ANALYZE t_random; ANALYZE - The PostgreSQL optimizer considers the physical order of rows on disk - High correlation makes index scans far more likely because the optimizer reduces its estimates for I/O costs.
  • 33. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Correlation and disk layout test=# explain SELECT count(*) FROM t_test WHERE id < 1000; QUERY PLAN ------------------------------------------------------------------------------- Aggregate (cost=75.35..75.36 rows=1 width=0) -> Index Only Scan using idx_id on t_test (cost=0.00..72.72 rows=1049 width=0) Index Cond: (id < 1000) (3 rows) test=# explain SELECT count(*) FROM t_random WHERE id < 1000; QUERY PLAN ------------------------------------------------------------------------------- Aggregate (cost=950.31..950.32 rows=1 width=0) -> Index Only Scan using idx_random on t_random (cost=0.00..947.94 rows=947 width=0) Index Cond: (id < 1000) (3 rows)
  • 34. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 2. The PostgreSQL cost model - Implications: - This is why different plans can pop up EVEN if the data is the same - There is no fixed amount of data making PostgreSQL switch from index to sequential scan - High correlation can improve performance => consider clustering the table test=# \h CLUSTER Command: CLUSTER Description: cluster a table according to an index Syntax: CLUSTER [VERBOSE] table_name [ USING index_name ] CLUSTER [VERBOSE]
  • 35. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 3. Indexing many columns - Using OR / AND: - PostgreSQL can use more than one index per table per query - PostgreSQL provides multi-column indexes - What you might see is a so called “Bitmap Scan” => don't mix it up with Oracle Bitmap Indexes
  • 36. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 3. Indexing many columns - Bitmap scans: test=# explain SELECT * FROM t_test WHERE id = 2343 OR id = 423423; QUERY PLAN --------------------------------------------------------------------------- Bitmap Heap Scan on t_test (cost=9.44..17.41 rows=2 width=9) Recheck Cond: ((id = 2343) OR (id = 423423)) -> BitmapOr (cost=9.44..9.44 rows=2 width=0) -> Bitmap Index Scan on idx_id (cost=0.00..4.72 rows=1 width=0) Index Cond: (id = 2343) -> Bitmap Index Scan on idx_id (cost=0.00..4.72 rows=1 width=0) Index Cond: (id = 423423) (7 rows) - PostgreSQL will scan the index twice - PostgreSQL will look for blocks in the underlying table - The condition has to be re-evaluated
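The three steps listed for the bitmap scan (scan the index once per OR branch, collect the blocks, re-evaluate the condition) can be sketched as follows. The dictionaries `heap` and `index` and both function names are hypothetical stand-ins; the sketch always rechecks the condition and ignores how PostgreSQL actually stores its bitmaps.

```python
def bitmap_index_scan(index, key):
    """Return the set of heap page numbers that may contain rows for `key`."""
    return {page for page, _slot in index.get(key, [])}

def bitmap_or_scan(heap, index, keys):
    """Evaluate `id = k1 OR id = k2 ...` via one index scan per key."""
    pages = set()
    for key in keys:                      # one Bitmap Index Scan per OR branch
        pages |= bitmap_index_scan(index, key)   # BitmapOr: union of the page sets
    result = []
    for page in sorted(pages):            # visit each heap page once, in physical order
        for row in heap[page]:
            if row["id"] in keys:         # recheck the condition on the fetched rows
                result.append(row)
    return result

# Toy heap: page number -> rows; toy index: id -> list of (page, slot).
heap = {
    0: [{"id": 1, "name": "hans"}, {"id": 2, "name": "paul"}],
    1: [{"id": 3, "name": "hans"}, {"id": 4, "name": "paul"}],
    2: [{"id": 5, "name": "hans"}, {"id": 6, "name": "paul"}],
}
index = {row["id"]: [(page, slot)]
         for page, rows in heap.items()
         for slot, row in enumerate(rows)}

print(bitmap_or_scan(heap, index, {2, 5}))
```

Page 1 is never read in this example, because neither index scan reported it.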
  • 37. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 3. Indexing many columns - Bitmap scans: test=# explain SELECT * FROM t_test WHERE id = 2343 AND name = 'josef'; QUERY PLAN ----------------------------------------------------------------------- Index Scan using idx_name on t_test (cost=0.00..8.27 rows=1 width=9) Index Cond: (name = 'josef'::text) Filter: (id = 2343) (3 rows) - PostgreSQL does not always use two indexes when you have 2 quals - The more selective index might be enough
  • 38. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 3. Indexing many columns - Multicolumn indexes: test=# DROP INDEX idx_id; DROP INDEX test=# CREATE INDEX idx_combined ON t_test (id, name); CREATE INDEX test=# explain SELECT * FROM t_test WHERE id = 10; QUERY PLAN -------------------------------------------------------------------------------- Index Only Scan using idx_combined on t_test (cost=0.00..8.91 rows=1 width=9) Index Cond: (id = 10) (2 rows) - PostgreSQL can use a subset of the indexed columns IF they are the leading column(s) of the index - Imagine a phone book; it is just like a combined index
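The phone book analogy can be made concrete with a sorted list of (id, name) pairs standing in for idx_combined. This is a sketch under that assumption; the helper names are hypothetical. A lookup on the leading column can use binary search, while a lookup on the second column alone has to read every entry.

```python
import bisect

# Entries sorted by (id, name), like a combined index on (id, name).
entries = sorted([(10, "hans"), (10, "paul"), (11, "hans"),
                  (12, "paul"), (13, "hans")])

def lookup_by_id(entries, wanted_id):
    """Leading column: binary search finds the matching range directly."""
    lo = bisect.bisect_left(entries, (wanted_id,))
    hi = bisect.bisect_left(entries, (wanted_id + 1,))
    return entries[lo:hi]

def lookup_by_name(entries, wanted_name):
    """Second column only: the sort order does not help, so scan everything."""
    return [entry for entry in entries if entry[1] == wanted_name]

print(lookup_by_id(entries, 10))
print(lookup_by_name(entries, "paul"))
```

This mirrors looking up a surname in a phone book (easy) versus looking for everyone with a given first name (every page has to be read).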
  • 39. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 3. Indexing many columns - Many indexes or combined indexes? - It depends on what you want to query - If you always use the first conditions in the index a combined index might be a good idea - Many indexes are more flexible but maybe not perfect - Sometimes a mixed-strategy can be useful
  • 40. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 4. Indexes to provide order - b-trees can be used for more than searching - B-trees keep their entries sorted, so they provide you with order. - Order helps to avoid repeated sorting. test=# explain SELECT * FROM t_test ORDER BY id LIMIT 10; QUERY PLAN -------------------------------------------------------------------------------------- Limit (cost=0.00..0.31 rows=10 width=9) -> Index Scan using idx_id on t_test (cost=0.00..131602.27 rows=4194304 width=9) (2 rows)
  • 41. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 5. Dealing with upper / lowercase - Upper and lower case searches are common: - If you want to do case-insensitive searches, don't use a functional index - Consider using “citext” test=# CREATE EXTENSION citext; CREATE EXTENSION test=# SELECT 'ABC'::citext = 'abc'::citext; ?column? ---------- t (1 row)
  • 42. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 6. Different types of indexes - PostgreSQL supports more than just btrees - B-Trees are fine if you are interested in things which can be sorted - Try to sort polygons => you won't find them - Geometric data and Full-Text-Search need different algorithms NOTE: This is not about which index is faster. This is about the correct ALGORITHM
  • 43. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 6. Different types of indexes - Index types provided by PostgreSQL - B-Trees - Gist: Generalized Search Tree - Gin: Generalized Inverted Index - Sp-Gist: Space Partitioned Gist - Hash
  • 44. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 6. Different types of indexes - Indexes and algorithms - B-Trees: numbers, text, dates, etc. - Gist: Generalized Search Tree - Gin: Generalized Inverted Index - Sp-Gist: Space Partitioned Gist - Hash
  • 45. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 7. Gist indexes - Gist operates on different principles than btree - it supports “contains”, “left of”, “overlaps”, etc. - “contains”, etc. are good for => Full Text Search => Geometric operations (PostGIS, etc.) => Finding genome sequences => Handling ranges (time, etc.) => Fuzzy search - Gist allows KNN-search
  • 46. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 7. Gist indexes - How it works internally ...
  • 47. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 7. GIN indexes - GIN is a so-called inverted index - Used for Full Text Search - If you have 1 million documents containing the word “house”, do you really want to have “house” inside the index 1 million times? => A B-tree of words => A document list for each word => Classical approach to text search - FTS is not about “=”, it is about “contains” => forget btree
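The inverted-index idea (each word stored once, with a list of the documents containing it) can be sketched as follows. A plain dictionary stands in for the tree of words, and the function names and sample documents are assumptions for illustration; no stemming is applied here.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)   # the word is stored once, the list grows
    return index

def contains_all(index, words):
    """Return the ids of documents containing every word in `words`."""
    postings = [index.get(word, set()) for word in words]
    return set.intersection(*postings) if postings else set()

documents = {
    1: "the house on the hill",
    2: "a house with a garden",
    3: "a garden without a fence",
}
index = build_inverted_index(documents)

print(sorted(index["house"]))                           # documents containing "house"
print(sorted(contains_all(index, ["house", "garden"]))) # documents containing both
```

A "contains" query becomes a lookup of each word followed by an intersection of document lists, which is a different operation from the equality and range searches a btree supports.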
  • 48. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 7. GIN indexes - GIN internal workings:
  • 49. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 7. SP-Gist indexes - SP-Gist is a space partitioned index - Can be used for a variety of algorithms, which use space partitioning => quad trees => suffix trees => k-d trees
  • 50. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 7. SP-Gist indexes - Quad trees: A prototype example ... - We want to insert ... (6, 4) and (2, 8)
  • 51. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 8. Full Text Search - Stemming: - Before searching, it makes sense to perform “stemming” test=# SELECT to_tsvector('english', 'having many cars is better than to have just one car'); to_tsvector ----------------------------------------- 'better':5 'car':3,11 'mani':2 'one':10 (1 row)
  • 52. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 8. Full Text Search - Stemming is language dependent: - Stemming works nicely for “roman” languages => it is hard to do this for chinese and so on test=# SELECT to_tsvector('english', 'i am'), to_tsvector('german', 'i am'), to_tsvector('dutch', 'i am'); to_tsvector | to_tsvector | to_tsvector -------------+-------------+-------------- | 'i':1 | 'am':2 'i':1 (1 row)
  • 53. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 8. Full Text Search - “contains” is your friend: - The @@ operator matches a tsquery (the search expression) against a so-called tsvector (the stemmed document): test=# SELECT to_tsvector('english', 'having many cars is better than to have just one car') @@ to_tsquery('english', 'car'); ?column? ---------- t (1 row)
  • 55. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 8. Full Text Search - Indexing is easy: - All you need is a functional index - Alternatively the stemmed content can be “materialized” in a separate column CREATE INDEX idx_fti ON t_test USING gist (to_tsvector('german', name));
  • 56. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 8. Full Text Search - ts_vector and ts_query magic - PostgreSQL allows you to use “and” (&) and “or” (|) test=# SELECT to_tsvector('english', 'having many cars is better than to have just one car') @@ to_tsquery('english', 'car & truck'); ?column? ---------- f (1 row) test=# SELECT to_tsvector('english', 'having many cars is better than to have just one car') @@ to_tsquery('english', '(car | truck) & many'); ?column? ---------- t (1 row)
  • 57. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 8. Full Text Search - A stupid question: What is a “word”? - PostgreSQL is NOT limited to textual search - Remember, it is all about “contains” ... - Create your own parser: test=# \h CREATE TEXT SEARCH PARSER Command: CREATE TEXT SEARCH PARSER Description: define a new text search parser Syntax: CREATE TEXT SEARCH PARSER name ( START = start_function , GETTOKEN = gettoken_function , END = end_function , LEXTYPES = lextypes_function [, HEADLINE = headline_function ] )
  • 58. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 8. Full Text Search - Even more flexibility (2): test=# \h CREATE TEXT SEARCH CONFIGURATION Command: CREATE TEXT SEARCH CONFIGURATION Description: define a new text search configuration Syntax: CREATE TEXT SEARCH CONFIGURATION name ( PARSER = parser_name | COPY = source_config )
  • 59. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 8. Full Text Search - Even more flexibility: test=# \h CREATE TEXT SEARCH DICTIONARY Command: CREATE TEXT SEARCH DICTIONARY Description: define a new text search dictionary Syntax: CREATE TEXT SEARCH DICTIONARY name ( TEMPLATE = template [, option = value [, ... ]] ) test=# \h CREATE TEXT SEARCH TEMPLATE Command: CREATE TEXT SEARCH TEMPLATE Description: define a new text search template Syntax: CREATE TEXT SEARCH TEMPLATE name ( [ INIT = init_function , ] LEXIZE = lexize_function )
  • 60. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 9. Operator classes - What does it take to organize a btree? Operator => strategy number: < => 1, <= => 2, = => 3, >= => 4, > => 5
  • 61. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 9. Operator classes - Why care? - The way numbers are treated is pretty “common” - How about sorting this one? “2305 09 04 78” “4353 07 06 77” => it seems the sort order is correct as shown => it isn't, because these are Austrian social security numbers ending in a date of birth => 1977 was before 1978 and not the other way round
  • 62. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 9. Operator classes - Defining indexing strategies - We can write our own operators - Those operators can be assigned to an operator class, which will tell the index how to “behave” “2305 09 04 78” “4353 07 06 77” => it seems the sort order is correct as shown => it isn't, because these are Austrian social security numbers ending in a date of birth => 1977 was before 1978 and not the other way round
  • 63. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 9. Operator classes - Writing an operator (1): test=# CREATE OR REPLACE FUNCTION normalize_si(text) RETURNS text AS $$ BEGIN RETURN substring($1, 9, 2) || substring($1, 7, 2) || substring($1, 5, 2) || substring($1, 1, 4); END; $$ LANGUAGE 'plpgsql' IMMUTABLE; CREATE FUNCTION test=# SELECT normalize_si('2305090478'); normalize_si -------------- 7804092305 (1 row)
  • 64. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 9. Operator classes - Writing an operator (2): test=# CREATE OR REPLACE FUNCTION si_lt(text, text) RETURNS boolean AS $$ BEGIN RETURN normalize_si($1) < normalize_si($2); END; $$ LANGUAGE 'plpgsql' IMMUTABLE; CREATE FUNCTION test=# CREATE OPERATOR <# ( PROCEDURE=si_lt, LEFTARG=text, RIGHTARG=text); CREATE OPERATOR test=# SELECT '2305090478'::text <# '4353070677'::text; ?column? ---------- f (1 row)
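The effect of normalize_si and the <# operator can be checked outside the database. The following is a direct transliteration of the two plpgsql functions into Python (substring positions are 1-based in SQL and 0-based here); it is an illustration of the ordering they define, not part of the operator class itself.

```python
def normalize_si(number):
    """Reorder a 10-digit number to year, month, day, serial (as in the plpgsql function)."""
    return number[8:10] + number[6:8] + number[4:6] + number[0:4]

def si_lt(left, right):
    """Python counterpart of the <# operator: compare the normalized forms."""
    return normalize_si(left) < normalize_si(right)

numbers = ["2305090478", "4353070677"]

print(normalize_si("2305090478"))           # slide shows 7804092305
print(si_lt("2305090478", "4353070677"))    # slide shows f (false)
print(sorted(numbers, key=normalize_si))    # the 1977 birth date now sorts first
```

Sorting by the normalized key is exactly the order an index built with such an operator class would maintain.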
  • 65. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 9. Operator classes - Creating the operator class: - write operators for all operations needed - write “support functions” (= “same”, etc.) - make sure that the most important strategies have proper operators test=# \h CREATE OPERATOR CLASS Command: CREATE OPERATOR CLASS Description: define a new operator class Syntax: CREATE OPERATOR CLASS name [ DEFAULT ] FOR TYPE data_type USING index_method [ FAMILY family_name ] AS { OPERATOR strategy_number operator_name [ ( op_type, op_type ) ] [ FOR SEARCH | FOR ORDER BY sort_family_name ] | FUNCTION support_number [ ( op_type [ , op_type ] ) ] function_name ( argument_type [, ...] ) | STORAGE storage_type } [, ... ]
  • 66. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 10. Available operator classes - pg_trgm - Trigrams are perfect to perform fuzzy matching - Trigrams can be used nicely along with KNN-search - pg_trgm is available as an extension to PostgreSQL test=# CREATE EXTENSION pg_trgm; CREATE EXTENSION - Problem: What is the proper way to spell the name of this village? “gramatneusiedl” vs. “grammatneusiedel”?
  • 67. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 10. Available operator classes - Testing pg_trgm test=# CREATE TABLE t_search AS SELECT relname::text FROM pg_class; SELECT 303 test=# CREATE INDEX idx_trgm ON t_search USING gist(relname gist_trgm_ops); CREATE INDEX
  • 68. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 10. Available operator classes - Testing pg_trgm (2): test=# SELECT *, 'pgclass' <-> relname FROM t_search ORDER BY 'pgclass' <-> relname LIMIT 10; relname | ?column? --------------------------------+---------- pg_class | 0.454545 pg_opclass | 0.538462 pg_class_oid_index | 0.714286 pg_opclass_oid_index | 0.727273 pg_class_relname_nsp_index | 0.793103 pg_opclass_am_name_nsp_index | 0.8 pg_seclabel | 0.823529 pg_am | 0.833333 pg_seclabels | 0.833333 pg_shseclabel | 0.842105 (10 rows)
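The distances in this result can be approximated by hand. The sketch below assumes the usual description of trigram matching: the text is lowercased, split into alphanumeric words, each word is padded with two leading blanks and one trailing blank, and the distance is one minus the share of trigrams the two strings have in common. The function names are hypothetical, and this is not the extension's source code.

```python
import re

def trigrams(text):
    """Return the set of trigrams of all alphanumeric words in `text`."""
    grams = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "          # two leading blanks, one trailing blank
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams

def trgm_distance(a, b):
    """1 - (shared trigrams / all distinct trigrams), like the <-> operator."""
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    if not union:
        return 1.0                          # no trigrams at all: treat as unrelated
    return 1.0 - len(ta & tb) / len(union)

for name in ["pg_class", "pg_opclass", "pg_am"]:
    print(name, round(trgm_distance("pgclass", name), 6))
```

For 'pg_class' the two strings share 6 of 11 distinct trigrams, giving 1 - 6/11, which rounds to the 0.454545 shown in the first result row.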
  • 69. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 10. Available operator classes - KNN in action: test=# explain SELECT *, 'pgclass' <-> relname FROM t_search ORDER BY 'pgclass' <-> relname LIMIT 10; QUERY PLAN ----------------------------------------------------------------------------------- Limit (cost=0.14..1.40 rows=10 width=19) -> Index Scan using idx_trgm on t_search (cost=0.14..38.20 rows=303 width=19) Order By: (relname <-> 'pgclass'::text) (3 rows)
  • 70. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 11. Traditional LIKE - LIKE can be indexed in some cases: - The PostgreSQL optimizer can rewrite queries featuring LIKE in a fancy and efficient way => The goal is to find the “next character” in line and query for a range - This kind of rewrite only works when the next character is actually known to PostgreSQL - Special operator classes might be needed => varchar_pattern_ops, text_pattern_ops
  • 71. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 11. Traditional LIKE - An example: test=# CREATE INDEX idx_relname ON t_search (relname); CREATE INDEX test=# SET enable_seqscan TO off; SET test=# explain SELECT relname FROM t_search WHERE relname LIKE 'abc%'; QUERY PLAN ---------------------------------------------------------------------------------- Index Only Scan using idx_relname on t_search (cost=0.27..8.29 rows=1 width=19) Index Cond: ((relname >= 'abc'::text) AND (relname < 'abd'::text)) Filter: (relname ~~ 'abc%'::text) (3 rows)
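The rewrite visible in the Index Cond (relname >= 'abc' AND relname < 'abd') can be sketched as follows. The sketch assumes plain code-point ordering, where the "next character" is simply the following code point; under other collations that assumption does not hold, which is where the pattern operator classes come in. Both function names are hypothetical.

```python
import bisect

def like_prefix_range(pattern):
    """Turn a pure prefix pattern such as 'abc%' into the range ('abc', 'abd')."""
    if not pattern.endswith("%"):
        raise ValueError("pattern does not end with %")
    prefix = pattern[:-1]
    if not prefix or "%" in prefix or "_" in prefix:
        raise ValueError("only a fixed prefix followed by % can be rewritten")
    # Increment the last character of the prefix to get the exclusive upper bound.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, upper

def prefix_search(sorted_names, pattern):
    """Range scan over a sorted list, standing in for the btree index scan."""
    lower, upper = like_prefix_range(pattern)
    lo = bisect.bisect_left(sorted_names, lower)
    hi = bisect.bisect_left(sorted_names, upper)
    return sorted_names[lo:hi]

names = sorted(["abb", "abc", "abcd", "abd", "pg_class"])
print(like_prefix_range("abc%"))
print(prefix_search(names, "abc%"))
```

A pattern such as '%abc' has no fixed prefix, so no range can be derived and this kind of rewrite is not possible.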
  • 72. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de 12. Indexing MIN / MAX - An example: - MIN / MAX works by reading the index from left and right (backward scan) test=# explain SELECT min(relname), max(relname) FROM t_search; QUERY PLAN ---------------------------------------------------------------------------------- Result (cost=0.74..0.75 rows=1 width=0) InitPlan 1 (returns $0) -> Limit (cost=0.27..0.37 rows=1 width=19) -> Index Only Scan using idx_relname on t_search (cost=0.27..29.57 rows=303 width=19) Index Cond: (relname IS NOT NULL) InitPlan 2 (returns $1) -> Limit (cost=0.27..0.37 rows=1 width=19) -> Index Only Scan Backward using idx_relname on t_search t_search_1 (cost=0.27..29.57 rows=303 width=19) Index Cond: (relname IS NOT NULL) (9 rows)
  • 73. Cybertec Schönig & Schönig GmbH Hans-Jürgen Schönig, www.postgresql-support.de Any questions? Thank you for your attention.