SlideShare una empresa de Scribd logo
1 de 44
Descargar para leer sin conexión
GIN in 9.4 and further
Heikki Linnakangas, Alexander Korotkov, Oleg Bartunov
May 23, 2014
Alexander Korotkov Next generation of GIN
Oleg Bartunov PGConf.EU-2013, Dublin
Two GIN applications
● Full-text search
– tsvector @@ tsquery
– Indexing tsvector data type
● Hstore
– (key,value) storage
– Indexing keys, values
Alexander Korotkov Next generation of GIN
Oleg Bartunov PGConf.EU-2013, Dublin
FTS in PostgreSQL
● Full integration with PostgreSQL
● 27 built-in configurations for 10 languages
● Support of user-defined FTS configurations
● Pluggable dictionaries ( ispell, snowball, thesaurus ),
parsers
● Relevance ranking
● GiST and GIN indexes with concurrency and recovery
support
● Rich query language with query rewriting support‫‏‬
It's cool, but we want faster FTS !
Alexander Korotkov Next generation of GIN
Oleg Bartunov PGConf.EU-2013, Dublin
ACID overhead is really big :(
● Foreign solutions: Sphinx, Solr,
Lucene....
– Crawl database and index (time lag)
– No access to attributes
– Additional complexity
– BUT: Very fast !
Can we improve native FTS ?
Alexander Korotkov Next generation of GIN
Oleg Bartunov PGConf.EU-2013, Dublin
Can we improve native FTS ?
156676 Wikipedia articles:
postgres=# explain analyze
SELECT docid, ts_rank(text_vector, to_tsquery('english', 'title')) AS rank
FROM ti2
WHERE text_vector @@ to_tsquery('english', 'title')
ORDER BY rank DESC
LIMIT 3;
Limit (cost=8087.40..8087.41 rows=3 width=282) (actual time=433.750..433.752 rows=
-> Sort (cost=8087.40..8206.63 rows=47692 width=282)
(actual time=433.749..433.749 rows=3 loops=1)
Sort Key: (ts_rank(text_vector, '''titl'''::tsquery))
Sort Method: top-N heapsort Memory: 25kB
-> Bitmap Heap Scan on ti2 (cost=529.61..7470.99 rows=47692 width=282)
(actual time=15.094..423.452 rows=47855 loops=1)
Recheck Cond: (text_vector @@ '''titl'''::tsquery)
-> Bitmap Index Scan on ti2_index (cost=0.00..517.69 rows=47692 wi
(actual time=13.736..13.736 rows=47855 loops=1)
Index Cond: (text_vector @@ '''titl'''::tsquery)
Total runtime: 433.787 ms
HEAP IS SLOW
400 ms !
Alexander Korotkov Next generation of GIN
Oleg Bartunov PGConf.EU-2013, Dublin
Can we improve native FTS ?
156676 Wikipedia articles:
postgres=# explain analyze
SELECT docid, ts_rank(text_vector, to_tsquery('english', 'title')) AS rank
FROM ti2
WHERE text_vector @@ to_tsquery('english', 'title')
ORDER BY rank DESC
LIMIT 3;
What if we have this plan ?
Limit (cost=20.00..21.65 rows=3 width=282) (actual time=18.376..18.427 rows=3 loop
-> Index Scan using ti2_index on ti2 (cost=20.00..26256.30 rows=47692 width=28
(actual time=18.375..18.425 rows=3 loops=1)
Index Cond: (text_vector @@ '''titl'''::tsquery)
Order By: (text_vector >< '''titl'''::tsquery)
Total runtime: 18.511 ms
Alexander Korotkov Next generation of GIN
Oleg Bartunov PGConf.EU-2013, Dublin
Can we improve native FTS ?
156676 Wikipedia articles:
postgres=# explain analyze
SELECT docid, ts_rank(text_vector, to_tsquery('english', 'title')) AS rank
FROM ti2
WHERE text_vector @@ to_tsquery('english', 'title')
ORDER BY rank DESC
LIMIT 3;
18.511 ms vs 433.787 ms
We'll be FINE !
Alexander Korotkov Next generation of GIN
Oleg Bartunov PGConf.EU-2013, Dublin
6.7 mln classifieds
9.3 9.3+patch 9.3+patch
functional
index
Sphinx
Table size 6.0 GB 6.0 GB 2.87 GB -
Index size 1.29 GB 1.27 GB 1.27 GB 1.12 GB
Index build
time
216 sec 303 sec 718sec 180 sec*
Queries in 8
hours
3,0 mln. 42.7 mln. 42.7 mln. 32.0 mln.
WOW !!!
Alexander Korotkov Next generation of GIN
Oleg Bartunov PGConf.EU-2013, Dublin
20 mln descriptions
9.3 9.3+ patch 9.3+ patch
functional
index
Sphinx
Table size 18.2 GB 18.2 GB 11.9 GB -
Index size 2.28 GB 2.30 GB 2.30 GB 3.09 GB
Index build
time
258 sec 684 sec 1712 sec 481 sec*
Queries in 8
hours
2.67 mln. 38.7 mln. 38.7 mln. 26.7 mln.
WOW !!!
Alexander Korotkov Next generation of GIN
Oleg Bartunov PGConf.EU-2013, Dublin
Hstore
● Data
– 1,252,973 bookmarks from Delicious in json format
● Search, contains operator @>
– select count(*) from hs where h @> 'tags=>{{term=>NYC}}';
0.98 s (seq) vs 0.1 s (GIN) → We want faster operation !
● Observation
– GIN indexes separately keys and values
– Key 'tags' is very frequent -1138532,
value '{{term=>NYC}}' is rare — 285
– Current GIN: time (freq & rare) ~ time(freq)
Alexander Korotkov Next generation of GIN
Oleg Bartunov PGConf.EU-2013, Dublin
Hstore
● Observation
– GIN indexes separately keys and values
– Key 'tags' is very frequent -1138532,
value '{{term=>NYC}}' is rare — 285
– Current GIN: time (freq & rare) ~ time(freq)
● What if GIN supports
– time (freq & rare) ~ time(rare)
=# select count(*) from hs where h::hstore @> 'tags=>{{term=>NYC}}'::hstore;
count
-------
285
(1 row)
Time: 17.372 ms
0.98 s (seq) vs 0.1 s (GIN) vs 0.017 s (GIN++)
Alexander Korotkov Next generation of GIN
Oleg Bartunov PGConf.EU-2013, Dublin
These two examples motivate
GIN improvements !
Oleg Bartunov Full-text search in PostgreSQL in milliseconds
Alexander Korotkov PGConf.EU-2012, Prague
GIN History
● Introduced at PostgreSQL Anniversary
Meeting in Toronto, Jul 7-8, 2006 by Oleg
Bartunov and Teodor Sigaev
Oleg Bartunov Full-text search in PostgreSQL in milliseconds
Alexander Korotkov PGConf.EU-2012, Prague
GIN History
● Introduced at PostgreSQL Anniversary
Meeting in Toronto, Jul 7-8, 2006 by Oleg
Bartunov and Teodor Sigaev
● Supported by JFG Networks (France)
● «Gin stands for Generalized Inverted iNdex
and should be considered as a genie, not a
drink.»
● Alexander Korotkov, Heikki Linnakangas
have joined GIN++ development in 2013
Oleg Bartunov Full-text search in PostgreSQL in milliseconds
Alexander Korotkov PGConf.EU-2012, Prague
GIN History
TODO
----
Nearest future:
* Opclasses for all types (no programming, just many catalog changes).
Distant future:
* Replace B-tree of entries to something like GiST (VODKA ! 2014)
* Add multicolumn support
* Optimize insert operations (background index insertion)
● From GIN Readme, 2006
Two major improvements
1. Compressed posting lists
Makes GIN indexes smaller. Smaller is better.
2. When combining two or more search keys, skip over items
that match one key but not the other.
Makes “rare & frequent” type queries much faster.
Part 1: Compressed posting lists
create extension btree_gin;
create table numbers (n int4);
insert into numbers
select g % 10 from generate_series(1, 10000000) g;
create index numbers_btree on numbers (n);
create index numbers_gin on numbers using gin (n);
Example table
select n, count(*) from numbers group by n order by n;
n | count
---+---------
0 | 1000000
1 | 1000000
2 | 1000000
3 | 1000000
4 | 1000000
5 | 1000000
6 | 1000000
7 | 1000000
8 | 1000000
9 | 1000000
(10 rows)
GIN index size, 9.4 vs 9.3
9.3:
postgres=# di+
Schema | Name | ... | Size | ...
--------+---------------+-----+--------+-----
public | numbers_btree | | 214 MB |
public | numbers_gin | | 58 MB |
9.4:
public | numbers_btree | | 214 MB |
public | numbers_gin | | 11 MB |
5x smaller!
How did we achieve this?
GIN is just a B-tree, with efficient storage of duplicates.
Pointers to the heap tuples are stored as ItemPointers
Each item pointer consists of a block number, and offset
within the block
6 bytes in total, 4 bytes for block number and 2 for offset
presented in text as (, )
Example of one index tuple:
key item pointer
FOOBAR: (12, 5)
GIN structure (version 9.3)
B-tree page:
FOOBAR: (12, 5)
FOOBAR: (12, 6)
FOOBAR: (12, 7)
FOOBAR: (13, 2)
GIN (entry tree) page:
FOOBAR: (12, 5), (12, 6), (12, 7), (13, 2)
GIN stores the key only once, making it efficient for storing
duplicates
GIN entry tuple in version 9.3
FOOBAR: (12, 5), (12, 6), (12, 7), (13, 2)
Tuple size:
size (bytes) description
6 fixed index tuple overhead
7 key, 6 characters + 1 byte header
24 item pointers, 4 * 6 bytes
37 bytes in total
GIN entry tuple in version 9.4
The list of item pointers is sorted (as in 9.3)
The first item pointer in the list is stored normally.
Subsequent items are stored as a delta from previous item
9.3: (12, 5) (12, 6) (13, 2) (13, 4)
9.4: (12, 5) +1 +1 +2044 +2
Varbyte encoding
most deltas are small integers.
varbyte encoding uses variable number of bytes for each
integer.
single byte for a value < 128
0XXXXXXX
1XXXXXXX 0XXXXYYY
1XXXXXXX 1XXXXYYY 0YYYYYYY
1XXXXXXX 1XXXXYYY 1YYYYYYY 0YYYYYYY
1XXXXXXX 1XXXXYYY 1YYYYYYY 1YYYYYYY 0YYYYYYY
1XXXXXXX 1XXXXYYY 1YYYYYYY 1YYYYYYY 1YYYYYYY
YYYYYYYY
Each item needs 1-6 bytes (was always 6 bytes before)
Compressed posting lists
When a GIN index tuple grows too large to fit on page, a
separate tree called posting tree is created
the posting tree is essentially a huge list of item pointers
Posting list compression applies to both in-line posting lists
and posting tree pages.
Compression algorithm
Plenty of encoding schemes for sorted lists of integers in
literature
Experimented with Simple9
was somewhat more compact than the chosen varbyte encoding
Could use a (compressed) bitmap
would turn GIN into a bitmap index
With the chosen algorithm, removing an item never increases index
size.
VACUUM doesn’t cause you to run out of disk space.
Summary
Best case example:
Table 346 MB
B-tree index 214 MB
GIN (9.3) 58 MB
GIN (9.4) 11 MB
The new code can read old-format pages, so pg upgrade works
but you won’t get the benefit until you REINDEX.
More expensive to do random updates
but GIN isn’t very fast with random updates anyway. . .
Part 2: “rare & frequent” queries got faster
Three fundamental GIN operations:
1. Extract keys from a value to insert or query
2. Store them on disk
3. Combine matches of several keys efficiently, and
determine which items match the overall query
Consistent function
select plainto_tsquery(
’an advanced PostgreSQL open source database’);
plainto_tsquery
--------------------------------------------------------
’postgresql’ & ’advanc’ & ’open’ & ’sourc’ & ’databas’
(1 row)
select * from foo where col @@ plainto_tsquery(
’an advanced PostgreSQL open source database’
)
3. Combine matches efficiently (0/4)
The query returns the following matches from the index:
advanc databas open postgresql sourc
(0,8) (0,3) (0,2) (0,8) (0,1)
(0,14) (0,8) (0,8) (0,41) (0,2)
(0,17) (0,43) (0,30) (0,8)
(0,22) (0,47) (0,33) (0,12)
(0,26) (1,32) (0,36) (0,13)
(0,33) (0,44) (0,18)
(0,34) (0,46) (0,19)
(0,35) (0,56) (0,20)
(0,45) (1,4) (0,26)
(0,47) (1,22) (0,34)
3. Combine matches efficiently (1/4)
(0,1) contains only word “sourc” -> no match
advanc databas open postgresql sourc
(0,8) (0,3) (0,2) (0,8) (0,1)
(0,14) (0,8) (0,8) (0,41) (0,2)
(0,17) (0,43) (0,30) (0,8)
(0,22) (0,47) (0,33) (0,12)
(0,26) (1,32) (0,36) (0,13)
(0,33) (0,44) (0,18)
(0,34) (0,46) (0,19)
(0,35) (0,56) (0,20)
(0,45) (1,4) (0,26)
(0,47) (1,22) (0,34)
3. Combine matches efficiently (2/4)
(0,2) contains words “open” and “sourc” -> no match
advanc databas open postgresql sourc
(0,9) (0,3) (0,2) (0,8) (0,1)
(0,14) (0,8) (0,8) (0,41) (0,2)
(0,17) (0,43) (0,30) (0,8)
(0,22) (0,47) (0,33) (0,12)
(0,26) (1,32) (0,36) (0,13)
(0,33) (0,44) (0,18)
(0,34) (0,46) (0,19)
(0,35) (0,56) (0,20)
(0,45) (1,4) (0,26)
(0,47) (1,22) (0,34)
3. Combine matches efficiently (3/4)
(0,3) contains word “databas” -> no match
advanc databas open postgresql sourc
(0,8) (0,3) (0,2) (0,8) (0,1)
(0,14) (0,8) (0,8) (0,41) (0,2)
(0,17) (0,43) (0,30) (0,8)
(0,22) (0,47) (0,33) (0,12)
(0,26) (1,32) (0,36) (0,13)
(0,33) (0,44) (0,18)
(0,34) (0,46) (0,19)
(0,35) (0,56) (0,20)
(0,45) (1,4) (0,26)
(0,47) (1,22) (0,34)
3. Combine matches efficiently (4/4)
(0,8) contains all the words -> match
advanc databas open postgresql sourc
(0,8) (0,3) (0,2) (0,8) (0,1)
(0,14) (0,8) (0,8) (0,41) (0,2)
(0,17) (0,43) (0,30) (0,8)
(0,22) (0,47) (0,33) (0,12)
(0,26) (1,32) (0,36) (0,13)
(0,33) (0,44) (0,18)
(0,34) (0,46) (0,19)
(0,35) (0,56) (0,20)
(0,45) (1,4) (0,26)
(0,47) (1,22) (0,34)
9.4 is smarter
Instead of scanning through the posting lists of all the keywords,
only scan through the list with fewest items, and skip the other
lists to the next possible match.
Big improvement for “frequent-term AND rare-term” style
queries
9.4 example (1/3)
The item lists with the fewest items (estimated) are scanned first.
postgresql databas open advanc sourc
(0,8) (0,3) (0,2) (0,8) (0,1)
(0,41) (0,8) (0,8) (0,14) (0,2)
(0,43) (0,30) (0,17) (0,8)
(0,47) (0,33) (0,22) (0,12)
(1,32) (0,36) (0,26) (0,13)
(0,44) (0,33) (0,18)
(0,46) (0,34) (0,19)
(0,56) (0,35) (0,20)
(1,4) (0,45) (0,26)
(1,22) (0,47) (0,34)
9.4 example (2/3)
(0,8) contains all the words -> match
postgresql databas open advanc sourc
(0,8) (0,3) (0,2) (0,8) (0,1)
(0,41) (0,8) (0,8) (0,14) (0,2)
(0,43) (0,30) (0,17) (0,8)
(0,47) (0,33) (0,22) (0,12)
(1,32) (0,36) (0,26) (0,13)
(0,44) (0,33) (0,18)
(0,46) (0,34) (0,19)
(0,56) (0,35) (0,20)
(1,4) (0,45) (0,26)
(1,22) (0,47) (0,34)
9.4 example (3/3)
(0,41) has no match for the “databas” word -> ignore
postgresql databas open advanc sourc
(0,8) (0,3) (0,2) (0,8) (0,1)
(0,41) (0,8) (0,8) (0,14) (0,2)
*(0,43) (0,30) (0,17) (0,8)
(0,47) (0,33) (0,22) (0,12)
(1,32) (0,36) (0,26) (0,13)
(0,44) (0,33) (0,18)
(0,46) (0,34) (0,19)
(0,56) (0,35) (0,20)
(1,4) (0,45) (0,26)
(1,22) (0,47) (0,34)
Tri-consistent
How does the system know that all the items have to match?
select plainto_tsquery(
’an advanced PostgreSQL open source database’);
plainto_tsquery
--------------------------------------------------------
’postgresql’ & ’advanc’ & ’open’ & ’sourc’ & ’databas’
(1 row)
New support function, tri-consistent that an operator class
can define.
Takes tri-state input for each item: TRUE, FALSE, MAYBE
Multi-key searches
SELECT * FROM foo
WHERE col @@ plainto_tsquery(’PostgreSQL’)
AND col @@ plainto(’open source database’);
When multiple search conditions are ANDed in the query, they
must all match
Part 3: Misc improvements
In addition to the two big improvements, the code went through a
lot of refactoring
made crash recovery more robust
The future
Alexander submitted a third patch that was not accepted
store additional information per item
speeds up certain queries, but makes indexes much larger
Planner doesn’t know that the “rare & frequent” optimization
Various micro-optimizations:
Represent ItemPointers as 64-bit integers for fast comparisons
build a truth-table of the “consistent” function, to avoid
function call overhead.
VODKA?
Final GIN tip
GIN indexes are efficient at storing duplicates
Use a GIN index using btree gin extension for status-fields etc.
postgres=# di+
List of relations
Schema | Name | ... | Size | ...
--------+---------------+-----+--------+-----
public | numbers_btree | | 214 MB |
public | numbers_gin | | 11 MB |
(2 rows)
Questions?

Más contenido relacionado

La actualidad más candente

Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities
Gael Varoquaux
 
Aes cryptography algorithm based on intelligent blum blum-shub prn gs publica...
Aes cryptography algorithm based on intelligent blum blum-shub prn gs publica...Aes cryptography algorithm based on intelligent blum blum-shub prn gs publica...
Aes cryptography algorithm based on intelligent blum blum-shub prn gs publica...
zaidinvisible
 
Representing and Querying Geospatial Information in the Semantic Web
Representing and Querying Geospatial Information in the Semantic WebRepresenting and Querying Geospatial Information in the Semantic Web
Representing and Querying Geospatial Information in the Semantic Web
Kostis Kyzirakos
 

La actualidad más candente (20)

C07.heaps
C07.heapsC07.heaps
C07.heaps
 
Crunching Gigabytes Locally
Crunching Gigabytes LocallyCrunching Gigabytes Locally
Crunching Gigabytes Locally
 
Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities Simple representations for learning: factorizations and similarities
Simple representations for learning: factorizations and similarities
 
Ijmsr 2016-05
Ijmsr 2016-05Ijmsr 2016-05
Ijmsr 2016-05
 
PyTorch 튜토리얼 (Touch to PyTorch)
PyTorch 튜토리얼 (Touch to PyTorch)PyTorch 튜토리얼 (Touch to PyTorch)
PyTorch 튜토리얼 (Touch to PyTorch)
 
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
Exceeding Classical: Probabilistic Data Structures in Data Intensive Applicat...
 
Probabilistic data structures. Part 3. Frequency
Probabilistic data structures. Part 3. FrequencyProbabilistic data structures. Part 3. Frequency
Probabilistic data structures. Part 3. Frequency
 
Speaker Diarization
Speaker DiarizationSpeaker Diarization
Speaker Diarization
 
A survey on Fully Homomorphic Encryption
A survey on Fully Homomorphic EncryptionA survey on Fully Homomorphic Encryption
A survey on Fully Homomorphic Encryption
 
Aes cryptography algorithm based on intelligent blum blum-shub prn gs publica...
Aes cryptography algorithm based on intelligent blum blum-shub prn gs publica...Aes cryptography algorithm based on intelligent blum blum-shub prn gs publica...
Aes cryptography algorithm based on intelligent blum blum-shub prn gs publica...
 
Python Coding Examples for Drive Time Analysis
Python Coding Examples for Drive Time AnalysisPython Coding Examples for Drive Time Analysis
Python Coding Examples for Drive Time Analysis
 
Representing and Querying Geospatial Information in the Semantic Web
Representing and Querying Geospatial Information in the Semantic WebRepresenting and Querying Geospatial Information in the Semantic Web
Representing and Querying Geospatial Information in the Semantic Web
 
Probabilistic data structures
Probabilistic data structuresProbabilistic data structures
Probabilistic data structures
 
Application Note: Wireless Pedestrian Dead Reckoning with "oblu"
Application Note: Wireless Pedestrian Dead Reckoning with "oblu"Application Note: Wireless Pedestrian Dead Reckoning with "oblu"
Application Note: Wireless Pedestrian Dead Reckoning with "oblu"
 
Data warehouse or conventional database: Which is right for you?
Data warehouse or conventional database: Which is right for you?Data warehouse or conventional database: Which is right for you?
Data warehouse or conventional database: Which is right for you?
 
Deep Convolutional GANs - meaning of latent space
Deep Convolutional GANs - meaning of latent spaceDeep Convolutional GANs - meaning of latent space
Deep Convolutional GANs - meaning of latent space
 
Bin packing
Bin packingBin packing
Bin packing
 
Applications of datastructures
Applications of datastructuresApplications of datastructures
Applications of datastructures
 
Profiling in Python
Profiling in PythonProfiling in Python
Profiling in Python
 
Computer programming 2 Lesson 14
Computer programming 2  Lesson 14Computer programming 2  Lesson 14
Computer programming 2 Lesson 14
 

Destacado

Krome Technology - Corporate Brochure
Krome Technology - Corporate BrochureKrome Technology - Corporate Brochure
Krome Technology - Corporate Brochure
Krome Technology
 

Destacado (12)

New York Fashion Week Hair Favorites (Part 3)
New York Fashion Week Hair Favorites (Part 3)New York Fashion Week Hair Favorites (Part 3)
New York Fashion Week Hair Favorites (Part 3)
 
Using Automation and Orchestration in IT service provision
Using Automation and Orchestration in IT service provisionUsing Automation and Orchestration in IT service provision
Using Automation and Orchestration in IT service provision
 
Destination Saver Corporate benefits
Destination Saver Corporate benefitsDestination Saver Corporate benefits
Destination Saver Corporate benefits
 
статистика обращений предпринимателей ноябрь 2014 1
статистика обращений предпринимателей ноябрь 2014 1статистика обращений предпринимателей ноябрь 2014 1
статистика обращений предпринимателей ноябрь 2014 1
 
Aryabhata-An Indian Mathemagician
 Aryabhata-An Indian Mathemagician Aryabhata-An Indian Mathemagician
Aryabhata-An Indian Mathemagician
 
CHANGE MANAGEMENT DEMYSTIFIED IN 5 EASY STEPS
CHANGE MANAGEMENT DEMYSTIFIED IN 5 EASY STEPSCHANGE MANAGEMENT DEMYSTIFIED IN 5 EASY STEPS
CHANGE MANAGEMENT DEMYSTIFIED IN 5 EASY STEPS
 
GRIN-Global Status I, INIAV 2016 February
GRIN-Global Status I, INIAV 2016 FebruaryGRIN-Global Status I, INIAV 2016 February
GRIN-Global Status I, INIAV 2016 February
 
Krome Technology - Corporate Brochure
Krome Technology - Corporate BrochureKrome Technology - Corporate Brochure
Krome Technology - Corporate Brochure
 
Instituto Pedagógico Montessori
Instituto Pedagógico Montessori Instituto Pedagógico Montessori
Instituto Pedagógico Montessori
 
E safety assembly
E safety assemblyE safety assembly
E safety assembly
 
Resume
ResumeResume
Resume
 
android Rajeshppt
android Rajeshpptandroid Rajeshppt
android Rajeshppt
 

Similar a PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коротков

Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Ontico
 
2015-12-05 Александр Коротков, Иван Панченко - Слабо-структурированные данные...
2015-12-05 Александр Коротков, Иван Панченко - Слабо-структурированные данные...2015-12-05 Александр Коротков, Иван Панченко - Слабо-структурированные данные...
2015-12-05 Александр Коротков, Иван Панченко - Слабо-структурированные данные...
HappyDev
 
Ashish garg research paper 660_CamReady
Ashish garg research paper 660_CamReadyAshish garg research paper 660_CamReady
Ashish garg research paper 660_CamReady
Ashish Garg
 
Strava Labs: Exploring a Billion Activity Dataset from Athletes with Apache S...
Strava Labs: Exploring a Billion Activity Dataset from Athletes with Apache S...Strava Labs: Exploring a Billion Activity Dataset from Athletes with Apache S...
Strava Labs: Exploring a Billion Activity Dataset from Athletes with Apache S...
Databricks
 

Similar a PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коротков (20)

Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
Полнотекстовый поиск в PostgreSQL за миллисекунды (Олег Бартунов, Александр К...
 
2015-12-05 Александр Коротков, Иван Панченко - Слабо-структурированные данные...
2015-12-05 Александр Коротков, Иван Панченко - Слабо-структурированные данные...2015-12-05 Александр Коротков, Иван Панченко - Слабо-структурированные данные...
2015-12-05 Александр Коротков, Иван Панченко - Слабо-структурированные данные...
 
OLTP+OLAP=HTAP
 OLTP+OLAP=HTAP OLTP+OLAP=HTAP
OLTP+OLAP=HTAP
 
Tutorial-on-DNN-07-Co-design-Precision.pdf
Tutorial-on-DNN-07-Co-design-Precision.pdfTutorial-on-DNN-07-Co-design-Precision.pdf
Tutorial-on-DNN-07-Co-design-Precision.pdf
 
TiReX: Tiled Regular eXpression matching architecture
TiReX: Tiled Regular eXpression matching architectureTiReX: Tiled Regular eXpression matching architecture
TiReX: Tiled Regular eXpression matching architecture
 
Ashish garg research paper 660_CamReady
Ashish garg research paper 660_CamReadyAshish garg research paper 660_CamReady
Ashish garg research paper 660_CamReady
 
High Performance Python - Marc Garcia
High Performance Python - Marc GarciaHigh Performance Python - Marc Garcia
High Performance Python - Marc Garcia
 
Exploring Parallel Merging In GPU Based Systems Using CUDA C.
Exploring Parallel Merging In GPU Based Systems Using CUDA C.Exploring Parallel Merging In GPU Based Systems Using CUDA C.
Exploring Parallel Merging In GPU Based Systems Using CUDA C.
 
20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English20181212 - PGconfASIA - LT - English
20181212 - PGconfASIA - LT - English
 
Parallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDAParallel Implementation of K Means Clustering on CUDA
Parallel Implementation of K Means Clustering on CUDA
 
Improving Top-K Retrieval Algorithms Using Dynamic Programming and Longer Ski...
Improving Top-K Retrieval Algorithms Using Dynamic Programming and Longer Ski...Improving Top-K Retrieval Algorithms Using Dynamic Programming and Longer Ski...
Improving Top-K Retrieval Algorithms Using Dynamic Programming and Longer Ski...
 
Gbroccolo foss4 guk2016_brin4postgis
Gbroccolo foss4 guk2016_brin4postgisGbroccolo foss4 guk2016_brin4postgis
Gbroccolo foss4 guk2016_brin4postgis
 
20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN20180920_DBTS_PGStrom_EN
20180920_DBTS_PGStrom_EN
 
Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...
Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...
Типы данных JSONb, соответствующие индексы и модуль jsquery – Олег Бартунов, ...
 
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander Korotkov
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander KorotkovPostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander Korotkov
PostgreSQL Moscow Meetup - September 2014 - Oleg Bartunov and Alexander Korotkov
 
10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL10 Reasons to Start Your Analytics Project with PostgreSQL
10 Reasons to Start Your Analytics Project with PostgreSQL
 
Better Full Text Search in PostgreSQL
Better Full Text Search in PostgreSQLBetter Full Text Search in PostgreSQL
Better Full Text Search in PostgreSQL
 
Strava Labs: Exploring a Billion Activity Dataset from Athletes with Apache S...
Strava Labs: Exploring a Billion Activity Dataset from Athletes with Apache S...Strava Labs: Exploring a Billion Activity Dataset from Athletes with Apache S...
Strava Labs: Exploring a Billion Activity Dataset from Athletes with Apache S...
 
February 2016 Webinar Series - Introduction to DynamoDB
February 2016 Webinar Series - Introduction to DynamoDBFebruary 2016 Webinar Series - Introduction to DynamoDB
February 2016 Webinar Series - Introduction to DynamoDB
 
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
FOSDEM 2019: M3, Prometheus and Graphite with metrics and monitoring in an in...
 

Más de pgdayrussia

PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...
PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...
PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...
pgdayrussia
 
PG Day'14 Russia, Социальная сеть, которая просто работает, Владислав Коваль
PG Day'14 Russia, Социальная сеть, которая просто работает, Владислав КовальPG Day'14 Russia, Социальная сеть, которая просто работает, Владислав Коваль
PG Day'14 Russia, Социальная сеть, которая просто работает, Владислав Коваль
pgdayrussia
 
PG Day'14 Russia, PostgreSQL в avito.ru, Михаил Тюрин
PG Day'14 Russia, PostgreSQL в avito.ru, Михаил ТюринPG Day'14 Russia, PostgreSQL в avito.ru, Михаил Тюрин
PG Day'14 Russia, PostgreSQL в avito.ru, Михаил Тюрин
pgdayrussia
 
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 3...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 3...PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 3...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 3...
pgdayrussia
 
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 2...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 2...PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 2...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 2...
pgdayrussia
 
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 1...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 1...PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 1...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 1...
pgdayrussia
 
PG Day'14 Russia, PostgreSQL: архитектура, настройка и оптимизация, Илья Косм...
PG Day'14 Russia, PostgreSQL: архитектура, настройка и оптимизация, Илья Косм...PG Day'14 Russia, PostgreSQL: архитектура, настройка и оптимизация, Илья Косм...
PG Day'14 Russia, PostgreSQL: архитектура, настройка и оптимизация, Илья Косм...
pgdayrussia
 
PG Day'14 Russia, Индексный поиск по регулярным выражениям, Александр Коротков
PG Day'14 Russia, Индексный поиск по регулярным выражениям, Александр КоротковPG Day'14 Russia, Индексный поиск по регулярным выражениям, Александр Коротков
PG Day'14 Russia, Индексный поиск по регулярным выражениям, Александр Коротков
pgdayrussia
 
PG Day'14 Russia, Нетрадиционный PostgreSQL: хранение бинарных данных в БД, А...
PG Day'14 Russia, Нетрадиционный PostgreSQL: хранение бинарных данных в БД, А...PG Day'14 Russia, Нетрадиционный PostgreSQL: хранение бинарных данных в БД, А...
PG Day'14 Russia, Нетрадиционный PostgreSQL: хранение бинарных данных в БД, А...
pgdayrussia
 
PG Day'14 Russia, Модуль anyarray, Федор Сигаев
PG Day'14 Russia, Модуль anyarray, Федор СигаевPG Day'14 Russia, Модуль anyarray, Федор Сигаев
PG Day'14 Russia, Модуль anyarray, Федор Сигаев
pgdayrussia
 

Más de pgdayrussia (12)

PG Day'14 Russia, Secure PostgreSQL Deployment, Magnus Hagander
PG Day'14 Russia, Secure PostgreSQL Deployment, Magnus HaganderPG Day'14 Russia, Secure PostgreSQL Deployment, Magnus Hagander
PG Day'14 Russia, Secure PostgreSQL Deployment, Magnus Hagander
 
PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...
PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...
PG Day'14 Russia, Работа со слабо-структурированными данными в PostgreSQL, Ол...
 
PG Day'14 Russia, Социальная сеть, которая просто работает, Владислав Коваль
PG Day'14 Russia, Социальная сеть, которая просто работает, Владислав КовальPG Day'14 Russia, Социальная сеть, которая просто работает, Владислав Коваль
PG Day'14 Russia, Социальная сеть, которая просто работает, Владислав Коваль
 
PG Day'14 Russia, PostgreSQL в avito.ru, Михаил Тюрин
PG Day'14 Russia, PostgreSQL в avito.ru, Михаил ТюринPG Day'14 Russia, PostgreSQL в avito.ru, Михаил Тюрин
PG Day'14 Russia, PostgreSQL в avito.ru, Михаил Тюрин
 
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 3...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 3...PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 3...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 3...
 
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 2...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 2...PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 2...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 2...
 
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 1...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 1...PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 1...
PG Day'14 Russia, PostgreSQL как платформа для разработки приложений, часть 1...
 
PG Day'14 Russia, PostgreSQL System Architecture, Heikki Linnakangas
PG Day'14 Russia, PostgreSQL System Architecture, Heikki LinnakangasPG Day'14 Russia, PostgreSQL System Architecture, Heikki Linnakangas
PG Day'14 Russia, PostgreSQL System Architecture, Heikki Linnakangas
 
PG Day'14 Russia, PostgreSQL: архитектура, настройка и оптимизация, Илья Косм...
PG Day'14 Russia, PostgreSQL: архитектура, настройка и оптимизация, Илья Косм...PG Day'14 Russia, PostgreSQL: архитектура, настройка и оптимизация, Илья Косм...
PG Day'14 Russia, PostgreSQL: архитектура, настройка и оптимизация, Илья Косм...
 
PG Day'14 Russia, Индексный поиск по регулярным выражениям, Александр Коротков
PG Day'14 Russia, Индексный поиск по регулярным выражениям, Александр КоротковPG Day'14 Russia, Индексный поиск по регулярным выражениям, Александр Коротков
PG Day'14 Russia, Индексный поиск по регулярным выражениям, Александр Коротков
 
PG Day'14 Russia, Нетрадиционный PostgreSQL: хранение бинарных данных в БД, А...
PG Day'14 Russia, Нетрадиционный PostgreSQL: хранение бинарных данных в БД, А...PG Day'14 Russia, Нетрадиционный PostgreSQL: хранение бинарных данных в БД, А...
PG Day'14 Russia, Нетрадиционный PostgreSQL: хранение бинарных данных в БД, А...
 
PG Day'14 Russia, Модуль anyarray, Федор Сигаев
PG Day'14 Russia, Модуль anyarray, Федор СигаевPG Day'14 Russia, Модуль anyarray, Федор Сигаев
PG Day'14 Russia, Модуль anyarray, Федор Сигаев
 

Último

The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
VictorSzoltysek
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
masabamasaba
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
masabamasaba
 

Último (20)

The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Define the academic and professional writing..pdf
Define the academic and professional writing..pdfDefine the academic and professional writing..pdf
Define the academic and professional writing..pdf
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 

PG Day'14 Russia, GIN — Stronger than ever in 9.4 and further, Александр Коротков

  • 1. GIN in 9.4 and further Heikki Linnakangas, Alexander Korotkov, Oleg Bartunov May 23, 2014
  • 2. Alexander Korotkov Next generation of GIN Oleg Bartunov PGConf.EU-2013, Dublin Two GIN applications ● Full-text search – tsvector @@ tsquery – Indexing tsvector data type ● Hstore – (key,value) storage – Indexing keys, values
  • 3. Alexander Korotkov Next generation of GIN Oleg Bartunov PGConf.EU-2013, Dublin FTS in PostgreSQL ● Full integration with PostgreSQL ● 27 built-in configurations for 10 languages ● Support of user-defined FTS configurations ● Pluggable dictionaries ( ispell, snowball, thesaurus ), parsers ● Relevance ranking ● GiST and GIN indexes with concurrency and recovery support ● Rich query language with query rewriting support‫‏‬ It's cool, but we want faster FTS !
  • 4. Alexander Korotkov Next generation of GIN Oleg Bartunov PGConf.EU-2013, Dublin ACID overhead is really big :( ● Foreign solutions: Sphinx, Solr, Lucene.... – Crawl database and index (time lag) – No access to attributes – Additional complexity – BUT: Very fast ! Can we improve native FTS ?
  • 5. Alexander Korotkov Next generation of GIN Oleg Bartunov PGConf.EU-2013, Dublin Can we improve native FTS ? 156676 Wikipedia articles: postgres=# explain analyze SELECT docid, ts_rank(text_vector, to_tsquery('english', 'title')) AS rank FROM ti2 WHERE text_vector @@ to_tsquery('english', 'title') ORDER BY rank DESC LIMIT 3; Limit (cost=8087.40..8087.41 rows=3 width=282) (actual time=433.750..433.752 rows= -> Sort (cost=8087.40..8206.63 rows=47692 width=282) (actual time=433.749..433.749 rows=3 loops=1) Sort Key: (ts_rank(text_vector, '''titl'''::tsquery)) Sort Method: top-N heapsort Memory: 25kB -> Bitmap Heap Scan on ti2 (cost=529.61..7470.99 rows=47692 width=282) (actual time=15.094..423.452 rows=47855 loops=1) Recheck Cond: (text_vector @@ '''titl'''::tsquery) -> Bitmap Index Scan on ti2_index (cost=0.00..517.69 rows=47692 wi (actual time=13.736..13.736 rows=47855 loops=1) Index Cond: (text_vector @@ '''titl'''::tsquery) Total runtime: 433.787 ms HEAP IS SLOW 400 ms !
  • 6. Alexander Korotkov Next generation of GIN Oleg Bartunov PGConf.EU-2013, Dublin Can we improve native FTS ? 156676 Wikipedia articles: postgres=# explain analyze SELECT docid, ts_rank(text_vector, to_tsquery('english', 'title')) AS rank FROM ti2 WHERE text_vector @@ to_tsquery('english', 'title') ORDER BY rank DESC LIMIT 3; What if we have this plan ? Limit (cost=20.00..21.65 rows=3 width=282) (actual time=18.376..18.427 rows=3 loop -> Index Scan using ti2_index on ti2 (cost=20.00..26256.30 rows=47692 width=28 (actual time=18.375..18.425 rows=3 loops=1) Index Cond: (text_vector @@ '''titl'''::tsquery) Order By: (text_vector >< '''titl'''::tsquery) Total runtime: 18.511 ms
  • 7. Alexander Korotkov Next generation of GIN Oleg Bartunov PGConf.EU-2013, Dublin Can we improve native FTS ? 156676 Wikipedia articles: postgres=# explain analyze SELECT docid, ts_rank(text_vector, to_tsquery('english', 'title')) AS rank FROM ti2 WHERE text_vector @@ to_tsquery('english', 'title') ORDER BY rank DESC LIMIT 3; 18.511 ms vs 433.787 ms We'll be FINE !
  • 8. Alexander Korotkov Next generation of GIN Oleg Bartunov PGConf.EU-2013, Dublin 6.7 mln classifieds 9.3 9.3+patch 9.3+patch functional index Sphinx Table size 6.0 GB 6.0 GB 2.87 GB - Index size 1.29 GB 1.27 GB 1.27 GB 1.12 GB Index build time 216 sec 303 sec 718sec 180 sec* Queries in 8 hours 3,0 mln. 42.7 mln. 42.7 mln. 32.0 mln. WOW !!!
  • 9. Alexander Korotkov Next generation of GIN Oleg Bartunov PGConf.EU-2013, Dublin 20 mln descriptions 9.3 9.3+ patch 9.3+ patch functional index Sphinx Table size 18.2 GB 18.2 GB 11.9 GB - Index size 2.28 GB 2.30 GB 2.30 GB 3.09 GB Index build time 258 sec 684 sec 1712 sec 481 sec* Queries in 8 hours 2.67 mln. 38.7 mln. 38.7 mln. 26.7 mln. WOW !!!
  • 10. Alexander Korotkov Next generation of GIN Oleg Bartunov PGConf.EU-2013, Dublin Hstore ● Data – 1,252,973 bookmarks from Delicious in json format ● Search, contains operator @> – select count(*) from hs where h @> 'tags=>{{term=>NYC}}'; 0.98 s (seq) vs 0.1 s (GIN) → We want faster operation ! ● Observation – GIN indexes separately keys and values – Key 'tags' is very frequent -1138532, value '{{term=>NYC}}' is rare — 285 – Current GIN: time (freq & rare) ~ time(freq)
  • 11. Alexander Korotkov Next generation of GIN Oleg Bartunov PGConf.EU-2013, Dublin Hstore ● Observation – GIN indexes separately keys and values – Key 'tags' is very frequent -1138532, value '{{term=>NYC}}' is rare — 285 – Current GIN: time (freq & rare) ~ time(freq) ● What if GIN supports – time (freq & rare) ~ time(rare) =# select count(*) from hs where h::hstore @> 'tags=>{{term=>NYC}}'::hstore; count ------- 285 (1 row) Time: 17.372 ms 0.98 s (seq) vs 0.1 s (GIN) vs 0.017 s (GIN++)
  • 12. Alexander Korotkov Next generation of GIN Oleg Bartunov PGConf.EU-2013, Dublin These two examples motivate GIN improvements !
  • 13. Oleg Bartunov Full-text search in PostgreSQL in milliseconds Alexander Korotkov PGConf.EU-2012, Prague GIN History ● Introduced at PostgreSQL Anniversary Meeting in Toronto, Jul 7-8, 2006 by Oleg Bartunov and Teodor Sigaev
  • 14. Oleg Bartunov Full-text search in PostgreSQL in milliseconds Alexander Korotkov PGConf.EU-2012, Prague GIN History ● Introduced at PostgreSQL Anniversary Meeting in Toronto, Jul 7-8, 2006 by Oleg Bartunov and Teodor Sigaev ● Supported by JFG Networks (France) ● «Gin stands for Generalized Inverted iNdex and should be considered as a genie, not a drink.» ● Alexander Korotkov, Heikki Linnakangas have joined GIN++ development in 2013
  • 15. Oleg Bartunov Full-text search in PostgreSQL in milliseconds Alexander Korotkov PGConf.EU-2012, Prague GIN History TODO ---- Nearest future: * Opclasses for all types (no programming, just many catalog changes). Distant future: * Replace B-tree of entries to something like GiST (VODKA ! 2014) * Add multicolumn support * Optimize insert operations (background index insertion) ● From GIN Readme, 2006
  • 16. Two major improvements 1. Compressed posting lists Makes GIN indexes smaller. Smaller is better. 2. When combining two or more search keys, skip over items that match one key but not the other. Makes “rare & frequent” type queries much faster.
  • 17. Part 1: Compressed posting lists create extension btree_gin; create table numbers (n int4); insert into numbers select g % 10 from generate_series(1, 10000000) g; create index numbers_btree on numbers (n); create index numbers_gin on numbers using gin (n);
  • 18. Example table select n, count(*) from numbers group by n order by n; n | count ---+--------- 0 | 1000000 1 | 1000000 2 | 1000000 3 | 1000000 4 | 1000000 5 | 1000000 6 | 1000000 7 | 1000000 8 | 1000000 9 | 1000000 (10 rows)
  • 19. GIN index size, 9.4 vs 9.3 9.3: postgres=# di+ Schema | Name | ... | Size | ... --------+---------------+-----+--------+----- public | numbers_btree | | 214 MB | public | numbers_gin | | 58 MB | 9.4: public | numbers_btree | | 214 MB | public | numbers_gin | | 11 MB | 5x smaller!
  • 20. How did we achieve this? GIN is just a B-tree, with efficient storage of duplicates. Pointers to the heap tuples are stored as ItemPointers Each item pointer consists of a block number, and offset within the block 6 bytes in total, 4 bytes for block number and 2 for offset presented in text as (, ) Example of one index tuple: key item pointer FOOBAR: (12, 5)
  • 21. GIN structure (version 9.3) B-tree page: FOOBAR: (12, 5) FOOBAR: (12, 6) FOOBAR: (12, 7) FOOBAR: (13, 2) GIN (entry tree) page: FOOBAR: (12, 5), (12, 6), (12, 7), (13, 2) GIN stores the key only once, making it efficient for storing duplicates
  • 22. GIN entry tuple in version 9.3 FOOBAR: (12, 5), (12, 6), (12, 7), (13, 2) Tuple size: size (bytes) description 6 fixed index tuple overhead 7 key, 6 characters + 1 byte header 24 item pointers, 4 * 6 bytes 37 bytes in total
  • 23. GIN entry tuple in version 9.4 The list of item pointers is sorted (as in 9.3) The first item pointer in the list is stored normally. Subsequent items are stored as a delta from previous item 9.3: (12, 5) (12, 6) (13, 2) (13, 4) 9.4: (12, 5) +1 +1 +2044 +2
  • 24. Varbyte encoding most deltas are small integers. varbyte encoding uses variable number of bytes for each integer. single byte for a value < 128 0XXXXXXX 1XXXXXXX 0XXXXYYY 1XXXXXXX 1XXXXYYY 0YYYYYYY 1XXXXXXX 1XXXXYYY 1YYYYYYY 0YYYYYYY 1XXXXXXX 1XXXXYYY 1YYYYYYY 1YYYYYYY 0YYYYYYY 1XXXXXXX 1XXXXYYY 1YYYYYYY 1YYYYYYY 1YYYYYYY YYYYYYYY Each item needs 1-6 bytes (was always 6 bytes before)
  • 25. Compressed posting lists When a GIN index tuple grows too large to fit on page, a separate tree called posting tree is created the posting tree is essentially a huge list of item pointers Posting list compression applies to both in-line posting lists and posting tree pages.
  • 26. Compression algorithm Plenty of encoding schemes for sorted lists of integers in literature Experimented with Simple9 was somewhat more compact than the chosen varbyte encoding Could use a (compressed) bitmap would turn GIN into a bitmap index With the chosen algorithm, removing an item never increases index size. VACUUM doesn’t cause you to run out of disk space.
  • 27. Summary Best case example: Table 346 MB B-tree index 214 MB GIN (9.3) 58 MB GIN (9.4) 11 MB The new code can read old-format pages, so pg upgrade works but you won’t get the benefit until you REINDEX. More expensive to do random updates but GIN isn’t very fast with random updates anyway. . .
  • 28. Part 2: “rare & frequent” queries got faster Three fundamental GIN operations: 1. Extract keys from a value to insert or query 2. Store them on disk 3. Combine matches of several keys efficiently, and determine which items match the overall query
  • 29. Consistent function select plainto_tsquery( ’an advanced PostgreSQL open source database’); plainto_tsquery -------------------------------------------------------- ’postgresql’ & ’advanc’ & ’open’ & ’sourc’ & ’databas’ (1 row) select * from foo where col @@ plainto_tsquery( ’an advanced PostgreSQL open source database’ )
  • 30. 3. Combine matches efficiently (0/4) The query returns the following matches from the index: advanc databas open postgresql sourc (0,8) (0,3) (0,2) (0,8) (0,1) (0,14) (0,8) (0,8) (0,41) (0,2) (0,17) (0,43) (0,30) (0,8) (0,22) (0,47) (0,33) (0,12) (0,26) (1,32) (0,36) (0,13) (0,33) (0,44) (0,18) (0,34) (0,46) (0,19) (0,35) (0,56) (0,20) (0,45) (1,4) (0,26) (0,47) (1,22) (0,34)
  • 31. 3. Combine matches efficiently (1/4) (0,1) contains only word “sourc” -> no match advanc databas open postgresql sourc (0,8) (0,3) (0,2) (0,8) (0,1) (0,14) (0,8) (0,8) (0,41) (0,2) (0,17) (0,43) (0,30) (0,8) (0,22) (0,47) (0,33) (0,12) (0,26) (1,32) (0,36) (0,13) (0,33) (0,44) (0,18) (0,34) (0,46) (0,19) (0,35) (0,56) (0,20) (0,45) (1,4) (0,26) (0,47) (1,22) (0,34)
  • 32. 3. Combine matches efficiently (2/4) (0,2) contains words “open” and “sourc” -> no match advanc databas open postgresql sourc (0,9) (0,3) (0,2) (0,8) (0,1) (0,14) (0,8) (0,8) (0,41) (0,2) (0,17) (0,43) (0,30) (0,8) (0,22) (0,47) (0,33) (0,12) (0,26) (1,32) (0,36) (0,13) (0,33) (0,44) (0,18) (0,34) (0,46) (0,19) (0,35) (0,56) (0,20) (0,45) (1,4) (0,26) (0,47) (1,22) (0,34)
  • 33. 3. Combine matches efficiently (3/4) (0,3) contains word “databas” -> no match advanc databas open postgresql sourc (0,8) (0,3) (0,2) (0,8) (0,1) (0,14) (0,8) (0,8) (0,41) (0,2) (0,17) (0,43) (0,30) (0,8) (0,22) (0,47) (0,33) (0,12) (0,26) (1,32) (0,36) (0,13) (0,33) (0,44) (0,18) (0,34) (0,46) (0,19) (0,35) (0,56) (0,20) (0,45) (1,4) (0,26) (0,47) (1,22) (0,34)
  • 34. 3. Combine matches efficiently (4/4) (0,8) contains all the words -> match advanc databas open postgresql sourc (0,8) (0,3) (0,2) (0,8) (0,1) (0,14) (0,8) (0,8) (0,41) (0,2) (0,17) (0,43) (0,30) (0,8) (0,22) (0,47) (0,33) (0,12) (0,26) (1,32) (0,36) (0,13) (0,33) (0,44) (0,18) (0,34) (0,46) (0,19) (0,35) (0,56) (0,20) (0,45) (1,4) (0,26) (0,47) (1,22) (0,34)
  • 35. 9.4 is smarter Instead of scanning through the posting lists of all the keywords, only scan through the list with fewest items, and skip the other lists to the next possible match. Big improvement for “frequent-term AND rare-term” style queries
  • 36. 9.4 example (1/3) The item lists with the fewest items (estimated) are scanned first. postgresql databas open advanc sourc (0,8) (0,3) (0,2) (0,8) (0,1) (0,41) (0,8) (0,8) (0,14) (0,2) (0,43) (0,30) (0,17) (0,8) (0,47) (0,33) (0,22) (0,12) (1,32) (0,36) (0,26) (0,13) (0,44) (0,33) (0,18) (0,46) (0,34) (0,19) (0,56) (0,35) (0,20) (1,4) (0,45) (0,26) (1,22) (0,47) (0,34)
  • 37. 9.4 example (2/3) (0,8) contains all the words -> match postgresql databas open advanc sourc (0,8) (0,3) (0,2) (0,8) (0,1) (0,41) (0,8) (0,8) (0,14) (0,2) (0,43) (0,30) (0,17) (0,8) (0,47) (0,33) (0,22) (0,12) (1,32) (0,36) (0,26) (0,13) (0,44) (0,33) (0,18) (0,46) (0,34) (0,19) (0,56) (0,35) (0,20) (1,4) (0,45) (0,26) (1,22) (0,47) (0,34)
  • 38. 9.4 example (3/3) (0,41) has no match for the “databas” word -> ignore postgresql databas open advanc sourc (0,8) (0,3) (0,2) (0,8) (0,1) (0,41) (0,8) (0,8) (0,14) (0,2) *(0,43) (0,30) (0,17) (0,8) (0,47) (0,33) (0,22) (0,12) (1,32) (0,36) (0,26) (0,13) (0,44) (0,33) (0,18) (0,46) (0,34) (0,19) (0,56) (0,35) (0,20) (1,4) (0,45) (0,26) (1,22) (0,47) (0,34)
  • 39. Tri-consistent How does the system know that all the items have to match? select plainto_tsquery( ’an advanced PostgreSQL open source database’); plainto_tsquery -------------------------------------------------------- ’postgresql’ & ’advanc’ & ’open’ & ’sourc’ & ’databas’ (1 row) New support function, tri-consistent that an operator class can define. Takes tri-state input for each item: TRUE, FALSE, MAYBE
  • 40. Multi-key searches SELECT * FROM foo WHERE col @@ plainto_tsquery(’PostgreSQL’) AND col @@ plainto(’open source database’); When multiple search conditions are ANDed in the query, they must all match
  • 41. Part 3: Misc improvements In addition to the two big improvements, the code went through a lot of refactoring made crash recovery more robust
  • 42. The future Alexander submitted a third patch that was not accepted store additional information per item speeds up certain queries, but makes indexes much larger Planner doesn’t know that the “rare & frequent” optimization Various micro-optimizations: Represent ItemPointers as 64-bit integers for fast comparisons build a truth-table of the “consistent” function, to avoid function call overhead. VODKA?
  • 43. Final GIN tip GIN indexes are efficient at storing duplicates Use a GIN index using btree gin extension for status-fields etc. postgres=# di+ List of relations Schema | Name | ... | Size | ... --------+---------------+-----+--------+----- public | numbers_btree | | 214 MB | public | numbers_gin | | 11 MB | (2 rows)