Privileged and Confidential
Point Field Types in Solr
Evolution of Range Filters
amikryukov@griddynamics.com
Agenda
1. Recap: From query parser to TopDocsCollector.
2. TermQuery search flow.
3. How are range filters implemented?
4. Optimizations for Range Filters.
5. Point Fields.
Recap: From query parser to TopDocsCollector
? What is a query parser?
? What is the difference between a Query and a Scorer? And why do you need a Collector?
Recap: Query execution flow
LeafReader
Recap: From query parser to TopDocsCollector
? What is the difference between TermsEnum and PostingsEnum?
Recap: inverted index, terms, posting list
TermQuery search flow
q=Price:10
TermQuery(field='Price', val='10')
TermQuery -> TermWeight -> TermScorer
TermQuery
Idea: iterate over the posting list of the term `10`.
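A minimal sketch of that idea in Python, not Lucene's actual code. The inverted index is modeled as a plain dict from term to a sorted posting list; the index contents and the `term_query` helper are made up for illustration.

```python
# Toy inverted index for the field Price: term -> sorted posting list of doc ids.
# The data and the helper name are illustrative assumptions, not Lucene APIs.
index = {
    "10": [1, 4, 7],
    "20": [2, 4],
    "35": [3],
}


def term_query(index, term):
    """Yield the doc ids from the posting list of a single term, in order."""
    # A term that is not in the index simply has an empty posting list.
    for doc_id in index.get(term, []):
        yield doc_id


matches = list(term_query(index, "10"))
print(matches)  # [1, 4, 7]
```

The cost of the query is proportional to the length of one posting list, which is why a single TermQuery is cheap.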
TermQuery source code
TermScorer source code
How are range filters implemented?
term -> document ids
421 -> [1]
423 -> [2]
445 -> [3]
446 -> [3]
448 -> [4]
521 -> [5]
522 -> [7]
632 -> [5]
633 -> [6]
634 -> [7]
641 -> [5]
642 -> [6]
644 -> [7]
q=PRICE:[423 TO 642]
q=PRICE:[423 TO 642] expands to q=PRICE:423 PRICE:445 PRICE:446 … PRICE:642
MultiTermQuery
Naive implementation
In total: 11 SHOULD clauses.
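The naive expansion can be sketched in a few lines of Python. This is an illustration using the terms from the slide, not Lucene's MultiTermQuery code; `naive_range` is a hypothetical helper name.

```python
# Terms of the PRICE field from the slide: term -> posting list of doc ids.
index = {
    421: [1], 423: [2], 445: [3], 446: [3], 448: [4],
    521: [5], 522: [7], 632: [5], 633: [6], 634: [7],
    641: [5], 642: [6], 644: [7],
}


def naive_range(index, low, high):
    """Expand [low, high] into one clause per indexed term and OR the postings."""
    clauses = [term for term in sorted(index) if low <= term <= high]
    docs = set()
    for term in clauses:
        docs.update(index[term])
    return clauses, sorted(docs)


clauses, docs = naive_range(index, 423, 642)
print(len(clauses))  # 11
print(docs)          # [2, 3, 4, 5, 6, 7]
```

The number of clauses grows with the number of distinct terms inside the range, which is the cost the following optimizations try to reduce.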
Optimizations for Range Filters
? How can we improve the naive implementation of RangeFilterQuery?
Trie
Trie*Field index time
Original values
421 -> [1]
423 -> [2]
445 -> [3]
446 -> [3]
448 -> [4]
521 -> [5]
522 -> [7]
632 -> [5]
633 -> [6]
634 -> [7]
641 -> [5]
642 -> [6]
644 -> [7]
Additional values (shift 1)
42* -> [1, 2]
44* -> [3, 4]
52* -> [5, 7]
63* -> [5, 6, 7]
64* -> [5, 6, 7]
Additional values (shift 2)
4** -> [1, 2, 3, 4]
5** -> [5, 7]
6** -> [5, 6, 7]
The original values are shift 0.
(since Lucene 2.9)
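A rough Python sketch of what happens at index time, under the slide's simplification that precision is reduced one decimal digit at a time. Lucene's legacy numeric fields actually work on bits with a configurable precision step. The `build_trie_terms` helper is hypothetical.

```python
from collections import defaultdict

# Original values from the slide: term -> posting list of doc ids.
index = {
    421: [1], 423: [2], 445: [3], 446: [3], 448: [4],
    521: [5], 522: [7], 632: [5], 633: [6], 634: [7],
    641: [5], 642: [6], 644: [7],
}


def build_trie_terms(index, width=3):
    """Return the additional prefix terms (e.g. '42*', '4**') with merged postings."""
    extra = defaultdict(set)
    for value, postings in index.items():
        digits = str(value).zfill(width)
        # shift = number of low-order digits replaced by a wildcard.
        for shift in range(1, width):
            prefix = digits[:width - shift] + "*" * shift
            extra[prefix].update(postings)
    return {term: sorted(docs) for term, docs in extra.items()}


extra = build_trie_terms(index)
print(len(extra))    # 8
print(extra["63*"])  # [5, 6, 7]
print(extra["4**"])  # [1, 2, 3, 4]
```

With these original values the sketch yields `63* -> [5, 6, 7]`, because 634 belongs to document 7.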
Trie*Field query time
Exploit the Trie*Field
In total: 6 SHOULD clauses in the end (423, 44*, 5**, 63*, 641, 642).
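The query-time side can be sketched as follows, again with decimal digits instead of Lucene's bit-based precision steps. The range is covered greedily with the largest aligned prefix blocks that fit, and only blocks that exist as indexed terms become clauses. `split_range` is a hypothetical helper, not a Lucene API.

```python
def split_range(low, high, width=3):
    """Cover [low, high] with aligned decimal prefix blocks, largest block first."""
    blocks = []
    x = low
    while x <= high:
        shift = 0
        # Grow the block while it stays aligned and inside the range.
        while (shift + 1 < width
               and x % 10 ** (shift + 1) == 0
               and x + 10 ** (shift + 1) - 1 <= high):
            shift += 1
        digits = str(x).zfill(width)
        blocks.append(digits[:width - shift] + "*" * shift)
        x += 10 ** shift
    return blocks


# Original values from the slide and every prefix term derived from them.
values = [421, 423, 445, 446, 448, 521, 522, 632, 633, 634, 641, 642, 644]
indexed = {str(v)[:3 - s] + "*" * s for v in values for s in range(3)}

clauses = [block for block in split_range(423, 642) if block in indexed]
print(clauses)  # ['423', '44*', '5**', '63*', '641', '642']
```

Blocks such as `43*` or `640` are produced by the split but match no indexed term, so they never turn into clauses.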
Isn't it enough? Distribution of terms
The trie-based approach does not take the distribution of the terms into account.
q=PRICE:[100 TO 2002222]
Original values
1 -> [1]
100 -> [2]
2000001 -> [3]
2000022 -> [3]
2000222 -> [4]
2002222 -> [5]
50000005 -> [7]
Isn't it enough? I/O efficiency
We need to store all original and additional values.
We need to read all terms of the field at search time.
Additional values
10* -> [2]
1** -> [1, 2]
200002* -> [3]
200022* -> [4]
20002** -> [4]
200**** -> [3, 4, 5]
200222* -> [5]
20022** -> [5]
2002*** -> [5]
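To get a feeling for the storage overhead, the sketch below counts the distinct prefix terms for the seven values above. It assumes values are zero-padded to a fixed width of 8 decimal digits, which is an assumption of this sketch and differs slightly from the abbreviated list on the slide. `additional_terms` is a hypothetical helper.

```python
values = [1, 100, 2000001, 2000022, 2000222, 2002222, 50000005]
WIDTH = 8  # digits needed for the largest value (assumed fixed-width padding)


def additional_terms(values, width):
    """Collect every distinct decimal prefix term for shifts 1..width-1."""
    terms = set()
    for value in values:
        digits = str(value).zfill(width)
        for shift in range(1, width):
            terms.add(digits[:width - shift] + "*" * shift)
    return terms


extra = additional_terms(values, WIDTH)
print(len(values), len(extra))  # 7 28
```

Seven original values produce 28 additional terms here, and the same precision levels are generated whether or not a query will ever need them.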
Point Fields
Point fields replace the now-deprecated numeric fields (Trie*Field) and numeric range queries: they have
better overall performance and are more general, allowing multiple dimensions (since Lucene 6.0).
● Based on the Bkd-tree ("Bkd-Tree: A Dynamic Scalable kd-Tree").
It naturally adapts to each data set's particular distribution, in contrast to legacy numeric fields,
which always index the same precision levels for every value regardless of how the points are
distributed.
● Most of the data structure resides in on-disk blocks, with a small in-heap binary tree index
structure to locate the blocks at search time.
● Supports multi-dimensional points (maps, 3D models).
Bkd-Tree
A binary space partitioning tree.
B stands for "Blocked".
Number of points per cell in this example = 2
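The partitioning can be illustrated with a small recursive sketch: split the points at the median, alternating between the x and y axes, until every cell holds at most two points. Alternating axes is the classic kd-tree choice and an assumption here; the sample points and the `build_cells` helper are made up and do not mirror Lucene's implementation.

```python
def build_cells(points, max_points=2, depth=0):
    """Recursively split 2-D points at the median until a cell holds <= max_points."""
    if len(points) <= max_points:
        return [points]
    axis = depth % 2  # alternate between x (0) and y (1)
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return (build_cells(ordered[:mid], max_points, depth + 1)
            + build_cells(ordered[mid:], max_points, depth + 1))


points = [(1, 9), (2, 3), (4, 7), (5, 1), (6, 8), (7, 2), (8, 6), (9, 4)]
cells = build_cells(points)
print(len(cells))  # 4
```

Because the splits follow the median of the data, dense regions end up with more and smaller cells than sparse ones.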
Bkd-Tree adapts to the particular distribution
Example from https://www.elastic.co/blog/lucene-points-6.0
Point Fields: index time
Disk
Heap
In Lucene, the number of points in a cell is 1024.
Point Fields: search time
Disk
Heap
q=PRICE:[100 TO 2002222]
If a block only partially overlaps the query, we have to check every value inside it.
If a block is fully contained within the query, the documents with values in that cell are
efficiently collected without having to test each point.
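The two cases can be sketched for a one-dimensional field as follows. This is a simplification: a real Bkd-tree walks a binary tree to locate blocks instead of scanning all block bounds, and the leaf size here is 2 rather than 1024 to keep the example readable. `build_blocks` and `range_query` are hypothetical helpers.

```python
def build_blocks(pairs, max_points=2):
    """Sort (value, doc_id) pairs and cut them into leaf blocks of max_points."""
    ordered = sorted(pairs)
    blocks = [ordered[i:i + max_points] for i in range(0, len(ordered), max_points)]
    # Small "in-heap" index: (min value, max value) of every "on-disk" block.
    bounds = [(block[0][0], block[-1][0]) for block in blocks]
    return blocks, bounds


def range_query(blocks, bounds, low, high):
    """Return matching doc ids and how many individual values had to be tested."""
    docs, tested = set(), 0
    for block, (lo, hi) in zip(blocks, bounds):
        if hi < low or lo > high:
            continue  # block lies completely outside the query
        if low <= lo and hi <= high:
            # Fully contained: collect all docs without testing each value.
            docs.update(doc for _, doc in block)
        else:
            # Partial overlap: every value in the block must be checked.
            for value, doc in block:
                tested += 1
                if low <= value <= high:
                    docs.add(doc)
    return sorted(docs), tested


pairs = [(1, 1), (100, 2), (2000001, 3), (2000022, 3),
         (2000222, 4), (2002222, 5), (50000005, 7)]
blocks, bounds = build_blocks(pairs)
docs, tested = range_query(blocks, bounds, 100, 2002222)
print(docs, tested)  # [2, 3, 4, 5] 2
```

Only the block holding 1 and 100 straddles the lower bound of the query, so just two values are tested individually.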
Performance testing (Lucene 6.0)
Point Fields
Links
Numeric Range Queries in Lucene/Solr
http://blog-archive.griddynamics.com/2014/10/numeric-range-queries-in-lucenesolr.html
Lucene Search Essentials: Scorers, Collectors and Custom Queries
https://www.slideshare.net/lucenerevolution/lucene-search-essentials-scorers-collectors-and-custom-queries-dublin13
Multi-dimensional points, coming in Apache Lucene 6.0
https://www.elastic.co/blog/lucene-points-6.0
Bkd-Tree: A Dynamic Scalable kd-Tree
https://users.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf
The Evolution of Lucene & Solr Numerics from Strings to Points
https://www.slideshare.net/lucidworks/the-evolution-of-lucene-solr-numerics-from-strings-to-points-presented-by-steve-rowe-lucidworks?from_action=save
30
Privileged and Confidential 31

Más contenido relacionado

La actualidad más candente

PySpark dataframe
PySpark dataframePySpark dataframe
PySpark dataframe
Jaemun Jung
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
Alexandre Rafalovitch
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
Dongmin Yu
 
[2018] 오픈스택 5년 운영의 경험
[2018] 오픈스택 5년 운영의 경험[2018] 오픈스택 5년 운영의 경험
[2018] 오픈스택 5년 운영의 경험
NHN FORWARD
 
Real-time Analytics with Presto and Apache Pinot
Real-time Analytics with Presto and Apache PinotReal-time Analytics with Presto and Apache Pinot
Real-time Analytics with Presto and Apache Pinot
Xiang Fu
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
Databricks
 
REST Enabling Your Oracle Database
REST Enabling Your Oracle DatabaseREST Enabling Your Oracle Database
REST Enabling Your Oracle Database
Jeff Smith
 
Modeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQLModeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQL
ScyllaDB
 
Presto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesPresto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation Engines
Databricks
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
Databricks
 
ELK Stack
ELK StackELK Stack
ELK Stack
Phuc Nguyen
 
GraalVm and Quarkus
GraalVm and QuarkusGraalVm and Quarkus
GraalVm and Quarkus
Sascha Rodekamp
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
Anton Kirillov
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Lucidworks
 
Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedIn
DataWorks Summit
 
Apache Hive Tutorial
Apache Hive TutorialApache Hive Tutorial
Apache Hive Tutorial
Sandeep Patil
 
Google Cloud Composer
Google Cloud ComposerGoogle Cloud Composer
Google Cloud Composer
Pierre Coste
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
Rahul Jain
 
MongoDB Aggregation
MongoDB Aggregation MongoDB Aggregation
MongoDB Aggregation
Amit Ghosh
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?
lucenerevolution
 

La actualidad más candente (20)

PySpark dataframe
PySpark dataframePySpark dataframe
PySpark dataframe
 
Solr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by CaseSolr vs. Elasticsearch - Case by Case
Solr vs. Elasticsearch - Case by Case
 
Presto anatomy
Presto anatomyPresto anatomy
Presto anatomy
 
[2018] 오픈스택 5년 운영의 경험
[2018] 오픈스택 5년 운영의 경험[2018] 오픈스택 5년 운영의 경험
[2018] 오픈스택 5년 운영의 경험
 
Real-time Analytics with Presto and Apache Pinot
Real-time Analytics with Presto and Apache PinotReal-time Analytics with Presto and Apache Pinot
Real-time Analytics with Presto and Apache Pinot
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
REST Enabling Your Oracle Database
REST Enabling Your Oracle DatabaseREST Enabling Your Oracle Database
REST Enabling Your Oracle Database
 
Modeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQLModeling Data and Queries for Wide Column NoSQL
Modeling Data and Queries for Wide Column NoSQL
 
Presto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation EnginesPresto on Apache Spark: A Tale of Two Computation Engines
Presto on Apache Spark: A Tale of Two Computation Engines
 
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ...
 
ELK Stack
ELK StackELK Stack
ELK Stack
 
GraalVm and Quarkus
GraalVm and QuarkusGraalVm and Quarkus
GraalVm and Quarkus
 
Apache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & InternalsApache Spark in Depth: Core Concepts, Architecture & Internals
Apache Spark in Depth: Core Concepts, Architecture & Internals
 
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
Learning to Rank in Solr: Presented by Michael Nilsson & Diego Ceccarelli, Bl...
 
Scaling Hadoop at LinkedIn
Scaling Hadoop at LinkedInScaling Hadoop at LinkedIn
Scaling Hadoop at LinkedIn
 
Apache Hive Tutorial
Apache Hive TutorialApache Hive Tutorial
Apache Hive Tutorial
 
Google Cloud Composer
Google Cloud ComposerGoogle Cloud Composer
Google Cloud Composer
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
MongoDB Aggregation
MongoDB Aggregation MongoDB Aggregation
MongoDB Aggregation
 
What is in a Lucene index?
What is in a Lucene index?What is in a Lucene index?
What is in a Lucene index?
 

Similar a Point field types in Solr. Evolution of the Range Queries.

Web analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comWeb analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.com
Jungsu Heo
 
Writing efficient sql
Writing efficient sqlWriting efficient sql
Writing efficient sql
j9soto
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys"
DataArt
 
Interactive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval MeetupInteractive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval Meetup
Sease
 
Oscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100xOscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100x
shradha ambekar
 
Apache Solr for begginers
Apache Solr for begginersApache Solr for begginers
Apache Solr for begginers
Alexander Tokarev
 
Druid at naver.com - part 1
Druid at naver.com - part 1Druid at naver.com - part 1
Druid at naver.com - part 1
Jungsu Heo
 
OQL querying and indexes with Apache Geode (incubating)
OQL querying and indexes with Apache Geode (incubating)OQL querying and indexes with Apache Geode (incubating)
OQL querying and indexes with Apache Geode (incubating)
Jason Huynh
 
Scaling MySQL Strategies for Developers
Scaling MySQL Strategies for DevelopersScaling MySQL Strategies for Developers
Scaling MySQL Strategies for Developers
Jonathan Levin
 
What's new in Redis v3.2
What's new in Redis v3.2What's new in Redis v3.2
What's new in Redis v3.2
Itamar Haber
 
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
thelabdude
 
Time series denver an introduction to prometheus
Time series denver   an introduction to prometheusTime series denver   an introduction to prometheus
Time series denver an introduction to prometheus
Bob Cotton
 
MongoDB Roadmap
MongoDB RoadmapMongoDB Roadmap
MongoDB Roadmap
MongoDB
 
SQL Server Deep Drive
SQL Server Deep Drive SQL Server Deep Drive
SQL Server Deep Drive
DataArt
 
Top 10 Cypher Tuning Tips & Tricks
Top 10 Cypher Tuning Tips & TricksTop 10 Cypher Tuning Tips & Tricks
Top 10 Cypher Tuning Tips & Tricks
Neo4j
 
GraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptx
GraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptxGraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptx
GraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptx
jexp
 
Benchmarking Solr Performance
Benchmarking Solr PerformanceBenchmarking Solr Performance
Benchmarking Solr Performance
Lucidworks
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j
 
Introducing Apache Carbon Data - Hadoop Native Columnar Data Format
Introducing Apache Carbon Data - Hadoop Native Columnar Data FormatIntroducing Apache Carbon Data - Hadoop Native Columnar Data Format
Introducing Apache Carbon Data - Hadoop Native Columnar Data Format
Vimal Das Kammath
 

Similar a Point field types in Solr. Evolution of the Range Queries. (20)

Web analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.comWeb analytics at scale with Druid at naver.com
Web analytics at scale with Druid at naver.com
 
Writing efficient sql
Writing efficient sqlWriting efficient sql
Writing efficient sql
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys"
 
Interactive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval MeetupInteractive Questions and Answers - London Information Retrieval Meetup
Interactive Questions and Answers - London Information Retrieval Meetup
 
Oscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100xOscon 2019 - Optimizing analytical queries on Cassandra by 100x
Oscon 2019 - Optimizing analytical queries on Cassandra by 100x
 
Apache Solr for begginers
Apache Solr for begginersApache Solr for begginers
Apache Solr for begginers
 
Druid at naver.com - part 1
Druid at naver.com - part 1Druid at naver.com - part 1
Druid at naver.com - part 1
 
OQL querying and indexes with Apache Geode (incubating)
OQL querying and indexes with Apache Geode (incubating)OQL querying and indexes with Apache Geode (incubating)
OQL querying and indexes with Apache Geode (incubating)
 
Scaling MySQL Strategies for Developers
Scaling MySQL Strategies for DevelopersScaling MySQL Strategies for Developers
Scaling MySQL Strategies for Developers
 
What's new in Redis v3.2
What's new in Redis v3.2What's new in Redis v3.2
What's new in Redis v3.2
 
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
DataStax | DSE Search 5.0 and Beyond (Nick Panahi & Ariel Weisberg) | Cassand...
 
Benchmarking Solr Performance at Scale
Benchmarking Solr Performance at ScaleBenchmarking Solr Performance at Scale
Benchmarking Solr Performance at Scale
 
Time series denver an introduction to prometheus
Time series denver   an introduction to prometheusTime series denver   an introduction to prometheus
Time series denver an introduction to prometheus
 
MongoDB Roadmap
MongoDB RoadmapMongoDB Roadmap
MongoDB Roadmap
 
SQL Server Deep Drive
SQL Server Deep Drive SQL Server Deep Drive
SQL Server Deep Drive
 
Top 10 Cypher Tuning Tips & Tricks
Top 10 Cypher Tuning Tips & TricksTop 10 Cypher Tuning Tips & Tricks
Top 10 Cypher Tuning Tips & Tricks
 
GraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptx
GraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptxGraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptx
GraphConnect 2022 - Top 10 Cypher Tuning Tips & Tricks.pptx
 
Benchmarking Solr Performance
Benchmarking Solr PerformanceBenchmarking Solr Performance
Benchmarking Solr Performance
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 
Introducing Apache Carbon Data - Hadoop Native Columnar Data Format
Introducing Apache Carbon Data - Hadoop Native Columnar Data FormatIntroducing Apache Carbon Data - Hadoop Native Columnar Data Format
Introducing Apache Carbon Data - Hadoop Native Columnar Data Format
 

Último

原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
mz5nrf0n
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
Green Software Development
 
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
Bert Jan Schrijver
 
All you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVMAll you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVM
Alina Yurenko
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
Peter Muessig
 
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
mz5nrf0n
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
rodomar2
 
Microservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we workMicroservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we work
Sven Peters
 
Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !
Marcin Chrost
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
Octavian Nadolu
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
Green Software Development
 
Project Management: The Role of Project Dashboards.pdf
Project Management: The Role of Project Dashboards.pdfProject Management: The Role of Project Dashboards.pdf
Project Management: The Role of Project Dashboards.pdf
Karya Keeper
 
Unveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdfUnveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdf
brainerhub1
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Julian Hyde
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
Grant Fritchey
 
Malibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed RoundMalibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed Round
sjcobrien
 
fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.
AnkitaPandya11
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
Remote DBA Services
 
zOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL DifferenceszOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL Differences
YousufSait3
 

Último (20)

原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
原版定制美国纽约州立大学奥尔巴尼分校毕业证学位证书原版一模一样
 
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, FactsALGIT - Assembly Line for Green IT - Numbers, Data, Facts
ALGIT - Assembly Line for Green IT - Numbers, Data, Facts
 
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
J-Spring 2024 - Going serverless with Quarkus, GraalVM native images and AWS ...
 
All you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVMAll you need to know about Spring Boot and GraalVM
All you need to know about Spring Boot and GraalVM
 
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling ExtensionsUI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
UI5con 2024 - Boost Your Development Experience with UI5 Tooling Extensions
 
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
在线购买加拿大英属哥伦比亚大学毕业证本科学位证书原版一模一样
 
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CDKuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
KuberTENes Birthday Bash Guadalajara - Introducción a Argo CD
 
Microservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we workMicroservice Teams - How the cloud changes the way we work
Microservice Teams - How the cloud changes the way we work
 
Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !Enums On Steroids - let's look at sealed classes !
Enums On Steroids - let's look at sealed classes !
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
Artificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension FunctionsArtificia Intellicence and XPath Extension Functions
Artificia Intellicence and XPath Extension Functions
 
Energy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina JonuziEnergy consumption of Database Management - Florina Jonuzi
Energy consumption of Database Management - Florina Jonuzi
 
Project Management: The Role of Project Dashboards.pdf
Project Management: The Role of Project Dashboards.pdfProject Management: The Role of Project Dashboards.pdf
Project Management: The Role of Project Dashboards.pdf
 
Unveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdfUnveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdf
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
 
Using Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query PerformanceUsing Query Store in Azure PostgreSQL to Understand Query Performance
Using Query Store in Azure PostgreSQL to Understand Query Performance
 
Malibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed RoundMalibou Pitch Deck For Its €3M Seed Round
Malibou Pitch Deck For Its €3M Seed Round
 
fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.fiscal year variant fiscal year variant.
fiscal year variant fiscal year variant.
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
 
zOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL DifferenceszOS Mainframe JES2-JES3 JCL-JECL Differences
zOS Mainframe JES2-JES3 JCL-JECL Differences
 

Point field types in Solr. Evolution of the Range Queries.

  • 1. Privileged and Confidential Point Field Types in Solr Evolution of Range Filters amikryukov@griddynamics.com
  • 2. Privileged and Confidential Agenda 1. Recap: From query parser to TopDocsCollector. 2. TermQuery search flow. 3. How Range Filters are implemented? 4. Optimizations for Range Filters. 5. Point Fields. 2
  • 3. Privileged and Confidential Recap: From query parser to TopDocsCollector ? What is query parser? 3
  • 4. Privileged and Confidential Recap: From query parser to TopDocsCollector ? What is query parser? ? What is the difference between Query and Scorer? And why you need a Collector? 4
  • 5. Privileged and Confidential Recap: Query execution flow LeafReader 5
  • 6. Privileged and Confidential Recap: From query parser to TopDocsCollector ? What is query parser? ? What is the difference between Query and Scorer? ? What is the difference between TermsEnum and PosingsEnum? 6
  • 7. Privileged and Confidential Recap: inverted index, terms, posting list 7
  • 8. Privileged and Confidential TermQuery search flow q=Price:10 8
  • 9. Privileged and Confidential TermQuery search flow q=Price:10 TermQuery( field=’Price’, val=’10’) TermQuery TermWeight TermScorer 9
  • 10. Privileged and Confidential TermQuery Idea: Iterate over posting list of the term `10` 10 q=Price:10
  • 13. Privileged and Confidential How Range filters are implemented? term -> document ids 421 -> [1] 423 -> [2] 445 -> [3] 446 -> [3] 448 -> [4] 521 -> [5] 522 -> [7] 632 -> [5] 633 -> [6] 634 -> [7] 641 -> [5] 642 -> [6] 644 -> [7] q=PRICE:[423 TO 642] 13
  • 14. Privileged and Confidential How Range filters are implemented? term -> document ids 421 -> [1] 423 -> [2] 445 -> [3] 446 -> [3] 448 -> [4] 521 -> [5] 522 -> [7] 632 -> [5] 633 -> [6] 634 -> [7] 641 -> [5] 642 -> [6] 644 -> [7] q=PRICE:[423 TO 642]q=PRICE:423 PRICE:445 PRICE:446 … PRICE:642 14
  • 15. Privileged and Confidential MultiTermQuery term -> document ids 421 -> [1] 423 -> [2] 445 -> [3] 446 -> [3] 448 -> [4] 521 -> [5] 522 -> [7] 632 -> [5] 633 -> [6] 634 -> [7] 641 -> [5] 642 -> [6] 644 -> [7] q=PRICE:[423 TO 642]q=PRICE:423 PRICE:445 PRICE:446 … PRICE:642 15
  • 16. Privileged and Confidential Naive implementation term -> document ids 421 -> [1] 423 -> [2] 445 -> [3] 446 -> [3] 448 -> [4] 521 -> [5] 522 -> [7] 632 -> [5] 633 -> [6] 634 -> [7] 641 -> [5] 642 -> [6] 644 -> [7] q=PRICE:[423 TO 642]q=PRICE:423 PRICE:445 PRICE:446 … PRICE:642 In total = 11 should clauses. 16
  • 17. Optimizations for Range Filters. How can we improve the naive implementation of RangeFilterQuery? Original values: 421 -> [1]; 423 -> [2]; 445 -> [3]; 446 -> [3]; 448 -> [4]; 521 -> [5]; 522 -> [7]; 632 -> [5]; 633 -> [6]; 634 -> [7]; 641 -> [5]; 642 -> [6]; 644 -> [7]
  • 18. Trie. Original values: 421 -> [1]; 423 -> [2]; 445 -> [3]; 446 -> [3]; 448 -> [4]; 521 -> [5]; 522 -> [7]; 632 -> [5]; 633 -> [6]; 634 -> [7]; 641 -> [5]; 642 -> [6]; 644 -> [7]
  • 19. Trie*Field: index time (since Lucene 2.9). Exploit the Trie*Field: in addition to the original values, lower-precision prefix terms are indexed. Original values (shift 0): 421 -> [1]; 423 -> [2]; 445 -> [3]; 446 -> [3]; 448 -> [4]; 521 -> [5]; 522 -> [7]; 632 -> [5]; 633 -> [6]; 634 -> [7]; 641 -> [5]; 642 -> [6]; 644 -> [7]. Additional values (shift 1): 42* -> [1, 2]; 44* -> [3, 4]; 52* -> [5, 7]; 63* -> [5, 6, 7]; 64* -> [5, 6, 7]. Additional values (shift 2): 4** -> [1, 2, 3, 4]; 5** -> [5, 7]; 6** -> [5, 6, 7]
  • 20. Trie*Field: query time. Exploit the Trie*Field: the range is covered with the coarsest prefix terms that fit inside it. Original values: 421 -> [1]; 423 -> [2]; 445 -> [3]; 446 -> [3]; 448 -> [4]; 521 -> [5]; 522 -> [7]; 632 -> [5]; 633 -> [6]; 634 -> [7]; 641 -> [5]; 642 -> [6]; 644 -> [7]. Additional values: 42* -> [1, 2]; 44* -> [3, 4]; 52* -> [5, 7]; 63* -> [5, 6, 7]; 64* -> [5, 6, 7]; 4** -> [1, 2, 3, 4]; 5** -> [5, 7]; 6** -> [5, 6, 7]. In total: 6 should clauses in the end (423, 44*, 5**, 63*, 641, 642).
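A minimal sketch of how the prefix terms could be built and how the range [423 TO 642] could be covered by them. It assumes three-digit values and decimal prefixes as drawn on the slides; the real Trie*Field works on bit shifts controlled by a precision step. The class `TrieRangeSketch` and all of its methods are hypothetical names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical decimal-prefix model of the slide; assumes three-digit values.
class TrieRangeSketch {
    static final int MAX_SHIFT = 2; // shifts 0, 1 and 2

    // Term label for a value with its `shift` lowest digits dropped, e.g. (44, 1) -> "44*".
    static String label(int prefix, int shift) {
        return prefix + "*".repeat(shift);
    }

    // Index time: each value is indexed under its full term and under every prefix term.
    static Map<String, TreeSet<Integer>> buildIndex(int[][] valueDocPairs) {
        Map<String, TreeSet<Integer>> idx = new TreeMap<>();
        for (int[] pair : valueDocPairs) {
            int prefix = pair[0];
            for (int shift = 0; shift <= MAX_SHIFT; shift++) {
                idx.computeIfAbsent(label(prefix, shift), k -> new TreeSet<>()).add(pair[1]);
                prefix /= 10;
            }
        }
        return idx;
    }

    // Query time: greedily cover [lower, upper] with the largest aligned blocks
    // and keep only the terms that actually exist in the index.
    static List<String> clauses(Map<String, TreeSet<Integer>> idx, int lower, int upper) {
        List<String> out = new ArrayList<>();
        int lo = lower;
        while (lo <= upper) {
            int shift = 0;
            int size = 1;
            while (shift < MAX_SHIFT && lo % (size * 10) == 0 && lo + size * 10 - 1 <= upper) {
                size *= 10;
                shift++;
            }
            String term = label(lo / size, shift);
            if (idx.containsKey(term)) {
                out.add(term);
            }
            lo += size;
        }
        return out;
    }

    // Union of the posting lists of the selected clauses.
    static TreeSet<Integer> matchingDocs(Map<String, TreeSet<Integer>> idx, List<String> terms) {
        TreeSet<Integer> docs = new TreeSet<>();
        for (String term : terms) {
            docs.addAll(idx.get(term));
        }
        return docs;
    }

    static int[][] slideData() {
        return new int[][] {
            {421, 1}, {423, 2}, {445, 3}, {446, 3}, {448, 4}, {521, 5}, {522, 7},
            {632, 5}, {633, 6}, {634, 7}, {641, 5}, {642, 6}, {644, 7}
        };
    }

    public static void main(String[] args) {
        Map<String, TreeSet<Integer>> idx = buildIndex(slideData());
        List<String> terms = clauses(idx, 423, 642);
        System.out.println(terms);                    // [423, 44*, 5**, 63*, 641, 642]
        System.out.println(matchingDocs(idx, terms)); // [2, 3, 4, 5, 6, 7]
    }
}
```

The six terms match the same documents as the eleven naive clauses, at the price of storing the extra prefix terms at index time.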
  • 21. Isn't that enough? Distribution of terms: the trie-based approach does not take the distribution of the terms into account. q=PRICE:[100 TO 2002222]. Original values: 1 -> [1]; 100 -> [2]; 2000001 -> [3]; 2000022 -> [3]; 2000222 -> [4]; 2002222 -> [5]; 50000005 -> [7]
  • 22. Isn't that enough? IO efficiency: all original and additional values have to be stored, and all terms of the field have to be read at search time. Original values: 1 -> [1]; 100 -> [2]; 2000001 -> [3]; 2000022 -> [3]; 2000222 -> [4]; 2002222 -> [5]; 50000005 -> [7]. Additional values: 10* -> [2]; 1** -> [1, 2]; 200002* -> [3]; 200022* -> [4]; 20002** -> [4]; 200**** -> [3, 4, 5]; 200222* -> [5]; 20022** -> [5]; 2002*** -> [5]
  • 23. Point Fields (since Lucene 6.0). This feature replaces the now-deprecated numeric fields (Trie*Field) and numeric range query: it has better overall performance and is more general, allowing multiple dimensions. ● Based on the Bkd-tree ("Bkd-Tree: A Dynamic Scalable kd-Tree"), which naturally adapts to each data set's particular distribution, in contrast to legacy numeric fields, which always index the same precision levels for every value regardless of how the points are distributed. ● Most of the data structure resides in on-disk blocks, with a small in-heap binary tree index structure to locate the blocks at search time. ● Supports multi-dimensional points (maps, 3D models).
  • 24. Bkd-Tree: a binary space partitioning tree; the "B" stands for "blocked". In the illustration, the number of points per cell is 2.
  • 25. Bkd-Tree adapts to the particular distribution. Example from https://www.elastic.co/blog/lucene-points-6.0
  • 26. Point Fields: index time [diagram: disk, heap]. In Lucene, the number of points per cell is 1024.
  • 27. Point Fields: search time [diagram: disk, heap]. q=PRICE:[100 TO 2002222]. If a block only partially overlaps the query range, every value inside it has to be checked. If a block is fully contained within the query range, the documents with values in that cell are collected efficiently, without testing each point.
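The two cases on this slide can be sketched with a flat list of leaf cells. This is a simplification under stated assumptions: a real Bkd-tree also has inner nodes and stores its blocks on disk, the cells here hold 2 points each (as in the illustration on slide 24) instead of 1024, and the class `PointRangeSketch`, its `Relation` enum, and its methods are hypothetical names rather than Lucene API. The data is the value list from slide 21.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical one-dimensional model of leaf cells; not Lucene code.
class PointRangeSketch {
    enum Relation { INSIDE, OUTSIDE, CROSSES }

    // A leaf cell: values sorted ascending, docs[i] is the document holding values[i].
    static final class Cell {
        final long[] values;
        final int[] docs;

        Cell(long[] values, int[] docs) {
            this.values = values;
            this.docs = docs;
        }

        long min() { return values[0]; }
        long max() { return values[values.length - 1]; }
    }

    // How the cell's [min, max] bounds relate to the query range [lower, upper].
    static Relation compare(Cell cell, long lower, long upper) {
        if (cell.max() < lower || cell.min() > upper) return Relation.OUTSIDE;
        if (cell.min() >= lower && cell.max() <= upper) return Relation.INSIDE;
        return Relation.CROSSES;
    }

    static TreeSet<Integer> search(List<Cell> cells, long lower, long upper) {
        TreeSet<Integer> docs = new TreeSet<>();
        for (Cell cell : cells) {
            Relation relation = compare(cell, lower, upper);
            if (relation == Relation.OUTSIDE) {
                continue; // the whole cell is skipped
            }
            for (int i = 0; i < cell.values.length; i++) {
                // INSIDE: collect without looking at the value; CROSSES: test each value.
                if (relation == Relation.INSIDE
                        || (cell.values[i] >= lower && cell.values[i] <= upper)) {
                    docs.add(cell.docs[i]);
                }
            }
        }
        return docs;
    }

    // Number of values that had to be tested one by one (those in crossing cells).
    static int valuesChecked(List<Cell> cells, long lower, long upper) {
        int checked = 0;
        for (Cell cell : cells) {
            if (compare(cell, lower, upper) == Relation.CROSSES) {
                checked += cell.values.length;
            }
        }
        return checked;
    }

    // Values from slide 21, split into cells of two points each.
    static List<Cell> slideCells() {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell(new long[] {1L, 100L}, new int[] {1, 2}));
        cells.add(new Cell(new long[] {2000001L, 2000022L}, new int[] {3, 3}));
        cells.add(new Cell(new long[] {2000222L, 2002222L}, new int[] {4, 5}));
        cells.add(new Cell(new long[] {50000005L}, new int[] {7}));
        return cells;
    }

    public static void main(String[] args) {
        List<Cell> cells = slideCells();
        System.out.println(search(cells, 100L, 2002222L));        // [2, 3, 4, 5]
        System.out.println(valuesChecked(cells, 100L, 2002222L)); // 2
    }
}
```

Only the first cell crosses the range boundary, so two of the seven values are compared individually; the two cells in the middle are collected wholesale and the last one is skipped.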
  • 28. Performance testing (Lucene 6.0)
  • 30. Links. Numeric Range Queries in Lucene/Solr: http://blog-archive.griddynamics.com/2014/10/numeric-range-queries-in-lucenesolr.html ; Lucene Search Essentials: Scorers, Collectors and Custom Queries: https://www.slideshare.net/lucenerevolution/lucene-search-essentials-scorers-collectors-and-custom-queries-dublin13 ; Multi-dimensional points, coming in Apache Lucene 6.0: https://www.elastic.co/blog/lucene-points-6.0 ; Bkd-Tree: A Dynamic Scalable kd-Tree: https://users.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf ; The Evolution of Lucene & Solr Numerics from Strings to Points: https://www.slideshare.net/lucidworks/the-evolution-of-lucene-solr-numerics-from-strings-to-points-presented-by-steve-rowe-lucidworks?from_action=save