Query Parsing
    Tips & Tricks
Presented by Erik Hatcher of LucidWorks




                                          © Copyright 2012
Description

    Interpreting what the user meant and what they ideally
    would like to find is tricky business. This talk will cover
    useful tips and tricks to better leverage and extend
    Solr's analysis and query parsing capabilities to more
    richly parse and interpret user queries.




Abstract

    In this talk, Solr's built-in query parsers will be detailed,
    including when and how to use them. Solr has nested
    query parsing capability, allowing for multiple query
    parsers to be used to generate a single query. The
    nested query parsing feature will be described and
    demonstrated. In many domains, e-commerce in
    particular, parsing queries often means interpreting
    which entities (e.g. products, categories, vehicles) the
    user likely means; this talk will conclude with
    techniques to achieve richer query interpretation.




Query Parsers in Solr




lucene Query Parser, Solr style

    •FieldType awareness
     - range queries, numerics
     - allows date math
     - reverses wildcard terms, if indexing used ReversedWildcardFilterFactory
    •Magic fields
     - _val_: function query injection
     - _query_: nested query, to use a different query parser
    •Multi-term analysis (type="multiterm")
     - Analyzes prefix, wildcard, regex expressions
      »to normalize diacritics, lowercase, etc
     - If not explicitly defined, all MultiTermAwareComponents from the query
       analyzer are used, or KeywordTokenizer for effectively no analysis
    •http://wiki.apache.org/solr/SolrQuerySyntax#lucene


dismax

    • Simple constrained syntax
     - "supports phrases" +requiredTerms -prohibitedTerms loose terms
    • Spreads terms across specified query fields (qf) and entire query
      string across phrase fields (pf)
     - with field-specific boosting
     - and explicit and implicit phrase slop
     - scores each document with the maximum score produced by any field
       subquery, so the best-matching (typically highest-boosted) field dominates,
       rather than the sum of the field scores (as BooleanQuery would give)
    • Minimum match (mm) allows a gradient between AND and OR
     - some number of terms must match, but not all necessarily, and can vary
       depending on number of actual query terms
    • Additive boost queries (bq) and boost functions (bf)
    • Debug output includes parsed boost and function queries
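
The "maximum, not the sum" scoring above can be illustrated with a small Python sketch. This is a toy model, not Solr code: the field scores are made up, and the `tie` argument stands in for dismax's tie-breaker parameter, which the slide does not mention.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores the way a disjunction-max query does.

    The best-scoring field dominates; the other fields contribute only
    through the tie-breaker (tie=0.0 means pure maximum).
    """
    if not field_scores:
        return 0.0
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie * others


def boolean_sum_score(field_scores):
    """Plain sum of the field scores, roughly what a BooleanQuery would give."""
    return sum(field_scores)


# Hypothetical scores for one term matched in three fields.
scores = [2.0, 0.5, 0.25]
pure_max = dismax_score(scores)
with_tie = dismax_score(scores, tie=0.1)
summed = boolean_sum_score(scores)
```

With `tie=0.0` a document matching weakly in many fields cannot outrank one that matches strongly in a single field; raising `tie` moves the result toward the sum.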


Specifying the Query Parser

    •defType=parser_name
     - defines main query parser
    •{!parser_name local=param...}expression
     - Can specify parser per query expression
    •These are equivalent:
     - q=FC Schalke 04&defType=dismax&mm=2&qf=name
     - q={!dismax qf=name mm=2}FC Schalke 04
     - q={!dismax qf=name mm=2 v='FC Schalke 04'}
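
The three equivalent forms above still need URL encoding before they go on the wire. A minimal Python sketch of building each one follows; the helper names are hypothetical, and backslash-escaping of quotes inside `v='...'` is included on the assumption that the value may contain a quote.

```python
from urllib.parse import urlencode, parse_qs


def with_deftype(user_query):
    """Select dismax for the main query via defType and top-level params."""
    return urlencode({"q": user_query, "defType": "dismax", "mm": "2", "qf": "name"})


def with_local_params(user_query):
    """Select dismax via a {!...} local-params prefix on the expression."""
    return urlencode({"q": "{!dismax qf=name mm=2}" + user_query})


def with_v_param(user_query):
    """Pass the expression itself through the v local parameter."""
    escaped = user_query.replace("\\", "\\\\").replace("'", "\\'")
    return urlencode({"q": "{!dismax qf=name mm=2 v='" + escaped + "'}"})


query_strings = [f("FC Schalke 04") for f in (with_deftype, with_local_params, with_v_param)]
```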




Local Parameter Substitution

    •/document?id=13
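
A request like the one above only works if the handler's own q refers to the `id` request parameter. The Python sketch below is a toy model of that dereferencing, assuming a hypothetical handler default of `{!term f=id v=$id}`. Solr looks the value up at parse time rather than rewriting text, so treat this purely as an illustration of which value ends up where.

```python
import re


def dereference(expression, request_params):
    """Replace $name references in a local-params expression with request values.

    A toy model of parameter dereferencing, not Solr's implementation.
    Names missing from the request are left untouched.
    """
    def lookup(match):
        name = match.group(1)
        return request_params.get(name, match.group(0))

    return re.sub(r"\$(\w+)", lookup, expression)


# Hypothetical default q configured on a /document request handler.
handler_default_q = "{!term f=id v=$id}"
resolved = dereference(handler_default_q, {"id": "13"})
unresolved = dereference(handler_default_q, {})
```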




Nested Query Parsing

     •Leverages the "lucene" query parser's _query_ trick
     •Example:
      - q=_query_:"{!dismax qf='title^2 body' v=$user_query}" AND
          _query_:"{!dismax qf='keywords^5 description^2' v=$topic}"
      - &user_query=hoffenheim schalke
      - &topic=news
     •Setting the complex nested q parameter in a request
      handler can make the client request lean and clean
      - And even qf and other parameters can be substituted:
       »{!dismax qf=$title_qf pf=$title_pf v=$title_query}
       »&title_qf=title^5 subtitle^2...
     •Real world example, Stanford University Libraries:
      - http://searchworks.stanford.edu/advanced
      - Insanely complex sets of nested dismax queries and qf/pf settings
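
As a rough sketch of the nested example above, the Python below keeps the complex q in one constant (where a request handler default would hold it) and lets the client supply only `user_query` and `topic`. The function name is hypothetical.

```python
from urllib.parse import urlencode, parse_qs

# The complex part; this could live in a request handler's defaults instead.
NESTED_Q = (
    "_query_:\"{!dismax qf='title^2 body' v=$user_query}\" AND "
    "_query_:\"{!dismax qf='keywords^5 description^2' v=$topic}\""
)


def nested_request(user_query, topic):
    """Build the URL-encoded query string for the nested dismax example."""
    return urlencode({"q": NESTED_Q, "user_query": user_query, "topic": topic})


query_string = nested_request("hoffenheim schalke", "news")
```

If NESTED_Q is configured server-side, the client request shrinks to the two short parameters.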

edismax: Extended Dismax Query Parser

     •"An advanced multi-field query parser based on the dismax
      parser"
      - Handles "lucene" syntax as well as dismax features
     •Fields available to user may be limited (uf)
      - including negations and dynamic fields, e.g. uf=* -cost -timestamp
     •Shingles query into 2 and 3 term phrases
      - Improves quality of results when query contains terms across multiple fields
      - pf2/pf3 and ps2/ps3
      - removes stop words from shingled phrase queries
     •multiplicative "boost" functions
     •Additional features
      - Query composed entirely of "stopwords" optionally allowed
         »if indexed, but query analyzer is set to remove them
      - Allow "lowercaseOperators" by default; or/OR, and/AND
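
To make the pf2/pf3 shingling concrete, here is a small Python sketch of which 2-term and 3-term phrases a query yields. It only illustrates the idea: stop-word removal and per-field analysis are left out, and the sample query is made up.

```python
def shingles(terms, size):
    """Return every run of `size` adjacent terms as a phrase string."""
    return [" ".join(terms[i:i + size]) for i in range(len(terms) - size + 1)]


terms = "fc schalke 04 news".split()
pairs = shingles(terms, 2)      # candidates for pf2 phrase matching
triples = shingles(terms, 3)    # candidates for pf3 phrase matching
```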


term Query Parser

     •FieldType aware, no analysis
      - converts to internal representation automatically
     •"raw" query parser is similar
      - though raw parser is not field type aware; no internal representation
        conversion
     •Best practice for filtering on single facet value
      - fq={!term f=facet_field}crazy:value :)
       »no query string escaping needed; but of course still need URL encoding
        when appropriate
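
A short Python sketch of the facet-filter practice above: the raw facet value is appended untouched after the `{!term}` prefix, and only URL encoding is applied. The helper name is hypothetical.

```python
from urllib.parse import urlencode, parse_qs


def term_filter(field, value):
    """Build an fq matching one exact term; no query-syntax escaping of the value."""
    return "{!term f=" + field + "}" + value


raw_value = "crazy:value :)"
query_string = urlencode({"q": "*:*", "fq": term_filter("facet_field", raw_value)})
```

Decoding `query_string` on the server side gives back the filter with the colon, space, and parenthesis intact.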




prefix Query Parser

     •No field type awareness
     •{!prefix f=field_name}prefixValue
      - Similar to Lucene query parser field_name:prefixValue*
      - Solr's "lucene" query parser has multiterm analysis capability, but
        the prefix query parser does not analyze




boost Query Parser

     •Multiplicative to wrapped query score
      - Internally used by edismax "boost"
     •{!boost b=recip(ms(NOW,mydatefield),3.16e-11,1,1)}foo
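
The boost function above is easier to read with numbers plugged in. Solr's `recip(x,m,a,b)` computes `a / (m*x + b)`, and 3.16e-11 is roughly one over the number of milliseconds in a year, so the multiplier is about 1.0 for a brand-new document and about 0.5 for a one-year-old one. A Python sketch of that arithmetic, with document ages chosen only for illustration:

```python
MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000  # 3.1536e10 milliseconds


def recip(x, m, a, b):
    """Same formula as Solr's recip function: a / (m * x + b)."""
    return a / (m * x + b)


def recency_boost(age_ms):
    """Multiplier from recip(ms(NOW,mydatefield),3.16e-11,1,1) for a given age."""
    return recip(age_ms, 3.16e-11, 1, 1)


boost_new = recency_boost(0)
boost_one_year = recency_boost(MS_PER_YEAR)
boost_ten_years = recency_boost(10 * MS_PER_YEAR)
```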




field Query Parser

     •Same as handling of field:"Some Text" clause by Solr's
      "lucene" query parser
     •FieldType aware
      - TermQuery generated, unless field type has special handling
     •TextField
      - PhraseQuery: if multiple tokens in different positions
      - MultiPhraseQuery: if multiple tokens share some positions
      - BooleanQuery: if multiple terms all in same position
      - TermQuery: if only a single token
     •Other types that handle field queries specially:
      - currency, spatial types (point, latlon, etc)
      - {!field f=location}49.25,8.883333
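
The TextField rules above amount to a decision on token positions after analysis. The Python sketch below restates them for a list of (term, position) pairs; it is a simplification for illustration, not the actual Solr or Lucene code path.

```python
def query_type_for(tokens):
    """Pick the query type for analyzed tokens given as (term, position) pairs."""
    if not tokens:
        return None                  # analysis removed everything
    if len(tokens) == 1:
        return "TermQuery"
    positions = [position for _, position in tokens]
    distinct = set(positions)
    if len(distinct) == 1:
        return "BooleanQuery"        # every term sits at the same position
    if len(distinct) < len(positions):
        return "MultiPhraseQuery"    # some, but not all, terms share a position
    return "PhraseQuery"             # one term per position


single = query_type_for([("schalke", 0)])
phrase = query_type_for([("fc", 0), ("schalke", 1)])
same_position = query_type_for([("cat", 0), ("feline", 0)])
mixed = query_type_for([("fc", 0), ("schalke", 1), ("s04", 1)])
```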



surround Query Parser

     •Creates Lucene SpanQuery instances for fine-grained proximity
      matching, including use of wildcards
     •Uses infix and prefix notation
      - infix: AND/OR/NOT/nW/nN/()
      - prefix: AND/OR/nW/nN
      - Supports Lucene query parser basics
        »field:value, boost^5, wild?c*rd, prefix*
      - Proximity operators:
        »W: ordered
        »N: unordered
     •No analysis of clauses
      - requires user or search client to lowercase, normalize, etc
     •Example:
      - q={!surround}hoffenheim 4w schalke
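
Because the surround parser does no analysis, the client has to normalize terms itself. A minimal Python sketch, assuming the index analyzer only lowercases (real setups may also need diacritic folding or stemming to match the indexed terms); the helper name is hypothetical.

```python
def surround_proximity(left, right, distance, operator="w"):
    """Build a surround proximity clause, lowercasing terms on the client side."""
    if operator not in ("w", "n"):
        raise ValueError("operator must be 'w' or 'n'")
    return f"{left.lower()} {distance}{operator} {right.lower()}"


q = "{!surround}" + surround_proximity("Hoffenheim", "Schalke", 4)
```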


join Query Parser

     •Pseudo-join
      - Field values from inner result set used to map to another field to select final
        result set
      - No information from inner result set carries to final result set, such as scores
        or field values (it's not SQL!)
     •Can join from another local Solr core
      - Allows for different types of entities to be indexed in separate indexes
        altogether, modeled into clean schemas
      - Separate cores can scale independently, especially with commit and
        warming issues
     •Syntax:
      - {!join from=... to=... [fromIndex=core_name]}query
     •For more information:
      - Yonik's Lucene Revolution 2011 presentation: http://vimeo.com/25015101
      - http://wiki.apache.org/solr/Join
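
The pseudo-join semantics can be modeled in a few lines of Python. The data and field names (`team_id`, `about_team`) are hypothetical; the sketch corresponds loosely to something like `{!join from=team_id to=about_team fromIndex=teams}name:schalke`.

```python
def pseudo_join(inner_docs, outer_docs, from_field, to_field):
    """Select outer docs whose to_field value appears in from_field of the inner results."""
    join_values = {doc[from_field] for doc in inner_docs if from_field in doc}
    return [doc for doc in outer_docs if doc.get(to_field) in join_values]


# Hypothetical data: teams in one core, news articles in another.
teams = [
    {"team_id": "s04", "name": "FC Schalke 04", "score": 3.2},
    {"team_id": "tsg", "name": "TSG 1899 Hoffenheim", "score": 1.1},
]
articles = [
    {"id": "a1", "about_team": "s04"},
    {"id": "a2", "about_team": "bvb"},
    {"id": "a3", "about_team": "tsg"},
]

inner = [team for team in teams if "Schalke" in team["name"]]  # stands in for the inner query
joined = pseudo_join(inner, articles, "team_id", "about_team")
```

Only the set of join values crosses over; the inner documents' names and scores never reach the final result set.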


spatial Query Parsers

     •Operates on geohash, latlon, and point types
     •geofilt
      - Exact distance filtering
      - fq={!geofilt sfield=location pt=10.312,-20.556 d=3.5}
     •bbox
      - Bounding-box filtering; alternatively use a range query:
        »fq=location:[45,-94 TO 46,-93]
     •Can use in conjunction with geodist() function
      - Sorting:
        »sort=geodist() asc
      - Returning distance:
        »fl=_dist_:geodist()
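
Filtering, sorting, and returning the distance can share one set of spatial parameters. The Python sketch below passes `sfield`, `pt`, and `d` as top-level request parameters so that both `{!geofilt}` and a bare `geodist()` can read them. The function name and the match-all `q` are assumptions for illustration.

```python
from urllib.parse import urlencode, parse_qs


def nearby_request(field, lat, lon, distance):
    """Filter by distance, sort by distance, and return the distance per document."""
    return urlencode({
        "q": "*:*",
        "fq": "{!geofilt}",
        "sfield": field,           # read by geofilt and by geodist()
        "pt": f"{lat},{lon}",
        "d": str(distance),
        "sort": "geodist() asc",
        "fl": "_dist_:geodist()",
    })


query_string = nearby_request("location", 10.312, -20.556, 3.5)
```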




frange Query Parser: function range

     •Match a field term range, textual or numeric
     •Example:
      - fq={!frange l=0 u=2.2}sum(user_ranking,editor_ranking)
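
What the frange example selects can be shown with a toy filter in Python. The documents are made up, and inclusive bounds are assumed here.

```python
def frange(docs, function, lower=None, upper=None):
    """Keep docs whose function value lies in [lower, upper]; None means unbounded."""
    kept = []
    for doc in docs:
        value = function(doc)
        if lower is not None and value < lower:
            continue
        if upper is not None and value > upper:
            continue
        kept.append(doc)
    return kept


# Hypothetical documents carrying the two ranking fields from the example.
docs = [
    {"id": 1, "user_ranking": 0.5, "editor_ranking": 1.0},
    {"id": 2, "user_ranking": 2.0, "editor_ranking": 1.5},
    {"id": 3, "user_ranking": 1.0, "editor_ranking": 1.0},
]
matches = frange(docs, lambda d: d["user_ranking"] + d["editor_ranking"], lower=0, upper=2.2)
matching_ids = [d["id"] for d in matches]
```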




PostFilter

     •Queries implementing the PostFilter interface are consulted after
      the query and all other filters have narrowed the documents under
      consideration
     •Queries supporting PostFilter
      - frange, geofilt, bbox
     •Enabled by setting cache=false and cost >= 100
      - Example:
       »fq={!frange l=5 cache=false cost=200}div(log(popularity),sqrt(geodist()))
     •More info:
      - Advanced filter caching
       »http://searchhub.org/2012/02/10/advanced-filter-caching-in-solr/
      - Custom security filtering
       »http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/
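
The benefit of post filtering is that an expensive check runs only on documents that survived everything else. The Python sketch below is a toy model of that ordering with made-up filters; Solr's actual PostFilter mechanism works through collectors, which this does not attempt to reproduce.

```python
def run_filters(doc_ids, cheap_filters, post_filters):
    """Apply cheap filters first, then post filters in cost order on the survivors only."""
    survivors = [d for d in doc_ids if all(f(d) for f in cheap_filters)]
    for _cost, post_filter in sorted(post_filters, key=lambda pair: pair[0]):
        survivors = [d for d in survivors if post_filter(d)]
    return survivors


calls = {"expensive": 0}


def expensive_check(doc_id):
    """Stand-in for a costly per-document function such as the frange example."""
    calls["expensive"] += 1
    return doc_id % 3 == 0


doc_ids = range(1000)
cheap = [lambda d: d < 30]                       # e.g. an ordinary cached filter query
result = run_filters(doc_ids, cheap, [(200, expensive_check)])
```

Out of 1000 documents, the expensive function is evaluated only for the 30 that passed the cheap filter.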



Phonetic, Stem, and Synonym Matching

     •Users tend to expect loose matching
      - but with "more exact" matches ranked higher
     •Various mechanisms for loosening matching:
      - Phonetic sounds-like: cat/kat, similar/similer
      - Stemming: search/searches/searched/searching
      - Synonyms: cat/feline, dog/canine
     •Distinguish ranking between exact and looser matching:
      - copyField original to a new (unstored, yet indexed) field with desired
        looser matching analysis
      - query across original field and looser field, with higher boosting for
        original field
       »/select?q=Monchengladbach&defType=dismax&qf=name^5 name_phonetic




Suggesting Things, Not Strings

     •Model It As You Need It
      - Leverage Lucene's Document/Field/Query/score & sort & highlight
     •Example 1: Selling automobile parts
      - Exact year/make/model is needed to pick the right parts
      - Suggest a vehicle as user types
       »from the main parts index: tricky, requires lots of special fields and analysis
        tricks and even then you're suggesting fields from "parts"
       »Another (better?) approach: model vehicles as a separate core, "search"
        when suggesting, return documents, not field terms
         ▪ maybe even separate core for makes and models
     •Example 2: Bundesliga Teams
      - /select?q=fr*&wt=csv&fl=name
       »Eintracht Frankfurt
       »Sport-Club Freiburg
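
The "things, not strings" idea in Example 2 is that a prefix search returns whole documents (team names) rather than matching terms. A toy Python version follows, assuming simple lowercasing and splitting on non-alphanumeric characters; the last two team entries are added only as non-matching sample data.

```python
import re

TEAMS = [
    "Eintracht Frankfurt",
    "Sport-Club Freiburg",
    "FC Schalke 04",
    "TSG 1899 Hoffenheim",
]


def suggest(prefix, names):
    """Return whole names in which any word starts with the typed prefix."""
    prefix = prefix.lower()
    hits = []
    for name in names:
        words = re.split(r"[^0-9a-z]+", name.lower())
        if any(word.startswith(prefix) for word in words if word):
            hits.append(name)
    return hits


suggestions = suggest("fr", TEAMS)
```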



Development and Troubleshooting Tools

     •Analysis
      - /analysis/field
        »?analysis.fieldname=name
        »&analysis.fieldvalue=FC ApacheCon 2012
        »&q=apachecon
        »&analysis.showmatch=true
      - Also /analysis/document
      - admin UI analysis tool
     •Query Parsing
      - &debug=query
     •Relevancy
      - &debug=results
        »shows scoring explanations



Future of Solr Query Parsing

     •XML Query Parser
      - Will allow a rich query "tree"
      - Parameters will fill in variables in a server-side query tree definition, or can
        provide full query tree
      - Useful for "advanced", multi-valued query input
      - https://issues.apache.org/jira/browse/SOLR-839
     •PayloadTermQuery
      - Solr supports indexing payload data on terms using
        DelimitedPayloadTokenFilter, but currently no support for querying with
        payloads
      - Requires custom Similarity implementation to provide score factor for
        payload data
      - https://issues.apache.org/jira/browse/SOLR-1485
     •(ToParent|ToChild)BlockJoinQuery
      - https://issues.apache.org/jira/browse/SOLR-3076


Additional Information

     •Mark Miller on Query Parsers
      - http://searchhub.org/dev/2009/02/22/exploring-query-parsers/
     •LucidWorks
      - http://www.lucidworks.com
     •SearchHub
      - http://searchhub.org
      - Search Lucene/Solr (and more) e-mail lists, JIRA issues, wiki
        pages, etc




25
                                                                        © Copyright 2012
Query Parsing
    Tips & Tricks
Presented by Erik Hatcher of LucidWorks




                                          © Copyright 2012

More Related Content

What's hot

Scylla Summit 2022: Making Schema Changes Safe with Raft
Scylla Summit 2022: Making Schema Changes Safe with RaftScylla Summit 2022: Making Schema Changes Safe with Raft
Scylla Summit 2022: Making Schema Changes Safe with RaftScyllaDB
 
C# 式木 (Expression Tree) ~ LINQをより深く理解するために ~
C# 式木 (Expression Tree) ~ LINQをより深く理解するために ~C# 式木 (Expression Tree) ~ LINQをより深く理解するために ~
C# 式木 (Expression Tree) ~ LINQをより深く理解するために ~Fujio Kojima
 
Alfresco Content Modelling and Policy Behaviours
Alfresco Content Modelling and Policy BehavioursAlfresco Content Modelling and Policy Behaviours
Alfresco Content Modelling and Policy BehavioursJ V
 
Samba4を「ふつうに」使おう!
Samba4を「ふつうに」使おう!Samba4を「ふつうに」使おう!
Samba4を「ふつうに」使おう!基信 高橋
 
Capturing NIC and Kernel TX and RX Timestamps for Packets in Go
Capturing NIC and Kernel TX and RX Timestamps for Packets in GoCapturing NIC and Kernel TX and RX Timestamps for Packets in Go
Capturing NIC and Kernel TX and RX Timestamps for Packets in GoScyllaDB
 
関数型プログラミングのデザインパターンひとめぐり
関数型プログラミングのデザインパターンひとめぐり関数型プログラミングのデザインパターンひとめぐり
関数型プログラミングのデザインパターンひとめぐりKazuyuki TAKASE
 
JenkinsとSeleniumの活用事例
JenkinsとSeleniumの活用事例JenkinsとSeleniumの活用事例
JenkinsとSeleniumの活用事例Takeshi Kondo
 
Backend.AI: 오픈소스 머신러닝 인프라 프레임워크
Backend.AI: 오픈소스 머신러닝 인프라 프레임워크Backend.AI: 오픈소스 머신러닝 인프라 프레임워크
Backend.AI: 오픈소스 머신러닝 인프라 프레임워크Jeongkyu Shin
 
Data modeling for Elasticsearch
Data modeling for ElasticsearchData modeling for Elasticsearch
Data modeling for ElasticsearchFlorian Hopf
 
9/14にリリースされたばかりの新LTS版Java 17、ここ3年間のJavaの変化を知ろう!(Open Source Conference 2021 O...
9/14にリリースされたばかりの新LTS版Java 17、ここ3年間のJavaの変化を知ろう!(Open Source Conference 2021 O...9/14にリリースされたばかりの新LTS版Java 17、ここ3年間のJavaの変化を知ろう!(Open Source Conference 2021 O...
9/14にリリースされたばかりの新LTS版Java 17、ここ3年間のJavaの変化を知ろう!(Open Source Conference 2021 O...NTT DATA Technology & Innovation
 
Content Security Policy - Lessons learned at Yahoo
Content Security Policy - Lessons learned at YahooContent Security Policy - Lessons learned at Yahoo
Content Security Policy - Lessons learned at YahooBinu Ramakrishnan
 
Making Structured Streaming Ready for Production
Making Structured Streaming Ready for ProductionMaking Structured Streaming Ready for Production
Making Structured Streaming Ready for ProductionDatabricks
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016Brendan Gregg
 
Managing enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystemManaging enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystemDataWorks Summit
 
Zero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with NettyZero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with NettyDaniel Bimschas
 
Zookeeper 활용 nifi clustering
Zookeeper 활용 nifi clusteringZookeeper 활용 nifi clustering
Zookeeper 활용 nifi clusteringNoahKIM36
 
Akkaで分散システム入門
Akkaで分散システム入門Akkaで分散システム入門
Akkaで分散システム入門Shingo Omura
 

What's hot (20)

Scylla Summit 2022: Making Schema Changes Safe with Raft
Scylla Summit 2022: Making Schema Changes Safe with RaftScylla Summit 2022: Making Schema Changes Safe with Raft
Scylla Summit 2022: Making Schema Changes Safe with Raft
 
C# 式木 (Expression Tree) ~ LINQをより深く理解するために ~
C# 式木 (Expression Tree) ~ LINQをより深く理解するために ~C# 式木 (Expression Tree) ~ LINQをより深く理解するために ~
C# 式木 (Expression Tree) ~ LINQをより深く理解するために ~
 
HDFS Erasure Coding in Action
HDFS Erasure Coding in Action HDFS Erasure Coding in Action
HDFS Erasure Coding in Action
 
Alfresco Content Modelling and Policy Behaviours
Alfresco Content Modelling and Policy BehavioursAlfresco Content Modelling and Policy Behaviours
Alfresco Content Modelling and Policy Behaviours
 
Samba4を「ふつうに」使おう!
Samba4を「ふつうに」使おう!Samba4を「ふつうに」使おう!
Samba4を「ふつうに」使おう!
 
Introduction to Perf
Introduction to PerfIntroduction to Perf
Introduction to Perf
 
Capturing NIC and Kernel TX and RX Timestamps for Packets in Go
Capturing NIC and Kernel TX and RX Timestamps for Packets in GoCapturing NIC and Kernel TX and RX Timestamps for Packets in Go
Capturing NIC and Kernel TX and RX Timestamps for Packets in Go
 
関数型プログラミングのデザインパターンひとめぐり
関数型プログラミングのデザインパターンひとめぐり関数型プログラミングのデザインパターンひとめぐり
関数型プログラミングのデザインパターンひとめぐり
 
JenkinsとSeleniumの活用事例
JenkinsとSeleniumの活用事例JenkinsとSeleniumの活用事例
JenkinsとSeleniumの活用事例
 
Backend.AI: 오픈소스 머신러닝 인프라 프레임워크
Backend.AI: 오픈소스 머신러닝 인프라 프레임워크Backend.AI: 오픈소스 머신러닝 인프라 프레임워크
Backend.AI: 오픈소스 머신러닝 인프라 프레임워크
 
Logstash
LogstashLogstash
Logstash
 
Data modeling for Elasticsearch
Data modeling for ElasticsearchData modeling for Elasticsearch
Data modeling for Elasticsearch
 
9/14にリリースされたばかりの新LTS版Java 17、ここ3年間のJavaの変化を知ろう!(Open Source Conference 2021 O...
9/14にリリースされたばかりの新LTS版Java 17、ここ3年間のJavaの変化を知ろう!(Open Source Conference 2021 O...9/14にリリースされたばかりの新LTS版Java 17、ここ3年間のJavaの変化を知ろう!(Open Source Conference 2021 O...
9/14にリリースされたばかりの新LTS版Java 17、ここ3年間のJavaの変化を知ろう!(Open Source Conference 2021 O...
 
Content Security Policy - Lessons learned at Yahoo
Content Security Policy - Lessons learned at YahooContent Security Policy - Lessons learned at Yahoo
Content Security Policy - Lessons learned at Yahoo
 
Making Structured Streaming Ready for Production
Making Structured Streaming Ready for ProductionMaking Structured Streaming Ready for Production
Making Structured Streaming Ready for Production
 
Linux Systems Performance 2016
Linux Systems Performance 2016Linux Systems Performance 2016
Linux Systems Performance 2016
 
Managing enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystemManaging enterprise users in Hadoop ecosystem
Managing enterprise users in Hadoop ecosystem
 
Zero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with NettyZero-Copy Event-Driven Servers with Netty
Zero-Copy Event-Driven Servers with Netty
 
Zookeeper 활용 nifi clustering
Zookeeper 활용 nifi clusteringZookeeper 활용 nifi clustering
Zookeeper 활용 nifi clustering
 
Akkaで分散システム入門
Akkaで分散システム入門Akkaで分散システム入門
Akkaで分散システム入門
 

Viewers also liked

Advanced query parsing techniques
Advanced query parsing techniquesAdvanced query parsing techniques
Advanced query parsing techniqueslucenerevolution
 
Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query ParsingErik Hatcher
 
Numeric Range Queries in Lucene and Solr
Numeric Range Queries in Lucene and SolrNumeric Range Queries in Lucene and Solr
Numeric Range Queries in Lucene and SolrVadim Kirilchuk
 
Simple fuzzy name matching in solr
Simple fuzzy name matching in solrSimple fuzzy name matching in solr
Simple fuzzy name matching in solrDavid Murgatroyd
 
Grouping and Joining in Lucene/Solr
Grouping and Joining in Lucene/SolrGrouping and Joining in Lucene/Solr
Grouping and Joining in Lucene/Solrlucenerevolution
 
Understanding and visualizing solr explain information - Rafal Kuc
Understanding and visualizing solr explain information - Rafal KucUnderstanding and visualizing solr explain information - Rafal Kuc
Understanding and visualizing solr explain information - Rafal Kuclucenerevolution
 
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...lucenerevolution
 
Boosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User PreferencesBoosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User PreferencesLucidworks (Archived)
 

Viewers also liked (8)

Advanced query parsing techniques
Advanced query parsing techniquesAdvanced query parsing techniques
Advanced query parsing techniques
 
Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query Parsing
 
Numeric Range Queries in Lucene and Solr
Numeric Range Queries in Lucene and SolrNumeric Range Queries in Lucene and Solr
Numeric Range Queries in Lucene and Solr
 
Simple fuzzy name matching in solr
Simple fuzzy name matching in solrSimple fuzzy name matching in solr
Simple fuzzy name matching in solr
 
Grouping and Joining in Lucene/Solr
Grouping and Joining in Lucene/SolrGrouping and Joining in Lucene/Solr
Grouping and Joining in Lucene/Solr
 
Understanding and visualizing solr explain information - Rafal Kuc
Understanding and visualizing solr explain information - Rafal KucUnderstanding and visualizing solr explain information - Rafal Kuc
Understanding and visualizing solr explain information - Rafal Kuc
 
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
Boosting Documents in Solr by Recency, Popularity and Personal Preferences - ...
 
Boosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User PreferencesBoosting Documents in Solr by Recency, Popularity, and User Preferences
Boosting Documents in Solr by Recency, Popularity, and User Preferences
 

Similar to Query Parsing - Tips and Tricks

Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash courseTommaso Teofili
 
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)Kai Chan
 
What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0Erik Hatcher
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" DataArt
 
From Lucene to Solr 4 Trunk
From Lucene to Solr 4 TrunkFrom Lucene to Solr 4 Trunk
From Lucene to Solr 4 Trunktdthomassld
 
Apache Solr - Enterprise search platform
Apache Solr - Enterprise search platformApache Solr - Enterprise search platform
Apache Solr - Enterprise search platformTommaso Teofili
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Finite State Queries In Lucene
Finite State Queries In LuceneFinite State Queries In Lucene
Finite State Queries In Luceneotisg
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than EverApache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than EverLucidworks (Archived)
 
Scaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch ClustersScaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch ClustersSematext Group, Inc.
 
Improved Developer Productivity In JDK8
Improved Developer Productivity In JDK8Improved Developer Productivity In JDK8
Improved Developer Productivity In JDK8Simon Ritter
 
Find it, possibly also near you!
Find it, possibly also near you!Find it, possibly also near you!
Find it, possibly also near you!Paul Borgermans
 
Oslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alphaOslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alphaCominvent AS
 
Decoupled Libraries for PHP
Decoupled Libraries for PHPDecoupled Libraries for PHP
Decoupled Libraries for PHPPaul Jones
 

Similar to Query Parsing - Tips and Tricks (20)

Solr5
Solr5Solr5
Solr5
 
Apache Solr crash course
Apache Solr crash courseApache Solr crash course
Apache Solr crash course
 
Apache Solr for begginers
Apache Solr for begginersApache Solr for begginers
Apache Solr for begginers
 
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
Search Engine Building with Lucene and Solr (So Code Camp San Diego 2014)
 
What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0What's New in Solr 3.x / 4.0
What's New in Solr 3.x / 4.0
 
IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys" IT talk SPb "Full text search for lazy guys"
IT talk SPb "Full text search for lazy guys"
 
Apache solr
Apache solrApache solr
Apache solr
 
From Lucene to Solr 4 Trunk
From Lucene to Solr 4 TrunkFrom Lucene to Solr 4 Trunk
From Lucene to Solr 4 Trunk
 
Apache Solr - Enterprise search platform
Apache Solr - Enterprise search platformApache Solr - Enterprise search platform
Apache Solr - Enterprise search platform
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to Solr
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Finite State Queries In Lucene
Finite State Queries In LuceneFinite State Queries In Lucene
Finite State Queries In Lucene
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr Developers
 
Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than EverApache Solr 1.4 – Faster, Easier, and More Versatile than Ever
Apache Solr 1.4 – Faster, Easier, and More Versatile than Ever
 
Scaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch ClustersScaling Massive Elasticsearch Clusters
Scaling Massive Elasticsearch Clusters
 
Improved Developer Productivity In JDK8
Improved Developer Productivity In JDK8Improved Developer Productivity In JDK8
Improved Developer Productivity In JDK8
 
Find it, possibly also near you!
Find it, possibly also near you!Find it, possibly also near you!
Find it, possibly also near you!
 
Oslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alphaOslo Solr MeetUp March 2012 - Solr4 alpha
Oslo Solr MeetUp March 2012 - Solr4 alpha
 
Decoupled Libraries for PHP
Decoupled Libraries for PHPDecoupled Libraries for PHP
Decoupled Libraries for PHP
 

More from Erik Hatcher

Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Lucene's Latest (for Libraries)
Lucene's Latest (for Libraries)Erik Hatcher
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksErik Hatcher
 
Solr Powered Libraries
Solr Powered LibrariesSolr Powered Libraries
Solr Powered LibrariesErik Hatcher
 
"Solr Update" at code4lib '13 - Chicago
"Solr Update" at code4lib '13 - Chicago"Solr Update" at code4lib '13 - Chicago
"Solr Update" at code4lib '13 - ChicagoErik Hatcher
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
Introduction to Solr
Introduction to SolrIntroduction to Solr
Introduction to SolrErik Hatcher
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Solr Application Development Tutorial
Solr Application Development TutorialSolr Application Development Tutorial
Solr Application Development TutorialErik Hatcher
 
Solr Recipes Workshop
Solr Recipes WorkshopSolr Recipes Workshop
Solr Recipes WorkshopErik Hatcher
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Lucene for Solr Developers
Lucene for Solr DevelopersLucene for Solr Developers
Lucene for Solr DevelopersErik Hatcher
 
code4lib 2011 preconference: What's New in Solr (since 1.4.1)
code4lib 2011 preconference: What's New in Solr (since 1.4.1)code4lib 2011 preconference: What's New in Solr (since 1.4.1)
code4lib 2011 preconference: What's New in Solr (since 1.4.1)Erik Hatcher
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with SolrErik Hatcher
 

Query Parsing - Tips and Tricks

  • 1. Query Parsing Tips & Tricks Presented by Erik Hatcher of LucidWorks © Copyright 2012
  • 2. Description: Interpreting what the user meant and what they ideally would like to find is tricky business. This talk will cover useful tips and tricks to better leverage and extend Solr's analysis and query parsing capabilities to more richly parse and interpret user queries.
  • 3. Abstract: In this talk, Solr's built-in query parsers will be detailed, including when and how to use them. Solr has nested query parsing capability, allowing multiple query parsers to be used to generate a single query. The nested query parsing feature will be described and demonstrated. In many domains, e-commerce in particular, parsing queries often means interpreting which entities (e.g. products, categories, vehicles) the user likely means; this talk will conclude with techniques to achieve richer query interpretation.
  • 4. Query Parsers in Solr
  • 5. Query Parsers in Solr
  • 6. lucene Query Parser, Solr style
      • FieldType awareness
        - range queries, numerics
        - allows date math
        - reverses wildcard terms, if indexing used ReversedWildcardFilter
      • Magic fields
        - _val_: function query injection
        - _query_: nested query, to use a different query parser
      • Multi-term analysis (type="multiterm")
        - Analyzes prefix, wildcard, regex expressions
          » to normalize diacritics, lowercase, etc.
        - If not explicitly defined, all MultiTermAwareComponents from the query analyzer are used, or KeywordTokenizer for effectively no analysis
      • http://wiki.apache.org/solr/SolrQuerySyntax#lucene
  • 7. dismax
      • Simple constrained syntax
        - "supports phrases" +requiredTerms -prohibitedTerms loose terms
      • Spreads terms across specified query fields (qf) and the entire query string across phrase fields (pf)
        - with field-specific boosting
        - and explicit and implicit phrase slop
        - scores each document with the maximum score for that document as produced by any subquery; the primary score is associated with the highest boost, not the sum of the field scores (as BooleanQuery would give)
      • Minimum match (mm) allows a gradient between AND and OR
        - some number of terms must match, but not necessarily all, and the number can vary depending on the number of actual query terms
      • Additive boost queries (bq) and boost functions (bf)
      • Debug output includes parsed boost and function queries
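The "maximum score, not the sum" point can be illustrated with a toy calculation. This is only a sketch: the per-field scores are made-up numbers, and the `tie` argument mirrors Solr's dismax `tie` parameter, which the slide does not mention.

```python
def dismax_score(field_scores, tie=0.0):
    """Disjunction-max combination: best field score plus tie * the other scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


def boolean_sum_score(field_scores):
    """What a plain BooleanQuery-style sum of the field scores would give."""
    return sum(field_scores)


# Hypothetical scores for one term matched in three query fields.
scores = [0.9, 0.3, 0.1]
print(dismax_score(scores))           # best field wins
print(dismax_score(scores, tie=0.1))  # other fields contribute a little
print(boolean_sum_score(scores))      # every field adds up
```

With `tie=0.0` a document matching a term in many fields scores no higher than one matching it in the single best field, which is the behavior the slide describes.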
  • 8. Specifying the Query Parser
      • defType=parser_name
        - defines main query parser
      • {!parser_name local=param...}expression
        - Can specify parser per query expression
      • These are equivalent:
        - q=FC Schalke 04&defType=dismax&mm=2&qf=name
        - q={!dismax qf=name mm=2}FC Schalke 04
        - q={!dismax qf=name mm=2 v='FC Schalke 04'}
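As a rough sketch, the three equivalent requests could be built from a client like this. The base URL is a hypothetical local Solr endpoint; only the parameter names and values come from the slide.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; adjust host, port and handler to your installation.
BASE = "http://localhost:8983/solr/select"


def select_url(params):
    """Return a request URL with all parameters URL-encoded."""
    return BASE + "?" + urlencode(params)


# Parser and its settings given as top-level request parameters.
global_style = select_url(
    {"q": "FC Schalke 04", "defType": "dismax", "mm": "2", "qf": "name"}
)
# Parser and its settings given as local params in front of the expression.
local_style = select_url({"q": "{!dismax qf=name mm=2}FC Schalke 04"})
# Same, with the expression passed through the v local param.
v_style = select_url({"q": "{!dismax qf=name mm=2 v='FC Schalke 04'}"})

for url in (global_style, local_style, v_style):
    print(url)
```

The local-params forms keep everything about one query expression in a single parameter, which matters once several expressions with different parsers appear in the same request.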
  • 9. Local Parameter Substitution
      • /document?id=13
  • 10. Nested Query Parsing
      • Leverages the "lucene" query parser's _query_ trick
      • Example:
        - q=_query_:"{!dismax qf='title^2 body' v=$user_query}" AND _query_:"{!dismax qf='keywords^5 description^2' v=$topic}"
        - &user_query=hoffenheim schalke
        - &topic=news
      • Setting the complex nested q parameter in a request handler can make the client request lean and clean
        - And even qf and other parameters can be substituted:
          » {!dismax qf=$title_qf pf=$title_pf v=$title_query}
          » &title_qf=title^5 subtitle^2...
      • Real world example, Stanford University Libraries:
        - http://searchworks.stanford.edu/advanced
        - Insanely complex sets of nested dismax's and qf/pf settings
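A minimal sketch of how a client might assemble the nested example: the `q` template stays fixed and only the dereferenced `user_query` and `topic` parameters change per request. The helper name is made up for illustration.

```python
from urllib.parse import urlencode

# Fixed query template; $user_query and $topic are resolved by Solr
# from request parameters of the same name.
NESTED_Q = (
    "_query_:\"{!dismax qf='title^2 body' v=$user_query}\" AND "
    "_query_:\"{!dismax qf='keywords^5 description^2' v=$topic}\""
)


def nested_query_params(user_query, topic):
    """Hypothetical helper: pair the fixed template with per-request values."""
    return {"q": NESTED_Q, "user_query": user_query, "topic": topic}


params = nested_query_params("hoffenheim schalke", "news")
query_string = urlencode(params)
print(query_string)
```

If the template is configured on the request handler instead, as the slide suggests, the client would send only `user_query` and `topic`.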
  • 11. edismax: Extended Dismax Query Parser
      • "An advanced multi-field query parser based on the dismax parser"
        - Handles "lucene" syntax as well as dismax features
      • Fields available to user may be limited (uf)
        - including negations and dynamic fields, e.g. uf=* -cost -timestamp
      • Shingles query into 2 and 3 term phrases
        - Improves quality of results when query contains terms across multiple fields
        - pf2/pf3 and ps2/ps3
        - removes stop words from shingled phrase queries
      • Multiplicative "boost" functions
      • Additional features
        - Query composed entirely of "stopwords" optionally allowed
          » if indexed, but query analyzer is set to remove them
        - Allows "lowercaseOperators" by default; or/OR, and/AND
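To make the shingling idea concrete, here is a small sketch of which 2- and 3-term phrases a query would produce. It assumes plain whitespace splitting; edismax itself works on the parsed query terms, so treat this as an illustration only.

```python
def shingles(query, size):
    """Return every run of `size` consecutive whitespace-separated terms."""
    terms = query.split()
    return [" ".join(terms[i:i + size]) for i in range(len(terms) - size + 1)]


query = "fc schalke 04 news"
print(shingles(query, 2))  # candidates for pf2 phrase matching
print(shingles(query, 3))  # candidates for pf3 phrase matching
```

Each shingle becomes a phrase query against the fields listed in pf2 or pf3, so documents where neighboring query terms appear together are ranked higher.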
  • 12. term Query Parser
      • FieldType aware, no analysis
        - converts to internal representation automatically
      • "raw" query parser is similar
        - though raw parser is not field type aware; no internal representation conversion
      • Best practice for filtering on single facet value
        - fq={!term f=facet_field}crazy:value :)
          » no query string escaping needed; but of course still need URL encoding when appropriate
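A short sketch of the facet-filter pattern: the value goes into the filter untouched (no query-syntax escaping), and URL encoding is applied once at the very end. The helper name is hypothetical.

```python
from urllib.parse import quote


def term_filter(field, value):
    """Build a term-parser filter; the value needs no query-syntax escaping."""
    return "{!term f=%s}%s" % (field, value)


fq = term_filter("facet_field", "crazy:value :)")
encoded = "fq=" + quote(fq, safe="")  # URL-encode for transport only

print(fq)
print(encoded)
```

Because the facet value is taken as-is, characters such as `:` or `)` that would otherwise need backslash escaping in the "lucene" parser are harmless here.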
  • 13. prefix Query Parser
      • No field type awareness
      • {!prefix f=field_name}prefixValue
        - Similar to Lucene query parser field_name:prefixValue*
        - Solr's "lucene" query parser has multiterm analysis capability, but the prefix query parser does not analyze
  • 14. boost Query Parser
      • Multiplicative to wrapped query score
        - Internally used by edismax "boost"
      • {!boost b=recip(ms(NOW,mydatefield),3.16e-11,1,1)}foo
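The recency boost in the example can be worked through by hand. Solr's `recip(x,m,a,b)` function computes `a / (m*x + b)`, and 3.16e-11 is roughly one divided by the number of milliseconds in a year, so the sketch below shows how the multiplier decays with document age. The year length used here is an approximation chosen for the illustration.

```python
MS_PER_YEAR = 365.25 * 24 * 60 * 60 * 1000  # approximate milliseconds per year


def recip(x, m, a, b):
    """Reciprocal function: a / (m*x + b)."""
    return a / (m * x + b)


def recency_boost(age_ms):
    """Multiplier for a document whose date field is age_ms in the past."""
    return recip(age_ms, 3.16e-11, 1, 1)


for years in (0, 1, 2, 10):
    print(years, round(recency_boost(years * MS_PER_YEAR), 3))
```

A brand-new document keeps its full score, a one-year-old document keeps about half, and a ten-year-old document keeps less than a tenth.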
  • 15. field Query Parser
      • Same as handling of field:"Some Text" clause by Solr's "lucene" query parser
      • FieldType aware
        - TermQuery generated, unless field type has special handling
      • TextField
        - PhraseQuery: if multiple tokens in different positions
        - MultiPhraseQuery: if multiple tokens share some positions
        - BooleanQuery: if multiple terms all in same position
        - TermQuery: if only a single token
      • Other types that handle field queries specially:
        - currency, spatial types (point, latlon, etc)
        - {!field f=location}49.25,8.883333
  • 16. surround Query Parser
      • Creates Lucene SpanQuery's for fine-grained proximity matching, including use of wildcards
      • Uses infix and prefix notation
        - infix: AND/OR/NOT/nW/nN/()
        - prefix: AND/OR/nW/nN
        - Supports Lucene query parser basics
          » field:value, boost^5, wild?c*rd, prefix*
        - Proximity operators:
          » W: ordered
          » N: unordered
      • No analysis of clauses
        - requires user or search client to lowercase, normalize, etc
      • Example:
        - q={!surround}hoffenheim 4w schalke
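The difference between ordered proximity (`W` in Lucene's surround syntax) and unordered proximity (`N`) can be sketched over a list of tokens. This is a simplified model, not the span-query implementation: it assumes exact token matches and treats the distance as the maximum gap in token positions, with adjacent tokens at distance 1.

```python
def within(tokens, first, second, distance, ordered):
    """True if `first` and `second` occur within `distance` positions of each other.

    With ordered=True, `first` must come before `second`.
    """
    positions_first = [i for i, t in enumerate(tokens) if t == first]
    positions_second = [i for i, t in enumerate(tokens) if t == second]
    for i in positions_first:
        for j in positions_second:
            gap = j - i
            if ordered and 0 < gap <= distance:
                return True
            if not ordered and 0 < abs(gap) <= distance:
                return True
    return False


tokens = "hoffenheim beat schalke".split()
print(within(tokens, "hoffenheim", "schalke", 4, ordered=True))   # like: hoffenheim 4w schalke
print(within(tokens, "schalke", "hoffenheim", 4, ordered=True))   # like: schalke 4w hoffenheim
print(within(tokens, "schalke", "hoffenheim", 4, ordered=False))  # like: schalke 4n hoffenheim
```

Note that the tokens are compared as given; as the slide says, lowercasing and other normalization are the caller's responsibility.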
  • 17. join Query Parser
      • Pseudo-join
        - Field values from inner result set used to map to another field to select final result set
        - No information from inner result set carries to final result set, such as scores or field values (it's not SQL!)
      • Can join from another local Solr core
        - Allows for different types of entities to be indexed in separate indexes altogether, modeled into clean schemas
        - Separate cores can scale independently, especially with commit and warming issues
      • Syntax:
        - {!join from=... to=... [fromIndex=core_name]}query
      • For more information:
        - Yonik's Lucene Revolution 2011 presentation: http://vimeo.com/25015101
        - http://wiki.apache.org/solr/Join
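The pseudo-join semantics can be mimicked in a few lines over in-memory documents. The two "cores" below (vehicles and parts) and their field names are invented for illustration; the sketch corresponds roughly to `{!join from=id to=vehicle_id fromIndex=vehicles}make:vw`.

```python
def pseudo_join(from_docs, to_docs, from_field, to_field, inner_query):
    """Select to_docs whose to_field matches a from_field value of the inner result set."""
    # Step 1: run the inner query and keep only the join-field values.
    keys = {doc[from_field] for doc in from_docs if inner_query(doc)}
    # Step 2: select final documents by those values; nothing else carries over.
    return [doc for doc in to_docs if doc.get(to_field) in keys]


# Hypothetical data standing in for two separate Solr cores.
vehicles = [
    {"id": "v1", "make": "vw", "model": "golf"},
    {"id": "v2", "make": "opel", "model": "astra"},
]
parts = [
    {"id": "p1", "name": "brake pad", "vehicle_id": "v1"},
    {"id": "p2", "name": "oil filter", "vehicle_id": "v2"},
    {"id": "p3", "name": "wiper blade", "vehicle_id": "v1"},
]

result = pseudo_join(vehicles, parts, "id", "vehicle_id", lambda d: d["make"] == "vw")
print([doc["id"] for doc in result])
```

The returned part documents carry no vehicle fields or scores, which is the "it's not SQL" caveat from the slide.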
  • 18. spatial Query Parsers
      • Operates on geohash, latlon, and point types
      • geofilt
        - Exact distance filtering
        - fq={!geofilt sfield=location pt=10.312,-20.556 d=3.5}
      • bbox
        - Alternatively use a range query:
          » fq=location:[45,-94 TO 46,-93]
      • Can use in conjunction with geodist() function
        - Sorting:
          » sort=geodist() asc
        - Returning distance:
          » fl=_dist_:geodist()
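What geofilt does conceptually is a great-circle distance check against `d` (in kilometers). The sketch below uses the haversine formula and a rounded mean Earth radius, both chosen for this illustration; the sample documents are invented.

```python
import math

EARTH_RADIUS_KM = 6371.0  # rounded mean Earth radius, an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geofilt(docs, sfield, pt, d):
    """Keep documents whose sfield point lies within d kilometers of pt."""
    lat, lon = pt
    return [doc for doc in docs if haversine_km(lat, lon, *doc[sfield]) <= d]


docs = [
    {"id": "near", "location": (10.32, -20.56)},
    {"id": "far", "location": (11.0, -20.556)},
]
hits = geofilt(docs, "location", (10.312, -20.556), 3.5)
print([doc["id"] for doc in hits])
```

A bbox filter would instead test against the bounding box around that circle, which is cheaper but lets in some points slightly farther than `d`.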
  • 19. frange Query Parser: function range
      • Matches documents whose function value, textual or numeric, falls within a range (lower bound l, upper bound u)
      • Example:
        - fq={!frange l=0 u=2.2}sum(user_ranking,editor_ranking)
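The frange idea, evaluating a function per document and keeping those whose value lies in a range, can be sketched as follows. The `incl`/`incu` arguments mirror frange's options for including the bounds (inclusive by default); the documents are invented.

```python
def frange(docs, func, l=None, u=None, incl=True, incu=True):
    """Keep documents whose func(doc) value lies between l and u."""
    matches = []
    for doc in docs:
        value = func(doc)
        if l is not None and (value < l or (value == l and not incl)):
            continue
        if u is not None and (value > u or (value == u and not incu)):
            continue
        matches.append(doc)
    return matches


docs = [
    {"id": "a", "user_ranking": 1.0, "editor_ranking": 1.0},
    {"id": "b", "user_ranking": 2.0, "editor_ranking": 1.5},
]


def ranking_sum(doc):
    """Stand-in for sum(user_ranking,editor_ranking)."""
    return doc["user_ranking"] + doc["editor_ranking"]


print([d["id"] for d in frange(docs, ranking_sum, l=0, u=2.2)])
```

Because the function is evaluated for every candidate document, this kind of filter is a natural fit for running late in the filtering chain.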
  • 20. PostFilter
      • Queries implementing the PostFilter interface are consulted after the query and all other filters have narrowed the documents for consideration
      • Queries supporting PostFilter
        - frange, geofilt, bbox
      • Enabled by setting cache=false and cost >= 100
        - Example:
          » fq={!frange l=5 cache=false cost=200}div(log(popularity),sqrt(geodist()))
      • More info:
        - Advanced filter caching
          » http://searchhub.org/2012/02/10/advanced-filter-caching-in-solr/
        - Custom security filtering
          » http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/
  • 21. Phonetic, Stem, and Synonym Matching
      • Users tend to expect loose matching
        - but with "more exact" matches ranked higher
      • Various mechanisms for loosening matching:
        - Phonetic sounds-like: cat/kat, similar/similer
        - Stemming: search/searches/searched/searching
        - Synonyms: cat/feline, dog/canine
      • Distinguish ranking between exact and looser matching:
        - copyField original to a new (unstored, yet indexed) field with desired looser matching analysis
        - query across original field and looser field, with higher boosting for original field
          » /select?q=Monchengladbach&defType=dismax&qf=name^5 name_phonetic
  • 22. Suggesting Things, Not Strings
      • Model It As You Need It
        - Leverage Lucene's Document/Field/Query/score & sort & highlight
      • Example 1: Selling automobile parts
        - Exact year/make/model is needed to pick the right parts
        - Suggest a vehicle as user types
          » from the main parts index: tricky, requires lots of special fields and analysis tricks and even then you're suggesting fields from "parts"
          » Another (better?) approach: model vehicles as a separate core, "search" when suggesting, return documents, not field terms
            ▪ maybe even separate core for makes and models
      • Example 2: Bundesliga Teams
        - /select?q=fr*&wt=csv&fl=name
          » Eintracht Frankfurt
          » Sport-Club Freiburg
  • 23. Development and Troubleshooting Tools
      • Analysis
        - /analysis/field
          » ?analysis.fieldname=name
          » &analysis.fieldvalue=FC ApacheCon 2012
          » &q=apachecon
          » &analysis.showmatch=true
        - Also /analysis/document
        - admin UI analysis tool
      • Query Parsing
        - &debug=query
      • Relevancy
        - &debug=results
          » shows scoring explanations
  • 24. Future of Solr Query Parsing
      • XML Query Parser
        - Will allow a rich query "tree"
        - Parameters will fill in variables in a server-side query tree definition, or can provide full query tree
        - Useful for "advanced" query, multi-valued, input
        - https://issues.apache.org/jira/browse/SOLR-839
      • PayloadTermQuery
        - Solr supports indexing payload data on terms using DelimitedPayloadTokenFilter, but currently no support for querying with payloads
        - Requires custom Similarity implementation to provide score factor for payload data
        - https://issues.apache.org/jira/browse/SOLR-1485
      • (ToParent|ToChild)BlockJoinQuery
        - https://issues.apache.org/jira/browse/SOLR-3076
  • 25. Additional Information
      • Mark Miller on Query Parsers
        - http://searchhub.org/dev/2009/02/22/exploring-query-parsers/
      • LucidWorks
        - http://www.lucidworks.com
      • SearchHub
        - http://searchhub.org
        - Search Lucene/Solr (and more) e-mail lists, JIRA issues, wiki pages, etc
  • 26. Query Parsing Tips & Tricks Presented by Erik Hatcher of LucidWorks