Audio available: https://www.liferay.com/web/events-symposium-north-america/recap
Liferay makes it easy to integrate your application with powerful search engines. However, it may be hard to diagnose why your most important content isn't showing up the way you need it to. This session will recap the key concepts for indexing and querying with Liferay Search, and present a number of techniques to guarantee your documents will be found with the best possible relevance.
André de Oliveira joined Liferay in early 2014 as a senior engineer and leads the Search Infrastructure team. He's been a Java developer and architect for the last 15 years. Ever since discovering Elasticsearch, he's vowed never to write another SQL WHERE clause again.
3. #LSNA17
SEO | Relevance
Pages | Liferay assets
Whole text is indexed | Key/value docs are indexed
Opaque ranking criteria | Scored queries, filters, field types
Reverse engineer | Fine tune
Third party algorithms | Search engine that you control
6. #LSNA17
TF/IDF (Term Frequency/Inverse Document Frequency)
Factor | In question form... | Score increases...
Term frequency | How often does a term appear in a field? | When the term appears many times in the text
Inverse document frequency | How rare is the term in the whole index? | When the term is found in this document and not many others
Field-length norm | How short is the field where the term is? | When there isn't much else in the same field (like a title)
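The three factors above can be sketched in a few lines of plain Java. This is a minimal sketch, assuming the classic Lucene similarity formulas (square root of the term frequency, logarithmic inverse document frequency, inverse square root of the field length). The class name `TfIdfSketch` and the simple product in `score` are illustrative assumptions; a real engine applies further factors such as boosts and query normalization.

```java
// Toy TF/IDF factors, assuming Lucene's classic similarity formulas.
// TfIdfSketch is a hypothetical name, not part of Liferay or Elasticsearch.
final class TfIdfSketch {

    // Term frequency: grows with the number of occurrences in the field.
    static double tf(int termFreqInField) {
        return Math.sqrt(termFreqInField);
    }

    // Inverse document frequency: grows as fewer documents contain the term.
    static double idf(long totalDocs, long docsWithTerm) {
        return 1.0 + Math.log((double) totalDocs / (docsWithTerm + 1));
    }

    // Field-length norm: grows as the field gets shorter.
    static double fieldNorm(int termsInField) {
        return 1.0 / Math.sqrt(termsInField);
    }

    // Simplified score: the plain product of the three factors.
    static double score(int termFreqInField, long totalDocs,
                        long docsWithTerm, int termsInField) {
        return tf(termFreqInField)
            * idf(totalDocs, docsWithTerm)
            * fieldNorm(termsInField);
    }

    public static void main(String[] args) {
        // Same term, same index: a 4-term title versus a 400-term body.
        System.out.println("title:   " + score(1, 1000, 9, 4));
        System.out.println("content: " + score(1, 1000, 9, 400));
    }
}
```

With everything else equal, the hit in the short title field scores higher than the hit in the long content field, which is the field-length norm at work.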
9. #LSNA17
Natural language? → string: text
● TF/IDF
● case insensitive
● Score!

IDs and serials? → string: keyword
● not_analyzed
● case sensitive
● match | no match
● No score!

Non-string data? → integer, date, geo_point...
● match | no match
● No score!

("No score" is really a constant score of 1.)
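The difference between an analyzed text field and a keyword field can be illustrated with a toy model. This is only a sketch: `FieldTypeSketch` and its methods are hypothetical names, and the "analyzer" here just lowercases and splits on non-alphanumeric characters, which is far simpler than what a real search engine does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Toy contrast between an analyzed "text" field and a "keyword" field.
// All names are hypothetical; this does not use any search engine API.
final class FieldTypeSketch {

    // Simplified analysis: lowercase, then split on anything that is
    // not a letter or a digit, dropping empty tokens.
    static List<String> analyze(String value) {
        return Arrays.stream(
                value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
            .filter(token -> !token.isEmpty())
            .collect(Collectors.toList());
    }

    // Text field: matches when any analyzed keyword is among the field's tokens.
    static boolean matchesText(String fieldValue, String keywords) {
        List<String> tokens = analyze(fieldValue);
        for (String keyword : analyze(keywords)) {
            if (tokens.contains(keyword)) {
                return true;
            }
        }
        return false;
    }

    // Keyword field: the whole value must match exactly, case included.
    static boolean matchesKeyword(String fieldValue, String keywords) {
        return fieldValue.equals(keywords);
    }

    public static void main(String[] args) {
        System.out.println(matchesText("Liferay Search Guide", "SEARCH"));
        System.out.println(matchesKeyword("ABC-123", "abc-123"));
        System.out.println(matchesKeyword("ABC-123", "ABC-123"));
    }
}
```

A serial number such as `ABC-123` stored in a text field would be split into separate tokens, which is why identifiers usually belong in keyword fields.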
14. #LSNA17
• Match →
• Prefix →
• Phrase →
Know your field, use the right queries.
15. #LSNA17
Write a field-specific query builder
@Component(service = FieldQueryBuilder.class, immediate = true)
public class MyFieldQueryBuilder implements FieldQueryBuilder {
    public Query build(String field, String keywords) {
Fine-tune the right queries for your field
myBooleanQuery.add(q1, MUST);
myBooleanQuery.add(q2, SHOULD);
...
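How MUST and SHOULD clauses combine can be shown with a self-contained toy model. This sketch does not use the Liferay `BooleanQuery` API: `BooleanQuerySketch`, its `Occur` enum, and the scorer lambdas are hypothetical stand-ins, and scores are assumed to be simply summed across matching clauses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Toy boolean query: MUST clauses are required, SHOULD clauses only add score.
// Hypothetical model for illustration; not the Liferay BooleanQuery API.
final class BooleanQuerySketch {

    enum Occur { MUST, SHOULD }

    private static final class Clause {
        final ToDoubleFunction<String> scorer;
        final Occur occur;

        Clause(ToDoubleFunction<String> scorer, Occur occur) {
            this.scorer = scorer;
            this.occur = occur;
        }
    }

    private final List<Clause> clauses = new ArrayList<>();

    // A scorer returns a positive score on a match and 0 otherwise.
    BooleanQuerySketch add(ToDoubleFunction<String> scorer, Occur occur) {
        clauses.add(new Clause(scorer, occur));
        return this;
    }

    // Returns 0 when any MUST clause fails to match; otherwise the sum
    // of the scores of all matching clauses.
    double score(String fieldValue) {
        double total = 0;
        for (Clause clause : clauses) {
            double clauseScore =
                Math.max(clause.scorer.applyAsDouble(fieldValue), 0);
            if (clauseScore == 0 && clause.occur == Occur.MUST) {
                return 0;
            }
            total += clauseScore;
        }
        return total;
    }

    public static void main(String[] args) {
        BooleanQuerySketch query = new BooleanQuerySketch()
            .add(v -> v.toLowerCase().contains("search") ? 1.0 : 0.0, Occur.MUST)
            .add(v -> v.toLowerCase().startsWith("liferay") ? 2.0 : 0.0, Occur.SHOULD);

        System.out.println(query.score("Liferay Search"));
        System.out.println(query.score("Search tips"));
        System.out.println(query.score("Liferay Portal"));
    }
}
```

The MUST clause decides whether a document is a hit at all, while the SHOULD clause only moves matching documents up the ranking. That split is what makes field-specific tuning possible.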
16. #LSNA17
Multilingual search
• Map
• suffix →
• "b" "a" "d"
• Stemming, stopwords
(https://www.elastic.co/guide/en/elasticsearch/guide/current/using-language-analyzers.html)
Pick the right language analyzer.
18. #LSNA17
• description, content
• title, title_en_US
• content
2x matching query clauses = inflated relevance.
Match once and only once.
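The inflation effect is easy to reproduce with a toy scorer. The sketch below is an assumption-laden illustration: `MatchOnceSketch` is a hypothetical class, each matching clause is assumed to contribute a flat score of 1, and the field names `title` and `title_en_US` are taken from the slide above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy scorer showing how querying two fields that hold the same text
// doubles the score. Hypothetical model; not a search engine API.
final class MatchOnceSketch {

    // Each queried field that contains the keyword adds 1 to the score.
    static int score(Map<String, String> document,
                     List<String> queriedFields, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        int score = 0;
        for (String field : queriedFields) {
            String value = document.get(field);
            if (value != null
                    && value.toLowerCase(Locale.ROOT).contains(needle)) {
                score++;
            }
        }
        return score;
    }

    public static void main(String[] args) {
        // The same title indexed twice: plain and localized.
        Map<String, String> document = new HashMap<>();
        document.put("title", "Search Guide");
        document.put("title_en_US", "Search Guide");

        System.out.println(
            score(document, Arrays.asList("title", "title_en_US"), "search"));
        System.out.println(
            score(document, Arrays.asList("title_en_US"), "search"));
    }
}
```

Querying both fields counts the same content twice, so the document outranks others for no good reason. Querying only the localized field counts it once.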
19. #LSNA17
If already indexing once...
document.addText(getLocalizedName("myField", languageId), translation);
… no need to index twice...
// DON'T: document.addText("myField", content);
… match once and only once.
addSearchLocalizedTerm(searchQuery, searchContext, "myField");
// DON'T: addSearchTerm(searchQuery, searchContext, "myField");