
Liferay Search: Best Practices to Dramatically Improve Relevance - Liferay Symposium North America 2017, Austin, USA

Audio available: https://www.liferay.com/web/events-symposium-north-america/recap

Liferay makes it easy to integrate your application with powerful search engines. However, it may be hard to diagnose why your most important content isn't showing up the way you need it to. This session will recap the key concepts for indexing and querying with Liferay Search, and present a number of techniques to guarantee your documents will be found with the best possible relevance.

André de Oliveira joined Liferay in early 2014 as a senior engineer and leads the Search Infrastructure team. He's been a Java developer and architect for the last 15 years. Ever since discovering Elasticsearch, he's vowed never to write another SQL WHERE clause again.


  1. #LSNA17
  3. SEO vs. relevance: pages vs. Liferay assets; the whole text is indexed vs. key/value docs are indexed; opaque ranking criteria vs. scored queries, filters and field types; reverse engineer vs. fine tune; third-party algorithms vs. a search engine that you control.
  4. Ask Elasticsearch to explain its scores, for a whole search or for a single document:

     ```
     GET /_search?explain
     { "query" : { "term" : { "tag" : "LSNA17" } } }

     GET /index/type/0/_explain?q=user_id:2
     ```

     Excerpt of the explain output:

     ```
     "value" : 2.7051764,
     "description" : "score(doc=0,freq=1.0), product of:",
     "details" : [
       { "value" : 0.66422296, "description" : "queryWeight, product of:",
         "details" : [
           { "value" : 4.0726933, "description" : "idf(docFreq=4, maxDocs=108)" },
           { "value" : 0.16309182, "description" : "queryNorm" } ] },
       { "value" : 4.0726933, "description" : "fieldWeight in 0, product of:",
         "details" : [
           { "value" : 1.0, "description" : "tf(freq=1.0), with freq of:",
             "details" : [ { "value" : 1.0, "description" : "termFreq=1.0" } ] },
           { "value" : 4.0726933, "description" : "idf(docFreq=4, maxDocs=108)" },
           { "value" : 1.0, "description" : "fieldNorm(doc=0)"
     ...
     "failure to match filter: cache(user_id:[2 TO 2])"
     ```
  5. Query "apple eclipse zzz yyy xxx qqq kkk ttt rrr" against three docs: doc1 "apple banana" scores 2.345; doc2 "eclipse moon sun" scores 2.345; doc3 "zzz yyy xxx qqq kkk ttt rrr 111" scores 16.415.
  6. TF/IDF (Term Frequency/Inverse Document Frequency), factor by factor:
     - Term frequency: how often does the term appear in the field? The score increases when the term appears many times in the text.
     - Inverse document frequency: how rare is the term in the whole index? The score increases when the term is found in this document and in not many others.
     - Field-length norm: how short is the field that contains the term? The score increases when there isn't much else in the same field (like a title).
  7. The query generated for the keyword "pigeon" repeats the same clauses for every searched field (content_en_US, description_en_US, entryClassPK, title_en_US):

     ```
     { "bool" : {
         "must" : { "bool" : { "should" : [
             { "match" : { "<field>" : { "query" : "pigeon", "type" : "boolean" } } },
             { "match" : { "<field>" : { "query" : "pigeon", "type" : "phrase_prefix" } } }
         ] } },
         "should" : { "match" : { "<field>" : { "query" : "pigeon", "type" : "phrase", "boost" : 2.0 } } }
     } }
     ```
  8. → FacetedSearcher →
     - Indexer
     - fields
     - score

     Clauses from the generated query:
     - { "match" : { "content_en_US" : { "query" : "pigeon", "type" : "phrase_prefix" } } }
     - { "match" : { "description_en_US" : { "query" : "pigeon", "type" : "phrase", "boost" : 2.0 } } }
     - { "match" : { "entryClassPK" : { "query" : "pigeon", "type" : "boolean" } } }
  9. Field types and scoring:
     - Natural language? string: text. TF/IDF, case insensitive. Score!
     - IDs and serials? string: keyword (not_analyzed). Case sensitive; match or no match. No score!
     - Non-string data? integer, date, geo_point... Match or no match. No score! ("No score" is really a constant score of 1.)
  10. Custom field mappings can be added from an IndexSettingsContributor:

      ```java
      // IndexSettingsContributor
      typeMappingsHelper.addTypeMappings(indexName, myCustomFieldMappings);
      ```

      Entries from liferay-type-mappings.json:

      ```
      "content": { "index": "analyzed", "type": "string" },
      "organizationId": { "index": "not_analyzed", "type": "string" },
      "publishDate": { "format": "yyyyMMddHHmmss", "type": "date" }
      ```
  11. Analyzed fields serve human searches, support more query types and combinations, and give the best relevance. Favor text fields over keyword fields.
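  12. Wildcards such as "*ubstrin*": the pattern is not analyzed, so it must be lowercase to match lowercased terms; a leading * means a full scan of the terms, and performance drops (↓↓↓); wildcard matches don't score.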
  13. Alternatives to wildcards: 1. full text search; 2. prefix queries; 3. n-grams.
  14. Match, prefix, phrase: know your field, use the right queries.
  15. Write a field-specific query builder:

      ```java
      @Component(service = FieldQueryBuilder.class, immediate = true)
      public class MyFieldQueryBuilder implements FieldQueryBuilder {

          public Query build(String field, String keywords) {
              // Fine-tune the right queries for your field
              myBooleanQuery.add(q1, MUST);
              myBooleanQuery.add(q2, SHOULD);
              ...
      ```
  16. Multilingual search (多言語検索): map → suffix; "b" "a" "d"; stemming and stopwords (https://www.elastic.co/guide/en/elasticsearch/guide/current/using-language-analyzers.html). Pick the right language analyzer.
  17. Index each translation in its own localized field, and search the localized field for the user's locale:

      ```java
      document.addText("myField_ja_JP", japanese);
      document.addText("myField_en_US", english);

      Locale defaultLocale = portal.getSiteDefaultLocale(groupId);
      document.addText(getLocalizedName("myField", defaultLocale), translation);

      addSearchLocalizedTerm(searchQuery, searchContext, "myField");
      searchContext.setLocale(themeDisplay.getLocale());
      ```

      In liferay-type-mappings.json, fields ending in _ja_XX get the kuromoji analyzer:

      ```
      "template_ja": {
          "mapping": { "analyzer": "kuromoji" },
          "match": "\\w+_ja_[A-Z]{2}\\b"
      }
      ```
  18. The same text in two fields (description and content; title and title_en_US), with both fields matched, means 2x matching query clauses = inflated relevance. Match once and only once.
  19. If you are already indexing the localized field...

      ```java
      document.addText(getLocalizedName("myField", languageId), translation);
      ```

      ...there is no need to index the same text twice...

      ```java
      // DON'T:
      // document.addText("myField", content);
      ```

      ...and match once and only once:

      ```java
      addSearchLocalizedTerm(searchQuery, searchContext, "myField");
      // DON'T:
      // addSearchTerm(searchQuery, searchContext, "myField");
      ```
  20. Store display values (for rendering and highlighting) in the docs: index for rendering, render from the doc.
  21. Aggregating on an analyzed field (✔ ✗): [30] Liferay, [15] DXP, [15] Symposium.
  22. Aggregating on a not_analyzed field (✔ ✗): [15] Liferay DXP, [15] Liferay Symposium.
  23. Aggregate on not_analyzed ([15] Liferay DXP, [15] Liferay Symposium); match on analyzed. Use 2 fields: 1 analyzed, 1 not_analyzed.
  24. Search on the text field, `new MatchQuery("myfield", keywords);`, and aggregate on the keyword field, `myFacet.setFieldName("myfield.raw");`.
  25. Keep an analyzed and a not_analyzed copy of the same value with Elasticsearch multi-fields (https://www.elastic.co/guide/en/elasticsearch/guide/current/aggregations-and-analysis.html) or Solr copy fields (https://wiki.apache.org/solr/SchemaXml#Copy_Fields).
  26. Use elasticsearch-head or the Solr Admin UI to run the query string with explain. Tweak clauses, re-run the query, repeat.
  30. Thank you! And lots of relevant content at #LSNA17

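The explain output on slide 4 multiplies together the three factors from slide 6. As a sketch, assuming the classic Lucene TF/IDF similarity (the Elasticsearch default before 5.0, which switched to BM25), the following Java reproduces that score from its inputs. The class and method names are illustrative rather than Lucene's API, and `queryNorm` is copied from the explain output instead of being derived.

```java
// Sketch of classic Lucene TF/IDF scoring for one term in one field.
// Names are illustrative; this is not Lucene's API.
final class ClassicTfIdf {

    // Inverse document frequency: rarer terms weigh more.
    static double idf(long docFreq, long maxDocs) {
        return 1.0 + Math.log(maxDocs / (double) (docFreq + 1));
    }

    // Term frequency: more occurrences help, with diminishing returns.
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // Field-length norm: short fields (like titles) weigh more.
    // Lucene stores this lossily in one byte, so real values are rounded.
    static double fieldNorm(int termsInField) {
        return 1.0 / Math.sqrt(termsInField);
    }

    // score = queryWeight * fieldWeight, where
    //   queryWeight = idf * queryNorm
    //   fieldWeight = tf * idf * fieldNorm
    static double score(double freq, long docFreq, long maxDocs,
                        double queryNorm, double fieldNorm) {
        double idfValue = idf(docFreq, maxDocs);
        double queryWeight = idfValue * queryNorm;
        double fieldWeight = tf(freq) * idfValue * fieldNorm;
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        // Inputs copied from the explain output on slide 4:
        // freq=1, docFreq=4, maxDocs=108, queryNorm=0.16309182, fieldNorm=1.0
        System.out.printf("idf   = %.7f%n", idf(4, 108));
        System.out.printf("score = %.7f%n", score(1.0, 4, 108, 0.16309182, 1.0));
    }
}
```

Running `main` should print an idf of about 4.0726933 and a score of about 2.7051764, the same values the explain output reports.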
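Slide 5 shows a document that matches many query terms (doc3) outscoring documents that match only one. A toy sketch, assuming each matching term adds idf² × fieldNorm (queryNorm and Lucene's coord factor are left out), ranks the three documents the same way. It does not reproduce the slide's exact scores, and `MatchedTermsDemo` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of slide 5: a document's score is a sum over the query terms it contains.
final class MatchedTermsDemo {

    static final List<List<String>> DOCS = Arrays.asList(
        Arrays.asList("apple", "banana"),                                       // doc1
        Arrays.asList("eclipse", "moon", "sun"),                                // doc2
        Arrays.asList("zzz", "yyy", "xxx", "qqq", "kkk", "ttt", "rrr", "111")); // doc3

    static final List<String> QUERY = Arrays.asList(
        "apple", "eclipse", "zzz", "yyy", "xxx", "qqq", "kkk", "ttt", "rrr");

    // Number of documents that contain each term.
    static Map<String, Integer> docFreqs(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Each query term found in the doc adds idf^2 * fieldNorm (tf is 1 for every term here).
    static double score(List<String> doc, Map<String, Integer> df, int numDocs) {
        Set<String> present = new HashSet<>(doc);
        double fieldNorm = 1.0 / Math.sqrt(doc.size());
        double total = 0.0;
        for (String term : QUERY) {
            if (present.contains(term)) {
                double idf = 1.0 + Math.log(numDocs / (double) (df.get(term) + 1));
                total += idf * idf * fieldNorm;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> df = docFreqs(DOCS);
        for (int i = 0; i < DOCS.size(); i++) {
            System.out.printf("doc%d: %.3f%n", i + 1, score(DOCS.get(i), df, DOCS.size()));
        }
    }
}
```

doc3 comes out on top because seven of its eight terms match, even though its longer field lowers the norm applied to each of them.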
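Slide 9's distinction between text and keyword fields can be modeled in a few lines. This is a toy sketch: `analyze` is a crude, hypothetical stand-in for a real analyzer (lowercasing plus splitting on anything that is not a letter or digit), and the integer "score" just counts matching tokens rather than computing TF/IDF.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Toy contrast between a text (analyzed) field and a keyword (not_analyzed) field.
final class FieldTypesDemo {

    // Crude stand-in for an analyzer: lowercase, then split on non-letters/non-digits.
    static List<String> analyze(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    // Text field: count analyzed query tokens found in the analyzed value.
    // More matching tokens means a higher toy score; case does not matter.
    static int textScore(String fieldValue, String query) {
        List<String> fieldTokens = analyze(fieldValue);
        int score = 0;
        for (String token : analyze(query)) {
            if (fieldTokens.contains(token)) {
                score++;
            }
        }
        return score;
    }

    // Keyword field: the whole value must equal the query, case-sensitively.
    // A hit gets a constant score of 1; there is no "more relevant" hit.
    static double keywordScore(String fieldValue, String query) {
        return fieldValue.equals(query) ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        System.out.println(textScore("Liferay Symposium Austin", "symposium austin"));
        System.out.println(keywordScore("ORG-42", "org-42"));
        System.out.println(keywordScore("ORG-42", "ORG-42"));
    }
}
```

`main` prints 2, then 0.0 (the keyword comparison is case sensitive), then 1.0.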
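Slide 12's warning about `*ubstrin*` comes down to how a sorted term dictionary is searched. The toy `TermDictionaryDemo` below is a hypothetical class, with a `TreeSet` standing in for the real term index; it counts how many terms a prefix lookup visits compared with a leading-wildcard lookup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

// Toy term dictionary: terms are kept sorted, as in a real term index.
final class TermDictionaryDemo {

    private final TreeSet<String> terms = new TreeSet<>();
    private int examined; // how many terms the last lookup visited

    TermDictionaryDemo(String... indexedTerms) {
        Collections.addAll(terms, indexedTerms);
    }

    int examined() {
        return examined;
    }

    // Prefix lookup: jump to the first term >= prefix, stop at the first non-match.
    List<String> prefix(String prefix) {
        examined = 0;
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            examined++;
            if (!term.startsWith(prefix)) {
                break;
            }
            hits.add(term);
        }
        return hits;
    }

    // "*infix*" lookup: sorting doesn't help, so every term is visited.
    // The pattern is not analyzed, so lowercase it to match lowercased terms.
    List<String> infix(String infix) {
        String needle = infix.toLowerCase(Locale.ROOT);
        examined = 0;
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            examined++;
            if (term.contains(needle)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TermDictionaryDemo dict = new TermDictionaryDemo(
            "apple", "instring", "substring", "substrings", "subway", "zebra");
        System.out.println(dict.prefix("substr") + " examined " + dict.examined());
        System.out.println(dict.infix("UBSTRIN") + " examined " + dict.examined());
    }
}
```

On six terms the difference is only 3 visits against 6; on an index with millions of distinct terms, the leading wildcard still visits every one of them.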
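Option 3 on slide 13, n-grams, moves the cost of substring matching to index time. A minimal sketch, assuming fixed-length character trigrams and a hypothetical `NGramDemo` class: the indexed text is split into every 3-character substring, and a query is a candidate match when all of its own trigrams are present. That test is necessary but not sufficient for a true substring match.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Character n-grams: substring search becomes plain term lookups.
final class NGramDemo {

    // Every substring of length n, lowercased, in order of first appearance.
    static Set<String> ngrams(String text, int n) {
        String s = text.toLowerCase(Locale.ROOT);
        Set<String> grams = new LinkedHashSet<>();
        for (int i = 0; i + n <= s.length(); i++) {
            grams.add(s.substring(i, i + n));
        }
        return grams;
    }

    // Candidate match if every n-gram of the query was indexed for the text.
    // Queries shorter than n produce no n-grams; this sketch treats them as no match.
    static boolean mayContain(String indexedText, String query, int n) {
        if (query.length() < n) {
            return false;
        }
        return ngrams(indexedText, n).containsAll(ngrams(query, n));
    }

    public static void main(String[] args) {
        System.out.println(ngrams("substring", 3));
        System.out.println(mayContain("substring", "UBSTRIN", 3));
        System.out.println(mayContain("substring", "stripe", 3));
    }
}
```

`main` prints `[sub, ubs, bst, str, tri, rin, ing]`, then `true`, then `false` ("stripe" needs the trigram "rip", which "substring" never produces).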
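Slides 14 and 15 argue for choosing queries per field. The sketch below is not Liferay's `FieldQueryBuilder` API. It is a hypothetical, dependency-free interface that returns readable clause descriptions. The text builder is loosely modeled on the clauses from slide 7 (a required match plus a phrase clause boosted 2.0), and the keyword builder emits a single exact term clause. The set of keyword fields is an assumption for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical per-field query builders; clauses are plain strings for readability.
final class FieldQueryBuilderSketch {

    interface SimpleFieldQueryBuilder {
        List<String> build(String field, String keywords);
    }

    // Analyzed text: require the words to match, reward the exact phrase.
    static final SimpleFieldQueryBuilder TEXT = (field, keywords) -> Arrays.asList(
        "MUST match(" + field + ", \"" + keywords + "\")",
        "SHOULD phrase(" + field + ", \"" + keywords + "\")^2.0");

    // IDs and other keyword fields: one exact term clause, nothing else.
    static final SimpleFieldQueryBuilder KEYWORD = (field, keywords) -> Collections.singletonList(
        "MUST term(" + field + ", \"" + keywords + "\")");

    // Assumed list of fields indexed as keywords.
    static final Set<String> KEYWORD_FIELDS =
        new HashSet<>(Arrays.asList("entryClassPK", "organizationId"));

    // Pick the builder that suits the field's type.
    static SimpleFieldQueryBuilder forField(String field) {
        return KEYWORD_FIELDS.contains(field) ? KEYWORD : TEXT;
    }

    public static void main(String[] args) {
        for (String field : Arrays.asList("title_en_US", "entryClassPK")) {
            System.out.println(forField(field).build(field, "pigeon"));
        }
    }
}
```

With this split, `entryClassPK` no longer receives the text-style `boolean`, `phrase_prefix` and `phrase` clauses that slides 7 and 8 show being generated for it.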
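Slides 16 and 17 rely on a naming convention: each translation goes into a field suffixed with its locale, and a dynamic template picks the analyzer by field name. The sketch below checks that convention in plain Java. `localizedName` is a hypothetical helper (not Liferay's `getLocalizedName`), and the sketch assumes the template's regex must match the whole field name.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Checks the <field>_<language>_<COUNTRY> convention against the template_ja regex.
final class LocalizedFieldsDemo {

    // Hypothetical helper: "myField" + Locale.JAPAN -> "myField_ja_JP".
    static String localizedName(String field, Locale locale) {
        return field + "_" + locale.getLanguage() + "_" + locale.getCountry();
    }

    // Same regex as the template_ja dynamic template (unescaped for Java source).
    private static final Pattern JAPANESE_FIELD = Pattern.compile("\\w+_ja_[A-Z]{2}\\b");

    // True if the field name would be routed to the kuromoji analyzer.
    static boolean routedToKuromoji(String fieldName) {
        return JAPANESE_FIELD.matcher(fieldName).matches();
    }

    public static void main(String[] args) {
        for (Locale locale : new Locale[] {Locale.JAPAN, Locale.US}) {
            String name = localizedName("myField", locale);
            System.out.println(name + " -> kuromoji? " + routedToKuromoji(name));
        }
    }
}
```

`main` prints `myField_ja_JP -> kuromoji? true` and `myField_en_US -> kuromoji? false`.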
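Slide 18's "inflated relevance" is easy to see with a toy scorer in which every queried field that contains the term adds 1.0, the way each matching clause adds to a Boolean query's score. The field names follow slide 19; `DoubleMatchDemo` and its flat 1.0 weights are assumptions for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy scorer: each queried field containing the term contributes 1.0.
final class DoubleMatchDemo {

    static double score(Map<String, String> doc, List<String> queriedFields, String term) {
        double total = 0.0;
        for (String field : queriedFields) {
            String value = doc.get(field);
            if (value != null
                    && Arrays.asList(value.toLowerCase(Locale.ROOT).split("\\s+")).contains(term)) {
                total += 1.0;
            }
        }
        return total;
    }

    // Builds a document from alternating field names and values.
    static Map<String, String> doc(String... fieldsAndValues) {
        Map<String, String> doc = new HashMap<>();
        for (int i = 0; i + 1 < fieldsAndValues.length; i += 2) {
            doc.put(fieldsAndValues[i], fieldsAndValues[i + 1]);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> indexedTwice =
            doc("myField", "carrier pigeon", "myField_en_US", "carrier pigeon");
        Map<String, String> indexedOnce = doc("myField_en_US", "pigeon loft");
        List<String> bothFields = Arrays.asList("myField", "myField_en_US");
        List<String> localizedOnly = Arrays.asList("myField_en_US");

        // Matching both fields: the doc indexed twice gets double credit.
        System.out.println(score(indexedTwice, bothFields, "pigeon") + " vs "
            + score(indexedOnce, bothFields, "pigeon"));
        // Matching once and only once: both docs are on equal footing.
        System.out.println(score(indexedTwice, localizedOnly, "pigeon") + " vs "
            + score(indexedOnce, localizedOnly, "pigeon"));
    }
}
```

`main` prints `2.0 vs 1.0`, then `1.0 vs 1.0`.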
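Slides 21 to 24 contrast aggregating on an analyzed field with aggregating on a not_analyzed one. The sketch below counts documents per term the way a terms aggregation does, using a hypothetical `FacetDemo` class: `analyzed = true` mimics aggregating on `myfield` (lowercased whitespace tokens), and `analyzed = false` mimics `myfield.raw`. With 15 documents valued "Liferay DXP" and 15 valued "Liferay Symposium", it yields the bucket counts shown on the slides, in lowercase for the analyzed case.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy terms aggregation over one field value per document.
final class FacetDemo {

    static Map<String, Integer> termsAggregation(List<String> values, boolean analyzed) {
        Map<String, Integer> buckets = new TreeMap<>();
        for (String value : values) {
            Set<String> terms = new LinkedHashSet<>();
            if (analyzed) {
                // Like "myfield": lowercase and split into tokens.
                for (String token : value.toLowerCase(Locale.ROOT).split("\\s+")) {
                    terms.add(token);
                }
            } else {
                // Like "myfield.raw": the value is a single, untouched term.
                terms.add(value);
            }
            for (String term : terms) {
                buckets.merge(term, 1, Integer::sum); // each doc counted once per term
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> values = new ArrayList<>();
        values.addAll(Collections.nCopies(15, "Liferay DXP"));
        values.addAll(Collections.nCopies(15, "Liferay Symposium"));
        System.out.println(termsAggregation(values, true));
        System.out.println(termsAggregation(values, false));
    }
}
```

`main` prints `{dxp=15, liferay=30, symposium=15}` and `{Liferay DXP=15, Liferay Symposium=15}`: only the raw field gives facet entries a user would recognize.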