"Search" is one of those things that our users take for granted but is surprisingly difficult to get right. In this talk we'll dig into the essential concepts in implementing a proper full-text search, highlighting the challenges such as sorting by relevancy, term and column boosting, stemming and lemmatisation, and the brutality of compound words. We'll look at implementing full-text search with an embedded search engine as well as with the true workhorse of both iOS and Android developers alike – SQLite and its FTS tables.
8. Supla
Audio podcast application for Finland’s largest commercial radio broadcasting company, producing both original content and on-demand episodes of radio programming.
10. FULL-TEXT SEARCH?
A search that compares every word in a document, as opposed to searching an abstract or a set of keywords associated with the document.
I type some words and the app gives me relevant content despite minor spelling differences, sorted in a way that makes sense.
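The "compares every word" idea is usually implemented with an inverted index rather than by scanning documents at query time. The following is a minimal sketch in Python, not what any of the engines in this talk actually do internally; the function names (`tokenize`, `build_index`, `search`) and the sample documents are made up for illustration. It supports a trailing `*` for prefix matches and combines terms with AND.

```python
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    token, tokens = [], []
    for ch in text.lower():
        if ch.isalnum():
            token.append(ch)
        elif token:
            tokens.append("".join(token))
            token = []
    if token:
        tokens.append("".join(token))
    return tokens


def build_index(docs):
    """Map every token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for tok in tokenize(text):
            index[tok].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every query term.

    A trailing '*' turns a term into a prefix match, e.g. 'surf*'.
    """
    result = None
    for term in query.lower().split():
        if term.endswith("*"):
            prefix = term[:-1]
            ids = set()
            for tok, doc_ids in index.items():
                if tok.startswith(prefix):
                    ids |= doc_ids
        else:
            ids = set(index.get(term, set()))
        # Intersect, so each extra term can only narrow the result set.
        result = ids if result is None else result & ids
    return result or set()


# Hypothetical sample data.
docs = {
    1: "Surfing the big waves",
    2: "A new surf board for beginners",
    3: "Radio programming on demand",
}
index = build_index(docs)
```

Here `search(index, "surf*")` finds documents 1 and 2, while `search(index, "surf* board")` narrows the result to document 2. There is no stemming, spelling tolerance or ranking in this sketch; those are the hard parts.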
15. Ranking defines how good your search is
• Number of occurrences
• Length of match vs length of text
• Starting position of first match
• Full word match vs prefix match
• Match in title vs match in body
• Date of matching document
• Popularity amongst other users
• Behavioural conditioning
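A few of these signals can be combined into a single score. The sketch below is only an illustration in Python: the function names, the weights (10 for title, 1 for body, 0.5 for a prefix match) and the sample documents are all assumptions, not values from any real search engine. It covers number of occurrences, text length, position of the first match, full word vs prefix match, and title vs body.

```python
def field_score(term, tokens):
    """Score one query term against the tokens of one field."""
    score = 0.0
    first_pos = None
    for pos, tok in enumerate(tokens):
        if tok == term:
            hit = 1.0              # full word match
        elif tok.startswith(term):
            hit = 0.5              # prefix match counts for less
        else:
            continue
        score += hit               # more occurrences, higher score
        if first_pos is None:
            first_pos = pos
    if first_pos is None:
        return 0.0
    score /= len(tokens)               # a hit weighs more in a short text
    score += 1.0 / (1 + first_pos)     # bonus for an early first match
    return score


def rank(term, docs, title_weight=10.0, body_weight=1.0):
    """Return document ids ordered from most to least relevant."""
    scored = []
    for doc_id, (title, body) in docs.items():
        s = (title_weight * field_score(term, title.lower().split())
             + body_weight * field_score(term, body.lower().split()))
        if s > 0:
            scored.append((s, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


# Hypothetical sample data: id -> (title, body).
docs = {
    1: ("Morning radio", "A show about surf culture and surf music"),
    2: ("Surf report", "Daily conditions"),
    3: ("Surfing history", "How the sport began"),
}
```

With these weights, `rank("surf", docs)` puts the exact title match (2) first, the prefix match in a title (3) second, and the document that only mentions the term in its body (1) last, even though that body contains the term twice.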
16. Search Interface Guidelines
• Don’t block the UI thread with searches.
• Don’t block the database with updates.
• Don’t block new searches with old ones.
• Don’t delay showing results any longer than you must.
• Indicate activity while searching.
• Differentiate “no matches” from “didn’t do anything”.
• Adding terms should narrow the results.
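The first and third guidelines can be sketched with a worker pool and a request counter: every search runs off the calling thread, and a result is delivered only if no newer search has been started in the meantime. This is a minimal Python sketch under those assumptions; `SearchController`, `search_fn` and `on_results` are hypothetical names, and a real app would hand the results back to its UI thread instead of calling the callback on the worker.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SearchController:
    """Runs searches off the calling thread and drops stale results."""

    def __init__(self, search_fn, on_results):
        self._search_fn = search_fn      # blocking search, runs on a worker
        self._on_results = on_results    # called for the newest query only
        self._executor = ThreadPoolExecutor(max_workers=2)
        self._lock = threading.Lock()
        self._latest = 0                 # id of the most recent request

    def search(self, query):
        """Start a search and return a future for its delivery status."""
        with self._lock:
            self._latest += 1
            request_id = self._latest
        return self._executor.submit(self._run, request_id, query)

    def _run(self, request_id, query):
        results = self._search_fn(query)
        with self._lock:
            if request_id != self._latest:
                return False             # a newer search superseded this one
            self._on_results(query, results)
        return True

    def shutdown(self):
        self._executor.shutdown(wait=True)
```

If the user types "su" and then "surf" before the first search finishes, only the results for "surf" reach the callback; the outdated search still runs to completion here, but its results are discarded.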
22. CLucene + BRFullTextSearch
searching documents
// 't' ~ kBRSearchFieldNameTitle, 'v' ~ kBRSearchFieldNameValue
let query = "t:(surf* OR board*) OR v:(surf* OR board*)"
// '^10' boosts matches on that term by a factor of 10
let boostedQuery = "t:(surf*) OR v:(board*^10)"
let results = lucene.search(query)
results.iterateWithBlock({ (i, result, stop) in
    NSLog("Match: \(result.identifier): \(result.dictionaryRepresentation())")
})
23. MobileLucene
searching documents
// Using wrapper classes from GitHub user 'lukhnos':
import org.lukhnos.lucenestudy.SearchResult;
import org.lukhnos.lucenestudy.Searcher;
import org.lukhnos.lucenestudy.Document;

// 'query' holds the user's search string
Searcher searcher = new Searcher("Stuff.idx");
SearchResult result = searcher.search(query, 100);
for (Document doc : result.documents) {
    Log.d("SEARCH", "Found: " + doc.title);
}
searcher.close();
24. SQLite FTS4 tables
indexing documents
-- create an FTS table for the index
CREATE VIRTUAL TABLE pages
USING fts4(title, body, tokenize=icu fi_FI);
-- add a document to the index
INSERT INTO pages (docid, title, body)
VALUES(42, 'MCE^3 2016', 'Still Pure awesomeness');
-- optimise the index when the app is idle
INSERT INTO pages(pages) VALUES('optimize');
25. SQLite FTS4 tables
searching documents
-- search across all columns, order by "matchinfo"
SELECT * FROM pages WHERE pages
MATCH 'surf* OR board*'
ORDER BY matchinfo(pages) DESC;
26. SQLite FTS4 tables
searching documents
SELECT title, docid FROM pages
JOIN (
    SELECT docid,
        bm25f(matchinfo(pages,'pcxnal'), 10, 1) AS rank
        -- 'bm25f' is a custom SQL function!
    FROM pages WHERE pages MATCH 'surf* OR board*'
    ORDER BY rank DESC LIMIT 1000 OFFSET 0
) AS ranktable USING(docid)
WHERE pages MATCH 'surf* OR board*'
ORDER BY ranktable.rank DESC
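Since 'bm25f' is a custom function, the app has to supply it. The sketch below shows what such a function might compute, written in Python for illustration; it is not the implementation used in the talk. It assumes the 'pcxnal' matchinfo layout (phrase count, column count, three integers per phrase/column pair, row count, average and actual token counts per column, all as 32-bit unsigned integers in native byte order) and applies a simplified, column-weighted BM25 rather than a full BM25F. The name `bm25_from_matchinfo`, the constants `k1` and `b`, and the non-negative IDF variant are assumptions.

```python
import math
import struct


def bm25_from_matchinfo(blob, *weights, k1=1.2, b=0.75):
    """Column-weighted BM25 score from a matchinfo(tbl, 'pcxnal') blob."""
    ints = struct.unpack("=%dI" % (len(blob) // 4), blob)
    phrases, cols = ints[0], ints[1]       # 'p' and 'c'
    x_off = 2                              # 'x': 3 ints per phrase/column
    n_off = x_off + 3 * phrases * cols     # 'n': total rows in the table
    a_off = n_off + 1                      # 'a': average tokens per column
    l_off = a_off + cols                   # 'l': tokens per column, this row
    total_docs = ints[n_off]
    score = 0.0
    for p in range(phrases):
        for c in range(cols):
            # Missing weights default to 1.0 (an assumption of this sketch).
            weight = weights[c] if c < len(weights) else 1.0
            base = x_off + 3 * (p * cols + c)
            hits_here = ints[base]          # phrase hits in this row/column
            docs_with_hits = ints[base + 2]  # rows with a hit in this column
            if hits_here == 0:
                continue
            idf = math.log(
                1 + (total_docs - docs_with_hits + 0.5) / (docs_with_hits + 0.5)
            )
            avg_len = ints[a_off + c] or 1
            norm = 1 - b + b * ints[l_off + c] / avg_len
            score += weight * idf * hits_here * (k1 + 1) / (hits_here + k1 * norm)
    return score
```

With Python's built-in `sqlite3` module, something like `conn.create_function("bm25f", -1, bm25_from_matchinfo)` would expose this to SQL, provided the SQLite build includes FTS4. Higher scores mean better matches, which fits the `ORDER BY rank DESC` in the query above; the weights 10 and 1 favour title matches over body matches.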