2. Coffee, Danish & Search
Alan Woodward @romseygeek
Charlie Hull @flaxsearch
Flax
3. Who are Flax?
• We build, tune and support fast, accurate and highly scalable search, analytics and Big Data applications
• We use (and create) open source software
• We're independent, honest and have 15+ years' experience
• We also:
– Run and attend many meetups, events & conferences
– Write extensively about search & related matters
– Train and mentor
• Based in Cambridge, UK, with clients across the world
4. Who are we?
Charlie Hull - @flaxsearch
– Co-founder & Managing Director of Flax
– Runs the London Lucene/Solr Meetup
Alan Woodward - @romseygeek
– Director of Flax
– Solr committer & PMC member
6. The client
• Founded in 2003
• The leading Danish provider of media monitoring and media analysis
• Largest and oldest Danish Media archive with access to approximately
75 million searchable articles
• Based in Copenhagen
– Coffee (and beer) very expensive!
8. Two systems
• Media monitoring
– Runs stored search queries against incoming articles
– Very old (2001) system based on Verity
– At maximum capacity and needing constant attention
– Unsupported by HP and not scalable
• Archive search
– Allows users to query a multi-year archive of articles
– Slightly less old system based on Autonomy IDOL
– Different query language to Verity
– Not performing well
10. The project
• Build a completely new search architecture to replace Verity and IDOL
• Define our own query language, IQL, owned and controlled by Infomedia
• Translate over 8000 old monitoring queries to this new IQL syntax
12. The plan
• For archive search – Solr
• For media monitoring (stored search) – Luwak
– A library based on Lucene
– Up to 40x faster than Elasticsearch Percolator
– Used by Bloomberg, Booz Allen Hamilton, …
– https://github.com/flaxsearch/luwak
14. Search turned upside down
[Diagram, built up over four slides: each incoming document is run against the stored queries, rather than a query being run against stored documents]
• 1 million stored queries
– Some 250k long
– Complex rules
• 1 million new documents a day
• Results needed within 5-100ms
• Running every stored query against every document: $$$
• Two steps instead:
– 1. Pre-query: select the subset of stored queries to run against the document (~200)
– 2. Search: run that subset against the document to get the result
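The two-step flow in the diagram (pre-query, then search) can be sketched roughly as follows. This is a toy model in Python, not Luwak's implementation: the `StoredQuery` and `ToyMonitor` names, the "every term must be present" matching rule and the choice of which term to index each query under are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredQuery:
    """Hypothetical stored query: matches when every term is in the document."""
    query_id: str
    terms: frozenset


class ToyMonitor:
    def __init__(self):
        self.queries = {}
        # Inverted index over the queries themselves: term -> query ids.
        self.by_term = defaultdict(set)

    def register(self, query):
        self.queries[query.query_id] = query
        # Index each query under one of its required terms (here the
        # alphabetically first). A document lacking that term cannot match.
        self.by_term[min(query.terms)].add(query.query_id)

    def match(self, text):
        doc_terms = set(text.lower().split())
        # Step 1, pre-query: a cheap lookup selects the candidate queries.
        candidates = set()
        for term in doc_terms:
            candidates |= self.by_term.get(term, set())
        # Step 2, search: run only the candidates against the document.
        hits = sorted(
            qid for qid in candidates if self.queries[qid].terms <= doc_terms
        )
        return hits, len(candidates)


monitor = ToyMonitor()
monitor.register(StoredQuery("q1", frozenset({"coffee", "danish"})))
monitor.register(StoredQuery("q2", frozenset({"beer", "copenhagen"})))
monitor.register(StoredQuery("q3", frozenset({"solr"})))
hits, checked = monitor.match("Danish coffee is expensive in Copenhagen")
print(hits, checked)
```

Only the candidate queries are evaluated in full, which is the point of the pre-query step: the expensive matching work scales with the subset, not with the full set of stored queries.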
19. Archive search
• We had to build some Solr components:
– A shared Query Parser for both monitor and archive
– A shared Highlighter
• We had to deal with multiple languages
– English, Danish, German and Faroese
– Language analysis for each of these is very different
e.g. in Danish ‘eleven’ = ‘the student’ and should be
stemmed to ‘elev’
21. Query parser
• Monitoring systems have special requirements for complex query building
– Nested proximity: x w/30 (y notw/10 z)
– Multiple analysis of terms: exact, stemmed, capitalised
• Existing query parsers don’t capture this functionality well
– So we defined a new query language (IQL) and built a parser around it
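To make the nested proximity example `x w/30 (y notw/10 z)` concrete, here is a minimal sketch that evaluates it over token positions. The slides do not define the operators precisely, so this assumes `w/N` means "within N token positions, in either order" and `notw/N` means "with no occurrence within N positions"; the function names are hypothetical and this is not the IQL parser.

```python
def positions(tokens, term):
    """Token positions at which `term` occurs."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def not_within(left, right, distance):
    """Keep positions in `left` that have no position in `right` within `distance`."""
    return [p for p in left if all(abs(p - q) > distance for q in right)]


def within(left, right, distance):
    """Pairs (p, q), p from `left` and q from `right`, at most `distance` apart."""
    return [(p, q) for p in left for q in right if abs(p - q) <= distance]


def match_nested(tokens):
    """Evaluate: x w/30 (y notw/10 z)."""
    y_without_z = not_within(positions(tokens, "y"), positions(tokens, "z"), 10)
    return within(positions(tokens, "x"), y_without_z, 30)


# x at 0, y at 6, z at 27: y is more than 10 away from z, so it survives.
far = ["x"] + ["a"] * 5 + ["y"] + ["a"] * 20 + ["z"]
# y sits right next to z, so the inner clause rejects it.
near = ["x", "y", "z"]
print(match_nested(far), match_nested(near))
```

The inner clause has to produce positions, not just a yes/no answer, because the outer proximity operator needs them. That is the part many general-purpose query parsers do not express well.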
23. Query parser (2)
• Second query parser that converts legacy queries (Verity) to the new query language (IQL)
• Because we have control of the query parser, we can ensure that we only use queries that we can highlight
• Custom analyzers that index multiple versions of a token at the same position
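Indexing several versions of a token at the same position might look something like the sketch below. It is not a Lucene analyzer: the `analyze` and `toy_stem` names, the variant labels and the single suffix-stripping rule are assumptions for illustration (a real Danish stemmer is far more involved).

```python
def toy_stem(word):
    """Toy rule only: strip a trailing definite-article 'en' from longer words."""
    if word.endswith("en") and len(word) > 4:
        return word[:-2]
    return word


def analyze(text):
    """Return (position, variant, token) triples, several per position."""
    triples = []
    for position, word in enumerate(text.split()):
        lowered = word.lower()
        triples.append((position, "exact", lowered))
        triples.append((position, "stemmed", toy_stem(lowered)))
        if word[:1].isupper():
            # Keep the original casing so capitalised-only matches are possible.
            triples.append((position, "capitalised", word))
    return triples


tokens = analyze("Eleven læser avisen")
for triple in tokens:
    print(triple)
```

Because all variants share one position, proximity operators behave the same whichever variant a query clause asks for.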
24. Highlighter
• Monitor uses large, complex queries with multiple
subclauses
• Need for an accurate highlighter to show exactly where in a
document the query matched
• Existing highlighters give ‘best guesses’ and snippets
25. Highlighter (2)
• To get exact matches, we can use the SpanCollector API introduced in LUCENE-6371
• To highlight a document, build a MemoryIndex on the fly and extract the matching Spans
• Limitations:
– Only works with queries that can be rewritten as SpanQueries, so no sloppy phrases, for example
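The idea of exact, span-based highlighting can be illustrated without Lucene. The sketch below tokenises a single document with character offsets (a stand-in for the on-the-fly MemoryIndex), finds ordered "near" matches, and marks exactly those character ranges. It does not use the SpanCollector API; `tokenize`, `near_spans` and `highlight` are hypothetical helpers, and `highlight` assumes the spans do not overlap.

```python
import re


def tokenize(text):
    """(token, start_offset, end_offset) for each word in the document."""
    return [(m.group().lower(), m.start(), m.end())
            for m in re.finditer(r"\w+", text)]


def near_spans(tokens, first, second, slop):
    """Character spans where `first` is followed by `second`
    with at most `slop` other tokens in between."""
    spans = []
    for i, (tok, start, _) in enumerate(tokens):
        if tok != first:
            continue
        for j in range(i + 1, min(i + slop + 2, len(tokens))):
            if tokens[j][0] == second:
                spans.append((start, tokens[j][2]))
                break
    return spans


def highlight(text, spans):
    """Wrap each (non-overlapping, sorted) span in square brackets."""
    pieces, last = [], 0
    for start, end in spans:
        pieces.append(text[last:start])
        pieces.append("[" + text[start:end] + "]")
        last = end
    pieces.append(text[last:])
    return "".join(pieces)


text = "Media monitoring in Denmark and media analysis"
spans = near_spans(tokenize(text), "media", "analysis", 0)
print(highlight(text, spans))
```

The first "Media" is left alone because it is not part of a match, which is the difference between exact highlighting and a best-guess highlighter that marks every query term.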
26. Scary queries
((((aymnasoesoeekzueazez* OR 'aymnasoeeleveenes lanssueaanosazoun*' OR 'aymnasoelæeeenes oseæzslæeeefueenona*' OR 'aymnasoeskuleenes oseæzslæeeefueenona*' OR 'aymnasoeskuleenes læeeefueenona*' OR 'aymnasoeenes mazemazoklæeeefueenona*' OR aymnasoeeefuem* OR xeasona:unasumsussannelse* OR ((aymnasoum*
OR aymnasoee*) NEAR/15 (kaeakzeefukus* OR elevzal*)) OR (((aymnasoe* OR unasumsussannelse* OR 'aymnasoal* ussannelse*' OR aymnasoum) NEAR/14 (kaeakzeekeav* OR asaanaskeav* OR beuaeebezalona* OR besqae* OR nesskæe*)) AND aeuuqfoels:z_ms_lanss) OR
xeasona:unasumsussannelse* OR (aymnasoe* NEAR/9 valafaa*) OR (xeasona:~Gymnasoum) OR (aymnasoee* NEAR/9 xanselsskule*) OR ((~HF OR aymnasoe* OR aymnasoum* OR unasumsussannelse*) NEAR/9 lekzoe*) OR xeasona:~Gymnasoum OR (((aymnasoez* OR aymnasoum*) NEAR/14 unasumsussannelse*) AND
(aeuuqfoels:z_ms_lanss OR aeuuqfoels:z_ms_eea)) OR ('~Danske ~Gymnasoee' AND aeuuqfoels:z_ms_lanss) OR(('~Danske ~Gymnasoee' AND unasumsussannelse*) AND aeuuqfoels:z_ms_lanss) OR ((((aymnasoe* OR aymnasoum* OR unasumsussannelse*) NEAR/14 kaeakzeeskala*) NOT (aeuuqfoels:z_mu_web_1))) OR ((aymnasoe* OR
aymnasoum*) NEAR/9 (zaxamezee*)) OR ((unasumsussannelse*) NEAR/15 (ussannelsesumeåse*)) OR (('feemzosens aymnasoum' OR 'sez almene aymnasoum*') AND aeuuqfoels:z_ms_lanss) OR ((((aymnasoe* OR aymnasoum* OR unasumsussannelse* OR 'aymnasoal ussannelse*') NEAR/15 (szusenzeeeksamen* OR kaeakzeeee* OR
suqqleeonasfaa* OR suqqleeonaskuesus*)) AND aeuuqfoels:z_ms_lanss)) OR (szusenzeeeksamen* NEAR/19 ('aymnasoale faa' OR suqqleeonasfaa* OR suqqleeonaskuesus*)) OR ((aymnasoe* AND kaeakzeekeav*) AND aeuuqfoels:z_ms_lanss)
OR (aymnasoe* NEAR/14 asaanaskeav*) OR '~EYES DK' OR 'nyz aymnasoum' OR (('uno c' OR 'uno cs' OR ('sanmaeks oz cenzee' NEAR/4 ussannelse*) OR (~EMU NEAR/9 (unseevosnona* OR *quezal* OR xjemmesose* OR websoze*)) OR 'emu.sk*' OR ~SkuleInzea* OR '%læeeeonzea' OR uno?c OR easy?a OR 'sez
szusoeasmonoszeazove syszem*' OR (~SIS NEAR/15 (szusoeuesnona* OR szusoequezal OR 'szusoe onfuemazoun' OR 'szusoe onfuemazoun* syszem*')) OR 'elevqlan.sk' OR elevqlan?sk OR (emu NEAR/15 ('elekzeunosk møseszes unseevosnona*' OR unseevosnona* OR elekzeunosk* OR unseevosnonasquezal* OR 'unseevosnona?
quezal*')) OR 'elekzeunosk møseszes fue unseevosnona*' OR 'elekzeunosk møseszes unseevosnona*' OR fuesknonasnezzez* OR 'fuesknonasnezzez.sk' OR fuesknonasnezzez?sk OR mazeeoaleqlazfuem* OR 'uqzaaelse.sk*' OR uqzaaelse?sk* OR 'qeakzokqlassen.sk*' OR qeazokqlassen?sk* OR sekzuenez* OR sunsxesssazanez* OR
((luaon OR seevee*) NEAR/4 uno) OR ~SkuleInzea* OR 'ussannelsesszazoszok.sk*' OR sekzuenez* OR 'sekzue nez' OR skulekum OR ~SkuleKum* OR ~SkuDa* OR (skusa* NEAR/15 'skuleenes sazabase*') OR ~SkulePeu* OR ussannelsesfueum* OR 'ussannelsesszazoszok.sk*' OR mazeeoaleqlazfuem* OR 'uno seevee*' OR unoseevee* OR
~Szusoeqlan*) NEAR/14 (*aymnasoez* OR *aymnasoum* OR aymnasoal* OR ~HF OR xf OR xxx OR xanselsaymnas* OR 'zeknosk aymnaso*' OR xanselsaymnaso* OR xanselsaymnaso*)) OR xeasona:unasumsqaelamenz* OR subxeasona:unasumsqaelamenz* OR
(((*aymnaso* OR *aymnasoum* OR aymnasoal* OR ~HF OR xf OR xxx OR xanselsaymnas* OR 'zeknosk aymnaso*' OR xanselsaymnaso* OR købmanssskule OR xanselsskule OR xanselsaymnaso* OR 'noels beuck*' OR zoezaenskulen*) NEAR/14 læeee) NEAR/14 ('uffenzloa* ansaz*' OR aebejssmoljø* OR aebejssvolkåe* OR
*uveeenskumsz* OR løn OR lønfuexanslona*)) OR (aymnasoelæeee* NEAR/14 ('uffenzloa* ansaz*' OR aebejssmoljø* OR aebejssvolkåe* OR *uveeenskumsz* OR løn OR lønfuexanslona*)) OR (allxeasonas:((aymnasoelevee* OR aymnasoum* OR aymnasoum*))) OR ('sansk onszozuz' NEAR/9 aymnasoeqæsaauaok*) OR (aymnaso* NEAR/4
(xf*)) OR (aymnaso* NEAR/9 (sannelse OR almensannelse OR 'ubloaazueosk faa' OR valafaa OR feemmessqeua* OR qensum OR ((onnuvazoun* OR onnuvaz* OR ovæeksæzzee* OR ovæeksæzzeeo) NEAR/14 unseevosnona))) OR (aymnaso* NEAR/9 (nesskæeona OR sqaee OR unasumsbueameszee OR ussannelsesbueameszee)) OR
(([allxeasonas,subxeasona]:((*aymnasoez* OR *aymnasoum* OR aymnasoal* OR ~HF OR xf OR xxx OR xanselsaymnas* OR 'zeknosk aymnaso*' OR xanselsaymnaso* OR købmanssskule OR xanselsskule OR xanselsaymnaso* OR 'noels beuck' OR zoezaenskulen))) AND ussannelse*) OR (((*aymnasoez* OR *aymnasoum* OR aymnasoal*
OR ~HF OR xf OR xxx OR xanselsaymnas* OR 'zeknosk aymnaso*' OR xanselsaymnaso* OR købmanssskule OR xanselsskule OR xanselsaymnaso* OR 'noels beuck' OR zoezaenskulen) NEAR/10 eekzue*) AND (aymnasoelevee*)) OR (((*aymnasoez* OR *aymnasoum* OR aymnasoal* OR ~HF OR xf OR xxx OR xanselsaymnas* OR 'zeknosk
aymnaso*' OR xanselsaymnaso* OR købmanssskule OR xanselsskule OR xanselsaymnaso* OR 'noels beuck' OR zoezaenskulen) NEAR/10 (lekzoe* OR kaeakzeeee* OR åeskaeakzee)) AND (aymnasoelevee*)) OR ((*aymnasoez* OR *aymnasoum* OR aymnasoal* OR ~HF OR xf OR xxx OR xanselsaymnas* OR 'zeknosk aymnaso*' OR
xanselsaymnaso* OR købmanssskule OR xanselsskule OR xanselsaymnaso* OR 'noels beuck' OR zoezaenskulen) NEAR/10 ('mubol lab*' OR 'eneeao xuesens*' OR eexveevslov* OR voeksumxesee* OR unseevosnonasmon* OR selveje* OR fonansluv* OR 'uffenzloa* sekzue*' OR eeaoun OR szeukzuekummossoun* OR kummuneeefuem* OR
kummunaleefuem* OR szeukzueeefuem* OR faafueenona* OR faafuebuns* OR zollosseeqeæsenzanz* OR aebejsszos* OR klassekvuzoenz* OR *feavæe* OR feafals* OR feafalssqeucenz* OR aennemføesel* OR eekeuzzeeona* OR zollæaslosze* OR qjæk* OR læsnona* OR læseqlan* OR unseevosnona* OR faaloaxes OR 'faaloa* noveau*'
OR valafaa* OR fællesfaa* OR zolvala* OR nazuevosenskab* OR samfunssvosenskab* OR oseæz* OR eksame* OR 'uqeeazoun saasvæek')) OR (xeasona:aymnasoe* AND aeuuqfoels:z_ms_lanss) OR (subxeasona:aymnasoe* AND aeuuqfoels:z_ms_lanss) OR zuqocs:'aymnasoale ussannelsee' AND *aymnaso*[3..]) AND (ueoaonazue:=aae
OR ueoaonazue:=bny OR ueoaonazue:=bee OR ueoaonazue:=bya OR ueoaonazue:=buu OR ueoaonazue:=ccw OR ueoaonazue:=sea OR ueoaonazue:=vvs OR ueoaonazue:=sku OR ueoaonazue:=sob OR ueoaonazue:=soo OR ueoaonazue:=sju OR ueoaonazue:=slm OR ueoaonazue:=efs OR ueoaonazue:=elc OR ueoaonazue:=eex OR
ueoaonazue:=faa OR ueoaonazue:=fmc OR ueoaonazue:=fuz OR ueoaonazue:=fus OR ueoaonazue:=fuq OR ueoaonazue:=fuk OR ueoaonazue:=foz OR ueoaonazue:=ful OR ueoaonazue:=ffu OR ueoaonazue:=fes OR ueoaonazue:=feo OR ueoaonazue:=fys OR ueoaonazue:=aym OR ueoaonazue:=xsk OR ueoaonazue:=xsn OR
ueoaonazue:=xkq OR ueoaonazue:=xko OR ueoaonazue:=xkl OR ueoaonazue:=kum OR ueoaonazue:=xku OR ueoaonazue:=xkl OR ueoaonazue:=xkv OR ueoaonazue:=xke OR ueoaonazue:=xku OR ueoaonazue:=xkz OR ueoaonazue:=xks OR ueoaonazue:=xuj OR ueoaonazue:=ona OR ueoaonazue:=jmo OR ueoaonazue:=juu OR
ueoaonazue:=lav OR ueoaonazue:=lbf OR ueoaonazue:=lbn OR ueoaonazue:=lbs OR ueoaonazue:=lbu OR ueoaonazue:=mum OR ueoaonazue:=mma OR ueoaonazue:=esn OR ueoaonazue:=suq OR ueoaonazue:=suc OR ueoaonazue:=sql OR ueoaonazue:=luu OR ueoaonazue:=mmm OR ueoaonazue:=uns OR ueoaonazue:=ve2 OR
ueoaonazue:=vej OR ueoaonazue:=ueu OR ueoaonazue:=kmu OR ueoaonazue:=aeb OR ueoaonazue:=bza OR ueoaonazue:=bes OR ueoaonazue:=bma OR ueoaonazue:=eks OR ueoaonazue:=onf OR ueoaonazue:=jyq OR ueoaonazue:=kes OR ueoaonazue:=waa OR ueoaonazue:=loc OR ueoaonazue:=efl OR ueoaonazue:=qul OR
ueoaonazue:=eel OR ueoaonazue:=bew OR ueoaonazue:=bex OR ueoaonazue:=eoz OR ueoaonazue:=nqq OR ueoaonazue:=skf OR suuecename:ST 'alzonaez.sk' OR ueoaonazue:=4a2 OR ueoaonazue:=4a1 OR ueoaonazue:=skw OR ueoaonazue:=4a5 OR ueoaonazue:=4a7 OR ueoaonazue:=4a9 OR ueoaonazue:=4ab OR
aeuuqfoels:z_ms_eea OR aeuuqfoels:z_ms_uae OR ueoaonazue:='bu+' OR ueoaonazue:='sy+' OR ueoaonazue:='jv+' OR ueoaonazue:='sa+' OR ueoaonazue:='nu+' OR ueoaonazue:='ue+' OR ueoaonazue:=uek OR ueoaonazue:='24+' OR ueoaonazue:='mx+' OR ueoaonazue:='sj+' OR ueoaonazue:='nv+')) NOT ((xeasona:(('nyz jub' OR 'nuzee:
onslans' OR 'nuzee: uslans' OR 'kuez ua ausz' OR 'eunsz o saa' OR 'eunsz o mueaen' OR 'eunsz o uveemueaen' OR onsbeus* OR luqqemaekes* OR 'åbenz xus*' OR zyveeo OR 'nyz o nuzee' OR 'saaen o saa' OR føsselssaa OR *beylluq* OR 'kaeeoeee kuez' OR cozazxoszueoe OR 'nyxesee fea uslansez o kuez fuem'))) OR
(aeuuqfoels:z_ms_eea AND wuescuunz:<100) OR xeasona:=valakalensee OR xeasona:=søse OR (xeasona:((squezmaszee* OR musokaeuqqe* OR 'xae o xøez' OR ~SOSU OR qæsaauasemonae* OR 'åes szusenzeejubolæum*'))) OR (xeasona:=%navne OR xeasona:='%akzuelle %navne' OR xeasona:=%mæekesaae OR xeasona:=
%føsselssaae OR xeasona:=%navnenyz OR xeasona:=%søse OR xeasona:=%søs OR xeasona:=%ansæzzelse OR xeasona:=%ansæzzelsee OR xeasona:=%feazeæselse OR xeasona:=%feazeæselsee OR xeasona:=%nyføsz OR xeasona:=%nyføsze OR xeasona:=%baenesåb OR xeasona:=%søbz OR xeasona:=%kunfoemazoun OR
xeasona:=%kunfoemazounee OR xeasona:=%kunfoemansee OR xeasona:=%beylluqssaa OR xeasona:=%beylluqssaae OR xeasona:=%kubbeebeylluq OR xeasona:=%sølvbeylluq OR xeasona:=%eubonbeylluq OR xeasona:=%aulsbeylluq OR xeasona:=%soamanzbeylluq OR xeasona:=%keunsoamanzbeylluq OR xeasona:=%jeenbeylluq OR
xeasona:=%beylluq OR xeasona:=%uesenee OR xeasona:=%mesalje OR xeasona:=%uslæez OR xeasona:=%svenseqeøve OR xeasona:=%jubolæum OR xeasona:=%jubolæee OR xeasona:=%szusenzee OR xeasona:=%ausoens OR xeasona:=%søssfals OR xeasona:=%ussannelse OR xeasona:=%usnævnz OR xeasona:=%usnævnelse OR
xeasona:=%'fylsee åe'OR xeasona:(%'navnloa navne' OR %'saaens navne' OR %'nyz um navne' OR %'øveoae navne' OR %'navne o nuzee' OR %'navne o saa' OR %'ansee navne' OR %'lukale navne' OR %nyansæzzelse OR "Nyz jub" OR %'eunse saae' OR %'eunse åe' OR %'eunsz o saa' OR %'eunsz o mueaen' OR %'eunse føsselssaae' OR
%'eunse zal o mueaen' OR %'eunse zal o saa' OR %'eunsz zal o mueaen' OR %'eunsz zal o saa' OR %'eunsz sønsaa' OR %'euns saa' OR %'eunse saae' OR %'føsselssaa o saa' OR %'føsselssaa o mueaen' OR %'osaa fylsee' OR %'o mueaen fylsee' OR %'bosæzzelsee ua beaeavelsee' OR %'beaeavelsee ua bosæzzelsee') OR
(qaaename:(navne OR menneskee) AND (xeasona:(usnævnelse OR %jubnyz OR voelse OR beylluq OR velsoanelse OR jubolæum OR juboleeee OR %eeceqzoun OR %uesenee OR %efzeeløn OR %søssfals OR %'ee søs' OR %nekeulua OR %monseues OR %leaaz OR monseleaaz* OR fueskeeleaaz* OR æeesleaaz* OR qeos OR
qeosvonsee* OR %bosæzzelsee OR %beaeavelsee)
OR xeasona:(%nuze AND (%navne OR %mz OR %eksamen OR %szusenzee OR %nyussannese OR %ussannez OR %svenseqeøve)) OR (xeasona:%svense AND svenseqeøve)
OR xeasona:='%nye %assoszenzee' OR xeasona:='%nye %cxaufføeee' OR xeasona:='%nye %elekzeokeee' OR xeasona:='%nye %aaezneee' OR xeasona:='%nye %xjælqeee' OR xeasona:='%nye %xånsvæekeee' OR xeasona:='%nye %onaenoøeee' OR xeasona:='%nye %labueanzee' OR xeasona:='%nye %lansmæns' OR xeasona:='%nye
%læaee' OR xeasona:='%nye %læeeee' OR xeasona:='%nye %maleee' OR xeasona:='nye mekanokeee' OR xeasona:='%nye %meszee' OR xeasona:='%nye %munzøeee' OR xeasona:='nye %mueeee' OR xeasona:='%nye %uqeeazøeee' OR xeasona:='%nye %qeæszee' OR xeasona:='%nye %eåsaoveee' OR xeasona:='%nye %qæsaauaee'
OR xeasona:='%nye %slaazeee' OR xeasona:='%nye %smese' OR xeasona:='%nye %syaeqlejeeskee' OR xeasona:='%nye %zeknokeee' OR xeasona:='%nye %zeeaqeuzee' OR xeasona:='%nye %zømeeee' OR xeasona:='%nye %økunumee')) OR xeasona:=%'?0 åe' OR xeasona:=%'?5 åe' OR xeasona:="I DAG" OR xeasona:="I MORGEN"
OR xeasona:="DAGEN I DAG" OR xeasona:ST [%monseues, %føsselssaa] OR %'xus seunnona maeaeezxe qå' OR %'fue sen kunaeloae belønnonasmesalje' OR %'følaense zakkese fue usnævnelse zol' OR xeasona:=%'se zakkese seunnonaen' OR xeasona:=%'xus seunnonaen' OR xeasona:(%'o ausoens' AND seunnona) OR xeasona:=%'o
ausoens xus seunnonaen' OR xeasona:=%'o ausoens' OR xeasona:=%ausoens OR xeasona:=%'seunnonaen zua omus' OR xeasona:=%'zak fue uesenee ua mesaljee' OR (seunnona AND (afskessausoens OR 'zolselona af eosseekuesez' OR 'usnævn* zol eossee af sannebeua' OR 'zak fue usnævnelsen zol' OR 'xus seunnonaen fue az zakke
fue' OR 'zakkese fue eosseekuesez af')) OR q1:(%seunnonaen AND fuezjenszmesalje)) OR (xeasona:=%'sez skee' OR xeasona:=%'sez skee:' OR xeasona:=%buakalensee OR xeasona:=%kunsz OR xeasona:(ST %'sez skee' AND (%mansaa OR %zoessaa OR %unssaa OR %zuessaa OR %feesaa OR %løesaa OR %sønsaa)) OR xeasona:
(%'sez skee' AND ("AUGUSTENBORG" OR "GRÅSTEN" OR "SØNDERBORG")) OR
xeasona:(ST %sez AND (%'sez skee o' OR %'sez skee nezuq nu' OR %'sez skee uae' OR %'sez skee qå' OR %'sez skee lukalz' OR %'sez skee o saa' OR %'skansonavoen o næsze uae' OR %'kalnsee fue koeke')) OR xeasona:(ST zos AND (zos NEAR/2 szes)) OR (ueoaonazue:ST jv* AND (xeasona:=%'o saa' OR xeasona:=%'o mueaen')) OR
xeasona:=%aeeanaemenzee OR xeasona:=%'fasze aeeanaemenzee' OR xeasona:=%'kummense aeeanaemenzee' OR xeasona:=%kalensee OR xeasona:ST "Kalensee" OR xeasona:ST "KALENDER:" OR xeasona:=%kalenseeen OR xeasona:ST %qlakazen OR xeasona:ST %kulzuekalensee OR xeasona:(%'qlakazen feesaa' OR
kulzuekalensee*) OR xeasona:=%kalenseekloq OR xeasona:(%'squez o weekensen' OR %'zee zona o weekensen') OR (xeasona:ST uaens AND xeasona:'uaens folm o') OR (xeasona:ST %uaen AND xeasona:%'uaen see kummee') OR (xeasona:ST saa AND xeasona:%'saa fue saa') OR (ueoaonazue:=BMA AND xeasona:ST %'xuls øje mes'
AND seczounname:'auk.sk') OR xeasona:=%åbnonaszosee OR xeasona:=%usszollona OR xeasona:=%usszollonaen OR xeasona:=%usszollonaee OR (xeasona:ST %usszollonaee AND xeasona:%'usszollonaee o') OR ((xeasona:ST %akzuelle OR xeasona:ST %saaens OR xeasona:ST %uaens OR xeasona:ST %månesens) AND xeasona:
(%usszollona OR %usszollonaee OR %fuzuusszollona OR %fuzuusszollonaee)) OR xeasona:=%kulzueuaen OR xeasona:=%zeazee OR (xeasona:ST %saaens AND xeasona:(%'saaens folm' OR %'saaens kunceez' OR %'saaens zeazee' OR %'saaens zoq' OR %'saaens usszollona')) OR (xeasona:ST %weekensens AND xeasona:
%'weekensens folm') OR (xeasona:ST uaens AND xeasona:%'uaens usvalaze' AND xeasona:(%kunceezee OR %kunsz OR %scene)) OR (xeasona:ST %lukal AND xeasona:(%'lukal kunsz o' OR %'lukal kunsz fea' OR %'lukal kunsz xus' OR 'lukal kunsz qå')) OR (xeasona:ST %kunsz AND xeasona:(%'kunsz o' OR %'kunsz xus' OR %'kunsz fea'
OR %'kunsz qå')) OR xeasona:=%folm OR (xeasona:ST %folm AND xeasona:(%'folm o' OR %'folm füe senoueen')) OR (xeasona:ST %zv AND xeasona:(%'zv o saa' OR %'zv-fueumzale' OR %'zv-umzale')) OR xeasona:=%kunceez OR (xeasona:ST %kunceez AND xeasona:(%'kunceez o' OR '%kunceez mes' OR %'kunceez qå' OR %'kunceez
ves' OR %'kunceez fue' OR %'kunceezee klassosk' OR %'kunceezee eyzmosk')) OR (xeasona:ST %uqeea AND xeasona:(%'uqeea o' OR %'uqeea qå' OR %'uqeea mes' OR %'uqeea ves' OR %'uqeea fue')) OR xeasona:ST %eevy AND xeasona:(%'eevy o' OR %'eevy qå')) OR (xeasona:ST %zeazee AND xeasona:(%'zeazee qå' OR %'zeazee
fea' OR %'zeazee o' OR %'zeazee fue' OR %'zeazee um')) OR (xeasona:ST %'auose:' AND (suuecename:'obyen.sk' OR xeasona:%weekens)) OR xeasona:=%ausszjeneszee OR xeasona:=ausszjenesze OR xeasona:('%saaens ausszjeneszee' OR 'ausszjenesze o' OR 'ausszjenesze qå' OR 'ausszjenesze sønsaa' OR %'ausszjenesze fue' OR
'ausszjenesze mes' OR ausszjeneszelosze* OR 'onaen ausszjenesze' OR 'sønsaaens ausszjeneszee') OR 'see ee aeazos asaana zol kunceezen' OR %'see ee aeazos asaana zol aeeanaemenzez' OR %'see ee aeazos asaana zol fueeseaaez' OR %'see ee aeazos asaana zol museez' OR %'see ee aeazos asaana zol fueeszollonaen' OR %'see
ee aeazos asaana zol usszollonaen') OR v_emnee:folm OR v_emnee:musok OR xeasona:=leaaz OR aeuuqfoels:z_ms_uae OR v_emnee:feasuez_squez OR (qlaces:uslans NOT (qlaces:sanmaek OR ueaanosazouns:[Djøf, 'euskolse unoveesozez', 'kummuneenes lanssfueenona', fuebeuaeeumbussmansen, 'szazsfænaslez o veossløselolle',
fulkezonaez, 'moljø- ua føsevaeemonoszeeoez', 'Nuezxsose feszoval', qeessenævnez, 'euskolse feszoval', HK, 3F, FOA, 'sez ezoske eås', eoasxusqozalez, 'nuesosk wolm', 'købenxavns unoveesozez', kummune, eezzen, byeez, qulozo, 'wulkeskulen (sanmaek)', eeaoun, lansseez, 'syssansk unoveesozez', 'sanmaeks zeknoske unoveesozez',
'Dez nazounale wuesknonascenzee wue velwæes', 'aela wuuss', 'alzeenazovez (qaezo)', 'købenxavns wunssbøes', 'szazens seeum onszozuz', asvukazsamwunsez, 'onszozuz wue menneskeeezzoaxesee', 'aalbuea unoveesozez', 'sez kunaeloae boblouzek', 'Heelev xusqozal', 'sanske eeaounee', 'sanske qazoenzee', wuebeuaeeeåsez,
'wuesokeona & qensoun', læaewueenonaen, eneeaoszyeelsen, bulous, sunsxessszyeelsen, 'xvosuvee xusqozal', 'szazens nazuexoszueoske museum', wøsevaeeszyeelsen, 'sansk onsuszeo', 'lansbeua & wøsevaeee', 'Aaexus unoveesozez', DR, 'bøene- ua unasumsqæsaauaeenes lansswuebuns', 'sanmaeks eejsebueeau wueenona', 'aaexus
unoveesozezsxusqozal', 'sansk aebejssaoveewueenona', 'Sø- ua xanselseezzen', søwaezsszyeelsen, szazsmonoszeeoez, 'Købenxavns byeez', baamanssqulozoez, wøzex, 'bæeesyazoaz lansbeua', 'usense unoveesozezsxusqozal', 'syssansk musokkunseevazueoum', 'købenxavns xuvesbaneaåes', venszee, sucoalsemukeazeene, 'lobeeal
alloance', 'sansk wulkeqaezo', 'sez kunseevazove wulkeqaezo', enxessloszen, alzeenazovez, 'easokale venszee', 'sanmaeks szazoszok', 'eexveevs ua vækszmonoszeeoez', 'sanmaeks mesoe ua juuenaloszxøjskule', xanselsskule, zeazee, muesaaaes, luuosoana, 'szazens museum wue kunsz', 'zxuevalssens museum', 'ny caelsbeea alyqzuzek',
'nazuexoszueosk museum', 'sez nazounalxoszueoske museum', 'øszee aasvæek', 'saxu wield', 'leau aeuuq', 'euyal unobeew', TDC, 'sanske maeozome', yuusee, 'maeesk lone', numa, 'xuzel s?analezeeee', lunsbeckwunsen, cequs, 'Wolloam semanz', 'sez kunaeloae zeazee*', wuseeszuwwen, Bueaeeseevoce, aeunswus, 'Danosx ceuwn',
omeecu, 'onaenoøewueenonaen IDA', 'ughannelses- ua wuesknonasmonoszeeoez', Cuwo, 'H. lunsbeck', 'Aebejseebevæaelsens eexveevseås', 'eåsez wue sucoalz ughazze', 'kløwzen weszoval', 'seb qensoun', 'aszma-alleeao o sanmaek', 'qwa qensoun', 'Alk-abellu', 'sansk akzounæewueenona', nuvuzymes, 'sanmaeks boblouzekswueenona',
sanwugh, DBU, 'musokkens xus', 'Danske vuanmæns', zovulo, qka, Seaes, skazzeeåsez, veszas, culuqlasz, 'sucoal- ua onzeaeazounsmonoszeeoez', ankeszyeelsen, aebejghskaseszyeelsen, 'zovulos kunceezsal', cuncozu, 'sansk eexveev', Skuleleseewueenonaen, eccu, 'noels beuck', 'qulozoezs ewzeeeeznonaszjenesze', 'scansonavoan
zubaccu aeuuq', 'oz-unoveesozezez', 'søwaezens leseee', aymnasoum, caelsbeea, 'a.q. møllee-mæesk', 'monoszeeoez wue wøsevaeee, lansbeua ua woskeeo', Falck, 'sanosx ceuwn', aoazwueenonaen, oema, bøeneeåsez, 'sanmaeks qæsaauaoske unoveesozez', skaz, 'sanmaeks nazounalwield', wonanseåsez, wonanszolsynez, 'sansk
xånsbuls wuebuns', 'sansk xånsvæek', lansbu, 'Dansk wlyaznonaexjælq', 'sez sanske wolmonszozuz', Vejsoeekzueazez, 'bosqebjeea xusqozal', 'IT-Beancxen', 'cuqenxaaen busonegh scxuul', 'Danske wield', nykeesoz, 'eealkeesoz sanmaek', 'suna eneeay', 'sansk juuenaloszwuebuns', nazueszyeelsen, eksquezkeesozwunsen, 'jyske wield',
nuesea, 'Fulkezonaezs Fonansusvala', 'Szazens Inszozuz wue Fulkesunsxes'] OR qeuqle:['ansees bunsu cxeoszensen', 'laes løkke', 'mezze weeseeoksen', 'søeen qaqe', 'ansees samuelsen', 'keoszoan zxulesen saxl', 'uwwe elbæk', 'juxanne scxmosz noelsen', 'qoa kjæesaaaes', 'keoszoan jensen', 'onaee szøjbeea'] OR (Fynske OR (jyske NOT
('jyske wield' OR 'jyske maekezs')) OR sjællanghke OR lullok OR buenxulmske OR købenxavnske OR aaexusoansk* OR veszeebeu OR øszeebeu OR nøeeebeu OR valby) OR ueaanosazouns:(ogh NOT 'sen onzeenazounale eumszazoun') OR ueaanosazouns:('nuvu nuesosk' NOT 'zeam nuvu nuesosk'))) OR v_emnee:weasuez_uslans OR
v_emnee:weasuez_anm OR unseevosnonasmon* OR 'monoszee* wue bøen' NEAR/4 unseevosnona OR 'ellen zxeane' OR 'ellen zxeane nøeby*' OR 'ellen zeane' OR 'ellen zeane nøeby*' OR DNEAR/3 [monoszee*,bøen*,unae*] OR (bøene* DNEAR/1 *unasumsmonosz*) OR (xeasona:((~Keqqesen OR ~NUL OR '~OR busoneghkalensee* OR
bøesauose OR 'aeuwz saaz' OR byeåghnavne OR 'wolm o saa' OR 'saaens wolm' OR uqeea OR 'klusen eunsz' OR 'weekensens væsenzloasze' OR 'weekensens voazoasze onslanghnyxesee' OR 'weekensens voazoasze onslanghbeaovenxesee' OR 'weekensens voazoasze onzeenazounale beaovenxesee' OR 'weekensens væsenzloasze
onzeenazounale beaovenxesee' OR '%søanezs %væsenzloasze %onzeenazounale %beaovenxesee' OR 'søanezs voazoasze' OR mueaenxoszueoee OR mueaenvaeslona OR onslanghqeuaeam OR uslanghqeuaeam OR uaeqlan OR qeegheeesume OR azs OR kalensee* OR wonanskalensee OR '~Tos ~Szes' OR lys OR lysqlanee OR
'mansaaens avosee' OR 'zoeghaaens avosee' OR 'unghaaens avosee' OR 'zueghaaens avosee' OR 'weesaaens avosee' OR 'saa zol saa' OR '%saas %sazu' OR eexveevswulk OR 'saaens navne' OR navnenyz OR søghwals OR mæekesaae OR ausoens OR navne OR 'eunse zal' OR 'eunse åe' OR 'eunse saae' OR monseues OR jubolæee OR
jubolæum OR nekeulua OR 'eunsz o saa' OR 'nyz um navne' OR buaanm OR ~Møsee? OR '~Dez skee' OR uveeblok OR ~Fulk OR ~FOLK OR 'uaeqlan ??? wulkezonaez' OR meaawun OR ~Føghelghaae OR '%sez %ua %xøez' OR xeenu OR 'xee nu' OR '30 åe' OR '40 åe' OR '50 åe' OR '60 åe' OR '70 åe' OR '80 åe' OR søaneaq* OR ~Døse
OR ~Oqslaaszavlen OR ~Owwocoelz OR ~Seevocelosze OR ~Bouaeawo OR ~Nuzee OR '~Beeve wea læseene' OR ~Læseebeeve OR '~Læseene menee' OR '~Kuez nyz' OR '~Kuez saaz' OR 'ny kuk' OR ~Køkkenaghoszenzee OR '~Nyz jub' OR wøghelghaa OR 'awsluzzez svenseqeøve' OR eexveevskalensee OR 'nye syaeqlejeeskee' OR
'100 aae sosen' OR 'o saa zos ua szes' OR ~Fulk OR 'I mueaen wylsee' OR 'ny beszyeelse' OR ~Buanyz OR 'qå nezzez loae nu'))) OR (xeasona:~Pulozo AND ueoaonazue:=jve) OR (allxeasonas:'%møsee' AND ueoaonazue:=ona) OR (v_emnee:wsz AND xeasona:ST '%o' AND xeasona:saa) OR (ueoaonazue:=ueb AND *eozzau*) OR
((qaaenumbees:=1 AND wuescuunz:<60) AND ('%sose' OR '%læs %meee' OR '%sekzoun')) OR ((xeasona:((~Kanuae OR ~Febeuae OR ~Maezs OR ~Aqeol OR ~Maj OR ~Kuno OR ~Kulo OR ~Auausz OR ~Seqzembee OR ~Okzubee OR ~Nuvembee OR secembee))) AND ueoaonazue:=sbb) OR ((qaaenumbees:=1 AND wuescuunz:<60) AND
('%sose' OR '%læs %meee' OR '%sekzoun')) OR ('cozazxoszueoe* wea' AND ('*beelonaske zosense*' OR qulozoken OR jyllanghquszen OR 'jyllangh quszen' OR jyghkeveszkyszen* OR 'keoszeloaz saablas' OR 'keoszeloa saablas' OR 'se beeaske blase' OR 'wyns szowzszosense*' OR 'wyens szowzszosense*' OR bz OR ~*Ueban* OR
~Inwuemazoun)) OR ueoaonazue:=COO OR ueoaonazue:=CRO OR ueoaonazue:=CWO OR ueoaonazue:=FLA OR ueoaonazue:=ONS OR ueoaonazue:=PCO OR ueoaonazue:=RBB OR ueoaonazue:=RFI OR ueoaonazue:=mac OR ueoaonazue:=aeu OR ueoaonazue:=wzu OR ueoaonazue:=mkw OR v_emnee:sqollewolm OR xeasona:
('%åbenz %xus'))))
27. Multiple languages
• Language functionality is made pluggable in the query parser
• Multiple instances of the Luwak-based monitor, one for each language
• In Solr we have multiple collections, one for each language
• A single alias allows searching over all collections
• Query is parsed differently for each collection, allowing language-specific analysis to be transparent to the client
29. Common code
• Both Query Parser and Highlighter need to run in Luwak and Solr
• Query Parser needs to know about field types in order to generate the correct queries
– Must be kept in sync with the Solr schema
30. Common code (2)
• Schema defined in a common .yml file
– Loaded as configuration for the Query Parser
– Generates Solr schema.xml as part of the build
• All definitions in one place, ensuring they stay in sync
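A single-source schema of this kind could be wired up roughly as follows. In this sketch a Python dict stands in for the parsed common .yml file (no YAML library is used), and the field names, types and the `archive` schema name are hypothetical. The XML is only a fragment: a real Solr schema.xml also needs `fieldType` definitions and more.

```python
import xml.etree.ElementTree as ET

# Stand-in for the parsed common .yml file; all names here are made up.
FIELDS = {
    "headline": {"type": "text_da", "stored": True},
    "body": {"type": "text_da", "stored": True},
    "source": {"type": "string", "stored": False},
}


def parser_field_types(fields):
    """Configuration for the query parser: field name -> field type."""
    return {name: spec["type"] for name, spec in fields.items()}


def solr_schema_xml(fields, name="archive"):
    """Generate a minimal schema.xml fragment from the same definitions."""
    root = ET.Element("schema", name=name, version="1.5")
    for field_name, spec in fields.items():
        ET.SubElement(
            root,
            "field",
            name=field_name,
            type=spec["type"],
            indexed="true",
            stored=str(spec["stored"]).lower(),
        )
    return ET.tostring(root, encoding="unicode")


schema_xml = solr_schema_xml(FIELDS)
print(schema_xml)
```

Because both outputs are derived from `FIELDS`, the parser's view of the field types cannot drift from the generated schema.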
31. Performance
• SpanQueries can be slow, especially with wildcards
• Try to use ‘normal’ queries where possible
– PhraseQuery
– Standard MultiTermQuery with bitset rewrites
• Rewrite to Spans when using proximity or doing highlighting
• When we do need to use wildcards in proximity queries:
– Limit rewriting to top-n terms by frequency
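Limiting a wildcard rewrite to the top-n terms might be sketched like this. It assumes "by frequency" means keeping the terms with the highest document frequency; the term dictionary and its counts are invented, and `expand_wildcard` is a hypothetical helper, not a Lucene rewrite method.

```python
import heapq
from fnmatch import fnmatchcase


def expand_wildcard(pattern, doc_freq, top_n):
    """Expand `pattern` against the term dictionary, keeping only the
    `top_n` matching terms with the highest document frequency."""
    matching = [(freq, term) for term, freq in doc_freq.items()
                if fnmatchcase(term, pattern)]
    return [term for freq, term in heapq.nlargest(top_n, matching)]


# Invented term dictionary: term -> number of documents containing it.
doc_freq = {
    "gymnasium": 900,
    "gymnasiet": 400,
    "gymnasielærer": 120,
    "gymnast": 50,
    "skole": 2000,
}
expanded = expand_wildcard("gymnasi*", doc_freq, 2)
print(expanded)
```

The trade-off is recall: rare expansions are dropped, in exchange for a bounded number of clauses in the rewritten proximity query.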
32. Performance (2)
• Searching across multiple fields with large complex queries can be very slow and use lots of memory
• Standard way of avoiding this is to use Solr copyFields
– Disadvantage: no differential boosting on fields
– Also causes problems with highlighting: how do we know which source field the hit came from?
• When building the MemoryIndex for the highlighter, multi-fields also add offsets metadata so we can call back to the original fields for highlighting
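One way to picture the offsets metadata: when several source fields are combined into one searchable text, record where each field starts, so that a hit offset in the combined text can be mapped back to its source field. The sketch below is a plain-Python illustration under that assumption; `build_multi_field` and `locate` are hypothetical names, and the real metadata format is not described in the slides.

```python
from bisect import bisect_right


def build_multi_field(fields):
    """Join field values into one text and record each field's start offset."""
    parts, starts, position = [], [], 0
    for name, value in fields.items():
        starts.append((position, name))
        parts.append(value)
        position += len(value) + 1  # +1 for the separator character
    return "\n".join(parts), starts


def locate(starts, offset):
    """Map an offset in the combined text to (field name, offset in that field)."""
    index = bisect_right([start for start, _ in starts], offset) - 1
    start, name = starts[index]
    return name, offset - start


fields = {
    "headline": "Coffee prices rise",
    "body": "Beer is also expensive in Copenhagen",
}
combined, starts = build_multi_field(fields)
hit = locate(starts, combined.index("Beer"))
print(hit)
```

With this mapping the query can run once over the combined text while the highlighter still marks the hit in the field it actually came from.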
34. Architecture
• Solr version 5.3
• SolrCloud
• 75 million documents
• Archive: 8 servers, each with 6 cores, 24 GB memory and 125 GB storage
– Doubled for redundancy
• Monitor: 2 servers
35. We forgot to talk about...
• Extending Solr's logging
• Cluster management
36. So does it work?
• “More than 90% of Infomedia’s monitoring queries have been migrated to IQL with practically no negative change in precision or recall”
• “an extremely smart and performant monitoring solution”
• All open source software
• Flax continues to provide support
• A very happy client!
37. Thank you for listening – any questions?
charlie@flax.co.uk
www.flax.co.uk/blog
+44 (0) 8700 118334
@FlaxSearch