3. - Ted Sullivan, PhD
“(old Phuddy Duddy)”
“Senior (very much so I’m afraid)
Solutions (I hope)
Architect (and sometime plumber)”
4. - Ted Sullivan
When is my search app done?
“How do you get there grasshopper? Add semantic
intelligence to the engine!”
5. In his own words...
For the past 15 or so years now I have been building search applications, first with Verity K2 for a project
with a publishing company H.W. Wilson, then with most of the vendor products in the search space,
Ultraseek, Fast, Autonomy, Endeca, Vivisimo, MarkLogic and Exalead. I watched Lucene grow and
develop from an interesting little search engine to a major force in the search technology business. Before
that, I was building collaborative battlefield planning applications for the U.S. Army and before that I was
working on Internet stuff back in the dawn of the Web (well almost - 1994). I have been programming in
Java since 1995 and professionally since 1996 or so. I was learning JavaScript when Netscape was still
developing it, but only recently have begun to truly understand its power! John Resig and Bear Bibeault's
book "Secrets of the JavaScript Ninja" is a must read for anyone that wants to follow this path. Currently, I
am struggling up the AngularJS learning curve.
Before my work in the web with my friend Jim Spatz at Spatz Computer Graphics, I published some Math
games for kids on the original Mac OS, and before that, I did science - Auditory Neuroscience to be more
precise. I studied the auditory system of 'fly-by-night' critters, bats and owls first at Washington University in
St Louis, then at Caltech and Princeton. I was pretty good at Science but didn't like the writing part as
much as I should have. I had much more fun writing code (C, FORTRAN and PDP 8/11 assembler).
Currently, I am enjoying becoming part of the Open Source Revolution working at Lucidworks. Back in
1995 when Linux came out, I had a bet with my boss Jim Spatz about its future - I'm happy to say now that
I lost that bet. I would aspire to be an Open Source evangelist but there are enough of those already. I'll
settle for Solr Evangelist.
7. Random Rants from the
Search Curmudgeon
• https://lucidworks.com/2015/03/09/random-rants-search-curmudgeon/
• Search vs. Information Access
8. Data Science for
Dummies
• https://lucidworks.com/2016/09/06/data-science-for-dummies/
• "A conditional probability is like the probability
that you are a moron if you text while driving
(pretty high it turns out – and would be a good
source of Darwin awards except for the innocent
people that also suffer from this lunacy.)"
9. The Twilight of the Vengine Gods
(Die Göttervenginedämmerung) or
Die Hard with A Vengines!!!
• https://lucidworks.com/2016/10/18/the-twilight-of-the-vengine-gods-die-gottervenginedammerung/
• "The Curmudgeon doesn’t dispense news, he just
tells you what information, new or old sucks or
what pisses him off and then rants about it. "
10. Where did all the
Librarians go?
• https://lucidworks.com/2017/11/21/where-did-all-the-librarians-go/
• "You’ve probably gotten tired of me by now, that’s
OK because I’m tired of me too."
11. Search Legacy
• Blogs: as Search Curmudgeon and himself
• Lucidworks: heavy duty implementations
• Techniques: autophrasing and query autofiltering
• Presentations: Revolutions and inaugural Haystack
12. Automatic Phrase Tokenization:
Improving Lucene Search Precision
by More Precise Linguistic Analysis
• https://lucidworks.com/2014/07/02/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/
• Takeaway: moving from bag of words towards bag
of things
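The "bag of things" idea can be illustrated with a minimal sketch, assuming a known phrase list is available up front. This is not the Lucene AutoPhrasingTokenFilter itself; the function name `autophrase`, the `PHRASES` set, and the underscore joiner are hypothetical choices made only for illustration.

```python
def autophrase(tokens, phrases, max_len=4, joiner="_"):
    """Greedily merge the longest known phrase starting at each position."""
    out = []
    i = 0
    while i < len(tokens):
        match_len = 0
        # Try the longest window first so a three-word phrase beats its two-word prefix.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in phrases:
                match_len = n
                break
        if match_len:
            # Emit the whole phrase as one token: a "thing", not separate words.
            out.append(joiner.join(tokens[i:i + match_len]))
            i += match_len
        else:
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical phrase list; in practice this would come from curated vocabulary.
PHRASES = {("new", "york"), ("new", "york", "city"), ("big", "apple")}

tokens = autophrase("hotels in new york city".split(), PHRASES)
# tokens == ["hotels", "in", "new_york_city"]
```

Because the phrase becomes a single token, a query for "new york city" no longer matches documents that merely contain "new", "york" and "city" somewhere in the text.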
13. Solution for Multi-term Synonyms in
Lucene/Solr Using the Auto
Phrasing TokenFilter
• https://lucidworks.com/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
• LUCENE-2605 & Friends resolved over two years
later
• split on whitespace = false
15. The Well Tempered Search
Application – Fugue
• https://lucidworks.com/2015/02/03/well-tempered-search-application-fugue/
• autophrasing
• "red sofa" problem
• Takeaway: ahead of its time (evolving into Solr Text Tagger and query
rewriting)
• "seed crystals of knowledge": SME tagging
16. Introducing Query
Autofiltering
• https://lucidworks.com/2015/02/17/introducing-query-autofiltering/
• "autotagging of the incoming query where the knowledge source is the
search index itself"
• we already have the information that we need to “do the right thing”
we just don’t use it
• "Another approach that was suggested by Erik Hatcher, is to have a
separate collection that is specialized as a knowledge store and query it to
get the categories with which to autofilter on the content collection."
• The key is that in both cases, we are using the search index itself as a
knowledge source that we can use for intelligent query introspection
and thus powerful inferential search!!
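A minimal sketch of the idea of using the index itself as the knowledge source, assuming the "index" is just a list of dictionaries held in memory. It is not the Solr query autofiltering component; `build_value_map`, `autofilter`, and the sample documents are hypothetical names and data used only to show the mechanism.

```python
from collections import defaultdict


def build_value_map(docs, fields):
    """Map each lowercased field value to the set of fields it occurs in."""
    value_map = defaultdict(set)
    for doc in docs:
        for field in fields:
            value = doc.get(field)
            if value:
                value_map[value.lower()].add(field)
    return value_map


def autofilter(query, value_map, max_len=3):
    """Split a query into (field, value) filters and leftover free-text terms."""
    tokens = query.lower().split()
    filters, free_text = [], []
    i = 0
    while i < len(tokens):
        # Prefer the longest span that matches a known field value.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in value_map:
                for field in sorted(value_map[candidate]):
                    filters.append((field, candidate))
                i += n
                break
        else:
            # No field value starts here, so keep the token as ordinary text.
            free_text.append(tokens[i])
            i += 1
    return filters, free_text


# Toy stand-in for an indexed collection (illustrative data only).
DOCS = [
    {"color": "red", "type": "sofa", "brand": "Acme"},
    {"color": "blue", "type": "arm chair", "brand": "Acme"},
]
VALUE_MAP = build_value_map(DOCS, ["color", "type", "brand"])

filters, rest = autofilter("comfy blue arm chair", VALUE_MAP)
# filters == [("color", "blue"), ("type", "arm chair")], rest == ["comfy"]
```

How the resulting filters are combined (AND vs. OR, boost vs. hard filter) is a separate design decision that this sketch leaves open.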
17. Thoughts on
“Search vs. Discovery”
• https://lucidworks.com/2015/03/02/thoughts-search-vs-discovery/
• "findability", facets, aboutness, relatedness
• "However if a document is not appropriately tagged, it
may become invisible..."; Data quality really matters here!
• Auto classification and manual subject matter expert
tagging
• Visualization, search driven analytics
18. Query Autofiltering Revisited
– Lets be more precise!!!
• https://lucidworks.com/2015/05/13/query-autofiltering-revisited-can-precise/
• "blue red lion socks"
19. Query Autofiltering Extended –
On Language and Logic in Search
• https://lucidworks.com/2015/06/06/query-autofiltering-extended-language-logic-search/
• If you've got metadata, use (autofilter) it. If you've
got known multi-word phrases, use them.
• Language usage understanding of AND vs. OR
20. Focusing on Search Quality at
Lucene/Solr Revolution 2015
• https://lucidworks.com/2015/10/19/focusing-on-search-quality-at-lucenesolr-revolution-2015/
• "Again, the “knowledge base” ... can be the Solr/
Lucene index itself!"
• “On-The-Fly Predictive Analytics” – as we say in
the search quality biz – its ALL about context!
21. Query Autofiltering IV:
A Novel Approach to NLP
• https://lucidworks.com/2015/11/19/query-autofiltering-chapter-4-a-novel-approach-to-natural-language-processing/
• Verbs
• Bob Dylan cover tunes
• Query Introspection: inferring user intent
• POS mapped to query fields
22. Pivoting to the Query: Using Pivot
Facets to build a Multi-Field
Suggester
• https://lucidworks.com/2016/08/12/pivoting-to-the-query-using-pivot-facets-to-build-a-multi-field-suggester/
• Pivot facets: "Think of it as a way of generating a facet
value “taxonomy” – on the fly."
• Facet Phrases
• Once we commit to building a special Solr collection (also
known as a ‘sidecar’ collection) just for typeahead, there
are other powerful search features that we now have to
work with. One of them is contextual metadata. [!!!]
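The "taxonomy on the fly" remark can be pictured with a small in-memory sketch, assuming documents are plain dictionaries. It only mimics the shape of a two-level pivot facet result and does not call Solr; `pivot_counts` and the sample records are hypothetical.

```python
from collections import Counter, defaultdict


def pivot_counts(docs, parent_field, child_field):
    """Count child-field values nested under each parent-field value."""
    tree = defaultdict(Counter)
    for doc in docs:
        parent, child = doc.get(parent_field), doc.get(child_field)
        # Skip documents that lack either field; they cannot be placed in the tree.
        if parent is not None and child is not None:
            tree[parent][child] += 1
    return {parent: dict(children) for parent, children in tree.items()}


# Illustrative records only.
SONGS = [
    {"genre": "rock", "artist": "Bob Dylan"},
    {"genre": "rock", "artist": "Bob Dylan"},
    {"genre": "rock", "artist": "The Beatles"},
    {"genre": "jazz", "artist": "Miles Davis"},
]

taxonomy = pivot_counts(SONGS, "genre", "artist")
# taxonomy == {"rock": {"Bob Dylan": 2, "The Beatles": 1}, "jazz": {"Miles Davis": 1}}
```

Each parent/child pair in such a tree is a candidate multi-field suggestion, which is roughly what a typeahead collection built from pivot facets would store.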
23. Building a Subject Classifier using
Automatically Discovered Keyword
Clusters, Part I
• https://lucidworks.com/2017/02/28/building-a-subject-classifier-using-automatically-discovered-keyword-clusters-part-i/
• subject classifier that uses automatically discovered
key term “clusters” that can then be used to classify
documents
• autophrasing + /terms....
• blah blah relatedness(...) blah blah
24. Why Facets are Even More
Fascinating than you Might Have
Thought
• https://lucidworks.com/2017/09/22/why-facets-are-even-more-fascinating-than-you-might-have-thought/
• Context matters!
• Spatial metaphor: N-Dimensional hyperspace
• "Paul McCartney" => "John Lennon"
• contextual usage of first result to boost second
• Facets and UI
• This is “surfin’ the meta-informational universe” that is your Solr collection.
• The Facet Theorem
25. When Worlds Collide – Artificial
Intelligence Meets Search
• https://lucidworks.com/2018/04/30/when-worlds-collide-artificial-intelligence-meets-search/
• The Search Loop: questions, answers, then more questions
• Inferring User Intent: NLP, POS, head-tail analysis, directed pattern-based
• Information Spaces: conceptually near
• Knowledge Spaces and Semantic Reference Frames
• Word Embedded Vectors
• Knowledge Graphs: taxonomies and ontologies
27. “the Curmudgeon doesn’t dispense
news, he just tells you what
information, new or old sucks or
what pisses him off and then rants
about it. ”
28. “You may be thinking – "Who’s this
Search Curmudgeon guy? He’s a real
jerk". No argument there.”
29. “hey IT guys – Buy More Memory for
chrissake! Thanks to Moore’s Law it’s
pretty cheap now so don’t be such a
tight-ass”
30. “And the role of DBA will likely be
staffed by curmudgeons like me – so
be nice to them – they can save your
ass. We’ve seen our share of techno
cliff jumpers – it doesn’t end well.”
31. “what we old guys know is that some
of the hot things that you whiz kids
are doing now were done before, i.e.,
`back in the day`. ”
32. “You are not as smart as you think
you are kiddies – dual quad core, 3
GHz CPUs and 512 GB of RAM can
hide lots of coding sins. ”
33. “When I was your age sonny, we had
to walk three miles through snow to
submit our box of punch cards … talk
about crappy BAUD rates!)”
34. “....because in my opinion (notice that
I didn’t say ‘humble’ because that is
one thing that the Curmudgeon is
definitely NOT)...”
35. “I’m a humanist believe it or not – I
like humans even if they don’t like
me sometimes – I EARNED my
nickname of ‘curmudgeon’ you
know.”
36. “proper care and feeding of these
"analysis chains" can make you
some serious money – especially you
eCommerce guys”
37. “You’ve probably gotten tired of me
by now, that’s OK because I’m tired
of me too. Believe me, you don’t have
to live with me – I do.”
38. Ted on...
• IDOL: "should really be spelled IDLE"
• Fast vs. Solr: "One is named Fast, the other actually is fast"
• Endeca: "what took several hours in Endeca indexed in
about 10 minutes in Solr"
• elidedsearch: "The name of the company is like the material that is used to hold up my Jockey Shorts (hint, hint)", Fruit-of-the-Loom Finders, Tightie Whitie Quest, RubberBand Finders, Brain Splitters, BungeeSeek
40. Ted's Big Adventure
• Semantics: bag of things, not bag of words
• synonyms, autophrasing, lemmatization
• "in text search – semantics matter"
• Linguistics: noun phrases, POS, NLP
• Facets
• autofiltering
• The Facet Theorem
• Relatedness
• Knowledge Space, Semantic Reference Frames
• Context matters
41. The Facet Theorem
• Lemma 1: Similar things tend to occur in similar
contexts
• Lemma 2: Facets are a tool for exploring meta-informational contexts
• It therefore follows that:
• Theorem: Facets can be used to find similar things.
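One way to make the theorem concrete is a minimal sketch that scores two records by how many facet values they share. The scoring used here is plain Jaccard overlap, chosen for simplicity as an assumption and not taken from the original posts; `facet_profile`, `facet_similarity`, and the toy records are hypothetical.

```python
def facet_profile(doc, facet_fields):
    """Represent a document as the set of (facet field, value) pairs it carries."""
    return {(field, doc[field]) for field in facet_fields if field in doc}


def facet_similarity(a, b, facet_fields):
    """Jaccard overlap of two facet profiles: shared pairs divided by all pairs."""
    pa, pb = facet_profile(a, facet_fields), facet_profile(b, facet_fields)
    union = pa | pb
    return len(pa & pb) / len(union) if union else 0.0


FIELDS = ["band", "role", "origin"]

# Toy records for illustration only.
paul = {"band": "The Beatles", "role": "bass", "origin": "Liverpool"}
john = {"band": "The Beatles", "role": "guitar", "origin": "Liverpool"}
miles = {"band": "Miles Davis Quintet", "role": "trumpet", "origin": "Alton"}

close = facet_similarity(paul, john, FIELDS)   # 0.5: two of four distinct pairs are shared
far = facet_similarity(paul, miles, FIELDS)    # 0.0: no facet values in common
```

Records that share more facet contexts score higher, which is the sense in which facets can be used to find similar things.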