Ted Talk
Ted Sullivan
(Well Before Back in the Day - 2018)
- Ted Sullivan, PhD
“(old Phuddy Duddy)”
“Senior (very much so I’m afraid)

Solutions (I hope)

Architect (and sometime plumber)”
- Ted Sullivan
When is my search app done?

“How do you get there grasshopper? Add semantic
intelligence to the engine!”
In his own words...
For the past 15 or so years now I have been building search applications, first with Verity K2 for a project
with a publishing company H.W. Wilson, then with most of the vendor products in the search space,
Ultraseek, Fast, Autonomy, Endeca, Vivisimo, MarkLogic and Exalead. I watched Lucene grow and
develop from an interesting little search engine to a major force in the search technology business. Before
that, I was building collaborative battlefield planning applications for the U.S. Army and before that I was
working on Internet stuff back in the dawn of the Web (well almost - 1994). I have been programming in
Java since 1995 and professionally since 1996 or so. I was learning JavaScript when Netscape was still
developing it, but only recently have begun to truly understand its power! John Resig and Bear Bibeault's
book "Secrets of the JavaScript Ninja" is a must read for anyone that wants to follow this path. Currently, I
am struggling up the AngularJS learning curve.
Before my work in the web with my friend Jim Spatz at Spatz Computer Graphics, I published some Math
games for kids on the original Mac OS, and before that, I did science - Auditory Neuroscience to be more
precise. I studied the auditory system of 'fly-by-night' critters, bats and owls first at Washington University in
St Louis, then at Caltech and Princeton. I was pretty good at Science but didn't like the writing part as
much as I should have. I had much more fun writing code (C, FORTRAN and PDP 8/11 assembler).
Currently, I am enjoying becoming part of the Open Source Revolution working at Lucidworks. Back in
1995 when Linux came out, I had a bet with my boss Jim Spatz about its future - I'm happy to say now that
I lost that bet. I would aspire to be an Open Source evangelist but there are enough of those already. I'll
settle for Solr Evangelist.
The Search Curmudgeon
• Learned
• Wise
• Pragmatic
• Caring
Random Rants from the
Search Curmudgeon
• https://lucidworks.com/2015/03/09/random-rants-search-curmudgeon/
• Search vs. Information Access
Data Science for
Dummies
• https://lucidworks.com/2016/09/06/data-science-for-dummies/
• "A conditional probability is like the probability
that you are a moron if you text while driving
(pretty high it turns out – and would be a good
source of Darwin awards except for the innocent
people that also suffer from this lunacy.)"
The Twilight of the Vengine Gods
(Die Göttervenginedämmerung) or
Die Hard with A Vengines!!!
• https://lucidworks.com/2016/10/18/the-twilight-of-the-vengine-gods-die-gottervenginedammerung/
• "The Curmudgeon doesn’t dispense news, he just
tells you what information, new or old sucks or
what pisses him off and then rants about it. "
Where did all the
Librarians go?
• https://lucidworks.com/2017/11/21/where-did-all-the-librarians-go/
• "You’ve probably gotten tired of me by now, that’s
OK because I’m tired of me too."
Search Legacy
• Blogs: as Search Curmudgeon and himself
• Lucidworks: heavy duty implementations
• Techniques: autophrasing and query autofiltering
• Presentations: Revolutions and inaugural Haystack
Automatic Phrase Tokenization:
Improving Lucene Search Precision
by More Precise Linguistic Analysis
• https://lucidworks.com/2014/07/02/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/
• Takeaway: moving from bag of words towards bag
of things
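The "bag of things" idea can be sketched in a few lines: known multi-word phrases are folded into single tokens before matching, so that a phrase behaves as one unit rather than as independent words. This is a minimal illustration in plain Python, not the Lucene TokenFilter itself; the function name `auto_phrase`, the underscore joiner, and the phrase list are all assumptions made for the example.

```python
def auto_phrase(tokens, phrases, joiner="_"):
    """Greedy longest-match: fold known multi-word phrases into single tokens."""
    # Store each phrase as a tuple of lowercased words for fast lookup.
    phrase_set = {tuple(p.lower().split()) for p in phrases}
    max_len = max((len(p) for p in phrase_set), default=1)
    out = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest window first so "new york city" beats "new york".
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if window in phrase_set:
                out.append(joiner.join(window))
                i += n
                matched = True
                break
        if not matched:
            # No phrase starts here: emit the single lowercased token.
            out.append(tokens[i].lower())
            i += 1
    return out


# Hypothetical phrase list, for illustration only.
PHRASES = ["seat cushions", "new york", "new york city"]

print(auto_phrase("Seat cushions in New York City".split(), PHRASES))
# → ['seat_cushions', 'in', 'new_york_city']
```

Trying the longest window first is the design choice that matters here: it keeps a shorter phrase from pre-empting a longer one that contains it.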
Solution for Multi-term Synonyms in
Lucene/Solr Using the Auto
Phrasing TokenFilter
• https://lucidworks.com/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
• LUCENE-2605 & Friends resolved over two years
later
• split on whitespace = false
The Well Tempered Search
Application – Prelude
• https://lucidworks.com/2015/01/27/well-tempered-search-application-prelude/
• Semantic Search, linguistics, context
• Best Bets (landing pages / rules)
• Synonyms, stemming, lemmatization, taxonomy, ontology, machine learning
/ classification, NLP/AI
The Well Tempered Search
Application – Fugue
• https://lucidworks.com/2015/02/03/well-tempered-search-application-fugue/
• autophrasing
• "red sofa" problem
• Takeaway: ahead of its time (evolving into Solr Text Tagger and query
rewriting)
• "seed crystals of knowledge": SME tagging
Introducing Query
Autofiltering
• https://lucidworks.com/2015/02/17/introducing-query-autofiltering/
• "autotagging of the incoming query where the knowledge source is the
search index itself"
• we already have the information that we need to “do the right thing”
we just don’t use it
• "Another approach that was suggested by Erik Hatcher, is to have a
separate collection that is specialized as a knowledge store and query it to
get the categories with which to autofilter on the content collection."
• The key is that in both cases, we are using the search index itself as a
knowledge source that we can use for intelligent query introspection
and thus powerful inferential search!!
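A rough sketch of the autofiltering idea, assuming the "knowledge source" is simply the set of field values already present in the indexed documents. Here an in-memory list of dicts stands in for the search index, and the names `build_value_map` and `autofilter` are hypothetical; the actual Solr component works against the index and is more involved.

```python
def build_value_map(docs, fields):
    """Map each known field value (lowercased) to the field it was seen in."""
    value_to_field = {}
    for doc in docs:
        for field in fields:
            value = doc.get(field)
            if value:
                # If a value occurs in several fields, the first field seen wins.
                value_to_field.setdefault(value.lower(), field)
    return value_to_field


def autofilter(query, value_to_field):
    """Split a query into leftover free text and field:value filter clauses."""
    filters, free_text = [], []
    for term in query.lower().split():
        field = value_to_field.get(term)
        if field:
            filters.append(f'{field}:"{term}"')
        else:
            free_text.append(term)
    return " ".join(free_text), filters


# Toy documents standing in for the index (illustrative data only).
DOCS = [
    {"color": "red", "product_type": "sofa"},
    {"color": "blue", "product_type": "chair"},
]
VALUE_MAP = build_value_map(DOCS, ["color", "product_type"])

print(autofilter("comfortable red sofa", VALUE_MAP))
# → ('comfortable', ['color:"red"', 'product_type:"sofa"'])
```

This version only recognizes single-word values; multi-word values would need phrase matching along the lines of autophrasing.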
Thoughts on “Search vs. Discovery”
• https://lucidworks.com/2015/03/02/thoughts-search-vs-discovery/
• "findability", facets, aboutness, relatedness
• "However if a document is not appropriately tagged, it
may become invisible..."; Data quality really matters here!
• Auto classification and manual subject matter expert
tagging
• Visualization, search driven analytics
Query Autofiltering Revisited
– Lets be more precise!!!
• https://lucidworks.com/2015/05/13/query-autofiltering-revisited-can-precise/
• "blue red lion socks"
Query Autofiltering Extended –
On Language and Logic in Search
• https://lucidworks.com/2015/06/06/query-autofiltering-extended-language-logic-search/
• If you've got metadata, use (autofilter) it. If you've
got known multi-word phrases, use them.
• Language usage understanding of AND vs. OR
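One plausible reading of the AND vs. OR point, offered here as an assumption rather than as the post's exact rule: values detected for the same field are alternatives and get ORed, while values for different fields narrow the result and get ANDed. The helper `combine_filters` and its input format are hypothetical.

```python
from collections import defaultdict


def combine_filters(tagged):
    """Build one filter expression from (field, value) pairs found in a query.

    Values for the same field are ORed; clauses for different fields are ANDed.
    """
    by_field = defaultdict(list)
    for field, value in tagged:
        if value not in by_field[field]:  # drop repeated values
            by_field[field].append(value)

    clauses = []
    for field, values in by_field.items():
        quoted = [f'"{v}"' for v in values]
        if len(quoted) == 1:
            clauses.append(f"{field}:{quoted[0]}")
        else:
            clauses.append(f"{field}:({' OR '.join(quoted)})")
    return " AND ".join(clauses)


# "red and blue socks": two colors, one product type (illustrative tagging).
print(combine_filters([("color", "red"), ("color", "blue"), ("product_type", "socks")]))
# → color:("red" OR "blue") AND product_type:"socks"
```

The intuition is that a single sock is rarely both red and blue, so a literal AND on one field would tend to return nothing useful.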
Focusing on Search Quality at
Lucene/Solr Revolution 2015
• https://lucidworks.com/2015/10/19/focusing-on-search-quality-at-lucenesolr-revolution-2015/
• "Again, the “knowledge base” ... can be the Solr/Lucene index itself!"
• “On-The-Fly Predictive Analytics” – as we say in
the search quality biz – its ALL about context!
Query Autofiltering IV:
A Novel Approach to NLP
• https://lucidworks.com/2015/11/19/query-autofiltering-chapter-4-a-novel-approach-to-natural-language-processing/
• Verbs
• Bob Dylan cover tunes
• Query Introspection: inferring user intent
• POS mapped to query fields
Pivoting to the Query: Using Pivot
Facets to build a Multi-Field
Suggester
• https://lucidworks.com/2016/08/12/pivoting-to-the-query-using-pivot-facets-to-build-a-multi-field-suggester/
• Pivot facets: "Think of it as a way of generating a facet
value “taxonomy” – on the fly."
• Facet Phrases
• Once we commit to building a special Solr collection (also
known as a ‘sidecar’ collection) just for typeahead, there
are other powerful search features that we now have to
work with. One of them is contextual metadata. [!!!]
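The facet-phrase idea can be illustrated without a running Solr instance. The sketch below assumes a pivot over two fields amounts to counting the value combinations that actually co-occur in documents, and that each combination becomes a suggestable phrase. The toy documents, field names, and the functions `facet_phrases` and `suggest` are illustrative assumptions, not the code from the post.

```python
from collections import Counter


def facet_phrases(docs, pivot_fields):
    """Count value combinations across pivot_fields, in field order."""
    counts = Counter()
    for doc in docs:
        values = [doc.get(f) for f in pivot_fields]
        if all(values):  # skip documents missing any pivot field
            counts[" ".join(values)] += 1
    return counts


def suggest(prefix, phrases, limit=5):
    """Return phrases starting with prefix, most frequent first."""
    prefix = prefix.lower()
    hits = [(p, n) for p, n in phrases.items() if p.lower().startswith(prefix)]
    hits.sort(key=lambda pn: (-pn[1], pn[0]))
    return [p for p, _ in hits[:limit]]


# Toy content collection (illustrative data only).
DOCS = [
    {"color": "red", "type": "sofa"},
    {"color": "red", "type": "sofa"},
    {"color": "red", "type": "chair"},
    {"color": "blue", "type": "sofa"},
]
PHRASE_COUNTS = facet_phrases(DOCS, ["color", "type"])

print(suggest("red", PHRASE_COUNTS))
# → ['red sofa', 'red chair']
```

Because the phrases come from real value combinations, every suggestion should lead to at least one result. In a sidecar collection, each phrase could also carry the field names it was built from as contextual metadata.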
Building a Subject Classifier using
Automatically Discovered Keyword
Clusters, Part I
• https://lucidworks.com/2017/02/28/building-a-subject-classifier-using-automatically-discovered-keyword-clusters-part-i/
• subject classifier that uses automatically discovered
key term “clusters” that can then be used to classify
documents
• autophrasing + /terms....
• blah blah relatedness(...) blah blah
Why Facets are Even More
Fascinating than you Might Have
Thought
• https://lucidworks.com/2017/09/22/why-facets-are-even-more-fascinating-than-you-might-have-thought/
• Context matters!
• Spatial metaphor: N-Dimensional hyperspace
• "Paul McCartney" => "John Lennon"
• contextual usage of first result to boost second
• Facets and UI
• This is “surfin’ the meta-informational universe” that is your Solr collection.
• The Facet Theorem
When Worlds Collide – Artificial
Intelligence Meets Search
• https://lucidworks.com/2018/04/30/when-worlds-collide-artificial-intelligence-meets-search/
• The Search Loop: questions, answers, then more questions
• Inferring User Intent: NLP, POS, head-tail analysis, directed pattern-based
• Information Spaces: conceptually near
• Knowledge Spaces and Semantic Reference Frames
• Word Embedded Vectors
• Knowledge Graphs: taxonomies and ontologies
-Ted says
“Sh*t...”
“the Curmudgeon doesn’t dispense
news, he just tells you what
information, new or old sucks or
what pisses him off and then rants
about it. ”
“You may be thinking – "Who’s this
Search Curmudgeon guy? He’s a real
jerk". No argument there.”
“hey IT guys – Buy More Memory for
chrissake! Thanks to Moore’s Law it’s
pretty cheap now so don’t be such a
tight-ass”
“And the role of DBA will likely be
staffed by curmudgeons like me – so
be nice to them – they can save your
ass. We’ve seen our share of techno
cliff jumpers – it doesn’t end well.”
“what we old guys know is that some
of the hot things that you whiz kids
are doing now were done before, i.e.,
`back in the day`. ”
“You are not as smart as you think
you are kiddies – dual quad core, 3
GHz CPUs and 512 GB of RAM can
hide lots of coding sins. ”
“When I was your age sonny, we had
to walk three miles through snow to
submit our box of punch cards … talk
about crappy BAUD rates!)”
“....because in my opinion (notice that
I didn’t say ‘humble’ because that is
one thing that the Curmudgeon is
definitely NOT)...”
“I’m a humanist believe it or not – I
like humans even if they don’t like
me sometimes – I EARNED my
nickname of ‘curmudgeon’ you
know.”
“proper care and feeding of these
"analysis chains" can make you
some serious money – especially you
eCommerce guys”
“You’ve probably gotten tired of me
by now, that’s OK because I’m tired
of me too. Believe me, you don’t have
to live with me – I do.”
Ted on...
• IDOL: "should really be spelled IDLE"
• Fast vs. Solr: "One is named Fast, the other actually is fast"
• Endeca: "what took several hours in Endeca indexed in
about 10 minutes in Solr"
• elidedsearch: "The name of the company is like the material
that is used to hold up my Jockey Shorts (hint, hint)", Fruit-of-the-Loom
Finders, Tightie Whitie Quest, RubberBand
Finders, Brain Splitters, BungeeSeek
-Search Curmudgeon
Big Data: “50 foot tall Brent Spiner”
Ted's Big Adventure
• Semantics: bag of things, not bag of words
• synonyms, autophrasing, lemmatization
• "in text search – semantics matter"
• Linguistics: noun phrases, POS, NLP
• Facets
• autofiltering
• The Facet Theorem
• Relatedness
• Knowledge Space, Semantic Reference Frames
• Context matters
The Facet Theorem
• Lemma 1: Similar things tend to occur in similar
contexts
• Lemma 2: Facets are a tool for exploring meta-informational contexts
• It therefore follows that:
• Theorem: Facets can be used to find similar things.
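As a toy illustration of the theorem, facet values can be treated as a context signature, with items ranked by how much of that signature they share. Jaccard overlap is used below purely as a convenient stand-in; the slides do not name a similarity measure, and the data and function names are assumptions for the example.

```python
def facet_signature(doc, facet_fields):
    """The set of (field, value) pairs that describe a document's context."""
    return {(f, v) for f in facet_fields for v in doc.get(f, [])}


def similar(target_id, docs, facet_fields):
    """Rank the other documents by Jaccard overlap of their facet signatures."""
    target = facet_signature(docs[target_id], facet_fields)
    scored = []
    for doc_id, doc in docs.items():
        if doc_id == target_id:
            continue
        other = facet_signature(doc, facet_fields)
        union = target | other
        score = len(target & other) / len(union) if union else 0.0
        scored.append((doc_id, score))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


# Tiny hand-made collection (illustrative facet values only).
DOCS = {
    "Paul McCartney": {"band": ["The Beatles", "Wings"], "role": ["songwriter", "bassist"]},
    "John Lennon": {"band": ["The Beatles"], "role": ["songwriter", "guitarist"]},
    "Bob Dylan": {"band": [], "role": ["songwriter"]},
}

for name, score in similar("Paul McCartney", DOCS, ["band", "role"]):
    print(f"{name}: {score:.2f}")
# → John Lennon: 0.40
# → Bob Dylan: 0.25
```

Swapping in different facet fields changes which items count as "near", which is one way to read the point that context matters.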
PubTed
• https://github.com/lucidworks/
• auto-phrase-tokenfilter
• query-autofiltering-component (also SOLR-7539)
• https://github.com/detnavillus/
• multifield_suggester_code
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 

Ted Talk

  • 2. Ted Sullivan (Well Before Back in the Day - 2018)
  • 3. - Ted Sullivan, PhD “(old Phuddy Duddy)” “Senior (very much so I’m afraid)
 Solutions (I hope)
 Architect (and sometime plumber)”
  • 4. - Ted Sullivan When is my search app done?
 “How do you get there grasshopper? Add semantic intelligence to the engine!”
  • 5. In his own words... For the past 15 or so years now I have been building search applications, first with Verity K2 for a project with a publishing company H.W. Wilson, then with most of the vendor products in the search space, Ultraseek, Fast, Autonomy, Endeca, Vivisimo, MarkLogic and Exalead. I watched Lucene grow and develop from an interesting little search engine to a major force in the search technology business. Before that, I was building collaborative battlefield planning applications for the U.S. Army and before that I was working on Internet stuff back in the dawn of the Web (well almost - 1994). I have been programming in Java since 1995 and professionally since 1996 or so. I was learning JavaScript when Netscape was still developing it, but only recently have begun to truly understand its power! John Resig and Bear Bibeault's book "Secrets of the JavaScript Ninja" is a must read for anyone that wants to follow this path. Currently, I am struggling up the AngularJS learning curve. Before my work in the web with my friend Jim Spatz at Spatz Computer Graphics, I published some Math games for kids on the original Mac OS, and before that, I did science - Auditory Neuroscience to be more precise. I studied the auditory system of 'fly-by-night' critters, bats and owls first at Washington University in St Louis, then at Caltech and Princeton. I was pretty good at Science but didn't like the writing part as much as I should have. I had much more fun writing code (C, FORTRAN and PDP 8/11 assembler). Currently, I am enjoying becoming part of the Open Source Revolution working at Lucidworks. Back in 1995 when Linux came out, I had a bet with my boss Jim Spatz about its future - I'm happy to say now that I lost that bet. I would aspire to be an Open Source evangelist but there are enough of those already. I'll settle for Solr Evangelist.
  • 6. The Search Curmudgeon • Learned • Wise • Pragmatic • Caring
  • 7. Random Rants from the Search Curmudgeon • https://lucidworks.com/2015/03/09/random-rants-search-curmudgeon/ • Search vs. Information Access
  • 8. Data Science for Dummies • https://lucidworks.com/2016/09/06/data-science-for-dummies/ • "A conditional probability is like the probability that you are a moron if you text while driving (pretty high it turns out – and would be a good source of Darwin awards except for the innocent people that also suffer from this lunacy.)"
  • 9. The Twilight of the Vengine Gods (Die Göttervenginedämmerung) or Die Hard with A Vengines!!! • https://lucidworks.com/2016/10/18/the-twilight-of-the-vengine-gods-die-gottervenginedammerung/ • "The Curmudgeon doesn’t dispense news, he just tells you what information, new or old sucks or what pisses him off and then rants about it."
  • 10. Where did all the Librarians go? • https://lucidworks.com/2017/11/21/where-did-all-the-librarians-go/ • "You’ve probably gotten tired of me by now, that’s OK because I’m tired of me too."
  • 11. Search Legacy • Blogs: as Search Curmudgeon and himself • Lucidworks: heavy duty implementations • Techniques: autophrasing and query autofiltering • Presentations: Revolutions and inaugural Haystack
  • 12. Automatic Phrase Tokenization: Improving Lucene Search Precision by More Precise Linguistic Analysis • https://lucidworks.com/2014/07/02/automatic-phrase-tokenization-improving-lucene-search-precision-by-more-precise-linguistic-analysis/ • Takeaway: moving from bag of words towards bag of things
  • 13. Solution for Multi-term Synonyms in Lucene/Solr Using the Auto Phrasing TokenFilter • https://lucidworks.com/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/ • LUCENE-2605 & Friends resolved over two years later • split on whitespace = false
  • 14. The Well Tempered Search Application – Prelude • https://lucidworks.com/2015/01/27/well-tempered-search-application-prelude/ • Semantic Search, linguistics, context • Best Bets (landing pages / rules) • Synonyms, stemming, lemmatization, taxonomy, ontology, machine learning / classification, NLP/AI
  • 15. The Well Tempered Search Application – Fugue • https://lucidworks.com/2015/02/03/well-tempered-search-application-fugue/ • autophrasing • "red sofa" problem • Takeaway: ahead of its time (evolving into Solr Text Tagger and query rewriting) • "seed crystals of knowledge": SME tagging
  • 16. Introducing Query Autofiltering • https://lucidworks.com/2015/02/17/introducing-query-autofiltering/ • "autotagging of the incoming query where the knowledge source is the search index itself" • we already have the information that we need to “do the right thing” we just don’t use it • "Another approach that was suggested by Erik Hatcher, is to have a separate collection that is specialized as a knowledge store and query it to get the categories with which to autofilter on the content collection." • The key is that in both cases, we are using the search index itself as a knowledge source that we can use for intelligent query introspection and thus powerful inferential search!!
  • 17. Thoughts on 
 “Search vs. Discovery” • https://lucidworks.com/2015/03/02/thoughts-search-vs-discovery/ • "findability", facets, aboutness, relatedness • "However if a document is not appropriately tagged, it may become invisible..."; Data quality really matters here! • Auto classification and manual subject matter expert tagging • Visualization, search driven analytics
  • 18. Query Autofiltering Revisited – Lets be more precise!!! • https://lucidworks.com/2015/05/13/query-autofiltering-revisited-can-precise/ • "blue red lion socks"
  • 19. Query Autofiltering Extended – On Language and Logic in Search • https://lucidworks.com/2015/06/06/query-autofiltering-extended-language-logic-search/ • If you've got metadata, use (autofilter) it. If you've got known multi-word phrases, use them. • Language usage understanding of AND vs. OR
  • 20. Focusing on Search Quality at Lucene/Solr Revolution 2015 • https://lucidworks.com/2015/10/19/focusing-on-search-quality-at-lucenesolr-revolution-2015/ • "Again, the “knowledge base” ... can be the Solr/Lucene index itself!" • “On-The-Fly Predictive Analytics” – as we say in the search quality biz – its ALL about context!
  • 21. Query Autofiltering IV: A Novel Approach to NLP • https://lucidworks.com/2015/11/19/query-autofiltering-chapter-4-a-novel-approach-to-natural-language-processing/ • Verbs • Bob Dylan cover tunes • Query Introspection: inferring user intent • POS mapped to query fields
  • 22. Pivoting to the Query: Using Pivot Facets to build a Multi-Field Suggester • https://lucidworks.com/2016/08/12/pivoting-to-the-query-using-pivot-facets-to-build-a-multi-field-suggester/ • Pivot facets: "Think of it as a way of generating a facet value “taxonomy” – on the fly." • Facet Phrases • Once we commit to building a special Solr collection (also known as a ‘sidecar’ collection) just for typeahead, there are other powerful search features that we now have to work with. One of them is contextual metadata. [!!!]
  • 23. Building a Subject Classifier using Automatically Discovered Keyword Clusters, Part I • https://lucidworks.com/2017/02/28/building-a-subject-classifier-using-automatically-discovered-keyword-clusters-part-i/ • subject classifier that uses automatically discovered key term “clusters” that can then be used to classify documents • autophrasing + /terms.... • blah blah relatedness(...) blah blah
  • 24. Why Facets are Even More Fascinating than you Might Have Thought • https://lucidworks.com/2017/09/22/why-facets-are-even-more-fascinating-than-you-might-have-thought/ • Context matters! • Spatial metaphor: N-Dimensional hyperspace • "Paul McCartney" => "John Lennon" • contextual usage of first result to boost second • Facets and UI • This is “surfin’ the meta-informational universe” that is your Solr collection. • The Facet Theorem
  • 25. When Worlds Collide – Artificial Intelligence Meets Search • https://lucidworks.com/2018/04/30/when-worlds-collide-artificial-intelligence-meets-search/ • The Search Loop: questions, answers, then more questions • Inferring User Intent: NLP, POS, head-tail analysis, directed pattern-based • Information Spaces: conceptually near • Knowledge Spaces and Semantic Reference Frames • Word Embedded Vectors • Knowledge Graphs: taxonomies and ontologies
  • 27. “the Curmudgeon doesn’t dispense news, he just tells you what information, new or old sucks or what pisses him off and then rants about it. ”
  • 28. “You may be thinking – "Who’s this Search Curmudgeon guy? He’s a real jerk". No argument there.”
  • 29. “hey IT guys – Buy More Memory for chrissake! Thanks to Moore’s Law it’s pretty cheap now so don’t be such a tight-ass”
  • 30. “And the role of DBA will likely be staffed by curmudgeons like me – so be nice to them – they can save your ass. We’ve seen our share of techno cliff jumpers – it doesn’t end well.”
  • 31. “what we old guys know is that some of the hot things that you whiz kids are doing now were done before, i.e., `back in the day`. ”
  • 32. “You are not as smart as you think you are kiddies – dual quad core, 3 GHz CPUs and 512 GB of RAM can hide lots of coding sins. ”
  • 33. “When I was your age sonny, we had to walk three miles through snow to submit our box of punch cards … talk about crappy BAUD rates!)”
  • 34. “....because in my opinion (notice that I didn’t say ‘humble’ because that is one thing that the Curmudgeon is definitely NOT)...”
  • 35. “I’m a humanist believe it or not – I like humans even if they don’t like me sometimes – I EARNED my nickname of ‘curmudgeon’ you know.”
  • 36. “proper care and feeding of these "analysis chains" can make you some serious money – especially you eCommerce guys”
  • 37. “You’ve probably gotten tired of me by now, that’s OK because I’m tired of me too. Believe me, you don’t have to live with me – I do.”
  • 38. Ted on... • IDOL: "should really be spelled IDLE" • Fast vs. Solr: "One is named Fast, the other actually is fast" • Endeca: "what took several hours in Endeca indexed in about 10 minutes in Solr" • elidedsearch: "The name of the company is like the material that is used to hold up my Jockey Shorts (hint, hint)", Fruit-of-the-Loom Finders, Tightie Whitie Quest, RubberBand Finders, Brain Splitters, BungeeSeek
  • 39. -Search Curmudgeon Big Data: 
 “50 foot tall Brent Spiner”
  • 40. Ted's Big Adventure • Semantics: bag of things, not bag of words • synonyms, autophrasing, lemmatization • "in text search – semantics matter" • Linguistics: noun phrases, POS, NLP • Facets • autofiltering • The Facet Theorem • Relatedness • Knowledge Space, Semantic Reference Frames • Context matters
  • 41. The Facet Theorem • Lemma 1: Similar things tend to occur in similar contexts • Lemma 2: Facets are a tool for exploring meta-informational contexts • it therefore follows that: • Theorem: Facets can be used to find similar things.
  • 42. PubTed • https://github.com/lucidworks/ • auto-phrase-tokenfilter • query-autofiltering-component (also SOLR-7539) • https://github.com/detnavillus/ • multifield_suggester_code
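
The query autofiltering idea from slide 16 ("autotagging of the incoming query where the knowledge source is the search index itself") can be illustrated with a toy sketch. This is not the code from the query-autofiltering-component repository: the `FIELD_VALUES` catalog, the field names, and the `autofilter` function are all hypothetical, and a plain dictionary stands in for the field values that a real Solr/Lucene index would supply. Longest phrases are tried first, so a known multi-word value such as "red lion" wins over the single word "red".

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for the field values a real search index would expose.
FIELD_VALUES: Dict[str, Set[str]] = {
    "color": {"red", "blue"},
    "product_type": {"sofa", "socks"},
    "brand": {"red lion"},
}


def build_lookup(field_values: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Invert field -> values into value -> fields so query phrases can be looked up."""
    lookup: Dict[str, List[str]] = {}
    for field, values in field_values.items():
        for value in values:
            lookup.setdefault(value.lower(), []).append(field)
    return lookup


def autofilter(
    query: str,
    field_values: Dict[str, Set[str]],
    max_phrase_len: int = 3,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Map query phrases to (field, value) filters; return unmatched tokens separately."""
    lookup = build_lookup(field_values)
    tokens = query.lower().split()
    filters: List[Tuple[str, str]] = []
    leftover: List[str] = []
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate phrase first, then shrink toward a single token.
        for n in range(min(max_phrase_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lookup:
                for field in sorted(lookup[phrase]):
                    filters.append((field, phrase))
                i += n
                matched = True
                break
        if not matched:
            # No field contains this token, so leave it for ordinary keyword search.
            leftover.append(tokens[i])
            i += 1
    return filters, leftover


if __name__ == "__main__":
    print(autofilter("blue red lion socks", FIELD_VALUES))
    print(autofilter("cheap red sofa", FIELD_VALUES))
```

In a real deployment the lookup would presumably be answered by the index itself (or by a separate knowledge collection, as slide 16 mentions) rather than by an in-memory dictionary, and the resulting pairs would be turned into filter or boost clauses on the rewritten query.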