October 13-16, 2016 • Austin, TX
Implementing Conceptual Search in Solr
Simon Hughes
Chief Data Scientist, Dice.com
Who Am I?
•  Chief Data Scientist at Dice.com, under Yuri Bykov
•  Key Projects involving Solr:
•  Recommender Systems – more jobs like this, more seekers like
this (uses custom Solr index)
•  Custom Dice Solr MLT handler (real-time recommendations)
•  "Did you mean" functionality
•  Title, skills and company type-ahead
•  Relevancy improvements in Dice jobs search
Other Projects
•  Supply Demand Analysis
•  Dice Skills pages – http://www.dice.com/skills
PhD
•  PhD candidate at DePaul University, studying natural language processing and
machine learning
Q. What is the Most Common Relevancy Tuning Mistake?
A. Ignoring the importance of RECALL
Relevancy Tuning
•  Key performance metrics to measure:
•  Precision
•  Recall
•  F1 Measure - 2*(P*R)/(P+R)
•  Precision is easier – correct mistakes in the top search results
•  Recall - need to know which relevant documents don’t come back
•  Hard to accurately measure
•  Need to know all the relevant documents present in the index
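As a minimal sketch of the metrics above, the following computes precision, recall and F1 for a single query. The document ids and relevance judgments are made up for illustration; in practice the hard part, as noted above, is knowing the full set of relevant documents.

```python
def precision_recall_f1(retrieved, relevant):
    """Compute precision, recall and F1 for one query.

    retrieved: ids returned by the search engine
    relevant:  ids of ALL relevant documents in the index (hypothetical labels)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    denom = precision + recall
    # F1 = 2*(P*R)/(P+R), defined as 0 when both P and R are 0
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up example: 4 documents come back and 3 are relevant,
# but 6 relevant documents exist in the index.
p, r, f1 = precision_recall_f1(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d2", "d3", "d7", "d8", "d9"],
)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

The top results look good here (high precision), yet half of the relevant documents never come back, which is the recall problem the talk is about.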
What is Conceptual Search?
•  A.K.A. Semantic Search
•  Two key challenges with keyword matching:
•  Polysemy: Words have more than one meaning
•  e.g. engineer – mechanical? programmer? automation engineer?
•  Synonymy: Many different words have the same meaning
•  e.g. QA, quality assurance, tester; VB, Visual Basic, VB.Net
•  Other related challenges -
•  Typos, Spelling Errors, Idioms
•  Conceptual search attempts to solve these problems by learning concepts
Why Conceptual Search?
•  We will attempt to improve recall without diminishing precision
•  Can match relevant documents containing none of the query terms
Concepts
•  Conceptual search allows us to retrieve documents by how similar the concepts
in the query are to the concepts in a document
•  Concepts represent important high-level ideas in a given domain (e.g. java
technologies, big data jobs, helpdesk support, etc)
•  Concepts are automatically learned from documents using machine learning
•  Words can belong to multiple concepts, with varying strengths of association
with each concept
Traditional Techniques
•  Many algorithms have been used for concept learning, including LSA (Latent
Semantic Analysis), LDA (Latent Dirichlet Allocation) and Word2Vec
•  All involve mapping a document to a low dimensional dense vector (an array
of numbers)
•  Each element of the vector is a number representing how well the document
represents that concept
•  E.g. LSA powers the similar skills found in Dice’s skills pages
Traditional Techniques Don’t Scale
•  LSA/LSI, LDA and related techniques rely on factorization of very large
term-document matrices – very slow and computationally intensive
•  Require embedding a machine learning model with the search engine to
map new queries to the concept space (latent or topic space)
•  Query performance is very poor – unable to utilize the inverted index as all
documents have the same number of concepts
•  What we want is a way to map words, not documents, to concepts. Then we
can embed this in Solr via synonym filters and custom query parsers
Word2Vec and ‘Word Math’
•  Word2Vec was developed by Google around 2013 for learning vector
representations for words, building on earlier work from Rumelhart, Hinton
and Williams in 1986 (see paper below for citation of this work)
•  Word2Vec Paper:
Efficient Estimation of Word Representations in Vector Space
•  It works by training a machine learning model to predict the words
surrounding a word in a sentence
•  Similar words get similar vector representations
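To make "predict the words surrounding a word" concrete, here is a small sketch of how (center word, context word) training pairs can be generated for the skip-gram style of training. It only builds the pairs; the model, its optimizer and refinements such as negative sampling are left out, and the function name and window size are illustrative assumptions.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs: each word paired with its neighbours
    up to `window` positions to the left and right."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Made-up sentence; a real run would iterate over every sentence in the corpus.
sentence = "senior java developer with spring experience".split()
pairs = skipgram_pairs(sentence, window=1)
for center, context in pairs:
    print(center, "->", context)
```

Words that keep appearing with the same neighbours produce similar training pairs, which is why they end up with similar vectors.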
“Word Math” Example
•  Using basic vector arithmetic, you get some interesting patterns
•  This illustrates how it represents relationships between words
•  E.g. king – man + woman ≈ queen
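The arithmetic can be sketched with a few hand-made vectors. The three dimensions (royalty, male, female) and all values below are invented purely for illustration; real Word2Vec vectors are learned and typically have 100 or more dimensions. The analogy is computed as king - man + woman, and the nearest remaining word by cosine similarity is returned.

```python
import math

# Hand-made toy vectors (royalty, male, female); NOT learned from data.
vectors = {
    "king":     [0.9, 0.9, 0.1],
    "queen":    [0.9, 0.1, 0.9],
    "man":      [0.1, 0.9, 0.1],
    "woman":    [0.1, 0.1, 0.9],
    "engineer": [0.2, 0.5, 0.5],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


result = analogy("king", "man", "woman", vectors)
print(result)
```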
The algorithm learns to represent different types of relationships between words in vector form
At this point you may be thinking…
Why Do I Care? This is a Search Conference…
•  This algorithm can be used to represent documents as vectors of concepts
•  We can then use these representations to do conceptual search
•  This will surface many relevant documents missed by keyword matching
•  This boosts recall
•  This technique can also be used to automatically learn synonyms
A Quick Demo
Using our Dice.com active jobs index, some example common user queries:
•  Data Scientist
•  Big Data
•  Information Retrieval
•  C#
•  Web Developer
•  CTO
•  Project Manager
How?
GitHub - DiceTechJobs/ConceptualSearch:
1.  Pre-process documents – parse HTML, strip noise characters, tokenize words
2.  Define important keywords for your domain, or use my code to auto-extract top
terms and phrases
3.  Train a Word2Vec model on the pre-processed documents
4.  Using this model, either:
    1.  Vectors: Use the raw vectors and embed them in Solr using synonyms + payloads
    2.  Top N Similar: Extract the top N similar terms with their similarities and embed these as
    weighted synonyms using my custom queryboost parser and tokenizer
    3.  Clusters: Cluster these vectors by similarity, and map terms to clusters in a synonym file
Define Top Domain Specific Keywords
•  If you have a set of documents belonging to a specific domain, it is
important to define the important keywords for your domain:
•  Use top few thousand search keywords
•  Or use my fast keyword and phrase extraction tool (in GitHub)
•  Or use the Solr/Lucene shingle filter to extract the top 1-4 word sequences by
document frequency
•  Important to map common phrases to single tokens, e.g. data scientist =>
data_scientist, java developer=>java_developer
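One way to map known phrases to single tokens before training is a greedy longest-match pass over each tokenized document. This is a sketch only: the function name, the phrase list and the 4-word limit are assumptions for illustration, not the keyword extraction tool from the GitHub repository.

```python
def merge_phrases(tokens, phrases, max_len=4):
    """Replace known multi-word phrases with single underscore-joined tokens.

    tokens:  list of lowercased word tokens
    phrases: set of known phrases, each a space-separated string
    """
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first so "senior java developer"
        # wins over "java developer".
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in phrases:
                out.append(candidate.replace(" ", "_"))
                i += n
                break
        else:
            # No phrase starts here; keep the single word.
            out.append(tokens[i])
            i += 1
    return out


# Hypothetical phrase list and input text.
phrases = {"data scientist", "java developer", "senior java developer"}
tokens = "hiring a senior java developer and a data scientist".split()
merged = merge_phrases(tokens, phrases)
print(merged)
```

Training Word2Vec on the merged tokens means each phrase gets its own vector rather than being split across its component words.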
Do It Yourself
•  All code for this talk is now publicly available on GitHub:
•  https://github.com/DiceTechJobs/SolrPlugins - Solr plugins to work with
conceptual search, and other dice plugins, such as a custom MLT handler
•  https://github.com/DiceTechJobs/SolrConfigExamples - Examples of Solr
configuration file entries to enable conceptual search and other Dice
plugins
•  https://github.com/DiceTechJobs/ConceptualSearch - Python code to
compute the Word2Vec word vectors, and generate Solr synonym files
Some Solr Tricks to Make this Happen
1.  Keyword Extraction: Use the synonym filter to extract key words from your
documents
2.  Synonym Expansion using Payloads:
•  Use the synonym filter to expand a keyword to multiple tokens
•  Each token has an associated payload – used to adjust relevancy scores at
index or query time
Synonym File Examples – Vector Method
•  Each keyword maps to a set of tokens via a synonym file
•  Vector Synonym file entry (5 element vector, usually 100+ elements):
•  java developer=>001|0.4 002|0.1 003|0.5 005|.9
•  Uses a custom token filter that averages these vectors over the entire
document (see GitHub - DiceTechJobs/SolrPlugins)
•  Relatively fast at index time but some additional indexing overhead
•  Very slow to query
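The averaging step can be illustrated outside Solr. The sketch below parses entries in the `keyword=>dimension|weight` format shown above and averages the vectors of the keywords found in a document. It illustrates the arithmetic only and is not the Java token filter from DiceTechJobs/SolrPlugins; the second entry is made up, and treating a dimension absent from an entry as 0 is an assumption.

```python
from collections import defaultdict


def parse_vector_entry(line):
    """Parse 'keyword=>dim|weight dim|weight ...' into (keyword, {dim: weight})."""
    keyword, rhs = line.split("=>", 1)
    vector = {}
    for token in rhs.split():
        dim, weight = token.split("|")
        vector[dim] = float(weight)
    return keyword.strip(), vector


def average_vectors(vectors):
    """Element-wise mean of sparse vectors; a missing dimension counts as 0 (assumption)."""
    totals = defaultdict(float)
    for vec in vectors:
        for dim, weight in vec.items():
            totals[dim] += weight
    n = len(vectors)
    return {dim: total / n for dim, total in totals.items()} if n else {}


entries = [
    "java developer=>001|0.4 002|0.1 003|0.5 005|.9",    # entry from the slide
    "python developer=>001|0.2 002|0.3 004|0.6 005|.7",  # made-up second keyword
]
lookup = dict(parse_vector_entry(e) for e in entries)

# Keywords extracted from a hypothetical document.
doc_keywords = ["java developer", "python developer"]
doc_vector = average_vectors([lookup[k] for k in doc_keywords])
print(sorted(doc_vector.items()))
```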
Synonym File Examples – Top N Method
•  Each keyword maps to a set of most similar keywords via a synonym file
•  Top N Synonym file entry (top 5):
•  java_developer=>java_j2ee_developer|0.907526 java_architect|0.889903
lead_java_developer|0.867594 j2ee_developer|0.864028 java_engineer|0.861407
•  Can configure this at index time with payloads, a payload aware query parser and a
payload similarity function
•  Or you can configure this at query time with a special token filter that converts
payloads into term boosts, along with a special parser (see
GitHub - DiceTechJobs/SolrPlugins)
•  Fast at index and query time if N is reasonable (10-30)
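A Top N entry of this shape can be produced from any term-to-vector mapping by ranking the other terms by cosine similarity. The sketch below uses a handful of invented 3-dimensional vectors in place of a trained Word2Vec model; the function names are illustrative, and the ConceptualSearch repository remains the reference for the actual generation code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_n_synonym_line(term, vectors, n=5):
    """Format 'term=>similar1|score1 similar2|score2 ...' for the n most similar terms."""
    scored = [
        (other, cosine(vectors[term], vec))
        for other, vec in vectors.items()
        if other != term
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    rhs = " ".join(f"{other}|{score:.6f}" for other, score in scored[:n])
    return f"{term}=>{rhs}"


# Invented toy vectors standing in for learned Word2Vec vectors.
toy_vectors = {
    "java_developer":  [0.9, 0.1, 0.0],
    "java_engineer":   [0.8, 0.2, 0.0],
    "j2ee_developer":  [0.7, 0.3, 0.1],
    "project_manager": [0.0, 0.1, 0.9],
}
line = top_n_synonym_line("java_developer", toy_vectors, n=2)
print(line)
```

Keeping N small limits how many extra tokens each keyword expands to, which is what keeps indexing and querying fast.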
Searching over Clustered Terms
•  After we have learned word vectors, we can use a clustering algorithm to
cluster terms by their vectors to give clusters of related words
•  Can learn several different sizes of cluster, such as 500, 1000, 5000 clusters,
and map each of these to a separate field
•  Apply stronger boosts to the fields containing smaller clusters (e.g. the 5000
cluster field) using the edismax qf parameter - tighter clusters get more
weight
•  Code for clustering vectors in GitHub - DiceTechJobs/ConceptualSearch
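The boosting scheme can be expressed as an edismax `qf` value listing each cluster field with its own boost. In the sketch below the field names (`text`, `clusters_500`, `clusters_1000`, `clusters_5000`) and the boost values are hypothetical; they would need to match the fields defined in your own schema and be tuned against your own relevancy tests.

```python
from urllib.parse import urlencode

# Hypothetical cluster fields: more clusters means tighter clusters, so a higher boost.
CLUSTER_FIELD_BOOSTS = {
    "clusters_500": 1.0,
    "clusters_1000": 2.0,
    "clusters_5000": 4.0,
}


def build_edismax_params(query, cluster_boosts, keyword_field="text", keyword_boost=10.0):
    """Build Solr request parameters that search the keyword field and every cluster field."""
    qf_parts = [f"{keyword_field}^{keyword_boost}"]
    qf_parts += [f"{field}^{boost}" for field, boost in cluster_boosts.items()]
    return {"q": query, "defType": "edismax", "qf": " ".join(qf_parts)}


params = build_edismax_params("java developer", CLUSTER_FIELD_BOOSTS)
query_string = urlencode(params)
print(params["qf"])
print(query_string)
```

An exact keyword match still scores highest here, while a document that only shares a cluster with the query can still be returned with a lower score.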
Synonym File Examples – Clustering Method
•  Each keyword in a cluster maps to the same artificial token for that cluster
•  Cluster Synonym file entries:
•  java=>cluster_171
•  java applications=>cluster_171
•  java coding=>cluster_171
•  java design=>cluster_171
•  Doesn’t use payloads so does not require any special plugins
•  No noticeable impact on query or indexing performance
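As a rough end-to-end illustration, the sketch below clusters a few invented 2-dimensional term vectors with a bare-bones k-means and writes one synonym entry per term. The k-means here is a stand-in chosen for brevity (deterministic initialisation from the first k points, fixed iteration count), not necessarily the algorithm used in DiceTechJobs/ConceptualSearch, and the vectors are made up.

```python
def kmeans(points, k, iterations=10):
    """Minimal k-means: returns a cluster label for each point.
    Centroids start at the first k points, so results are deterministic."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_synonym_lines(terms, labels, prefix="cluster_"):
    """Format 'term=>cluster_N' entries for a Solr synonym file."""
    return [f"{term}=>{prefix}{label}" for term, label in zip(terms, labels)]


# Invented toy vectors standing in for learned word vectors.
term_vectors = {
    "java":        [0.9, 0.1],
    "solr":        [0.1, 0.9],
    "java coding": [0.8, 0.2],
    "lucene":      [0.2, 0.8],
}
terms = list(term_vectors)
labels = kmeans([term_vectors[t] for t in terms], k=2)
lines = cluster_synonym_lines(terms, labels)
print("\n".join(lines))
```

Running this once per cluster count (for example 500, 1000 and 5000) would give one synonym file per cluster field.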
Example Clusters Learned from Dice Job Postings
•  Note: Cluster labels (the text before each colon) are manually assigned for interpretability:
•  Natural Languages: bi lingual, bilingual, chinese, fluent, french, german, japanese,
korean, lingual, localized, portuguese, russian, spanish, speak, speaker
•  Apple Programming Languages: cocoa, swift
•  Search Engine Technologies: apache solr, elasticsearch, lucene, lucene solr,
search, search engines, search technologies, solr, solr lucene
•  Microsoft .Net Technologies: c# wcf, microsoft c#, microsoft.net, mvc web, wcf
web services, web forms, webforms, windows forms, winforms, wpf wcf
Example Clusters Learned from Dice Job Postings
Attention/Attitude:
attention, attentive, close attention, compromising, conscientious, conscious, customer
oriented, customer service focus, customer service oriented, deliver results, delivering results,
demonstrated commitment, dependability, dependable, detailed oriented, diligence,
diligent, do attitude, ethic, excellent follow, extremely detail oriented, good attention,
meticulous, meticulous attention, organized, orientated, outgoing, outstanding customer
service, pay attention, personality, pleasant, positive attitude, professional appearance,
professional attitude, professional demeanor, punctual, punctuality, self motivated, self
motivation, superb, superior, thoroughness
Summary
•  It’s easy to overlook recall when performing relevancy tuning
•  Conceptual search improves recall while maintaining high precision by matching
documents on concepts or ideas.
•  In reality this involves learning which terms are related to one another
•  Word2Vec is a scalable algorithm for learning related words from a set of documents, that
gives state of the art results in word analogy tasks
•  We can train a Word2Vec model offline, and embed its output into Solr by using the built-in
synonym filter and payload functionality, combined with some custom plugins

Más contenido relacionado

La actualidad más candente

Modern Authentication -- FIDO2 Web Authentication (WebAuthn) を学ぶ --
Modern Authentication -- FIDO2 Web Authentication (WebAuthn) を学ぶ --Modern Authentication -- FIDO2 Web Authentication (WebAuthn) を学ぶ --
Modern Authentication -- FIDO2 Web Authentication (WebAuthn) を学ぶ --Jun Kurihara
 
Redmine 4.1 新機能評価ガイド <速報版>
Redmine 4.1 新機能評価ガイド <速報版>Redmine 4.1 新機能評価ガイド <速報版>
Redmine 4.1 新機能評価ガイド <速報版>Go Maeda
 
最適なOpenJDKディストリビューションの選び方 #codetokyo19B3 #ccc_l5
最適なOpenJDKディストリビューションの選び方 #codetokyo19B3 #ccc_l5最適なOpenJDKディストリビューションの選び方 #codetokyo19B3 #ccc_l5
最適なOpenJDKディストリビューションの選び方 #codetokyo19B3 #ccc_l5Takahiro YAMADA
 
かずきのUWP入門
かずきのUWP入門かずきのUWP入門
かずきのUWP入門一希 大田
 
Active directoryと認証・認可
Active directoryと認証・認可Active directoryと認証・認可
Active directoryと認証・認可Hiroki Kamata
 
SSIとDIDで何を解決したいのか?(β版)
SSIとDIDで何を解決したいのか?(β版)SSIとDIDで何を解決したいのか?(β版)
SSIとDIDで何を解決したいのか?(β版)Naohiro Fujie
 
【BS3】Visual Studio 2022 と .NET 6 での Windows アプリ開発技術の紹介
【BS3】Visual Studio 2022 と .NET 6 での Windows アプリ開発技術の紹介 【BS3】Visual Studio 2022 と .NET 6 での Windows アプリ開発技術の紹介
【BS3】Visual Studio 2022 と .NET 6 での Windows アプリ開発技術の紹介 日本マイクロソフト株式会社
 
Uwpアプリケーション開発入門
Uwpアプリケーション開発入門Uwpアプリケーション開発入門
Uwpアプリケーション開発入門Makoto Nishimura
 
Modern CI/CD Pipeline Using Azure DevOps
Modern CI/CD Pipeline Using Azure DevOpsModern CI/CD Pipeline Using Azure DevOps
Modern CI/CD Pipeline Using Azure DevOpsGlobalLogic Ukraine
 
Node-RED TIPS:functionノード間で関数を共有する方法
Node-RED TIPS:functionノード間で関数を共有する方法Node-RED TIPS:functionノード間で関数を共有する方法
Node-RED TIPS:functionノード間で関数を共有する方法Kazuki Saito
 
react-scriptsはwebpackで何をしているのか
react-scriptsはwebpackで何をしているのかreact-scriptsはwebpackで何をしているのか
react-scriptsはwebpackで何をしているのか暁 三宅
 
微服務基礎建設 - Message Queue
微服務基礎建設 - Message Queue微服務基礎建設 - Message Queue
微服務基礎建設 - Message QueueAndrew Wu
 
How to configure a hive high availability connection with zeppelin
How to configure a hive high availability connection with zeppelinHow to configure a hive high availability connection with zeppelin
How to configure a hive high availability connection with zeppelinTiago Simões
 
今なら間に合う分散型IDとEntra Verified ID
今なら間に合う分散型IDとEntra Verified ID今なら間に合う分散型IDとEntra Verified ID
今なら間に合う分散型IDとEntra Verified IDNaohiro Fujie
 
Docker Registry V2
Docker Registry V2Docker Registry V2
Docker Registry V2Docker, Inc.
 
FIWARE アーキテクチャの保護 - FIWARE WednesdayWebinars
FIWARE アーキテクチャの保護 - FIWARE WednesdayWebinarsFIWARE アーキテクチャの保護 - FIWARE WednesdayWebinars
FIWARE アーキテクチャの保護 - FIWARE WednesdayWebinarsfisuda
 
モノリスからマイクロサービスへの移行 ~ストラングラーパターンの検証~(Spring Fest 2020講演資料)
モノリスからマイクロサービスへの移行 ~ストラングラーパターンの検証~(Spring Fest 2020講演資料)モノリスからマイクロサービスへの移行 ~ストラングラーパターンの検証~(Spring Fest 2020講演資料)
モノリスからマイクロサービスへの移行 ~ストラングラーパターンの検証~(Spring Fest 2020講演資料)NTT DATA Technology & Innovation
 

La actualidad más candente (20)

Modern Authentication -- FIDO2 Web Authentication (WebAuthn) を学ぶ --
Modern Authentication -- FIDO2 Web Authentication (WebAuthn) を学ぶ --Modern Authentication -- FIDO2 Web Authentication (WebAuthn) を学ぶ --
Modern Authentication -- FIDO2 Web Authentication (WebAuthn) を学ぶ --
 
Redmine 4.1 新機能評価ガイド <速報版>
Redmine 4.1 新機能評価ガイド <速報版>Redmine 4.1 新機能評価ガイド <速報版>
Redmine 4.1 新機能評価ガイド <速報版>
 
最適なOpenJDKディストリビューションの選び方 #codetokyo19B3 #ccc_l5
最適なOpenJDKディストリビューションの選び方 #codetokyo19B3 #ccc_l5最適なOpenJDKディストリビューションの選び方 #codetokyo19B3 #ccc_l5
最適なOpenJDKディストリビューションの選び方 #codetokyo19B3 #ccc_l5
 
Tour of Dapr
Tour of DaprTour of Dapr
Tour of Dapr
 
かずきのUWP入門
かずきのUWP入門かずきのUWP入門
かずきのUWP入門
 
Active directoryと認証・認可
Active directoryと認証・認可Active directoryと認証・認可
Active directoryと認証・認可
 
Jenkins Tutorial.pdf
Jenkins Tutorial.pdfJenkins Tutorial.pdf
Jenkins Tutorial.pdf
 
SSIとDIDで何を解決したいのか?(β版)
SSIとDIDで何を解決したいのか?(β版)SSIとDIDで何を解決したいのか?(β版)
SSIとDIDで何を解決したいのか?(β版)
 
【BS3】Visual Studio 2022 と .NET 6 での Windows アプリ開発技術の紹介
【BS3】Visual Studio 2022 と .NET 6 での Windows アプリ開発技術の紹介 【BS3】Visual Studio 2022 と .NET 6 での Windows アプリ開発技術の紹介
【BS3】Visual Studio 2022 と .NET 6 での Windows アプリ開発技術の紹介
 
Uwpアプリケーション開発入門
Uwpアプリケーション開発入門Uwpアプリケーション開発入門
Uwpアプリケーション開発入門
 
Modern CI/CD Pipeline Using Azure DevOps
Modern CI/CD Pipeline Using Azure DevOpsModern CI/CD Pipeline Using Azure DevOps
Modern CI/CD Pipeline Using Azure DevOps
 
Node-RED TIPS:functionノード間で関数を共有する方法
Node-RED TIPS:functionノード間で関数を共有する方法Node-RED TIPS:functionノード間で関数を共有する方法
Node-RED TIPS:functionノード間で関数を共有する方法
 
react-scriptsはwebpackで何をしているのか
react-scriptsはwebpackで何をしているのかreact-scriptsはwebpackで何をしているのか
react-scriptsはwebpackで何をしているのか
 
微服務基礎建設 - Message Queue
微服務基礎建設 - Message Queue微服務基礎建設 - Message Queue
微服務基礎建設 - Message Queue
 
How to configure a hive high availability connection with zeppelin
How to configure a hive high availability connection with zeppelinHow to configure a hive high availability connection with zeppelin
How to configure a hive high availability connection with zeppelin
 
今なら間に合う分散型IDとEntra Verified ID
今なら間に合う分散型IDとEntra Verified ID今なら間に合う分散型IDとEntra Verified ID
今なら間に合う分散型IDとEntra Verified ID
 
Docker Registry V2
Docker Registry V2Docker Registry V2
Docker Registry V2
 
FIWARE アーキテクチャの保護 - FIWARE WednesdayWebinars
FIWARE アーキテクチャの保護 - FIWARE WednesdayWebinarsFIWARE アーキテクチャの保護 - FIWARE WednesdayWebinars
FIWARE アーキテクチャの保護 - FIWARE WednesdayWebinars
 
NGINXをBFF (Backend for Frontend)として利用した話
NGINXをBFF (Backend for Frontend)として利用した話NGINXをBFF (Backend for Frontend)として利用した話
NGINXをBFF (Backend for Frontend)として利用した話
 
モノリスからマイクロサービスへの移行 ~ストラングラーパターンの検証~(Spring Fest 2020講演資料)
モノリスからマイクロサービスへの移行 ~ストラングラーパターンの検証~(Spring Fest 2020講演資料)モノリスからマイクロサービスへの移行 ~ストラングラーパターンの検証~(Spring Fest 2020講演資料)
モノリスからマイクロサービスへの移行 ~ストラングラーパターンの検証~(Spring Fest 2020講演資料)
 

Similar a Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by Simon Hughes, Dice.com

Dice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank TalkDice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank TalkSimon Hughes
 
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseRelevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseLucidworks
 
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com Lucidworks
 
Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingSimon Hughes
 
Haystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon HughesHaystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon HughesOpenSource Connections
 
Searching with vectors
Searching with vectorsSearching with vectors
Searching with vectorsSimon Hughes
 
Improving Search in Workday Products using Natural Language Processing
Improving Search in Workday Products using Natural Language ProcessingImproving Search in Workday Products using Natural Language Processing
Improving Search in Workday Products using Natural Language ProcessingDataWorks Summit
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...Joaquin Delgado PhD.
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...S. Diana Hu
 
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...Lucidworks
 
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...lucenerevolution
 
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformExtending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformLucidworks (Archived)
 
Candidate selection tutorial
Candidate selection tutorialCandidate selection tutorial
Candidate selection tutorialYiqun Liu
 
SIGIR 2017 - Candidate Selection for Large Scale Personalized Search and Reco...
SIGIR 2017 - Candidate Selection for Large Scale Personalized Search and Reco...SIGIR 2017 - Candidate Selection for Large Scale Personalized Search and Reco...
SIGIR 2017 - Candidate Selection for Large Scale Personalized Search and Reco...Aman Grover
 
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Lucidworks
 
Personalized Search and Job Recommendations - Simon Hughes, Dice.com
Personalized Search and Job Recommendations - Simon Hughes, Dice.comPersonalized Search and Job Recommendations - Simon Hughes, Dice.com
Personalized Search and Job Recommendations - Simon Hughes, Dice.comLucidworks
 
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.comEnhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.comSimon Hughes
 
RuleML2015 - Tutorial - Powerful Practical Semantic Rules in Rulelog - Funda...
RuleML2015 - Tutorial -  Powerful Practical Semantic Rules in Rulelog - Funda...RuleML2015 - Tutorial -  Powerful Practical Semantic Rules in Rulelog - Funda...
RuleML2015 - Tutorial - Powerful Practical Semantic Rules in Rulelog - Funda...RuleML
 

Similar a Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by Simon Hughes, Dice.com (20)

Dice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank TalkDice.com Bay Area Search - Beyond Learning to Rank Talk
Dice.com Bay Area Search - Beyond Learning to Rank Talk
 
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseRelevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
 
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
Vectors in Search – Towards More Semantic Matching - Simon Hughes, Dice.com
 
Vectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic MatchingVectors in Search - Towards More Semantic Matching
Vectors in Search - Towards More Semantic Matching
 
Haystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon HughesHaystack 2019 - Search with Vectors - Simon Hughes
Haystack 2019 - Search with Vectors - Simon Hughes
 
Searching with vectors
Searching with vectorsSearching with vectors
Searching with vectors
 
Improving Search in Workday Products using Natural Language Processing
Improving Search in Workday Products using Natural Language ProcessingImproving Search in Workday Products using Natural Language Processing
Improving Search in Workday Products using Natural Language Processing
 
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m...
 
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
Enriching Solr with Deep Learning for a Question Answering System - Sanket Sh...
 
Apex code (Salesforce)
Apex code (Salesforce)Apex code (Salesforce)
Apex code (Salesforce)
 
Final presentation
Final presentationFinal presentation
Final presentation
 
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...
 
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery PlatformExtending Solr: Building a Cloud-like Knowledge Discovery Platform
Extending Solr: Building a Cloud-like Knowledge Discovery Platform
 
Candidate selection tutorial
Candidate selection tutorialCandidate selection tutorial
Candidate selection tutorial
 
SIGIR 2017 - Candidate Selection for Large Scale Personalized Search and Reco...
SIGIR 2017 - Candidate Selection for Large Scale Personalized Search and Reco...SIGIR 2017 - Candidate Selection for Large Scale Personalized Search and Reco...
SIGIR 2017 - Candidate Selection for Large Scale Personalized Search and Reco...
 
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
Evolving The Optimal Relevancy Scoring Model at Dice.com: Presented by Simon ...
 
Personalized Search and Job Recommendations - Simon Hughes, Dice.com
Personalized Search and Job Recommendations - Simon Hughes, Dice.comPersonalized Search and Job Recommendations - Simon Hughes, Dice.com
Personalized Search and Job Recommendations - Simon Hughes, Dice.com
 
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.comEnhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
 
RuleML2015 - Tutorial - Powerful Practical Semantic Rules in Rulelog - Funda...
RuleML2015 - Tutorial -  Powerful Practical Semantic Rules in Rulelog - Funda...RuleML2015 - Tutorial -  Powerful Practical Semantic Rules in Rulelog - Funda...
RuleML2015 - Tutorial - Powerful Practical Semantic Rules in Rulelog - Funda...
 

Más de Lucidworks

Search is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategySearch is the Tip of the Spear for Your B2B eCommerce Strategy
Search is the Tip of the Spear for Your B2B eCommerce StrategyLucidworks
 
Drive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceDrive Agent Effectiveness in Salesforce
Drive Agent Effectiveness in SalesforceLucidworks
 
How Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsHow Crate & Barrel Connects Shoppers with Relevant Products
How Crate & Barrel Connects Shoppers with Relevant ProductsLucidworks
 
Lucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks & IMRG Webinar – Best-In-Class Retail Product Discovery
Lucidworks & IMRG Webinar – Best-In-Class Retail Product DiscoveryLucidworks
 
Connected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesConnected Experiences Are Personalized Experiences
Connected Experiences Are Personalized ExperiencesLucidworks
 
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...
Intelligent Insight Driven Policing with MC+A, Toronto Police Service and Luc...Lucidworks
 
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...
[Webinar] Intelligent Policing. Leveraging Data to more effectively Serve Com...Lucidworks
 
Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Preparing for Peak in Ecommerce | eTail Asia 2020
Preparing for Peak in Ecommerce | eTail Asia 2020Lucidworks
 
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...
Accelerate The Path To Purchase With Product Discovery at Retail Innovation C...Lucidworks
 
AI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteAI-Powered Linguistics and Search with Fusion and Rosette
AI-Powered Linguistics and Search with Fusion and RosetteLucidworks
 
The Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentThe Service Industry After COVID-19: The Soul of Service in a Virtual Moment
The Service Industry After COVID-19: The Soul of Service in a Virtual MomentLucidworks
 
Webinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeWebinar: Smart answers for employee and customer support after covid 19 - Europe
Webinar: Smart answers for employee and customer support after covid 19 - EuropeLucidworks
 
Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Smart Answers for Employee and Customer Support After COVID-19
Smart Answers for Employee and Customer Support After COVID-19Lucidworks
 
Applying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchApplying AI & Search in Europe - featuring 451 Research
Applying AI & Search in Europe - featuring 451 ResearchLucidworks
 
Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Webinar: Accelerate Data Science with Fusion 5.1
Webinar: Accelerate Data Science with Fusion 5.1Lucidworks
 
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyWebinar: 5 Must-Have Items You Need for Your 2020 Ecommerce Strategy
Webinar: 5 Must-Have Items You Need for Your 2020 Ecommerce StrategyLucidworks
 
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...
Where Search Meets Science and Style Meets Savings: Nordstrom Rack's Journey ...Lucidworks
 
Apply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceApply Knowledge Graphs and Search for Real-World Decision Intelligence
Apply Knowledge Graphs and Search for Real-World Decision IntelligenceLucidworks
 
Webinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchWebinar: Building a Business Case for Enterprise Search
Webinar: Building a Business Case for Enterprise SearchLucidworks
 
Why Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondWhy Insight Engines Matter in 2020 and Beyond
Why Insight Engines Matter in 2020 and BeyondLucidworks
 

Implementing Conceptual Search in Solr using LSA and Word2Vec: Presented by Simon Hughes, Dice.com

  • 1. October 13-16, 2016 • Austin, TX
  • 2. Implementing Conceptual Search in Solr. Simon Hughes, Chief Data Scientist, Dice.com
  • 3. Who Am I?
      • Chief Data Scientist at Dice.com, under Yuri Bykov
      • Key projects involving Solr:
          • Recommender systems: more jobs like this, more seekers like this (uses a custom Solr index)
          • Custom Dice Solr MLT handler (real-time recommendations)
          • "Did you mean" functionality
          • Title, skills and company type-ahead
          • Relevancy improvements in Dice jobs search
  • 4. Other Projects
      • Supply Demand Analysis
      • Dice Skills pages: http://www.dice.com/skills
      • PhD: PhD candidate at DePaul University, studying natural language processing and machine learning
  • 5.
  • 6.
  • 7. Q. What is the Most Common Relevancy Tuning Mistake?
  • 8. Q. What is the Most Common Relevancy Tuning Mistake? A. Ignoring the importance of RECALL
  • 9. Relevancy Tuning
      • Key performance metrics to measure:
          • Precision
          • Recall
          • F1 Measure: 2*(P*R)/(P+R)
      • Precision is easier: correct mistakes in the top search results
      • Recall: need to know which relevant documents don't come back
          • Hard to accurately measure
          • Need to know all the relevant documents present in the index
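The three metrics on this slide can be computed per query once relevance judgments exist. The following is a minimal sketch, not code from the talk; the function name `precision_recall_f1` and the document ids are hypothetical, and it assumes the full set of relevant documents is known, which the slide notes is the hard part in practice.

```python
def precision_recall_f1(retrieved, relevant):
    """Compute precision, recall and F1 for a single query.

    retrieved: iterable of document ids returned by the search engine
    relevant:  iterable of document ids judged relevant (assumed fully known)
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 = 2*(P*R)/(P+R), the harmonic mean of precision and recall
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1


# Hypothetical judgments: 4 documents returned, 6 relevant documents in the index.
p, r, f1 = precision_recall_f1(
    ["d1", "d2", "d3", "d4"],
    ["d1", "d2", "d3", "d5", "d6", "d7"],
)
print(p, r, f1)
```

In this made-up example the top results look good (3 of 4 are relevant), yet half of the relevant documents never come back, which is the kind of recall gap that is easy to miss when only the top results are inspected.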
  • 10. What is Conceptual Search?
      • A.K.A. Semantic Search
      • Two key challenges with keyword matching:
          • Polysemy: words have more than one meaning (e.g. engineer: mechanical? programmer? automation engineer?)
          • Synonymy: many different words have the same meaning (e.g. QA, quality assurance, tester; VB, Visual Basic, VB.Net)
      • Other related challenges: typos, spelling errors, idioms
      • Conceptual search attempts to solve these problems by learning concepts
  • 11. Why Conceptual Search?
      • We will attempt to improve recall without diminishing precision
      • Can match relevant documents containing none of the query terms
  • 12. Concepts
      • Conceptual search allows us to retrieve documents by how similar the concepts in the query are to the concepts in a document
      • Concepts represent important high-level ideas in a given domain (e.g. java technologies, big data jobs, helpdesk support, etc.)
      • Concepts are automatically learned from documents using machine learning
      • Words can belong to multiple concepts, with varying strengths of association with each concept
  • 13. Traditional Techniques
      • Many algorithms have been used for concept learning, including LSA (Latent Semantic Analysis), LDA (Latent Dirichlet Allocation) and Word2Vec
      • All involve mapping a document to a low-dimensional dense vector (an array of numbers)
      • Each element of the vector is a number representing how well the document represents that concept
      • E.g. LSA powers the similar skills found in Dice's skills pages
  • 14. Traditional Techniques Don't Scale
      • LSA/LSI, LDA and related techniques rely on factorization of very large term-document matrices, which is very slow and computationally intensive
      • They require embedding a machine learning model with the search engine to map new queries to the concept space (latent or topic space)
      • Query performance is very poor: the inverted index cannot be utilized, as all documents have the same number of concepts
      • What we want is a way to map words, not documents, to concepts. Then we can embed this in Solr via synonym filters and custom query parsers
  • 15. Word2Vec and 'Word Math'
      • Word2Vec was developed by Google around 2013 for learning vector representations for words, building on earlier work from Rumelhart, Hinton and Williams in 1986 (see paper below for citation of this work)
      • Word2Vec paper: Efficient Estimation of Word Representations in Vector Space
      • It works by training a machine learning model to predict the words surrounding a word in a sentence
      • Similar words get similar vector representations
  • 16. "Word Math" Example
      • Using basic vector arithmetic, you get some interesting patterns
      • This illustrates how it represents relationships between words
      • E.g. king – man + woman ≈ queen
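The "word math" idea can be illustrated with a few lines of vector arithmetic. This is only a sketch: the 3-dimensional vectors in `TOY_VECTORS` are invented for illustration (real Word2Vec vectors are learned and typically have 100+ dimensions), and the helper names `cosine` and `analogy` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def analogy(vectors, a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


# Invented toy vectors; the dimensions loosely mean (royalty, male, female).
TOY_VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}

print(analogy(TOY_VECTORS, "king", "man", "woman"))
```

Excluding the three input words from the candidates mirrors common practice for analogy lookups; without it the nearest vector is often one of the inputs.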
  • 17. The algorithm learns to represent different types of relationships between words in vector form
  • 18. At this point you may be thinking…
  • 19.
  • 20. Why Do I Care? This is a Search Conference…
  • 21. Why Do I Care? This is a Search Conference…
      • This algorithm can be used to represent documents as vectors of concepts
      • We can then use these representations to do conceptual search
      • This will surface many relevant documents missed by keyword matching
      • This boosts recall
      • This technique can also be used to automatically learn synonyms
  • 22. A Quick Demo
      Using our Dice.com active jobs index, some example common user queries:
      • Data Scientist
      • Big Data
      • Information Retrieval
      • C#
      • Web Developer
      • CTO
      • Project Manager
  • 23. How?
      GitHub: DiceTechJobs/ConceptualSearch
      1. Pre-process documents: parse HTML, strip noise characters, tokenize words
      2. Define important keywords for your domain, or use my code to auto-extract top terms and phrases
      3. Train Word2Vec on the documents to produce a word2vec model
      4. Using this model, either:
          1. Vectors: use the raw vectors and embed them in Solr using synonyms + payloads
          2. Top N Similar: or extract the top n similar terms with similarities and embed these as weighted synonyms using my custom queryboost parser and tokenizer
          3. Clusters: cluster these vectors by similarity, and map terms to clusters in a synonym file
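Step 1 of the pipeline above (parse HTML, strip noise characters, tokenize) could look roughly like the following. This is a standard-library sketch written for illustration, not the code from the DiceTechJobs repositories; the function name `preprocess` and the tokenization regex are assumptions.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML document, discarding the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def preprocess(html_doc):
    """Strip HTML, lowercase, and split into word tokens.

    Assumption: '+' and '#' are kept inside tokens so that terms such as
    'c++' and 'c#' survive; all other punctuation is treated as noise.
    """
    parser = _TextExtractor()
    parser.feed(html_doc)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return re.findall(r"[a-z0-9][a-z0-9+#.]*[a-z0-9+#]|[a-z0-9]", text)


print(preprocess("<p>Senior <b>C#</b> Developer, .NET &amp; SQL!</p>"))
```

The resulting token lists (one per document or sentence) are the kind of input a Word2Vec implementation is trained on in step 3.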
  • 24. Define Top Domain Specific Keywords
      • If you have a set of documents belonging to a specific domain, it is important to define the important keywords for your domain:
          • Use the top few thousand search keywords
          • Or use my fast keyword and phrase extraction tool (in GitHub)
          • Or use the Solr/Lucene shingle filter to extract the top 1-4 word sequences by document frequency
      • Important to map common phrases to single tokens, e.g. data scientist => data_scientist, java developer => java_developer
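Mapping common phrases to single tokens before training can be sketched as a greedy longest-match pass over the token stream. The function `merge_phrases` and the example phrase set are hypothetical; in the talk's setup this role is played by the keyword list and the Solr synonym filter.

```python
def merge_phrases(tokens, phrases, max_len=4):
    """Replace known multi-word phrases with single underscore-joined tokens.

    tokens:  list of word tokens
    phrases: set of space-separated phrases, e.g. {"data scientist"}
    max_len: longest phrase length (in words) to look for
    """
    out, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first so longer phrases win over shorter ones.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in phrases:
                out.append(candidate.replace(" ", "_"))
                i += n
                break
        else:
            # No phrase starts at this position; keep the single token.
            out.append(tokens[i])
            i += 1
    return out


PHRASES = {"data scientist", "java developer", "senior java developer"}
print(merge_phrases("senior java developer and data scientist".split(), PHRASES))
```

Trying longer phrases first means "senior java developer" is kept whole rather than being split into "senior" plus "java_developer".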
  • 25. Do It Yourself
      • All code for this talk is now publicly available on GitHub:
          • https://github.com/DiceTechJobs/SolrPlugins - Solr plugins to work with conceptual search, and other Dice plugins, such as a custom MLT handler
          • https://github.com/DiceTechJobs/SolrConfigExamples - Examples of Solr configuration file entries to enable conceptual search and other Dice plugins
          • https://github.com/DiceTechJobs/ConceptualSearch - Python code to compute the Word2Vec word vectors, and generate Solr synonym files
  • 26. Some Solr Tricks to Make this Happen
      1. Keyword Extraction: use the synonym filter to extract key words from your documents
  • 27.
  • 28. Some Solr Tricks to Make this Happen
      1. Keyword Extraction: use the synonym filter to extract key words from your documents
      2. Synonym Expansion using Payloads:
          • Use the synonym filter to expand a keyword to multiple tokens
          • Each token has an associated payload, used to adjust relevancy scores at index or query time
  • 29.
  • 30. Synonym File Examples – Vector Method
      • Each keyword maps to a set of tokens via a synonym file
      • Vector synonym file entry (5 element vector, usually 100+ elements):
          java developer=>001|0.4 002|0.1 003|0.5 005|.9
      • Uses a custom token filter that averages these vectors over the entire document (see GitHub - DiceTechJobs/SolrPlugins)
      • Relatively fast at index time but some additional indexing overhead
      • Very slow to query
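To make the averaging step concrete, the sketch below parses entries in the `keyword=>dim|weight ...` format shown above and averages the vectors of the keywords found in one document. It is a Python illustration of the idea only; the real work is done by a custom Solr token filter in DiceTechJobs/SolrPlugins, and the second entry (`sql developer`) is invented for the example. Treating a dimension that is absent from an entry as 0 is also an assumption.

```python
from collections import defaultdict


def parse_entry(line):
    """Parse 'keyword=>dim|weight dim|weight ...' into (keyword, {dim: weight})."""
    keyword, _, rhs = line.partition("=>")
    vector = {}
    for token in rhs.split():
        dim, _, weight = token.partition("|")
        vector[dim] = float(weight)
    return keyword.strip(), vector


def average_vectors(vectors):
    """Average sparse vectors; dimensions missing from a vector count as 0."""
    if not vectors:
        return {}
    totals = defaultdict(float)
    for vec in vectors:
        for dim, weight in vec.items():
            totals[dim] += weight
    return {dim: total / len(vectors) for dim, total in totals.items()}


entries = [
    "java developer=>001|0.4 002|0.1 003|0.5 005|.9",  # entry from the slide
    "sql developer=>001|0.2 002|0.6",                  # hypothetical entry
]
doc_vector = average_vectors([parse_entry(e)[1] for e in entries])
print(doc_vector)
```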
  • 31.
  • 32. Synonym File Examples – Top N Method
      • Each keyword maps to a set of most similar keywords via a synonym file
      • Top N synonym file entry (top 5):
          java_developer=>java_j2ee_developer|0.907526 java_architect|0.889903 lead_java_developer|0.867594 j2ee_developer|0.864028 java_engineer|0.861407
      • Can configure this at index time with payloads, a payload-aware query parser and a payload similarity function
      • Or you can configure this at query time with a special token filter that converts payloads into term boosts, along with a special parser (see GitHub - DiceTechJobs/SolrPlugins)
      • Fast at index and query time if N is reasonable (10-30)
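An entry in this format can be generated from word vectors by ranking every other term by cosine similarity and keeping the top N. The sketch below assumes the vectors are available as a plain dict; the function `top_n_synonym_line` and the 2-dimensional vectors are made up for illustration and are not the output of the talk's model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_n_synonym_line(term, vectors, n=5):
    """Build a 'term=>syn|score syn|score ...' line from the n most similar terms."""
    scored = [
        (cosine(vectors[term], vec), other)
        for other, vec in vectors.items()
        if other != term
    ]
    scored.sort(reverse=True)  # highest similarity first
    return term + "=>" + " ".join(f"{other}|{score:.6f}" for score, other in scored[:n])


# Invented 2-dimensional vectors; real ones come from the trained model.
VECTORS = {
    "java_developer": [0.9, 0.1],
    "java_engineer": [0.8, 0.2],
    "project_manager": [0.1, 0.9],
}
print(top_n_synonym_line("java_developer", VECTORS, n=1))
```

Keeping N small matters here because every synonym becomes an extra query or index token, which is why the slide recommends a reasonable N.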
  • 33. Searching over Clustered Terms
      • After we have learned word vectors, we can use a clustering algorithm to cluster terms by their vectors to give clusters of related words
      • Can learn several different sizes of cluster, such as 500, 1000, 5000 clusters, and map each of these to a separate field
      • Apply stronger boosts to the fields containing smaller clusters (e.g. the 5000 cluster field) using the edismax qf parameter: tighter clusters get more weight
      • Code for clustering vectors in GitHub - DiceTechJobs/ConceptualSearch
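The boosting scheme can be expressed through the edismax `qf` parameter, which takes a space-separated list of `field^boost` pairs. The sketch below only assembles the request parameters; the field names (`clusters_500`, `clusters_1000`, `clusters_5000`) and the boost values are hypothetical and would need to match your own schema.

```python
from urllib.parse import urlencode


def build_cluster_query(user_query, field_boosts):
    """Return Solr request parameters for an edismax query over cluster fields.

    field_boosts: mapping of field name -> boost, e.g. {"clusters_5000": 3}
    """
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return {"defType": "edismax", "q": user_query, "qf": qf}


# Hypothetical fields: the field with more (tighter) clusters gets the largest boost.
params = build_cluster_query(
    "java developer",
    {"clusters_5000": 3, "clusters_1000": 2, "clusters_500": 1},
)
print(urlencode(params))
```

The query text itself stays unchanged; the synonym filter on each cluster field is what rewrites the query terms into cluster tokens.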
  • 34. Synonym File Examples – Clustering Method
      • Each keyword in a cluster maps to the same artificial token for that cluster
      • Cluster synonym file entries:
          java=>cluster_171
          java applications=>cluster_171
          java coding=>cluster_171
          java design=>cluster_171
      • Doesn't use payloads so does not require any special plugins
      • No noticeable impact on query or indexing performance
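Producing entries like these amounts to clustering the term vectors and writing one `term=>cluster_id` line per term. The talk does not say which clustering algorithm is used; the sketch below assumes a very small k-means with deterministic seeding, and the vectors and cluster ids are invented.

```python
import math


def kmeans(vectors, k, iterations=10):
    """Tiny k-means over {term: vector}; returns {term: cluster index}.

    Assumption: the first k vectors seed the centroids, to keep the
    example deterministic. Real code would use a library implementation.
    """
    terms = list(vectors)
    centroids = [list(vectors[t]) for t in terms[:k]]
    assignment = {}
    for _ in range(iterations):
        # Assign each term to its nearest centroid.
        for t in terms:
            assignment[t] = min(
                range(k), key=lambda c: math.dist(vectors[t], centroids[c])
            )
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[t] for t in terms if assignment[t] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


def cluster_synonym_lines(assignment):
    """Format one 'term=>cluster_N' synonym entry per term."""
    return [f"{term}=>cluster_{cid}" for term, cid in sorted(assignment.items())]


# Invented 2-dimensional vectors for four terms.
TERM_VECTORS = {
    "java": [0.9, 0.1],
    "solr": [0.1, 0.9],
    "java coding": [0.8, 0.2],
    "lucene": [0.2, 0.8],
}
for line in cluster_synonym_lines(kmeans(TERM_VECTORS, k=2)):
    print(line)
```

Running this with several values of k would give the differently sized cluster sets that the previous slide maps to separate fields.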
  • 35. Example Clusters Learned from Dice Job Postings
      • Note: the cluster labels are manually assigned for interpretability
      • Natural Languages: bi lingual, bilingual, chinese, fluent, french, german, japanese, korean, lingual, localized, portuguese, russian, spanish, speak, speaker
      • Apple Programming Languages: cocoa, swift
      • Search Engine Technologies: apache solr, elasticsearch, lucene, lucene solr, search, search engines, search technologies, solr, solr lucene
      • Microsoft .Net Technologies: c# wcf, microsoft c#, microsoft.net, mvc web, wcf web services, web forms, webforms, windows forms, winforms, wpf wcf
  • 36. Example Clusters Learned from Dice Job Postings
      • Attention/Attitude: attention, attentive, close attention, compromising, conscientious, conscious, customer oriented, customer service focus, customer service oriented, deliver results, delivering results, demonstrated commitment, dependability, dependable, detailed oriented, diligence, diligent, do attitude, ethic, excellent follow, extremely detail oriented, good attention, meticulous, meticulous attention, organized, orientated, outgoing, outstanding customer service, pay attention, personality, pleasant, positive attitude, professional appearance, professional attitude, professional demeanor, punctual, punctuality, self motivated, self motivation, superb, superior, thoroughness
  • 37. Summary
      • It's easy to overlook recall when performing relevancy tuning
      • Conceptual search improves recall while maintaining high precision by matching documents on concepts or ideas
      • In reality this involves learning which terms are related to one another
      • Word2Vec is a scalable algorithm for learning related words from a set of documents, and it gives state-of-the-art results in word analogy tasks
      • We can train a Word2Vec model offline, and embed its output into Solr by using the built-in synonym filter and payload functionality, combined with some custom plugins