SlideShare una empresa de Scribd logo
1 de 23
Descargar para leer sin conexión
Query Expansion with Locally-
Trained Word Embeddings
Fernando Bhaskar Mitra Nick Craswell
Microsoft
p(d)
d
p(d)
d
q
p(d|q)
cut
global local*
cutting tax
squeeze deficit
reduce vote
slash budget
reduction reduction
spend house
lower bill
halve plan
soften spend
freeze billion
global: trained using full
corpus
local: trained using topically-
*gas
global local
t-SNE projection: top words by ˜p(d|q) (blue: query; red: top words by p(d|q))
• local term clustering [Lesk 1968, Attar and Fraenkel
1977]
• local latent semantic analysis [Hull 1995, Hull, 1994;
Schutze et al., 1995; Singhal et al., 1997]
• local document clustering [Tombros and van
Rijsbergen, 2001; Tombros et al., 2002; Willett, 1985]
• one sense per discourse [Gale et al., 1992]
target
corpus
query
results
q = [gas:1.0 tax:1.0 petroleum:0.0 tariff:0.0 …]
query = gas tax
q = [gas:1.0 tax:1.0 petroleum:0.0 tariff:0.0 …]
query = gas tax
d = [gas:0.0 tax:0.0 petroleum:0.7 tariff:0.5 …]
q = [gas:1.0 tax:1.0 petroleum:0.0 tariff:0.0 …]
query = gas tax
…
gas petroleum:0.9 indigestion:0.6 …
tax tariff:0.7 strain:0.4 …
…[ ]W=
q = [gas:1.0 tax:1.0 petroleum:0.8 tariff:0.6 …]
query = gas tax
d = [gas:0.0 tax:0.0 petroleum:0.7 tariff:0.5 …]
W = UUT
U m ⇥ k embedding matrix
p(d)
d
q
p(d|q)
p(d)
d
q
˜p(d|q)
target
corpus
query
results
external
corpus
query
results
U =
8
>>><
>>>:
uniform p(d) on the target corpus
uniform p(d) on an external corpus
p(d|q) on the target corpus
p(d|q) on an external corpus
docs words queries
trec12 469,949 438,338 150
robust 528,155 665,128 250
web 50,220,423 90,411,624 200
global local
target target
wikipedia+gigaword* gigaword†
google news* wikipedia†
*publicly available embedding; †publicly available external corpus
target
corpus
query
results
external
corpus
query
results
target
corpus
query
results
target
corpus
query
results
external
corpus
query
results
trec12 robust web
local vs global
NDCG@10
0.0
0.1
0.2
0.3
0.4
0.5
expansion
none
global
local
trec12 robust web
local embedding
NDCG@10
0.0
0.1
0.2
0.3
0.4
0.5
corpus
target
gigaword
wikipedia
• local embedding provides a stronger representation than
global embedding
• potential impact for other topic-specific natural language
processing tasks
• future work
• effectiveness improvements
• efficiency improvements

Más contenido relacionado

Destacado

Destacado (10)

Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)
Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)
Query Expansion with Locally-Trained Word Embeddings (Neu-IR 2016)
 
Using Text Embeddings for Information Retrieval
Using Text Embeddings for Information RetrievalUsing Text Embeddings for Information Retrieval
Using Text Embeddings for Information Retrieval
 
Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)Neural Text Embeddings for Information Retrieval (WSDM 2017)
Neural Text Embeddings for Information Retrieval (WSDM 2017)
 
A Proposal for Evaluating Answer Distillation from Web Data
A Proposal for Evaluating Answer Distillation from Web DataA Proposal for Evaluating Answer Distillation from Web Data
A Proposal for Evaluating Answer Distillation from Web Data
 
Neu-IR 2016: Lessons from the Trenches
Neu-IR 2016: Lessons from the TrenchesNeu-IR 2016: Lessons from the Trenches
Neu-IR 2016: Lessons from the Trenches
 
Neu-ir 2016: Opening note
Neu-ir 2016: Opening noteNeu-ir 2016: Opening note
Neu-ir 2016: Opening note
 
Recurrent networks and beyond by Tomas Mikolov
Recurrent networks and beyond by Tomas MikolovRecurrent networks and beyond by Tomas Mikolov
Recurrent networks and beyond by Tomas Mikolov
 
A Simple Introduction to Word Embeddings
A Simple Introduction to Word EmbeddingsA Simple Introduction to Word Embeddings
A Simple Introduction to Word Embeddings
 
Vectorland: Brief Notes from Using Text Embeddings for Search
Vectorland: Brief Notes from Using Text Embeddings for SearchVectorland: Brief Notes from Using Text Embeddings for Search
Vectorland: Brief Notes from Using Text Embeddings for Search
 
Natural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4jNatural Language Processing with Graph Databases and Neo4j
Natural Language Processing with Graph Databases and Neo4j
 

Más de Bhaskar Mitra

Efficient Machine Learning and Machine Learning for Efficiency in Information...
Efficient Machine Learning and Machine Learning for Efficiency in Information...Efficient Machine Learning and Machine Learning for Efficiency in Information...
Efficient Machine Learning and Machine Learning for Efficiency in Information...
Bhaskar Mitra
 
Multisided Exposure Fairness for Search and Recommendation
Multisided Exposure Fairness for Search and RecommendationMultisided Exposure Fairness for Search and Recommendation
Multisided Exposure Fairness for Search and Recommendation
Bhaskar Mitra
 
Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)
Bhaskar Mitra
 

Más de Bhaskar Mitra (20)

Joint Multisided Exposure Fairness for Search and Recommendation
Joint Multisided Exposure Fairness for Search and RecommendationJoint Multisided Exposure Fairness for Search and Recommendation
Joint Multisided Exposure Fairness for Search and Recommendation
 
What’s next for deep learning for Search?
What’s next for deep learning for Search?What’s next for deep learning for Search?
What’s next for deep learning for Search?
 
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...
So, You Want to Release a Dataset? Reflections on Benchmark Development, Comm...
 
Efficient Machine Learning and Machine Learning for Efficiency in Information...
Efficient Machine Learning and Machine Learning for Efficiency in Information...Efficient Machine Learning and Machine Learning for Efficiency in Information...
Efficient Machine Learning and Machine Learning for Efficiency in Information...
 
Multisided Exposure Fairness for Search and Recommendation
Multisided Exposure Fairness for Search and RecommendationMultisided Exposure Fairness for Search and Recommendation
Multisided Exposure Fairness for Search and Recommendation
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
 
Neural Information Retrieval: In search of meaningful progress
Neural Information Retrieval: In search of meaningful progressNeural Information Retrieval: In search of meaningful progress
Neural Information Retrieval: In search of meaningful progress
 
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning TrackConformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
Conformer-Kernel with Query Term Independence @ TREC 2020 Deep Learning Track
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
 
Duet @ TREC 2019 Deep Learning Track
Duet @ TREC 2019 Deep Learning TrackDuet @ TREC 2019 Deep Learning Track
Duet @ TREC 2019 Deep Learning Track
 
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and BeyondBenchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond
Benchmarking for Neural Information Retrieval: MS MARCO, TREC, and Beyond
 
Deep Neural Methods for Retrieval
Deep Neural Methods for RetrievalDeep Neural Methods for Retrieval
Deep Neural Methods for Retrieval
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
 
Learning to Rank with Neural Networks
Learning to Rank with Neural NetworksLearning to Rank with Neural Networks
Learning to Rank with Neural Networks
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)Dual Embedding Space Model (DESM)
Dual Embedding Space Model (DESM)
 
Adversarial and reinforcement learning-based approaches to information retrieval
Adversarial and reinforcement learning-based approaches to information retrievalAdversarial and reinforcement learning-based approaches to information retrieval
Adversarial and reinforcement learning-based approaches to information retrieval
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Último (20)

ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 

Query Expansion with Locally-Trained Word Embeddings (ACL 2016)

  • 1. Query Expansion with Locally- Trained Word Embeddings Fernando Bhaskar Mitra Nick Craswell Microsoft
  • 4. cut global local* cutting tax squeeze deficit reduce vote slash budget reduction reduction spend house lower bill halve plan soften spend freeze billion global: trained using full corpus local: trained using topically- *gas
  • 5. global local t-SNE projection: top words by ˜p(d|q) (blue: query; red: top words by p(d|q))
  • 6.
  • 7.
  • 8. • local term clustering [Lesk 1968, Attar and Fraenkel 1977] • local latent semantic analysis [Hull 1995, Hull, 1994; Schutze et al., 1995; Singhal et al., 1997] • local document clustering [Tombros and van Rijsbergen, 2001; Tombros et al., 2002; Willett, 1985] • one sense per discourse [Gale et al., 1992]
  • 10. q = [gas:1.0 tax:1.0 petroleum:0.0 tariff:0.0 …] query = gas tax
  • 11. q = [gas:1.0 tax:1.0 petroleum:0.0 tariff:0.0 …] query = gas tax d = [gas:0.0 tax:0.0 petroleum:0.7 tariff:0.5 …]
  • 12. q = [gas:1.0 tax:1.0 petroleum:0.0 tariff:0.0 …] query = gas tax … gas petroleum:0.9 indigestion:0.6 … tax tariff:0.7 strain:0.4 … …[ ]W=
  • 13. q = [gas:1.0 tax:1.0 petroleum:0.8 tariff:0.6 …] query = gas tax d = [gas:0.0 tax:0.0 petroleum:0.7 tariff:0.5 …]
  • 14. W = UUT U m ⇥ k embedding matrix
  • 18. U = 8 >>>< >>>: uniform p(d) on the target corpus uniform p(d) on an external corpus p(d|q) on the target corpus p(d|q) on an external corpus
  • 19. docs words queries trec12 469,949 438,338 150 robust 528,155 665,128 250 web 50,220,423 90,411,624 200
  • 20. global local target target wikipedia+gigaword* gigaword† google news* wikipedia† *publicly available embedding; †publicly available external corpus target corpus query results external corpus query results target corpus query results target corpus query results external corpus query results
  • 21. trec12 robust web local vs global NDCG@10 0.0 0.1 0.2 0.3 0.4 0.5 expansion none global local
  • 22. trec12 robust web local embedding NDCG@10 0.0 0.1 0.2 0.3 0.4 0.5 corpus target gigaword wikipedia
  • 23. • local embedding provides a stronger representation than global embedding • potential impact for other topic-specific natural language processing tasks • future work • effectiveness improvements • efficiency improvements