Summary of Papers of
SIGIR 2011 Workshop on Query
Representation and Understanding
Chetana Gavankar
Ricardo Campos, Alipio Jorge, Gael Dias:
"Using Web Snippets and Query-logs to
Measure Implicit Temporal Intents in
Queries"
Types of temporal queries
1. Atemporal: queries that are not sensitive to time, e.g. "plan my trip".
2. Temporally unambiguous: queries that refer to one concrete time period, e.g. "Haiti earthquake" (2010).
3. Temporally ambiguous: queries with multiple instances over time, e.g. "Cricket World Cup", which occurs every four years.
Web snippets and query logs
Content-related resources, based on a web-content approach:
these simply require the set of web search results.
Query-log resources, based on similar year-qualified queries:
these imply that some versions of the query have already been issued.
Identifying implicit temporal queries
1. Web snippets (temporal evidence within web pages):
$TA(q) = \sum_{f \in I} w_f \, f(q)$
$I = \{\mathrm{TSnippets}(\cdot), \mathrm{TTitles}(\cdot), \mathrm{TUrl}(\cdot)\}$
Each feature is valued differently using its weight $w_f$:
18.14 for TTitles(.), 50.91 for TSnippets(.) and 30.95 for TUrl(.)
If TA(q) < 10%, the query is atemporal.
Dates appearing in the query and in the documents may not match.
$\mathrm{TSnippets}(q) = \frac{\#\,\text{snippets retrieved with dates}}{\#\,\text{snippets retrieved}}$
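The weighted sum TA(q) and the 10% cut-off can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the feature values are taken to be fractions in [0, 1], the slide's weights are treated as percentages and normalised by their sum, and TTitles(.) and TUrl(.) are assumed to be dated-item fractions computed the same way as TSnippets(.).

```python
# Weights as reported on the slide (they sum to 100).
WEIGHTS = {"title": 18.14, "snippet": 50.91, "url": 30.95}
ATEMPORAL_THRESHOLD = 0.10  # TA(q) below 10% means the query is atemporal


def dated_fraction(n_with_dates, n_retrieved):
    """Fraction of retrieved items that contain a date (0.0 if nothing was retrieved)."""
    return n_with_dates / n_retrieved if n_retrieved else 0.0


def temporal_ambiguity(features):
    """Weighted sum TA(q) of the feature values.

    Assumption: weights are normalised by their sum so that TA(q) stays in [0, 1].
    """
    total_weight = sum(WEIGHTS.values())
    return sum(WEIGHTS[name] * features[name] for name in WEIGHTS) / total_weight


def is_atemporal(features):
    """Apply the 10% threshold from the slide."""
    return temporal_ambiguity(features) < ATEMPORAL_THRESHOLD


if __name__ == "__main__":
    # Hypothetical counts for one query: 100 results, 20 snippets with dates, and so on.
    features = {
        "title": dated_fraction(5, 100),
        "snippet": dated_fraction(20, 100),
        "url": dated_fraction(10, 100),
    }
    print(temporal_ambiguity(features), is_atemporal(features))
```

The counts in the usage example are made up for illustration and are not results from the paper.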
2. Web query logs: temporal activity can be recorded from the date and time of the request and from user activity.
The number of times query q is pre- or post-qualified by year y is
$WA(q, y) = \#(y, q) + \#(q, y)$
$\alpha(q) = \frac{\sum_y WA(q, y)}{\sum_x \#(x, q) + \sum_x \#(q, x)}$
If the query is qualified with a single year, then $\alpha(q) = 1$.
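A rough Python sketch of the α(q) ratio follows. It assumes the log is available as a mapping from query string to frequency, that a qualifier x is whatever text precedes or follows q in a longer logged query, and that a "year" is a four-digit token between 1900 and 2099. The names `qualifier` and `alpha` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a year qualifier is a single 4-digit token in the range 1900-2099.
YEAR = re.compile(r"(19|20)\d{2}")


def qualifier(log_query, q):
    """Return the text that pre- or post-qualifies q in log_query, or None."""
    if log_query.startswith(q + " "):
        return log_query[len(q) + 1:]
    if log_query.endswith(" " + q):
        return log_query[:-(len(q) + 1)]
    return None


def alpha(q, log):
    """Share of the qualified versions of q whose qualifier is a year.

    log maps each logged query string to its frequency.
    """
    year_qualified = 0  # sum over y of WA(q, y)
    all_qualified = 0   # sum over x of #(x, q) + #(q, x)
    for log_query, count in log.items():
        x = qualifier(log_query, q)
        if x is None:
            continue
        all_qualified += count
        if YEAR.fullmatch(x):
            year_qualified += count
    return year_qualified / all_qualified if all_qualified else 0.0


if __name__ == "__main__":
    # Made-up log entries, for illustration only.
    log = Counter({
        "world cup 2010": 3,
        "2006 world cup": 1,
        "world cup tickets": 4,
        "football": 10,
    })
    print(alpha("world cup", log))  # 4 year-qualified out of 8 qualified
```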
Results
Temporal information is more frequent in web snippets than in either of the query logs (Google and Yahoo!).
Most queries have a TSnippets(.) value of around 20%, whereas
TLogYahoo(.) and TLogGoogle(.) are mostly close to 0%.
Conclusion
➔ Future dates are more common in snippets than in query logs.
➔ A query containing dates does not necessarily have temporal intent (observed in the web query logs of Google and Yahoo!). Ex: "October Sky" movie
➔ Web snippets are statistically more relevant than query logs in terms of temporal intent.
Rishiraj Saha Roy, Niloy Ganguly, Monojit
Choudhury, Naveen Singh:
"Complex Network Analysis Reveals
Kernel-Periphery Structure in Web
Search Queries"
Search Queries
Search query language: bag of segments
Word co-occurrence network: an edge exists between units i and j if $P_{ij} > P_i P_j$
Eight complex network models for query logs:
● Query Unrestricted WordNet (local) and (global)
● Query Restricted WordNet (local) and (global)
● Query Unrestricted SegmentNet (local) and (global)
● Query Restricted SegmentNet (local) and (global)
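The edge criterion above (an edge between units i and j when their joint probability exceeds the product of their individual probabilities) can be sketched as follows. This is a minimal illustration, assuming probabilities are estimated as the fraction of queries that contain a unit or a pair of units. It does not reproduce the eight network variants, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(queries):
    """Build the edge set of a co-occurrence network.

    queries: list of queries, each a list of units (words or segments).
    An edge (a, b) is kept when P_ab > P_a * P_b, with probabilities
    estimated as fractions of queries (an assumption of this sketch).
    """
    n = len(queries)
    unit_count = Counter()
    pair_count = Counter()
    for units in queries:
        present = sorted(set(units))  # count each unit once per query
        unit_count.update(present)
        pair_count.update(combinations(present, 2))

    edges = set()
    for (a, b), c in pair_count.items():
        # c/n > (count_a/n) * (count_b/n), rewritten with integers only
        if c * n > unit_count[a] * unit_count[b]:
            edges.add((a, b))
    return edges


if __name__ == "__main__":
    # Made-up segmented queries, for illustration only.
    queries = [
        ["how to", "bake bread"],
        ["how to", "fix bike"],
        ["star trek", "videos"],
    ]
    print(sorted(cooccurrence_edges(queries)))
```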
Kernel and Peripheral lexicons
Two regimes in the degree distribution (DD) of the word co-occurrence network:
1. Kernel lexicon (K-Lex, or modifiers):
• Units that are popular in queries (high degree)
• Generic and domain-independent
2. Peripheral lexicon (P-Lex, or heads): rare units
with degree much lower than those in the kernel
K-Lex (popular segments) P-Lex (rarer segments)
how to matthew brodrick
wiki accessories
free police officer
and who is
in australia epson tx800
videos star trek next gen
Degree Distribution
|N| = number of nodes, |E| = number of edges
C = average clustering coefficient
d = mean shortest path length between nodes
$C_{rand}$ and $d_{rand}$ are the corresponding values in a random graph
$C_{rand} \sim k'/|N|$, $d_{rand} \sim \ln(|N|)/\ln(k')$
k' = average degree of the graph
Degree distribution: p(k) = (number of nodes with degree k) / (total number of nodes)
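The quantities defined above can be computed from an edge list with a short Python sketch. It assumes an undirected graph, uses the standard identity k' = 2|E|/|N| for the average degree, and only counts nodes that appear in at least one edge. The function names are hypothetical.

```python
import math
from collections import Counter


def degree_distribution(edges):
    """Return p(k): the fraction of nodes that have degree k.

    edges: iterable of (a, b) pairs of an undirected graph.
    Isolated nodes are not represented in an edge list and are ignored here.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n = len(degree)
    histogram = Counter(degree.values())
    return {k: count / n for k, count in histogram.items()}


def random_graph_baselines(n_nodes, n_edges):
    """Return (k', C_rand, d_rand) for a random graph of the same size.

    Uses C_rand ~ k'/|N| and d_rand ~ ln|N| / ln k'; requires k' > 1.
    """
    k_avg = 2 * n_edges / n_nodes
    c_rand = k_avg / n_nodes
    d_rand = math.log(n_nodes) / math.log(k_avg)
    return k_avg, c_rand, d_rand


if __name__ == "__main__":
    star = [("a", "b"), ("a", "c"), ("a", "d")]
    print(degree_distribution(star))
    print(random_graph_baselines(100, 200))
```

Comparing the measured C and d of a query network against these baselines is the usual way to check for the small-world property.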
Two regime power law
Conclusion
● Like natural language (NL), queries reflect a kernel-periphery distinction.
● Unlike NL, query networks lack the small-world property, which in NL supports quick retrieval of words from the mind.
● It is more difficult to understand the context of a segment in a query.
● The peripheral network consists of a large number of small disconnected components.
● The capability of peripheral units to exist by themselves makes POS identification hard in queries.
● Socio-cultural factors govern the kernel-periphery distinction in queries.
Lidong Bing, Wai Lam:
"Investigation of Web Query Refinement
via Topic Analysis and Learning with
Personalization"
Web Query Refinement
● Query Refinement
● Substitution
● Expansion
● Deletion
● Stemming
● Spelling correction
● Abbreviation expansion
......................
● Generate some candidate queries first, and score
the quality of these candidates.
Latent Topic Analysis in Query Log
Query log record: (user_id, query, clicked_url, time)
Pseudo-document generation: queries related to the same host are aggregated. General sites like “en.wikipedia.org” are not suitable for latent topic analysis and are eliminated.
Latent Dirichlet Allocation (LDA) is used to conduct the latent semantic topic analysis on the collection of host-based pseudo-documents.
Z = set of latent topics $z_i$
Each $z_i$ is associated with a multinomial distribution over terms
$P(t_k | z_i)$ = probability of term $t_k$ given topic $z_i$
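The pseudo-document generation step can be sketched in Python as below. This covers only the aggregation of queries by clicked host, not the LDA step itself. The exclusion set contains just the one example host named above, and the function name, the whitespace tokenisation, and the sample records are assumptions made for illustration.

```python
from collections import defaultdict
from urllib.parse import urlparse

# Only the example host from the slide; the full exclusion list is not given.
GENERAL_HOSTS = {"en.wikipedia.org"}


def host_pseudo_documents(records, excluded_hosts=GENERAL_HOSTS):
    """Aggregate query terms by the host of the clicked URL.

    records: iterable of (user_id, query, clicked_url, time) tuples.
    Returns a dict mapping host -> list of query terms (one pseudo-document per host).
    """
    docs = defaultdict(list)
    for user_id, query, clicked_url, time in records:
        host = urlparse(clicked_url).netloc.lower()
        if not host or host in excluded_hosts:
            continue  # skip malformed URLs and overly general sites
        docs[host].extend(query.split())  # assumption: whitespace tokenisation
    return dict(docs)


if __name__ == "__main__":
    # Hypothetical log records.
    records = [
        ("u1", "star trek cast", "http://www.imdb.com/title/tt0060028/", "t1"),
        ("u2", "october sky movie", "http://www.imdb.com/title/tt0132477/", "t2"),
        ("u1", "star trek", "http://en.wikipedia.org/wiki/Star_Trek", "t3"),
    ]
    print(host_pseudo_documents(records))
```

The resulting term lists could then be passed to any LDA implementation to estimate the topic-term distributions.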
Personalization
$\pi^u = \{\pi^u_1, \pi^u_2, \ldots, \pi^u_{|Z|}\}$ = profile of the user u,
$\pi^u_i = P(z_i | u)$ = probability that the user u prefers the topic $z_i$
Generate a user-based pseudo-document U for user u.
$\{P(z_1 | U), P(z_2 | U), \ldots, P(z_{|Z|} | U)\}$ = profile of u.
Candidate query q: $t_1, \ldots, t_n$
Topic of term $t_r$ = $z_r$
Topic-based scoring with personalization
Candidate query score:
The model parameter $P(z_j | z_i)$ captures the relationship between two topics.
With personal profile
$P(z_1 | u)$ = probability that user u prefers the topic $z_1$
Conclusion
The framework that considers personalization achieves the best performance.
With user profiles, the topic-based scoring part is more reliable.
