Dynamic Collective Entity Representations for Entity Ranking
David Graus, Manos Tsagkias, Wouter Weerkamp, Edgar Meij, Maarten de Rijke
2
3
4
Entity search?
• Index = Knowledge Base (KB), i.e., Wikipedia
• Documents = entities
• “Real-world entities” have a single representation (in the KB)
5
Representation is not static
• People talk about entities all the time
• Associations between words and entities change over time
6
Example 1: News events
7
Example 2: Social media chatter
8
Dynamic Collective Entity Representations
• Use “collective intelligence” to mine entity descriptions that enrich the entity representation
• Like document expansion: add terms found through explicit links
• Not query expansion, which adds terms found through predicted links
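A minimal sketch of the document-expansion idea, assuming a bag-of-words entity document and a hypothetical list of descriptions that link to the entity explicitly (e.g., anchors or tweets). The query is left untouched; only the indexed document grows. The entity text and descriptions below are illustrative, not data from the talk.

```python
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase whitespace tokenization; a real system would use a proper analyzer.
    return text.lower().split()

def expand_document(doc_terms: Counter, descriptions: list[str]) -> Counter:
    """Return a new term-count vector with the terms of linked descriptions added."""
    expanded = Counter(doc_terms)
    for description in descriptions:
        expanded.update(tokenize(description))
    return expanded

def matched_query_terms(query: str, doc_terms: Counter) -> set[str]:
    # Query terms that occur in the (possibly expanded) document.
    return {t for t in tokenize(query) if doc_terms[t] > 0}

# Hypothetical KB text and hypothetical descriptions that explicitly link to the entity.
kb_doc = Counter(tokenize("anthropornis nordenskjoeldi is an extinct genus of giant penguin"))
linked_descriptions = ["biggest penguin ever", "prehistoric birds megafauna"]

query = "biggest prehistoric penguin"
before = matched_query_terms(query, kb_doc)
after = matched_query_terms(query, expand_document(kb_doc, linked_descriptions))
print(sorted(before))  # ['penguin']
print(sorted(after))   # ['biggest', 'penguin', 'prehistoric']
```

The original document is copied rather than mutated, so the unexpanded version stays available for comparison.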
9
Advantages
• Cheap: only the documents in the index change, so tried-and-tested retrieval algorithms can be reused as-is
• Free “smoothing”: sources such as tweets may capture ‘newly evolving’ word associations (e.g., the Ferguson shooting) and incorporate out-of-document terms
• “Move relevant documents closer to queries”, i.e., close the gap between the searcher’s vocabulary and the documents in the index
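Because expansion only changes the indexed documents, a standard scoring function can be applied unchanged. A minimal Okapi BM25 sketch follows; the parameters k1 = 1.2 and b = 0.75 are common defaults, not values from the talk, and the toy index is hypothetical.

```python
import math
from collections import Counter

def bm25_score(query_terms: list[str], doc: Counter, collection: list[Counter],
               k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi BM25 score of one bag-of-words document for a query."""
    n_docs = len(collection)
    avg_len = sum(sum(d.values()) for d in collection) / n_docs
    doc_len = sum(doc.values())
    score = 0.0
    for term in query_terms:
        tf = doc[term]
        if tf == 0:
            continue
        df = sum(1 for d in collection if d[term] > 0)
        # IDF variant with +1 inside the log (as in Lucene) so weights stay positive.
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        length_norm = k1 * (1 - b + b * doc_len / avg_len)
        score += idf * tf * (k1 + 1) / (tf + length_norm)
    return score

# Hypothetical toy index: the entity document before and after expansion, plus one other entity.
original = Counter("giant penguin eocene".split())
expanded = original + Counter("biggest penguin".split())
other = Counter("emperor penguin antarctica".split())
query = ["biggest", "penguin"]

score_before = bm25_score(query, original, [original, other])
score_after = bm25_score(query, expanded, [expanded, other])
print(score_before < score_after)
```

The expanded document now matches the previously missing query term, so its score rises without any change to the scoring function itself.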
10
Haven’t we seen this before?
• Anchors and queries in particular have been shown to improve retrieval [1]
• Tweets have been shown to be similar to anchors [2]
• Social tags likewise [3]
• But prior work:
  ◦ operates in batch (i.e., add data, then see how it affects retrieval)
  ◦ uses a single source
[1] T. Westerveld, W. Kraaij, and D. Hiemstra. Retrieving web pages using content, links, URLs and anchors. In TREC 2001.
[2] G. Mishne and J. Lin. Twanchor text: A preliminary study of the value of tweets as anchor text. In SIGIR 2012.
[3] C.-J. Lee and W. B. Croft. Incorporating social anchors for ad hoc retrieval. In OAIR 2013.
11
Description sources
[Figure: description terms for the example entity Anthropornis nordenskjoeldi from each source (KB anchors, KB categories, KB redirects, KB links, web anchors, tags, tweets, queries); example terms include “Nordenskjoeld's Giant Penguin”, “Extinct penguins”, “megafauna”, “biggest penguin”, and “prehistoric birds”]
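The description sources on this slide can be modeled as fields of a single entity document. A minimal sketch, assuming hypothetical field names; the “text” field for the article body and the split into static fields and dynamic fields (which receive streaming expansions) are illustrative assumptions, not the authors' exact setup.

```python
from collections import Counter
from dataclasses import dataclass, field

# Field names mirror the description sources on the slide ("text" for the article body is
# an assumption). Which fields count as dynamic is an illustrative assumption too.
STATIC_FIELDS = ("text", "kb_anchors", "kb_categories", "kb_redirects", "kb_links", "web_anchors")
DYNAMIC_FIELDS = ("tags", "tweets", "queries")

@dataclass
class EntityDocument:
    entity_id: str
    fields: dict[str, Counter] = field(
        default_factory=lambda: {name: Counter() for name in STATIC_FIELDS + DYNAMIC_FIELDS})
    last_updated: int = 0  # time step (e.g., query index) of the most recent expansion

    def expand(self, field_name: str, description: str, t: int) -> None:
        """Append the terms of one incoming description to a dynamic field at time step t."""
        if field_name not in DYNAMIC_FIELDS:
            raise ValueError(f"{field_name!r} is not a dynamic field")
        self.fields[field_name].update(description.lower().split())
        self.last_updated = t

    def field_length(self, field_name: str) -> int:
        return sum(self.fields[field_name].values())

doc = EntityDocument("Anthropornis_nordenskjoeldi")
doc.expand("tweets", "Biggest penguin ever", t=42)
print(doc.field_length("tweets"), doc.last_updated)  # 3 42
```

Keeping each source in its own field, rather than merging everything into one bag of words, is what later lets a ranker weight sources differently and track field length and recency.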
12
Challenge
• Heterogeneity of
  1. description sources
  2. entities
• Dynamic nature: content changes over time
13
Method: Adaptive ranking
• Supervised single-field weighting model
• Features:
  ◦ field similarity: retrieval score per field
  ◦ field “importance”: length, novel terms, etc.
  ◦ entity “importance”: time since last update
• (Re-)learn optimal weights from clicks
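A minimal sketch of the three feature groups and of re-learning weights from clicks. The slides do not name the learning algorithm; this sketch swaps in a simple online pairwise perceptron over a linear model, which is an assumption. The feature definitions are also assumptions: a crude term-overlap ratio stands in for a per-field retrieval score, and the field list is a hypothetical subset.

```python
import math
from collections import Counter

# Hypothetical subset of fields; a real setup would use every description source.
FIELDS = ("kb", "tweets", "tags")

def features(query: list[str], fields: dict[str, Counter], novel_terms: dict[str, int],
             last_updated: int, now: int) -> list[float]:
    """Feature vector for one (query, entity) pair: three features per field plus one per entity."""
    vec: list[float] = []
    for name in FIELDS:
        terms = fields.get(name, Counter())
        length = sum(terms.values())
        # Field similarity: a crude stand-in for a per-field retrieval score
        # (share of the field's tokens that match a query term).
        sim = sum(terms[q] for q in query) / length if length else 0.0
        # Field "importance": (log) field length and (log) number of novel terms.
        vec += [sim, math.log1p(length), math.log1p(novel_terms.get(name, 0))]
    # Entity "importance": decays with the number of time steps since the last update.
    vec.append(1.0 / (1 + now - last_updated))
    return vec


class OnlinePairwiseRanker:
    """Linear scorer whose weights are re-learned from clicks with a pairwise perceptron update."""

    def __init__(self, n_features: int, lr: float = 0.1):
        self.w = [0.0] * n_features
        self.lr = lr

    def score(self, x: list[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def rank(self, candidates: dict[str, list[float]]) -> list[str]:
        # Stable sort: ties keep the candidates' insertion order.
        return sorted(candidates, key=lambda e: self.score(candidates[e]), reverse=True)

    def update(self, candidates: dict[str, list[float]], clicked: str) -> None:
        """Move the weights toward the clicked entity for every entity that scored at least as high."""
        x_pos = candidates[clicked]
        for entity, x_neg in candidates.items():
            if entity != clicked and self.score(x_neg) >= self.score(x_pos):
                self.w = [wi + self.lr * (p - n) for wi, p, n in zip(self.w, x_pos, x_neg)]
```

For instance, with two candidates whose feature vectors are [1, 0] and [0, 1] and a click on the second, a single update is enough to rank the clicked candidate first. Because the field-importance features change as expansions arrive, the same weights can become stale, which is why the weights are re-learned rather than fixed once.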
14
Experimental setup
1. Data:
  • MSN query log: 62,841 queries with clicks on entities
  • Each query is treated as one time unit
  • For each query:
    ◦ produce a ranking
    ◦ observe the click
    ◦ evaluate the ranking (MAP, P@1)
    ◦ expand entities with dynamic descriptions
    ◦ [re-train the ranker]
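The per-query protocol can be sketched as a replay loop. Assumptions: the single clicked entity is the only relevant result, so average precision reduces to the reciprocal rank of the click; the ranking, expansion, and retraining steps are passed in as hypothetical callables rather than taken from the authors' code.

```python
from typing import Callable, Iterable, Optional

def average_precision_single(ranking: list[str], clicked: str) -> float:
    """AP when the clicked entity is the only relevant one: 1 / its rank, or 0 if not retrieved."""
    return 1.0 / (ranking.index(clicked) + 1) if clicked in ranking else 0.0

def precision_at_1(ranking: list[str], clicked: str) -> float:
    return 1.0 if ranking and ranking[0] == clicked else 0.0

def run_stream(log: Iterable[tuple[str, str]],
               rank: Callable[[str], list[str]],
               expand: Callable[[int], None],
               retrain: Optional[Callable[[str, list[str], str], None]] = None) -> tuple[float, float]:
    """Replay (query, clicked_entity) pairs in log order, one query per time step.

    Returns (MAP, P@1) over all queries; retrain=None gives a non-adaptive ranker.
    """
    ap_total = p1_total = 0.0
    n = 0
    for t, (query, clicked) in enumerate(log):
        ranking = rank(query)                                   # produce ranking
        ap_total += average_precision_single(ranking, clicked)  # observe click, evaluate
        p1_total += precision_at_1(ranking, clicked)
        n += 1
        expand(t)                                               # add descriptions that arrived by step t
        if retrain is not None:
            retrain(query, ranking, clicked)                    # optionally re-train the ranker
    return (ap_total / n, p1_total / n) if n else (0.0, 0.0)

log = [("extinct penguin", "Anthropornis"), ("emperor penguin", "Emperor_penguin")]
map_score, p_at_1 = run_stream(log, rank=lambda q: ["Anthropornis", "Emperor_penguin"],
                               expand=lambda t: None)
print(map_score, p_at_1)  # 0.75 0.5
```

Evaluating each ranking before the click is used for expansion or retraining keeps the setup honest: the ranker never sees the answer to the query it is being scored on.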
15
Main results
• Comparing the effectiveness of different description sources
• Comparing adaptive vs. non-adaptive ranker performance
16
Description sources
[Figure: MAP (y-axis) against number of queries (x-axis) per description source]
17
Feature weights over time
[Figure: relative feature importance (y-axis) against number of queries (x-axis)]
18
Non-adaptive vs. adaptive ranking
19
In summary
• Expanding entity representations with different sources enables better matching of queries to entities
• As new content comes in, it is beneficial to retrain the ranker
• Informing the ranker of the “expansion state” further improves performance
20
Thank you
• (Also, thanks to the WSDM and SIGIR travel grants)

Viewers also liked

yourHistory - entity linking for a personalized timeline of historic events
Understanding Email Traffic
Instance Matching
Understanding Email Traffic (talk @ E-Discovery NL Symposium)
David Graus - Entity Linking (at SEA), Search Engines Amsterdam, Fri June 27th
Analyzing and Predicting Task Reminders
Big Data & Machine Learning - Mogelijkheden & Valkuilen (Opportunities & Pitfalls)

Similar to Dynamic Collective Entity Representations for Entity Ranking

Information retrieval and extraction (Ankit Sharma)
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas (Angelo Salatino)
Donat Agosti & Norman F. Johnson - Copyright: the new taxonomic impediment (ICZN)
Shorthouse
December 2, 2015: NISO/NFAIS Virtual Conference: Semantic Web: What's New and...
Data Publishing in Archaeozoology
Scott Edmunds: Data publication in the data deluge
Research Data Sharing LERU
UKSG webinar: Quo vadis? Getting there with linked data with Gordon Dunsire
Relations for Reusing (R4R) in A Shared Context: An Exploration on Research P...
Toward a news data science
Interpretation, Context, and Metadata: Examples from Open Context
Material Cultures2010 Alexandre Monnin
Resources, resources, resources: the three rs of the Web
Multi-step Classification Approaches to Cumulative Citation Recommendation
ContentMine Presentation for WHO Health Data Seminar
Our World is Socio-technical
Visual Analysis of Concept Change and Information Diffusion
Scott Edmunds: Data Dissemination in the era of "Big-Data"

More from David Graus

Semantic Search in E-Discovery
Pragmatic ethical and fair AI for data scientists
Bias in Recommendations
RecSys in the Media Industry: Relevance, Recency, Popularity, and Diversity.
CAT/AI: Computer Assisted Translation Assessment for Impact
Opening the Black Box of User Profiles in Content-based Recommender Systems
Zoeken, vinden, en aanbevelen: personalisatie vs. privacy (Searching, finding, and recommending: personalization vs. privacy)
Layman's Talk: Entities of Interest --- Discovery in Digital Traces
Financial News Mining @ PyData Amsterdam
De Macht van Data --- Hoe algoritmen ons leven vormgeven (The Power of Data: How algorithms shape our lives)
Financial News Mining @ FD Mediagroep/Company.info
Generating Pseudo-ground Truth for Detecting New Concepts in Social Streams
Semantic Annotation of the Cyttron Database
Semantic annotation, clustering and visualization

Editor's Notes

  1. First, entities and structure; I get to show the mandatory entity search example.
  2. You are not interested in documents but in things: e.g., the person/artist Kendrick Lamar, referred to by his former stage name.
  3. So it is like web search, but the units of retrieval are real-life entities, which means we can collect data for them.
  4. This is what we try to leverage in this work.
  5. July 31st vs. after August 7th: added content, new word associations.
  6. This looks a bit extreme because there is swearing, but there is a serious intuition here: the vocabulary gap between the formal KB and informal chatter.
  7. Our method aims to leverage this to enrich the representation and close the gap.
  8. Of collective intelligence/description sources.
  9. We look at a scenario where the expansions come in a streaming manner.
  10. Fielded document representation.
  11. You could do vanilla retrieval, but two challenges arise. First, description sources differ along several dimensions (e.g., volume, quality, novelty), and head entities are likely to receive more external descriptions than tail entities. Second, content changes over time, so expansions may accumulate and “swamp” the representation.
  12. Our solution is to dynamically learn how to combine fields into a single representation. Features (more detail in the paper): field similarity features (per field), i.e., query–field similarity scores; field importance features (per field), which inform the ranker of the status of the field at that time (i.e., more and novel content); and entity importance, to favor “recently” updated entities. (What about the experimental setup?)
  13. We took all queries that yield Wikipedia clicks, retrieve the top-k entities, and extract features. This allows us to track performance over time.
  14. In this talk I focus on the contribution of the sources and on the adaptive vs. static ranker.
  15. (1) Each source contributes to better ranking; tags and web anchors do best, and tweets significantly outperform the KB. (2) Dynamic sources have higher “learning rates”, which suggests that newly incoming data is successfully incorporated. (3) Tags start below web anchors but approach them: new tags improve results. [NEXT] To see the effect of incoming data, look at the feature weights.
  16. Weights of static fields go down and those of dynamic fields go up, which suggests retraining is important with dynamic expansions. Tweets rise only marginally, but since we know KB+Tweets > KB, the tweets do help. Not shown: static expansions stay roughly the same. [NEXT] Increasing field weights plus increasing performance suggest retraining is needed; next:
  17. (1) [LEFT] Lower performance overall (more data without more training queries). (2) [LEFT] Dynamic sources have steeper slopes, so newly incoming data helps even with a static ranker. (3) [RIGHT] The same patterns, but tags and web anchors do comparatively better (because of swamping?). [END] Higher performance: retraining improves the ranker’s ability to optimally combine descriptions into a single representation.
  18. More data helps, but to benefit optimally you need to inform your ranker.