
On Entities and Evaluation

Keynote talk given at the 41st European Conference on Information Retrieval (ECIR 2019)



  1. ON ENTITIES AND EVALUATION. Krisztian Balog, University of Stavanger, @krisztianbalog. Keynote given at the 41st European Conference on Information Retrieval (ECIR '19) | Cologne, Germany, April 2019
  2. SPECIAL THANKS TO • My former PhD advisor: • Maarten de Rijke • My former and current PhD students: • Jan R. Benetka, Richard Berendsen, Marc Bron, Heng Ding, Darío Garigliotti, Faegheh Hasibi, Trond Linjordet, Robert Neumayer, Shuo Zhang • Collaborators on material presented in this talk: • Po-Yu Chuang, Peter Dekker, Maarten de Rijke, Kristian Gingstad, Rolf Jagerman, Øyvind Jekteberg, Liadh Kelly, Tom Kenter, Phillip Schaer, Anne Schuth, Narges Tavakolpoursaleh
  3. Part I: ON ENTITIES
  4. OUTLINE FOR PART I • What is an entity?
 • Why care about entities? 
 • What research has been done on entities in IR? 
 • What’s next?
  5. WHAT IS AN ENTITY? An entity is an object or thing that can be uniquely identified. (diagram: entity catalog: entity ID*, name(s)*)
  6. AN ENTITY (diagram: the DBpedia entity <dbr:Karen_Spark_Jones>) <foaf:name> "Karen Spärck Jones"; <dbo:birthDate> "1935-08-26"; <rdf:type> <dbo:Scientist>, <dbo:Person>, <dbo:Agent>, <owl:Thing>; <dbo:spouse> <dbr:Roger_Needham>; <dbp:almaMater> <University_of_Cambridge>; <dbo:knownFor> <dbr:Natural_language_processing>; <dct:subject> <dbc:Information_retrieval_researchers>, <dbc:British_women_computer_scientists>, <dbc:British_computer_scientists>, <dbc:British_women_scientists>; <dbo:abstract> "Karen Spärck Jones FBA (26 August 1935 – 4 April 2007) was a British computer scientist."
  7. WHAT IS AN ENTITY? An entity is a uniquely identifiable object or thing, characterized by its name(s), type(s), attributes, and relationships to other entities.
  8. REPRESENTING ENTITIES AND THEIR PROPERTIES (diagram: entity catalog: entity ID*, name(s)*; knowledge repository adds type(s)*, descriptions, and relationships (non-typed links))
  9. REPRESENTING ENTITIES AND THEIR PROPERTIES (diagram: entity catalog: entity ID*, name(s)*; knowledge repository adds type(s)*, descriptions, and relationships (non-typed links); knowledge base (KB) / knowledge graph (KG) adds attributes and relationships (typed links))
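The layered representation above (catalog, then knowledge repository, then knowledge base) can be sketched in code. The classes and field names below are illustrative only, not a standard schema; the example entity reuses the Karen Spärck Jones data from slide 6.

```python
from dataclasses import dataclass, field

# A minimal sketch of the layered entity representation:
# the catalog holds only an ID and names; a knowledge base
# additionally holds types, attributes, and typed relations.

@dataclass
class CatalogEntry:
    entity_id: str
    names: list[str]

@dataclass
class KBEntity(CatalogEntry):
    types: list[str] = field(default_factory=list)
    attributes: dict[str, str] = field(default_factory=dict)
    relations: list[tuple[str, str]] = field(default_factory=list)  # (predicate, target ID)

ksj = KBEntity(
    entity_id="dbr:Karen_Spark_Jones",
    names=["Karen Spärck Jones"],
    types=["dbo:Scientist", "dbo:Person"],
    attributes={"dbo:birthDate": "1935-08-26"},
    relations=[("dbo:spouse", "dbr:Roger_Needham"),
               ("dbo:knownFor", "dbr:Natural_language_processing")],
)
```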
  10. WHY CARE ABOUT ENTITIES? • From a user perspective, entities ... • are natural units for organizing information • enable a richer and more effective user experience
  11. WHY CARE ABOUT ENTITIES? • From a machine perspective, entities ... • allow for a better understanding of queries, document content, and of users • enable search engines to be more intelligent (example passage: "Michael Schumacher (born 3 January 1969) is a German retired racing driver. He is a seven-time Formula One World Champion and is widely regarded as one of the greatest Formula One drivers of all time. He won two titles with Benetton in 1994 and 1995 before moving to Ferrari where he drove for eleven years. His time with Ferrari yielded five consecutive titles between 2000 and 2004." Linked entities: Michael Schumacher: racing driver; Scuderia Ferrari, Benetton Formula: Formula One constructors; Formula One: auto racing series)
  12. Part I: RESEARCH ON ENTITIES IN IR
  13. TRENDS IN THE IR LITERATURE (chart: number of paper titles per year, 2000–2016, matching "entity OR entities" vs. "Wikipedia", "knowledge base", and "knowledge graph") Numbers are based on boolean queries on paper titles from SIGIR, ECIR, CIKM, WSDM, and WWW
  14. TRENDS IN THE IR LITERATURE (chart: number of paper titles per year, 2000–2016, matching "entity OR entities" vs. the combined query "Wikipedia" OR "knowledge base" OR "knowledge graph") Numbers are based on boolean queries on paper titles from SIGIR, ECIR, CIKM, WSDM, and WWW
  15. #1 ENTITIES AS THE UNIT OF RETRIEVAL • A significant portion of queries mention or target entities • Those queries are better answered with a ranked list of entities (as opposed to a list of documents) • Term-based entity representations can be effectively ranked using document-based retrieval models • Semantically informed retrieval models utilize entity-specific properties (attributes, types, and relationships)
  16. #2 ENTITIES FOR KNOWLEDGE REPRESENTATION • Entities help to bridge the gap between unstructured and structured data (diagram: entity linking annotates <entity> mentions in text against a KB; knowledge base population extends the KB from text)
  17. #3 ENTITIES FOR AN ENHANCED SEARCH EXPERIENCE • Improve the search experience through the entire search process • Understanding search queries • Improving document retrieval performance • Query assistance services (auto-completion, suggestions) • Entity recommendations
  18. WANT TO KNOW MORE? www.eos-book.org
  19. OUTLINE FOR PART I • What is an entity?
 • Why care about entities? 
 • What research has been done on entities in IR? 
 • What’s next?
  20. SCENARIO #1 User: I would like to get some new strings for my guitar. AI: OK, would that be your electric guitar or the acoustic one? User: The electric one. AI: Alright. I can repeat your Amazon order of 3 months ago, or you can go by a music store on Elm Street on the way to your dentist appointment this afternoon.
  21. TRULY PERSONAL AI
 IS NOT POSSIBLE WITHOUT A PERSONAL KNOWLEDGE GRAPH
  22. PERSONAL KNOWLEDGE GRAPHS A personal knowledge graph (PKG) is a source of structured knowledge about entities and the relations between them, where those entities and relations are of personal, rather than general, importance.
  23. PERSONAL KNOWLEDGE GRAPHS (diagram: a PKG centered on the User, with personal entities such as Mom, Jamie, Hometown, High school, Mom's dentist, Electric guitar, and Acoustic guitar, linked out to a general-purpose KG, a social network, a domain-specific KG, and an e-commerce catalog)
  24. A RESEARCH AGENDA FOR PERSONAL KNOWLEDGE GRAPHS
  25. #1 KNOWLEDGE REPRESENTATION • Task: representing entities and their properties • KGs are organized according to a knowledge model (schema) • Peculiarities/challenges: • Entities need to be (directly/indirectly) connected to the user • Not duplicating attributes, focusing on what is personal • Information about entities can be very sparse • Some entities may not have any digital presence • Strong temporality (relations can be ephemeral)
  26. #1 KNOWLEDGE REPRESENTATION • Task: representing entities and their properties • KGs are organized according to a knowledge model (schema) • Peculiarities/challenges: • Entities need to be (directly/indirectly) connected to the user • Not duplicating attributes, focusing on what is personal • Information about entities can be very sparse • Some entities may not have any digital presence • Strong temporality (relations can be ephemeral) RQ1: What is the best way of representing entities and their properties and relations, considering the vast but sparse set of possible predicates?
  27. #2 SEMANTIC ANNOTATION OF TEXT • Task: annotating text with respect to a knowledge repository (commonly known as entity linking) • Usually involves mention detection, entity disambiguation, and NIL-detection steps • Challenges • Entities might have little to no digital presence • Entities are not necessarily proper nouns • Linking, NIL-detection, and KG population are intertwined
  28. #2 SEMANTIC ANNOTATION OF TEXT • Task: annotating text with respect to a knowledge repository (commonly known as entity linking) • Usually involves mention detection, entity disambiguation, and NIL-detection steps • Challenges • Entities might have little to no digital presence • Entities are not necessarily proper nouns • Linking, NIL-detection, and KG population are intertwined RQ2a: How can entity linking be performed against a personal knowledge graph, where structured entity information to rely on is potentially absent? RQ2b: When should entity linking be performed against a personal knowledge graph as opposed to a general-purpose KG?
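The three steps named above (mention detection, disambiguation, NIL detection) can be illustrated with a deliberately naive sketch. The tiny PKG, its surface-form dictionary, and the capitalized-word mention detector are toy assumptions; as the slide notes, a real PKG linker could not rely on proper-noun cues.

```python
import re

# Toy entity-linking pipeline against a (hypothetical) PKG that maps
# made-up entity IDs to known surface forms.
pkg = {"person:mom": ["mom", "mother"], "person:jamie": ["jamie"]}

def link(text: str, kg: dict) -> list:
    annotations = []
    # Mention detection: naively treat capitalized words as candidate mentions.
    for mention in re.findall(r"[A-Z][a-z]+", text):
        candidates = [e for e, names in kg.items() if mention.lower() in names]
        # Disambiguation: rank the candidates (here we simply take the first).
        # NIL detection: no candidate means the mention may refer to an entity
        # absent from the PKG, which could in turn trigger KG population.
        annotations.append((mention, candidates[0] if candidates else None))
    return annotations

link("Mom and Jamie recommended Dr Pullman", pkg)
# -> [('Mom', 'person:mom'), ('Jamie', 'person:jamie'), ('Dr', None), ('Pullman', None)]
```

The NIL annotations are exactly where linking, NIL detection, and population become intertwined: an unresolved mention is a candidate new PKG entity.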
  29. SCENARIO #2 User: I need to see a dentist. Mom recommended hers at dinner yesterday. AI: I can try to help you find this person. Do you have any more information? User: I reckon that he and Mom graduated from the same high school the same year. AI: OK, that's enough to narrow it down. It must be Dr. John Pullman. User: That must be him. I remember he had a fitting name. Can you try to make an appointment for Thursday afternoon?
  30. #3 POPULATION AND MAINTENANCE • Task: extending a KG from external sources (KB acceleration/population) or via internal inferencing • Verification of facts in the KG • Challenges: • Single curator; more automation is desired than for general-purpose KGs, but the user should still be in control • The first mention of an entity should trigger population • Properties may be inferred from the context
  31. #3 POPULATION AND MAINTENANCE • Task: extending a KG from external sources (KB acceleration/population) or via internal inferencing • Verification of facts in the KG • Challenges: • Single curator; more automation is desired than for general-purpose KGs, but the user should still be in control • The first mention of an entity should trigger population • Properties may be inferred from the context RQ3: How can personal knowledge graphs be automatically populated and reliably maintained?
  32. SCENARIO #3 AI: Since you're running a half marathon at Hackney in May, may I suggest you undertake a 10k run this weekend? User: Yes, that sounds like a good idea. Any suggestions for a not too popular route that I haven't done before? AI: Sure thing. I'll upload some routes to the running app on your phone. User: Cheers mate!
  33. #4 QUERYING • Task: retrieving information (entities, types, relations, etc.) from the PKG, or from KGs with the help of the PKG • Challenges: • Sparsity of data • Soft, subjective constraints
  34. #4 QUERYING • Task: retrieving information (entities, types, relations, etc.) from the PKG, or from KGs with the help of the PKG • Challenges: • Sparsity of data • Soft, subjective constraints RQ4: How to leverage the semantically rich but sparse information in personal knowledge graphs for answering natural language queries?
  35. #5 INTEGRATION WITH EXTERNAL SOURCES • Task: recognizing the same entity across multiple data sources (a.k.a. object resolution, record linkage, ...) • Challenges: • One-to-many, as opposed to one-to-one linkage • Continuous process, not a one-off effort • Two-way synchronization would be desired • Conflicting facts or relations need resolving by the user
  36. #5 INTEGRATION WITH EXTERNAL SOURCES • Task: recognizing the same entity across multiple data sources (a.k.a. object resolution, record linkage, ...) • Challenges: • One-to-many, as opposed to one-to-one linkage • Continuous process, not a one-off effort • Two-way synchronization would be desired • Conflicting facts or relations need resolving by the user RQ5: How to provide continuous two-way integration with external knowledge sources with the user in the loop?
  37. RESEARCH QUESTIONS
 FOR PERSONAL KNOWLEDGE GRAPHS • What is the best way of representing entities and their properties and relations, considering the vast but sparse set of possible predicates? • How can entity linking be performed against a personal knowledge graph, where structured entity information to rely on is potentially absent? • How can personal knowledge graphs be automatically populated and reliably maintained? • How to leverage the semantically rich but sparse information in personal knowledge graphs for answering natural language queries? • How to provide continuous two-way integration with external knowledge sources with the user in the loop?
  38. THERE IS MORE... • Implementation • Where is it stored (on the device, cloud, etc.)? • How can security and privacy be ensured? • How to interact with a range of services with proper access control? • Evaluation • How to build reusable test resources?
  39. SUMMARY OF PART I • Progress on entity-oriented search was enabled by large open knowledge repositories • Personal AI is not possible without the concept of a personal knowledge graph • Many interesting research opportunities are available
  40. Part II: ON EVALUATION
  41. OUTLINE FOR PART II • Online evaluation and why we need it
 • Living labs: methodology and lessons learned
 • What's next?
  42. EVALUATION METHODOLOGIES • Offline evaluation ("TREC-style" studies) • Online evaluation • Lab-based studies • Simulation of users • ...
  43. ONLINE EVALUATION 101 • See how regular users interact with a retrieval system when just using it • Observe implicit behavior • Clicks, skips, saves, forwards, bookmarks, likes, etc. • Try to infer differences in behavior from different flavors of the live system • A/B testing, interleaving • Run statistical tests to confirm the difference is not due to chance
  44. CHALLENGES IN ONLINE EVALUATION • It's a live service • Complexity of modern SERPs • Data is noisy • There’s no “ground truth”
  45. OFFLINE VS. ONLINE EVALUATION • Basic assumption: offline, assessors tell you what is relevant; online, observable user behavior can tell you what is relevant • Quality: offline, data is only as good as the guidelines; online, real user data with real and representative information needs • Realism: offline, a simplified scenario that cannot go beyond a certain level of complexity; online, a perfectly realistic setting (users are not aware that they are guinea pigs) • Assessment cost: offline, expensive; online, cheap • Scalability: offline, doesn't scale; online, scales very well • Repeatability: offline, repeatable; online, not repeatable • Throughput: offline, high; online, low • Risk: offline, none; online, high
  46. THE COMMUNITY NEEDS
 OPEN RESEARCH PLATFORMS FOR ONLINE EVALUATION
  47. LIVING LABS Living labs is a new evaluation paradigm for IR, where the experimentation platform is an existing search engine. Researchers have the opportunity to replace components of this search engine and evaluate these components using interactions with real, "unsuspecting" users of this search engine.
  48. OVERVIEW (diagram: experimental systems, users, and a live site, with a question mark for the organization connecting them)
  49. ALL WE NEED IS A SITE: LET'S TAKE AN EXISTING ONE
  50. KEY IDEAS FOR OPERATIONALIZATION • An API orchestrates all the data exchange between sites (live search engines) and participants • Focus on frequent (head) queries • Enough traffic on them for experimentation • Participants generate rankings offline and upload these to the API • Eliminates real-time requirement • Freedom in choice of tools and environment K. Balog, L. Kelly, and A. Schuth. Head First: Living Labs for Ad-hoc Search Evaluation. CIKM '14
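The offline-upload idea can be sketched from the participant's side. The endpoint URL, the payload field names (qid, runid, doclist), and the API key below are invented for illustration and are not guaranteed to match the actual living labs API.

```python
import json

# Hypothetical participant-side client: package an offline-computed ranking
# for upload. All endpoint and field names here are assumptions.
API = "https://api.example.org/api/runs"
API_KEY = "participant-key-123"  # made-up credential

def build_upload(query_id: str, ranking: list) -> tuple:
    payload = {
        "qid": query_id,
        "runid": "bm25-baseline",  # identifies the experimental system
        "doclist": [{"docid": d} for d in ranking],
    }
    url = f"{API}/{query_id}?key={API_KEY}"
    return url, json.dumps(payload).encode()

url, body = build_upload("q-101", ["doc3", "doc1", "doc7"])
# The actual HTTP request can then happen on the participant's own schedule,
# which is what removes the real-time requirement on their side.
```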
  51. OVERVIEW (diagram: experimental systems, users, live site, API) K. Balog, L. Kelly, and A. Schuth. Head First: Living Labs for Ad-hoc Search Evaluation. CIKM '14
  52. METHODOLOGY (1) (diagram: experimental system, users, live site, API) • Sites make queries, candidate documents (items), and historical search and click data available through the API
  53. METHODOLOGY (2) (diagram: experimental system, users, live site, API) • Rankings are generated (offline) for each query and uploaded to the API
  54. METHODOLOGY (3) (diagram: query and experimental ranking exchanged between experimental system and API; interleaved ranking shown on the live site) • When any of the test queries is fired on the live site, it requests an experimental ranking from the API and interleaves it with that of the production system
  55. METHODOLOGY (3) (diagram: query and experimental ranking exchanged between experimental system and API; interleaved ranking shown on the live site) • When any of the test queries is fired on the live site, it requests an experimental ranking from the API and interleaves it with that of the production system (example: system A ranks doc 1, doc 2, doc 3, doc 4, doc 5; system B ranks doc 2, doc 4, doc 7, doc 1, doc 3; interleaved list: doc 1, doc 2, doc 4, doc 3, doc 7)
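One common way to build such an interleaved list is team-draft interleaving. The sketch below is a generic implementation of that idea, not necessarily the exact interleaving method used in these campaigns.

```python
import random

# Team-draft interleaving sketch: in each round, the two teams pick in a
# random order, and each adds its highest-ranked document that is not yet
# in the merged list. Every placed document is credited to the team that
# contributed it, so clicks can later be attributed to A or B.

def team_draft(ranking_a, ranking_b, seed=0):
    rng = random.Random(seed)
    merged, team = [], {}
    total = len(set(ranking_a) | set(ranking_b))
    while len(merged) < total:
        for name, ranking in rng.sample([("A", ranking_a), ("B", ranking_b)], 2):
            doc = next((d for d in ranking if d not in merged), None)
            if doc is not None:
                merged.append(doc)
                team[doc] = name
    return merged, team
```

Clicks on each result are credited to the team that contributed it, which is how per-impression wins and losses against the production system are decided.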
  56. METHODOLOGY (4) (diagram: experimental system, users, live site, API) • Participants get detailed feedback on user interactions (clicks)
  57. METHODOLOGY (5) • Evaluation measure: Outcome = #Wins / (#Wins + #Losses) • where the number of “wins” and “losses” is against the production system, aggregated over a period of time • An Outcome of > 0.5 means beating the production system
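The Outcome measure is straightforward to compute from aggregated interleaving feedback; note that ties against the production system are excluded from both the numerator and the denominator.

```python
# Outcome = #Wins / (#Wins + #Losses): the fraction of non-tied
# impressions in which the experimental system beat production.

def outcome(wins: int, losses: int) -> float:
    return wins / (wins + losses)

outcome(48, 39)  # 48 / 87 ≈ 0.5517
```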
  58. LIMITATIONS • Head queries only: Considerable portion of traffic, but only popular info needs • Lack of context: No knowledge of the searcher’s location, previous searches, etc. • No real-time feedback: API provides detailed feedback, but it’s not immediate • Limited control: Experimentation is limited to single searches, where results are interleaved with those of the production system; no control over the entire result list
  59. EVALUATION CAMPAIGNS
  60. EVALUATION CAMPAIGNS (timeline: product search on a Hungarian toy store; academic search on CiteSeerX, SSOAR, and Microsoft Academic, later on CiteSeerX and SSOAR; web search on a Czech web search engine; spanning OS'16 and OS'17)
  61. TREC OPENSEARCH • Sites: academic search engines • Task: ad hoc scientific literature search • Multiple evaluation rounds (6 weeks each) • Train/test queries • Training queries: feedback on individual impressions • Test queries: only aggregated feedback at the end of the evaluation period
  62. CITESEERX @TREC-OS 2016 Impressions: 359 (Round 1), 571 (Round 2), 4829 (Round 3). Clicks: 144 (Round 1), 128 (Round 2), 651 (Round 3). (charts: number of impressions and number of clicks per query, for each round)
  63. EVALUATION RESULTS: CITESEERX, TREC-OS 2016, ROUND #3 • System 1: Wins 48, Ties 15, Losses 39, Outcome 0.5517, p-value 0.3912 • System 2: Wins 27, Ties 11, Losses 22, Outcome 0.5510, p-value 0.5682 • System 3: Wins 35, Ties 14, Losses 32, Outcome 0.5224, p-value 0.8072 • ... We would need to gather data for about six months for p < 0.05 and for about a year for p < 0.01 (assuming a similar win/loss ratio).
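These p-values are consistent with an exact two-sided sign test on wins versus losses with ties discarded: under the null hypothesis, each non-tied impression is a fair coin flip between the experimental and the production system. The exact test used in the campaign is an assumption here.

```python
from math import comb

# Exact two-sided sign test: under H0, wins ~ Binomial(n, 0.5),
# where n is the number of non-tied impressions.

def sign_test_p(wins: int, losses: int) -> float:
    n = wins + losses
    k = max(wins, losses)
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

round(sign_test_p(48, 39), 2)  # ≈ 0.39 for System 1, in line with the table
```

A back-of-the-envelope normal approximation with the same win rate (about 0.55) suggests roughly 360 non-tied impressions for p < 0.05 and roughly 620 for p < 0.01, which at Round 3's rate of about 87 non-tied impressions per six-week round is broadly consistent with the six-months / one-year estimate above.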
  64. LESSONS LEARNED • Head first idea is feasible • Running multiple campaigns without major technical hurdles • Low traffic/click volume is an issue • No statistically significant differences observed • Possible remedy is to use more queries (tap into the long tail) • Main challenges are more of an organizational than of a technical nature • Nontrivial infrastructure development on the service providers’ side • Convincing large industrial partners as sites • Attracting a large and active set of participants R. Jagerman, K. Balog, and M. de Rijke. OpenSearch: Lessons Learned from an Online Evaluation Campaign. Journal of Data and Information Quality, 2018.
  65. ALL WE NEED IS A SITE: LET'S BUILD ONE
  66. A SUCCESS STORY
  67. MovieLens: run by GroupLens, a research lab at the University of Minnesota
  68. OFFLINE DATASETS • MovieLens-20M • 20M item ratings • 27K movies • 138K users • 465K tags • links to YouTube trailers for 25K movies
  69. ONLINE EXPERIMENTATION WITH
 NOVEL USER INTERFACES
  70. PUBLICATIONS
  71. BUILDING A SERVICE FOR SCIENTIFIC LITERATURE RECOMMENDATION
  72. ARXIVDIGEST: THE SERVICE • Recommendation service to help keep up with scientific literature published on arxiv.org • Users sign up and indicate their interests by providing keywords, Google Scholar/DBLP profile, etc. • Users receive recommendations regularly in a digest email • Articles can be liked • Users agree that their profile, the articles recommended to them, and their feedback would be made available to experimental systems
  73. ARXIVDIGEST: THE EVALUATION PLATFORM • Broker-based architecture • RESTful API for accessing article and user data and for uploading recommendations • Participating teams are given a window each day to download new content and to generate recommendations for all users • Users receive interleaved rankings • Performance is monitored continuously over time
  74. CURRENT STATUS AND OPPORTUNITIES • All components of the broker in place • https://github.com/iai-group/ArXivDigest • Ensuring GDPR-compliance is in progress • Opportunities for studying • Personalized recommender algorithms • Explainable recommendations • Interleaving • ...
  75. SUMMARY OF PART II • The community needs open online evaluation platforms • Lessons learned from previous evaluation benchmarks • Proposal: develop a service that we'd use ourselves
  76. TAKE-HOME MESSAGES • A truly personal AI is not possible without a personal knowledge graph • The community needs open research platforms for online evaluation
  77. THANK YOU!
