Automatic Heritage Metadata
         Enrichment with Historic Events

         Marieke van Erp, Johan Oomen, Roxane Segers, Chiel van den
         Akker, Lora Aroyo, Geertje Jacobs, Susan Legêne, Lourens van
            der Meij, Jacco van Ossenbruggen and Guus Schreiber
                          http://agora.cs.vu.nl


7/4/11                              MW2011
The Agora Project
         •   2009-2013

         •   VU Amsterdam (CS + History Departments),
             Netherlands Institute for Sound and Vision, Rijksmuseum
             Amsterdam

         •   Funded by the Netherlands Organisation for Scientific
             Research (NWO) within the CATCH research
             programme




“Enabling anything like seamless access to the cultural record
         will require developing tools to navigate among vast catalogs of
         born-digital and digitized materials, […]
         The return on this investment will be a humanities and social
         science cyberinfrastructure that will allow new questions to be
         asked, new patterns and relations to be discerned, and deep
         structures in language, society, and culture to be exposed and
         explored.”

                                                   http://www.acls.org/programs/Default.aspx?id=644
Gabriel Metsu
          (17th-century Dutch painter)




Networked heritage




Europeana ~15,578,850




Europeana Thoughtlab




Baseline: matching

                          Metadata for the object




                            Controlled place name from a vocabulary
                            at the Rijksmuseum
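To make the baseline concrete, here is a minimal sketch of matching an object's metadata against a controlled place-name vocabulary by exact, case-insensitive whole-word matching. The function name `exact_match_links`, the sample description and the three-term vocabulary are hypothetical; the slides do not show the actual matcher. The sketch only illustrates the limitation: it can return the broad term "Egypt" but nothing more specific and no historical context.

```python
import re


def exact_match_links(metadata: str, vocabulary: set[str]) -> list[str]:
    """Return vocabulary terms that occur as whole words in the metadata text.

    Hypothetical baseline: case-insensitive exact matching only, with no
    synonyms, no broader/narrower reasoning and no historical context.
    """
    text = metadata.lower()
    matches = [
        term for term in vocabulary
        # \b ... \b keeps "Egypt" from matching inside "Egyptian"
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", text)
    ]
    return sorted(matches)


# Hypothetical object description and place-name vocabulary.
print(exact_match_links("Scarab amulet, Egypt", {"Egypt", "Thebes", "Amsterdam"}))
# → ['Egypt']
```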


A "more specific Egypt"?




were present at…




Location is...




role is…

is part of…




Why enrichment?
         • Historical context is missing
           • What happened before/after
           • ‘Grand narratives’
         • Access is based on keyword search
           • Only exact matches are found
         • ...and manual annotation is costly




http://www.bl.uk/learning/timeline/
Simple Event Model
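As an illustration of how an event ties together the relations listed earlier ("were present at", "location is", "role is", "is part of"), here is a simplified event record loosely modelled on the Simple Event Model (SEM). The class `Event`, its field names and the flattened role triple are our own simplification, not the project's code; the `sem:` property names follow SEM's vocabulary, but SEM itself models roles more elaborately than shown here. The Waterloo example is only an illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


@dataclass
class Event:
    """Simplified event record loosely modelled on SEM (hypothetical sketch)."""
    name: str
    actors: Dict[str, Optional[str]] = field(default_factory=dict)  # actor -> role
    places: List[str] = field(default_factory=list)
    times: List[str] = field(default_factory=list)
    sub_event_of: Optional[str] = None

    def to_triples(self) -> List[Triple]:
        triples: List[Triple] = [(self.name, "rdf:type", "sem:Event")]
        for actor, role in self.actors.items():
            # the actor "was present at" the event
            triples.append((self.name, "sem:hasActor", actor))
            if role is not None:
                # Flattened for illustration only: in SEM a role belongs to
                # one actor-in-event link, not to the actor globally.
                triples.append((actor, "sem:roleType", role))
        triples += [(self.name, "sem:hasPlace", p) for p in self.places]  # "location is"
        triples += [(self.name, "sem:hasTime", t) for t in self.times]
        if self.sub_event_of:
            triples.append((self.name, "sem:subEventOf", self.sub_event_of))  # "is part of"
        return triples


waterloo = Event(
    "Battle of Waterloo",
    actors={"Napoleon": "commander"},
    places=["Waterloo"],
    times=["1815-06-18"],
    sub_event_of="Napoleonic Wars",
)
for triple in waterloo.to_triples():
    print(triple)
```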




The Pipeline




Recognising Events

         • Pattern-based approach to find Event
           names
          • “during the” <NP>
          • “after the” <NP>
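The pattern-based step can be sketched with a regular expression. Here a noun phrase is crudely approximated as a run of capitalised words optionally joined by "of" or "the", standing in for a real noun-phrase chunker. That approximation, the English trigger phrases and the name `find_event_names` are assumptions for illustration; the slides do not say how <NP> was detected.

```python
import re

# Hypothetical NP approximation: capitalised words, optionally joined by
# "of"/"the", never ending on a connector word (e.g. "Battle of Waterloo").
NP = r"[A-Z][\w'-]*(?:\s+(?:(?:of|the)\s+)*[A-Z][\w'-]*)*"
EVENT_PATTERN = re.compile(r"\b(?:[Dd]uring|[Aa]fter) the (" + NP + r")")


def find_event_names(text: str) -> list[str]:
    """Return candidate event names that follow 'during the' / 'after the'."""
    return [m.group(1) for m in EVENT_PATTERN.finditer(text)]


print(find_event_names(
    "The palace was damaged during the Second World War. "
    "After the Battle of Waterloo the army withdrew."
))
# → ['Second World War', 'Battle of Waterloo']
```

A lowercase phrase such as "during the war" yields nothing, which is one reason the approach needs fine-tuning.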

Recognising Events
         • Machine learning based approach to
           recognise persons and locations
          • Retrained Stanford NER system for
             Dutch
         • Regular expressions to recognise temporal
           expressions
          • [0-9]{1,2}/[0-9]{1,2}/[0-9]{4}
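The temporal-expression regex from the slide can be applied as follows. The digit guards around the pattern, the day/month/year interpretation (the usual Dutch order) and the validity check are our additions; the original system may have handled these cases differently.

```python
import re
from datetime import date

# The slide's pattern [0-9]{1,2}/[0-9]{1,2}/[0-9]{4}, captured in groups and
# wrapped in digit guards (an assumption) so it does not fire inside longer numbers.
DATE_RE = re.compile(r"(?<!\d)([0-9]{1,2})/([0-9]{1,2})/([0-9]{4})(?!\d)")


def find_dates(text: str) -> list[date]:
    """Extract numeric dates, assuming day/month/year order."""
    found = []
    for m in DATE_RE.finditer(text):
        day, month, year = (int(g) for g in m.groups())
        try:
            found.append(date(year, month, day))
        except ValueError:
            pass  # e.g. 31/2/1900 matches the pattern but is not a real date
    return found


print(find_dates("Liberated on 5/5/1945; the invalid 31/2/1900 is skipped."))
# → [datetime.date(1945, 5, 5)]
```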
The Pipeline




Linking Event Elements

         • Check which pairs of event names and
           persons, locations or times co-occur most
           within a Wikipedia paragraph
         • Rank by most frequent
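The linking step can be sketched as paragraph-level co-occurrence counting. This assumes event names and entities have already been recognised; plain substring tests stand in for the NER output, and the function name and the three sample paragraphs are hypothetical.

```python
from collections import Counter
from itertools import product


def rank_event_links(paragraphs: list[str], event_names: list[str],
                     entities: list[str]) -> list[tuple[tuple[str, str], int]]:
    """Count the paragraphs in which each (event, entity) pair co-occurs, most frequent first."""
    counts: Counter = Counter()
    for para in paragraphs:
        events_here = sorted(e for e in event_names if e in para)
        entities_here = sorted(x for x in entities if x in para)
        counts.update(product(events_here, entities_here))  # one count per paragraph
    return counts.most_common()


paragraphs = [  # hypothetical Wikipedia-style paragraphs
    "The Battle of Waterloo was fought in 1815. Napoleon was defeated.",
    "After the Battle of Waterloo, Napoleon was exiled to Saint Helena.",
    "Wellington commanded the allied army at the Battle of Waterloo.",
]
ranked = rank_event_links(paragraphs, ["Battle of Waterloo"],
                          ["Napoleon", "Wellington", "Saint Helena"])
print(ranked[0])
# → (('Battle of Waterloo', 'Napoleon'), 2)
```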

The Pipeline




Event Instances




         Thesaurus 1             Thesaurus 2                  Direct links   Links via events
         Rijksmuseum events      Sound and Vision locations   -              20
         Rijksmuseum events      Sound and Vision people      -              15
         Rijksmuseum people      Sound and Vision locations   -              300
         Rijksmuseum locations   Sound and Vision people      7              297
         Rijksmuseum events      Rijksmuseum locations                       488
         Rijksmuseum events      Rijksmuseum people                          395
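One plausible reading of the "Links via events" column is that two thesaurus terms become linked when they are attached to the same event instance, for example a person and a location of one event. The sketch below derives such indirect links and drops pairs that are already linked directly. The dictionary layout, the function name and the sample data are assumptions; the slides do not describe how the counts above were computed.

```python
from itertools import product


def links_via_events(events: list[dict], direct_links: set = frozenset()) -> set:
    """Pair every actor with every place of the same event, minus existing direct links."""
    links = set()
    for ev in events:
        for actor, place in product(ev["actors"], ev["places"]):
            links.add((actor, place))
    return links - set(direct_links)


events = [  # hypothetical event instance
    {"name": "Battle of Waterloo", "actors": ["Napoleon", "Wellington"],
     "places": ["Waterloo"]},
]
print(links_via_events(events, {("Napoleon", "Waterloo")}))
# → {('Wellington', 'Waterloo')}
```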




New Links




Event-driven Browsing




Conclusions

         • Events provide a framework for collection
           data enrichment
         • Language technology can be used to
           identify events
         • Events provide meaningful links between
           different collections


Future Work

         • Fine-tuning event extraction approach
         • English version of the system
         • User involvement to improve event
           relations
         • User-generated narratives

Credits
         • Publications etc.: http://agora.cs.vu.nl
         • Thanks to the Web & Media Group at VU University
           for inspiration & images
         • Follow us on Twitter
          • @agoraproject
          • @johanoomen

Speaker notes

  28. Entirely correct 68.4%, partially correct 13.3%
  29. Persons 77.08%, Location 65.8%
  33. 45.6% actor correct, 41.1% location correct, and 51.5% date is correct