Slides for the MTSR2017 presentation on event enrichment in DIVE+ in the context of CLARIAH.
By: Victor de Boer, Liliana Melgar, Oana Inel, Carlos Martinez Ortiz, Lora Aroyo, and Johan Oomen
Abstract: Scholars currently have access to large heterogeneous media collections on the Web, which they use as sources for their research. Exploring such collections is an important part of that research, in which scholars make sense of these heterogeneous datasets. Knowledge graphs that relate media objects, people, and places to historical events can provide a valuable structure for more meaningful and serendipitous browsing. Based on an extensive requirements analysis done with historians and media scholars, we present a methodology to publish, represent, enrich, and link heritage collections so that they can be explored by domain expert users. We present four methods to derive events from media object descriptions. We also present a case study in which four datasets with mixed media types are made accessible to scholars, and we describe the building blocks for event-based proto-narratives in the knowledge graph.
Enriching Media Collections for Event-based Exploration
1. ENRICHING MEDIA COLLECTIONS FOR EVENT-BASED EXPLORATION
Victor de Boer, Liliana Melgar, Oana Inel, Carlos Martinez Ortiz, Lora Aroyo, and Johan Oomen
MTSR 2017
11. DIVE+ Collections and Vocabularies
OPENIMAGES.EU: 3,220 news broadcasts; Netherlands Institute for Sound & Vision; GTAA thesaurus
DELPHER.NL: 197,199 scans of radio bulletins, 1937 – 1984
AMSTERDAM MUSEUM: 73,447 cultural heritage objects; AM Thesaurus
TROPENMUSEUM: 78,270 cultural heritage objects; SVNC thesaurus
12. Goal: develop explorable Knowledge Graph
Interactive Exploration & Discovery in Context
- linking objects to events and entities
- building automatic storylines (proto-narratives)
14. Mapping to popular vocabularies
am:obj_22093 am:contentPersonName “Job Cohen”
am:contentPersonName rdfs:subPropertyOf dcterms:subject
1. Mapping to generic schema
DIVE+
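The mapping on this slide can be read as two statements: an object is described with the collection-specific property am:contentPersonName, and that property is declared a sub-property of dcterms:subject, so the same statement also holds for the more generic property. A minimal sketch of that inference in plain Python follows. It assumes the diagram is read as the two triples below, keeps prefixed names as plain strings, and uses a hypothetical helper, `infer_superproperties`; a real deployment would more likely rely on an RDF store with RDFS reasoning.

```python
# Triples as (subject, predicate, object) tuples, using the prefixed names
# from the slide. Reading the diagram as these two triples is an assumption.
triples = {
    ("am:obj_22093", "am:contentPersonName", "Job Cohen"),
    ("am:contentPersonName", "rdfs:subPropertyOf", "dcterms:subject"),
}


def infer_superproperties(graph):
    """Return the graph plus triples entailed by rdfs:subPropertyOf.

    Hypothetical helper: if (p subPropertyOf q) and (s p o), add (s q o).
    """
    inferred = set(graph)
    changed = True
    while changed:  # repeat until nothing new appears, so chains are followed
        changed = False
        sub_props = {(s, o) for s, p, o in inferred if p == "rdfs:subPropertyOf"}
        for s, p, o in list(inferred):
            for sub, sup in sub_props:
                new = (s, sup, o)
                if p == sub and new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred


closed = infer_superproperties(triples)
print(("am:obj_22093", "dcterms:subject", "Job Cohen") in closed)
```

With this closure in place, a query over dcterms:subject would also return objects that were only described with the collection-specific property.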
15. Simple Event Model (SEM)
Van Hage, W. R., Malaisé, V., Segers, R., Hollink, L., & Schreiber, G. (2011). Design and use of the Simple Event Model (SEM). Web Semantics: Science, Services and Agents on the World Wide Web, 9(2), 128-136.
19. Where do we get events from?
Original Metadata
- LIDO, CIDOC, EDM
- creationDateStart
Interpretation of content
- Interpretation of object
Named Entity Recognition
- NLP tools, other pipelines
Human computation
- Crowdsourcing
- Nichesourcing
Hybrid pipeline
20. Original Metadata
am:Belgische opstand
am:besnijdenis
am:Beurs de Keyser
am:bevrijding
am:bezoekerscentrum
am:bibliotheken
am:Bijlmerramp
am:Boulevard of Broken Dreams
am:brand
am:brand van het oude stadhuis op de Dam
am:burgeroorlog
am:capitulatie
am:christendom geboorte van Christus
am:christendom kruisiging
am:christendom opstanding van Christus
am:christus aan het kruis
am:Christus schrijft op de grond
am:concert
“Fayence bord”
22. FROG NLP toolkit: NER event extraction
Description: "Foto is genomen tijdens de Eerste Zuid Nieuw-Guinea Expeditie"
Event: Eerste Zuid Nieuw-Guinea Expeditie
Description: "Foto is genomen tijdens de Eerste- of de Tweede Zuid Nieuw-Guinea Expeditie"
Event: Tweede Zuid Nieuw-Guinea Expeditie
Description: "Masker gedragen tijdens oogstfeesten. Het feest in kwestie is het Sokari spel dat eenmaal per jaar wordt opgevoerd gedurende zeven opeenvolgende nachten na Nieuwjaar, medio april. …"
Event: Nieuwjaar
Victor Kramer
https://languagemachines.github.io/frog/
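To make the description-to-event step concrete, here is a minimal sketch that looks up known event labels in a description. It is not FROG and does not use FROG's API: it swaps in a plain gazetteer lookup, and the names `EVENT_LABELS` and `find_events` are hypothetical. The labels and descriptions are the ones shown on the slide.

```python
import re

# Hypothetical event gazetteer; the labels come from the slide's event column.
EVENT_LABELS = [
    "Eerste Zuid Nieuw-Guinea Expeditie",
    "Tweede Zuid Nieuw-Guinea Expeditie",
    "Nieuwjaar",
]


def find_events(description, labels=EVENT_LABELS):
    """Return the gazetteer labels that occur literally in the description."""
    found = []
    for label in labels:
        # Word boundaries avoid matching a label inside a longer word.
        if re.search(r"\b" + re.escape(label) + r"\b", description):
            found.append(label)
    return found


print(find_events("Foto is genomen tijdens de Eerste Zuid Nieuw-Guinea Expeditie"))
print(find_events(
    "Foto is genomen tijdens de Eerste- of de Tweede Zuid Nieuw-Guinea Expeditie"
))
```

A literal lookup like this finds only the second expedition in the elliptical phrase "Eerste- of de Tweede", which is one reason a description may need an NER tool or human interpretation rather than string matching alone.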
26. Result: Explorable Knowledge graph
[Diagram: media objects linked to an event, places, an actor, and timestamps]
DIVE:MediaObject: "Nieuws uit Indonesië: opheffing van het KNIL"; "Mannen bij het huis van Paul Spies aan de Parapattan 42, Djakarta"; "ANP:1950-08-11:50"; "Schaal"
sem:Event: "ontbindingsceremonie"
sem:Place: "Djakarta"; "Indonesië"
sem:Actor: “Mohammed Hatta”
sem:Time: "25 Juli 1950"; "11 Augustus 1950"
Edge labels: dive:depictedBy, dive:isRelatedTo, dive:relatedPlace, dive:relatedActor, sem:hasPlace, sem:hasActor, sem:hasTimestamp
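Exploring such a graph amounts to following links from one media object, through events and entities, to other media objects. The sketch below illustrates this with a breadth-first search over a handful of triples. The node labels and property names come from the slide, but the wiring between them is an assumption, since the extracted text does not show which edge connects which nodes; `build_index` and `find_path` are hypothetical helper names.

```python
from collections import deque

KNIL = "Nieuws uit Indonesië: opheffing van het KNIL"
SPIES = "Mannen bij het huis van Paul Spies aan de Parapattan 42, Djakarta"

# Assumed wiring (not recoverable from the slide text): the event is depicted
# by one media object and shares a place with another.
EDGES = [
    ("ontbindingsceremonie", "dive:depictedBy", KNIL),
    ("ontbindingsceremonie", "sem:hasPlace", "Djakarta"),
    ("ontbindingsceremonie", "sem:hasActor", "Mohammed Hatta"),
    ("Djakarta", "dive:depictedBy", SPIES),
]


def build_index(edges):
    """Map every node to its neighbours, following edges in both directions."""
    index = {}
    for subject, _predicate, obj in edges:
        index.setdefault(subject, []).append(obj)
        index.setdefault(obj, []).append(subject)
    return index


def find_path(index, start, goal):
    """Breadth-first search; returns the shortest node path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in index.get(path[-1], []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


path = find_path(build_index(EDGES), KNIL, SPIES)
print(" -> ".join(path))
```

Under these assumed links, the path runs from the news item through the event and the place to the photograph, which is the kind of chain a proto-narrative could be built from.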
27. DIVE+ Enrichments
Collection  Enrichment method              Media Objects  Actors   Places  Events   Other    Alignments
OI          Crowd + NER                    3,204          1,249    1,412   1,916    185,846  623
NB          Interpreted + NER              197,200        194,890  54,571  197,200  6,736    6,353
AM          original thesaurus             73,447         66,966   5,973   148      28,047   6,865
TM          original thesaurus + FROG NER  78,226         27,829   3,896   23*      13,269   -
Total                                      352,077        290,934  65,852  199,264  233,898  -
*) more to come
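The totals row can be rechecked by summing each column. The short sketch below does this with the figures from the table; the variable and function names are illustrative only. Four of the five columns reproduce the reported totals exactly. The Events column sums to 199,287 when the 23 provisional TM events are included and matches the reported 199,264 without them, which suggests, though the slide does not say so, that the starred value was left out of the total.

```python
# Per-collection counts from the table:
# (Media Objects, Actors, Places, Events, Other)
ROWS = {
    "OI": (3204, 1249, 1412, 1916, 185846),
    "NB": (197200, 194890, 54571, 197200, 6736),
    "AM": (73447, 66966, 5973, 148, 28047),
    "TM": (78226, 27829, 3896, 23, 13269),
}
COLUMNS = ("Media Objects", "Actors", "Places", "Events", "Other")
REPORTED = (352077, 290934, 65852, 199264, 233898)  # totals row of the table


def column_sums(table):
    """Sum each column across all collections."""
    return tuple(sum(column) for column in zip(*table.values()))


sums = column_sums(ROWS)
for name, computed, reported in zip(COLUMNS, sums, REPORTED):
    note = "" if computed == reported else f" (difference {computed - reported})"
    print(f"{name}: computed {computed:,}, reported {reported:,}{note}")
```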
31. DIVE+ UI: INFINITY OF EXPLORATION
/ Support exploration and serendipity /
/ Visual inspection of media objects and entities /
/ Lets users build, save and share Proto-Narratives /
42. Take home
/ Generic data model for connecting heterogeneous media collections
/ Various data enrichment strategies to construct explorable event-centric knowledge graphs
/ DIVE+ Case Study
45. Current work: (Common) Event thesaurus?
Februaristaking
WOII
Februaristaking
“De oproep 'Staakt!' voor deelname aan de februaristaking te Amsterdam op 25 en 26 februari 1941.”
stakingen
Eduard Hellendoorn
"Joseph Eijl Eduard Hellendoorn Hermanus Coenradi 13 maart 1941 gefusilleerd Waalsdorpervlakte"
Waalsdorpervlakte
Jessie Both & Didi de hooge