Het Verrijkt Koninkrijk




NIOD Lunch Lecture (Lunchlezing), 08/01/2013
Johan van Doornik (UvA), Victor de Boer (VUA)
The Kingdom of the Netherlands During World War II
• History of German-occupied Dutch society (1940-1945)
• 14 volumes, 30 parts, 18,000 pages
• Digitized version went online in 2011, crashing the server
“Published between 1969 and 1991, the 30 volumes still combine the qualities of an authoritative work for a general audience, and an inevitable point of reference for scholars”
Clarin-VK: Verrijkt Koninkrijk



“The aim of this project is twofold; in the demonstrator part of
the project advanced tools and techniques are applied to
gather data on De Jong's perception of the much debated issue
of pillarization (Dutch: 'verzuiling') and group identity. In the
resource curation part of the project the corpus will be
enriched and made available to the CLARIN-community for
further research”
Verrijkt Koninkrijk Project

          NIOD: Historical research
          questions

          UvA: Representation of digital
          text, Named Entity extraction and
          consolidation, search prototype

          VUA: Enrichment of structured
          sources, internal and external
          linking. Hackathon

          DANS: Data storage and access.
Digitization and Search
     (the UvA part)
<book xmlns="http://www.loedejongdigitaal.nl" vk:id="nl.vk.d.5-I">
 <index vk:title="Inhoud" vk:id="nl.vk.d.5-I.1">
 <chapter vk:title="Lente 4 1" vk:number="1" vk:id="nl.vk.d.5-I.2">
   <section vk:title="" vk:id="nl.vk.d.5-I.2.1">
   <section vk:title="Oorlogsverloop en -perspectiej?" vk:id="nl.vk.d.5-I.2.2">
   <section vk:title="II. Midden-Oosten, lente 1941" vk:id="nl.vk.d.5-I.2.3">
     <subsection vk:id="nl.vk.d.5-I.2.3.1">
     <subsection vk:id="nl.vk.d.5-I.2.3.2">
       <p vk:pdf-page-ref="21" vk:id="nl.vk.d.5-I.2.3.2.1">Hoe kon Engeland ooit de oorlog winnen?</p>
       <p vk:pdf-page-ref="21" vk:id="nl.vk.d.5-I.2.3.2.2">Het is, achteraf gezien, volstrekt duidelijk ...
       <p vk:pdf-page-ref="22" vk:id="nl.vk.d.5-I.2.3.2.3">Deze conceptie was bemoedigend en dit ...
         <page vk:pdf-page="22" vk:original-page="14" vk:id="nl.vk.d.5-I.2.3.2.3.14">
           <backofbook-ref>
         </page>
         <header vk:id="nl.vk.d.5-I.2.3.2.3.15">HET BRITSE OORLOGSPLAN</header>men zich in Londen: in de ...
       <p vk:pdf-page-ref="23" vk:id="nl.vk.d.5-I.2.3.2.4">Hoe dat zij vooral Churchill ...
       <p vk:pdf-page-ref="23" vk:id="nl.vk.d.5-I.2.3.2.5">Had men dat in bezet Nederland vernomen ...
     </subsection>
   </section>
   <section vk:title="Publieke opinie" vk:id="nl.vk.d.5-I.2.4">
     <subsection vk:id="nl.vk.d.5-I.2.4.1">
       <p vk:pdf-page-ref="23" vk:id="nl.vk.d.5-I.2.4.1.1">Het verwachtingspatroon van een volk ...
       <p vk:pdf-page-ref="23" vk:id="nl.vk.d.5-I.2.4.1.2">1 Aangehaald in Butler ....
         <page vk:pdf-page="23" vk:original-page="15" vk:id="nl.vk.d.5-I.2.4.1.2.4">
           <backofbook-ref>
             <lemma-ref>Azoren</lemma-ref>
             <lemma-ref>Bomber Command</lemma-ref>
             <lemma-ref>Canarische eilanden</lemma-ref>
             <lemma-ref>Madeira</lemma-ref>
             <lemma-ref>Portugal</lemma-ref>
             <lemma-ref>Spanje</lemma-ref>
             <lemma-ref>Tsjechoslowakije</lemma-ref>
           </backofbook-ref>
         </page>
Back of the Book
Required specialized parsing:
  Pages (312, 316, …) and page ranges (210-215, …)
  See and See also references
  OCR correction for numbers (3I2 = 312, …)
  Verification of all page references
  Mapping page references to paragraph references
  Terms that span multiple pages in the back of book
  Layout not always as consistent as you would like
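The first few parsing steps listed above (pages, page ranges, OCR correction for numbers) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual parser: the function names `fix_ocr_digits` and `parse_page_refs`, the comma-separated input format, and the OCR confusion table are all made up for the example.

```python
import re
from typing import List

# Hypothetical confusion table: letters that OCR may produce in place of digits.
OCR_DIGIT_FIXES = str.maketrans({"I": "1", "l": "1", "O": "0", "o": "0"})


def fix_ocr_digits(token: str) -> str:
    """Replace letter look-alikes with digits, e.g. '3I2' -> '312'."""
    return token.translate(OCR_DIGIT_FIXES)


def parse_page_refs(entry: str) -> List[int]:
    """Expand a comma-separated list of pages and page ranges into page numbers."""
    pages: List[int] = []
    for part in entry.split(","):
        candidate = fix_ocr_digits(part.strip())
        match = re.fullmatch(r"(\d+)(?:\s*-\s*(\d+))?", candidate)
        if match is None:
            continue  # not a page reference (for example a "See also" pointer)
        start = int(match.group(1))
        end = int(match.group(2)) if match.group(2) else start
        if end < start:
            continue  # implausible range; a real parser would flag it for review
        pages.extend(range(start, end + 1))
    return pages
```

Anything that does not look like a page or a page range is skipped here; verification against the actual page count and the mapping to paragraph references are left out of the sketch.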
Counting elements
Element          Count
vk:book             30
vk:chapter         226
vk:section        1885
vk:subsection     4708
vk:p             86257
vk:quote         56547
vk:page          16922
vk:lemma         16186
vk:lemma-ref    148370
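Counts like these could be obtained by tallying element names over the XML. The sketch below assumes a tiny, made-up sample document (element names borrowed from the excerpt shown earlier, attributes omitted so that it parses without a namespace declaration); `count_elements` is a hypothetical helper, not part of the project's tooling.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Made-up sample in the spirit of the excerpt; not taken from the real corpus files.
SAMPLE = """
<book>
  <chapter>
    <section>
      <subsection>
        <p>Hoe kon Engeland ooit de oorlog winnen?</p>
        <p>Second paragraph (placeholder)</p>
      </subsection>
    </section>
  </chapter>
</book>
"""


def count_elements(xml_text: str) -> Counter:
    """Return a tally of element names for every element in the document."""
    root = ET.fromstring(xml_text)
    return Counter(element.tag for element in root.iter())


counts = count_elements(SAMPLE)
for tag, number in sorted(counts.items()):
    print(f"{tag:12} {number}")
```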
Resolver
     http://resolver.loedejongdigitaal.nl/nl.vk.d.5-II.6.1.2.2

country, collection, doc-type, volume, chapter, section, sub-section, paragraph

 <p vk:pdf-page-ref="338" vk:id="nl.vk.d.5-II.6.1.2.2">En in het algemeen leed de
 Geallieerde koopvaardij in de eerste zes maanden van '42 opnieuw zeer zware verliezen. Zij
 waren vooral gevolg van het feit dat de Amerikanen traag waren met het treffen van
 veiligheidsmaatregelen in de Caraïbische Zee en in de zeegebieden bij de Amerikaanse
 oostkust. Maandenlang vonden<i>U-Boote</i>daar een uiterst profijtelijk jachtterrein. Het
 aantal<i>U-Boote</i>nam ook steeds toe; in juli '41 waren er constant 65 in de vaart, in juli
 '42 140. Hitler bezat er toen 331 en er waren, doordat de<i>U-Boote</i>zich zo verspreid
 hadden, in de zeven maandenvan januari t.e.m. juli '42 slechts weinige vernietigd: 31. In die
 periode verloren de Geallieerden daartentegen per maand gemiddeld meer dan een half
 miljoen ton aan scheepsruimte. Het waren vooral die scheepsverliezen die de Geallieerde
 oorlogsleiders in de eerste helft van '42 voortdurend aanleiding gaven tot diepe bezorgdheid.
 Hoe haakten zij naar de dag waarop de Duitsers en Italianen uit NoordAfrika verdreven
 zouden zijn! Dan zou eindelijk de lange, schepen verslindende toevoerroute naar Egypte om
 Afrika heen door de zoveel kortere via de Straat van Gibraltar vervangen kunnen
 worden.</p>
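The identifier scheme above (country, collection, doc-type, volume, chapter, section, sub-section, paragraph) can be illustrated with a small sketch. The helper names `parse_vk_id` and `resolver_url` are hypothetical, and the handling of identifiers with more than eight components (such as page elements nested inside a paragraph) is an assumption.

```python
from typing import Dict

RESOLVER_BASE = "http://resolver.loedejongdigitaal.nl/"

# Component names in the order given on the slide.
FIELDS = ["country", "collection", "doc_type", "volume",
          "chapter", "section", "subsection", "paragraph"]


def parse_vk_id(vk_id: str) -> Dict[str, str]:
    """Split a dotted identifier into named components.

    Shorter identifiers (e.g. a chapter) yield fewer keys; any components
    beyond the eighth are kept under the key 'extra' (an assumption).
    """
    parts = vk_id.split(".")
    named = dict(zip(FIELDS, parts))
    if len(parts) > len(FIELDS):
        named["extra"] = ".".join(parts[len(FIELDS):])
    return named


def resolver_url(vk_id: str) -> str:
    """Build the resolver URL for an identifier."""
    return RESOLVER_BASE + vk_id


print(parse_vk_id("nl.vk.d.5-II.6.1.2.2"))
print(resolver_url("nl.vk.d.5-II.6.1.2.2"))
```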
Named Entities + Wikification
1. Natural Language Processing with FROG

2. Detecting names
  Machine learned detection using POS and capitalization


3. Linking to Wikipedia with ILPS tools
  Mussert -> Anton Mussert
  Avondklok -> Spertijd
  Nationale Padvindersraad -> Padvinder
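Step 2 mentions machine-learned detection using POS and capitalization. The sketch below shows only what such per-token features might look like; the classifier itself is omitted. The feature names and the POS tags in the usage example are hypothetical and do not reflect FROG's actual output format.

```python
from typing import Dict, List, Tuple


def token_features(tagged: List[Tuple[str, str]], i: int) -> Dict[str, object]:
    """Build a feature dictionary for the token at position i.

    `tagged` is a sentence as (word, pos_tag) pairs; the tags are whatever
    the upstream tagger produced (illustrative values are used below).
    """
    word, pos = tagged[i]
    prev_pos = tagged[i - 1][1] if i > 0 else "<START>"
    return {
        "pos": pos,
        "prev_pos": prev_pos,
        "is_capitalized": word[:1].isupper(),
        "is_all_caps": word.isupper(),
        "sentence_initial": i == 0,
    }


# Illustrative sentence with made-up tag names.
sentence = [("De", "LID"), ("heer", "N"), ("Mussert", "SPEC")]
print(token_features(sentence, 2))
```

Sentence-initial position is included because capitalization alone is a weak signal for the first word of a sentence.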
Verrijkt Koninkrijk and Linked Data
           (the VUA part)
What is Linked Open Data
 • Open data is about open licenses
 • Linked (Open) Data is about interoperability

“a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.” -- Wikipedia

“Sharable, spreadable and nerd-friendly” -- Charlotte S H Jensen, kulturweb
Web of Documents (WWW)
Linked Documents
Web of Data
Linked Data
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
Linked Data: NIOD and VK
[Diagram of linked resources: bbwo2:plaatje1.jpg, 4en5mei:Avondklok, 4en5mei:monumentX, “Spertijd”, niod:Avondklok, Dbpedia:Avondklok, DBPedia:Curfew, VK:paragraaf 1.2.3.4]
[Diagram: Niod thesaurus, Named Entity Results and Back of the Book-index, shown around Verrijkt Koninkrijk]
Niod thesaurus: NIOD List of terms
• Used by NIOD library, archive, AV archive
• Externally by 29 institutions
• 1408 terms: “Civil servants”, “Anti-fascism”, “Arrival”
   – 12 ‘categories’: “Law”, “Military history”, “Countries”, etc.

Rub   Term
4     Repressie
      Voorlichting
      Kernwapens - Zie: Atoomwapens
3     Atoomwapens
2     Kolonialisme - Zie ook: Dekolonisatie
8     Religie - Zie ook bij soorten afzonderlijk, bijv.: Christendom
Niod thesaurus: Niod Thesaurus (SKOS)
[Diagram: concepts niod:Uitrusting, niod:Gasmaskers and niod:Transport (Preferred: “Transport”, Alternative: “Vracht”); source: Niod termenlijst (XML)]
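The conversion from a term list to SKOS can be illustrated by writing one concept as Turtle. This is a sketch under assumptions: `concept_to_turtle` is a hypothetical helper, the `niod:` namespace is the one used in the SPARQL queries of this deck, and treating niod:Uitrusting as the broader concept of niod:Transport is read from the diagram layout rather than stated in the source.

```python
from typing import Iterable, Optional

# Prefix declarations; the niod namespace is taken from the queries in this deck.
TURTLE_PREFIXES = (
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix niod: <http://purl.org/collections/nl/niod/> .\n"
)


def concept_to_turtle(local_name: str, pref_label: str,
                      alt_labels: Iterable[str] = (),
                      broader: Optional[str] = None,
                      lang: str = "nl") -> str:
    """Serialize one SKOS concept as Turtle (labels are not escaped in this sketch)."""
    lines = [f"niod:{local_name} a skos:Concept ;",
             f'    skos:prefLabel "{pref_label}"@{lang}']
    for alt in alt_labels:
        lines[-1] += " ;"
        lines.append(f'    skos:altLabel "{alt}"@{lang}')
    if broader:
        lines[-1] += " ;"
        lines.append(f"    skos:broader niod:{broader}")
    lines[-1] += " ."
    return "\n".join(lines)


print(TURTLE_PREFIXES)
print(concept_to_turtle("Transport", "Transport", ["Vracht"], broader="Uitrusting"))
```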
Back of the Book-index: Back-of-the-Book Index (SKOS)
[Diagram: concepts botb:Amsterdam, botb:Blitzkrieg and niod:botb-Blitzkrieg, with paragraph reference http://resolver.verrijktkoninkrijk.nl/nl.vk.d.reg.4.1386]
Named Entity Results: Named Entities (SKOS)
[Diagram: concepts entity:Maassluis, entity:Amsterdam, entity:Abraham Kuijper and niod:botb-Blitzkrieg, with paragraph reference http://resolver.verrijktkoninkrijk.nl/nl.vk.d.reg.4.1386]
Linked Data
[Diagram: Niod thesaurus, Named Entity Results and Back of the Book-index, shown around Verrijkt Koninkrijk]
Niod thesaurus, Back-of-the-Book Index and Koninkrijk
[Diagram: nodes niod:oai_wo2_niod_nl_rec_102045, niod:Blitzkrieg, niod:botb-Blitzkrieg and http://resolver.verrijktkoninkrijk.nl/nl.vk.d.reg.4.1386; edge labels: subject, Skos:exactMatch, hasParRef]
GTAA thesaurus, Niod thesaurus, Back-of-the-Book Index and Koninkrijk
[Diagram: nodes gtaa:Oorlog, Niod:Oorlog, niod:Blitzkrieg and http://resolver.verrijktkoninkrijk.nl/nl.vk.d.reg.4.1386; edge labels: subject, sameAs]
DBpedia and Koninkrijk
[Diagram: nodes dbpedia:Minister-President, dbpedia:Barend Biesheuvel, dbpedia:Abraham Kuijper, entity:Barend Biesheuvel and Entity:Abraham Kuijper]
Geonames and Koninkrijk
[Diagram: nodes Geonames:Zuid-Holland, Geonames:Maassluis (population: 32780; coordinates: N 51° 55' 24'' E 4° 15' 0'') and Botb:Maassluis]
The semantic server
“Give me all BBWO2 images linked to a VK paragraph through a niod thesaurus entity found in the text”
PREFIX niod: <http://purl.org/collections/nl/niod/>
PREFIX dc:   <http://purl.org/dc/elements/1.1/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT *
WHERE {
  # A collection object with a subject term and a related image
  ?object dc:subject ?subj ;
          dc:relation ?img .
  # The subject is a NIOD thesaurus concept ...
  ?subj skos:inScheme niod:ConceptScheme ;
        skos:exactMatch ?bc .
  # ... matched to an entity that carries a paragraph reference
  ?bc skos:inScheme niod:EntityScheme ;
      niod:pRef ?pRef .
}
LIMIT 100
“What placenames occur on which page and to which province do they belong”
PREFIX niod: <http://purl.org/collections/nl/niod/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX gn:   <http://www.geonames.org/ontology#>
SELECT ?pl ?provname ?pref
WHERE {
  # Back-of-the-book index entries with label, GeoNames match and page reference
  ?s skos:inScheme niod:BotBScheme ;
     skos:prefLabel ?pl ;
     skos:closeMatch ?geo ;
     niod:pageRef ?pref .
  # GeoNames: first-level administrative parent (the province) and its name
  ?geo gn:parentADM1 ?prov .
  ?prov gn:name ?provname .
}
LIMIT 100
“Give me all occurrences of Prime Ministers in Het Koninkrijk”
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX niod: <http://purl.org/collections/nl/niod/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX dbp-prop: <http://nl.dbpedia.org/property/>
PREFIX dbp-res: <http://nl.dbpedia.org/resource/>
SELECT * WHERE {
  # Person entities found by NER, linked to DBpedia, with paragraph references
  ?entity niod:nerClass niod:nerclass-per ;
          owl:sameAs ?dbpedia_entry ;
          niod:pRef ?pref .
  # Keep only persons whose DBpedia "functie" is Prime Minister of the Netherlands
  ?dbpedia_entry dbp-prop:functie dbp-res:Minister-president_van_Nederland .
}
LIMIT 100
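Queries like the ones above are typically sent to a SPARQL endpoint over HTTP. The sketch below builds a GET request with the `query` parameter defined by the SPARQL 1.1 Protocol. The endpoint URL is a placeholder, since the address of the semantic server is not given here, and `build_query_url` and `run_query` are hypothetical helper names.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Placeholder endpoint; the real address of the semantic server is not stated in the slides.
ENDPOINT = "http://example.org/sparql"


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET request URL (parameter name: 'query')."""
    return endpoint + "?" + urlencode({"query": query})


def run_query(endpoint: str, query: str) -> dict:
    """Send the query and return the parsed JSON results (performs a network call)."""
    request = Request(build_query_url(endpoint, query),
                      headers={"Accept": "application/sparql-results+json"})
    with urlopen(request) as response:
        return json.load(response)


print(build_query_url(ENDPOINT, "ASK { }"))
```

Only `build_query_url` is exercised here; `run_query` needs a reachable endpoint that supports JSON results.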
Hackathon




        Photos from Flickr user HackNY
Some issues
• Quality issues
  – OCR
  – Named Entity Recognition/Reconciliation
  – Linkage


• Pillarization question

• Acceptability for historical research
?

Vk niod jan_2013

  • 1. Het Verrijkt Koninkrijk NIOD Lunchlezing 08/01/2013 Johan van Doornik (UvA) Victor de Boer (VUA)
  • 2. The Kingdom of the Netherlands During World War II • History of German occupied Dutch society (1940-1945) • 14 volumes, 30 parts, 18.000 pages • Digitized version online in 2011, crashing the server “Published between 1969 and 1991, the 30 volumes still combine the qualities of an authoritative work for a general audience, and an inevitable point of reference for scholars”
  • 3. Clarin-VK: Verrijkt Koninkrijk “The aim of this project is twofold; in the demonstrator part of the project advanced tools and techniques are applied to gather data on De Jong's perception of the much debated issue of pillarization (Dutch: 'verzuiling') and group identity. In the resource curation part of the project the corpus will be enriched and made available to the CLARIN-community for further research”
  • 4. Verrijkt Koninkrijk Project NIOD: Historical research questions UvA: Representation of digital text, Named Entity extraction and consolidation, search prototype VUA: Enrichment of structured sources, internal and external linking. Hackathon DANS: Data storage and access.
  • 5. Digitization and Search (the UvA part)
  • 6.
  • 7. <book xmlns="http://www.loedejongdigitaal.nl" vk:id="nl.vk.d.5-I"> <index vk:title="Inhoud" vk:id="nl.vk.d.5-I.1"> <chapter vk:title="Lente 4 1" vk:number="1" vk:id="nl.vk.d.5-I.2"> <section vk:title="" vk:id="nl.vk.d.5-I.2.1"> <section vk:title="Oorlogsverloop en -perspectiej?" vk:id="nl.vk.d.5-I.2.2"> <section vk:title="II. Midden-Oosten, lente 1941" vk:id="nl.vk.d.5-I.2.3"> <subsection vk:id="nl.vk.d.5-I.2.3.1"> <subsection vk:id="nl.vk.d.5-I.2.3.2"> <p vk:pdf-page-ref="21" vk:id="nl.vk.d.5-I.2.3.2.1">Hoe kon Engeland ooit de oorlog winnen?</p> <p vk:pdf-page-ref="21" vk:id="nl.vk.d.5-I.2.3.2.2">Het is, achteraf gezien, volstrekt duidelijk ... <p vk:pdf-page-ref="22" vk:id="nl.vk.d.5-I.2.3.2.3">Deze conceptie was bemoedigend en dit ... <page vk:pdf-page="22" vk:original-page="14" vk:id="nl.vk.d.5-I.2.3.2.3.14"> <backofbook-ref> </page> <header vk:id="nl.vk.d.5-I.2.3.2.3.15">HET BRITSE OORLOGSPLAN</header>men zich in Londen: in de ... <p vk:pdf-page-ref="23" vk:id="nl.vk.d.5-I.2.3.2.4">Hoe dat zij vooral Churchill ... <p vk:pdf-page-ref="23" vk:id="nl.vk.d.5-I.2.3.2.5">Had men dat in bezet Nederland vernomen ... </subsection> </section> <section vk:title="Publieke opinie" vk:id="nl.vk.d.5-I.2.4"> <subsection vk:id="nl.vk.d.5-I.2.4.1"> <p vk:pdf-page-ref="23" vk:id="nl.vk.d.5-I.2.4.1.1">Het verwachtingspatroon van een volk ... <p vk:pdf-page-ref="23" vk:id="nl.vk.d.5-I.2.4.1.2">1 Aangehaald in Butler .... <page vk:pdf-page="23" vk:original-page="15" vk:id="nl.vk.d.5-I.2.4.1.2.4"> <backofbook-ref> <lemma-ref>Azoren</lemma-ref> <lemma-ref>Bomber Command</lemma-ref> <lemma-ref>Canarische eilanden</lemma-ref> <lemma-ref>Madeira</lemma-ref> <lemma-ref>Portugal</lemma-ref> <lemma-ref>Spanje</lemma-ref> <lemma-ref>Tsjechoslowakije</lemma-ref> </backofbook-ref> </page>
  • 8. Back of the Book Required specialized parsing: Pages (312, 316, …) and page ranges (210-215, …) See and See also references OCR correction for numbers (3I2 = 312, …) Verification of all page references Mapping page references to paragraph references Terms that span multiple pages in the back of book Layout not always as consistent as you would like
  • 9. Counting elements vk:book 30 vk:chapter 226 vk:section 1885 vk:subsection 4708 vk:p 86257 vk:quote 56547 vk:page 16922 vk:lemma 16186 vk:lemma-ref 148370
  • 10. Resolver http://resolver.loedejongdigitaal.nl/nl.vk.d.5-II.6.1.2.2 country, collection, doc-type, volume, chapter, section, sub-section, paragraph <p vk:pdf-page-ref="338" vk:id="nl.vk.d.5-II.6.1.2.2">En in het algemeen leed de Geallieerde koopvaardij in de eerste zes maanden van '42 opnieuw zeer zware verliezen. Zij waren vooral gevolg van het feit dat de Amerikanen traag waren met het treffen van veiligheidsmaatregelen in de Caraïbische Zee en in de zeegebieden bij de Amerikaanse oostkust. Maandenlang vonden<i>U-Boote</i>daar een uiterst profijtelijk jachtterrein. Het aantal<i>U-Boote</i>nam ook steeds toe; in juli '41 waren er constant 65 in de vaart, in juli '42 140. Hitler bezat er toen 331 en er waren, doordat de<i>U-Boote</i>zich zo verspreid hadden, in de zeven maandenvan januari t.e.m. juli '42 slechts weinige vernietigd: 31. In die periode verloren de Geallieerden daartentegen per maand gemiddeld meer dan een half miljoen ton aan scheepsruimte. Het waren vooral die scheepsverliezen die de Geallieerde oorlogsleiders in de eerste helft van '42 voortdurend aanleiding gaven tot diepe bezorgdheid. Hoe haakten zij naar de dag waarop de Duitsers en Italianen uit NoordAfrika verdreven zouden zijn! Dan zou eindelijk de lange, schepen verslindende toevoerroute naar Egypte om Afrika heen door de zoveel kortere via de Straat van Gibraltar vervangen kunnen worden.</p>
  • 11. Named Entities + Wikification 1. Natural Language Processing with FROG 2. Detecting names Machine learned detection using POS and capitalization 3. Linking to Wikipedia with ILPS tools Mussert Anton Mussert Avondklok Spertijd Nationale Padvindersraad Padvinder
  • 12.
  • 13. Verrijkt Koninkrijk and Linked Data (the VUA part)
  • 14. What is Linked Open Data? • Open data is about open licenses • Linked (Open) Data is about interoperability: “a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF.” -- Wikipedia. “Sharable, spreadable and nerd-friendly” -- Charlotte S H Jensen, kulturweb
  • 15. Web of Documents (WWW) Linked Documents
  • 17. “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
  • 18. Linked Data: NIOD and VK bbwo2:plaatje1.jpg 4en5mei:Avonklok 4en5mei:monumentX “Spertijd” niod:Avondklok Dbpedia:Avondklok VK:paragraaf 1.2.3.4 DBPedia:Curfew
  • 19. Diagram: NIOD thesaurus, Named Entity Results, Back-of-the-Book index, Verrijkt Koninkrijk
  • 21. NIOD thesaurus: NIOD list of terms • Used by NIOD library, archive, AV archive • Externally by 29 institutions • 1408 terms: “Civil servants”, “Anti-fascism”, “Arrival” – 12 ‘categories’: “Law”, “Military history”, “Countries”, etc. Sample entries from the term list (columns Rub, Term; "Zie" = "see", "Zie ook" = "see also"): 4 Repressie; Voorlichting; Kernwapens - Zie: Atoomwapens; 3 Atoomwapens; 2 Kolonialisme - Zie ook: Dekolonisatie; 8 Religie - Zie ook bij soorten afzonderlijk, bijv.: Christendom
  • 22. NIOD thesaurus (SKOS): niod:Uitrusting, niod:Gasmaskers, niod:Transport (Preferred: “Transport”, Alternative: “Vracht”). Source: NIOD termenlijst (XML)
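The conversion from term list to SKOS on slide 22 amounts to emitting one concept per term with a preferred and optional alternative labels. The sketch below writes Turtle by hand so that it has no dependencies. The `niod:` namespace and `niod:ConceptScheme` are taken from the queries later in the deck; the function names, the `nl` language tag, and the assumption that local names need no escaping are illustrative only.

```python
PREFIXES = (
    "@prefix niod: <http://purl.org/collections/nl/niod/> .\n"
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n\n"
)


def turtle_literal(text: str, lang: str = "nl") -> str:
    """Quote a label as a language-tagged Turtle literal (lang is assumed)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_to_turtle(local_name: str, pref_label: str, alt_labels=()) -> str:
    """Serialize one thesaurus term as a skos:Concept in Turtle.

    Assumes local_name is a simple name (no spaces or punctuation).
    """
    lines = [
        f"niod:{local_name} a skos:Concept ;",
        "    skos:inScheme niod:ConceptScheme ;",
        f"    skos:prefLabel {turtle_literal(pref_label)}",
    ]
    for alt in alt_labels:
        lines[-1] += " ;"  # previous statement continues
        lines.append(f"    skos:altLabel {turtle_literal(alt)}")
    lines[-1] += " ."  # close the subject block
    return "\n".join(lines)


# The example from the slide: preferred "Transport", alternative "Vracht".
print(PREFIXES + concept_to_turtle("Transport", "Transport", ["Vracht"]))
```

In practice an RDF library would handle escaping and serialization; the hand-written version is only meant to make the shape of the resulting triples visible.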
  • 23. Back-of-the-Book Index (SKOS): botb:Amsterdam, botb:Blitzkrieg, niod:botb-Blitzkrieg, http://resolver.verrijktkoninkrijk.nl/nl.vk.d.reg.4.1386
  • 24. Named Entity Results (SKOS): entity:Maassluis, entity:Amsterdam, entity:Abraham Kuijper, niod:botb-Blitzkrieg, http://resolver.verrijktkoninkrijk.nl/nl.vk.d.reg.4.1386
  • 25. Linked Data: NIOD thesaurus, Named Entity Results, Back-of-the-Book index, Verrijkt Koninkrijk
  • 26. NIOD thesaurus to Koninkrijk Back-of-the-Book Index: niod:oai_wo2_niod_nl_rec_102045 subject niod:Blitzkrieg; niod:Blitzkrieg skos:exactMatch niod:botb-Blitzkrieg; niod:botb-Blitzkrieg hasParRef http://resolver.verrijktkoninkrijk.nl/nl.vk.d.reg.4.1386
  • 27. GTAA thesaurus (gtaa:Oorlog), NIOD thesaurus (niod:Oorlog, niod:Blitzkrieg) and Koninkrijk Back-of-the-Book Index (http://resolver.verrijktkoninkrijk.nl/nl.vk.d.reg.4.1386); link labels shown: subject, sameAs
  • 28. DBpedia and Koninkrijk: dbpedia:Minister-President, dbpedia:Barend Biesheuvel, dbpedia:Abraham Kuijper; entity:Barend Biesheuvel, entity:Abraham Kuijper
  • 29. GeoNames and Koninkrijk: Geonames:Zuid-Holland; Geonames:Maassluis (population 32780; coordinates N 51° 55' 24'' E 4° 15' 0''); Botb:Maassluis
  • 31. “Give me all BBWO2 images linked to a VK paragraph through a niod thesaurus entity found in the text” PREFIX niod: <http://purl.org/collections/nl/niod/> PREFIX dc: <http://purl.org/dc/elements/1.1/> PREFIX skos: <http://www.w3.org/2004/02/skos/core#> SELECT DISTINCT * WHERE { ?object dc:subject ?subj ; dc:relation ?img . ?subj skos:inScheme niod:ConceptScheme . ?subj skos:exactMatch ?bc . ?bc skos:inScheme niod:EntityScheme . ?bc niod:pRef ?pRef . } LIMIT 100
  • 32. “What placenames occur on which page and to which province do they belong?” PREFIX niod: <http://purl.org/collections/nl/niod/> PREFIX skos: <http://www.w3.org/2004/02/skos/core#> SELECT ?pl ?provname ?pref WHERE { ?s skos:inScheme niod:BotBScheme . ?s skos:prefLabel ?pl . ?s skos:closeMatch ?geo . ?geo <http://www.geonames.org/ontology#parentADM1> ?prov . ?prov <http://www.geonames.org/ontology#name> ?provname . ?s niod:pageRef ?pref . } LIMIT 100
  • 34. “Give me all occurrences of Prime Ministers in Het Koninkrijk” PREFIX dcterms: <http://purl.org/dc/terms/> PREFIX niod: <http://purl.org/collections/nl/niod/> PREFIX skos: <http://www.w3.org/2004/02/skos/core#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX dbp-prop: <http://nl.dbpedia.org/property/> PREFIX dbp-res: <http://nl.dbpedia.org/resource/> SELECT * WHERE { ?entity niod:nerClass niod:nerclass-per ; owl:sameAs ?dbpedia_entry ; niod:pRef ?pref . ?dbpedia_entry dbp-prop:functie dbp-res:Minister-president_van_Nederland . } LIMIT 100
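Queries like the ones on slides 31 to 34 are sent to a SPARQL endpoint over HTTP. The sketch below shows one way to do that with the Python standard library, following the SPARQL protocol's GET form with a `query` parameter. The endpoint URL is a placeholder (the slides do not give the real one), the function names are hypothetical, and the query is a reduced version of the placename query on slide 32.

```python
import json
import urllib.parse
import urllib.request

# Placeholder endpoint: an assumption, not the project's actual address.
ENDPOINT = "http://example.org/sparql"

# Reduced form of the slide 32 query: placenames and their page references.
QUERY = """\
PREFIX niod: <http://purl.org/collections/nl/niod/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?pl ?pref WHERE {
  ?s skos:inScheme niod:BotBScheme .
  ?s skos:prefLabel ?pl .
  ?s niod:pageRef ?pref .
} LIMIT 100
"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint: str, query: str) -> list[dict[str, str]]:
    """Send the query and flatten each result row to {variable: value}."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        payload = json.load(response)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in payload["results"]["bindings"]
    ]
```

Calling `run_query(ENDPOINT, QUERY)` only works once `ENDPOINT` points at a live service that holds the Verrijkt Koninkrijk data; `build_request` can be inspected without any network access.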
  • 35. Hackathon Photos from Flickr user HackNY
  • 36. Some issues • Quality issues – OCR – Named Entity Recognition/Reconciliation – Linkage • Pillarization question • Acceptability for historical research
  • 37. ?

Editor's notes

  1. Also: how did the perspective change
  2. VIC: Add numbers