A practical case study of how to create Linked Open Data for 1,300 Dutch underground newspapers from World War 2 using Wikipedia, DBpedia and an old paper book.
Lecture given by Olaf Janssen - Wikimedia & Open Data coordinator for the National Library of the Netherlands (KB) - for students of the master's course "Digital Access to Cultural Heritage" at Leiden University on 27-2-2017
1. Linked Open Data case study:
WW2 underground newspapers on Wikipedia
Digital Access to Cultural Heritage, Leiden University, 27-2-2017
Olaf Janssen (Koninklijke Bibliotheek)
olaf.janssen@kb.nl - @ookgezellig
2. What I hope you’ll learn today
Using LOD theory (lecture René Voorburg) in practice
1. How to give a new life to an old paper book
2. How to get 1,300 newspapers from WW2 on Wikipedia
While doing 1 and 2:
3. The advantages of Linked Open Data
(= downsides of unconnected data sources)
See this lecture on Slideshare:
https://www.slideshare.net/OlafJanssenNL/linked-open-data-case-study-illegal-newspapers-ww2-wikipedia-dbpedia-lecture-leiden-university-2722017
7. By OlafJanssen (Own work) [CC BY-SA 4.0 (http://creativecommons.org/licenses/by-sa/4.0)] - https://commons.wikimedia.org/wiki/File:Kluisdeur_bij_het_NIOD_in_Amsterdam.jpg
After the war, 1,300 newspaper titles
were (physically) preserved at the NIOD,
the National Institute for War, Holocaust
and Genocide Studies in Amsterdam.
By OSeveno (Own work) [CC BY-SA 4.0 (http://creativecommons.org/licenses/by-sa/4.0)] - https://commons.wikimedia.org/wiki/File:02_(NIOD)_2016_(Herengracht_380-382,_Amsterdam).jpg
8. By Romaine - Own work, CC0, https://commons.wikimedia.org/w/index.php?curid=37072767
17. URL of this page:
http://www.delpher.nl/nl/kranten/results?coll=ddtitel&cql[]=ppn+any+(107123223)
107123223 = PPN
= unique ID of this title in Delpher
(same for the KB catalogue)
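Because the PPN is the unique ID shared by Delpher and the KB catalogue, a link to all issues of a title can be generated mechanically. A minimal sketch: the query-string layout is copied from the URL on this slide, but the helper function name is my own.

```python
from urllib.parse import quote_plus

def delpher_newspaper_url(ppn: str) -> str:
    """Build a Delpher search URL for all issues of a newspaper title,
    using its PPN (the unique ID shared with the KB catalogue).
    URL pattern taken from the slide: coll=ddtitel, cql[]=ppn any (<PPN>)."""
    cql = quote_plus(f"ppn any ({ppn})")
    return f"http://www.delpher.nl/nl/kranten/results?coll=ddtitel&cql[]={cql}"

# De Geus onder studenten
print(delpher_newspaper_url("107123223"))
```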
Once more, De Geus onder studenten
18. In Delpher you can read and (word)search
these newspapers…
• Scans
• Full-text OCR
• ALTO
19. But say, I want to know more about this newspaper
• What sort of illegal newspaper was it?
• What is the history of this newspaper?
• Who wrote it?
• Where was this newspaper printed?
• How was it distributed?
• Were there any relations with other underground newspapers or resistance groups?
• Etc…
21. Under “Details” perhaps?
22. OK, OK, some metadata…
…but no real contextual info
25. Big drawback of Delpher
(and KB catalogue)
No contextual information
about WW2 underground newspapers
https://thejungleisneutral.files.wordpress.com/2013/11/lost.jpg
26. Question:
Where would many people go to find
contextual information about historic newspapers?
Probably Wikipedia! (via Google)
28. Report on interest in WW2 among Dutch population
http://www.oorlogsbronnen.nl/gebruikersonderzoek2015, May 2015
29. Many of us use the internet to search for
information [..]. We often mention Wikipedia…
General audience
30. Everything is of course on Wikipedia. Just type in a
name and you can read entire essays...
People > 60
31. Over half of us think that Wikipedia and Google
contribute to our knowledge and understanding of
history
School/students
32. Over half of us think that Wikipedia and Google
contribute to our knowledge and understanding of
history
When we have to find information about WW2 outside
the class setting, we fully concentrate on digital
resources like Google and Wikipedia.
School/students
36. 1. Very few illegal newspapers have
their own Wikipedia articles
2. The inventory of these newspapers
on Wikipedia is far from complete
(far fewer than the 1,300 titles)
38. So: we badly need contextual information about illegal
newspapers. Where do we get it?
https://thejungleisneutral.files.wordpress.com/2013/11/lost.jpg
De Ondergrondse Pers 1940-1945
Lydia E. Winkel, H. de Vries, 1989,
ISBN 9021837463, Veen Uitgevers
This paper book contains contextual entries
about all 1,300 illegal newspapers
39. Entry 199 – De Geus (onder studenten)
199 = unique ID of this entry (within the book)
47. We need it digital!!
https://knowledgeutopia.files.wordpress.com/2014/01/hollandhouselibraryblitz1940.jpg
48. We need it digital!!
1. Clear copyright with the copyright holder (NIOD)
→ open CC-BY-SA license
2. Scan & OCR
3. Convert into PDF
4. Put online: NIOD site & Wikimedia Commons
49. DOP as PDF on NIOD website (CC-BY-SA)
http://www.niod.nl/nl/de-ondergrondse-pers-1940-1945
50. DOP as PDF on Wikimedia Commons
https://commons.wikimedia.org/wiki/File:PDF_of_De_Ondergrondse_Pers_1940-1945_-_derde_druk_-_1989.pdf
51. De Winkel as PDF on NIOD website (CC-BY-SA)
http://www.niod.nl/nl/de-ondergrondse-pers-1940-1945
Saved us €13,330!
http://www.brill.com/dutch-underground-press-1940-1945
56. To summarize: a lot of data sources are available about
these WW2 underground newspapers:
1. Metadata (from the KB catalogue)
2. Content (full text from Delpher)
3. Context (from De Ondergrondse Pers, PDF)
4. Relations: newspaper ↔ places, persons, other newspapers (DOP, PDF)
5. External resources about newspapers, places and persons
… but the data sources are unconnected
(and for 3 + 4: unstructured & not machine-readable)
59. Wikiproject(*) Verzetskranten
Systematically and uniformly describe & interlink all
1,300 Dutch underground newspapers from WW2
on Dutch Wikipedia
tinyurl.com/verzetskranten (in Dutch)
* https://nl.wikipedia.org/wiki/Wikipedia:Wikiproject, https://en.wikipedia.org/wiki/Wikipedia:Wikiproject
60. Reach a big audience:
80% of Dutch people use Wikipedia
61. From 14 → 1,300 titles
https://nl.wikipedia.org/wiki/Categorie:Illegale_pers_in_de_Tweede_Wereldoorlog
62. We need a database!
76. Build central database
Step 2: Convert Excel into an RDF triplestore
(= a special kind of online database anybody can access)
• Steps 1-4 from http://linda-project.eu/linked-data-primer-2
• Step 4: Vocabulary used = Bibframe (http://bibframe.org/vocab)
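The Excel → RDF step can be pictured without any RDF library: each spreadsheet row becomes a handful of subject–predicate–object triples. A stdlib-only sketch, in which the `example.org` URIs and predicate names are invented for illustration (the real project used the Bibframe vocabulary):

```python
import csv
import io

# Toy stand-in for the Excel export: one row per newspaper entry.
SHEET = """dop_id,title,place,ppn
199,De Geus onder studenten,Leiden,107123223
"""

def row_to_ntriples(row: dict) -> list:
    """Turn one spreadsheet row into N-Triples lines.
    The example.org namespace is hypothetical; the project used Bibframe."""
    s = f"<http://example.org/newspaper/{row['dop_id']}>"
    return [
        f'{s} <http://example.org/vocab/title> "{row["title"]}" .',
        f'{s} <http://example.org/vocab/place> "{row["place"]}" .',
        f'{s} <http://example.org/vocab/ppn> "{row["ppn"]}" .',
    ]

triples = []
for row in csv.DictReader(io.StringIO(SHEET)):
    triples.extend(row_to_ntriples(row))
print("\n".join(triples))
```

In the real workflow the resulting triples are loaded into a triplestore that anybody can query online.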
77. Build central database
Step 3: Link to external resources
• Step 5 from http://linda-project.eu/linked-data-primer-2
• DBpedia = machine-readable, structured version of Wikipedia
• DBpedia = hub for linking different data sets on the Web
to each other (the Linked Open Data cloud)
• Connect persons & places in the newspaper database to external
resources via DBpedia
78. Step 3: Link to external resources
• Step 5 from http://linda-project.eu/linked-data-primer-2
• DBpedia = machine-readable, structured version of Wikipedia
"DBpedia allows you to ask sophisticated queries against Wikipedia, and to link
the different data sets on the Web to Wikipedia data"
• Connect persons & places in the newspaper database to external
resources via DBpedia
http://lod-cloud.net/versions/2010-09-22/lod-cloud_colored.png
Linked Open Data cloud
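"Linking to external resources" in practice means adding triples (commonly `owl:sameAs`) that point a local URI at the matching DBpedia resource. A hedged sketch: the local `example.org` URIs and the small lookup table are invented, and in a real project each match needs checking (place names are ambiguous):

```python
# Map local place names to their (assumed) DBpedia resources.
# In reality these matches must be verified by hand or with reconciliation tools.
PLACE_TO_DBPEDIA = {
    "Leiden": "http://dbpedia.org/resource/Leiden",
    "Amsterdam": "http://dbpedia.org/resource/Amsterdam",
}

def same_as_triple(local_uri: str, place: str) -> str:
    """Emit an owl:sameAs N-Triples line linking a local place URI to DBpedia."""
    return (f"<{local_uri}> "
            f"<http://www.w3.org/2002/07/owl#sameAs> "
            f"<{PLACE_TO_DBPEDIA[place]}> .")

print(same_as_triple("http://example.org/place/leiden", "Leiden"))
```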
81. Build central database
Added value of Linked Open Data & DBpedia
Software can automatically query for additional
information about places and persons mentioned in
DOP that is not available in
• KB catalogue
• Delpher
• DOP itself
83. Summary: data about 1,300 newspapers
• Available online
• Open license (CC-BY-SA)
• Contextual information
• Relations: newspaper ↔ places, newspaper ↔ persons,
newspaper ↔ other newspapers (PPNs)
• Structured data (RDF triples)
• Open standard (RDF)
• Links between newspapers, Delpher & KB catalogue (via PPNs)
• Links between newspapers, places & persons and external sources (via DBpedia)
88. Using an article template we can generate
1,300 uniform Wikipedia article stubs from
the LOD triple store
https://c1.staticflickr.com/9/8281/7699231918_11a7356c38_b.jpg
89. LOD database + article template = Wikipedia article stub
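The equation on this slide (LOD database + article template = Wikipedia article stub) can be sketched as a string template filled from query results. The field names, the sample record and the wikitext layout below are invented for illustration; the actual project used a Twig template, linked a few slides further on:

```python
# Hypothetical result of a SPARQL query against the triple store
# for one newspaper (fields and values are illustrative).
record = {
    "title": "De Geus onder studenten",
    "place": "Leiden",
    "dop_id": "199",
    "ppn": "107123223",
}

# Minimal wikitext stub template: the labels are hard-coded,
# the values come from the triple store.
STUB_TEMPLATE = (
    "'''{title}''' was a Dutch underground newspaper from World War 2, "
    "published in {place}.\n\n"
    "* Entry {dop_id} in ''De Ondergrondse Pers 1940-1945''\n"
    "* [http://www.delpher.nl/nl/kranten/results?coll=ddtitel"
    "&cql[]=ppn+any+({ppn}) Read this newspaper in Delpher]\n"
)

print(STUB_TEMPLATE.format(**record))
```

Repeating this for every record in the triple store yields the 1,300 uniform stubs.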
104. Link to KB catalogue record
Get from triple store
106. These snippets/labels are identical for
all 1,300 newspapers
Hard-coded in the template
https://github.com/ookgezellig/verzetskranten/blob/master/app/Resources/views/default/wiki.html.twig
107. Grey = Wikipedia article stub
• From the triple store (using SPARQL)
• Hard-coded fixed strings in the template
114. This bit was added manually
to expand the stub into a full article
Crowdsourcing by the
Dutch Wikipedia community
https://nl.wikipedia.org/wiki/De_Geus_onder_studenten_(verzetsblad)
115. A group of Wikipedia volunteers is currently
working to expand the 1,300 stubs…
gradually creating more and more full articles.
By Sebastiaan ter Burg [CC BY 2.0 (http://creativecommons.org/licenses/by/2.0)], via Wikimedia Commons
116. Before the project
(2015)
Overview of full articles on illegal newspapers @WP:NL
https://nl.wikipedia.org/wiki/Categorie:Nederlandse_illegale_pers_in_de_Tweede_Wereldoorlog
117. The number of
articles is growing
steadily…
Overview of full articles on illegal newspapers @WP:NL
https://nl.wikipedia.org/wiki/Categorie:Nederlandse_illegale_pers_in_de_Tweede_Wereldoorlog
118. … making many Dutch
people happy!
http://www.formerdays.com/2011/05/dutch-liberation.html
120. Suggested reading
• http://www.ted.com/talks/tim_berners_lee_on_the_next_web
20 years ago, Tim Berners-Lee invented the World Wide Web. For his next project, he's building a web for open, linked data
that could do for numbers what the Web did for words, pictures, video: unlock our data and reframe the way we use it
together.
• https://en.wikipedia.org/wiki/Linked_data
Wikipedia article related to the above video
• http://5stardata.info/en/
The 5 stars of Linked Open Data (Tim Berners-Lee)
• http://linda-project.eu/linked-data-primer-2/
Short primer about creating LOD in practice, starting from an Excel sheet
• http://www.programmableweb.com/news/how-linked-data-solved-digital-age-marketing-problem/analysis/2015/08/31
The figure near the bottom of the first page is a good illustration of the concept of (linked) triples
• https://en.wikipedia.org/wiki/DBpedia
• https://en.wikipedia.org/wiki/Semantic_network
http://www.gettyimages.nl/detail/nieuwsfoto's/three-women-of-the-ats-light-up-together-ats-regulations-nieuwsfotos/3094265