A presentation on Wikidata for libraries and archives delivered on March 16, 2015 to the Metropolitan New York Library Council.
Contains minor edits and corrections from version presented.
Released under CC0.
2. Wikidata is a free linked database that can be
read and edited by
humans and machines.
3. Wikidata's goals
● Centralize interwiki links
● Centralize Wikipedia infoboxes
● Provide an interface for rich queries
● Structure the sum of all human knowledge
4. What you'll learn from this talk
● How to edit Wikidata
● Authority control properties
● Ontology
● Wikidata vocabulary
● Querying and other tools
● Wikidata & Commons plans
7. Items and properties
●
Each item and property has its own page
● Items
– Represent subjects: Douglas Adams, Challenger disaster
– Have identifiers like Q42, Q921090
–13,745,153 items
● Properties
– Represent attribute names: occupation, cause of
– Have identifiers like P106, P828
– 1,429 properties
8. Statements and claims
● Claims
– Claims are “triplets”
● Formally: subject, predicate, object
● In Wikidata: item, property, value
● Example: Douglas Adams, occupation, author
● Statements
– A claim is only part of a statement
– Statements also include:
● References
● Ranks
9. Qualifiers, ranks, references
● Qualifiers
– Qualifiers are properties used on claims rather than items
– “Yonkers population 12,733 at time (P585) 1860”
● Ranks
– Preferred, normal, deprecated
– Useful to mark outdated claims
● References
– Source of claim; provenance
– “... stated in (P248) 1860 United States Census”
10. More on Wikidata vocabulary
https://www.wikidata.org/wiki/Wikidata:Glossary
11. Wikipedia articles have a Wikidata item link in the
left navigation panel.
Wikidata link on Wikipedia
16. Finding properties
● Is there a property for “number of windows”?
● What was the ID of that property, again?
● Search
– In main site search box, prefix search term with “P:”
– “P:number of”, “P:occupation”
– Instant search doesn't work for properties, only items
● Browse
– https://www.wikidata.org/wiki/Wikidata:List_of_properties
^ bookmark this!
18. Waldseemuller map
https://www.wikidata.org/wiki/Q195801
● instance of (P31): map
● creator (P170): Martin Waldseemüller
● publication (P577): April 1507
● language of work (P407): Latin
● depicts (P180): Americas, Africa,
Europe, Asia, Pacific Ocean
● has part (P527): woodcut
● location (P276): Library of Congress
19. Tools
– Querying: Autolist, by Magnus Manske
● http://tools.wmflabs.org/autolist/autolist1.html
– Batch editing: Widar, by Magnus Manske
● https://tools.wmflabs.org/autolist/
– Software framework: Wikidata Toolkit, by Markus Kroetzsch et al.
● https://www.mediawiki.org/wiki/Wikidata_Toolkit
● https://github.com/Wikidata/Wikidata-Toolkit
20. Querying in Wikidata
Use case: Get a list of history books
Pseudo-query:
instance of: book AND genre: history
instance of: P31
book: Q571
genre: P136
history: Q309
Wikidata query in Autolist:
claim[31:571] AND claim[136:309]
22. Authority control on Wikidata
● “Identifier” properties link Wikidata items to entities in
external databases
● Also known as authority control numbers
● Examples:
– VIAF identifier (P214)
– Freebase identifier (P646)
– MusicBrainz artist ID (P434)
23. Identifier properties on Wikidata
Top 10 ID properties on Wikidata as of 2015-03-15
1) Freebase identifier (P646) 1,153,786
2) China administrative division code (P442) 742,240
3) VIAF identifier (P214) 527,786
4) GeoNames ID (P1566) 490,444
5) GND identifier (P227) 345,247
6) ITIS TSN (P815) 334,053
7) PlantList-ID (P1070) 331,179
8) Tropicos taxon name identifier (P960) 318,719
9) IMDb identifier (P345) 286,637
10) IPNI taxon name identifier (P961) 276,515
https://www.wikidata.org/wiki/Wikidata:Database_reports/List_of_properties/Top100
Identifier properties on WikidataIdentifier properties on Wikidata
24. Ontology
● “A specification of a conceptualization”
● Rooted in long-standing philosophical traditions
● Useful for:
– building concept hierarchies
– making common knowledge computable
26. Classes and instances
● Plato is a human is a animal
● Plato instance of human subclass of animal
● Instance: concrete object, individual
● Class: abstract object
28. Concept tree diagram: painting
http://tools.wmflabs.org/wikidata-todo/tree.html?q=3305213&rp=279&lang=en&method=d3
29. Ontology on Wikidata
● instance of (P31)
– rdf:type in RDF and OWL
– 12,692,181 usages
– Most popular Wikidata property
● subclass of (P279)
– “all instances of A are also instances of B”
– rdfs:subClassOf in RDF and OWL
– 190,095 usages
30. Ontology on Wikidata
● Last but not least: part of (P361)
– Third basic membership property
– Top-level “part-whole” relation
● Instance of, subclass of and part of are all transitive
● Transitive relation :
A subclass of B
B subclass of C
:. A subclass of C
https://www.wikidata.org/wiki/Help:Basic_membership_properties
31. subproperty of (P1647)
● How to link author (P50) and creator (P170)
– subproperty of (P1647)
– author subproperty of creator
● Wikidata supports claims about properties
● subproperty of (P1647) has semantics of
rdfs:subPropertyOf
32. Structured data for images
● Use case:
– A library, archive or other cultural institution wants to
make its images discoverable on Wikidata
● Structured Data on Wikimedia Commons
– Project of the Wikimedia Foundation (WMF)
33. Structured Data
for Wikimedia Commons
● Purpose
– use structured data for all media files on Wikimedia sites
– make it easier for users to read, translate and edit file
information
– enable developers to build better tools, both to
contribute and reuse Commons content.
https://commons.wikimedia.org/wiki/Commons:Structured_data/Overview
34. Structured Data for Commons: FAQ
● How long will it take?
–Completion will take years, but project will be released in stages.
–Forward-facing part of project is on hold.
● Is file meta data staying on Commons?
–Yes
● How will file information be structured?
–To be determined
● Will Commons still have templates?
–Yes
● Will Commons still have categories?
–Yes
38. Library ontology on Wikidata
● What is a book?
● Different levels of existence (FRBR)
– Work
– Expression - e.g., translation for a language
– Manifestation - edition, thing with an ISBN
– Item - physical copy
39. FRBR example
● Work
– Bible (Q1845)
● Expression
– Latin translation of the Bible
● Manifestation
– Vulgate (Q131175): a particular Latin translation
● Item
– Gutenberg Bible (Q158075): a physical book in New York Public Library(!)
40. Linking FRBR levels in Wikidata
Done by the property edition or translation of (P629)
● Gutenberg Bible (item)
– edition or translation of: Vulgate
● Vulgate (manifestation)
– edition or translation of: Bible
● Bible (work)
41. Trouble: what is an instance?
● Thing with a unique location in space and time
● Bible instance of religious text: incorrect?
– Gutenberg Bible instance of religious text: all agree
● Solution: Bible as information artifact
42. Information Artifact Ontology
● IAO sets forth a basic ontological distinction:
– Entities in the world
● Actual people: Grace Hopper, Barack Obama
● That very old tree
● Gutenberg Bible
– Information artifacts
● Entities about something in reality
● Concretized by some information bearer
43. Information Artifact Ontology
● Addresses problems in Dublin Core Metadata Initiative
(DCMI)
– Smith (2014)
http://ncor.buffalo.edu/2014/IAOW/IAO-Tutorial-Smith-Rio-Sep-2014.pptx
● Rooted in Basic Formal Ontology (BFO), which is used in
many scienitific ontologies
– Smith et al. (2007). Nature Biotechnology.
http://www.nature.com/nbt/journal/v25/n11/full/nbt1346.html
44. Summary
● Wikidata is a free knowledge base that anyone can edit
● Search properties by “p:search term”
● Wikidata items link to external databases through
identifier properties, e.g. VIAF identifier (P214)
● Wikidata supports FRBR and other ontologies