Wikidata is a central data repository for Wikimedia projects that aims to provide structured data about entities in a fully multilingual format that both humans and machines can read and edit. Phase I, the current focus, centralises cross-language links between Wikipedia articles. Phase II will build structured data about these entities, harvested from Wikipedia and supplemented from other sources, and Phase III will enable automatic generation of lists and charts.
2. Wikidata summary
●
Central data repository for Wikimedia projects
●
Human- and machine-readable
●
Human- and machine-editable
●
Fully multilingual
●
Supports semantic relationships
www.wikidata.org
3. Overall plan
●
Phase I
– Centralise cross-language relationships
●
Phase II
– Centralise core structured data
●
Phase III
– Dynamic generation of list content
4. Phase I
●
Centralising all “interwiki” cross-language links
– Historically, a major maintenance headache!
●
Single conceptual entity => many articles
– ...some unexpected oddities arise; not all 1:1
●
Almost all entities now listed
●
Inclusion standards currently restricted
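To see why per-article interwiki links were a maintenance burden, the sketch below compares how many links must be kept in sync under each arrangement. It assumes the earlier arrangement had every article listing every other language version itself; the function names and the sample record are illustrative only, not taken from Wikidata.

```python
def links_per_article_model(n_languages: int) -> int:
    """Each of n articles carries links to the other n - 1 articles."""
    return n_languages * (n_languages - 1)


def links_central_model(n_languages: int) -> int:
    """One central item stores a single link per language edition."""
    return n_languages


# Hypothetical central record: one conceptual entity, many articles.
central_item = {
    "enwiki": "Berlin",
    "dewiki": "Berlin",
    "frwiki": "Berlin",
    "plwiki": "Berlin",
}

n = len(central_item)
print(links_per_article_model(n), "links stored across articles")
print(links_central_model(n), "links stored centrally")
```

The gap widens quickly as language editions are added, since the first count grows quadratically and the second linearly.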
7. Phase II
●
Building structured data on these entities
●
“Phase 2.1” - harvesting data from Wikipedia
– and supplemented from other sources
●
“Phase 2.2” - displaying data on Wikipedia
– autogenerated information templates
10. Wikidata entities
●
Single entity corresponding to one or more Wikipedia articles
– Name (in various languages) + WP links
– Contains various Phase II properties
– Properties can include sources/qualifiers
●
No support (yet!) for entities not existing in WP
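The entity structure described on this slide can be pictured roughly as follows. This is a minimal sketch, not Wikidata's actual data model: the class and field names (`Item`, `Statement`, `sitelinks`, and so on) are assumptions chosen for illustration, and the sample source string is a placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Statement:
    """One Phase II property value attached to an item (illustrative)."""
    property_id: str                                     # e.g. "P31"
    value: str                                           # item id or plain string
    qualifiers: Dict[str, str] = field(default_factory=dict)
    sources: List[str] = field(default_factory=list)


@dataclass
class Item:
    """A single entity corresponding to one or more Wikipedia articles."""
    item_id: str
    labels: Dict[str, str] = field(default_factory=dict)     # language -> name
    sitelinks: Dict[str, str] = field(default_factory=dict)  # wiki -> article title
    statements: List[Statement] = field(default_factory=list)

    def label(self, lang: str, fallback: str = "en") -> str:
        """Return the name in `lang`, else the fallback language, else the id."""
        return self.labels.get(lang) or self.labels.get(fallback, self.item_id)


item = Item(
    item_id="Q42",
    labels={"en": "Douglas Adams", "de": "Douglas Adams"},
    sitelinks={"enwiki": "Douglas Adams", "dewiki": "Douglas Adams"},
    statements=[Statement("P31", "Q5", sources=["placeholder source"])],
)

print(item.label("fr"))       # no French label here, so the English one is used
print(sorted(item.sitelinks))
```

Keeping labels and sitelinks as separate mappings mirrors the slide's distinction between the name in various languages and the links to Wikipedia articles.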
12. Phase II – initial properties
●
Limited properties – gradual roll-out
●
Single “main type”, but no restrictions on use
– “the capital of Julius Caesar”
●
Relational properties implemented
– but no automatic reciprocity yet
●
String datatypes created for identifiers
●
130 properties currently in use
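Because relational properties are not reciprocated automatically, a client that wants both directions has to check for the missing half itself. The sketch below shows one way to do that over plain (subject, property, object) triples. The property names, the `INVERSES` table, and the item names are all hypothetical and do not correspond to real Wikidata identifiers.

```python
# Hypothetical inverse pairs; real property ids are deliberately not used.
INVERSES = {"parent": "child", "child": "parent", "spouse": "spouse"}


def missing_reciprocals(statements):
    """Return the reciprocal triples that are absent from `statements`."""
    existing = set(statements)
    missing = []
    for subject, prop, obj in statements:
        inverse = INVERSES.get(prop)
        if inverse is None:
            continue  # property has no known inverse
        reciprocal = (obj, inverse, subject)
        if reciprocal not in existing and reciprocal not in missing:
            missing.append(reciprocal)
    return missing


statements = [
    ("item_A", "parent", "item_B"),   # A's parent is B
    ("item_B", "child", "item_A"),    # B's child is A (reciprocal present)
    ("item_A", "spouse", "item_C"),   # no matching statement on item_C
]

print(missing_reciprocals(statements))
```

Here only the spouse statement lacks its counterpart, so it is the only triple reported.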
13. Phase II – future properties
●
Properties created by community discussion
●
Several awaiting datatypes:
– time
– geocoordinate
– number (and dimension)
●
Qualifiers yet to be added
14. Data reuse
●
Permanent numeric identifier for all items
●
API available (JSON)
– but still being developed!
●
Regular XML dumps – dumps.wikimedia.org
– all item/property data licensed as CC0
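As a rough illustration of reusing the data, the sketch below builds a request URL for one item and extracts the numeric part of its identifier. It assumes the MediaWiki `wbgetentities` API action and the `Q` prefix on item ids. Since the API is described as still being developed, the parameters may change. The helper names `entity_url` and `numeric_id` are invented for this example, and no network request is made.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://www.wikidata.org/w/api.php"


def entity_url(item_id: str) -> str:
    """Build a JSON API URL for a single item (assumed parameters)."""
    params = {"action": "wbgetentities", "ids": item_id, "format": "json"}
    return API_ENDPOINT + "?" + urlencode(params)


def numeric_id(item_id: str) -> int:
    """Return the permanent numeric part of an item id such as 'Q42'."""
    if not (item_id.startswith("Q") and item_id[1:].isdigit()):
        raise ValueError("expected an id like 'Q42', got %r" % item_id)
    return int(item_id[1:])


print(entity_url("Q42"))
print(numeric_id("Q42"))
```

Fetching the URL with any HTTP client should return a JSON document keyed by item id, though the exact response layout is best checked against the live API.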
16. Tools
●
Examples of toolsets:
– GeneaWiki (visualise relations)
– Reasonator (display interface)
– Query API (experimental, alternative)
– Tree of Life (static dump)