This session discusses how public sector data is generated and made publicly available. Very often, cumbersome processes are in place before data (e.g. from official statistics) finally reaches Wikipedia. Often this happens through publication in Open Government Data portals and voluntary efforts to add official data as a source in Wikidata.
The talk presents a redesign of the ecosystem in which public sector data is generated, distributed and made available, using Semantic MediaWiki as a data tool. The relation to Wikidata/Wikibase, Open Government Data portals and official statistics will also be discussed.
2. www.kdz.or.at
Introduction
KDZ – Centre for Public
Administration Research
Open (Government) Data
Semantic MediaWiki
17 August 2019 · Page 2
3. www.kdz.or.at
Open Data – a short introduction
Open Government Data means data from the public
sector that is published for everybody to use
without restrictions
Several Open Government Initiatives exist on
municipal, regional and federal level
The European Data Portal collects the metadata
published by member states
So you need:
metadata describing the data (type, format, license…) -> data.gv.at
a place where you can actually retrieve the data -> URL
Very often open data is the source for
Wikidata/Wikipedia
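The two ingredients listed above (metadata plus a retrievable URL) can be sketched as a small record. This is only an illustration, assuming DCAT-style property names; the dataset title and the example.org URL are made up, and the helper names `describe_dataset` and `missing_fields` are hypothetical.

```python
import json

# DCAT/Dublin Core terms are used as keys; the values passed in below
# (title, download URL) are invented for illustration only.
REQUIRED_DISTRIBUTION_FIELDS = ("dct:format", "dct:license", "dcat:downloadURL")


def describe_dataset(title, file_format, license_url, download_url):
    """Build a minimal DCAT-style description of one dataset as a plain dict."""
    return {
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dct:format": file_format,
                "dct:license": license_url,
                "dcat:downloadURL": download_url,
            }
        ],
    }


def missing_fields(record):
    """Return the required distribution fields that are empty or absent."""
    missing = []
    for distribution in record.get("dcat:distribution", []):
        for field in REQUIRED_DISTRIBUTION_FIELDS:
            if not distribution.get(field):
                missing.append(field)
    return missing


record = describe_dataset(
    title="Modal split (hypothetical dataset)",
    file_format="CSV",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    download_url="https://data.example.org/modal-split.csv",
)
print(json.dumps(record, indent=2))
```

A portal would hold only this description, while the data itself stays at the download URL.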
4. www.kdz.or.at
Semantic MediaWiki – a short
introduction
„Older sibling“ of Wikidata
Ecosystem of extensions around managing
data in your own MediaWiki
Enterprise-ready, used in many large wikis
(open or closed) outside of WMF
Commercial support available
SMW installations can be a data source for
Wikidata
5. www.kdz.or.at
The problem:
We live in the age of
big data…
…but still a lot of data is
missing, closed or old,
especially data from „official“
sources (public sector data)
6. www.kdz.or.at
Examples of data needed to
monitor SDGs (on local level)
Children in child care facilities (SDG 4)
Number of male/female officials/public
servants (SDG 5)
Expenditure for renewable energy (SDG 7)
Quality of housing (SDG 10)
Public parks (SDG 11)
Amount of waste managed (SDG 12)
Bike routes (SDG 13)
…
8. www.kdz.or.at
Example: Modal split
Wikidata: entity, but no data
Nothing in Wikipedia city articles
Manual list in article „Modal share“
https://en.wikipedia.org/wiki/Modal_share
https://de.wikipedia.org/wiki/Modal_Split
9. www.kdz.or.at
Example: Modal split
Data from 2006, 2016 and 2006 in
Nothing in Open Data Portals
Nothing at Statistics Austria
10. www.kdz.or.at
Data ecosystems: central register
Example: inhabitants
Cities register new residents in a central
database (ZMR)
ZMR delivers data quarterly to Statistics Austria
Some cities publish their own statistics on open
data portals (data.gv.at)
data has to be added to Wikidata manually
11. www.kdz.or.at
Possible solution: API, open data
City delivers data to central register
Register offers (part of) the data via an API
Statistics Austria, open data portal and
other users (Wikidata) can access data
Probability: somewhat likely
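How a consumer might read from such a register API can be sketched as follows. The slides do not name a real endpoint, so the URL, the query parameters and the JSON field names here are all assumptions made for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint: no real register API is named in the talk.
REGISTER_API = "https://register.example.org/api/population"


def build_request_url(municipality_code, quarter):
    """Build the request URL for one municipality and one quarter."""
    params = {"municipality": municipality_code, "quarter": quarter}
    return REGISTER_API + "?" + urlencode(params)


def parse_population(payload):
    """Turn the (assumed) JSON response into a small, typed record."""
    record = json.loads(payload)
    return {
        "municipality": record["municipality"],
        "quarter": record["quarter"],
        "inhabitants": int(record["inhabitants"]),
    }


def fetch_population(municipality_code, quarter):
    """Fetch and parse one record; needs network access to a real endpoint."""
    with urlopen(build_request_url(municipality_code, quarter)) as response:
        return parse_population(response.read().decode("utf-8"))
```

The point of the design is that Statistics Austria, an open data portal and a Wikidata bot could all call the same endpoint instead of each receiving a separate delivery.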
12. www.kdz.or.at
Data ecosystems: finances
03/2019 (data from 2018) 10/2019
City agrees on spending report and sends spending data
to regional government
Regional government sends data to Statistics Austria
KDZ buys data from Statistics Austria and publishes
them on www.offenerhaushalt.at
Metadata gets delivered to open data portal (data.gv.at)
13. www.kdz.or.at
Possible solution: publish often
City delivers data several times
Probability:
high for open data (already done)
unlikely for Wikidata: (not going to happen!)
mandatory
voluntary
voluntary (?)
14. www.kdz.or.at
Possible solution:
municipal data infrastructure
Cities operate a data infrastructure with
data input and API/export possibilities
data can be exported to/consumed by many
probability: likely for some data
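The "enter once, export to many consumers" idea can be illustrated with a very small sketch. The rows, city names and indicator names below are invented, and a real municipal infrastructure would read them from its own data store rather than from a constant.

```python
import csv
import io
import json

# Made-up example rows standing in for data entered by city staff.
ROWS = [
    {"city": "Example City A", "year": 2018, "indicator": "public_parks", "value": 12},
    {"city": "Example City B", "year": 2018, "indicator": "public_parks", "value": 7},
]

FIELDS = ["city", "year", "indicator", "value"]


def to_json(rows):
    """Serialise the rows as JSON, e.g. for an API response."""
    return json.dumps(rows, ensure_ascii=False, indent=2)


def to_csv(rows):
    """Serialise the same rows as CSV, e.g. for an open data portal download."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(ROWS))
```

Both exports come from the same rows, so consumers never see diverging copies of the data.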
18. www.kdz.or.at
Result: „Austrian Cities in Figures“
only PDF publication is
planned
~ 75 cities are included
(2098 municipalities)
data from 2018 will be
published in the document
„Cities in Figures 2019“ that
will be available in early 2020
Data will not be available
electronically
No open license
SMW has proven to be a
reliable infrastructure
Easy form-based
data entry
Own publication process
„approve data“, without
wiki know-how
CSV, RDF, JSON export
for reuse of data
Visualisation options
(“result formats”)
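For reuse outside the wiki, data in Semantic MediaWiki can also be queried through the `ask` module of the MediaWiki API. The sketch below only builds the request URL and flattens a response; the wiki URL, the category `City` and the property `Population` are assumptions, not names taken from the project described here.

```python
import json
from urllib.parse import urlencode


def build_ask_url(api_url, condition, printouts):
    """Build an SMW ask request: a condition plus the properties to print."""
    query = condition + "".join("|?" + name for name in printouts)
    params = {"action": "ask", "query": query, "format": "json"}
    return api_url + "?" + urlencode(params)


def rows_from_ask_result(payload):
    """Flatten the 'results' part of an ask response into a list of dicts."""
    results = json.loads(payload)["query"]["results"]
    if not results:
        # With no matches the results container is empty; nothing to flatten.
        return []
    rows = []
    for page_name, entry in results.items():
        row = {"page": page_name}
        # Each printout maps a property name to a list of values.
        for property_name, values in entry["printouts"].items():
            row[property_name] = values
        rows.append(row)
    return rows


url = build_ask_url(
    "https://wiki.example.org/w/api.php",  # hypothetical wiki
    "[[Category:City]]",                   # hypothetical category
    ["Population"],                        # hypothetical property
)
print(url)
```

The flattened rows can then be written out as CSV or JSON, or handed to a script that prepares statements for Wikidata.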
19. www.kdz.or.at
How to improve the situation:
Basic steps
1. Convince public institutions to open up data
you need: open license, machine-readable
format & infrastructure
2. Publish metadata on open data portals
infrastructure is there, DCAT-AP
3. Put (some of) their data into Wikidata – as
an additional use case
20. www.kdz.or.at
How to improve the situation:
Things to consider
If they need to build an open infrastructure,
there are 4 options:
1. Open up existing infrastructure (API)
2. Change to different infrastructure
3. Donate to open infrastructure (Wikidata,
Commons)
4. Build up open infrastructure
21. www.kdz.or.at
Options?
Main use case: runs
Wikidata
Complex installation
User interface for
entering and querying
data too complex
Where provenance
metadata is essential
For integration with
Wikibase
Main use case: manage
data in your own wiki
Simple editing using
forms
Editing restricted to
specific user groups
Managing text and data
https://www.semantic-mediawiki.org/wiki/Help:Reference_and_provenance_data
22. www.kdz.or.at
More by Denny Vrandecic
https://twitter.com/vrandezo/status/1149012495497138178
23. www.kdz.or.at
Summary
Current data ecosystems are very complex,
cumbersome and old-fashioned and
therefore slow
There is not enough open data available
Let's work together on bringing
the best data out of public
institutions so everyone can use
it as easily as possible!