2. OVERVIEW
Wikidata is a critical AI asset in many applications
Recent project of Wikimedia (2012), edited collaboratively
Our research assesses the quality of Wikidata and the link between community processes and quality
5. THE KNOWLEDGE GRAPH
STATEMENTS, ITEMS, PROPERTIES
Item identifiers start with a Q, property identifiers start with a P
[Diagram: London (Q84) —head of government (P6)→ Sadiq Khan (Q334155)]
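The Q/P naming convention above can be checked mechanically; a minimal Python sketch (the labels are hardcoded from the slide for illustration, not fetched from Wikidata):

```python
import re

# Wikidata convention: items are "Q" + digits, properties are "P" + digits.
ITEM_ID = re.compile(r"^Q\d+$")
PROPERTY_ID = re.compile(r"^P\d+$")

# Illustrative labels taken from the example above.
labels = {"Q84": "London", "Q334155": "Sadiq Khan", "P6": "head of government"}

def is_item(identifier: str) -> bool:
    return bool(ITEM_ID.match(identifier))

def is_property(identifier: str) -> bool:
    return bool(PROPERTY_ID.match(identifier))

# A statement connects a subject item to an object item via a property.
statement = ("Q84", "P6", "Q334155")
assert is_item(statement[0]) and is_property(statement[1]) and is_item(statement[2])
```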
6. THE KNOWLEDGE GRAPH
ITEMS CAN BE CLASSES, ENTITIES, VALUES
[Diagram: city (Q515) as a class; Ada Lovelace (Q7259), London (Q84), Amsterdam (Q727) and Sadiq Khan (Q334155) as entities; male (Q6581097), Labour Party (Q59360) and United Kingdom (Q145) as values; P6 = head of government]
7. THE KNOWLEDGE GRAPH
ADDING CONTEXT TO STATEMENTS
Statements may include context
Qualifiers (optional)
References (required)
Two types of references
Internal, linking to another item
External, linking to webpage
[Diagram: London (Q84) —head of government (P6)→ Sadiq Khan (Q334155); qualifier: 9 May 2016; reference: https://www.london.gov.uk/...]
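The context model above (qualifiers optional, references required) can be sketched as a nested record; the shape below is a hypothetical simplification for illustration and loosely mirrors, rather than reproduces, Wikidata's actual JSON serialisation:

```python
# A simplified statement record: subject, property, value, plus context.
statement = {
    "subject": "Q84",        # London
    "property": "P6",        # head of government
    "value": "Q334155",      # Sadiq Khan
    "qualifiers": {          # optional extra context
        "start time": "9 May 2016",
    },
    "references": [          # required; internal (item) or external (URL)
        {"type": "external", "url": "https://www.london.gov.uk/..."},
    ],
}

def reference_types(stmt):
    """Return the set of reference types attached to a statement."""
    return {ref["type"] for ref in stmt["references"]}
```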
8. THE KNOWLEDGE GRAPH
CO-EDITED BY BOTS AND HUMANS
Human editors can register or work anonymously
Bots created by community for routine tasks
9. OUR WORK
Influence of community make-up on outcomes
Effects of editing practice on outcomes
Data quality, as a function of its provenance
10. THE RIGHT MIX OF USERS
Piscopo, A., Phethean, C., & Simperl, E. (2017). What Makes a Good Collaborative Knowledge Graph: Group Composition and Quality in Wikidata. International Conference on Social Informatics, 305-322, Springer.
11. BACKGROUND
Wikidata editors have varied tenure and interests
Group composition impacts outcomes
Diversity can have multiple effects
Moderate tenure diversity increases outcome quality
Interest diversity leads to increased group productivity
Chen, J., Ren, Y., & Riedl, J. (2010). The effects of diversity on group productivity and member withdrawal in online volunteer groups. In Proceedings of the 28th International Conference on Human Factors in Computing Systems - CHI '10, p. 821. ACM Press, New York, USA.
12. OUR STUDY
Analysed the edit history of items
Used corpus of 5,000 items whose quality has been manually assessed (5 levels)*
Edit history focused on community make-up
Community is defined as set of editors of item
Considered features from the group diversity literature and Wikidata-specific aspects
*https://www.wikidata.org/wiki/Wikidata:Item_quality
14. DATA AND METHODS
Ordinal regression analysis, four models were trained
Dependent variable: quality level (5 ordinal levels) of the 5,000 labelled Wikidata items
Independent variables
Proportion of bot edits
Bot-human edit ratio
Proportion of anonymous edits
Tenure diversity: Coefficient of variation
Interest diversity: User editing matrix
Control variables: group size, item age
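Tenure diversity as a coefficient of variation, listed among the independent variables above, can be sketched as follows (population standard deviation assumed here; the study's exact formula may differ):

```python
from statistics import mean, pstdev

def tenure_diversity(tenures):
    """Coefficient of variation of editors' tenure (e.g. in days):
    population standard deviation divided by the mean."""
    m = mean(tenures)
    return pstdev(tenures) / m if m else 0.0

# Hypothetical group of three editors with tenures of 10, 20 and 30 days.
cv = tenure_diversity([10, 20, 30])
```

A higher value means the item's editor group mixes newcomers and veterans more strongly.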
18. LIMITATIONS AND FUTURE WORK
▪ Measures of quality over time required
▪ Sample vs. the whole of Wikidata (most items rated C or lower)
▪ Other group features (e.g., coordination) not considered
▪ No distinction between editing activities (e.g., schema vs. instances, topics, etc.)
▪ Different metrics of interest (topics, type of activity)
19. THE DATA IS AS GOOD AS ITS REFERENCES
Piscopo, A., Kaffee, L. A., Phethean, C., & Simperl, E. (2017). Provenance Information in a Collaborative Knowledge Graph: an Evaluation of Wikidata External References. International Semantic Web Conference, 542-558, Springer.
20. PROVENANCE IN WIKIDATA
Statements may include context
Qualifiers (optional)
References (required)
Two types of references
Internal, linking to another item
External, linking to webpage
[Diagram: London (Q84) —head of government (P6)→ Sadiq Khan (Q334155); qualifier: 9 May 2016; reference: https://www.london.gov.uk/...]
21. THE ROLE OF PROVENANCE
Wikidata aims to become a hub of references
Data provenance increases trust in Wikidata
Lack of provenance hinders data reuse
Quality of references is as yet unknown
Hartig, O. (2009). Provenance Information in the Web of Data. LDOW, 538.
22. OUR STUDY
Approach to evaluate the quality of external references in Wikidata
Quality is defined by the Wikidata verifiability policy
Relevant: support the statement they are attached to
Authoritative: trustworthy, up-to-date, and free of bias for supporting a particular statement
Large-scale (the whole of Wikidata)
Bot- vs. human-contributed references
23. RESEARCH QUESTIONS
RQ1 Are Wikidata external references relevant?
RQ2 Are Wikidata external references authoritative?
▪ I.e., do they match the author and publisher types from the Wikidata policy?
RQ3 Can we automatically detect non-relevant and non-authoritative references?
24. METHODS
TWO STAGE MIXED APPROACH
1. Microtask crowdsourcing (RQ1, RQ2)
▪ Evaluate relevance & authoritativeness of a reference sample
▪ Create training set for machine learning model
2. Machine learning (RQ3)
▪ Large-scale reference quality prediction
25. STAGE 1: MICROTASK CROWDSOURCING
▪3 tasks on Crowdflower
▪5 workers/task, majority voting
▪Test questions to select workers
Feature                  Microtask  Description
Relevance (RQ1)          T1         Does the reference support the statement?
Authoritativeness (RQ2)  T2         Choose author type from list
                         T3.A       Choose publisher type from list
                         T3.B       Verify publisher type, then choose sub-type from list
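The aggregation step above (5 workers per microtask, majority voting) can be sketched as:

```python
from collections import Counter

def majority_vote(judgements):
    """Aggregate worker judgements for one microtask.
    Returns the most frequent label; ties are broken arbitrarily here
    (a real pipeline would flag ties for re-adjudication)."""
    return Counter(judgements).most_common(1)[0][0]

# Hypothetical T1 judgements: does the reference support the statement?
verdict = majority_vote(["yes", "yes", "no", "yes", "no"])
```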
26. STAGE 2: MACHINE LEARNING
Compared three algorithms
Naïve Bayes, Random Forest, SVM
Features based on Lehmann et al. (2012) and Potthast et al. (2008)
Baseline: item labels matching (relevance); deprecated domains list (authoritativeness)
(RQ3)
Features
▪ URL the reference uses
▪ Source HTTP code
▪ Statement item vector
▪ Statement object vector
▪ Author activity
▪ Author activity on references
▪ Subject parent class
▪ Property parent class
▪ Object parent class
▪ Author type
27. DATA
1.6M external references (6% of total)
1.4M from two sources (protein KBs)
83,215 English-language references
Sample of 2,586 (99% confidence, 2.5% margin of error)
885 assessed automatically, e.g., links not working or CSV files
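The sample size above is consistent with the standard formula for estimating a proportion, with a finite population correction; a sketch (the z value and rounding are assumptions, so this reproduces the reported 2,586 only approximately):

```python
import math

def sample_size(population, z=2.576, margin=0.025, p=0.5):
    """Sample size for estimating a proportion at a given confidence level
    (z = 2.576 for 99%), with finite population correction."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / population))

# 83,215 English-language references, 99% confidence, 2.5% margin of error.
n = sample_size(83_215)
```

With these constants the formula gives roughly 2,570; the reported 2,586 presumably reflects slightly different rounding or z value.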
28. RESULTS: CROWDSOURCING
CROWDSOURCING WORKS
▪Trusted workers: >80% accuracy
▪95% of responses from T3.A confirmed in T3.B
Task  No. of microtasks  Total workers  Trusted workers  Workers' accuracy  Fleiss' k
T1    1701 references    457            218              75%                0.335
T2    1178 links         749            322              75%                0.534
T3.A  335 web domains    322            60               66%                0.435
T3.B  335 web domains    239            116              68%                0.391
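The Fleiss' k column above measures inter-rater agreement beyond chance; a minimal sketch of the computation, where each row is one microtask, each column an answer category, and cells count how many of the n workers chose that category:

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a matrix of category counts.
    counts[i][j] = number of raters assigning item i to category j;
    every row must sum to the same number of raters n."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    # Observed per-item agreement P_i, averaged into P-bar.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Expected agreement P_e from the category marginals.
    totals = [sum(row[j] for row in counts) for j in range(len(counts[0]))]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)
```

Values near 0.3-0.5, as in the table, indicate fair-to-moderate agreement.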
29. RESULTS: CROWDSOURCING
MAJORITY OF REFERENCES ARE HIGH QUALITY
2,586 references evaluated
Found 1,674 valid references from 345 domains
Broken URLs deemed not relevant and not authoritative
(RQ1, RQ2)
31. RESULTS: CROWDSOURCING
DATA FROM GOVT. AND ACADEMIA
Most common author type (T2)
Organisation (78%)
Most common publisher types (T3)
Governmental agencies (37%)
Academic organisations (24%)
(RQ2)
32. RESULTS: MACHINE LEARNING
RANDOM FORESTS PERFORM BEST
                   F1    MCC
Relevance
  Baseline         0.84  0.68
  Naïve Bayes      0.90  0.86
  Random Forest    0.92  0.89
  SVM              0.91  0.87
Authoritativeness
  Baseline         0.53  0.16
  Naïve Bayes      0.86  0.78
  Random Forest    0.89  0.83
  SVM              0.89  0.79
(RQ3)
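MCC (the Matthews correlation coefficient), reported alongside F1 in the table above, can be computed from a binary confusion matrix; a minimal sketch:

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier:
    -1 is total disagreement, 0 is chance level, +1 is perfect."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def f1(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```

Unlike F1, MCC also rewards true negatives, which is why the weak authoritativeness baseline scores 0.53 F1 but only 0.16 MCC.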
33. LESSONS LEARNED
Crowdsourcing+ML works!
Many external sources are high quality
Bad references mainly non-working links; continuous control required
Lack of diversity in bot-added sources
Humans and bots are good at different things
34. LIMITATIONS AND FUTURE WORK
Studies with non-English sources
New approach for internal references
Deployment in Wikidata, including changes in editing behaviour
35. THE COST OF FREEDOM: ON THE ROLE OF PROPERTY CONSTRAINTS IN WIKIDATA
36. BACKGROUND
Wikidata is built by the community, from scratch
Editors are free to carry out any kind of edit
There is tension between editing freedom and quality of the modelling
Property constraints have been introduced at a later stage
Currently 18 constraints, but they are not enforced
Hall, A., McRoberts, S., Thebault-Spieker, J., Lin, Y., Sen, S., Hecht, B., & Terveen, L. (2017, May). Freedom versus standardization: structured data generation in a peer production community. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (pp. 6352-6362). ACM.
37. OUR STUDY
Effects of property constraints on
Content quality, i.e., increasing user awareness of property use
Diversity of expression
Editor behaviour, by increasing conflict level
38. THE COST OF FREEDOM: CLAIMS
▪ Several claims can be expressed for a statement, thanks to qualifiers and references
[Diagram: London (Q84) —head of government (P6)→ Sadiq Khan (Q334155), qualifier 9 May 2016, reference https://www.london.gov.uk/…; and Boris Johnson (Q180589), qualifier 4 May 2008, reference https://www.london.gov.uk/…]
41. METHODS
H1: Linear trend analysis of C_violations
H2 and H3: Lagged, multiple regression models to predict changes between T_n and T_n-1 in KD_score and C_score
42. RESULTS
H1 was supported, but limited to some constraints
12 constraints out of 18 showed significant variations along the time frame observed
Constraint with the largest variation was type (i.e., property domain)
43. RESULTS
H2 was rejected, but more property constraints at the beginning of a time frame led to decreased knowledge diversity
45. LIMITATIONS
Wikidata still in early state of development
Metrics need further refinement
Changes were made to constraints after our analysis, which could produce new effects
46. LESSONS LEARNED
Editors seem to understand the meaning of property constraints
Low level of knowledge diversity and conflict overall
Non-enforcement of constraints seems to have only limited effect on community dynamics
Effects of when and how constraints are introduced not explored yet
48. SUMMARY OF FINDINGS
Collaboration between humans and bots is important
Tools needed to identify tasks for bots and continuously study their effects on outcomes and community
References are high quality, though biases exist in terms of choice of sources
Wikidata's approach to knowledge engineering questions existing theoretical and empirical literature