2. OVERVIEW
Wikidata is a critical AI asset in many applications
Recent project of Wikimedia (2012), edited collaboratively
Our research assesses the quality of Wikidata and the link between community processes and quality
5. THE KNOWLEDGE GRAPH
STATEMENTS, ITEMS, PROPERTIES
Item identifiers start with a Q, property identifiers start with a P
[Diagram: London (Q84) —head of government (P6)→ Sadiq Khan (Q334155)]
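The Q/P naming convention above can be checked mechanically; a minimal Python sketch (the labels are hardcoded from the slide for illustration, not fetched from Wikidata):

```python
import re

# Wikidata convention: items are "Q" + digits, properties are "P" + digits.
ITEM_ID = re.compile(r"^Q\d+$")
PROPERTY_ID = re.compile(r"^P\d+$")

# Illustrative labels taken from the example above.
labels = {"Q84": "London", "Q334155": "Sadiq Khan", "P6": "head of government"}

def is_item(identifier: str) -> bool:
    return bool(ITEM_ID.match(identifier))

def is_property(identifier: str) -> bool:
    return bool(PROPERTY_ID.match(identifier))

# A statement connects a subject item to an object item via a property.
statement = ("Q84", "P6", "Q334155")
assert is_item(statement[0]) and is_property(statement[1]) and is_item(statement[2])
```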
6. THE KNOWLEDGE GRAPH
ITEMS CAN BE CLASSES, ENTITIES, VALUES
[Diagram: city (Q515) as a class; Ada Lovelace (Q7259), London (Q84), Amsterdam (Q727) and Sadiq Khan (Q334155) as entities; male (Q6581097), Labour Party (Q59360) and United Kingdom (Q145) as values; P6 = head of government]
7. THE KNOWLEDGE GRAPH
ADDING CONTEXT TO STATEMENTS
Statements may include context
Qualifiers (optional)
References (required)
Two types of references
Internal, linking to another item
External, linking to webpage
[Diagram: London (Q84) —head of government (P6)→ Sadiq Khan (Q334155); qualifier: 9 May 2016; reference: https://www.london.gov.uk/...]
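The context model above (qualifiers optional, references required) can be sketched as a nested record; the shape below is a hypothetical simplification for illustration and loosely mirrors, rather than reproduces, Wikidata's actual JSON serialisation:

```python
# A simplified statement record: subject, property, value, plus context.
statement = {
    "subject": "Q84",        # London
    "property": "P6",        # head of government
    "value": "Q334155",      # Sadiq Khan
    "qualifiers": {          # optional extra context
        "start time": "9 May 2016",
    },
    "references": [          # required; internal (item) or external (URL)
        {"type": "external", "url": "https://www.london.gov.uk/..."},
    ],
}

def reference_types(stmt):
    """Return the set of reference types attached to a statement."""
    return {ref["type"] for ref in stmt["references"]}
```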
8. THE KNOWLEDGE GRAPH
CO-EDITED BY BOTS AND HUMANS
Human editors can register or work anonymously
Bots created by community for routine tasks
9. OUR WORK
Influence of community make-up on outcomes
Effects of editing practice on outcomes
Data quality, as a function of its provenance
10. THE RIGHT MIX OF USERS
Piscopo, A., Phethean, C., & Simperl, E. (2017). What Makes a Good Collaborative Knowledge Graph: Group Composition and Quality in Wikidata. International Conference on Social Informatics, 305-322, Springer.
11. BACKGROUND
Wikidata editors have varied tenure and interests
Group composition impacts outcomes
Diversity can have multiple effects
Moderate tenure diversity increases outcome quality
Interest diversity leads to increased group productivity
Chen, J., Ren, Y., & Riedl, J. (2010). The effects of diversity on group productivity and member withdrawal in online volunteer groups. In Proceedings of the 28th International Conference on Human Factors in Computing Systems - CHI '10, p. 821. ACM Press, New York, USA.
12. OUR STUDY
Analysed the edit history of items
Used corpus of 5,000 items whose quality has been manually assessed (5 levels)*
Edit history focused on community make-up
Community is defined as set of editors of item
Considered features from the group diversity literature and Wikidata-specific aspects
*https://www.wikidata.org/wiki/Wikidata:Item_quality
14. DATA AND METHODS
Ordinal regression analysis, four models were trained
Dependent variable: quality level (5 ordinal levels) of the 5,000 labelled Wikidata items
Independent variables
Proportion of bot edits
Bot-human edit ratio
Proportion of anonymous edits
Tenure diversity: Coefficient of variation
Interest diversity: User editing matrix
Control variables: group size, item age
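Tenure diversity as a coefficient of variation, listed among the independent variables above, can be sketched as follows (population standard deviation assumed here; the study's exact formula may differ):

```python
from statistics import mean, pstdev

def tenure_diversity(tenures):
    """Coefficient of variation of editors' tenure (e.g. in days):
    population standard deviation divided by the mean."""
    m = mean(tenures)
    return pstdev(tenures) / m if m else 0.0

# Hypothetical group of three editors with tenures of 10, 20 and 30 days.
cv = tenure_diversity([10, 20, 30])
```

A higher value means the item's editor group mixes newcomers and veterans more strongly.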
18. LIMITATIONS AND FUTURE WORK
▪ Measures of quality over time required
▪ Sample vs. the whole of Wikidata (most items rated C or lower)
▪ Other group features (e.g., coordination) not considered
▪ No distinction between editing activities (e.g., schema vs. instances, topics, etc.)
▪ Different metrics of interest (topics, type of activity)
19. THE DATA IS AS GOOD AS ITS REFERENCES
Piscopo, A., Kaffee, L. A., Phethean, C., & Simperl, E. (2017). Provenance Information in a Collaborative Knowledge Graph: an Evaluation of Wikidata External References. International Semantic Web Conference, 542-558, Springer.
20. PROVENANCE IN WIKIDATA
Statements may include context
Qualifiers (optional)
References (required)
Two types of references
Internal, linking to another item
External, linking to webpage
[Diagram: London (Q84) —head of government (P6)→ Sadiq Khan (Q334155); qualifier: 9 May 2016; reference: https://www.london.gov.uk/...]
21. THE ROLE OF PROVENANCE
Wikidata aims to become a hub of references
Data provenance increases trust in Wikidata
Lack of provenance hinders data reuse
Quality of references is as yet unknown
Hartig, O. (2009). Provenance Information in the Web of Data. LDOW, 538.
22. OUR STUDY
Approach to evaluate the quality of external references in Wikidata
Quality is defined by the Wikidata verifiability policy
Relevant: support the statement they are attached to
Authoritative: trustworthy, up-to-date, and free of bias for supporting a particular statement
Large-scale (the whole of Wikidata)
Bot- vs. human-contributed references
23. RESEARCH QUESTIONS
RQ1 Are Wikidata external references relevant?
RQ2 Are Wikidata external references authoritative?
▪ I.e., do they match the author and publisher types from the Wikidata policy?
RQ3 Can we automatically detect non-relevant and non-authoritative references?
24. METHODS
TWO STAGE MIXED APPROACH
1. Microtask crowdsourcing (RQ1, RQ2)
▪ Evaluate relevance & authoritativeness of a reference sample
▪ Create training set for machine learning model
2. Machine learning (RQ3)
▪ Large-scale reference quality prediction
25. STAGE 1: MICROTASK CROWDSOURCING
▪3 tasks on Crowdflower
▪5 workers/task, majority voting
▪Test questions to select workers
Feature                  Microtask  Description
Relevance (RQ1)          T1         Does the reference support the statement?
Authoritativeness (RQ2)  T2         Choose author type from list
                         T3.A       Choose publisher type from list
                         T3.B       Verify publisher type, then choose sub-type from list
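The aggregation step above (5 workers per microtask, majority voting) can be sketched as:

```python
from collections import Counter

def majority_vote(judgements):
    """Aggregate worker judgements for one microtask.
    Returns the most frequent label; ties are broken arbitrarily here
    (a real pipeline would flag ties for re-adjudication)."""
    return Counter(judgements).most_common(1)[0][0]

# Hypothetical T1 judgements: does the reference support the statement?
verdict = majority_vote(["yes", "yes", "no", "yes", "no"])
```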
26. STAGE 2: MACHINE LEARNING
Compared three algorithms
Naïve Bayes, Random Forest, SVM
Features based on Lehmann et al. (2012) and Potthast et al. (2008)
Baseline: item labels matching (relevance); deprecated domains list (authoritativeness)
(RQ3)
Features
▪ URL the reference uses
▪ Source HTTP code
▪ Statement item vector
▪ Statement object vector
▪ Author activity
▪ Author activity on references
▪ Subject parent class
▪ Property parent class
▪ Object parent class
▪ Author type
27. DATA
1.6M external references (6% of total)
1.4M from two sources (protein KBs)
83,215 English-language references
Sample of 2,586 (99% confidence, 2.5% margin of error)
885 assessed automatically, e.g., links not working or CSV files
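The sample size above is consistent with the standard formula for estimating a proportion, with a finite population correction; a sketch (the z value and rounding are assumptions, so this reproduces the reported 2,586 only approximately):

```python
import math

def sample_size(population, z=2.576, margin=0.025, p=0.5):
    """Sample size for estimating a proportion at a given confidence level
    (z = 2.576 for 99%), with finite population correction."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    return math.ceil(n0 / (1 + (n0 - 1) / population))

# 83,215 English-language references, 99% confidence, 2.5% margin of error.
n = sample_size(83_215)
```

With these constants the formula gives roughly 2,570; the reported 2,586 presumably reflects slightly different rounding or z value.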
28. RESULTS: CROWDSOURCING
CROWDSOURCING WORKS
▪Trusted workers: >80% accuracy
▪95% of responses from T3.A confirmed in T3.B
Task  No. of microtasks  Total workers  Trusted workers  Workers' accuracy  Fleiss' k
T1    1701 references    457            218              75%                0.335
T2    1178 links         749            322              75%                0.534
T3.A  335 web domains    322            60               66%                0.435
T3.B  335 web domains    239            116              68%                0.391
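The Fleiss' k column above measures inter-rater agreement beyond chance; a minimal sketch of the computation, where each row is one microtask, each column an answer category, and cells count how many of the n workers chose that category:

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a matrix of category counts.
    counts[i][j] = number of raters assigning item i to category j;
    every row must sum to the same number of raters n."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    # Observed per-item agreement P_i, averaged into P-bar.
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items
    # Expected agreement P_e from the category marginals.
    totals = [sum(row[j] for row in counts) for j in range(len(counts[0]))]
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals)
    return (p_bar - p_e) / (1 - p_e)
```

Values near 0.3-0.5, as in the table, indicate fair-to-moderate agreement.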
29. RESULTS: CROWDSOURCING
MAJORITY OF REFERENCES ARE HIGH QUALITY
2,586 references evaluated
Found 1,674 valid references from 345 domains
Broken URLs deemed not relevant and not authoritative
(RQ1, RQ2)
31. RESULTS: CROWDSOURCING
DATA FROM GOVT. AND ACADEMIA
Most common author type (T2)
Organisation (78%)
Most common publisher types (T3)
Governmental agencies (37%)
Academic organisations (24%)
(RQ2)
32. RESULTS: MACHINE LEARNING
RANDOM FORESTS PERFORM BEST
                   F1    MCC
Relevance
  Baseline         0.84  0.68
  Naïve Bayes      0.90  0.86
  Random Forest    0.92  0.89
  SVM              0.91  0.87
Authoritativeness
  Baseline         0.53  0.16
  Naïve Bayes      0.86  0.78
  Random Forest    0.89  0.83
  SVM              0.89  0.79
(RQ3)
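MCC (the Matthews correlation coefficient), reported alongside F1 in the table above, can be computed from a binary confusion matrix; a minimal sketch:

```python
import math

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier:
    -1 is total disagreement, 0 is chance level, +1 is perfect."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def f1(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```

Unlike F1, MCC also rewards true negatives, which is why the weak authoritativeness baseline scores 0.53 F1 but only 0.16 MCC.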
33. LESSONS LEARNED
Crowdsourcing+ML works!
Many external sources are high quality
Bad references mainly non-working links; continuous control required
Lack of diversity in bot-added sources
Humans and bots are good at different things
34. LIMITATIONS AND FUTURE WORK
Studies with non-English sources
New approach for internal references
Deployment in Wikidata, including changes in editing behaviour
35. THE COST OF FREEDOM: ON THE ROLE OF PROPERTY CONSTRAINTS IN WIKIDATA
36. BACKGROUND
Wikidata is built by the community, from scratch
Editors are free to carry out any kind of edit
There is tension between editing freedom and quality of the modelling
Property constraints have been introduced at a later stage
Currently 18 constraints, but they are not enforced
Hall, A., McRoberts, S., Thebault-Spieker, J., Lin, Y., Sen, S., Hecht, B., & Terveen, L. (2017, May). Freedom versus standardization: structured data generation in a peer production community. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (pp. 6352-6362). ACM.
37. OUR STUDY
Effects of property constraints on
Content quality, i.e., increasing user awareness of property use
Diversity of expression
Editor behaviour, by increasing conflict level
38. THE COST OF FREEDOM: CLAIMS
▪ Several claims can be expressed for a statement, thanks to qualifiers and references
[Diagram: London (Q84) —head of government (P6)→ Sadiq Khan (Q334155), qualifier 9 May 2016, reference https://www.london.gov.uk/…; and Boris Johnson (Q180589), qualifier 4 May 2008, reference https://www.london.gov.uk/…]
41. METHODS
H1: Linear trend analysis of C_violations
H2 and H3: Lagged, multiple regression models to predict changes between T_n and T_n-1 in KD_score and C_score
42. RESULTS
H1 was supported, but limited to some constraints
12 constraints out of 18 showed significant variations along the time frame observed
Constraint with the largest variation was type (i.e., property domain)
43. RESULTS
H2 was rejected, but more property constraints at the beginning of a time frame led to decreased knowledge diversity
45. LIMITATIONS
Wikidata still in early state of development
Metrics need further refinement
Changes were made to constraints after our analysis, which could produce new effects
46. LESSONS LEARNED
Editors seem to understand the meaning of property constraints
Low level of knowledge diversity and conflict overall
Non-enforcement of constraints seems to have only limited effect on community dynamics
Effects of when and how constraints are introduced not explored yet
48. SUMMARY OF FINDINGS
Collaboration between humans and bots is important
Tools needed to identify tasks for bots and continuously study their effects on outcomes and community
References are high quality, though biases exist in terms of choice of sources
Wikidata's approach to knowledge engineering questions existing theoretical and empirical literature