Wikipedia for Researchers talk, as given at the British Library.
The first part covers Wikipedia as a resource for researchers, looking at how it works, how to judge the reliability of content, and how to use Wikipedia as a starting point to access other resources.
The second part looks at how Wikipedia is used by researchers as a subject or a corpus, and gives an overview of the kinds of research being done on Wikipedia.
2. About Wikipedia & Wikimedia
Wikimedia
Movement and charitable body
80,000 contributors in 280 languages and eleven core projects
Image repository, dictionary, news site…
…read by 7% of the world!
Wikipedia
19,000,000 articles, 4,000,000 in English
6,500 articles and 235,000 edits per day
(…and ten years ago, this was all fields…)
3. …so what is Wikipedia?
…an encyclopedia
…written neutrally and verifiably
…using previously published information
…free to use, distribute, or reuse
…a collaborative community
…with no firm rules
4. Internal processes
All edits are visible through watchlists and page histories
About 7% of edits are vandalism or otherwise malicious; there are processes to detect these
Median time to correction < 2 minutes… but some stay much longer
Individual discussion pages for all articles – “talk”
Quality review and assessment process
Specialised “wikiproject” working groups and central noticeboards
e.g. content topics; style; dispute resolution; copyright; etc.
5. Quality of Wikipedia
On average… it’s not bad
In 2005, an average of four errors per article, versus three in Britannica
In 2011, in English, Spanish & Arabic:
“…the Wikipedia articles in this sample scored higher overall than the comparison articles with respect to accuracy, references, style/readability and overall judgment…”
Millions of articles – so many are, individually, problematic
Various ways of identifying “signs” of quality
Markers for quality are both obvious and subtle
Very effective “springboard” tool
6. Looking for quality
Corner icons
- article locked down in some way
- featured or “good” quality
Problem tags
Article talk pages and histories
Style
Badly written or formatted articles = often neglected
8. Moving on to other content
Other languages – not translations, and may have more content
Mousing over footnote markers
Within the references:
Links through DOIs and other identifiers
ISBNs go to a special landing page
…and then out to libraries, booksellers, etc
ISSNs go to WorldCat
If the article is about an author, look for authority control links
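The identifier links described on the slide above can be sketched as simple URL builders. This is only an illustration: the function names are hypothetical, the URL patterns (doi.org for DOIs, Special:BookSources for ISBNs, a WorldCat ISSN path) are assumptions that may have changed since the talk was given, and the identifiers in the usage lines are example values only.

```python
from urllib.parse import quote


def doi_url(doi: str) -> str:
    """Build a resolver link for a DOI (assumes the public doi.org resolver)."""
    return "https://doi.org/" + quote(doi.strip(), safe="/")


def isbn_landing_url(isbn: str, wiki: str = "en.wikipedia.org") -> str:
    """Build a link to the wiki's Special:BookSources landing page for an ISBN."""
    # Keep only digits and the X check character; drop hyphens and spaces.
    cleaned = "".join(ch for ch in isbn if ch.isdigit() or ch in "xX")
    return f"https://{wiki}/wiki/Special:BookSources/{cleaned}"


def issn_worldcat_url(issn: str) -> str:
    """Build a WorldCat link for an ISSN (URL pattern is an assumption)."""
    return "https://www.worldcat.org/issn/" + issn.strip()


if __name__ == "__main__":
    # Example identifiers, used purely for illustration.
    print(doi_url("10.1000/182"))
    print(isbn_landing_url("978-3-16-148410-0"))
    print(issn_worldcat_url("0028-0836"))
```

From the landing page a reader can then move on to library catalogues and booksellers, as the slide describes.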
9. Preferences
Available to logged in users
Two particularly useful options:
New window for external links (Gadgets > Browsing)
Quality assessment in headers (Gadgets > Appearance)
Many others - mostly editor-oriented tools
10. Looking for sets of material
Some tools available – http://www.toolserver.org
Complex to use, but rewarding
CatScan: look for intersection of categories
“all physicists born in 1912” – 51 in English, 34 in German
Full dumps of all data available – http://dumps.wikipedia.org
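A category intersection of the kind CatScan performs can be approximated with the public MediaWiki API. The sketch below is not CatScan itself: it uses the `categorymembers` list query, looks only at direct members of each category (CatScan can also descend into subcategories), and the helper names and User-Agent string are hypothetical. The category names in the usage lines are assumed from the slide's example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://en.wikipedia.org/w/api.php"


def category_query_url(category, cont=None, api=API):
    """Build a MediaWiki API URL listing article-namespace members of a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmnamespace": "0",   # articles only
        "cmlimit": "500",
        "format": "json",
    }
    if cont:
        params["cmcontinue"] = cont  # continuation token from the previous page
    return api + "?" + urlencode(params)


def fetch_members(category, api=API):
    """Fetch all direct member titles of a category, following continuation."""
    titles, cont = set(), None
    while True:
        req = Request(category_query_url(category, cont, api),
                      headers={"User-Agent": "category-intersection-sketch/0.1"})
        with urlopen(req) as resp:
            data = json.load(resp)
        titles.update(m["title"] for m in data["query"]["categorymembers"])
        cont = data.get("continue", {}).get("cmcontinue")
        if not cont:
            return titles


def intersect(*member_sets):
    """Return the titles present in every set (empty set if none are given)."""
    return set.intersection(*member_sets) if member_sets else set()


if __name__ == "__main__":
    # Needs network access; category names are assumptions based on the slide.
    both = intersect(fetch_members("Physicists"), fetch_members("1912 births"))
    print(len(both), "articles in both categories")
```

Because only direct members are compared, the counts will generally differ from the CatScan figures quoted on the slide; for large-scale work the full data dumps are the more suitable route.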
11. Research about Wikipedia
Thriving research around Wikipedia community & content
By mid-2011, 2,100 peer-reviewed articles and 38 PhD theses
Active research committee and WMF support
Regular report - http://meta.wikimedia.org/wiki/Research:Newsletter
also @wikiresearch
Major themes include:
Community and content creation
Reading and researching by users
Quality of content
Technical research
12. Research on communities
Research on the Wikipedia communities:
Dynamics of community conflict, discussions, collaboration, voting, contribution, mentoring…
Demographics, motivation and specialisms of contributors
Patterns of growth and content creation/deletion
Effect of central programs on volunteer activity
Cross-cultural interaction
13. Research on users
Research on usage of Wikipedia:
Specific searching behaviour
Patterns of usage (yearly, daily)
Tracking external events (e.g. swine flu) through Wikipedia
Search engine rankings
Change in usage by students
Effect of Wikipedia publication on wider literature
14. Research on content
Research on the content of Wikipedia:
Evolution of content
Accuracy, coverage and quality
Biases – geographic, cultural, gender
Linguistic analysis
Visualisations of content
Effect of external publications on Wikipedia
15. Research on technical aspects
Research on the technical side of Wikipedia:
Extensive work on scaling open-content services
Tools for detecting and handling vandalism
Algorithmic detection and identification of bias, spam
Practical research on uses of wikis
16. Research example – visualising art history
http://commons.wikimedia.org/wiki/File:Wikiarthistory.png
17. Research example – visualising editing patterns
http://commons.wikimedia.org/wiki/File:WikiTrip_egyptian_revolution_screenshot.png
18. Research example – editor activity
http://commons.wikimedia.org/wiki/File:Effect_of_barnstars_on_productivity.png