A lecture given at the Moore Institute at the National University of Ireland Galway. It lays out the case for archiving the web as a source for future scholarly enquiry; examines the state of play of web archiving in Ireland; outlines the broad use cases for the archived web; and presents results from research into creationism on the web in the UK and in Ireland.
Prospects and pitfalls in using web archives for research
1. A new class of primary source?
Prospects and pitfalls in using web archives for research
Dr Peter Webster
Webster Research and Consulting
@pj_webster
6. The web its own archive?
Open UK Web Archive 2004-13 comparison.
@anjacks0n http://britishlibrary.typepad.co.uk/webarchive/2014/10/what-is-still-on-the-web-after-10-years-of-archiving-.html
10. Reasons to care about web archiving
• education and research
• enforcement of the law
• public accountability
11. Three archives for the UK
Archive                 Temporal scope  Content scope            Access
Open UKWA               2004-present    Selective (14.7k)        Online
Legal Deposit UKWA      2013-present    Comprehensive (for UK)   Onsite
JISC UK Domain Dataset  1996-2013       Comprehensive (for .uk)  Index only
12. JISC UK Web Domain Dataset (1996-2013)
• copy of Internet Archive holdings for .uk
• bought by JISC, held by British Library
• 60TB of data
• no direct access to content
• prototype search at webarchive.org.uk/shine
• derived datasets in public domain
13. Web archives for NI and RoI
Archive             Temporal scope  Content scope            Access
NLI Web Archive     2011-present    Selective (542)          Online
PRONI Web Archive   2010-present    Selective (115)          Online
Legal Deposit UKWA  2013-present    Comprehensive (for UK!)  Onsite (TCD)
14. Ways to use the archived web
• URL search -> single page
• Full-text search -> single page
• Visualisation -> trend -> page
18. Ways to use the archived web
• URL search -> single page
• Full-text search -> single page
• Visualisation -> trend -> page
• Direct access to WARC
• Derived datasets
• API access
19. Derived datasets from the BL
From JISC UK Web Domain Dataset (1996-2010)
• File format profile
• Geo-index
• Crawled URL Index (CDX)
• Host Link Graph
Public domain at data.webarchive.org.uk
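To give a sense of what working with a crawled URL index involves, here is a minimal sketch in Python. It assumes a CDX file that opens with the conventional legend line (for example " CDX N b a m s k r M S V g") followed by one whitespace-separated record per capture; the actual layout of the British Library files should be checked against their documentation. The field names, the function names and the sample records are all invented for illustration.

```python
from collections import Counter
from urllib.parse import urlsplit

# Readable names for the common CDX legend letters (names chosen here for illustration).
FIELD_NAMES = {
    "N": "urlkey", "b": "timestamp", "a": "original", "m": "mimetype",
    "s": "status", "k": "digest", "r": "redirect", "M": "meta",
    "S": "length", "V": "offset", "g": "filename",
}

def parse_cdx(lines):
    """Yield one dict per capture, using the legend line to name the fields."""
    names = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "CDX":  # legend line: the letters describe the columns
            names = [FIELD_NAMES.get(letter, letter) for letter in fields[1:]]
            continue
        if names is None or len(fields) != len(names):
            continue  # skip records we cannot interpret
        yield dict(zip(names, fields))

def captures_per_year(records, host):
    """Count captures per year for a single host."""
    counts = Counter()
    for record in records:
        if urlsplit(record.get("original", "")).hostname == host:
            counts[record["timestamp"][:4]] += 1
    return counts

# Invented sample records, not taken from the real dataset.
sample = [
    " CDX N b a m s k r M S V g",
    "example.co.uk/ 20040101120000 http://www.example.co.uk/ text/html 200 AAAA - - 1234 0 sample.arc.gz",
    "example.co.uk/about 20040601120000 http://www.example.co.uk/about text/html 200 BBBB - - 2345 1234 sample.arc.gz",
    "example.co.uk/ 20100101120000 http://www.example.co.uk/ text/html 200 CCCC - - 1111 0 sample2.arc.gz",
    "other.example.co.uk/ 20100101120000 http://other.example.co.uk/ text/html 200 DDDD - - 999 1111 sample2.arc.gz",
]
counts = captures_per_year(parse_cdx(sample), "www.example.co.uk")
print(dict(counts))
```

Reading the column order from the legend line, rather than hard-coding it, keeps the sketch usable if a file has a different set of fields.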
20. Creationism?
• non-evolutionary account of human origins
• modern
• a long history
• a feature of some parts of evangelicalism
• (anti-evolutionism, Intelligent Design)
21. The creationist web: three questions
A justified conspiracy theory about marginalisation of creationist voices?
A real danger or a moral panic (Truth in Science)?
The web as friend of the marginalised opinion?
http://peterwebster.me/2014/11/18/reading-creationism-in-the-web-archive/
23. Approach
• selection of key UK creationist sites
• extraction of all unique inbound referring hosts for 1996-2010
• inspection and classification
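The extraction step can be sketched as follows. This is a minimal illustration, not the code used in the research: it assumes a host link graph serialised one edge per line as "year|source_host|target_host", then a tab, then a link count, which may not match the published file exactly. The function names and every host in the sample are hypothetical.

```python
from collections import defaultdict

def inbound_hosts(lines, targets, first_year=1996, last_year=2010):
    """Collect, per target host and year, the set of distinct hosts linking in."""
    result = defaultdict(lambda: defaultdict(set))
    for line in lines:
        # Assumed line layout: "year|source|target<TAB>count"
        key, _, _count = line.strip().partition("\t")
        parts = key.split("|")
        if len(parts) != 3 or not parts[0].isdigit():
            continue  # skip blank or malformed lines
        year, source, target = int(parts[0]), parts[1], parts[2]
        if not first_year <= year <= last_year:
            continue
        if target in targets and source != target:  # ignore self-links
            result[target][year].add(source)
    return result

def unique_inbound(result, target):
    """Union of referring hosts for one target across all years."""
    hosts = set()
    for year_hosts in result.get(target, {}).values():
        hosts |= year_hosts
    return hosts

# Invented sample edges, not real data.
sample = [
    "2007|church.example|creationist.example\t3",
    "2007|school.example|creationist.example\t1",
    "2010|church.example|creationist.example\t2",
    "2010|creationist.example|creationist.example\t40",
    "2011|late.example|creationist.example\t1",
    "2007|church.example|other.example\t5",
]
graph = inbound_hosts(sample, {"creationist.example"})
referrers = unique_inbound(graph, "creationist.example")
print(sorted(referrers))
print({year: len(hosts) for year, hosts in sorted(graph["creationist.example"].items())})
```

The link count on each line is deliberately ignored: the sketch counts distinct referring hosts, not the number of linking resources per host. The resulting list of hosts is what would then be inspected and classified by hand.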
24. Caveats on method
• partial nature of the dataset
• benchmarking of absolute numbers
• selective sample
• what does a link mean, anyway?
• not looking at number of linking resources per host
25. Truth in Science: how significant?
• only 46 unique inbound hosts
• … of which many were other creationists or secularist sites
• two churches, one school
• fewer in 2010 than 2007
27. Conclusions
• a utopian dream unfulfilled
• a genuine moral panic
• a justified conspiracy theory
28. Next steps (1)
1. NI the 'creationism capital of Europe'? (Analysis of:
• links from GB organisations to NI creationists
• links from NI to RoW)
2. What about creationism in .ie?
29. Next steps (2)
Project: EU National Web Spheres
• part of resaw.eu
• investigating the nature of a national web domain
• … including the interlinking between them
• case study I: Anglican & Presbyterian churches in Ireland, north and south
30. Web Archives for Historians
@HistWebArchives, http://webarchivehistorians.org/