Overview of the problems of Reference Rot and what actions to take to ensure the persistence of the digital scholarly record. Presented by Peter Burnhill with Adam Rusbridge & Muriel Mewissen, EDINA, University of Edinburgh, UK; Herbert Van De Sompel, Los Alamos National Laboratory Research Library, USA; Gaelle Bequet, ISSN International Centre, France; at Towards Open Science, LIBER, London, June 2015.
Actions to Ensure the Integrity and Continuity of the Scholarly Record
1. Actions to Ensure the Integrity and
Continuity of the Scholarly Record
Peter Burnhill
with Adam Rusbridge & Muriel Mewissen, EDINA, University of Edinburgh, UK
Herbert Van De Sompel, Los Alamos National Laboratory Research Library, USA
Gaelle Bequet, ISSN International Centre, France
09:40 – 10:00
Towards Open Science, LIBER, London, June 2015
2. Overview
1. The Scholarly (& Cultural) Record
2. Threat to the Continuity of the Scholarly (& Cultural) Record
3. Threat to the Integrity of our Scholarly Record
4. Ensuring the Integrity & Continuity of the Scholarly Record
5. Actions to Ensure the Integrity & Continuity of the Scholarly Record
– Keywords: Stewardship, Collection, Cooperation, Advocacy, Spend
3. “The Scholarly Record has a fuzzy edge”
1. The (digital) Scholarly Record
‘e-journals’
‘book-length work’
conference proceedings
‘data as findings’
New ‘research objects’
4. “The Scholarly Record has a fuzzy edge” (resources needed for scholarship)
1. The (digital) Scholarly Record
‘e-journals’
‘book-length work’
conference proceedings
‘data as findings’
New ‘research objects’
Websites, Databases, Repositories
‘Gov Docs’
‘e-magazines’
‘e-newsmedia’
5. Online Continuing Resources (ISSN)
Focus on what is published and that content that is issued online as a ‘continuing resource’
Issued in Parts (Serials)
Content changes over time (Integrating)
‘The (published) Scholarly Record’:
‘e-journals’
‘Book-length work’
Conference proceedings
New ‘research objects’
‘resources needed for scholarship’:
Websites, Databases, Repositories
‘Gov Docs’
‘e-magazines’
‘e-newsmedia’
6. Our Shared Task is to ensure researchers, students & their teachers have ease and continuing access to online resources needed for open scholarship
licence to use
access to content & tools
2. Threat to Continuity of the Scholarly (& Cultural) Record
7. what was once available in print, on-shelf locally … is now online & accessed remotely, ‘anytime/anywhere’
We’ve seen improved Ease of Access…
But what of Continuity of Access?
(this is mostly due to publishers)
8. Digital back copy is not in the custody of libraries
Picture credit: http://somanybooksblog.com/2009/03/27/library-tour/
9. Digital back copy is not in the custody of libraries
Libraries boast of ‘e-collections’,
but do they only have ‘e-connections’?
10. access
to content & services
We need to have some digital shelving for the Record
11. Ensuring
ease and continuing access
to the digital back copy
12. Emergence of Keepers of digital content
① Web-scale not-for-profit archiving agencies:
② National libraries …
③ Research libraries: consortia & specialist centres …
National Science Library, Chinese Academy of Sciences
16. + Content Drift: What is at the end of the URI has changed, or gone!
http://dl00.org as seen in 2000, 2004, 2005 and 2008
(a) Dynamic content, as values on a webpage change over time
(b) Static content, but very different (often unrelated) web pages
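The distinction between a link that no longer resolves and a link whose content has changed can be sketched in code. This is only an illustration under stated assumptions: the function name, the use of an exact SHA-256 comparison and the three labels are hypothetical choices made here, not the method used in the Hiberlink study. An exact digest comparison will also flag harmless changes to dynamic content, so a real analysis would need a more tolerant similarity measure.

```python
import hashlib
from typing import Optional


def classify_reference(http_status: int,
                       digest_when_cited: str,
                       current_body: Optional[bytes]) -> str:
    """Label a referenced URI as 'link rot', 'content drift' or 'intact'.

    Hypothetical helper: the digest of the page as it was when cited is
    assumed to have been recorded by whoever made the citation.
    """
    # Nothing comes back from the URI: the link itself has rotted.
    if http_status >= 400 or current_body is None:
        return "link rot"
    # Something comes back, but it is not what was cited.
    if hashlib.sha256(current_body).hexdigest() != digest_when_cited:
        return "content drift"
    return "intact"


cited_page = b"<html>Digital Libraries 2000 conference</html>"
cited_digest = hashlib.sha256(cited_page).hexdigest()

print(classify_reference(200, cited_digest, cited_page))
print(classify_reference(404, cited_digest, None))
print(classify_reference(200, cited_digest, b"<html>Unrelated page</html>"))
```

The three calls cover an unchanged page, a missing page and a page replaced by unrelated content, which corresponds to case (b) above.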
17. http://hiberlink.org/
Project: 2 years, March 2013 to June 2015
Funder: Andrew W. Mellon Foundation
Partners:
University of Edinburgh
EDINA: Peter Burnhill, Muriel Mewissen, Richard Wincewicz, Paul Walk, Tim Stickland, [Christine Rees]
Language Technology Group, Informatics: Claire Grover, Beatrice Alex, Richard Tobin, Colin Matheson, [Ke “Adam” Zhou]
Los Alamos National Laboratory Research Library: Herbert Van de Sompel, Harihar Shankar, [Martin Klein, Rob Sanderson]
19. References in Web-Based Scholarly Communication
To Scholarly Resources:
Link Rot: DOI, HTTP version of DOI
Content Decay: Has ‘fixity’
Archiving: CLOCKSS, Portico, LOCKSS, etc, as per Keepers Registry …
To Web at Large Resources (Focus for Hiberlink):
Link Rot: ‘Web today, gone tomorrow’
Content Decay: How to add fixity to the dynamic
20. Findings: Status of Referenced URIs, PMC corpus
Klein M, Van de Sompel H, Sanderson R, Shankar H, Balakireva L, et al. (2014) Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. PLoS ONE 9(12): e115253. doi:10.1371/journal.pone.0115253
6 publicly accessible web archives for lookup: Internet Archive, archive.is (archive.today), Archive-It, BL Web Archive, UK National Archives Web Archive & Icelandic National Archive
21. Findings: Status of Referenced URIs, Elsevier corpus
Klein M, Van de Sompel H, Sanderson R, Shankar H, Balakireva L, et al. (2014) Scholarly Context Not Found: One in Five Articles Suffers from Reference Rot. PLoS ONE 9(12): e115253. doi:10.1371/journal.pone.0115253
6 publicly accessible web archives for lookup: Internet Archive, archive.is (archive.today), Archive-It, BL Web Archive, UK National Archives Web Archive & Icelandic National Archive
22. … what is then a rotten article!
… sale of rotten goods undermines the integrity of the scholarly record - especially for open scholarship
23. References in Web-Based Scholarly Communication
To Scholarly Resources:
Link Rot: DOI, HTTP version of DOI
Content Decay: Has ‘fixity’
Archiving: CLOCKSS, Portico, LOCKSS, etc, as per Keepers Registry …
There are issues here too; how happy should we be?
To Web at Large Resources:
Link Rot: ‘Web today, gone tomorrow’
Content Decay: How to add fixity to the dynamic
The diverse world of web-archiving: How to enable ‘pro-active’ archiving of what is regarded as important
“Think Hiberlink”
24. 4. Ensuring the Integrity & Continuity of
the Scholarly Record
Good that the likes of CLOCKSS & Portico, the BL
& KB (Netherlands) and some others are doing
something …
But to what extent is the scholarly record still at
risk of loss?
How can we know?
NB: Check out David Rosenthal’s blog posts ….
25. All praise to those who have stepped forward to act as digital shelves!
① Web-scale not-for-profit archiving agencies:
② National libraries …
③ Research libraries: consortia & specialist centres …
National Science Library, Chinese Academy of Sciences
26. Having many archiving organisations is a Good Thing
“Digital information is best preserved by replicating it at
multiple archives run by autonomous organizations”
B. Cooper and H. Garcia-Molina (2002)
As with Magna Carta,
lots of copies …
27. E-J Preservation Registry Service
How to know who is looking after what & how? (and uncover what is still at risk)
ISSN Register: METADATA on extant e-journals
Digital Preservation Agencies (e.g. CLOCKSS, Portico; BL, KB; UK LOCKSS Alliance etc.): METADATA on preservation action
E-Journal Preservation Registry
SERVICES: user requirements
ISSN Register at heart of the Data Model; ISSN-L as kernel field
(Taken from Figure 1 in reference paper in Serials, March 2009)
Piloting an E-journal Preservation Registry Service
29. Two Key Performance Indicators (KPIs)
‘Ingest Ratio’ = titles ingested by one or more Keeper / ‘online serials’ in ISSN Register
= 28,103 / 165,949 [as of June 2015]
=> 17%
‘KeepSafe Ratio’ = titles being ingested by 3+ Keepers / ‘online serials’ in ISSN Register
= 9,836 / 165,949
=> 6%
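The two ratios above are simple proportions, and a short sketch makes the arithmetic explicit. The figures are the ones quoted on the slide. The function names and the toy table of keeper counts are hypothetical, introduced here only to show how the numerators could be derived from per-title data.

```python
def percent(numerator: int, denominator: int) -> int:
    """Return numerator / denominator as a whole-number percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100 * numerator / denominator)


def count_titles(keepers_per_title: dict, min_keepers: int) -> int:
    """Count titles held by at least `min_keepers` archiving agencies."""
    return sum(1 for n in keepers_per_title.values() if n >= min_keepers)


# Figures quoted on the slide (June 2015).
ONLINE_SERIALS_IN_ISSN_REGISTER = 165_949
INGESTED_BY_ONE_OR_MORE_KEEPER = 28_103
INGESTED_BY_THREE_OR_MORE_KEEPERS = 9_836

ingest_ratio = percent(INGESTED_BY_ONE_OR_MORE_KEEPER,
                       ONLINE_SERIALS_IN_ISSN_REGISTER)
keepsafe_ratio = percent(INGESTED_BY_THREE_OR_MORE_KEEPERS,
                         ONLINE_SERIALS_IN_ISSN_REGISTER)
print(f"Ingest Ratio = {ingest_ratio}%, KeepSafe Ratio = {keepsafe_ratio}%")

# Toy data (made up): number of Keepers holding each title.
toy = {"title-a": 0, "title-b": 1, "title-c": 3, "title-d": 4}
toy_ingest = percent(count_titles(toy, min_keepers=1), len(toy))
toy_keepsafe = percent(count_titles(toy, min_keepers=3), len(toy))
print(f"Toy Ingest Ratio = {toy_ingest}%, toy KeepSafe Ratio = {toy_keepsafe}%")
```

Rounding to whole percentages mirrors the slide. Keeping the threshold as a parameter lets the same count serve both indicators.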
30. Archival Status of e-Serials Requested
with usage logs for the UK OpenURL Router*
• 8.5m full text requests in UK during 2012
=> 53,311 online titles requested
Analysis in 2013:
‘Ingest Ratio’ = 32% (16,985/53,311)
=> over two thirds, 68% (36,326 titles), held by none!
* As reported in Keepers Registry Blog, OpenURL Router passes ‘discovery’ requests to commercial OpenURL resolver services; developed & delivered by EDINA as part of Jisc support for UK universities & colleges
31. Archival Status of Requested e-Serials: Update
with usage logs for the UK OpenURL Router*
• 8.5m full text requests in UK during 2012
=> 53,311 online titles requested
Analysis carried out again in 2015:
‘Ingest Ratio’ = 36% (19,231/53,311); up by 2,246 titles (4 percentage points)
=> but still, 64% (34,080 titles) held by none!
‘KeepSafe Ratio’ = 20% (10,847/53,311); up by 2,985 titles (5 percentage points)
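The change in the Ingest Ratio between the 2013 and 2015 analyses can be checked with a few lines of arithmetic. The sketch below uses only the counts quoted on the two slides. The function and variable names are hypothetical, and the change is computed as a difference between rounded percentages, which is an assumption about how the slide figure was reached.

```python
REQUESTED_TITLES = 53_311  # online titles requested via the UK OpenURL Router in 2012

# Titles ingested by one or more Keeper, per year of analysis (from the slides).
ingested = {2013: 16_985, 2015: 19_231}


def summarise(ingested_titles: int, requested: int) -> dict:
    """Return the Ingest Ratio and the share of titles held by no Keeper."""
    held_by_none = requested - ingested_titles
    return {
        "ingest_pct": round(100 * ingested_titles / requested),
        "held_by_none": held_by_none,
        "held_by_none_pct": round(100 * held_by_none / requested),
    }


report = {year: summarise(n, REQUESTED_TITLES) for year, n in ingested.items()}
extra_titles = ingested[2015] - ingested[2013]
# Difference of the rounded percentages, expressed in percentage points.
point_change = report[2015]["ingest_pct"] - report[2013]["ingest_pct"]

for year, row in sorted(report.items()):
    print(year, row)
print(f"up by {extra_titles} titles ({point_change} percentage points)")
```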
32. Known Archival Status of Online Continuing Resources
assigned ISSN, by Country, June 2015
33. Known Archival Status of Online Continuing Resources
assigned ISSN, by Country, June 2015
If it’s being kept safe then tell the Keepers Registry
34. Known Archival Status of Online Continuing Resources
assigned ISSN, by Country, June 2015
If it’s being kept safe then tell the Keepers Registry
Researchers (and therefore libraries) in any one
country are dependent upon content written and
published as serials in countries other than their own
35. BIG publishers act early but incompletely
Priority: find economic way to archive content from very many ‘at risk’ e-journals from many (small & not so small) publishers
36. 5a. Actions to Ensure the Integrity & Continuity of
the Scholarly Record
What should be done?
Accept responsibility for stewardship of collections
1. Use the Keepers Registry
2. Commit financial support for web-scale agencies, such
as CLOCKSS & Portico: invest 1%
3. Contribute your collection development expertise
4. Tell publishers, archiving agencies & national library
5. Consider options for collaborative action as LIBER
6. Avoid the 2020 Vision where you get the blame!
37. 1. Use the Keepers Registry to check the archival status of the journals that are of key importance to you
New Service: [just launched this week] Title List Comparison
• Upload list of ISSN & titles
• Receive back report on what is being archived & what is not
Register now for Member Services: http://thekeepers.org
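The idea behind a title list comparison can be sketched as a lookup of each ISSN against a table of archiving agencies. This is not the Keepers Registry service or its interface: the function name, the ISSNs, the titles and the lookup table below are all made up for illustration, and only the agency names come from the slides.

```python
from typing import Dict, Iterable, List, Set, Tuple


def compare_title_list(title_list: Iterable[Tuple[str, str]],
                       keepers_by_issn: Dict[str, Set[str]]) -> Dict[str, List[tuple]]:
    """Split (ISSN, title) pairs into archived and not-archived titles."""
    report: Dict[str, List[tuple]] = {"archived": [], "not_archived": []}
    for issn, title in title_list:
        keepers = sorted(keepers_by_issn.get(issn, set()))
        if keepers:
            report["archived"].append((issn, title, keepers))
        else:
            report["not_archived"].append((issn, title))
    return report


# Hypothetical library title list and hypothetical registry extract.
my_titles = [
    ("0000-0001", "Journal of Example Studies"),
    ("0000-0002", "Example Letters"),
    ("0000-0003", "Annals of Placeholder Research"),
]
registry_extract = {
    "0000-0001": {"CLOCKSS", "Portico"},
    "0000-0003": {"Portico"},
}

report = compare_title_list(my_titles, registry_extract)
for issn, title, keepers in report["archived"]:
    print(f"archived:     {issn} {title} ({', '.join(keepers)})")
for issn, title in report["not_archived"]:
    print(f"NOT archived: {issn} {title}")
```

Listing the agencies per title, rather than a yes/no flag, also shows which titles depend on a single Keeper.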
38. 5b. Actions to Ensure the Integrity & Continuity of
the Scholarly Record
• What should be done?
– Accept responsibility for stewardship of collections
– Think Hiberlink: what about Reference Rot?
1. Good News is that there is Remedy (coming out of R&D)
• to create ‘snapshots’ of referenced content
– to store in web archives
• to include in the citations:
Original URI
Snapshot URI [obtained from a web archive]
Date/Time of snapshot
2. Role for research librarians is to alert publishers, editors and authors and support new initiatives: HiberActive infrastructure
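The three elements proposed for a citation (original URI, snapshot URI, date/time of snapshot) can be held together as one small record. The sketch below is an assumption about how that might look: the class name, the output format and the snapshot URI (on the reserved example.org domain) are hypothetical and do not come from the Hiberlink tools.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class RobustReference:
    """A web reference carrying the three elements listed above."""
    original_uri: str
    snapshot_uri: str            # obtained from a web archive
    snapshot_datetime: datetime  # must be timezone-aware

    def as_citation(self) -> str:
        """Render the reference as one line of citation text (format is illustrative)."""
        stamp = self.snapshot_datetime.astimezone(timezone.utc).strftime(
            "%Y-%m-%d %H:%M:%S UTC")
        return f"{self.original_uri} [snapshot: {self.snapshot_uri}, taken {stamp}]"


ref = RobustReference(
    original_uri="http://hiberlink.org/",
    # Hypothetical snapshot location, for illustration only.
    snapshot_uri="https://archive.example.org/20150601120000/http://hiberlink.org/",
    snapshot_datetime=datetime(2015, 6, 1, 12, 0, 0, tzinfo=timezone.utc),
)
print(ref.as_citation())
```

Keeping the original URI next to the snapshot means the reference still points at the live resource and does not depend on a single archive.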
39. 1. Hiberlink Remedy To Avoid Reference Rot
Help authors do the right thing via a reference manager (e.g. EndNote, Reference Manager, Zotero, Mendeley)
① archiving of referenced web content when noted
② use Datetime URI for archived content in the citation
Hiberlink Plug-in developed for Zotero
Help editors & publishers do the right thing, having parsed the document to extract URIs
① archiving of referenced web content [having author check]
② use Datetime URI for archived content in the citation
Hiberlink Plug-in developed for OJS
42. 2. HiberActive Service Demonstrator
Martin Klein et al. (2014) HiberActive: Pro-Active Archiving of web references from scholarly articles
Open Repositories 2014 http://www.slideshare.net/martinklein0815/hiberactive
43. Cite with Robust References
• For Open Science, the Scholarly Record Must Include:
Original URI, as it was at the time of reading
Snapshot URI, to revisit the content that was noted
Date/Time that Snapshot was taken of what was noted
Date/Time & Original URI enable access to snapshots created near the Date/Time in any web archive around the world using Memento infrastructure
(2015) Robust Links - Motivation
http://robustlinks.mementoweb.org/about/
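In the Memento protocol (RFC 7089), a client asks a TimeGate for a snapshot near a given moment by sending the original URI together with an `Accept-Datetime` header. The sketch below only builds that header and sends nothing over the network. The function name and the example date are hypothetical, and which TimeGate to query is left open.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(when: datetime) -> dict:
    """Build the Accept-Datetime request header for a Memento TimeGate.

    RFC 7089 expects an HTTP-date expressed in GMT, so the datetime
    must be timezone-aware and is converted to UTC first.
    """
    if when.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    return {"Accept-Datetime": format_datetime(when.astimezone(timezone.utc),
                                               usegmt=True)}


# Example date/time of reading (made up for illustration).
noted_at = datetime(2015, 6, 25, 9, 40, 0, tzinfo=timezone.utc)
headers = accept_datetime_header(noted_at)
print(headers)
```

The resulting dictionary can be passed as request headers to any HTTP client when asking a TimeGate about the original URI.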
45. Thanks for listening …
1. The Scholarly (& Cultural) Record
2. Threat to the Continuity of the Scholarly (& Cultural) Record
3. Threat to the Integrity of our Scholarly Record
4. Ensuring the Integrity & Continuity of the Scholarly Record
5. Actions to Ensure the Integrity & Continuity of the Scholarly Record
Keywords: Stewardship, Collection, Cooperation, Advocacy, Spend
hiberlink.org thekeepers.org edina.ac.uk