Memento Tracer
An Innovative Approach Towards Balancing
Scale and Fidelity for Web Archiving
Presentation at RESAW The Web That Was
Amsterdam, NL, June 20 2019
1. Memento Tracer
@mart1nkle1n @hvdsomp
The web that was, Amsterdam, NL, June 20 2019
Martin Klein
Los Alamos National Laboratory
@mart1nkle1n
Herbert Van de Sompel
DANS
@hvdsomp
Background: Scholarly Orphans Project
The Scholarly Orphans project
is funded by the Andrew W. Mellon Foundation
Scholarly Orphans Team
• Los Alamos National Laboratory:
• Lyudmila Balakireva
• Martin Klein
• James Powell
• Harihar Shankar
• Herbert Van de Sompel (now at DANS)
• Old Dominion University:
• Sawood Alam
• Grant Atkins (now at Mitre)
• Shawn Jones
• Mat Kelly
• Michael L. Nelson
Research and Research Communication on the Web
• Consideration
• Researchers deposit artifacts in web platforms
• Status quo: not systematically archived
• No frameworks like LOCKSS/Portico exist for these artifacts
• Researchers only selectively deposit artifacts in portals that provide archival guarantees, in order to obtain a citable DOI
• Can’t expect researchers to (also) upload all artifacts to IRs
• Web archives only incidentally archive these artifacts, cf. anecdotal and Hiberlink project evidence
Emma Schymanski
https://orcid.org/0000-0001-6868-8145
https://github.com/schymane
https://www.slideshare.net/EmmaSchymanski
https://figshare.com/authors/Emma_Schymanski/5087039
https://publons.com/author/1538491/emma-schymanski#profile
https://www.eawag.ch/en/aboutus/portrait/organisation/staff/profile/emma-schymanski/
Emma’s SlideShare Artifact: 0 Mementos
https://www.slideshare.net/EmmaSchymanski/dmcm2018-community-resources-connecting-chemistry-and-toxicity-knowledge
http://timetravel.mementoweb.org/
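A Memento count like the one above can be reproduced by reading a TimeMap, the list of known captures for a URI defined by the Memento protocol. The following is a minimal sketch, assuming the Time Travel service exposes link-format TimeMaps under the `/timemap/link/` prefix; the sample TimeMap, the `example.org` URIs, and the function names are made up for illustration and are not real archive data.

```python
import re

# Assumption: link-format TimeMap endpoint of the Time Travel service.
TIMEMAP_PREFIX = "http://timetravel.mementoweb.org/timemap/link/"


def timemap_uri(original_uri: str) -> str:
    """Build the TimeMap URI for an original resource."""
    return TIMEMAP_PREFIX + original_uri


# One link-format entry looks like: <target>; param="value"; ...
ENTRY_RE = re.compile(r'<([^>]*)>\s*;([^<]*)')
REL_RE = re.compile(r'rel="([^"]*)"')


def count_mementos(timemap_text: str) -> int:
    """Count entries whose rel attribute includes the 'memento' relation."""
    count = 0
    for entry in ENTRY_RE.finditer(timemap_text):
        rel = REL_RE.search(entry.group(2))
        # rel may hold several relation types, e.g. "first last memento".
        if rel and "memento" in rel.group(1).split():
            count += 1
    return count


# Made-up TimeMap for a placeholder URI (hypothetical, not real archive data).
SAMPLE_TIMEMAP = (
    '<http://example.org/artifact>; rel="original",\n'
    '<http://archive.example.org/timemap/http://example.org/artifact>; '
    'rel="self"; type="application/link-format",\n'
    '<http://archive.example.org/20190101000000/http://example.org/artifact>; '
    'rel="first last memento"; datetime="Tue, 01 Jan 2019 00:00:00 GMT"\n'
)

if __name__ == "__main__":
    print(timemap_uri("http://example.org/artifact"))
    print(count_mementos(SAMPLE_TIMEMAP))
```

The sketch only parses text it is given; fetching the TimeMap over HTTP is left out so the example runs without network access.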
Shawn Jones
https://orcid.org/0000-0002-4372-870X
http://www.shawnmjones.org/
https://github.com/shawnmjones
https://www.slideshare.net/shawnmjones
https://en.wikipedia.org/wiki/User:Shawnmjones
https://www.blogger.com/profile/17827543974149663194
Shawn’s GitHub Artifact: 1 Memento
https://github.com/shawnmjones/mediawiki
https://web.archive.org/web/*/https://github.com/shawnmjones/mediawiki
Hiberlink Evidence
Web resources referenced in Elsevier corpus (1996-2012)
without representative Memento in public web archives
Martin Klein, Herbert Van de Sompel, et al. (2014) Scholarly context not found. In: PLOS ONE
https://doi.org/10.1371/journal.pone.0115253
Scholarly Orphans Project
How to faithfully capture Scholarly Orphans
for long-term archiving?
Web Archiving: Scale!
https://twitter.com/brewster_kahle/status/1016003169589981184
Web Archiving: Scale!!
https://twitter.com/brewster_kahle/status/1118172506777509890
Web Archiving: Scale!!!
https://twitter.com/brewster_kahle/status/1139700494748663809
Web Archiving: Fidelity?
https://ws-dl.blogspot.com/2017/01/2017-01-20-cnncom-has-been-unarchivable.html
http://web.archive.org/web/*/http://cnn.com
Web Archiving: Fidelity!!
https://twitter.com/ianmilligan1/status/1136703505442324481
https://twitter.com/MellonFdn/status/1138811967060267011
Web Archiving: Scale?
https://twitter.com/mart1nkle1n/status/1136705116738904067
Resource Boundary
https://www.slideshare.net/hvdsomp/paul-evan-peters-lecture
Memento Tracer Framework
http://tracer.mementoweb.org
Inspired by:
• LOCKSS
• Same automated approach for resources of a class
• Webrecorder
• Manual recording of web resources
• Various attempts aimed at automating interactions/behaviors
• E.g., Brozzler, Browsertrix
Current Memento Tracer Capabilities
• Single clicks/links
• All links in an area
• Repeated clicks on links, with a stop condition
• Slides
• Pagination
• Nested traces, i.e., “trace in a trace”
• Trace for portal A, follow link to portal B, execute trace for portal B
• Identification of the page/portal for which a trace exists by URI (pattern)
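Two of these capabilities, selecting a trace by URI pattern and repeating a click until a stop condition holds, can be illustrated with a short simulation. This is a sketch under stated assumptions, not the Memento Tracer trace format or its capture code: the `Trace` fields, the `slides.example.org` URIs, and the `fake_follow` function standing in for a browser click are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Trace:
    """Hypothetical trace: one set of instructions for a class of pages."""
    name: str
    uri_pattern: str      # regular expression identifying the page class
    next_selector: str    # element to click repeatedly (illustrative only)
    max_clicks: int = 100  # safety limit on repeated clicks


def select_trace(uri: str, traces: List[Trace]) -> Optional[Trace]:
    """Return the first trace whose URI pattern matches the whole URI."""
    for trace in traces:
        if re.fullmatch(trace.uri_pattern, uri):
            return trace
    return None


def run_trace(start_uri: str, trace: Trace,
              follow: Callable[[str, str], Optional[str]]) -> List[str]:
    """Click 'next' repeatedly; stop when no link is left, when a URI
    repeats, or when max_clicks is reached. The returned list is the
    set of URIs inside the resource boundary."""
    captured = [start_uri]
    current = start_uri
    for _ in range(trace.max_clicks):
        nxt = follow(current, trace.next_selector)
        if nxt is None or nxt in captured:
            break  # stop condition
        captured.append(nxt)
        current = nxt
    return captured


# Simulated three-slide deck: each URI maps to what "next" leads to.
FAKE_DECK: Dict[str, Optional[str]] = {
    "https://slides.example.org/alice/talk": "https://slides.example.org/alice/talk/2",
    "https://slides.example.org/alice/talk/2": "https://slides.example.org/alice/talk/3",
    "https://slides.example.org/alice/talk/3": None,
}


def fake_follow(uri: str, selector: str) -> Optional[str]:
    """Stand-in for a browser click on `selector`; ignores the selector."""
    return FAKE_DECK.get(uri)


TRACES = [
    Trace(name="slide-deck",
          uri_pattern=r"https://slides\.example\.org/[^/]+/[^/]+",
          next_selector="a.next-slide"),
]

if __name__ == "__main__":
    start = "https://slides.example.org/alice/talk"
    trace = select_trace(start, TRACES)
    if trace is not None:
        for uri in run_trace(start, trace, fake_follow):
            print(uri)
```

Because the pattern describes a class of pages rather than one page, the same `Trace` object would be selected for any other deck on the simulated portal, which is the reuse idea behind recording a trace once.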
Memento Tracer Benefits
• Scalability
• Trace created once is applicable to all web resources of
the same class
• Traces shared via repository (edits, versioning)
• Quality
• Trace used as set of instructions for browser-based
capture framework
• Resource boundary explicit
• Tradeoff
• Quality vs performance
Memento Tracer Challenges
• Memento Tracer:
• Language used to express Traces (interoperability)
• Organization of the shared repository for Traces
• Limitations of the browser event listener approach for recording
Traces
• Selection of a Trace for capturing a web publication by means other than URI pattern
• Legal constraints
myresearch.institute - Pilot
For more details and statistics, see our 2019 CNI Spring meeting slides:
https://www.slideshare.net/martinklein0815/an-institutional-perspective-to-rescue-scholarly-orphans