1. Update on Memento
http://www.mementoweb.org/
Herbert Van de Sompel
Robert Sanderson
Michael L. Nelson
This research funded by
the Library of Congress
Towards Seamless Navigation
of the Web of the Past
Memento Update
2011 IIPC General Assembly, Den Hague 1
2. Overview of Memento Framework
Deployment Progress
Memento and Discovery
Memento and Branding
Alternative Web Archiving Strategies
4. Memento wants to make it easy
to navigate the Web of the Past.
5. Tate Online
[Figure: Tate Online as it appears today; selecting the date March 16 2008 leads to Tate Online as of March 16 2008, from National Archives]
6. Versions: Web vs CMS
World Wide Web
• Designed to forget about prior versions of a resource
• Highly distributed
• No standard version mechanisms
• Standardized interlinking mechanisms
Content Management Systems
• Designed to be aware of all versions of a resource
• Self-contained
• Variety of proprietary version mechanisms
• Versions interlinked using proprietary mechanisms
7. Versions are not Integrated
The Web Architecture has a
hard time dealing with the
versions that do exist:
• Cannot talk about a resource
as it used to exist
• Cannot access a prior version
given the current one
• Cannot access the current
version given a prior one
8. Memento Framework
• Regards the Web as a big
Content Management System
• Introduces a uniform
capability to access versions
on the Web
• Does not build new archives
but leverages all systems that
host versions
9. Memento Framework
• Is Distributed: versions may
exist on several servers
• Uses Time as a global
version indicator
• Is based on the primitives of
the Web: resource, resource
state, representation, content
negotiation, link
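Using time as the version indicator together with content negotiation can be sketched as below. This is a minimal illustration, assuming the `Accept-Datetime` request header described in the Memento Internet-Draft; the helper name `accept_datetime_header` is hypothetical, and nothing is sent over the network.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(when: datetime) -> dict[str, str]:
    """Return the request header a client would send to ask for a past version.

    Assumption: the header is named Accept-Datetime and carries an
    HTTP-style date expressed in GMT.
    """
    # Normalise to UTC so the value can be rendered with a "GMT" suffix.
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return {"Accept-Datetime": http_date}


# The date used in the Tate Online example earlier in the deck.
headers = accept_datetime_header(datetime(2008, 3, 16, tzinfo=timezone.utc))
print(headers)
```

A client would attach these headers to an ordinary HTTP GET, so the desired datetime travels alongside the URI of the resource rather than being encoded into a new, archive-specific URI.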
16. Overview of Memento Framework
Deployment Progress
Memento and Discovery
Memento and Branding
Alternative Web Archiving Strategies
17. Significant progress has been made towards
seamless navigation of the Web of the Past.
18. Standardization
• Standardization process started
via the IETF
• Interest from IETF and W3C
• Encouraged by major Web
architects, including: Tim
Berners-Lee, Mark Nottingham,
Michael Hausenblas
https://datatracker.ietf.org/doc/draft-vandesompel-memento/
19. Memento Clients
• Several client tools developed
by us and others
• Add-ons for Firefox
(operational) and Internet
Explorer (experimental)
• Applications for Android
(operational) and iPhone/iPad
(in development)
• Paper in current issue of
Code4Lib Journal
http://www.mementoweb.org/tools/
20. Memento Server Support
• Memento-compliant Wayback
software:
• In use by Internet Archive
• Available to Web archives,
worldwide
• Please experiment with this
new 1.6 version!
http://www.mementoweb.org/tools/
21. Memento Server Support (2)
• Plug-in for MediaWiki
(operational)
• Used on W3C’s main wiki
• Please install it for your
MediaWiki!
http://www.mementoweb.org/tools/
22. Memento Server Validator
• Server-side client:
• Attempts to perform all
Memento actions against a
given URI
• Reports success/failure of
the interactions and
warnings for optional
aspects
• Kept up to date with IETF
Internet Draft
http://www.mementoweb.org/tools/validator/
23. Memento Proxy Support
• Several systems that host
Mementos made Memento-
compliant “by proxy”:
• Many Web Archives that do
not yet run Memento-
compliant software
• 3,000+ MediaWiki systems,
including Wikipedia, Wikia
• We would love all of these to
become natively Memento-
compliant!
24. Memento Web Site
• Ongoing effort to add materials
that support understanding and
adoption:
• Introduction to Memento
• How to recognize
Mementos, TimeGates,
Original Resources?
• Guidelines for servers that
host Mementos (Web
Archives, CMS, snapshot
archives, etc.)
http://www.mementoweb.org/guide/
25. Funding
• 2007-2010: US $250K grant
from Library of Congress
• Approx. $50K on Memento
• 2010-2011: US $1 Million
follow-up grant from Library of
Congress
• For: Specification, outreach,
tool development, further
research
26. Overview of Memento Framework
Deployment Progress
Memento and Discovery
Memento and Branding
Alternative Web Archiving Strategies
27. Very few Web sites provide a “timegate” link.
Need additional mechanisms to support Discovery.
28. Batch Discovery: TimeMaps
A TimeMap minimally lists:
• URI and datetime of Mementos known to an archive
• URI of Original Resource
TimeMaps can be aggregated across systems that host Mementos
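The minimal TimeMap contents and the aggregation across hosts can be sketched as a small data structure. The class and function names below are hypothetical and the URIs are placeholders; this only illustrates the idea and is not a serialization defined by the Memento specification.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TimeMap:
    """Minimal TimeMap: one Original Resource plus the Mementos known for it."""
    original_uri: str
    # Maps Memento URI -> datetime of that Memento.
    mementos: dict[str, datetime] = field(default_factory=dict)


def aggregate(timemaps: list[TimeMap]) -> TimeMap:
    """Merge TimeMaps for the same Original Resource from several hosts."""
    if not timemaps:
        raise ValueError("need at least one TimeMap")
    original = timemaps[0].original_uri
    merged = TimeMap(original)
    for tm in timemaps:
        if tm.original_uri != original:
            raise ValueError("TimeMaps describe different Original Resources")
        merged.mementos.update(tm.mementos)
    return merged


# Two hypothetical archives that each hold one Memento of the same resource.
archive_a = TimeMap("http://example.org/", {
    "http://archive-a.example/2008/http://example.org/":
        datetime(2008, 3, 16, tzinfo=timezone.utc),
})
archive_b = TimeMap("http://example.org/", {
    "http://archive-b.example/2005/http://example.org/":
        datetime(2005, 7, 1, tzinfo=timezone.utc),
})
merged = aggregate([archive_a, archive_b])
# List the combined Mementos in chronological order.
timeline = sorted(merged.mementos.items(), key=lambda item: item[1])
```

Keying the merged map on the Memento URI means that a Memento reported by more than one source is listed only once.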
29. Batch Discovery: Feed of TimeMaps
System that hosts Mementos exposes Feed of TimeMaps to
allow applications to remain in sync with its collection:
• One Atom entry per Original Resource
• The entry links to or includes a TimeMap
• The entry's updated timestamp changes when additional
Mementos become available
• The ID of the entry is a tag URI based on URI of
Original Resource
• Can be protected, and include license information
• Could be anonymized by aggregating service
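One feed entry of the kind listed above could look roughly like the sketch below. The exact tag URI layout, the `related` link relation, the authority `archive.example.org`, and the function name are all assumptions made for illustration; the slides do not specify them.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from urllib.parse import quote

ATOM_NS = "http://www.w3.org/2005/Atom"


def timemap_entry(original_uri: str, timemap_uri: str, last_added: datetime,
                  authority: str = "archive.example.org",
                  tag_date: str = "2011") -> ET.Element:
    """Build one Atom entry describing the TimeMap of one Original Resource."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    # Assumed tag URI layout: tag:<authority>,<date>:<percent-encoded original URI>
    entry_id = f"tag:{authority},{tag_date}:{quote(original_uri, safe='')}"
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = entry_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = f"TimeMap for {original_uri}"
    # The updated value moves forward whenever new Mementos are added.
    stamp = last_added.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = stamp
    # The entry links to the TimeMap rather than embedding it.
    ET.SubElement(entry, f"{{{ATOM_NS}}}link", {"rel": "related", "href": timemap_uri})
    return entry


entry = timemap_entry(
    "http://example.org/",
    "http://archive.example.org/timemap/http://example.org/",
    datetime(2011, 5, 9, 12, 0, tzinfo=timezone.utc),
)
print(ET.tostring(entry, encoding="unicode"))
```

A consumer that remembers the `updated` value it last saw for each entry ID only needs to refetch the TimeMaps whose value has changed.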
30. Batch Discovery: robots.txt
• robots.txt file is used by Web servers to convey
crawling policies
• Add directives to support discovery of TimeGates and
Feeds of TimeMaps
TimeGate: http://dutch.archive.org/timegate/
Archived: .nl
TimeGate: http://all.archive.org/timegate/
Archived: *
TimeMapFeed: http://dutch.archive.org/feed/feed1.xml
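A client could read the proposed directives roughly as sketched below. The parser is hypothetical, and it assumes that each `Archived:` line scopes the most recent `TimeGate:` line above it, which is how the example reads but is not stated explicitly.

```python
def parse_memento_directives(robots_txt: str):
    """Extract the proposed TimeGate / Archived / TimeMapFeed directives."""
    timegates = []  # each item: {"uri": ..., "archived": [scope, ...]}
    feeds = []
    for raw in robots_txt.splitlines():
        line = raw.strip()
        if ":" not in line:
            continue
        # Split on the first colon only, because the values are URIs.
        name, value = line.split(":", 1)
        name, value = name.strip().lower(), value.strip()
        if name == "timegate":
            timegates.append({"uri": value, "archived": []})
        elif name == "archived" and timegates:
            # Assumption: the scope applies to the preceding TimeGate.
            timegates[-1]["archived"].append(value)
        elif name == "timemapfeed":
            feeds.append(value)
    return timegates, feeds


EXAMPLE = """\
TimeGate: http://dutch.archive.org/timegate/
Archived: .nl
TimeGate: http://all.archive.org/timegate/
Archived: *
TimeMapFeed: http://dutch.archive.org/feed/feed1.xml
"""

timegates, feeds = parse_memento_directives(EXAMPLE)
for tg in timegates:
    print(tg["uri"], "covers", tg["archived"])
print("feeds:", feeds)
```

Other robots.txt lines, such as `User-agent` and `Disallow`, are ignored by this sketch, so the new directives could sit alongside existing crawling policies.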
31. Overview of Memento Framework
Deployment Progress
Memento and Discovery
Memento and Branding
Alternative Web Archiving Strategies
32. Memento can recreate pages using
resources from different archives.
This poses a branding challenge.
33. Current Branding Practice for Web Archives
Page and embedded resources from same Web Archive
[Figure callout: Branding for page and embedded resources from single archive]
34. Branding for Web Archives in Memento Mode
Page and embedded resources from various Web Archives
[Figure callouts: HTML's branding; No branding; No branding]
Will be researched
35. Overview of Memento Framework
Deployment Progress
Memento and Discovery
Memento and Branding
Alternative Web Archiving Strategies
36. Crawl-based Archives host distinct observations.
Transactional Archives never miss an update.
37. Crawl-Based Web Archives
Distinct Observations are Archived for Many Servers
38. Server-Side Transactional Web Archives
Entire Change History is Archived for a Single Server
39. Development of Transactional Web Archive Software
Capture:
• Apache connection filter module captures URI, headers, body
• POSTs in real-time to transactional archive
Access:
• Online, real time access via Memento TimeGates
• Batch Export via WARC files for long term preservation
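How a complete change history supports datetime-based access can be sketched as follows. This is not the Apache module or the archive software described above; the `ChangeHistory` class is a hypothetical in-memory stand-in that shows one way to look up the state that was being served at a requested time.

```python
from bisect import bisect_right
from datetime import datetime, timezone


class ChangeHistory:
    """Every captured state of one URI, kept in capture order."""

    def __init__(self):
        self._times = []
        self._bodies = []

    def capture(self, when: datetime, body: bytes) -> None:
        # States are recorded as the server sends them, so captures
        # are assumed to arrive in chronological order.
        if self._times and when < self._times[-1]:
            raise ValueError("captures must be appended in time order")
        self._times.append(when)
        self._bodies.append(body)

    def state_at(self, when: datetime):
        """Return the body served at `when`, or None before the first capture."""
        i = bisect_right(self._times, when)
        return self._bodies[i - 1] if i else None


history = ChangeHistory()
history.capture(datetime(2011, 1, 1, tzinfo=timezone.utc), b"version 1")
history.capture(datetime(2011, 3, 1, tzinfo=timezone.utc), b"version 2")

# A request for February falls between the two captures.
february = history.state_at(datetime(2011, 2, 1, tzinfo=timezone.utc))
```

Because every change is captured, the lookup can return the state that was actually live at the requested time, whereas a crawl-based archive can only offer its nearest observation.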
40. Update on Memento
http://mementoweb.org/
Herbert Van de Sompel
Robert Sanderson
Michael L. Nelson
Towards Seamless Navigation of
the Web of the Past