Enabling Personal Use of Web Archives
Michele C. Weigle, @weiglemc
Web Sciences and Digital Libraries (WS-DL) Group, @WebSciDL
Department of Computer Science
Old Dominion University
June 6, 2018
Workshop on Web Archiving and Digital Libraries (WADL), #WADL2018
@weiglemc, @WebSciDL
ODU WS-DL Group
• Scott Ainsworth
• Sawood Alam
• Lulwah Alkwai
• Mohamed Aturban
• Brian Griffin
• Hussam Hallak
• Shawn Jones
• Mat Kelly
• Corren McCoy
• Louis Nguyen
• Alexander Nwala
@WebSciDL
http://ws-dl.cs.odu.edu/
http://ws-dl.blogspot.com/
June 6, 2018 - #WADL2018 at 2
PhD Students
• Nauman Siddique
• Miranda Smith
MS Students
Recent Alumni
• Maheedhar Gunnam (MS)
• Martin Klein
• Hany SalahEldeen
• Surbhi Shankar (MS)
• Erika Siregar (MS)
• Plinio Vargas (MS)
Coming Soon!
• Yasmin AlNoamany
• Ahmed AlSum
• Grant Atkins (MS)
• John Berlin (MS)
• Justin Brunelle
• Chuck Cartledge
• Hung Do (MS)
• Dr. Sampath Jayarathna
• Dr. Jian Wu
• Dr. Michael L. Nelson
• Dr. Michele C. Weigle
Faculty
@weiglemc, @WebSciDL
Computer scientists are toolsmiths
June 6, 2018 - #WADL2018 at 3
Frederick P. Brooks, Jr. 1996. The computer scientist as toolsmith II. Commun. ACM 39, 3 (March 1996), 61-68,
http://www.cs.unc.edu/~brooks/Toolsmith-CACM.pdf
@weiglemc, @WebSciDL
I want to enable the personal
use of web archives…
June 6, 2018 - #WADL2018 at 4
@weiglemc, @WebSciDL
I want to enable the personal use of web
archives… by academics and scholars
June 6, 2018 - #WADL2018 at 5
Liza Potts, ODU, Michigan State
studying communication during disasters
@weiglemc, @WebSciDL
They used screenshots to record news
webpages and tweets
June 6, 2018 - #WADL2018 at 6
@weiglemc, @WebSciDL
We can find webpages for some
filenames
June 6, 2018 - #WADL2018 at 7
http://www.bbc.com/news/world-europe-14287822 https://www.bbc.com/news/world-europe-14276074
@weiglemc, @WebSciDL
But, it’s difficult to manage metadata
with just a filename
June 6, 2018 - #WADL2018 at 8
@weiglemc, @WebSciDL
I want to enable the personal use of web
archives… by academics and scholars
Columbia course in Human Rights Information Technology
• evaluate online advocacy strategies over time
• explore the websites’ degrees of interactivity
• observe the variety of ways groups frame and present issues
online
June 6, 2018 - #WADL2018 at 9
Alex Thurman and Pamela Graham
@weiglemc, @WebSciDL
They want to view how groups’ web
presence changes over time
June 6, 2018 - #WADL2018 at 10
Alex Thurman and Pamela Graham
https://wayback.archive-it.org/1068/*/http://amnesty.ca/
@weiglemc, @WebSciDL
Visual layout changes are important
June 6, 2018 - #WADL2018 at 11
Alex Thurman and Pamela Graham
https://wayback.archive-it.org/1068/*/http://amnesty.ca/
2011-03-11, 21:29:04 2012-03-02, 21:04:40
2013-03-07, 00:03:05 2018-01-14, 20:57:13
@weiglemc, @WebSciDL
I want to enable the personal use of web
archives… by academics and scholars
June 6, 2018 - #WADL2018 at 12
Deborah Kempe
https://archive-it.org/collections/4544
@weiglemc, @WebSciDL
There’s a need for visual browsing of
collections of artists’ websites
June 6, 2018 - #WADL2018 at 13
Deborah Kempe
https://archive-it.org/collections/4544
@weiglemc, @WebSciDL
I want to enable the personal use of
web archives… by journalists
June 6, 2018 - #WADL2018 at 14
similar to our Hurricane Katrina example: https://www.slideshare.net/phonedude/why-careaboutthepast
https://www.nytimes.com/2016/11/17/insider/in-13-headlines-the-drama-of-election-night.html
@weiglemc, @WebSciDL
Wayback has gone mainstream…
June 6, 2018 - #WADL2018 at 15
"God bless you Internet Archive"
- Rachel Maddow, Dec 12, 2016
Last Week Tonight, Mar 18, 2018
@weiglemc, @WebSciDL
… but what do people think the
Wayback Machine is?
June 6, 2018 - #WADL2018 at 16
https://www.politico.com/story/2018/04/25/joy-reid-anti-gay-posts-550213
@weiglemc, @WebSciDL
… but what do people think the
Wayback Machine is?
June 6, 2018 - #WADL2018 at 17
https://www.cnn.com/2018/02/16/politics/richard-pinedo-guilty-plea/index.html
https://www.politico.com/story/2018/04/25/joy-reid-anti-gay-posts-550213
https://web.archive.org/web/20180115103952/https:/auctionessistance.com/
@weiglemc, @WebSciDL
Caches are not archives
June 6, 2018 - #WADL2018 at 18
http://ws-dl.blogspot.com/2018/01/2018-01-02-link-to-web-archives-not.html
http://www.wired.co.uk/article/russia-propaganda-online-blog-longform-medium-posts
https://webcache.googleusercontent.com/search?q=cache:qwqnGPqC2vsJ:https://medium.com/%40TheFoundingSon/huffington-post-vs-whiteness-and-white-women-1e67193085d4+&cd=15&hl=en&ct=clnk&gl=uk
@weiglemc, @WebSciDL
And, there’s more than just the
Internet Archive
June 6, 2018 - #WADL2018 at 19
http://timetravel.mementoweb.org/list/20020908180610/http://blog.reidreport.com/
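The Time Travel link on this slide follows a simple pattern: a 14-digit datetime followed by the original URI. A minimal sketch of building such a link, assuming only the `/list/` pattern visible above (the function name is made up for illustration):

```python
from datetime import datetime

# Base taken from the link shown on the slide.
TIMETRAVEL_LIST = "http://timetravel.mementoweb.org/list/"


def timetravel_list_uri(uri_r: str, when: datetime) -> str:
    """Build a Time Travel link listing mementos of uri_r around `when`."""
    # The datetime is written as 14 digits: YYYYMMDDhhmmss.
    return f"{TIMETRAVEL_LIST}{when.strftime('%Y%m%d%H%M%S')}/{uri_r}"


link = timetravel_list_uri("http://blog.reidreport.com/",
                           datetime(2002, 9, 8, 18, 6, 10))
print(link)
```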
@weiglemc, @WebSciDL
Some folks know this
June 6, 2018 - #WADL2018 at 20
http://archive.is/SKYbp
https://www.nytimes.com/2018/04/24/business/media/joy-reid-homophobic-blog-posts.html
@weiglemc, @WebSciDL
Some folks know this
June 6, 2018 - #WADL2018 at 21
http://archive.is/SKYbp
https://www.nytimes.com/2018/04/24/business/media/joy-reid-homophobic-blog-posts.html
http://money.cnn.com/2018/04/25/media/joy-reid-msnbc-host-wayback-machine/index.html
@weiglemc, @WebSciDL
Pro tip: submit pages to multiple
archives
June 6, 2018 - #WADL2018 at 22
https://twitter.com/phonedude_mln/status/998948823845261312
@weiglemc, @WebSciDL
I want to enable the personal use of
web archives… by the general public
June 6, 2018 - #WADL2018 at 23
@weiglemc, @WebSciDL
Web archives to the rescue!
June 6, 2018 - #WADL2018 at 24
https://twitter.com/brian3354/status/966081774194511874
@weiglemc, @WebSciDL
Is it really that important to archive
instead of just taking a screenshot?
June 6, 2018 - #WADL2018 at 25
https://twitter.com/AngryBlackLady/status/990032514080108544
https://twitter.com/phonedude_mln/status/990070331737100288
@weiglemc, @WebSciDL
We should be doing both
June 6, 2018 - #WADL2018 at 26
https://twitter.com/conspirator0/status/1000475042017366017
@weiglemc, @WebSciDL
What have we been doing
to make this easier?
June 6, 2018 - #WADL2018 at 27
@weiglemc, @WebSciDL
We wanted to help people
create and access local
archives
June 6, 2018 - #WADL2018 at 28
@weiglemc, @WebSciDL
We wanted to help people create and
access local archives
• WARCreate – Google Chrome extension
• WAIL – user-friendly Heritrix and
OpenWayback
• WAIL-Electron – adds browser-based
crawling, pywb
June 6, 2018 - #WADL2018 at 29
“Archive What I See Now: Bringing Institutional Web Archiving Tools to the Individual Researcher”,
2013-2017, HD-51670-13 and HK-50181-14
@weiglemc, @WebSciDL
WARCreate (2012)
June 6, 2018 - #WADL2018 at 30
Mat Kelly and Michele C. Weigle, "WARCreate - Create Wayback-Consumable WARC Files from Any
Webpage", JCDL 2012 demo.
http://ws-dl.blogspot.com/2013/07/2013-07-10-warcreate-and-wail-warc.html
Google Chrome extension
Create local WARC file of
currently viewed
webpage
http://warcreate.com
“Archive What I See Now: Bringing Institutional Web Archiving Tools to the Individual Researcher”,
2013-2017, HD-51670-13 and HK-50181-14
@weiglemc, @WebSciDL
WAIL (2013)
June 6, 2018 - #WADL2018 at 31
Mat Kelly, Michael L. Nelson and Michele C. Weigle, "Making Enterprise-Level Archive Tools Accessible
for Personal Web Archiving Using XAMPP," Poster and demo at Personal Digital Archiving, 2013.
http://ws-dl.blogspot.com/2016/06/2016-06-03-lipstick-or-ham-next-steps.html
Stand-alone application
Easy install of Heritrix,
OpenWayback
Replay local WARCs created
with WARCreate
http://machawk1.github.io/wail/
“Archive What I See Now: Bringing Institutional Web Archiving Tools to the Individual Researcher”,
2013-2017, HD-51670-13 and HK-50181-14
@weiglemc, @WebSciDL
WAIL-Electron (2017)
June 6, 2018 - #WADL2018 at 32
John Berlin, Mat Kelly, Michael L. Nelson and Michele C. Weigle, "WAIL: Collection-Based Personal Web
Archiving," JCDL 2017, poster.
http://ws-dl.blogspot.com/2017/02/2017-02-13-electric-wails-and-ham.html
http://ws-dl.blogspot.com/2017/07/2017-07-24-replacing-heritrix-with.html
Update of original WAIL
Adds headless Chrome-based
crawling
OpenWayback -> pywb
https://github.com/N0taN3rd/wail
“Archive What I See Now: Bringing Institutional Web Archiving Tools to the Individual Researcher”,
2013-2017, HD-51670-13 and HK-50181-14
@weiglemc, @WebSciDL
What did we learn from this?
• We need additional Memento support for
private web archives
• Capturing complex webpages is hard
June 6, 2018 - #WADL2018 at 33
@weiglemc, @WebSciDL
A Memento Meta Aggregator can aggregate
public and private archives (2018)
June 6, 2018 - #WADL2018 at 34
Mat Kelly, Michael L. Nelson, and Michele C. Weigle, "A Framework for Aggregating Private and Public Web
Archives", JCDL 2018
@weiglemc, @WebSciDL
Today’s webpages are super complex
June 6, 2018 - #WADL2018 at 35
number of network requests per page
John Berlin, "To Relive The Web: A Framework for the Transformation and Archival Replay of Web Pages,"
ODU Master’s Thesis, 2018.
@weiglemc, @WebSciDL
Squidwarc enables high-fidelity
browser-based archiving (2017)
June 6, 2018 - #WADL2018 at 36
John Berlin, "2017-07-24: Replacing Heritrix with Chrome in WAIL, and the release of node-warc, node-cdxj, and Squidwarc"
http://ws-dl.blogspot.com/2017/07/2017-07-24-replacing-heritrix-with.html
High fidelity archival
crawler
node.js based
Uses Chrome or
Chrome Headless
“Archive What I See Now: Bringing Institutional Web Archiving Tools to the Individual Researcher”,
2013-2017, HD-51670-13 and HK-50181-14
https://github.com/N0taN3rd/Squidwarc
@weiglemc, @WebSciDL
We wanted to help people
submit webpages to public
archives
June 6, 2018 - #WADL2018 at 37
@weiglemc, @WebSciDL
We wanted to help people submit
webpages to public archives
• Mink – Google Chrome extension
• #icanhazmemento – Twitter bot
• ArchiveNow – Python module, Docker
container, local web service
June 6, 2018 - #WADL2018 at 38
@weiglemc, @WebSciDL
Mink (2014)
June 6, 2018 - #WADL2018 at 39
“Archive What I See Now: Bringing Institutional Web Archiving Tools to the Individual Researcher”,
2014-2017, HK-50181-14
Mat Kelly, Michael L. Nelson and Michele C. Weigle, "Mink: Integrating the Live and Archived Web Viewing
Experience Using Web Browsers and Memento," JCDL 2014, poster.
http://ws-dl.blogspot.com/2014/10/2014-10-03-integrating-live-and.html
Google Chrome extension
Submit currently viewed
webpage to public archives
Access mementos from public
archives of currently viewed
webpage
Inspired by LANL’s Memento for Chrome,
http://ws-dl.blogspot.com/2013/10/2013-10-14-right-click-to-past-memento.html
https://github.com/machawk1/Mink
@weiglemc, @WebSciDL
#icanhazmemento (2015)
June 6, 2018 - #WADL2018 at 41
http://ws-dl.blogspot.com/2015/07/2015-07-22-i-can-haz-memento.html
Twitter bot
Include #icanhazmemento in a
tweet with a URL
Bot replies with a link to the
memento of the page closest to
the time of the tweet
If page not archived, bot submits
URL to multiple public archives,
replies with a link to the
memento in Time Travel
Alexander Nwala, "2015-07-22: I Can Haz Memento,"
https://github.com/anwala/icanhazmemento
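The bot's core choice, picking the memento closest to the time of the tweet, can be sketched as below. This is an illustration only and not the bot's actual code; the function name, the `(datetime, URI-M)` tuples, and the `archive.example` URIs are all assumptions.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def closest_memento(mementos: List[Tuple[datetime, str]],
                    tweet_time: datetime) -> Optional[str]:
    """Return the URI-M captured nearest to tweet_time, or None if unarchived."""
    if not mementos:
        # No memento exists: this is where the bot would submit the URL
        # to public archives instead.
        return None
    _, uri_m = min(mementos, key=lambda m: abs(m[0] - tweet_time))
    return uri_m


# Hypothetical captures of one page.
captures = [
    (datetime(2015, 1, 1), "http://archive.example/20150101000000/http://example.com/"),
    (datetime(2015, 7, 20), "http://archive.example/20150720000000/http://example.com/"),
    (datetime(2016, 3, 1), "http://archive.example/20160301000000/http://example.com/"),
]
best = closest_memento(captures, datetime(2015, 7, 22))
print(best)
```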
@weiglemc, @WebSciDL
ArchiveNow (2017)
June 6, 2018 - #WADL2018 at 42
Mohamed Aturban, Mat Kelly, Sawood Alam, John Berlin, Michael L. Nelson and Michele C. Weigle,
"ArchiveNow: Simplified, Extensible, Multi-Archive Preservation," JCDL 2018, poster.
http://ws-dl.blogspot.com/2017/02/2017-02-22-archive-now-archivenow.html
Python module, Docker
container
Submit URI to multiple archives
Generate local WARCs for
private archives
“Towards a Web-Centric Approach for Capturing the Scholarly Record”, 2016-2019
https://github.com/oduwsdl/archivenow
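The idea of submitting one URI to several archives can be sketched as a simple fan-out. This is not ArchiveNow's API; `push_to_archives` and the two stand-in submitters are hypothetical, and real submitters would make HTTP requests to each archive.

```python
from typing import Callable, Dict

# A submitter takes a URI and returns the URI-M of the new capture.
Submitter = Callable[[str], str]


def push_to_archives(uri: str, submitters: Dict[str, Submitter]) -> Dict[str, str]:
    """Submit uri to every archive and record the URI-M or the error."""
    results = {}
    for name, submit in submitters.items():
        try:
            results[name] = submit(uri)
        except Exception as exc:
            # One failing archive should not block the others.
            results[name] = f"error: {exc}"
    return results


# Stand-in submitters for illustration only.
def fake_archive_a(uri: str) -> str:
    return "http://archive-a.example/20180606000000/" + uri


def fake_archive_b(uri: str) -> str:
    raise RuntimeError("archive unavailable")


results = push_to_archives("http://example.com/",
                           {"a": fake_archive_a, "b": fake_archive_b})
print(results)
```

Collecting per-archive results, rather than stopping at the first failure, matches the "submit to multiple archives" advice: a page is preserved as long as any one archive accepts it.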
@weiglemc, @WebSciDL
What did we learn from this?
• People want tools to help them submit to
public archives
• Browser extensions are cool, but don't have
much uptake
• more on this later…
June 6, 2018 - #WADL2018 at 43
@weiglemc, @WebSciDL
We wanted to help people
summarize their archives
June 6, 2018 - #WADL2018 at 44
@weiglemc, @WebSciDL
We wanted to help people
summarize their archives
• Dark and Stormy Archives (DSA) –
Archive-It + Storify
• MementoEmbed – web service
• #whatdiditlooklike – Twitter bot
• Alsummarization – algorithm and web
service
• TimeMap Visualization, tmvis – node.js-based web service for Alsummarization
June 6, 2018 - #WADL2018 at 45
@weiglemc, @WebSciDL
"Dark and Stormy" Archives (2016)
June 6, 2018 - #WADL2018 at 46
Characteristics of human-generated stories
Characteristics of Archive-It collections
• Exclude duplicates
• Exclude off-topic pages
• Exclude non-English Language
• Dynamically slice the collection
• Cluster the pages in each slice
• Select high-quality pages from each cluster
• Order pages by time
• Visualize
Yasmin AlNoamany, Michele C. Weigle, and Michael L. Nelson, "Generating Stories From Archived
Collections," ACM WebSci 2017.
http://ws-dl.blogspot.com/2016/09/2016-09-20-promising-scene-at-end-of.html
“Combining Social Media Storytelling With Web Archives”, 2015-2019, IMLS National Leadership Grant
Shawn Jones, "Improving Collection Understanding in Web Archives," JCDL Doctoral Consortium, 2018.
http://ws-dl.blogspot.com/2017/12/2017-12-14-storify-will-be-gone-soon-so.html
@weiglemc, @WebSciDL
MementoEmbed (2018)
June 6, 2018 - #WADL2018 at 47
Python module, Docker
container
Submit URI-M
Returns an archive-aware social
card, with HTML embed code
“Combining Social Media Storytelling With Web Archives”, 2015-2019, IMLS National Leadership Grant
https://github.com/oduwsdl/MementoEmbed
(currently in development)
http://ws-dl.blogspot.com/2018/04/2018-04-24-lets-get-visual-and-examine.html
Shawn Jones, "Improving Collection Understanding in Web Archives," JCDL Doctoral Consortium, 2018.
@weiglemc, @WebSciDL
#whatdiditlooklike (2015)
June 6, 2018 - #WADL2018 at 49
http://ws-dl.blogspot.com/2015/01/2015-02-05-what-did-it-look-like.html
Twitter bot
Include #whatdiditlooklike in a
tweet with a URL
Bot generates animated GIF of first
memento of each year
Bot replies with a link to entry in
Tumblr
Tumblr:
http://whatdiditlooklike.mementoweb.org/
Source:
https://github.com/anwala/wdill
Alexander Nwala, "2015-02-05: What Did It Look Like?,"
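Selecting the first memento of each year, as the bot does before building the GIF, can be sketched as follows. This is an illustration and not the bot's source; the function name and the `archive.example` URIs are assumptions.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Memento = Tuple[datetime, str]  # (capture time, URI-M)


def first_memento_per_year(mementos: List[Memento]) -> List[Memento]:
    """Keep only the earliest capture from each calendar year, in time order."""
    first: Dict[int, Memento] = {}
    for captured, uri_m in sorted(mementos):
        # setdefault keeps the first (earliest) entry seen for a year.
        first.setdefault(captured.year, (captured, uri_m))
    return [first[year] for year in sorted(first)]


# Hypothetical captures, deliberately out of order.
captures = [
    (datetime(2013, 3, 7), "http://archive.example/20130307/"),
    (datetime(2012, 9, 1), "http://archive.example/20120901/"),
    (datetime(2012, 3, 2), "http://archive.example/20120302/"),
    (datetime(2011, 3, 11), "http://archive.example/20110311/"),
]
frames = [uri for _, uri in first_memento_per_year(captures)]
print(frames)
```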
@weiglemc, @WebSciDL
Alsummarization (2014)
June 6, 2018 - #WADL2018 at 50
Ahmed Alsum and Michael L. Nelson, "Thumbnail Summarization Techniques for Web Archives," ECIR 2014.
Summarize TimeMap
Compare SimHash of
HTML, not images
Hamming distance
threshold of 4 characters
“Visualizing Digital Collections of Web Archives”, 2014-2015, Columbia Libraries Web Archiving
Incentive Program
Mat Kelly, Michael L. Nelson, and Michele C. Weigle, "Visualizing Digital Collections of Web Archives," Web
Archiving Collaboration, 2015, http://ws-dl.blogspot.com/2015/06/2015-06-09-web-archiving-collaboration.html
700 thumbnails
32 sampled thumbnails
CoverFlow view
https://github.com/machawk1/ArchiveThumbnails
@weiglemc, @WebSciDL
Choosing mementos based on SimHash
June 6, 2018 - #WADL2018 at 51
M1
M2
M3
M4
@weiglemc, @WebSciDL
Choosing mementos based on SimHash
June 6, 2018 - #WADL2018 at 52
8c27981eaed151cfa645ad823932eac6
8c27981eaad951cf8645ad823932eac6
fa3799170258494b9443b9be3977a84e
5a1534161357da6b827ab98037db2640
M1
M2
M3
M4
@weiglemc, @WebSciDL
Choosing mementos based on SimHash
June 6, 2018 - #WADL2018 at 53
8c27981eaed151cfa645ad823932eac6
8c27981eaad951cf8645ad823932eac6
fa3799170258494b9443b9be3977a84e
5a1534161357da6b827ab98037db2640
M1
M2
M3
M4
M1
@weiglemc, @WebSciDL
Choosing mementos based on SimHash
June 6, 2018 - #WADL2018 at 54
8c27981eaed151cfa645ad823932eac6
8c27981eaad951cf8645ad823932eac6
fa3799170258494b9443b9be3977a84e
5a1534161357da6b827ab98037db2640
M1
M2
M3
M4
Hamming distance (M1, M2) < 4
reject M2
M1
basis
@weiglemc, @WebSciDL
Choosing mementos based on SimHash
June 6, 2018 - #WADL2018 at 55
8c27981eaed151cfa645ad823932eac6
8c27981eaad951cf8645ad823932eac6
fa3799170258494b9443b9be3977a84e
5a1534161357da6b827ab98037db2640
M1
M2
M3
M4
Hamming distance (M1, M3) > 4
select M3
M1
basis
@weiglemc, @WebSciDL
Choosing mementos based on SimHash
June 6, 2018 - #WADL2018 at 56
8c27981eaed151cfa645ad823932eac6
8c27981eaad951cf8645ad823932eac6
fa3799170258494b9443b9be3977a84e
5a1534161357da6b827ab98037db2640
M1
M2
M3
M4
M1
M3
Hamming distance (M3, M4) > 4
select M4
basis
@weiglemc, @WebSciDL
Choosing mementos based on SimHash
June 6, 2018 - #WADL2018 at 57
8c27981eaed151cfa645ad823932eac6
8c27981eaad951cf8645ad823932eac6
fa3799170258494b9443b9be3977a84e
5a1534161357da6b827ab98037db2640
M1
M2
M3
M4
M1
M3
M4
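The walk-through on the preceding slides can be sketched in a few lines, using the four SimHash values shown. Two assumptions are made here: the distance is counted per hex character (the slides give the threshold in characters), and a distance of exactly 4 is treated as "different", a case the slides do not cover. The function names are illustrative.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length hex strings differ."""
    if len(a) != len(b):
        raise ValueError("SimHash strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def select_mementos(mementos: List[Tuple[str, str]], threshold: int = 4) -> List[str]:
    """Keep a memento only if it differs enough from the current basis."""
    selected = []
    basis = None
    for name, simhash in mementos:
        # The first memento is always kept; each selected memento
        # becomes the new basis for later comparisons.
        if basis is None or hamming(basis, simhash) >= threshold:
            selected.append(name)
            basis = simhash
    return selected


timemap = [
    ("M1", "8c27981eaed151cfa645ad823932eac6"),
    ("M2", "8c27981eaad951cf8645ad823932eac6"),
    ("M3", "fa3799170258494b9443b9be3977a84e"),
    ("M4", "5a1534161357da6b827ab98037db2640"),
]
print(select_mementos(timemap))  # ['M1', 'M3', 'M4']
```

M2 differs from M1 in only 3 characters, so it is rejected, while M3 and M4 each differ from their basis in more than 4.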
@weiglemc, @WebSciDL
TimeMap Visualization, tmvis (2017)
June 6, 2018 - #WADL2018 at 58
“Visualizing Webpage Changes Over Time”, 2017-2019, HAA-256368-17
http://ws-dl.blogspot.com/2017/10/2017-10-16-visualizing-webpage-changes.html
Web service
Takes URI-R or URI-T
Performs Alsummarization and
produces grid view, image slider
view, and timeline view
Will produce embeddable version,
Wayback extension
https://github.com/oduwsdl/tmvis
Surbhi Shankar, "Visualizing Thumbnails Of Archived Web Pages", ODU MS Project, 2017
Maheedhar Gunnam, "How I Changed Over Time: A webservice to summarize TimeMaps based on
SimHashed HTML content", ODU MS Project, 2018
@weiglemc, @WebSciDL
tmvis – Grid View
June 6, 2018 - #WADL2018 at 59
“Visualizing Webpage Changes Over Time”, 2017-2019, HAA-256368-17
http://ws-dl.blogspot.com/2017/10/2017-10-16-visualizing-webpage-changes.html
@weiglemc, @WebSciDL
tmvis – Image Slider View
June 6, 2018 - #WADL2018 at 60
“Visualizing Webpage Changes Over Time”, 2017-2019, HAA-256368-17
http://ws-dl.blogspot.com/2017/10/2017-10-16-visualizing-webpage-changes.html
@weiglemc, @WebSciDL
tmvis – Timeline View
June 6, 2018 - #WADL2018 at 61
“Visualizing Webpage Changes Over Time”, 2017-2019, HAA-256368-17
http://ws-dl.blogspot.com/2017/10/2017-10-16-visualizing-webpage-changes.html
Uses ProPublica’s TimelineSetter library, http://propublica.github.io/timeline-setter/
@weiglemc, @WebSciDL
What did we learn from this?
• Webpages can go off-topic through time
• Some mementos aren't captured well
• Some mementos aren't replayed well
June 6, 2018 - #WADL2018 at 62
@weiglemc, @WebSciDL
You don't want off-topic mementos
in your summary
June 6, 2018 - #WADL2018 at 63
2012-01-10, 01:41:57 2012-04-10, 03:26:34 2012-04-17, 03:26:15
2012-04-24, 03:36:58 2012-05-15, 03:47:04
http://wayback.archive-it.org/2950/*/http://www.indyows.org
2012-07-03, 12:18:48
@weiglemc, @WebSciDL
Identify off-topic mementos with
Off-Topic Memento Toolkit (2018)
June 6, 2018 - #WADL2018 at 64
“Tools for Managing Seed URIs”, 2014-2015, Columbia Libraries Web Archiving Incentive Program
“Combining Social Media Storytelling With Web Archives”, 2015-2019, IMLS National Leadership Grant
Shawn Jones, Michele C. Weigle, and Michael L. Nelson, "The Off-Topic Memento Toolkit," iPres 2018.
Yasmin AlNoamany, Michele C. Weigle, and Michael L. Nelson, "Detecting Off-Topic Pages Within TimeMaps in
Web Archives," IJDL, Vol. 17, No. 3, July 2016.
Python module
Given a URI-T (TimeMap), identifies
off-topic mementos
Option of 8 different similarity
measures
OTMT Distribution Page:
https://pypi.org/project/otmt/
OTMT Source Code Page:
https://github.com/oduwsdl/off-topic-memento-toolkit
{"http://wayback.archive-it.org/1068/timemap/link/http://www.badil.org/": {
"http://wayback.archive-it.org/1068/20130307084848/http://www.badil.org/": {
"timemap measures": {
"cosine": {
"stemmed": true,
"tokenized": true,
"removed boilerplate": true,
"comparison score": 0.10969941307631487,
"topic status": "off-topic"
},
"bytecount": {
"stemmed": false,
"tokenized": false,
"removed boilerplate": false,
"comparison score": 0.15971409055425445,
"topic status": "on-topic"
} },
"overall topic status": "off-topic" },
...
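Output shaped like the excerpt above can be filtered for off-topic mementos with a short loop. This sketch assumes the full output keeps the nesting shown on the slide (URI-T, then URI-M, then an "overall topic status" field); the function name and the second, on-topic entry in the sample are made up for illustration.

```python
from typing import Dict, List


def off_topic_mementos(otmt_output: Dict) -> List[str]:
    """Collect every URI-M whose overall topic status is 'off-topic'."""
    off_topic = []
    for _urit, mementos in otmt_output.items():
        for urim, result in mementos.items():
            if result.get("overall topic status") == "off-topic":
                off_topic.append(urim)
    return off_topic


sample = {
    "http://wayback.archive-it.org/1068/timemap/link/http://www.badil.org/": {
        "http://wayback.archive-it.org/1068/20130307084848/http://www.badil.org/": {
            "overall topic status": "off-topic",
        },
        # Hypothetical second memento, added only to show the filter.
        "http://wayback.archive-it.org/1068/20120101000000/http://www.badil.org/": {
            "overall topic status": "on-topic",
        },
    }
}
print(off_topic_mementos(sample))
```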
@weiglemc, @WebSciDL
You don't want damaged mementos
in your summary
June 6, 2018 - #WADL2018 at 65
https://wayback.archive-it.org/1068/*/http://aappb.org/
@weiglemc, @WebSciDL
Memento Damage can tell you how
damaged your mementos are (2017)
June 6, 2018 - #WADL2018 at 66
Web service, Docker container
Given URI-M, calculates and
analyzes memento damage
Service:
http://memento-damage.cs.odu.edu
GitHub:
https://github.com/oduwsdl/web-memento-damage
“Increasing the Value of Existing Web Archives,” 2015-2019, III 1526700
Erika Siregar, “Deploying the Memento Damage Service: A Comprehensive Tool for Measuring and Analyzing
Damage on Web Archives”, ODU MS Project, 2017.
Justin Brunelle, Mat Kelly, Hany SalahEldeen, Michele C. Weigle and Michael L. Nelson, "Not All Mementos Are
Created Equal: Measuring the Impact of Missing Resources," IJDL, Vol. 16, No. 3-4, September 2015.
http://ws-dl.blogspot.com/2017/11/2017-11-22-deploying-memento-damage.html
@weiglemc, @WebSciDL
Wayback++ uses client-side rewriting to fix
mementos damaged at replay time (2018)
June 6, 2018 - #WADL2018 at 68
Chrome, Firefox extensions
https://github.com/N0taN3rd/WaybackPlusPlus
https://www.youtube.com/watch?v=ldyidcaVXHw
John Berlin, Michael L. Nelson, and Michele C. Weigle, "Swimming In A Sea Of JavaScript, Or: How I
Learned To Stop Worrying And Love High-Fidelity Replay," WADL 2018.
http://ws-dl.blogspot.com/2017/01/2017-01-20-cnncom-has-been-unarchivable.html
http://ws-dl.blogspot.com/2018/04/2018-05-01-high-fidelity-ms-thesis-to.html
@weiglemc, @WebSciDL
Where does this take us?
June 6, 2018 - #WADL2018 at 69
@weiglemc, @WebSciDL
We’ve developed a lot of tools
June 6, 2018 - #WADL2018 at 70
@weiglemc, @WebSciDL
But, can a full professor use them?
June 6, 2018 - #WADL2018 at 71
Frederick P. Brooks, Jr. 1996. The computer scientist as toolsmith II. Commun. ACM 39, 3 (March 1996), 61-68.
Fred Brooks says:
@weiglemc, @WebSciDL
So, let's think bigger
• In a world where the web browser is the
Internet, how can we make web archives
ubiquitous?
June 6, 2018 - #WADL2018 at 72
@weiglemc, @WebSciDL
So, let's think bigger
• In a world where the web browser is the
Internet, how can we make web archives
ubiquitous?
• Bring web archives to the browser - natively
June 6, 2018 - #WADL2018 at 73
Michele C. Weigle, Michael L. Nelson, Martin Klein, and Herbert Van de Sompel, “The Case
for Memento-Aware Browsers”, 2017
@weiglemc, @WebSciDL
What if browsers could natively
identify mementos?
• Look for Memento-Datetime header in
HTTP response
Memento-Datetime: Tue, 08 May 2012 11:24:30 GMT
• Use client-side rewriting (Emu) to improve
replay
• Use native UI elements to annotate
composite mementos
June 6, 2018 - #WADL2018 at 74
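The first bullet, checking a response for the Memento-Datetime header, can be sketched with the standard library alone. The function name is illustrative, and headers are assumed to arrive as a plain name-to-value mapping.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional


def memento_datetime(headers: Mapping[str, str]) -> Optional[datetime]:
    """Return the capture datetime if the response is a memento, else None."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "memento-datetime":
            try:
                return parsedate_to_datetime(value)
            except (TypeError, ValueError):
                return None  # header present but not a valid HTTP date
    return None


archived = memento_datetime({"Memento-Datetime": "Tue, 08 May 2012 11:24:30 GMT"})
live = memento_datetime({"Content-Type": "text/html"})
print(archived, live)
```

A browser could use a non-None result as the signal to switch on archive-specific UI, such as the address bar annotation proposed on the following slides.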
@weiglemc, @WebSciDL
Identify mementos in the address bar
June 6, 2018 - #WADL2018 at 75
@weiglemc, @WebSciDL
Identify mementos in the address bar
June 6, 2018 - #WADL2018 at 76
Archive http://web.archive.org/web/2014030402052012/...
Could also identify non-HTML mementos (images, PDF, etc.)
@weiglemc, @WebSciDL
Identify temporal inconsistencies
June 6, 2018 - #WADL2018 at 77
Archive http://web.archive.org/web/20050601025530/...
Scott Ainsworth, http://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html
@weiglemc, @WebSciDL
Identify temporal inconsistencies
June 6, 2018 - #WADL2018 at 78
Archive http://web.archive.org/web/20050601025530/...
Scott Ainsworth, http://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html
+ 5 Years, 11 months (Apr 6, 2011)
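One way a browser could flag this kind of gap is to compare the 14-digit datetimes in the URI-Ms of the page and of an embedded resource. The sketch below assumes Wayback-style URIs, where embedded resources may carry a two-letter modifier such as `im_` after the datetime; the function names, the `example.com` targets, and the time of day for the embedded image are assumptions (the slide gives only its date).

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# 14-digit datetime, optionally followed by a modifier such as "im_".
TIMESTAMP = re.compile(r"/(\d{14})(?:[a-z]{2}_)?/")


def capture_time(uri_m: str) -> Optional[datetime]:
    """Extract the capture datetime embedded in a Wayback-style URI-M."""
    match = TIMESTAMP.search(uri_m)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")


def temporal_offset(root_uri_m: str, embedded_uri_m: str) -> Optional[timedelta]:
    """How much later (or earlier) the embedded resource was captured."""
    root, embedded = capture_time(root_uri_m), capture_time(embedded_uri_m)
    if root is None or embedded is None:
        return None
    return embedded - root


root = "http://web.archive.org/web/20050601025530/http://example.com/"
image = "http://web.archive.org/web/20110406000000im_/http://example.com/logo.png"
offset = temporal_offset(root, image)
print(offset.days, "days")
```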
@weiglemc, @WebSciDL
What if browsers could natively
interact with Memento aggregators?
• Alert users of unarchived pages as they
browse
• Provide UI elements to summarize and
access past versions of the current webpage
• Integrate web archives and the past web
into “New Tab View”
June 6, 2018 - #WADL2018 at 79
@weiglemc, @WebSciDL
What if browsers could natively
interpret and replay WARCs?
• Users could share WARCs
• Recipient could open the WARC directly in
their browser
• WARC.js (à la PDF.js for WARCs)
June 6, 2018 - #WADL2018 at 80
@weiglemc, @WebSciDL
What if browsers could natively
create mementos?
• Push to public web
archives
• Create local WARCs
June 6, 2018 - #WADL2018 at 81
https://twitter.com/conspirator0/status/1000475042017366017
Just as easily as taking a screenshot
or maybe along with taking a screenshot
@weiglemc, @WebSciDL
Firefox Quantum has brought
screenshots natively to the browser
June 6, 2018 - #WADL2018 at 82
@weiglemc, @WebSciDL
Saving full page screenshot
June 6, 2018 - #WADL2018 at 83
@weiglemc, @WebSciDL
Screenshots can be saved in the
Mozilla cloud
June 6, 2018 - #WADL2018 at 84
@weiglemc, @WebSciDL
Screenshots have a URI
June 6, 2018 - #WADL2018 at 85
https://screenshots.firefox.com/MhV6otMl6r2YWOXc/2018.jcdl.org
@weiglemc, @WebSciDL
What if these screenshots were
Memento-enabled?
• Provide Memento HTTP headers for the
screenshots
• Implement Memento datetime negotiation
for the entire screenshot cloud service
June 6, 2018 - #WADL2018 at 86
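The first bullet could look like the sketch below: a screenshot response carrying a Memento-Datetime header and a Link header that points back to the original page. This is not how the screenshot service behaves today; the function name is made up, and the datetime reuses the example value from the earlier Memento-Datetime slide.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict


def memento_headers(original_uri: str, captured_at: datetime) -> Dict[str, str]:
    """Build Memento response headers for a screenshot of original_uri.

    captured_at must be timezone-aware; it is converted to GMT.
    """
    return {
        "Memento-Datetime": format_datetime(
            captured_at.astimezone(timezone.utc), usegmt=True),
        # rel="original" links the memento back to the live resource.
        "Link": f'<{original_uri}>; rel="original"',
    }


headers = memento_headers(
    "http://2018.jcdl.org/",
    datetime(2012, 5, 8, 11, 24, 30, tzinfo=timezone.utc),
)
print(headers)
```

Datetime negotiation, the second bullet, would additionally need a TimeGate that accepts an Accept-Datetime request header; that part is not sketched here.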
@weiglemc, @WebSciDL
We could build a crowd-sourced
archive of screenshots
• Take screenshot and save to Memento-
enabled screenshot cloud
• Option to push live webpage to archive at
same time
• Then we have both an archived page and a
screenshot of the page from very close to
the same datetime
June 6, 2018 - #WADL2018 at 87
@weiglemc, @WebSciDL
What about bookmarks?
June 6, 2018 - #WADL2018 at 88
submit to public web archives
local archive saved to ~/Library/WebArchive/
Bookmarking becomes archiving
@weiglemc, @WebSciDL
Viewing a bookmark becomes an
opportunity to interact with archives
June 6, 2018 - #WADL2018 at 89
@weiglemc, @WebSciDL
Memento Embeds for bookmark view
June 6, 2018 - #WADL2018 at 90
@weiglemc, @WebSciDL
Open live web, local memento, or
public memento
June 6, 2018 - #WADL2018 at 91
Open on live web
Open local memento
Open public memento
@weiglemc, @WebSciDL
It’s time for browsers to be
Memento-aware
• Web archives have gone mainstream.
• We’ve learned a lot by building tools to
enable personal use of web archives.
• These ideas need to be integrated directly
into browsers for general public use.
June 6, 2018 - #WADL2018 at 92

Más contenido relacionado

La actualidad más candente

A Framework for Aggregating Private and Public Web Archives
A Framework for Aggregating Private and Public Web ArchivesA Framework for Aggregating Private and Public Web Archives
A Framework for Aggregating Private and Public Web Archivesjcdl2018
 
Archive Assisted Archival Fixity Verification Framework
Archive Assisted Archival Fixity Verification FrameworkArchive Assisted Archival Fixity Verification Framework
Archive Assisted Archival Fixity Verification FrameworkSawood Alam
 
Signposting for Repositories
Signposting for RepositoriesSignposting for Repositories
Signposting for RepositoriesMartin Klein
 
The Many Shapes of Archive-It
The Many Shapes of Archive-ItThe Many Shapes of Archive-It
The Many Shapes of Archive-ItShawn Jones
 
Creating Topical Collections: Web Archives vs. Live Web
Creating Topical Collections:Web Archives vs. Live WebCreating Topical Collections:Web Archives vs. Live Web
Creating Topical Collections: Web Archives vs. Live WebMartin Klein
 
Robust Linking to Web Resources
Robust Linking to Web ResourcesRobust Linking to Web Resources
Robust Linking to Web ResourcesMartin Klein
 
Intro to Web Archiving
Intro to Web ArchivingIntro to Web Archiving
Intro to Web ArchivingMichele Weigle
 
To the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly CommunicationTo the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly CommunicationMartin Klein
 
Linked Data and Discovery with Steve Meyer
Linked Data and Discovery with Steve MeyerLinked Data and Discovery with Steve Meyer
Linked Data and Discovery with Steve MeyerWiLS
 
Let's Get Visible! with Karla Smith, Winnefox Library System
Let's Get Visible! with Karla Smith, Winnefox Library SystemLet's Get Visible! with Karla Smith, Winnefox Library System
Let's Get Visible! with Karla Smith, Winnefox Library SystemWiLS
 
Impact of URI Canonicalization on Memento Count

Enabling Personal Use of Web Archives

  • 1. Enabling Personal Use of Web Archives. Michele C. Weigle, @weiglemc. Web Sciences and Digital Libraries (WS-DL) Group, @WebSciDL, Department of Computer Science, Old Dominion University. June 6, 2018, Workshop on Web Archiving and Digital Libraries (WADL), #WADL2018
  • 2. ODU WS-DL Group • Scott Ainsworth • Sawood Alam • Lulwah Alkwai • Mohamed Aturban • Brian Griffin • Hussam Hallak • Shawn Jones • Mat Kelly • Corren McCoy • Louis Nguyen • Alexander Nwala @WebSciDL http://ws-dl.cs.odu.edu/ http://ws-dl.blogspot.com/ PhD Students • Nauman Siddique • Miranda Smith MS Students Recent Alumni • Maheedhar Gunnam (MS) • Martin Klein • Hany SalahEldeen • Surbhi Shankar (MS) • Erika Siregar (MS) • Plinio Vargas (MS) Coming Soon! • Yasmin AlNoamany • Ahmed AlSum • Grant Atkins (MS) • John Berlin (MS) • Justin Brunelle • Chuck Cartledge • Hung Do (MS) • Dr. Sampath Jayarathna • Dr. Jian Wu • Dr. Michael L. Nelson • Dr. Michele C. Weigle Faculty
  • 3. Computer scientists are toolsmiths. Frederick P. Brooks, Jr. 1996. The computer scientist as toolsmith II. Commun. ACM 39, 3 (March 1996), 61-68, http://www.cs.unc.edu/~brooks/Toolsmith-CACM.pdf
  • 4. I want to enable the personal use of web archives…
  • 5. I want to enable the personal use of web archives… by academics and scholars. Liza Potts, ODU, Michigan State: studying communication during disasters
  • 6. They used screenshots to record news webpages and tweets
  • 7. We can find webpages for some filenames: http://www.bbc.com/news/world-europe-14287822 https://www.bbc.com/news/world-europe-14276074
  • 8. But it’s difficult to manage metadata with just a filename
  • 9. I want to enable the personal use of web archives… by academics and scholars. Alex Thurman and Pamela Graham, Columbia course in Human Rights Information Technology: • evaluate online advocacy strategies over time • explore the websites’ degrees of interactivity • observe the variety of ways groups frame and present issues online
  • 10. They want to view how groups’ web presence changes over time. Alex Thurman and Pamela Graham, https://wayback.archive-it.org/1068/*/http://amnesty.ca/
  • 11. Visual layout changes are important. Alex Thurman and Pamela Graham, https://wayback.archive-it.org/1068/*/http://amnesty.ca/ Captures shown: 2011-03-11, 21:29:04; 2012-03-02, 21:04:40; 2013-03-07, 00:03:05; 2018-01-14, 20:57:13
  • 12. I want to enable the personal use of web archives… by academics and scholars. Deborah Kempe, https://archive-it.org/collections/4544
  • 13. There’s a need for visual browsing of a collection of artists’ websites. Deborah Kempe, https://archive-it.org/collections/4544
  • 14. I want to enable the personal use of web archives… by journalists. Similar to our Hurricane Katrina example: https://www.slideshare.net/phonedude/why-careaboutthepast https://www.nytimes.com/2016/11/17/insider/in-13-headlines-the-drama-of-election-night.html
  • 15. Wayback has gone mainstream… "God bless you Internet Archive" - Rachel Maddow, Dec 12, 2016; Last Week Tonight, Mar 18, 2018
  • 16. … but what do people think the Wayback Machine is? https://www.politico.com/story/2018/04/25/joy-reid-anti-gay-posts-550213
  • 17. … but what do people think the Wayback Machine is? https://www.cnn.com/2018/02/16/politics/richard-pinedo-guilty-plea/index.html https://www.politico.com/story/2018/04/25/joy-reid-anti-gay-posts-550213 https://web.archive.org/web/20180115103952/https:/auctionessistance.com/
  • 18. Caches are not archives. http://ws-dl.blogspot.com/2018/01/2018-01-02-link-to-web-archives-not.html http://www.wired.co.uk/article/russia-propaganda-online-blog-longform-medium-posts https://webcache.googleusercontent.com/search?q=cache:qwqnGPqC2vsJ:https://medium.com/%40TheFoundingSon/huffington-post-vs-whiteness-and-white-women-1e67193085d4+&cd=15&hl=en&ct=clnk&gl=uk
  • 19. And, there’s more than just the Internet Archive. http://timetravel.mementoweb.org/list/20020908180610/http://blog.reidreport.com/
  • 20. Some folks know this. http://archive.is/SKYbp https://www.nytimes.com/2018/04/24/business/media/joy-reid-homophobic-blog-posts.html
  • 21. Some folks know this. http://archive.is/SKYbp https://www.nytimes.com/2018/04/24/business/media/joy-reid-homophobic-blog-posts.html http://money.cnn.com/2018/04/25/media/joy-reid-msnbc-host-wayback-machine/index.html
  • 22. Pro tip: submit pages to multiple archives. https://twitter.com/phonedude_mln/status/998948823845261312
  • 23. I want to enable the personal use of web archives… by the general public
  • 24. Web archives to the rescue! https://twitter.com/brian3354/status/966081774194511874
  • 25. Is it really that important to archive instead of just taking a screenshot? https://twitter.com/AngryBlackLady/status/990032514080108544 https://twitter.com/phonedude_mln/status/990070331737100288
  • 26. We should be doing both. https://twitter.com/conspirator0/status/1000475042017366017
  • 27. What have we been doing to make this easier?
  • 28. We wanted to help people create and access local archives
  • 29. We wanted to help people create and access local archives • WARCreate – Google Chrome extension • WAIL – user-friendly Heritrix and OpenWayback • WAIL-Electron – adds browser-based crawling, pywb. “Archive What I See Now: Bringing Institutional Web Archiving Tools to the Individual Researcher”, 2013-2017, HD-51670-13 and HK-50181-14
  • 30. WARCreate (2012): Google Chrome extension. Create local WARC file of currently viewed webpage. http://warcreate.com Mat Kelly and Michele C. Weigle, "WARCreate - Create Wayback-Consumable WARC Files from Any Webpage", JCDL 2012 demo. http://ws-dl.blogspot.com/2013/07/2013-07-10-warcreate-and-wail-warc.html “Archive What I See Now: Bringing Institutional Web Archiving Tools to the Individual Researcher”, 2013-2017, HD-51670-13 and HK-50181-14
  • 31. WAIL (2013): Stand-alone application. Easy install of Heritrix, OpenWayback. Replay local WARCs created with WARCreate. http://machawk1.github.io/wail/ Mat Kelly, Michael L. Nelson and Michele C. Weigle, "Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving Using XAMPP," Poster and demo at Personal Digital Archiving, 2013. http://ws-dl.blogspot.com/2016/06/2016-06-03-lipstick-or-ham-next-steps.html “Archive What I See Now: Bringing Institutional Web Archiving Tools to the Individual Researcher”, 2013-2017, HD-51670-13 and HK-50181-14
  • 32. WAIL-Electron (2017): Update of original WAIL. Adds headless Chrome-based crawling. Replaces OpenWayback with pywb. https://github.com/N0taN3rd/wail John Berlin, Mat Kelly, Michael L. Nelson and Michele C. Weigle, "WAIL: Collection-Based Personal Web Archiving," JCDL 2017, poster. http://ws-dl.blogspot.com/2017/02/2017-02-13-electric-wails-and-ham.html http://ws-dl.blogspot.com/2017/07/2017-07-24-replacing-heritrix-with.html “Archive What I See Now: Bringing Institutional Web Archiving Tools to the Individual Researcher”, 2013-2017, HD-51670-13 and HK-50181-14
  • 33. What did we learn from this? • We need additional Memento support for private web archives • Capturing complex webpages is hard
  • 34. A Memento Meta Aggregator can aggregate public and private archives (2018). Mat Kelly, Michael L. Nelson, and Michele C. Weigle, "A Framework for Aggregating Private and Public Web Archives", JCDL 2018
  • 35. Today’s webpages are super complex (chart: number of network requests per page). John Berlin, "To Relive The Web: A Framework for the Transformation and Archival Replay of Web Pages," ODU Master’s Thesis, 2018.
  • 36. Squidwarc enables high-fidelity browser-based archiving (2017): High fidelity archival crawler. node.js based. Uses Chrome or Chrome Headless. https://github.com/N0taN3rd/Squidwarc John Berlin, "2017-07-24: Replacing Heritrix with Chrome in WAIL, and the release of node-warc, node-cdxj, and Squidwarc", http://ws-dl.blogspot.com/2017/07/2017-07-24-replacing-heritrix-with.html “Archive What I See Now: Bringing Institutional Web Archiving Tools to the Individual Researcher”, 2013-2017, HD-51670-13 and HK-50181-14
  • 37. We wanted to help people submit webpages to public archives
  • 38. We wanted to help people submit webpages to public archives • Mink – Google Chrome extension • #icanhazmemento – Twitter bot • ArchiveNow – Python module, Docker container, local web service
  • 39. Mink (2014): Google Chrome extension. Submit currently viewed webpage to public archives. Access mementos from public archives of currently viewed webpage. Inspired by LANL’s Memento for Chrome, http://ws-dl.blogspot.com/2013/10/2013-10-14-right-click-to-past-memento.html https://github.com/machawk1/Mink Mat Kelly, Michael L. Nelson and Michele C. Weigle, "Mink: Integrating the Live and Archived Web Viewing Experience Using Web Browsers and Memento," JCDL 2014, poster. http://ws-dl.blogspot.com/2014/10/2014-10-03-integrating-live-and.html “Archive What I See Now: Bringing Institutional Web Archiving Tools to the Individual Researcher”, 2014-2017, HK-50181-14
  • 40. Mink (2014): Google Chrome extension. Submit currently viewed webpage to public archives. Access mementos from public archives of currently viewed webpage. Inspired by LANL’s Memento for Chrome, http://ws-dl.blogspot.com/2013/10/2013-10-14-right-click-to-past-memento.html https://github.com/machawk1/Mink Mat Kelly, Michael L. Nelson and Michele C. Weigle, "Mink: Integrating the Live and Archived Web Viewing Experience Using Web Browsers and Memento," JCDL 2014, poster. http://ws-dl.blogspot.com/2014/10/2014-10-03-integrating-live-and.html “Archive What I See Now: Bringing Institutional Web Archiving Tools to the Individual Researcher”, 2014-2017, HK-50181-14
  • 41. #icanhazmemento (2015): Twitter bot. Include #icanhazmemento in a tweet with a URL. Bot replies with a link to the memento of the page closest to the time of the tweet. If the page is not archived, the bot submits the URL to multiple public archives and replies with a link to the memento in Time Travel. https://github.com/anwala/icanhazmemento Alexander Nwala, "2015-07-22: I Can Haz Memento," http://ws-dl.blogspot.com/2015/07/2015-07-22-i-can-haz-memento.html
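The "memento closest to the time of the tweet" step on the #icanhazmemento slide can be illustrated with a minimal Python sketch. This is not the bot's actual code: the function name, the `archive.example` URI-Ms, and the capture times are all hypothetical, and a real bot would obtain the list of mementos from a TimeMap or TimeGate.

```python
from datetime import datetime, timezone


def closest_memento(mementos, target):
    """Return the (uri, captured_at) pair whose capture time is nearest to target.

    Returns None when there are no mementos, which is the case where the bot
    would instead submit the URL to public archives.
    """
    if not mementos:
        return None
    return min(mementos, key=lambda m: abs(m[1] - target))


# Hypothetical TimeMap entries: (URI-M, capture datetime).
mementos = [
    ("https://archive.example/20150101000000/http://example.com/",
     datetime(2015, 1, 1, tzinfo=timezone.utc)),
    ("https://archive.example/20150720120000/http://example.com/",
     datetime(2015, 7, 20, 12, tzinfo=timezone.utc)),
    ("https://archive.example/20160301000000/http://example.com/",
     datetime(2016, 3, 1, tzinfo=timezone.utc)),
]

tweet_time = datetime(2015, 7, 22, 9, 30, tzinfo=timezone.utc)
best = closest_memento(mementos, tweet_time)
print(best[0])
```

An empty result is the signal for the fallback branch described on the slide, where the URL is pushed to multiple archives.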
  • 42. ArchiveNow (2017): Python module, Docker container. Submit URI to multiple archives. Generate local WARCs for private archives. https://github.com/oduwsdl/archivenow Mohamed Aturban, Mat Kelly, Sawood Alam, John Berlin, Michael L. Nelson and Michele C. Weigle, "ArchiveNow: Simplified, Extensible, Multi-Archive Preservation," JCDL 2018, poster. http://ws-dl.blogspot.com/2017/02/2017-02-22-archive-now-archivenow.html “Towards a Web-Centric Approach for Capturing the Scholarly Record”, 2016-2019
  • 43. What did we learn from this? • People want tools to help them submit to public archives • Browser extensions are cool, but don't have much uptake • more on this later…
  • 44. We wanted to help people summarize their archives
  • 45. We wanted to help people summarize their archives • Dark and Stormy Archives (DSA) – Archive-It + Storify • MementoEmbed – web service • #whatdiditlooklike – Twitter bot • Alsummarization – algorithm and web service • TimeMap Visualization, tmvis – node.js-based web service of alsummarization
  • 46. "Dark and Stormy" Archives (2016): Characteristics of human-generated stories; characteristics of Archive-It collections. Steps: exclude duplicates; exclude off-topic pages; exclude non-English language; dynamically slice the collection; cluster the pages in each slice; select high-quality pages from each cluster; order pages by time; visualize. Yasmin AlNoamany, Michele C. Weigle, and Michael L. Nelson, "Generating Stories From Archived Collections," ACM WebSci 2017. http://ws-dl.blogspot.com/2016/09/2016-09-20-promising-scene-at-end-of.html Shawn Jones, "Improving Collection Understanding in Web Archives," JCDL Doctoral Consortium, 2018. http://ws-dl.blogspot.com/2017/12/2017-12-14-storify-will-be-gone-soon-so.html “Combining Social Media Storytelling With Web Archives”, 2015-2019, IMLS National Leadership Grant
  • 47. MementoEmbed (2018): Python module, Docker container. Submit URI-M. Returns an archive-aware social card, with HTML embed code. https://github.com/oduwsdl/MementoEmbed (currently in development) http://ws-dl.blogspot.com/2018/04/2018-04-24-lets-get-visual-and-examine.html Shawn Jones, "Improving Collection Understanding in Web Archives," JCDL Doctoral Consortium, 2018. “Combining Social Media Storytelling With Web Archives”, 2015-2019, IMLS National Leadership Grant
  • 48. MementoEmbed (2018): Python module, Docker container. Submit URI-M. Returns an archive-aware social card, with HTML embed code. https://github.com/oduwsdl/MementoEmbed (currently in development) http://ws-dl.blogspot.com/2018/04/2018-04-24-lets-get-visual-and-examine.html Shawn Jones, "Improving Collection Understanding in Web Archives," JCDL Doctoral Consortium, 2018. “Combining Social Media Storytelling With Web Archives”, 2015-2019, IMLS National Leadership Grant
  • 49. #whatdiditlooklike (2015): Twitter bot. Include #whatdiditlooklike in a tweet with a URL. Bot generates animated GIF of first memento of each year. Bot replies with a link to entry in Tumblr. Tumblr: http://whatdiditlooklike.mementoweb.org/ Source: https://github.com/anwala/wdill Alexander Nwala, "2015-02-05: What Did It Look Like?," http://ws-dl.blogspot.com/2015/01/2015-02-05-what-did-it-look-like.html
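Picking the "first memento of each year", as the #whatdiditlooklike slide describes, could look roughly like the Python sketch below. It is an illustration under assumptions rather than the bot's code: the function name and the `archive.example` URI-Ms are hypothetical, and rendering the selected mementos into an animated GIF is out of scope here.

```python
from datetime import datetime, timezone


def first_memento_per_year(mementos):
    """Return the earliest URI-M for each calendar year, in chronological order."""
    first = {}
    for uri, captured_at in sorted(mementos, key=lambda m: m[1]):
        # setdefault keeps only the first (earliest) capture seen for a year.
        first.setdefault(captured_at.year, uri)
    return [first[year] for year in sorted(first)]


# Hypothetical captures, deliberately out of order.
captures = [
    ("https://archive.example/20130615000000/http://example.com/",
     datetime(2013, 6, 15, tzinfo=timezone.utc)),
    ("https://archive.example/20120301000000/http://example.com/",
     datetime(2012, 3, 1, tzinfo=timezone.utc)),
    ("https://archive.example/20130102000000/http://example.com/",
     datetime(2013, 1, 2, tzinfo=timezone.utc)),
    ("https://archive.example/20121120000000/http://example.com/",
     datetime(2012, 11, 20, tzinfo=timezone.utc)),
]

frames = first_memento_per_year(captures)
for uri in frames:
    print(uri)
```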
  • 50. Alsummarization (2014): Summarize TimeMap. Compare SimHash of HTML, not images. Hamming distance threshold of 4 characters. Views shown: 700 thumbnails; 32 sampled thumbnails; CoverFlow view. https://github.com/machawk1/ArchiveThumbnails Ahmed Alsum and Michael L. Nelson, "Thumbnail Summarization Techniques for Web Archives," ECIR 2014. Mat Kelly, Michael L. Nelson, and Michele C. Weigle, "Visualizing Digital Collections of Web Archives," Web Archiving Collaboration, 2015, http://ws-dl.blogspot.com/2015/06/2015-06-09-web-archiving-collaboration.html “Visualizing Digital Collections of Web Archives”, 2014-2015, Columbia Libraries Web Archiving Incentive Program
  • 51. Choosing mementos based on SimHash. Mementos: M1, M2, M3, M4
  • 52. Choosing mementos based on SimHash. M1: 8c27981eaed151cfa645ad823932eac6; M2: 8c27981eaad951cf8645ad823932eac6; M3: fa3799170258494b9443b9be3977a84e; M4: 5a1534161357da6b827ab98037db2640
  • 53. Choosing mementos based on SimHash. M1: 8c27981eaed151cfa645ad823932eac6; M2: 8c27981eaad951cf8645ad823932eac6; M3: fa3799170258494b9443b9be3977a84e; M4: 5a1534161357da6b827ab98037db2640. Selected: M1
  • 54. Choosing mementos based on SimHash. M1: 8c27981eaed151cfa645ad823932eac6; M2: 8c27981eaad951cf8645ad823932eac6; M3: fa3799170258494b9443b9be3977a84e; M4: 5a1534161357da6b827ab98037db2640. Basis: M1. Hamming distance (M1, M2) < 4: reject M2
  • 55. Choosing mementos based on SimHash. M1: 8c27981eaed151cfa645ad823932eac6; M2: 8c27981eaad951cf8645ad823932eac6; M3: fa3799170258494b9443b9be3977a84e; M4: 5a1534161357da6b827ab98037db2640. Basis: M1. Hamming distance (M1, M3) > 4: select M3
  • 56. Choosing mementos based on SimHash. M1: 8c27981eaed151cfa645ad823932eac6; M2: 8c27981eaad951cf8645ad823932eac6; M3: fa3799170258494b9443b9be3977a84e; M4: 5a1534161357da6b827ab98037db2640. Selected: M1, M3. Basis: M3. Hamming distance (M3, M4) > 4: select M4
  • 57. Choosing mementos based on SimHash. M1: 8c27981eaed151cfa645ad823932eac6; M2: 8c27981eaad951cf8645ad823932eac6; M3: fa3799170258494b9443b9be3977a84e; M4: 5a1534161357da6b827ab98037db2640. Selected: M1, M3, M4
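The selection walk-through on the SimHash slides can be reproduced with a short Python sketch. It assumes the four hashes map to M1 through M4 in the order listed, that distance is counted per differing hex character (matching the "4 characters" threshold), and that a memento is selected only when its distance from the current basis is strictly greater than the threshold; the slides do not say how a distance of exactly 4 is handled, so that choice is an assumption. Function names are hypothetical, and computing the SimHash itself is not shown.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length hash strings differ."""
    if len(a) != len(b):
        raise ValueError("hashes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def select_mementos(hashes, threshold=4):
    """Walk mementos in order, keeping those that differ enough from the basis.

    The first memento is always selected and becomes the basis. Each later
    memento is compared with the current basis; if it is selected, it becomes
    the new basis.
    """
    selected = []
    basis = None
    for name, simhash in hashes:
        if basis is None or hamming_distance(basis, simhash) > threshold:
            selected.append(name)
            basis = simhash
    return selected


# SimHash values as listed on the slides.
hashes = [
    ("M1", "8c27981eaed151cfa645ad823932eac6"),
    ("M2", "8c27981eaad951cf8645ad823932eac6"),
    ("M3", "fa3799170258494b9443b9be3977a84e"),
    ("M4", "5a1534161357da6b827ab98037db2640"),
]

print(hamming_distance(hashes[0][1], hashes[1][1]))  # M1 vs M2
print(select_mementos(hashes))
```

With these inputs M1 and M2 differ in 3 characters, so M2 is rejected, and the selected set is M1, M3, M4, which agrees with the final slide of the sequence.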
  • 58. TimeMap Visualization, tmvis (2017): Web service. Takes URI-R or URI-T. Performs Alsummarization and produces grid view, image slider view, and timeline view. Will produce embeddable version, Wayback extension. https://github.com/oduwsdl/tmvis http://ws-dl.blogspot.com/2017/10/2017-10-16-visualizing-webpage-changes.html Surbhi Shankar, "Visualizing Thumbnails Of Archived Web Pages", ODU MS Project, 2017. Maheedhar Gunnam, "How I Changed Over Time: A webservice to summarize TimeMaps based on SimHashed HTML content", ODU MS Project, 2018. “Visualizing Webpage Changes Over Time”, 2017-2019, HAA-256368-17
  • 59. tmvis – Grid View. “Visualizing Webpage Changes Over Time”, 2017-2019, HAA-256368-17 http://ws-dl.blogspot.com/2017/10/2017-10-16-visualizing-webpage-changes.html
  • 60. tmvis – Image Slider View. “Visualizing Webpage Changes Over Time”, 2017-2019, HAA-256368-17 http://ws-dl.blogspot.com/2017/10/2017-10-16-visualizing-webpage-changes.html
  • 61. tmvis – Timeline View. Uses ProPublica’s TimelineSetter library, http://propublica.github.io/timeline-setter/ “Visualizing Webpage Changes Over Time”, 2017-2019, HAA-256368-17 http://ws-dl.blogspot.com/2017/10/2017-10-16-visualizing-webpage-changes.html
  • 62. What did we learn from this? • Webpages can go off-topic through time • Some mementos aren't captured well • Some mementos aren't replayed well
  • 63. You don't want off-topic mementos in your summary. http://wayback.archive-it.org/2950/*/http://www.indyows.org Captures shown: 2012-01-10, 01:41:57; 2012-04-10, 03:26:34; 2012-04-17, 03:26:15; 2012-04-24, 03:36:58; 2012-05-15, 03:47:04; 2012-07-03, 12:18:48
  • 64. Identify off-topic mementos with Off-Topic Memento Toolkit (2018): Python module. Given a URI-T (TimeMap), identifies off-topic mementos. Option of 8 different similarity measures. OTMT Distribution Page: https://pypi.org/project/otmt/ OTMT Source Code Page: https://github.com/oduwsdl/off-topic-memento-toolkit Shawn Jones, Michele C. Weigle, and Michael L. Nelson, "The Off-Topic Memento Toolkit," iPres 2018. Yasmin AlNoamany, Michele C. Weigle, and Michael L. Nelson, "Detecting Off-Topic Pages Within TimeMaps in Web Archives," IJDL, Vol. 17, No. 3, July 2016. “Tools for Managing Seed URIs”, 2014-2015, Columbia Libraries Web Archiving Incentive Program. “Combining Social Media Storytelling With Web Archives”, 2015-2019, IMLS National Leadership Grant. Example output: {"http://wayback.archive-it.org/1068/timemap/link/http://www.badil.org/": { "http://wayback.archive-it.org/1068/20130307084848/http://www.badil.org/": { "timemap measures": { "cosine": { "stemmed": true, "tokenized": true, "removed boilerplate": true, "comparison score": 0.10969941307631487, "topic status": "off-topic" }, "bytecount": { "stemmed": false, "tokenized": false, "removed boilerplate": false, "comparison score": 0.15971409055425445, "topic status": "on-topic" } }, "overall topic status": "off-topic" }, ...
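The JSON excerpt on the Off-Topic Memento Toolkit slide suggests how a summary tool might filter its input. The Python sketch below reads a report with that nesting (TimeMap URI, then URI-M, then an "overall topic status" field) and lists the mementos flagged as off-topic. The sample is a trimmed copy of the slide's excerpt closed off so that it parses; the helper name is hypothetical and the full OTMT output may contain fields not shown here.

```python
import json

# Trimmed copy of the excerpt on the slide, closed off so it is valid JSON.
SAMPLE_REPORT = """
{
  "http://wayback.archive-it.org/1068/timemap/link/http://www.badil.org/": {
    "http://wayback.archive-it.org/1068/20130307084848/http://www.badil.org/": {
      "timemap measures": {
        "cosine": {
          "comparison score": 0.10969941307631487,
          "topic status": "off-topic"
        },
        "bytecount": {
          "comparison score": 0.15971409055425445,
          "topic status": "on-topic"
        }
      },
      "overall topic status": "off-topic"
    }
  }
}
"""


def off_topic_mementos(report):
    """Return the URI-Ms whose overall topic status is 'off-topic'."""
    flagged = []
    for timemap_uri, mementos in report.items():
        for urim, result in mementos.items():
            if result.get("overall topic status") == "off-topic":
                flagged.append(urim)
    return flagged


report = json.loads(SAMPLE_REPORT)
flagged = off_topic_mementos(report)
for urim in flagged:
    print(urim)
```

In the excerpt the two measures disagree (cosine says off-topic, bytecount says on-topic), so the sketch relies on the overall status rather than any single measure.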
  • 65. You don't want damaged mementos in your summary. https://wayback.archive-it.org/1068/*/http://aappb.org/
  • 66. Memento Damage can tell you how damaged your mementos are (2017): Web service, Docker container. Given URI-M, calculates and analyzes memento damage. Service: http://memento-damage.cs.odu.edu Github: https://github.com/oduwsdl/web-memento-damage Erika Siregar, “Deploying the Memento Damage Service: A Comprehensive Tool for Measuring and Analyzing Damage on Web Archives”, ODU MS Project, 2017. Justin Brunelle, Mat Kelly, Hany SalahEldeen, Michele C. Weigle and Michael L. Nelson, "Not All Mementos Are Created Equal: Measuring the Impact of Missing Resources," IJDL, Vol. 16, No. 3-4, September 2015. http://ws-dl.blogspot.com/2017/11/2017-11-22-deploying-memento-damage.html “Increasing the Value of Existing Web Archives,” 2015-2019, III 1526700
  • 67. Memento Damage can tell you how damaged your mementos are (2017): Web service, Docker container. Given URI-M, calculates and analyzes memento damage. Service: http://memento-damage.cs.odu.edu Github: https://github.com/oduwsdl/web-memento-damage Erika Siregar, “Deploying the Memento Damage Service: A Comprehensive Tool for Measuring and Analyzing Damage on Web Archives”, ODU MS Project, 2017. Justin Brunelle, Mat Kelly, Hany SalahEldeen, Michele C. Weigle and Michael L. Nelson, "Not All Mementos Are Created Equal: Measuring the Impact of Missing Resources," IJDL, Vol. 16, No. 3-4, September 2015. http://ws-dl.blogspot.com/2017/11/2017-11-22-deploying-memento-damage.html “Increasing the Value of Existing Web Archives,” 2015-2019, III 1526700
  • 68. Wayback++ uses client-side rewriting to fix replay-based damaged mementos (2018): Chrome, Firefox extensions. https://github.com/N0taN3rd/WaybackPlusPlus https://www.youtube.com/watch?v=ldyidcaVXHw John Berlin, Michael L. Nelson, and Michele C. Weigle, "Swimming In A Sea Of JavaScript, Or: How I Learned To Stop Worrying And Love High-Fidelity Replay," WADL 2018. http://ws-dl.blogspot.com/2017/01/2017-01-20-cnncom-has-been-unarchivable.html http://ws-dl.blogspot.com/2018/04/2018-05-01-high-fidelity-ms-thesis-to.html
  • 69. Where does this take us?
  • 70. We’ve developed a lot of tools
  • 71. But, can a full professor use them? Frederick P. Brooks, Jr. 1996. The computer scientist as toolsmith II. Commun. ACM 39, 3 (March 1996), 61-68. Fred Brooks says:
  • 72. So, let's think bigger • In a world where the web browser is the Internet, how can we make web archives ubiquitous?
  • 73. So, let's think bigger • In a world where the web browser is the Internet, how can we make web archives ubiquitous? • Bring web archives to the browser - natively. Michele C. Weigle, Michael L. Nelson, Martin Klein, and Herbert Van de Sompel, “The Case for Memento-Aware Browsers”, 2017
  • 74. What if browsers could natively identify mementos? • Look for Memento-Datetime header in HTTP response, e.g. Memento-Datetime: Tue, 08 May 2012 11:24:30 GMT • Use client-side rewriting (Emu) to improve replay • Use native UI elements to annotate composite mementos
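The first bullet on this slide, looking for a Memento-Datetime header in the HTTP response, amounts to a small check that can be sketched in Python. The function name is hypothetical and the headers are passed in as a plain dictionary rather than fetched from a live archive; the date parsing uses the standard library's `email.utils.parsedate_to_datetime`, which handles the HTTP date format shown on the slide.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def memento_datetime(headers):
    """Return the capture datetime if the response is a memento, else None.

    HTTP header names are case-insensitive, so the lookup ignores case.
    """
    for name, value in headers.items():
        if name.lower() == "memento-datetime":
            return parsedate_to_datetime(value)
    return None


# Header value taken from the slide; the other header is illustrative.
archived = {
    "Content-Type": "text/html",
    "Memento-Datetime": "Tue, 08 May 2012 11:24:30 GMT",
}
live = {"Content-Type": "text/html"}

print(memento_datetime(archived))
print(memento_datetime(live))
```

A browser could use the returned datetime to decide whether to show an archive indicator in the address bar and which capture date to display.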
  • 75. Identify mementos in the address bar
  • 76. Identify mementos in the address bar: Archive http://web.archive.org/web/2014030402052012/... Could also identify non-HTML mementos (images, PDF, etc.)
  • 77. Identify temporal inconsistencies: Archive http://web.archive.org/web/20050601025530/... Scott Ainsworth, http://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html
  • 78. Identify temporal inconsistencies: Archive http://web.archive.org/web/20050601025530/... + 5 Years, 11 months (Apr 6, 2011). Scott Ainsworth, http://ws-dl.blogspot.com/2015/12/2015-12-08-evaluating-temporal.html
  • 79. What if browsers could natively interact with Memento aggregators? • Alert users of unarchived pages as they browse • Provide UI elements to summarize and access past versions of the current webpage • Integrate web archives and the past web into “New Tab View”
  • 80. What if browsers could natively interpret and replay WARCs? • Users could share WARCs • Recipient could open the WARC directly in their browser • WARC.js (à la PDF.js for WARCs)
  • 81. What if browsers could natively create mementos? • Push to public web archives • Create local WARCs. Just as easily as taking a screenshot, or maybe along with taking a screenshot. https://twitter.com/conspirator0/status/1000475042017366017
  • 82. Firefox Quantum has brought screenshots natively to the browser
  • 83. Saving full page screenshot
  • 84. Screenshots can be saved in the Mozilla cloud
  • 85. Screenshots have a URI: https://screenshots.firefox.com/MhV6otMl6r2YWOXc/2018.jcdl.org
  • 86. What if these screenshots were Memento-enabled? • Provide Memento HTTP headers for the screenshots • Implement Memento datetime negotiation for the entire screenshot cloud service
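To make "provide Memento HTTP headers for the screenshots" concrete, here is a minimal Python sketch of the headers a screenshot service might attach to a stored screenshot. It is an assumption-laden illustration, not a description of any existing service: the function name, the original URI, and the capture time are hypothetical, and only the Memento-Datetime header and a Link header with rel="original" are produced. Datetime negotiation would additionally need TimeGate and TimeMap links, which are omitted.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_headers(original_uri, captured_at):
    """Build Memento-style response headers for a stored screenshot.

    captured_at must be timezone-aware; it is converted to GMT for the header.
    """
    if captured_at.tzinfo is None:
        raise ValueError("captured_at must be timezone-aware")
    gmt = captured_at.astimezone(timezone.utc)
    return {
        "Memento-Datetime": format_datetime(gmt, usegmt=True),
        "Link": f'<{original_uri}>; rel="original"',
    }


# Hypothetical original page and capture time for a stored screenshot.
headers = memento_headers(
    "http://2018.jcdl.org/",
    datetime(2018, 6, 6, 12, 0, 0, tzinfo=timezone.utc),
)
for name, value in headers.items():
    print(f"{name}: {value}")
```

Rejecting naive datetimes avoids silently treating a local time as GMT, which would misdate the screenshot.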
  • 87. We could build a crowd-sourced archive of screenshots • Take screenshot and save to Memento-enabled screenshot cloud • Option to push live webpage to archive at same time • Then we have both an archived page and a screenshot of the page from very close to the same datetime
  • 88. What about bookmarks? Bookmarking becomes archiving: submit to public web archives; local archive saved to ~/Library/WebArchive/
  • 89. Viewing a bookmark becomes an opportunity to interact with archives
  • 90. Memento Embeds for bookmark view
  • 91. Open live web, local memento, or public memento. Menu options: Open on live web; Open local memento; Open public memento
  • 92. It’s time for browsers to be Memento-aware • Web archives have gone mainstream. • We’ve learned a lot by building tools to enable personal use of web archives. • These ideas need to be integrated directly into browsers for general public use.