Tools for Managing the Past Web
Dr. Michele C. Weigle
Web Sciences and Digital Libraries (WS-DL) Group
Department of Computer Science
Old Dominion University
ODU - ECE Seminar
February 20, 2015
What is the past web?
Why should I care about the
past web and web archives?
The Web holds our stories
But webpages can disappear
• Average lifespan of a webpage: 50-100 days
• A year after publication, about 11% of content shared on social media will be gone.
SalahEldeen and Nelson, "Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?", TPDL 2012
http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
Maybe it's archived?
archive.org/web
Why archives matter
• Malaysia Airlines Flight 17 (MH17)
• Ukrainian separatists originally took credit for downing a transport plane in that location
• Later deleted the post
• Internet Archive had archived the post before deletion
http://www.csmonitor.com/World/Europe/2014/0717/Web-evidence-points-to-pro-Russia-rebels-in-downing-of-MH17-video
Web archiving in the news - 2015
http://www.newyorker.com/magazine/2015/01/26/cobweb
But Wayback is not Google
• Wayback Machine has no full-text search
– too big to be indexed
– 452 billion web pages, 9 petabytes of data
– growing at 20 TB/week
• Enter URL and pick a date
"It’s more like a phone book than like an archive."
-Jill Lepore, The New Yorker
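The "enter URL and pick a date" lookup can be sketched as building a lookup address from those two inputs. This is a minimal sketch, assuming the commonly seen Wayback URL form of a 14-digit timestamp followed by the original URL; the `wayback_url` helper and the `WAYBACK_PREFIX` constant are illustrative names, not part of the talk.

```python
from datetime import datetime

# Assumed public Wayback prefix; adjust for a local Wayback instance.
WAYBACK_PREFIX = "http://web.archive.org/web"


def wayback_url(original_url: str, when: datetime) -> str:
    """Build a Wayback-style address for a page near the requested date."""
    # Wayback timestamps are 14 digits: YYYYMMDDhhmmss.
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


if __name__ == "__main__":
    print(wayback_url("http://www.cs.odu.edu/", datetime(2015, 2, 20)))
```

The archive typically redirects such a request to the capture closest to the requested date, which is why the lookup behaves like a phone book: you need to know the URL first.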
The Internet Archive isn't the
only archive in town
[Chart: # of archived pages]
How can I access the
archives?
MementoFox: http://ws-dl.blogspot.com/2010/03/2010-03-19-mementofox-add-on-released.html
Memento for Chrome: http://ws-dl.blogspot.com/2013/10/2013-10-14-right-click-to-past-memento.html
Mink: http://ws-dl.blogspot.com/2014/10/2014-10-03-integrating-live-and.html
http://www.mementoweb.org
TimeTravel
http://timetravel.mementoweb.org
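The tools above rely on the Memento protocol (RFC 7089), in which a client asks a TimeGate for the version of a page closest to a desired datetime by sending an `Accept-Datetime` header. The sketch below only prepares such a request and makes no network call. The TimeGate prefix and the `timegate_request` helper are assumptions for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumed TimeGate prefix for the Time Travel service.
TIMEGATE_PREFIX = "http://timetravel.mementoweb.org/timegate/"


def timegate_request(original_url: str, when: datetime):
    """Return (URI, headers) for a Memento datetime-negotiation request.

    `when` must be timezone-aware; it is converted to GMT because the
    Accept-Datetime header uses the HTTP-date format.
    """
    if when.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    http_date = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return TIMEGATE_PREFIX + original_url, {"Accept-Datetime": http_date}


if __name__ == "__main__":
    uri, headers = timegate_request(
        "http://www.cs.odu.edu/", datetime(2015, 2, 20, tzinfo=timezone.utc)
    )
    print(uri)
    print(headers)
```

Sending this request with any HTTP client should yield a redirect to a memento, although which archive answers depends on the aggregator.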
ODU WS-DL Projects
Tools for Managing the Past Web
• Archive Quality
• Tweet Intention
• TimeMap Summaries
• Archive What I See Now
• Storytelling for Archives
ODU WS-DL Projects
• Archive Quality
The State of Web Archiving
current: "Hooray! It's in the archive!"
vs.
future: "How well was it archived?"
Damaged Memento
How damaged are these mementos?
• Live web: M = 0.17, D = 0.09
• Missing main: M = 0.24, D = 0.41
• Missing logo + navigation: M = 0.29, D = 0.36
Brunelle, Kelly, SalahEldeen, Weigle, and Nelson, "Not All Mementos Are Created Equal: Measuring the Impact of Missing Resources", JCDL 2014, Best Student Paper
How to detect damage?
vs.
Brunelle et al., JCDL 2014
Good News:
Although M is steady/increasing, D is decreasing
M = percentage of embedded resources missing
D = our damage metric
Sampled 45,000 mementos
- one memento/year of ~1850 webpages
- webpages from Bitly URIs shared over Twitter and Archive-It collections
Brunelle et al., JCDL 2014
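The difference between M and D can be illustrated with a small sketch. M counts every missing embedded resource equally, while a damage score weights each resource by how much it matters to the page. The weights and the `weighted_damage` formula below are hypothetical stand-ins chosen for illustration; they are not the metric published in the JCDL 2014 paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EmbeddedResource:
    uri: str
    missing: bool
    weight: float  # hypothetical importance score, not taken from the paper


def proportion_missing(resources: List[EmbeddedResource]) -> float:
    """M-style score: every embedded resource counts the same."""
    if not resources:
        return 0.0
    return sum(r.missing for r in resources) / len(resources)


def weighted_damage(resources: List[EmbeddedResource]) -> float:
    """D-style score (illustrative): missing weight over total weight."""
    total = sum(r.weight for r in resources)
    if total == 0:
        return 0.0
    return sum(r.weight for r in resources if r.missing) / total


if __name__ == "__main__":
    # Same number of missing resources, very different visual impact.
    missing_main = [
        EmbeddedResource("main.jpg", True, 5.0),
        EmbeddedResource("icon1.png", False, 1.0),
        EmbeddedResource("icon2.png", False, 1.0),
        EmbeddedResource("icon3.png", False, 1.0),
    ]
    print(proportion_missing(missing_main), weighted_damage(missing_main))
```

Under these assumed weights, losing one large central image and losing one small icon give the same M but very different damage scores, which is the gap the slides point to.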
Using JavaScript can result in
damaged mementos
JavaScript is responsible for an increasing proportion of missing embedded resources over time.
Brunelle, Kelly, Weigle and Nelson, "The Impact of JavaScript on Archivability," International Journal of Digital Libraries (IJDL), 2015
http://ws-dl.blogspot.com/2012/10/2012-10-10-zombies-in-archives.html
Sept 3, 2008
2012
Sometimes the live web "leaks" into
the archive
Different parts of a page can be
crawled at different times
Ainsworth and Nelson, "Evaluating Sliding and Sticky Target Policies by Measuring Temporal Drift in Acyclic Walks Through a Web Archive", JCDL 2013
ODU WS-DL Projects
• Tweet Intention
Which page did Chris Hayes
mean to tweet?
Tweet on Oct 3, 2014
Likely target (captured Oct 1, 2014)
What you see depends on
when you click
Oct 9, 2014
Oct 10, 2014
Nov 19-Dec 15, 2014
Today (Feb 2015) – now fergusonaction.com
Mapping Tweet Relevance
SalahEldeen and Nelson, "Reading the Correct History? Modeling Temporal Intention in Resource Sharing", JCDL 2013
Let the reader choose live or
archived
ODU WS-DL Projects
• TimeMap Summaries
Browsing TimeMaps
How were these 4 thumbnails chosen?
What did usps.com look like?
http://whatdiditlooklike.mementoweb.org/
Animated GIF
1st memento of each year
Submit a URL via Twitter:
“#whatdiditlooklike URL”
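Picking the first memento of each year from a TimeMap can be sketched as follows. This assumes the TimeMap has already been parsed into (capture datetime, memento URI) pairs; the `first_memento_per_year` helper and the sample URIs are illustrative, not the service's actual code.

```python
from datetime import datetime
from typing import Iterable, List, Tuple

Memento = Tuple[datetime, str]  # (capture datetime, memento URI)


def first_memento_per_year(mementos: Iterable[Memento]) -> List[Memento]:
    """Keep only the earliest capture in each calendar year, ordered by year."""
    first = {}
    for captured, uri in mementos:
        year = captured.year
        if year not in first or captured < first[year][0]:
            first[year] = (captured, uri)
    return [first[year] for year in sorted(first)]


if __name__ == "__main__":
    sample = [
        (datetime(2013, 6, 1), "memento-c"),   # placeholder URIs
        (datetime(2013, 1, 5), "memento-a"),
        (datetime(2014, 2, 2), "memento-d"),
    ]
    for captured, uri in first_memento_per_year(sample):
        print(captured.date(), uri)
```

Each selected memento would then be rendered as one frame of the animated GIF.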
Which tells you more about the
past of www.apple.com?
700 thumbnails
(not even all of them!)
32 sampled thumbnails
AlSum and Nelson, "Thumbnail Summarization Techniques for Web Archives", ECIR 2014
TimeMap Thumbnail Summaries
• Compare HTML, not images
• Compute SimHash of HTML
– result is a string representing the content of the page
• Calculate Hamming distance between SimHashes of consecutive mementos
• Generate thumbnails of mementos that have at least a 4 character difference in SimHash
– threshold too low -> near duplicate images
– threshold too high -> miss important changes
3 lines of difference
AlSum and Nelson, "Thumbnail Summarization Techniques for Web Archives", ECIR 2014
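The selection steps above can be sketched as code. This is a minimal sketch under stated assumptions: the tokenization, the 64-bit fingerprint, the use of MD5 for token hashes, and the hexadecimal string form are illustrative choices, and the function names are hypothetical. Only the 4-character threshold and the comparison of consecutive mementos come from the slide.

```python
import hashlib
import re
from typing import List


def simhash(html: str, bits: int = 64) -> str:
    """Return a SimHash fingerprint of the text as a fixed-length hex string."""
    counts = [0] * bits
    for token in re.findall(r"\w+", html.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        token_hash = int.from_bytes(digest[: bits // 8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            counts[i] += 1 if (token_hash >> i) & 1 else -1
    value = sum(1 << i for i in range(bits) if counts[i] > 0)
    return format(value, "0{}x".format(bits // 4))


def char_distance(a: str, b: str) -> int:
    """Hamming distance over characters of two equal-length fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(x != y for x, y in zip(a, b))


def pick_thumbnails(fingerprints: List[str], threshold: int = 4) -> List[int]:
    """Indices of mementos that differ enough from their predecessor.

    The first memento is always kept (an assumption of this sketch).
    """
    picked = [0] if fingerprints else []
    for i in range(1, len(fingerprints)):
        if char_distance(fingerprints[i - 1], fingerprints[i]) >= threshold:
            picked.append(i)
    return picked


if __name__ == "__main__":
    pages = ["<p>old home page</p>", "<p>old home page</p>", "<h1>redesign</h1>"]
    prints = [simhash(p) for p in pages]
    print(prints, pick_thumbnails(prints))
```

Raising `threshold` yields fewer thumbnails and risks skipping real changes, while lowering it yields near-duplicate images, which is the trade-off noted on the slide.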
Grid View
Cover Flow View
Embed in Wayback
ODU WS-DL Projects
• Archive What I See Now
Archive What I See Now
• Humanities researchers know they should archive web resources
• Standard web archiving tools are difficult for non-IT experts
"Archive What I See Now", NEH Digital Humanities Implementation Grant, 2014-2017, http://bit.ly/odu-dhig-2014
Why not just take a screenshot or
“save as”?
Can't interact with a screenshot
"Save Page As..." output is difficult to keep organized, especially with multiple captures over time
What about archiving pages behind
authentication or that change quickly?
Facebook - requires login
Twitter - changes faster than typical crawling rate
How we're addressing the problem
• Google Chrome extension
• Archive the current state of the page in standard Web Archive (WARC) format
• Compatible with Wayback
Kelly and Weigle, "WARCreate - Create Wayback-Consumable WARC Files from Any Webpage", JCDL 2012
Kelly, Weigle, and Nelson. "WARCreate - Create Wayback-Consumable WARC Files from Any Webpage," Digital Preservation 2012, Tools Demo Session
WARCreate
WARCreate - Work in Progress
• New modes of operation
– record mode
• while activated, add capture of each page visited to the WARC
– countdown mode
• every interval, refresh and add new capture of page
– event mode
• add new capture of page every time it dynamically reloads or refreshes
What to do with created WARCs?
Kelly, Weigle, and Nelson. "Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving," Personal Digital Archiving 2013, Poster Session
Kelly, Nelson, and Weigle. "WARCreate and WAIL: WARC, Wayback and Heritrix Made Easy," Digital Preservation 2013
WAIL
• Load created WARCs into a Wayback instance on your local computer
• Single-click install of Wayback (and other archiving tools)
• Available for Windows, OS X
Bridging the gap between the past web
and the live web
Mink
Kelly, Nelson, and Weigle, "Mink: Integrating the Live and Archived Web Viewing Experience Using Web Browsers and Memento," poster, ACM/IEEE Digital Libraries (DL), September 2014.
• Google Chrome extension
• For each page you visit, displays the number of archived versions available
• Provides access by date
• Allows for submission to public archiving services
Tools
WARCreate
Mink
WAIL
https://ws-dl.cs.odu.edu/Software
ODU WS-DL Projects
• Storytelling for Archives
Storify
https://storify.com/nzherald/mu
Bookmarking is not preserving
Archive-It Collections
https://archive-it.org/collections/2358
Storytelling For Archives
Archived collections
Storytelling services
Archived enriched stories
AlNoamany, "Using Web Archives to Enrich the Live Web Experience Through Storytelling", TCDL Bulletin, December 2013.
Tools for Storytelling
• Tools for Users
– use existing tools like Storify to view the stories of a collection
• Tools for Curators
– use existing stories to augment your collections
– create stories from your collections
• candidate mementos automatically selected
Story Types
Fixed Page – Fixed Time: differences in GeoIP, mobile, etc.
Fixed Page – Sliding Time: evolution of a single page (or domain) through time
Sliding Page – Fixed Time: different perspectives on a point in time
Sliding Page – Sliding Time: broadest possible coverage of a collection
[Figure: 2x2 grid of story types; axes: URI (same/different) and Time (same/different)]
Issues: topic modeling, eliminating duplicates, maximizing
novelty, structural & content quality
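The four story types follow from two questions: do the selected mementos share one URI, and do they share one point in time? A minimal sketch of that classification is below. The `story_type` helper and the one-day window used to decide "same time" are assumptions for illustration, not part of the talk.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def story_type(
    mementos: List[Tuple[str, datetime]],
    same_time_window: timedelta = timedelta(days=1),  # hypothetical tolerance
) -> Tuple[str, str]:
    """Classify a set of (original URI, capture datetime) pairs."""
    if not mementos:
        raise ValueError("a story needs at least one memento")
    uris = {uri for uri, _ in mementos}
    times = [captured for _, captured in mementos]
    page = "Fixed Page" if len(uris) == 1 else "Sliding Page"
    spread = max(times) - min(times)
    time = "Fixed Time" if spread <= same_time_window else "Sliding Time"
    return page, time


if __name__ == "__main__":
    evolution = [
        ("http://www.apple.com/", datetime(2005, 1, 1)),
        ("http://www.apple.com/", datetime(2015, 1, 1)),
    ]
    print(story_type(evolution))
```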
Web Sciences and Digital Libraries Group (WS-DL)
Faculty
• Dr. Michael L. Nelson
• Dr. Michele C. Weigle
PhD Students
• Scott Ainsworth
• Sawood Alam
• Lulwah Alkwai
• Yasmin AlNoamany
• Mohamed Aturban
• Justin Brunelle
• Mat Kelly
• Corren McCoy
• Shawn Jones
• Amara Naas
• Louis Nguyen
• Alexander Nwala
• Hany SalahEldeen
@WebSciDL
http://ws-dl.cs.odu.edu/
http://ws-dl.blogspot.com/
Dr. Michele C. Weigle
mweigle@cs.odu.edu
@weiglemc
http://www.cs.odu.edu/~mweigle/

Más contenido relacionado

La actualidad más candente

TEDx: Ideas Worth Spreading (Tweets and Retweets)
TEDx: Ideas Worth Spreading (Tweets and Retweets)TEDx: Ideas Worth Spreading (Tweets and Retweets)
TEDx: Ideas Worth Spreading (Tweets and Retweets)Baker Publishing Company
 
Choose Your Own WIki Adventure - V2
Choose Your Own WIki Adventure - V2Choose Your Own WIki Adventure - V2
Choose Your Own WIki Adventure - V2Dan McDowell
 
Wiki Technology By It Rocks
Wiki Technology By It RocksWiki Technology By It Rocks
Wiki Technology By It Rocksnaveenv
 
Wiki Technology By IT ROCKS
Wiki Technology By IT ROCKSWiki Technology By IT ROCKS
Wiki Technology By IT ROCKSnaveenv
 
7steps Flatten Classroom - NCTIES 1145
7steps Flatten Classroom - NCTIES 11457steps Flatten Classroom - NCTIES 1145
7steps Flatten Classroom - NCTIES 1145Vicki Davis
 
Why Should I Care? New Technologies for Libraries & Librarians
Why Should I Care? New Technologies for Libraries & LibrariansWhy Should I Care? New Technologies for Libraries & Librarians
Why Should I Care? New Technologies for Libraries & LibrariansNicole C. Engard
 
Wiki Summer Training2
Wiki Summer Training2Wiki Summer Training2
Wiki Summer Training2Robin Young
 
Building Collaborative Applications with Wikis
Building Collaborative Applications with WikisBuilding Collaborative Applications with Wikis
Building Collaborative Applications with WikisMeredith Farkas
 
Wikispaces MRA 2014
Wikispaces MRA 2014Wikispaces MRA 2014
Wikispaces MRA 2014Melissa Aho
 
Wikis For Teachers
Wikis For TeachersWikis For Teachers
Wikis For TeachersFadzWilson
 
Web 2.0 Tools: Outreach and Community Building
Web 2.0 Tools: Outreach and Community BuildingWeb 2.0 Tools: Outreach and Community Building
Web 2.0 Tools: Outreach and Community BuildingBrian Gray
 
Its Wikiriffic
Its WikirifficIts Wikiriffic
Its WikirifficKaren Luik
 
How Flat is Your Classroom?
How Flat is Your Classroom?How Flat is Your Classroom?
How Flat is Your Classroom?Julie Lindsay
 
OSCON EU 2016 "Seven (More) Deadly Sins of Microservices"
OSCON EU 2016 "Seven (More) Deadly Sins of Microservices"OSCON EU 2016 "Seven (More) Deadly Sins of Microservices"
OSCON EU 2016 "Seven (More) Deadly Sins of Microservices"Daniel Bryant
 
Sinsai.info - How open collaboration helps disaster-affected people.
Sinsai.info - How open collaboration helps disaster-affected people.Sinsai.info - How open collaboration helps disaster-affected people.
Sinsai.info - How open collaboration helps disaster-affected people.Hal Seki
 
The Off-Topic Memento Toolkit
The Off-Topic Memento ToolkitThe Off-Topic Memento Toolkit
The Off-Topic Memento ToolkitShawn Jones
 
Flattening Classrooms Expanding Minds
Flattening Classrooms Expanding MindsFlattening Classrooms Expanding Minds
Flattening Classrooms Expanding MindsVicki Davis
 

La actualidad más candente (19)

TEDx: Ideas Worth Spreading (Tweets and Retweets)
TEDx: Ideas Worth Spreading (Tweets and Retweets)TEDx: Ideas Worth Spreading (Tweets and Retweets)
TEDx: Ideas Worth Spreading (Tweets and Retweets)
 
Choose Your Own WIki Adventure - V2
Choose Your Own WIki Adventure - V2Choose Your Own WIki Adventure - V2
Choose Your Own WIki Adventure - V2
 
Wiki Technology By It Rocks
Wiki Technology By It RocksWiki Technology By It Rocks
Wiki Technology By It Rocks
 
Wiki Technology By IT ROCKS
Wiki Technology By IT ROCKSWiki Technology By IT ROCKS
Wiki Technology By IT ROCKS
 
7steps Flatten Classroom - NCTIES 1145
7steps Flatten Classroom - NCTIES 11457steps Flatten Classroom - NCTIES 1145
7steps Flatten Classroom - NCTIES 1145
 
Why Should I Care? New Technologies for Libraries & Librarians
Why Should I Care? New Technologies for Libraries & LibrariansWhy Should I Care? New Technologies for Libraries & Librarians
Why Should I Care? New Technologies for Libraries & Librarians
 
Wiki Summer Training2
Wiki Summer Training2Wiki Summer Training2
Wiki Summer Training2
 
Building Collaborative Applications with Wikis
Building Collaborative Applications with WikisBuilding Collaborative Applications with Wikis
Building Collaborative Applications with Wikis
 
Wikispaces MRA 2014
Wikispaces MRA 2014Wikispaces MRA 2014
Wikispaces MRA 2014
 
Wikis For Teachers
Wikis For TeachersWikis For Teachers
Wikis For Teachers
 
Web 2.0 Tools: Outreach and Community Building
Web 2.0 Tools: Outreach and Community BuildingWeb 2.0 Tools: Outreach and Community Building
Web 2.0 Tools: Outreach and Community Building
 
Wikiworld for TETC
Wikiworld for TETCWikiworld for TETC
Wikiworld for TETC
 
Its Wikiriffic
Its WikirifficIts Wikiriffic
Its Wikiriffic
 
How Flat is Your Classroom?
How Flat is Your Classroom?How Flat is Your Classroom?
How Flat is Your Classroom?
 
OSCON EU 2016 "Seven (More) Deadly Sins of Microservices"
OSCON EU 2016 "Seven (More) Deadly Sins of Microservices"OSCON EU 2016 "Seven (More) Deadly Sins of Microservices"
OSCON EU 2016 "Seven (More) Deadly Sins of Microservices"
 
Wikiworld4 L Uclass
Wikiworld4 L UclassWikiworld4 L Uclass
Wikiworld4 L Uclass
 
Sinsai.info - How open collaboration helps disaster-affected people.
Sinsai.info - How open collaboration helps disaster-affected people.Sinsai.info - How open collaboration helps disaster-affected people.
Sinsai.info - How open collaboration helps disaster-affected people.
 
The Off-Topic Memento Toolkit
The Off-Topic Memento ToolkitThe Off-Topic Memento Toolkit
The Off-Topic Memento Toolkit
 
Flattening Classrooms Expanding Minds
Flattening Classrooms Expanding MindsFlattening Classrooms Expanding Minds
Flattening Classrooms Expanding Minds
 

Similar a 2015-odu-ece-tools-for-past-web

Wikipedia: Why? Who? and How?
Wikipedia: Why? Who? and How?Wikipedia: Why? Who? and How?
Wikipedia: Why? Who? and How?Don Boozer
 
Digitization Basics for Archives and Special Collections – Part 1: Select and...
Digitization Basics for Archives and Special Collections – Part 1: Select and...Digitization Basics for Archives and Special Collections – Part 1: Select and...
Digitization Basics for Archives and Special Collections – Part 1: Select and...WiLS
 
Web archiving challenges and opportunities
Web archiving challenges and opportunitiesWeb archiving challenges and opportunities
Web archiving challenges and opportunitiesAhmed AlSum
 
Museum Websites
Museum WebsitesMuseum Websites
Museum WebsitesWiLS
 
Tools for Managing the Past Web
Tools for Managing the Past WebTools for Managing the Past Web
Tools for Managing the Past WebMichele Weigle
 
Combining Storytelling and Web Archives
Combining Storytelling and Web ArchivesCombining Storytelling and Web Archives
Combining Storytelling and Web ArchivesMichael Nelson
 
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...The Frick Collection
 
Power to the Users (and Librarians)
Power to the Users (and Librarians)Power to the Users (and Librarians)
Power to the Users (and Librarians)Guus van den Brekel
 
Web 2.0...it’s okay to play!
Web 2.0...it’s okay to play!Web 2.0...it’s okay to play!
Web 2.0...it’s okay to play!daveyp
 
Digital collections: Increasing awareness and use
Digital collections:  Increasing awareness and useDigital collections:  Increasing awareness and use
Digital collections: Increasing awareness and useButtes
 
An introduction to the Wikidata Thesis Toolkit / Helen Williams (London Schoo...
An introduction to the Wikidata Thesis Toolkit / Helen Williams (London Schoo...An introduction to the Wikidata Thesis Toolkit / Helen Williams (London Schoo...
An introduction to the Wikidata Thesis Toolkit / Helen Williams (London Schoo...CILIP MDG
 
TCEA 2011 Presentation --21st Century Librarians
TCEA 2011 Presentation --21st Century LibrariansTCEA 2011 Presentation --21st Century Librarians
TCEA 2011 Presentation --21st Century Librarianstechnolibrary
 
Wordpressrefreshnew2013
Wordpressrefreshnew2013Wordpressrefreshnew2013
Wordpressrefreshnew2013WiLS
 
Library websites of the future
Library websites of the futureLibrary websites of the future
Library websites of the futureRachel Vacek
 
Wikipedia & Cultural Heritage Institutions: Opportunities for Partnership
Wikipedia & Cultural Heritage Institutions: Opportunities for PartnershipWikipedia & Cultural Heritage Institutions: Opportunities for Partnership
Wikipedia & Cultural Heritage Institutions: Opportunities for Partnershipdorohoward
 
Usingweb20toolstoenlivenprojectsnov20 091120174410 Phpapp01
Usingweb20toolstoenlivenprojectsnov20 091120174410 Phpapp01Usingweb20toolstoenlivenprojectsnov20 091120174410 Phpapp01
Usingweb20toolstoenlivenprojectsnov20 091120174410 Phpapp01mfc619
 

Similar a 2015-odu-ece-tools-for-past-web (20)

Wikipedia: Why? Who? and How?
Wikipedia: Why? Who? and How?Wikipedia: Why? Who? and How?
Wikipedia: Why? Who? and How?
 
Digitization Basics for Archives and Special Collections – Part 1: Select and...
Digitization Basics for Archives and Special Collections – Part 1: Select and...Digitization Basics for Archives and Special Collections – Part 1: Select and...
Digitization Basics for Archives and Special Collections – Part 1: Select and...
 
Davis Digital Preservation and the Web: Challenges for Libraries
Davis Digital Preservation and the Web: Challenges for LibrariesDavis Digital Preservation and the Web: Challenges for Libraries
Davis Digital Preservation and the Web: Challenges for Libraries
 
Web archiving challenges and opportunities
Web archiving challenges and opportunitiesWeb archiving challenges and opportunities
Web archiving challenges and opportunities
 
Museum Websites
Museum WebsitesMuseum Websites
Museum Websites
 
Tools for Managing the Past Web
Tools for Managing the Past WebTools for Managing the Past Web
Tools for Managing the Past Web
 
Combining Storytelling and Web Archives
Combining Storytelling and Web ArchivesCombining Storytelling and Web Archives
Combining Storytelling and Web Archives
 
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...
Making the Black Hole Gray: Web Archiving Art Resources at New York Art Resou...
 
Power to the Users (and Librarians)
Power to the Users (and Librarians)Power to the Users (and Librarians)
Power to the Users (and Librarians)
 
Web 2.0...it’s okay to play!
Web 2.0...it’s okay to play!Web 2.0...it’s okay to play!
Web 2.0...it’s okay to play!
 
Digital collections: Increasing awareness and use
Digital collections:  Increasing awareness and useDigital collections:  Increasing awareness and use
Digital collections: Increasing awareness and use
 
An introduction to the Wikidata Thesis Toolkit / Helen Williams (London Schoo...
An introduction to the Wikidata Thesis Toolkit / Helen Williams (London Schoo...An introduction to the Wikidata Thesis Toolkit / Helen Williams (London Schoo...
An introduction to the Wikidata Thesis Toolkit / Helen Williams (London Schoo...
 
OnlineInfo2008wikis
OnlineInfo2008wikisOnlineInfo2008wikis
OnlineInfo2008wikis
 
TCEA 2011 Presentation --21st Century Librarians
TCEA 2011 Presentation --21st Century LibrariansTCEA 2011 Presentation --21st Century Librarians
TCEA 2011 Presentation --21st Century Librarians
 
Wikis Towson OTS
Wikis Towson OTSWikis Towson OTS
Wikis Towson OTS
 
Wordpressrefreshnew2013
Wordpressrefreshnew2013Wordpressrefreshnew2013
Wordpressrefreshnew2013
 
Library websites of the future
Library websites of the futureLibrary websites of the future
Library websites of the future
 
Wikipedia & Cultural Heritage Institutions: Opportunities for Partnership
Wikipedia & Cultural Heritage Institutions: Opportunities for PartnershipWikipedia & Cultural Heritage Institutions: Opportunities for Partnership
Wikipedia & Cultural Heritage Institutions: Opportunities for Partnership
 
Usingweb20toolstoenlivenprojectsnov20 091120174410 Phpapp01
Usingweb20toolstoenlivenprojectsnov20 091120174410 Phpapp01Usingweb20toolstoenlivenprojectsnov20 091120174410 Phpapp01
Usingweb20toolstoenlivenprojectsnov20 091120174410 Phpapp01
 
Wikis
WikisWikis
Wikis
 

Más de Michele Weigle

Comparing the Archival Rate of Arabic, English, Danish, and Korean Language W...
Comparing the Archival Rate of Arabic, English, Danish, and Korean Language W...Comparing the Archival Rate of Arabic, English, Danish, and Korean Language W...
Comparing the Archival Rate of Arabic, English, Danish, and Korean Language W...Michele Weigle
 
WS-DL’s Work towards Enabling Personal Use of Web Archives
WS-DL’s Work towards Enabling Personal Use of Web ArchivesWS-DL’s Work towards Enabling Personal Use of Web Archives
WS-DL’s Work towards Enabling Personal Use of Web ArchivesMichele Weigle
 
Intro to Web Archiving
Intro to Web ArchivingIntro to Web Archiving
Intro to Web ArchivingMichele Weigle
 
Enabling Personal Use of Web Archives
Enabling Personal Use of Web ArchivesEnabling Personal Use of Web Archives
Enabling Personal Use of Web ArchivesMichele Weigle
 
Visualizing Webpage Changes Over Time
Visualizing Webpage Changes Over TimeVisualizing Webpage Changes Over Time
Visualizing Webpage Changes Over TimeMichele Weigle
 
How to Write an Academic Paper
How to Write an Academic PaperHow to Write an Academic Paper
How to Write an Academic PaperMichele Weigle
 
How to Prepare and Give and Academic Presentation
How to Prepare and Give and Academic PresentationHow to Prepare and Give and Academic Presentation
How to Prepare and Give and Academic PresentationMichele Weigle
 
My Academic Story via Internet Archive
My Academic Story via Internet ArchiveMy Academic Story via Internet Archive
My Academic Story via Internet ArchiveMichele Weigle
 
A Retasking Framework For Wireless Sensor Networks
A Retasking Framework For Wireless Sensor NetworksA Retasking Framework For Wireless Sensor Networks
A Retasking Framework For Wireless Sensor NetworksMichele Weigle
 
Strategies for Sensor Data Aggregation in Support of Emergency Response
Strategies for Sensor Data Aggregation in Support of Emergency ResponseStrategies for Sensor Data Aggregation in Support of Emergency Response
Strategies for Sensor Data Aggregation in Support of Emergency ResponseMichele Weigle
 
Detecting Off-Topic Web Pages at #CUWARC
Detecting Off-Topic Web Pages at #CUWARCDetecting Off-Topic Web Pages at #CUWARC
Detecting Off-Topic Web Pages at #CUWARCMichele Weigle
 
Energy Harvesting-aware Design for Wireless Nanonetworks
Energy Harvesting-aware Design for Wireless NanonetworksEnergy Harvesting-aware Design for Wireless Nanonetworks
Energy Harvesting-aware Design for Wireless NanonetworksMichele Weigle
 
2015-capwic-gradschool
2015-capwic-gradschool2015-capwic-gradschool
2015-capwic-gradschoolMichele Weigle
 
TDMA Slot Reservation in Cluster-Based VANETs
TDMA Slot Reservation in Cluster-Based VANETsTDMA Slot Reservation in Cluster-Based VANETs
TDMA Slot Reservation in Cluster-Based VANETsMichele Weigle
 
Visualizing Digital Collections at Archive-It
Visualizing Digital Collections at Archive-ItVisualizing Digital Collections at Archive-It
Visualizing Digital Collections at Archive-ItMichele Weigle
 
Information Visualization - Visualizing Digital Collections at Archive-It
Information Visualization - Visualizing Digital Collections at Archive-ItInformation Visualization - Visualizing Digital Collections at Archive-It
Information Visualization - Visualizing Digital Collections at Archive-ItMichele Weigle
 
Communications and Energy-Harvesting in Nanosensor Networks
Communications and Energy-Harvesting in Nanosensor NetworksCommunications and Energy-Harvesting in Nanosensor Networks
Communications and Energy-Harvesting in Nanosensor NetworksMichele Weigle
 
A Framework for Dynamic Traffic Monitoring Using Vehicular Ad-Hoc Networks
A Framework for Dynamic Traffic Monitoring Using Vehicular Ad-Hoc NetworksA Framework for Dynamic Traffic Monitoring Using Vehicular Ad-Hoc Networks
A Framework for Dynamic Traffic Monitoring Using Vehicular Ad-Hoc NetworksMichele Weigle
 
A Framework for Incident Detection And Notification in Vehicular Ad Hoc Networks
A Framework for Incident Detection And Notification in Vehicular Ad Hoc NetworksA Framework for Incident Detection And Notification in Vehicular Ad Hoc Networks
A Framework for Incident Detection And Notification in Vehicular Ad Hoc NetworksMichele Weigle
 

Más de Michele Weigle (20)

Comparing the Archival Rate of Arabic, English, Danish, and Korean Language W...
Comparing the Archival Rate of Arabic, English, Danish, and Korean Language W...Comparing the Archival Rate of Arabic, English, Danish, and Korean Language W...
Comparing the Archival Rate of Arabic, English, Danish, and Korean Language W...
 
WS-DL’s Work towards Enabling Personal Use of Web Archives
WS-DL’s Work towards Enabling Personal Use of Web ArchivesWS-DL’s Work towards Enabling Personal Use of Web Archives
WS-DL’s Work towards Enabling Personal Use of Web Archives
 
Intro to Web Archiving
Intro to Web ArchivingIntro to Web Archiving
Intro to Web Archiving
 
Enabling Personal Use of Web Archives
Enabling Personal Use of Web ArchivesEnabling Personal Use of Web Archives
Enabling Personal Use of Web Archives
 
Visualizing Webpage Changes Over Time
Visualizing Webpage Changes Over TimeVisualizing Webpage Changes Over Time
Visualizing Webpage Changes Over Time
 
How to Write an Academic Paper
How to Write an Academic PaperHow to Write an Academic Paper
How to Write an Academic Paper
 
How to Prepare and Give and Academic Presentation
How to Prepare and Give and Academic PresentationHow to Prepare and Give and Academic Presentation
How to Prepare and Give and Academic Presentation
 
My Academic Story via Internet Archive
My Academic Story via Internet ArchiveMy Academic Story via Internet Archive
My Academic Story via Internet Archive
 
A Retasking Framework For Wireless Sensor Networks
A Retasking Framework For Wireless Sensor NetworksA Retasking Framework For Wireless Sensor Networks
A Retasking Framework For Wireless Sensor Networks
 
Strategies for Sensor Data Aggregation in Support of Emergency Response
Strategies for Sensor Data Aggregation in Support of Emergency ResponseStrategies for Sensor Data Aggregation in Support of Emergency Response
Strategies for Sensor Data Aggregation in Support of Emergency Response
 
Detecting Off-Topic Web Pages at #CUWARC
2015-odu-ece-tools-for-past-web

  • 1. Tools for Managing the Past Web Dr. Michele C. Weigle Web Sciences and Digital Libraries (WS-DL) Group Department of Computer Science Old Dominion University ODU - ECE Seminar February 20, 2015
  • 2. What is the past web?
  • 3. Why should I care about the past web and web archives?
  • 4. The Web holds our stories
  • 5. But webpages can disappear • Average lifespan of a webpage: 50-100 days • A year after publication, about 11% of content shared on social media will be gone. SalahEldeen and Nelson, "Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost?", TPDL 2012 http://ws-dl.blogspot.com/2012/02/2012-02-11-losing-my-revolution-year.html
  • 6. Maybe it's archived? archive.org/web
  • 7. Why archives matter • Malaysia Airlines Flight 17 (MH17) • Ukrainian separatists originally took credit for downing a transport plane in that location • Later deleted the post • Internet Archive had archived the post before deletion http://www.csmonitor.com/World/Europe/2014/0717/Web-evidence-points-to-pro-Russia-rebels-in-downing-of-MH17-video
  • 8. Web archiving in the news - 2015 http://www.newyorker.com/magazine/2015/01/26/cobweb
  • 9. But Wayback is not Google • Wayback Machine has no full-text search – too big to be indexed – 452 billion web pages, 9 petabytes of data – growing at 20 TB/week • Enter URL and pick a date "It’s more like a phone book than like an archive." -Jill Lepore, The New Yorker
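
The "enter URL and pick a date" lookup on slide 9 maps onto the Wayback Machine's URL pattern, `http://web.archive.org/web/<14-digit timestamp>/<URL>`, which redirects to the capture closest to the requested time. A minimal sketch under that assumption; the helper name `wayback_url` is illustrative and does not come from the talk.

```python
from datetime import datetime


def wayback_url(url: str, when: datetime) -> str:
    """Build a Wayback Machine URL asking for the capture closest to `when`."""
    # Wayback timestamps are 14 digits: YYYYMMDDhhmmss.
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"http://web.archive.org/web/{timestamp}/{url}"


# Example: the capture of a page closest to the date of this talk.
print(wayback_url("http://www.cs.odu.edu/", datetime(2015, 2, 20)))
```

This only builds the address; whether a capture exists near that date depends on the archive.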
  • 10. The Internet Archive isn't the only archive in town [Figure: # of archived pages]
  • 11. How can I access the archives? • MementoFox: http://ws-dl.blogspot.com/2010/03/2010-03-19-mementofox-add-on-released.html • Memento for Chrome: http://ws-dl.blogspot.com/2013/10/2013-10-14-right-click-to-past-memento.html • Mink: http://ws-dl.blogspot.com/2014/10/2014-10-03-integrating-live-and.html http://www.mementoweb.org
  • 12. TimeTravel http://timetravel.mementoweb.org
  • 13. ODU WS-DL Projects Tools for Managing the Past Web • Archive Quality • Tweet Intention • TimeMap Summaries • Archive What I See Now • Storytelling for Archives
  • 14. ODU WS-DL Projects Tools for Managing the Past Web • Archive Quality • Tweet Intention • TimeMap Summaries • Archive What I See Now • Storytelling for Archives
  • 15. The State of Web Archiving • current: "Hooray! It's in the archive!" vs. • future: "How well was it archived?"
  • 17. How damaged are these mementos? M = 0.17 (live web) Brunelle, Kelly, SalahEldeen, Weigle, and Nelson, "Not All Mementos Are Created Equal: Measuring the Impact of Missing Resources", JCDL 2014, Best Student Paper
  • 18. How damaged are these mementos? M = 0.17 (live web); M = 0.24 (missing main) Brunelle, Kelly, SalahEldeen, Weigle, and Nelson, "Not All Mementos Are Created Equal: Measuring the Impact of Missing Resources", JCDL 2014, Best Student Paper
  • 19. How damaged are these mementos? M = 0.17 (live web); M = 0.24 (missing main); M = 0.29 (missing logo + navigation) Brunelle, Kelly, SalahEldeen, Weigle, and Nelson, "Not All Mementos Are Created Equal: Measuring the Impact of Missing Resources", JCDL 2014, Best Student Paper
  • 20. How damaged are these mementos? M = 0.17, D = 0.09 (live web); M = 0.24, D = 0.41 (missing main); M = 0.29, D = 0.36 (missing logo + navigation) Brunelle, Kelly, SalahEldeen, Weigle, and Nelson, "Not All Mementos Are Created Equal: Measuring the Impact of Missing Resources", JCDL 2014, Best Student Paper
  • 21. How to detect damage? Brunelle et al., JCDL 2014
  • 22. Good News: Although M is steady/increasing, D is decreasing • M = percentage missing • D = our damage metric • Sampled 45,000 mementos – one memento/year of ~1850 webpages – webpages from Bitly URIs shared over Twitter and Archive-It collections Brunelle et al., JCDL 2014
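
The difference between M and D can be illustrated with a small sketch. M is computed as the slide defines it (the share of embedded resources that are missing). The slides do not give the formula for D, so `weighted_damage` below is only a hypothetical stand-in that weights each missing resource by an importance score; the `Resource` type, the weights, and both function names are assumptions made for illustration, not the metric from the JCDL 2014 paper.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    uri: str
    missing: bool
    weight: float  # hypothetical importance score (assumption, not from the paper)


def missing_fraction(resources):
    """M: fraction of embedded resources that are missing."""
    if not resources:
        return 0.0
    return sum(r.missing for r in resources) / len(resources)


def weighted_damage(resources):
    """Illustrative stand-in for D: share of total importance that is missing."""
    total = sum(r.weight for r in resources)
    if total == 0:
        return 0.0
    return sum(r.weight for r in resources if r.missing) / total


# Made-up page: one important resource is missing, three minor ones are present.
page = [
    Resource("main.jpg", missing=True, weight=5.0),
    Resource("logo.png", missing=False, weight=2.0),
    Resource("style.css", missing=False, weight=2.0),
    Resource("tracker.gif", missing=False, weight=1.0),
]
print(missing_fraction(page), weighted_damage(page))
```

With these made-up weights the page is missing a quarter of its resources but half of its weighted importance, which is the kind of gap between M and D that slide 20 shows.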
  • 23. Using JavaScript can result in damaged mementos JavaScript is responsible for an increasing proportion of missing embedded resources over time. Brunelle, Kelly, Weigle and Nelson, "The Impact of JavaScript on Archivability," International Journal of Digital Libraries (IJDL), 2015
  • 25. Different parts of a page can be crawled at different times Ainsworth and Nelson, "Evaluating Sliding and Sticky Target Policies by Measuring Temporal Drift in Acyclic Walks Through a Web Archive", JCDL 2013
  • 26. ODU WS-DL Projects Tools for Managing the Past Web • Archive Quality • Tweet Intention • TimeMap Summaries • Archive What I See Now • Storytelling for Archives
  • 27. Which page did Chris Hayes mean to tweet? • Tweet on Oct 3, 2014 • Likely target (captured Oct 1, 2014)
  • 28. What you see depends on when you click • Oct 9, 2014 • Oct 10, 2014 • Nov 19-Dec 15, 2014 • Today (Feb 2015) – now fergusonaction.com
  • 29. Mapping Tweet Relevance SalahEldeen and Nelson, "Reading the Correct History? Modeling Temporal Intention in Resource Sharing", JCDL 2013
  • 30. Let the reader choose live or archived
  • 31. ODU WS-DL Projects Tools for Managing the Past Web • Archive Quality • Tweet Intention • TimeMap Summaries • Archive What I See Now • Storytelling for Archives
  • 32. Browsing TimeMaps How were these 4 thumbnails chosen?
  • 33. What did usps.com look like? http://whatdiditlooklike.mementoweb.org/ • Animated GIF • 1st memento of each year • Submit a URL via Twitter: “#whatdiditlooklike URL”
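
The "1st memento of each year" selection can be sketched as below, assuming a TimeMap has already been reduced to (capture time, memento URI) pairs. The function name and the sample URIs are illustrative; how the service itself picks frames is not described in the slides.

```python
from datetime import datetime


def first_memento_per_year(mementos):
    """Map each year to the earliest (capture time, URI) pair from that year."""
    firsts = {}
    # Sorting by capture time means the first pair seen for a year is its earliest.
    for captured, uri in sorted(mementos):
        firsts.setdefault(captured.year, (captured, uri))
    return firsts


# Made-up TimeMap entries for illustration.
timemap = [
    (datetime(2013, 6, 1), "memento-c"),
    (datetime(2012, 3, 5), "memento-a"),
    (datetime(2012, 9, 9), "memento-b"),
]
for year, (captured, uri) in sorted(first_memento_per_year(timemap).items()):
    print(year, captured.date(), uri)
```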
  • 34. Which tells you more about the past of www.apple.com? • 700 thumbnails (not even all of them!) • 32 sampled thumbnails AlSum and Nelson, "Thumbnail Summarization Techniques for Web Archives", ECIR 2014
  • 35. TimeMap Thumbnail Summaries • Compare HTML, not images • Compute SimHash of HTML – result is a string representing the content of the page • Calculate Hamming distance between SimHashes of consecutive mementos • Generate thumbnails of mementos that have at least a 4 character difference in SimHash – threshold too low -> near duplicate images – threshold too high -> miss important changes [Figure: 3 lines of difference] AlSum and Nelson, "Thumbnail Summarization Techniques for Web Archives", ECIR 2014
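
The steps on slide 35 can be sketched as follows. This is a minimal illustration, not the implementation from the ECIR 2014 paper: the word-level tokenization, the MD5-based token hash, the 64-bit fingerprint, the 16-character hex encoding, and the choice to always keep the first memento are all assumptions. Only the threshold of 4 differing characters and the comparison of consecutive mementos come from the slide.

```python
import hashlib
import re

BITS = 64  # fingerprint width (assumption)


def simhash(html: str) -> str:
    """Return a 16-character hex SimHash of the page's word tokens."""
    votes = [0] * BITS
    for token in re.findall(r"\w+", html.lower()):
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        # Each token votes +1 or -1 on every bit position.
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    fingerprint = sum(1 << i for i in range(BITS) if votes[i] > 0)
    return f"{fingerprint:016x}"


def hamming(a: str, b: str) -> int:
    """Count the character positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def select_for_thumbnails(pages, threshold=4):
    """Indices of mementos whose SimHash differs from the previous memento's
    by at least `threshold` characters; the first memento is always kept."""
    selected = []
    previous = None
    for index, html in enumerate(pages):
        current = simhash(html)
        if previous is None or hamming(previous, current) >= threshold:
            selected.append(index)
        previous = current
    return selected


# Three identical captures collapse to a single thumbnail.
print(select_for_thumbnails(["<p>same page</p>"] * 3))
```

Comparing against the previous memento follows the slide's wording; comparing against the last selected memento instead would avoid missing slow, cumulative change.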
  • 39. ODU WS-DL Projects Tools for Managing the Past Web • Archive Quality • Tweet Intention • TimeMap Summaries • Archive What I See Now • Storytelling for Archives
  • 40. Archive What I See Now • Humanities researchers know they should archive web resources • Standard web archiving tools are difficult for non-IT experts "Archive What I See Now", NEH Digital Humanities Implementation Grant, 2014-2017, http://bit.ly/odu-dhig-2014
  • 41. Why not just take a screenshot or “save as”? • Can't interact with a screenshot • "Save Page As..." output is difficult to keep organized, especially with multiple captures over time
  • 42. What about archiving pages behind authentication or that change quickly? • Facebook - requires login • Twitter - changes faster than typical crawling rate
  • 43. How we're addressing the problem: WARCreate • Google Chrome extension • Archive the current state of the page in standard Web Archive (WARC) format • Compatible with Wayback Kelly and Weigle, "WARCreate - Create Wayback-Consumable WARC Files from Any Webpage", JCDL 2012 Kelly, Weigle, and Nelson. "WARCreate - Create Wayback-Consumable WARC Files from Any Webpage," Digital Preservation 2012, Tools Demo Session
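
To show what "standard WARC format" looks like on disk, the sketch below serializes a single WARC/1.0 `resource` record for a payload that has already been captured. It is an illustration of the record layout only: WARCreate is a Chrome extension, and the record types and headers it actually writes are not described in the slides. The function name is hypothetical.

```python
import uuid
from datetime import datetime, timezone


def warc_resource_record(target_uri: str, payload: bytes,
                         content_type: str = "text/html") -> bytes:
    """Serialize one WARC/1.0 'resource' record for an already-captured payload."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Target-URI: {target_uri}",
        f"WARC-Date: {now}",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    # Header lines end with CRLF; a blank line separates headers from the block,
    # and two CRLFs terminate the record.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return head + payload + b"\r\n\r\n"


record = warc_resource_record("http://example.com/", b"<html>hello</html>")
print(record.decode("utf-8"))
```

A real capture also needs the page's embedded resources and, typically, the HTTP request and response records, so one record like this is not by itself enough for faithful replay.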
  • 44. WARCreate - Work in Progress • New modes of operation – record mode • while activated, add capture of each page visited to the WARC – countdown mode • every interval, refresh and add new capture of page – event mode • add new capture of page every time it dynamically reloads or refreshes
  • 45. What to do with created WARCs? WAIL • Load created WARCs into a Wayback instance on your local computer • Single-click install of Wayback (and other archiving tools) • Available for Windows, OS X Kelly, Weigle, and Nelson. "Making Enterprise-Level Archive Tools Accessible for Personal Web Archiving," Personal Digital Archiving 2013, Poster Session Kelly, Nelson, and Weigle. "WARCreate and WAIL: WARC, Wayback and Heritrix Made Easy," Digital Preservation 2013
  • 46. Bridging the gap between the past web and the live web: Mink • Google Chrome extension • For each page you visit, displays the number of archived versions available • Provides access by date • Allows for submission to public archiving services Kelly, Nelson, and Weigle, "Mink: Integrating the Live and Archived Web Viewing Experience Using Web Browsers and Memento," poster, ACM/IEEE Digital Libraries (DL), September 2014.
  • 47. Tools • WARCreate • Mink • WAIL https://ws-dl.cs.odu.edu/Software
  • 48. ODU WS-DL Projects Tools for Managing the Past Web • Archive Quality • Tweet Intention • TimeMap Summaries • Archive What I See Now • Storytelling for Archives
  • 50. Bookmarking is not preserving
  • 51. Bookmarking is not preserving
  • 52. Archive-It Collections https://archive-it.org/collections/2358
  • 53. Storytelling For Archives • Archived collections • Storytelling services • Archived enriched stories AlNoamany, "Using Web Archives to Enrich the Live Web Experience Through Storytelling", TCDL Bulletin, December 2013.
  • 54. Tools for Storytelling • Tools for Users – use existing tools like Storify to view the stories of a collection • Tools for Curators – use existing stories to augment your collections – create stories from your collections • candidate mementos automatically selected
  • 55. Story Types (axes: Time same/different, URI same/different) • Fixed Page – Fixed Time: differences in GeoIP, mobile, etc. • Fixed Page – Sliding Time: evolution of a single page (or domain) through time • Sliding Page – Fixed Time: different perspectives on a point in time • Sliding Page – Sliding Time: broadest possible coverage of a collection • Issues: topic modeling, eliminating duplicates, maximizing novelty, structural & content quality
  • 56. ODU WS-DL Projects Tools for Managing the Past Web • Archive Quality • Tweet Intention • TimeMap Summaries • Archive What I See Now • Storytelling for Archives
  • 57. Web Sciences and Digital Libraries Group (WS-DL) Faculty • Dr. Michael L. Nelson • Dr. Michele C. Weigle PhD Students • Scott Ainsworth • Sawood Alam • Lulwah Alkwai • Yasmin AlNoamany • Mohamed Aturban • Justin Brunelle • Mat Kelly • Corren McCoy • Shawn Jones • Amara Naas • Louis Nguyen • Alexander Nwala • Hany SalahEldeen @WebSciDL http://ws-dl.cs.odu.edu/ http://ws-dl.blogspot.com/ Dr. Michele C. Weigle mweigle@cs.odu.edu @weiglemc http://www.cs.odu.edu/~mweigle/