Focused Crawl of Web Archives to Build Event Collections
@mart1nkle1n
WebSci 2018, 05/30/2018, Amsterdam, NL
Focused Crawl of Web Archives
to Build Event Collections
Martin Klein
Lyudmila Balakireva
Herbert Van de Sompel
Research Library
Los Alamos National Laboratory
Background – Event Collection Building
• Often orchestrated by subject matter experts, archivists, special collection librarians, technicians
• Potentially with guidance from institutional collection policy
• Results in a list of seeds (URIs, social media accounts, etc.)
• Utilization of crawling services such as Archive-It, Social Feed Manager
Problems and our Approach
• Temporal: time passed since event is of concern
  → Use of web archives
• Selection: seeds often picked manually
  → Use of references from Wikipedia pages
• Relevance: seed assessment often done by humans
  → Use of focused crawling with content and temporal relevance assessment
Inspiration from:
“Extracting Event-Centric Document Collections from Large-Scale Web Archives”
Gerhard Gossen, Elena Demidova, Thomas Risse
https://doi.org/10.1007/978-3-319-67008-9_10
Background – Archived Web
• Web archives are an invaluable resource for researchers, historians, journalists, etc.
• Often broad in scope, large in scale, covering different temporal intervals
• Makes discovery, access, and analysis difficult
Questions
• Can we create event collections by focused crawling online-available web archives?
• How do event collections created from the archived web compare to those created from the live web?
• How does the amount of time passed since the event affect the collections built from the live and the archived web?
• How do event collections built from the archived web compare to manually curated collections?
Experiment
• Topics limited to terror attacks and mass shootings in the U.S.
• From different times in the past
• Focused crawl of:
  a) 22 archives, simultaneously, via Memento infrastructure
  b) the live web
• Take content and temporal relevance into account, equally weighted
• Use events’ Wikipedia page as input for focused crawler
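The equal weighting of content and temporal relevance can be sketched as a simple weighted mean. This is a minimal illustration, assuming both scores are already normalized to [0, 1]; the function name `combined_relevance` and the example scores are hypothetical and do not come from the slides.

```python
def combined_relevance(content_score: float, temporal_score: float,
                       content_weight: float = 0.5) -> float:
    """Weighted mean of content and temporal relevance, both assumed in [0, 1]."""
    for name, value in (("content_score", content_score),
                        ("temporal_score", temporal_score)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # content_weight = 0.5 gives the equal weighting named on the slide.
    return content_weight * content_score + (1.0 - content_weight) * temporal_score


# Hypothetical page: strong topical match, but dated outside the relevant interval.
score = combined_relevance(content_score=0.8, temporal_score=0.0)
```

Keeping the weight as a parameter makes it easy to try other balances, although the experiment described here uses equal weights.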
Content Relevance
1. Content of Wikipedia page + random 60% of page’s references
   • Generate topic vector (TF-IDF of 1-grams + 2-grams)
2. Content of remaining 40% of Wikipedia page’s outlinks
   • Generate topic vector (TF-IDF of 1-grams + 2-grams)
• Compute cosine similarity value between vectors 1 and 2
• Run 10 times
• Take average similarity value as content threshold
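The threshold procedure above could look roughly like the following pure-Python sketch. Several details are assumptions, not taken from the slides: whitespace tokenization, a smoothed IDF computed over the Wikipedia page plus all references, and the toy texts. The function names are hypothetical.

```python
import math
import random
from collections import Counter


def ngrams(text):
    """Lowercased 1-grams plus 2-grams of a whitespace-tokenized text."""
    tokens = text.lower().split()
    return tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def tfidf_vector(terms, idf):
    """Map each term to its term frequency times inverse document frequency."""
    return {term: n * idf.get(term, 0.0) for term, n in Counter(terms).items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def content_threshold(wiki_text, reference_texts, runs=10, share=0.6, seed=0):
    """Average cosine similarity over `runs` random 60/40 splits of the references."""
    rng = random.Random(seed)
    docs = [ngrams(wiki_text)] + [ngrams(text) for text in reference_texts]
    # Smoothed IDF over all documents (an assumption; the slides give no formula).
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + len(docs)) / (1 + n)) + 1.0 for t, n in df.items()}
    similarities = []
    for _ in range(runs):
        shuffled = reference_texts[:]
        rng.shuffle(shuffled)
        cut = round(share * len(shuffled))
        # Vector 1: Wikipedia page plus 60% of references; vector 2: the other 40%.
        part_one = ngrams(wiki_text) + [t for r in shuffled[:cut] for t in ngrams(r)]
        part_two = [t for r in shuffled[cut:] for t in ngrams(r)]
        similarities.append(cosine(tfidf_vector(part_one, idf),
                                   tfidf_vector(part_two, idf)))
    return sum(similarities) / len(similarities)


# Toy stand-ins for the Wikipedia page text and the text of its references.
wiki = "shooting in tucson arizona wounded a congresswoman"
refs = [
    "tucson shooting suspect appears in court",
    "congresswoman wounded in arizona shooting",
    "vigil held in tucson after shooting",
    "arizona mourns victims of tucson shooting",
    "president speaks at tucson memorial",
]
threshold = content_threshold(wiki, refs)
```

A crawled page would then count as content-relevant when its similarity to the topic vector reaches `threshold`; how the comparison is applied during the crawl is not detailed on the slide.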
Temporal Relevance
• Define temporal interval for which crawled pages are considered relevant
• Event date extracted from Wikipedia event page
• Change point determined from graph of proportional Wikipedia page edits per day
[Figure: timeline marked Event Date, Change Point, and Today, with a vertical scale from 0 to 1]
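One way to read the temporal interval is as a window between the event date and the change point. The sketch below assumes a hard cutoff (1.0 inside the window, 0.0 outside); the slides do not spell out how pages outside the interval are scored, so that behavior, the function name, and the example change point are all assumptions.

```python
from datetime import date


def temporal_relevance(page_date: date, event_date: date, change_point: date) -> float:
    """Return 1.0 if page_date falls inside [event_date, change_point], else 0.0."""
    if change_point < event_date:
        raise ValueError("change_point must not precede event_date")
    # Hard cutoff is an assumption; a decaying score would be another option.
    return 1.0 if event_date <= page_date <= change_point else 0.0


event = date(2011, 1, 8)     # Tucson event date, as listed in the deck
change = date(2011, 2, 15)   # hypothetical change point, for illustration only

inside = temporal_relevance(date(2011, 1, 10), event, change)
before = temporal_relevance(date(2010, 12, 31), event, change)
```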
Datetime Extraction
• Extract datetime from pages via:
  • URI
    http://www.cnn.com/2017/12/09/us/wildfire-fighting-tactics/
  • Meta tags
    <meta property="article:published" itemprop="datePublished" content="2017-12-09T10:14:50-05:00" />
  • ODU’s Carbondate tool
    http://carbondate.cs.odu.edu/
  • Memento datetime
  • X-Header
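Three of the extraction sources listed above can be sketched with the Python standard library alone. The function names are hypothetical, the regular expression only covers `/YYYY/MM/DD/` path segments, and the headers argument is assumed to be a plain dict. The `Memento-Datetime` response header is defined by the Memento protocol (RFC 7089); the Carbondate service and the X-Header case are not covered here.

```python
import re
from datetime import datetime
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser

# Matches a /YYYY/MM/DD path segment followed by "/" or the end of the URI.
URI_DATE = re.compile(r"/(\d{4})/(\d{1,2})/(\d{1,2})(?=/|$)")


def date_from_uri(uri):
    """Return a datetime from a /YYYY/MM/DD/ path segment, or None."""
    match = URI_DATE.search(uri)
    if not match:
        return None
    try:
        return datetime(*map(int, match.groups()))
    except ValueError:  # digits that do not form a valid calendar date
        return None


class _MetaDateParser(HTMLParser):
    """Remembers the content of the first publication-date <meta> tag."""

    def __init__(self):
        super().__init__()
        self.found = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        is_date = (attributes.get("itemprop") == "datePublished"
                   or (attributes.get("property") or "").startswith("article:published"))
        if tag == "meta" and is_date and self.found is None:
            self.found = attributes.get("content")


def date_from_meta(html):
    """Return the ISO 8601 datetime from a publication <meta> tag, or None."""
    parser = _MetaDateParser()
    parser.feed(html)
    try:
        return datetime.fromisoformat(parser.found) if parser.found else None
    except ValueError:
        return None


def date_from_memento_header(headers):
    """Parse the Memento-Datetime response header (RFC 1123 date), or None."""
    value = headers.get("Memento-Datetime")
    return parsedate_to_datetime(value) if value else None


uri_date = date_from_uri("http://www.cnn.com/2017/12/09/us/wildfire-fighting-tactics/")
meta_date = date_from_meta(
    '<meta property="article:published" itemprop="datePublished" '
    'content="2017-12-09T10:14:50-05:00" />'
)
memento_date = date_from_memento_header(
    {"Memento-Datetime": "Sat, 09 Dec 2017 15:14:50 GMT"}
)
```

The slides do not state which source wins when several disagree, so the sketch keeps the extractors separate instead of imposing an order.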
Crawls
• Use version of Wikipedia page that was live at change point
• Crawl stop conditions:
  • No more relevant documents left
  • 5 levels deep
• Utilized crawl priority queue
[Figure: crawl tree with the Seed at Level 0, Child 1 to Child 3 at Level 1, and their children (Child 1.1, Child 1.2, Child 2.1, Child 3.1, Child 3.2) at Level 2]
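A crawl driven by a priority queue with the two stop conditions above might be sketched as follows. This is an illustration, not the authors' crawler: the `fetch_links` and `score` callables, the toy link graph, and the choice not to expand pages below the threshold are all assumptions.

```python
import heapq
from itertools import count


def focused_crawl(seed, fetch_links, score, threshold, max_depth=5):
    """Best-first crawl: expand only relevant pages, highest score first."""
    tie_breaker = count()  # keeps heap entries comparable when scores are equal
    frontier = [(-score(seed), next(tie_breaker), 0, seed)]
    seen = {seed}
    relevant = []
    while frontier:  # stop condition: no more relevant documents left to expand
        neg_score, _, depth, uri = heapq.heappop(frontier)
        if -neg_score < threshold:
            continue  # assumption: pages below the threshold are not expanded
        relevant.append((uri, depth))
        if depth >= max_depth:  # stop condition: 5 levels deep
            continue
        for child in fetch_links(uri):
            if child not in seen:
                seen.add(child)
                heapq.heappush(
                    frontier, (-score(child), next(tie_breaker), depth + 1, child)
                )
    return relevant


# Toy link graph and relevance scores standing in for real fetches and scoring.
links = {"seed": ["a", "b"], "a": ["a1"], "b": ["b1"], "a1": [], "b1": []}
scores = {"seed": 1.0, "a": 0.9, "b": 0.2, "a1": 0.7, "b1": 0.95}

collection = focused_crawl("seed", links.__getitem__, scores.__getitem__, threshold=0.5)
```

In this toy run, page `b1` is never reached even though its score is high, because its only parent `b` falls below the threshold. That trade-off is typical of focused crawling under the stated assumption.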
Collections Crawled (in November 2017)
• New York City, October 31st 2017
• San Bernardino, December 2nd 2015
• Tucson, January 8th 2011
• Binghamton, April 3rd 2009
NYC, 10/31/2017 – URIs per Level
[Figure: two panels, Web Archive Crawl and Live Web Crawl. X-axis: crawl depth (0 to 5); left y-axis: number of URIs (0 to 2000); right y-axis: percent (0 to 100); series: All URIs, Relevant URIs]
TUC, 01/08/2011 – URIs per Level
[Figure: two panels, Web Archive Crawl and Live Web Crawl. X-axis: crawl depth (0 to 5); left y-axis: number of URIs (0 to 80000); right y-axis: percent (0 to 100); series: All URIs, Relevant URIs]
NYC, 10/31/2017 – Relevance over…
[Figure: two panels, relevance over Crawled Documents and relevance over Crawl Time]
TUC, 01/08/2011 – Relevance over…
[Figure: two panels, relevance over Crawled Documents and relevance over Crawl Time]
TUC, 01/08/2011 – Comparison to Archive-It
[Figure: x-axis: Documents (0 to 15000); y-axis: Accumulated Relevance (0 to 15000); series: Web Archive Crawl, Archive-It Crawl]
TUC, 01/08/2011 – Web Archive Contributions
• web.archive.org: 75%
• wayback.archive-it.org: 14%
• webarchive.loc.gov: 7%
• web.archive.bibalex.org: 2%
• archive.is: 2%
Takeaways
• Web archives are great resources to build event collections of web resources
• Crawling web archives is much slower than crawling the live web
• Collections about very recent events benefit more from the live web than the archived web
but
• Collections about events from the distant past benefit more from the archived web than the live web
• Utilizing multiple web archives is beneficial for the collection
• Focused crawls have the potential to outperform manual collection building
https://web.archive.org/web/20171206181955/https:/twitter.com/TVNewsArchive/status/938466726190096384
More Related Content

What's hot

Linked Data for Law Libraries: An Introduction
Linked Data for Law Libraries: An IntroductionLinked Data for Law Libraries: An Introduction
Linked Data for Law Libraries: An IntroductionEmily Nimsakont
ย 
2015-03-18 research seminar part 1
2015-03-18 research seminar part 12015-03-18 research seminar part 1
2015-03-18 research seminar part 1ifi8106tlu
ย 
Esshc presentation ashkan
Esshc presentation ashkanEsshc presentation ashkan
Esshc presentation ashkanBram van den Hout
ย 
EUNIS 2018 - Migration of a web service back-end from a relational to a docum...
EUNIS 2018 - Migration of a web service back-end from a relational to a docum...EUNIS 2018 - Migration of a web service back-end from a relational to a docum...
EUNIS 2018 - Migration of a web service back-end from a relational to a docum...Marius Politze
ย 
Zeng marcia ifla-subjectaccesssmartdatadh
Zeng marcia ifla-subjectaccesssmartdatadhZeng marcia ifla-subjectaccesssmartdatadh
Zeng marcia ifla-subjectaccesssmartdatadhMarcia Zeng
ย 
SSHA 2019: Reconstructring a country
SSHA 2019: Reconstructring a countrySSHA 2019: Reconstructring a country
SSHA 2019: Reconstructring a countryRick Mourits
ย 
Life after MARC: Cataloging Tools of the Future
Life after MARC: Cataloging Tools of the FutureLife after MARC: Cataloging Tools of the Future
Life after MARC: Cataloging Tools of the FutureEmily Nimsakont
ย 
Doctoral open day_digital_research_session_Social_Sciences_BL
Doctoral open day_digital_research_session_Social_Sciences_BLDoctoral open day_digital_research_session_Social_Sciences_BL
Doctoral open day_digital_research_session_Social_Sciences_BLAquiles Alencar Brayner
ย 
Walk Before You Run: Prerequisites to Linked Data
Walk Before You Run: Prerequisites to Linked DataWalk Before You Run: Prerequisites to Linked Data
Walk Before You Run: Prerequisites to Linked DataKenning Arlitsch
ย 
OpenGLAM CH Hackathons
OpenGLAM CH HackathonsOpenGLAM CH Hackathons
OpenGLAM CH HackathonsBeat Estermann
ย 
Presenting Your Digital Research
Presenting Your Digital ResearchPresenting Your Digital Research
Presenting Your Digital ResearchShawn Day
ย 
Introducing linked data into BBC News online
Introducing linked data into BBC News onlineIntroducing linked data into BBC News online
Introducing linked data into BBC News onlineJeremy Tarling
ย 
A Comparative Kalendar - DH2013 Presentation
A Comparative Kalendar - DH2013 PresentationA Comparative Kalendar - DH2013 Presentation
A Comparative Kalendar - DH2013 Presentationblalbritton
ย 
Free UKSG webinar: Exploring how emerging open science services can enhance i...
Free UKSG webinar: Exploring how emerging open science services can enhance i...Free UKSG webinar: Exploring how emerging open science services can enhance i...
Free UKSG webinar: Exploring how emerging open science services can enhance i...UKSG: connecting the knowledge community
ย 

What's hot (14)

Linked Data for Law Libraries: An Introduction
Linked Data for Law Libraries: An IntroductionLinked Data for Law Libraries: An Introduction
Linked Data for Law Libraries: An Introduction
ย 
2015-03-18 research seminar part 1
2015-03-18 research seminar part 12015-03-18 research seminar part 1
2015-03-18 research seminar part 1
ย 
Esshc presentation ashkan
Esshc presentation ashkanEsshc presentation ashkan
Esshc presentation ashkan
ย 
EUNIS 2018 - Migration of a web service back-end from a relational to a docum...
EUNIS 2018 - Migration of a web service back-end from a relational to a docum...EUNIS 2018 - Migration of a web service back-end from a relational to a docum...
EUNIS 2018 - Migration of a web service back-end from a relational to a docum...
ย 
Zeng marcia ifla-subjectaccesssmartdatadh
Zeng marcia ifla-subjectaccesssmartdatadhZeng marcia ifla-subjectaccesssmartdatadh
Zeng marcia ifla-subjectaccesssmartdatadh
ย 
SSHA 2019: Reconstructring a country
SSHA 2019: Reconstructring a countrySSHA 2019: Reconstructring a country
SSHA 2019: Reconstructring a country
ย 
Life after MARC: Cataloging Tools of the Future
Life after MARC: Cataloging Tools of the FutureLife after MARC: Cataloging Tools of the Future
Life after MARC: Cataloging Tools of the Future
ย 
Doctoral open day_digital_research_session_Social_Sciences_BL
Doctoral open day_digital_research_session_Social_Sciences_BLDoctoral open day_digital_research_session_Social_Sciences_BL
Doctoral open day_digital_research_session_Social_Sciences_BL
ย 
Walk Before You Run: Prerequisites to Linked Data
Walk Before You Run: Prerequisites to Linked DataWalk Before You Run: Prerequisites to Linked Data
Walk Before You Run: Prerequisites to Linked Data
ย 
OpenGLAM CH Hackathons
OpenGLAM CH HackathonsOpenGLAM CH Hackathons
OpenGLAM CH Hackathons
ย 
Presenting Your Digital Research
Presenting Your Digital ResearchPresenting Your Digital Research
Presenting Your Digital Research
ย 
Introducing linked data into BBC News online
Introducing linked data into BBC News onlineIntroducing linked data into BBC News online
Introducing linked data into BBC News online
ย 
A Comparative Kalendar - DH2013 Presentation
A Comparative Kalendar - DH2013 PresentationA Comparative Kalendar - DH2013 Presentation
A Comparative Kalendar - DH2013 Presentation
ย 
Free UKSG webinar: Exploring how emerging open science services can enhance i...
Free UKSG webinar: Exploring how emerging open science services can enhance i...Free UKSG webinar: Exploring how emerging open science services can enhance i...
Free UKSG webinar: Exploring how emerging open science services can enhance i...
ย 

Similar to Focused Crawl of Web Archives to Build Event Collections

Creating Topical Collections: Web Archives vs. Live Web
Creating Topical Collections:Web Archives vs. Live WebCreating Topical Collections:Web Archives vs. Live Web
Creating Topical Collections: Web Archives vs. Live WebMartin Klein
ย 
Information sharing about Columbia University Libraryโ€™s recent web archiving ...
Information sharing about Columbia University Libraryโ€™s recent web archiving ...Information sharing about Columbia University Libraryโ€™s recent web archiving ...
Information sharing about Columbia University Libraryโ€™s recent web archiving ...Anna Perricci
ย 
2015 04-21-eexcess emtacl
2015 04-21-eexcess emtacl2015 04-21-eexcess emtacl
2015 04-21-eexcess emtaclTamara Pianos
ย 
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...Micah Altman
ย 
Collaboration and Cash: Web Archiving Incentive Awards
Collaboration and Cash: Web Archiving Incentive AwardsCollaboration and Cash: Web Archiving Incentive Awards
Collaboration and Cash: Web Archiving Incentive AwardsAnna Perricci
ย 
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...Martin Klein
ย 
Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...openminted_eu
ย 
Investigating the PROMISE of a Belgian web archive
Investigating the PROMISE of a Belgian web archive Investigating the PROMISE of a Belgian web archive
Investigating the PROMISE of a Belgian web archive Sally Chambers
ย 
Session 1.4 a distributed network of heritage information
Session 1.4   a distributed network of heritage informationSession 1.4   a distributed network of heritage information
Session 1.4 a distributed network of heritage informationsemanticsconference
ย 
A distributed network of digital heritage information - Semantics Amsterdam
A distributed network of digital heritage information - Semantics AmsterdamA distributed network of digital heritage information - Semantics Amsterdam
A distributed network of digital heritage information - Semantics AmsterdamEnno Meijers
ย 
Resources in uct libraries acc_hons_shellyh_2017
Resources in uct libraries acc_hons_shellyh_2017Resources in uct libraries acc_hons_shellyh_2017
Resources in uct libraries acc_hons_shellyh_2017Susanne Noll
ย 
WS-DLโ€™s Work towards Enabling Personal Use of Web Archives
WS-DLโ€™s Work towards Enabling Personal Use of Web ArchivesWS-DLโ€™s Work towards Enabling Personal Use of Web Archives
WS-DLโ€™s Work towards Enabling Personal Use of Web ArchivesMichele Weigle
ย 
WebART: Facilitating Scholarly Use of Web Archives (IIPC, Apr. 2013)
WebART: Facilitating Scholarly Use of Web Archives (IIPC, Apr. 2013)WebART: Facilitating Scholarly Use of Web Archives (IIPC, Apr. 2013)
WebART: Facilitating Scholarly Use of Web Archives (IIPC, Apr. 2013)TimelessFuture
ย 
OCLC Research Update at ALA Chicago. June 26, 2017.
OCLC Research Update at ALA Chicago. June 26, 2017.OCLC Research Update at ALA Chicago. June 26, 2017.
OCLC Research Update at ALA Chicago. June 26, 2017.OCLC
ย 
Estermann Wikidata GLAM Example Projects 20170914
Estermann Wikidata GLAM Example Projects 20170914Estermann Wikidata GLAM Example Projects 20170914
Estermann Wikidata GLAM Example Projects 20170914Beat Estermann
ย 
Making the Black Hole Gray: Implementing the Web Archiving of Specialist Art ...
Making the Black Hole Gray: Implementing the Web Archiving of Specialist Art ...Making the Black Hole Gray: Implementing the Web Archiving of Specialist Art ...
Making the Black Hole Gray: Implementing the Web Archiving of Specialist Art ...The Frick Collection
ย 
Uk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseUk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseRDTF-Discovery
ย 
Digital collections: Increasing awareness and use
Digital collections:  Increasing awareness and useDigital collections:  Increasing awareness and use
Digital collections: Increasing awareness and useButtes
ย 
Towards a Repository for Dutch Development Organizations
Towards a Repository for Dutch Development OrganizationsTowards a Repository for Dutch Development Organizations
Towards a Repository for Dutch Development OrganizationsIAALD Community
ย 
Toward complex e service for management of reserach outcomes, poland
Toward complex e service for management of reserach outcomes, polandToward complex e service for management of reserach outcomes, poland
Toward complex e service for management of reserach outcomes, polandSistemaBibliotecarioSapienza
ย 

Similar to Focused Crawl of Web Archives to Build Event Collections (20)

Creating Topical Collections: Web Archives vs. Live Web
Creating Topical Collections:Web Archives vs. Live WebCreating Topical Collections:Web Archives vs. Live Web
Creating Topical Collections: Web Archives vs. Live Web
ย 
Information sharing about Columbia University Libraryโ€™s recent web archiving ...
Information sharing about Columbia University Libraryโ€™s recent web archiving ...Information sharing about Columbia University Libraryโ€™s recent web archiving ...
Information sharing about Columbia University Libraryโ€™s recent web archiving ...
ย 
2015 04-21-eexcess emtacl
2015 04-21-eexcess emtacl2015 04-21-eexcess emtacl
2015 04-21-eexcess emtacl
ย 
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
SAFETY NETS: RESCUE AND REVIVAL FOR ENDANGERED BORN-DIGITAL RECORDS- Program ...
ย 
Collaboration and Cash: Web Archiving Incentive Awards
Collaboration and Cash: Web Archiving Incentive AwardsCollaboration and Cash: Web Archiving Incentive Awards
Collaboration and Cash: Web Archiving Incentive Awards
ย 
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ย 
Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...Resource sync overview and real-world use cases for discovery, harvesting, an...
Resource sync overview and real-world use cases for discovery, harvesting, an...
ย 
Investigating the PROMISE of a Belgian web archive
Investigating the PROMISE of a Belgian web archive Investigating the PROMISE of a Belgian web archive
Investigating the PROMISE of a Belgian web archive
ย 
Session 1.4 a distributed network of heritage information
Session 1.4   a distributed network of heritage informationSession 1.4   a distributed network of heritage information
Session 1.4 a distributed network of heritage information
ย 
A distributed network of digital heritage information - Semantics Amsterdam
A distributed network of digital heritage information - Semantics AmsterdamA distributed network of digital heritage information - Semantics Amsterdam
A distributed network of digital heritage information - Semantics Amsterdam
ย 
Resources in uct libraries acc_hons_shellyh_2017
Resources in uct libraries acc_hons_shellyh_2017Resources in uct libraries acc_hons_shellyh_2017
Resources in uct libraries acc_hons_shellyh_2017
ย 
WS-DLโ€™s Work towards Enabling Personal Use of Web Archives
WS-DLโ€™s Work towards Enabling Personal Use of Web ArchivesWS-DLโ€™s Work towards Enabling Personal Use of Web Archives
WS-DLโ€™s Work towards Enabling Personal Use of Web Archives
ย 
WebART: Facilitating Scholarly Use of Web Archives (IIPC, Apr. 2013)
WebART: Facilitating Scholarly Use of Web Archives (IIPC, Apr. 2013)WebART: Facilitating Scholarly Use of Web Archives (IIPC, Apr. 2013)
WebART: Facilitating Scholarly Use of Web Archives (IIPC, Apr. 2013)
ย 
OCLC Research Update at ALA Chicago. June 26, 2017.
OCLC Research Update at ALA Chicago. June 26, 2017.OCLC Research Update at ALA Chicago. June 26, 2017.
OCLC Research Update at ALA Chicago. June 26, 2017.
ย 
Estermann Wikidata GLAM Example Projects 20170914
Estermann Wikidata GLAM Example Projects 20170914Estermann Wikidata GLAM Example Projects 20170914
Estermann Wikidata GLAM Example Projects 20170914
ย 
Making the Black Hole Gray: Implementing the Web Archiving of Specialist Art ...
Making the Black Hole Gray: Implementing the Web Archiving of Specialist Art ...Making the Black Hole Gray: Implementing the Web Archiving of Specialist Art ...
Making the Black Hole Gray: Implementing the Web Archiving of Specialist Art ...
ย 
Uk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcaseUk discovery-jisc-project-showcase
Uk discovery-jisc-project-showcase
ย 
Digital collections: Increasing awareness and use
Digital collections:  Increasing awareness and useDigital collections:  Increasing awareness and use
Digital collections: Increasing awareness and use
ย 
Towards a Repository for Dutch Development Organizations
Towards a Repository for Dutch Development OrganizationsTowards a Repository for Dutch Development Organizations
Towards a Repository for Dutch Development Organizations
ย 
Toward complex e service for management of reserach outcomes, poland
Toward complex e service for management of reserach outcomes, polandToward complex e service for management of reserach outcomes, poland
Toward complex e service for management of reserach outcomes, poland
ย 

More from Martin Klein

On the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly WebOn the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly WebMartin Klein
ย 
On the Persistence of Persistent Identifiers of the Scholarly Web
 On the Persistence of Persistent Identifiers of the Scholarly Web On the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly WebMartin Klein
ย 
An Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansAn Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansMartin Klein
ย 
Who is Asking - Humans and Machines Experience a Different Scholarly Web
Who is Asking - Humans and Machines  Experience a Different Scholarly WebWho is Asking - Humans and Machines  Experience a Different Scholarly Web
Who is Asking - Humans and Machines Experience a Different Scholarly WebMartin Klein
ย 
The Memento Tracer Framework: Balancing Quality and Scalability for Web Arch...
The Memento Tracer Framework: Balancing Quality and Scalability  for Web Arch...The Memento Tracer Framework: Balancing Quality and Scalability  for Web Arch...
The Memento Tracer Framework: Balancing Quality and Scalability for Web Arch...Martin Klein
ย 
Memento Tracer An Innovative Approach Towards Balancing Scale and Fidelity f...
Memento Tracer An Innovative Approach Towards Balancing  Scale and Fidelity f...Memento Tracer An Innovative Approach Towards Balancing  Scale and Fidelity f...
Memento Tracer An Innovative Approach Towards Balancing Scale and Fidelity f...Martin Klein
ย 
Comparing the Performance of OAI-PMH with ResourceSync
Comparing the Performance of OAI-PMH with ResourceSyncComparing the Performance of OAI-PMH with ResourceSync
Comparing the Performance of OAI-PMH with ResourceSyncMartin Klein
ย 
Evaluating Memento Service Optimizations
Evaluating Memento Service OptimizationsEvaluating Memento Service Optimizations
Evaluating Memento Service OptimizationsMartin Klein
ย 
An Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansAn Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansMartin Klein
ย 
A Vision of the Libraryโ€™s Role in Archiving Scholarly Artifacts
A Vision of the Libraryโ€™s Role  in Archiving Scholarly ArtifactsA Vision of the Libraryโ€™s Role  in Archiving Scholarly Artifacts
A Vision of the Libraryโ€™s Role in Archiving Scholarly ArtifactsMartin Klein
ย 
First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...Martin Klein
ย 
Smart Routing of Memento Requests
Smart Routing of Memento RequestsSmart Routing of Memento Requests
Smart Routing of Memento RequestsMartin Klein
ย 
A Web-Centric Pipeline for Archiving Scholarly Artifacts
A Web-Centric Pipeline for Archiving Scholarly ArtifactsA Web-Centric Pipeline for Archiving Scholarly Artifacts
A Web-Centric Pipeline for Archiving Scholarly ArtifactsMartin Klein
ย 
Robust Linking to Web Resources
Robust Linking to Web ResourcesRobust Linking to Web Resources
Robust Linking to Web ResourcesMartin Klein
ย 
Signposting for Repositories
Signposting for RepositoriesSignposting for Repositories
Signposting for RepositoriesMartin Klein
ย 
Discovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDDiscovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDMartin Klein
ย 
Using the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly CommunicationUsing the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly CommunicationMartin Klein
ย 
Uniform Access to Raw Mementos
Uniform Access to Raw MementosUniform Access to Raw Mementos
Uniform Access to Raw MementosMartin Klein
ย 
Robust Links - a proposed solution to reference rot in scholarly communication
Robust Links - a proposed solution to reference rot in scholarly communicationRobust Links - a proposed solution to reference rot in scholarly communication
Robust Links - a proposed solution to reference rot in scholarly communicationMartin Klein
ย 
To the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly CommunicationTo the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly CommunicationMartin Klein
ย 

More from Martin Klein (20)

On the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly WebOn the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly Web
ย 
On the Persistence of Persistent Identifiers of the Scholarly Web
 On the Persistence of Persistent Identifiers of the Scholarly Web On the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly Web
ย 
An Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansAn Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly Orphans
ย 
Who is Asking - Humans and Machines Experience a Different Scholarly Web
Who is Asking - Humans and Machines  Experience a Different Scholarly WebWho is Asking - Humans and Machines  Experience a Different Scholarly Web
Who is Asking - Humans and Machines Experience a Different Scholarly Web
ย 
The Memento Tracer Framework: Balancing Quality and Scalability for Web Arch...
The Memento Tracer Framework: Balancing Quality and Scalability  for Web Arch...The Memento Tracer Framework: Balancing Quality and Scalability  for Web Arch...
The Memento Tracer Framework: Balancing Quality and Scalability for Web Arch...
ย 
Memento Tracer An Innovative Approach Towards Balancing Scale and Fidelity f...

Focused Crawl of Web Archives to Build Event Collections

  • 1. Focused Crawl of Web Archives to Build Event Collections
      Martin Klein, Lyudmila Balakireva, Herbert Van de Sompel
      Research Library, Los Alamos National Laboratory
      @mart1nkle1n, WebSci 2018, 05/30/2018, Amsterdam, NL
  • 2. Background: Event Collection Building
      • Often orchestrated by subject matter experts, archivists, special collection librarians, technicians
      • Potentially with guidance from institutional collection policy
      • Results in a list of seeds (URIs, social media accounts, etc.)
      • Utilization of crawling services such as Archive-It, Social Feed Manager
  • 3. Problems and our Approach
      • Temporal: time passed since event is of concern → Use of web archives
      • Selection: seeds often picked manually → Use of references from Wikipedia pages
      • Relevance: seed assessment often done by humans → Use of focused crawling with content and temporal relevance assessment
      Inspiration from: "Extracting Event-Centric Document Collections from Large-Scale Web Archives", Gerhard Gossen, Elena Demidova, Thomas Risse, https://doi.org/10.1007/978-3-319-67008-9_10
  • 4. Background: Archived Web
      • Web archives are an invaluable resource for researchers, historians, journalists, etc.
      • Often broad in scope, large in scale, covering different temporal intervals
      • Makes discovery, access, and analysis difficult
  • 7. Questions
      • Can we create event collections by focused crawling online-available web archives?
      • How do event collections created from the archived web compare to those created from the live web?
      • How does the amount of time passed since the event affect the collections built from the live and the archived web?
      • How do event collections built from the archived web compare to manually curated collections?
  • 8. Experiment
      • Topics limited to terror attacks and mass shootings in the U.S.
      • From different times in the past
      • Focused crawl of:
          a) 22 archives, simultaneously, via Memento infrastructure
          b) the live web
      • Take content and temporal relevance into account, equally weighted
      • Use events' Wikipedia page as input for focused crawler
  • 9. Content Relevance
      1. Content of Wikipedia page + random 60% of page's references
          • Generate topic vector (TF-IDF of 1-grams + 2-grams)
      2. Content of remaining 40% of Wikipedia page's outlinks
          • Generate topic vector (TF-IDF of 1-grams + 2-grams)
      • Compute cosine similarity value between vectors 1 and 2
      • Run 10 times
      • Take average similarity value as content threshold
  • 10. Temporal Relevance
      • Define temporal interval for which crawled pages are considered relevant
      • Event date extracted from Wikipedia event page
      • Change point determined from graph of proportional Wikipedia page edits per day
      [Figure: temporal relevance (0 to 1) over time, with Event Date, Change Point, and Today marked]
  • 11. Datetime Extraction
      • Extract datetime from pages via:
          • URI: http://www.cnn.com/2017/12/09/us/wildfire-fighting-tactics/
          • Meta tags: <meta property="article:published" itemprop="datePublished" content="2017-12-09T10:14:50-05:00" />
          • ODU's Carbondate tool: http://carbondate.cs.odu.edu/
          • Memento datetime
          • X-Header
  • 12. Crawls
      • Use version of Wikipedia page that was live at change point
      • Crawl stop conditions:
          • No more relevant documents left
          • 5 levels deep
      • Utilized crawl priority queue
      [Figure: crawl tree with the Seed at Level 0, Child 1 to Child 3 at Level 1, and their children at Level 2]
  • 13. Collections Crawled (in November 2017)
      • New York City, October 31st 2017
      • San Bernardino, December 2nd 2015
      • Tucson, January 8th 2011
      • Binghamton, April 3rd 2009
  • 14. NYC, 10/31/2017: URIs per Level
      [Figure: two panels, Web Archive Crawl and Live Web Crawl; x-axis: crawl depth (0 to 5); y-axes: number of URIs and percent; series: All URIs, Relevant URIs]
  • 15. TUC, 01/08/2011: URIs per Level
      [Figure: two panels, Web Archive Crawl and Live Web Crawl; x-axis: crawl depth (0 to 5); y-axes: number of URIs and percent; series: All URIs, Relevant URIs]
  • 16. NYC, 10/31/2017: Relevance over...
      [Figure: two panels, Crawled Documents and Crawl Time]
  • 17. TUC, 01/08/2011: Relevance over...
      [Figure: two panels, Crawled Documents and Crawl Time]
  • 18. TUC, 01/08/2011: Comparison to Archive-It
      [Figure: accumulated relevance over number of documents (0 to 15,000); series: Web Archive Crawl, Archive-It Crawl]
  • 19. TUC, 01/08/2011: Web Archive Contributions
      • web.archive.org: 75%
      • wayback.archive-it.org: 14%
      • webarchive.loc.gov: 7%
      • web.archive.bibalex.org: 2%
      • archive.is: 2%
  • 20. Takeaways
      • Web archives are great resources to build event collections of web resources
      • Crawling web archives is much slower than the live web
      • Collections about very recent events benefit more from the live web than the archived web, but
      • Collections about events from the distant past benefit more from the archived web than the live web
      • Utilizing multiple web archives is beneficial for the collection
      • Focused crawls have the potential to outperform manual collection building
  • 21. https://web.archive.org/web/20171206181955/https:/twitter.com/TVNewsArchive/status/938466726190096384
  • 22. Focused Crawl of Web Archives to Build Event Collections
      Martin Klein, Lyudmila Balakireva, Herbert Van de Sompel
      Research Library, Los Alamos National Laboratory
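
The relevance scoring outlined on the Experiment, Content Relevance, and Temporal Relevance slides can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, the smoothed IDF formula is one common choice, the TF-IDF statistics are computed over just the two documents being compared, and the linear decay between the change point and the crawl date is an assumption (the slides only show relevance at 1 between event date and change point and at 0 at the edges).

```python
import math
import random
from collections import Counter


def ngrams(text):
    """Return the lowercase 1-grams and 2-grams of a whitespace-tokenized text."""
    tokens = text.lower().split()
    return tokens + [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    counts = [Counter(ngrams(d)) for d in docs]
    df = Counter(term for c in counts for term in c)  # document frequency
    n = len(docs)
    # Smoothed IDF (an assumption) keeps terms shared by all documents non-zero.
    return [
        {t: tf * (math.log((1 + n) / (1 + df[t])) + 1) for t, tf in c.items()}
        for c in counts
    ]


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(
        sum(w * w for w in b.values())
    )
    return dot / norm if norm else 0.0


def content_threshold(page_text, reference_texts, runs=10, seed=0):
    """Average similarity between (page + random 60% of linked texts) and the
    remaining 40%, over several random splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(runs):
        refs = reference_texts[:]
        rng.shuffle(refs)
        cut = round(0.6 * len(refs))
        v1, v2 = tfidf_vectors(
            [" ".join([page_text] + refs[:cut]), " ".join(refs[cut:])]
        )
        scores.append(cosine(v1, v2))
    return sum(scores) / len(scores)


def temporal_relevance(page_date, event_date, change_point, crawl_date):
    """1.0 between event date and change point; assumed linear decay to 0.0
    at the crawl date; 0.0 outside that range."""
    if page_date < event_date or page_date > crawl_date:
        return 0.0
    if page_date <= change_point:
        return 1.0
    span = (crawl_date - change_point).days
    return (crawl_date - page_date).days / span if span else 0.0


def combined_relevance(content_score, temporal_score):
    """Equal weighting of content and temporal relevance."""
    return 0.5 * content_score + 0.5 * temporal_score
```

In a crawler built along these lines, a page's content score would be its cosine similarity to the event's topic vector, compared against the value returned by `content_threshold`, and the combined score could serve as the priority in the crawl queue.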