Archival HTTP Redirection
Retrieval Policies
Temporal Web Analytics Workshop 2013, Rio de Janeiro
Ahmed AlSum,
Michael L. Nelson
Old Dominion University
Norfolk VA, USA
{aalsum,mln}@cs.odu.edu
Robert Sanderson,
Herbert Van de Sompel
Los Alamos National Laboratory
Los Alamos NM, USA
{rsanderson,herbertv}@lanl.gov
1
Agenda
• Introduction
• Abstract Model
• Experiment And Results
• Retrieval Policies
2
Memento Terminology
URI-R, R: Original Resource (e.g., http://www.amazon.com)
URI-M, M: Memento (e.g., http://web.archive.org/web/20110411070244/http://amazon.com)
URI-T, TM: TimeMap
3
Live Redirect
http://bit.ly/r9kIfC redirects to http://www.cs.odu.edu
% curl -I http://bit.ly/r9kIfC
HTTP/1.1 301 Moved
….
Location: http://www.cs.odu.edu/
…
4
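Scripted, the same check only needs the status line and the Location header. A minimal Python sketch, assuming the raw `curl -I` output is already available as a string (the function does no network I/O and expects a single response, as in the output above):

```python
from typing import Optional, Tuple


def parse_head_response(raw: str) -> Tuple[int, Optional[str]]:
    """Return (status code, Location header or None) from `curl -I`-style text."""
    lines = [line.strip() for line in raw.strip().splitlines() if line.strip()]
    # The status line looks like "HTTP/1.1 301 Moved": the code is the second field.
    status = int(lines[0].split()[1])
    location = None
    for line in lines[1:]:
        name, sep, value = line.partition(":")
        # Header names are case-insensitive; only Location matters for a redirect.
        if sep and name.strip().lower() == "location":
            location = value.strip()
    return status, location


raw = "HTTP/1.1 301 Moved\nLocation: http://www.cs.odu.edu/\n"
print(parse_head_response(raw))  # (301, 'http://www.cs.odu.edu/')
```

A 3xx status together with a non-empty Location is what marks http://bit.ly/r9kIfC as a live redirect.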
Live Redirect
http://bit.ly/r9kIfC redirects to http://www.cs.odu.edu
5
Archived Redirect
www.draculathemusical.co.uk redirects to www.dracula-uk.com/index.html
http://api.wayback.archive.org/memento/20020212194020/http://www.draculathemusical.co.uk/
Archived redirects
http://api.wayback.archive.org/memento/20020212194020/http://www.geocities.com/draculathemusical
6
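Recognizing an archived redirect starts with taking a URI-M apart. A Python sketch, assuming the layout seen in the URI-Ms above (archive prefix, 14-digit timestamp, URI-R) and treating the timestamp as UTC; other archives may lay out their URI-Ms differently:

```python
import re
from datetime import datetime, timezone
from typing import Tuple

# Assumed URI-M layout, inferred from the examples above:
#   <archive prefix>/<14-digit timestamp YYYYMMDDhhmmss>/<URI-R>
URI_M_PATTERN = re.compile(r"^https?://.+?/(?P<ts>\d{14})/(?P<uri_r>.+)$")


def split_uri_m(uri_m: str) -> Tuple[datetime, str]:
    """Split a Wayback-style URI-M into (memento datetime, URI-R)."""
    match = URI_M_PATTERN.match(uri_m)
    if match is None:
        raise ValueError(f"not a Wayback-style URI-M: {uri_m}")
    # Assumption: the 14-digit timestamp is in UTC.
    when = datetime.strptime(match.group("ts"), "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return when, match.group("uri_r")


archived = "http://api.wayback.archive.org/memento/20020212194020/http://www.draculathemusical.co.uk/"
target = "http://api.wayback.archive.org/memento/20020212194020/http://www.geocities.com/draculathemusical"
# Same archival datetime, different URI-R: the memento redirects to another resource.
print(split_uri_m(archived)[1], "->", split_uri_m(target)[1])
# prints: http://www.draculathemusical.co.uk/ -> http://www.geocities.com/draculathemusical
```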
Abstract Model
7
Abstract Model
M1 M2 M3
8
URI Stability
• A URI’s stability is derived from the number of changes in its HTTP response across time (200, 3xx, or 4xx) and the number of different URIs in the “Location” header of its 3xx responses.
High stability = 1; no stability = 0.
9
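The slide does not give an exact formula. One hypothetical normalization, sketched in Python: count a change whenever consecutive mementos differ in status class (200, 3xx, 4xx) or, for two consecutive 3xx responses, in their Location URI, then report 1 - changes / (n - 1). It keeps the 0 to 1 scale above and treats a URI with no mementos as 0; it is an illustration, not the authors' definition.

```python
from typing import List, Optional, Tuple

# One observation per memento, in datetime order: (status code, Location or None)
Observation = Tuple[int, Optional[str]]


def status_class(code: int) -> str:
    """Group codes as on the slide: 200, 3xx, 4xx; anything else is kept as-is."""
    if code == 200:
        return "200"
    if 300 <= code < 400:
        return "3xx"
    if 400 <= code < 500:
        return "4xx"
    return str(code)


def stability(observations: List[Observation]) -> float:
    """Hypothetical stability score: 1 - changes / (n - 1)."""
    if not observations:
        return 0.0  # no mementos: treated as no stability
    if len(observations) == 1:
        return 1.0  # a single memento cannot change
    changes = 0
    for (prev_code, prev_loc), (code, loc) in zip(observations, observations[1:]):
        if status_class(prev_code) != status_class(code):
            changes += 1  # status class changed (e.g. 200 -> 3xx)
        elif status_class(code) == "3xx" and prev_loc != loc:
            changes += 1  # still redirecting, but to a different URI
    return 1.0 - changes / (len(observations) - 1)
```

For example, three consecutive 301 mementos redirecting to `http://a/`, `http://b/`, `http://b/` contain one change over two transitions and score 0.5.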
Timemap Redirection Categories
• Category 1
All Mementos have the 200 HTTP status code.
10
Timemap Redirection Categories
• Category 2
All Mementos have redirection to the same URI.
11
Timemap Redirection Categories
• Category 3
All Mementos have redirection to different URIs.
12
Timemap Redirection Categories
• Category 4
Mementos have different HTTP status codes.
13
Timemap Redirection Categories
Category 1: All Mementos have the 200 HTTP status code.
Category 2: All Mementos have redirection to the same URI.
Category 3: All Mementos have redirection to different URIs.
Category 4: Mementos have different HTTP status codes.
14
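A sketch of how the four categories could be assigned, in Python, assuming each memento is reduced to (status code, Location or None). TimeMaps with no mementos, or whose mementos all share one code that is neither 200 nor 3xx (for example, all 404), fall outside the four categories and get None here; that handling is an assumption.

```python
from typing import List, Optional, Tuple

Observation = Tuple[int, Optional[str]]  # (status code, Location or None) per memento


def timemap_category(observations: List[Observation]) -> Optional[int]:
    """Return TimeMap redirection category 1-4, or None if none applies."""
    if not observations:
        return None  # URI has no mementos at all
    codes = [code for code, _ in observations]
    if all(code == 200 for code in codes):
        return 1  # all mementos 200
    if all(300 <= code < 400 for code in codes):
        targets = {loc for _, loc in observations}
        return 2 if len(targets) == 1 else 3  # same vs. different redirect targets
    if len(set(codes)) > 1:
        return 4  # mixed status codes
    return None  # e.g. every memento is 404
```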
URI Reliability
M1 → 3xx → R′M (rel=original) → ? (200, 404, or 3xx)
M2 → 3xx → R′M (rel=original) → ? (200, 404, or 3xx)
M3 → 3xx → R′M (rel=original) → ? (200, 404, or 3xx)
15
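The question marks are what a reliability check has to resolve: where does each 3xx memento finally end up? A Python sketch, assuming the archive is modeled as a hypothetical in-memory dictionary from URI-M to (status, Location) in place of real HTTP requests. Reliability is taken here as the share of mementos whose redirect chain ends in 200, which may differ from the authors' exact measure.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical stand-in for the archive: URI-M -> (status code, Location or None)
Archive = Dict[str, Tuple[int, Optional[str]]]


def final_status(archive: Archive, uri_m: str, max_hops: int = 10) -> Optional[int]:
    """Follow 3xx responses inside the archive and return the last status seen.

    Returns None if a target is not archived, a loop is found,
    or more than max_hops redirects are needed.
    """
    seen = set()
    current = uri_m
    for _ in range(max_hops + 1):
        if current in seen or current not in archive:
            return None
        seen.add(current)
        code, location = archive[current]
        if not (300 <= code < 400) or location is None:
            return code  # chain ends here: 200, 404, or a 3xx without Location
        current = location
    return None


def reliability(archive: Archive, uri_ms: Iterable[str]) -> float:
    """Share of the given mementos whose redirect chain ends in 200."""
    uri_ms = list(uri_ms)
    if not uri_ms:
        return 0.0
    ok = sum(1 for u in uri_ms if final_status(archive, u) == 200)
    return ok / len(uri_ms)
```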
HTTP Redirection Relationship
between URI-R & URI-M
Live URI-R OK, archived URI-M OK: Case 1
Live URI-R OK, archived URI-M redirection: Case 2
Live URI-R redirection, archived URI-M redirection: Cases 3 and 4
Live URI-R redirection, archived URI-M OK: Case 5
16
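The case matrix amounts to a lookup on two booleans: does the live URI-R redirect, and does the URI-M redirect? A small Python sketch of that lookup; it covers only 200 and 3xx codes and does not try to separate Case 3 from Case 4, since the slide does not state what distinguishes them.

```python
from typing import Tuple


def is_redirect(code: int) -> bool:
    return 300 <= code < 400


def redirection_case(live_code: int, memento_code: int) -> Tuple[int, ...]:
    """Return the candidate case number(s) for a (live URI-R, URI-M) status pair."""
    for code in (live_code, memento_code):
        if code != 200 and not is_redirect(code):
            raise ValueError(f"status {code} is outside the OK/redirection matrix")
    table = {
        (False, False): (1,),   # live OK, archive OK
        (False, True): (2,),    # live OK, archive redirection
        (True, True): (3, 4),   # redirection on both sides
        (True, False): (5,),    # live redirection, archive OK
    }
    return table[(is_redirect(live_code), is_redirect(memento_code))]
```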
Experiment & Results
17
Experiment
• Dataset: 10,000 sample URIs from
• The dataset contains no bit.ly or DOI URIs.
• The experiment focused on the root page only (no embedded resources).
Live HTTP status code (10,000 URI-Rs)
OK (200): 82.83%
Redirection (3xx): 14.71%
  301: 8.4%
  302: 6.1%
  others: 0.2%
Not-Found (4xx): 1.18%
Others: 1.28%
Memento HTTP status code (894,717 URI-Ms)
OK (200): 93.46%
Redirection (3xx): 5.69%
Not-Found (4xx): 0.26%
Others: 0.59%
18
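Percentages like these come from a plain tally. A Python sketch, assuming the status codes have already been collected into a list (one per URI-R or one per URI-M); the group labels mirror the tables above, and the overall 3xx share is the sum of the three redirection groups.

```python
from collections import Counter
from typing import Dict, Iterable


def status_group(code: int) -> str:
    """Map a status code to the groups used in the tables above."""
    if code == 200:
        return "OK (200)"
    if code in (301, 302):
        return f"Redirection ({code})"
    if 300 <= code < 400:
        return "Redirection (others)"
    if 400 <= code < 500:
        return "Not-Found (4xx)"
    return "Others"


def status_distribution(codes: Iterable[int]) -> Dict[str, float]:
    """Percentage of responses falling into each group."""
    counts = Counter(status_group(c) for c in codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: 100.0 * n / total for group, n in counts.items()}
```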
Figures: Time span; Number of Mementos
19
URI Stability
TimeMap category: percentage of URIs, stability
All Mementos have OK: 52%, 1
Mementos have mixed status codes: 36%, 0.91
All Mementos have redirection: 0.92%, 0.85
  Redirection to the same URI: 0.62%
  Redirection to different URIs: 0.30%
URI has no Mementos at all: 10.97%, 0
Figures: Stability in semi-log scale; Stability for |TM(R)| < 300
20
URI Reliability
• 23% of the mementos did not ultimately lead to a successful memento.
Figures: Reliability in semi-log scale; Reliability for |TM(R)| < 300
21
HTTP Redirection Relationship
between URI-R & URI-M
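Live URI-R OK, archived URI-M OK: Case 1 (80.8%)
Live URI-R OK, archived URI-M redirection: Case 2 (2.74%)
Live URI-R redirection, archived URI-M redirection: Case 3 (1.34%), Case 4 (1.33%)
Live URI-R redirection, archived URI-M OK: Case 5 (13.7%)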
22
RETRIEVAL POLICIES
ARCHIVED HTTP REDIRECTION RETRIEVAL POLICIES
23
Current Wayback
Machine Policy
24
Policy one:
URI-R with HTTP redirection
Flowchart: retrieve the memento M for R, then test Status(M) = 200 and Status(M) = 3xx (Yes/No branches); the outcomes are Stop or Go to Policy 2.
25
Policy one:
URI-R with HTTP redirection
• Evaluation:
o Policy scope: 1,471 URIs that redirect on the live web
o 77 of these 1,471 URIs have no mementos at all
o For 17 of the 77, mementos were retrieved by following the live redirection
• Implementation
o MementoFox v 9.6+
o mcurl v 1.0
o IA Wayback Machine (for bit.ly URIs only)
26
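A minimal Python sketch of the fallback that Policy one's evaluation describes: a URI-R that redirects on the live web but has no mementos is retried with its live redirect target. The two callables are hypothetical stand-ins for a TimeMap lookup and a live HEAD request, and the flowchart's handling of 3xx mementos (Policy two) is not modeled here.

```python
from typing import Callable, List, Optional


def policy_one(
    uri_r: str,
    get_mementos: Callable[[str], List[str]],
    live_location: Callable[[str], Optional[str]],
) -> List[str]:
    """Hypothetical sketch of Policy one.

    get_mementos(uri) returns the URI-Ms an archive holds for uri;
    live_location(uri) returns the live Location header if uri redirects, else None.
    """
    mementos = get_mementos(uri_r)
    if mementos:
        return mementos  # the archive already has mementos for R
    target = live_location(uri_r)
    if target is None:
        return []  # no mementos and no live redirect: nothing else to try
    return get_mementos(target)  # fall back to the live redirect target
```

With dictionaries as stubs, `policy_one("http://bit.ly/x", lambda u: archive.get(u, []), live.get)` returns the mementos of whatever `live["http://bit.ly/x"]` points to when `archive` holds nothing for the bit.ly URI itself.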
Policy two:
URI-M with HTTP redirection
http://www.cnn.com/
Accept-Datetime: Sun, 13 May 2006
http://www.cnn.com/
27
Policy two:
URI-M with HTTP redirection
• Evaluation:
o Policy scope: 2,980 TimeMaps that showed an HTTP redirection status code in at least one memento
o Success criterion: applying Policy two added mementos to the original TimeMap
o Success rate: 58% of the cases
28
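One way to read Policy two, sketched in Python under stated assumptions: for each memento that answers with 3xx, resolve the URI-R of its redirect target (for example, through the Memento `rel="original"` link) and merge that URI-R's TimeMap into the original one. Success, in the sense of the evaluation above, then means the merged TimeMap is larger than the original. The entry layout and both callables are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical TimeMap entry: (14-digit datetime, URI-M, status code, Location or None)
Entry = Tuple[str, str, int, Optional[str]]


def policy_two(
    timemap: List[Entry],
    original_of: Callable[[str], Optional[str]],
    fetch_timemap: Callable[[str], List[Entry]],
) -> List[Entry]:
    """Expand a TimeMap with mementos of the URI-Rs its 3xx mementos redirect to.

    original_of(location) maps a redirect Location (a URI-M) to its URI-R;
    fetch_timemap(uri_r) returns that URI-R's TimeMap.
    """
    merged: Dict[str, Entry] = {entry[1]: entry for entry in timemap}
    visited = set()
    for _, _, status, location in timemap:
        if not (300 <= status < 400) or location is None:
            continue  # only redirecting mementos can expand the TimeMap
        target = original_of(location)
        if target is None or target in visited:
            continue
        visited.add(target)
        for entry in fetch_timemap(target):
            merged.setdefault(entry[1], entry)  # keep existing entries, add new URI-Ms
    # 14-digit datetimes sort correctly as strings
    return sorted(merged.values(), key=lambda e: e[0])
```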
Conclusion
• Quantitative study with 10,000 URIs.
• 48% of the URIs were not fully stable through time.
• 27% were not perfectly reliable through time.
• New archival retrieval policies:
o Policy one: successfully retrieved mementos for 17 of the 77 redirecting URIs that had no mementos.
o Policy two: expanded the TimeMap in 58% of cases.
• aalsum@cs.odu.edu
• @aalsum
29
