Not All Mementos are Created Equal: Measuring the Impact of Missing Resources 
Justin F. Brunelle, Mat Kelly, Hany SalahEldeen, Michele C. Weigle, Michael L. Nelson 
Old Dominion University 
{jbrunelle, mkelly, hany, mweigle, mln}@cs.odu.edu 
Goal: Automatically measure the quality of the archives 
•Four example mementos (screenshots not reproduced here): 20%, 14%, 28%, and 7% missing 
“Live” XKCD 
•Missing 17% of embedded resources 
•Looks complete 
“Live” XKCD 
•Take three resources: 
•Logo 
•Main Comic 
•Navigation Strip 
•Relative importance? 
•All present in “Live” XKCD 
Damaging XKCD 
•Created a local memento 
•Removed the logo and navigation strip 
•Now missing 29% of embedded resources 
•Human assessment: looks OK 
Damaging XKCD 
•From our local memento 
•Removed the Main Comic 
•Now missing 24% of embedded resources 
•Human assessment: Not a usable memento 
•Percent of missing embedded resources is not a suitable metric for memento quality 
Image Importance 
•Size (as percentage of all pixels) 
•Position (in viewport?) 
•Centrality (in the vertical or horizontal center?) 
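The three signals listed for image importance could be folded into one per-image score. The sketch below is only an illustration: the `Box` type, the `image_importance` function, the equal weighting of the three signals, and the viewport height are all assumptions and are not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Rendered bounding box of an image, in page pixels (hypothetical type)."""
    x: float
    y: float
    width: float
    height: float


def image_importance(box: Box, page_w: float, page_h: float,
                     viewport_h: float) -> float:
    """Return a score in [0, 1]; higher means more visually important.

    Assumption: size, position, and centrality are weighted equally.
    """
    # Size: share of all page pixels covered by the image.
    size = min(1.0, (box.width * box.height) / (page_w * page_h))

    # Position: 1 if the image starts inside the initial viewport, else 0.
    in_viewport = 1.0 if box.y < viewport_h else 0.0

    # Centrality: 1 if the image spans the horizontal center of the page
    # or the vertical center of the viewport, else 0.
    spans_h_center = box.x <= page_w / 2 <= box.x + box.width
    spans_v_center = box.y <= viewport_h / 2 <= box.y + box.height
    central = 1.0 if (spans_h_center or spans_v_center) else 0.0

    return (size + in_viewport + central) / 3


# A large centered image should outrank a small corner logo.
comic = Box(x=200, y=100, width=600, height=400)
logo = Box(x=0, y=0, width=100, height=50)
comic_score = image_importance(comic, 1000, 2000, 800)
logo_score = image_importance(logo, 1000, 2000, 800)
```

With these made-up boxes the centered comic scores higher than the corner logo, which matches the human assessment of the damaged XKCD mementos.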
Missing CSS 
•Damage not limited to images 
•When missing CSS, content shifts left 
Missing CSS 
•Partitioned snapshot into thirds 
•Background color determined 
•Pixel-by-pixel comparison 
Missing CSS 
•Calculated the amount of content in each vertical third 
•If >=80% in left column and missing CSS, CSS is important 
•Only performed if stylesheets are missing 
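The missing-CSS check described above can be sketched as follows. This is a simplified illustration, not the authors' code: the snapshot is modeled as rows of RGB tuples, the background color is passed in rather than detected, and the function names are hypothetical. Only the vertical thirds, the pixel-by-pixel comparison against the background, and the 80% threshold come from the slides.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def content_share_by_third(rows: Sequence[Sequence[Pixel]],
                           background: Pixel) -> List[float]:
    """Fraction of non-background pixels in the left, middle, and right thirds."""
    width = len(rows[0])
    counts = [0, 0, 0]
    for row in rows:
        for x, pixel in enumerate(row):
            if pixel != background:
                # Map the column index to third 0 (left), 1 (middle), 2 (right).
                counts[min(2, x * 3 // width)] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def css_is_important(rows: Sequence[Sequence[Pixel]], background: Pixel,
                     css_missing: bool, threshold: float = 0.80) -> bool:
    """Flag CSS as important when >= 80% of content sits in the left third.

    The check is skipped entirely when no stylesheet is missing.
    """
    if not css_missing:
        return False
    return content_share_by_third(rows, background)[0] >= threshold


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
# Toy 6-pixel-wide snapshot whose content is all pushed to the left.
shifted = [
    [BLACK, BLACK, WHITE, WHITE, WHITE, WHITE],
    [BLACK, WHITE, WHITE, WHITE, WHITE, WHITE],
]
shares = content_share_by_third(shifted, WHITE)
flag_missing = css_is_important(shifted, WHITE, css_missing=True)
flag_present = css_is_important(shifted, WHITE, css_missing=False)
```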
Percent Missing vs. Weighted Damage 
•$M_M$ = Percent of embedded resources missing 
$$M_M = \frac{\text{EmbeddedResourcesMissing}}{\text{TotalEmbeddedResources}}$$
•$D_M$ = Damage rating of missing embedded resources 
$$D_M = \frac{D_{M_{Actual}}}{D_{M_{Potential}}}$$
$$D_{M_{Potential}} = \sum_{i=1}^{n_{[I|MM]}} \frac{D_{[I|MM]}(i)}{n_{[I|MM]}} + \sum_{i=1}^{n_{[C]}} \frac{D_{[C]}(i)}{n_{[C]}}$$
where $I$ = Image, $MM$ = MultiMedia, $C$ = CSS 
Calculated Damage 
•Logo and navigation strip removed: $M_M = 0.29$, $D_M = 0.36$ 
•Main comic removed: $M_M = 0.24$, $D_M = 0.41$ 
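To make the two metrics concrete, here is a minimal sketch of how percent missing and the damage rating might be computed. The slides define the potential damage but not the actual damage, so this sketch assumes the actual damage uses the same normalized sums restricted to the missing resources. The per-resource weights and all function names are hypothetical.

```python
from typing import Iterable, List, Tuple

# (importance weight in [0, 1], is_missing); weights here are made up.
Resource = Tuple[float, bool]


def percent_missing(resources: List[Resource]) -> float:
    """M_M: missing embedded resources / total embedded resources."""
    if not resources:
        return 0.0
    return sum(1 for _, missing in resources if missing) / len(resources)


def _normalized_sum(weights: Iterable[float], n: int) -> float:
    """Sum of per-resource damage divided by the resource count n."""
    return sum(weights) / n if n else 0.0


def damage_rating(images_mm: List[Resource], css: List[Resource]) -> float:
    """D_M = actual damage / potential damage.

    Potential damage sums over all images/multimedia and all stylesheets.
    Assumption: actual damage sums over the missing ones only, with the
    same denominators.
    """
    n_i, n_c = len(images_mm), len(css)
    potential = (_normalized_sum((w for w, _ in images_mm), n_i)
                 + _normalized_sum((w for w, _ in css), n_c))
    actual = (_normalized_sum((w for w, m in images_mm if m), n_i)
              + _normalized_sum((w for w, m in css if m), n_c))
    return actual / potential if potential else 0.0


# One important image is missing; a minor image and the stylesheet survive.
images = [(0.9, True), (0.1, False)]
stylesheets = [(0.5, False)]
m_m = percent_missing(images + stylesheets)
d_m = damage_rating(images, stylesheets)
```

In this toy case only one of three resources is missing, yet the damage rating is higher than the percent missing because the missing image carries most of the weight.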
What do Web users think? 
Setting up the Turk Test 
•Amazon Mechanical Turk workers ("turkers") represent real web users 
•Two legs of the experiment: 
•Manually damaged memento vs. Live resource 
•10 manually damaged mementos and resources 
•Real Memento vs. Real Memento 
•100 URI-Rs, one memento per year 
Quantifying Turker Response 
•5 turkers for each comparison 
•Assume $D_A < D_B$ (i.e., A is less damaged) 
•Measure turker agreement: defined only by 4-1 and 5-0 splits 

Result rows from the example tables (number of turkers marking each image):

Image A   Image B   Split   Outcome
5         0         5-0     Agreement
4         1         4-1     Agreement
3         2         3-2     Split decision: no agreement
0         5         0-5     No agreement
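The agreement rule can be written down directly. In this sketch each turker's choice is recorded as "A" or "B", with A being the image the metric rates as less damaged. That encoding and the function name `classify_split` are assumptions made for illustration.

```python
from typing import List, Tuple


def classify_split(votes: List[str]) -> Tuple[str, bool]:
    """Return (split, agrees) for five turker votes.

    votes holds "A" or "B" per turker, where D_A < D_B. Agreement with the
    metric is defined only by 5-0 and 4-1 splits in favor of A.
    """
    votes_a = votes.count("A")
    votes_b = votes.count("B")
    if len(votes) != 5 or votes_a + votes_b != 5:
        raise ValueError("expected exactly five votes of 'A' or 'B'")
    return f"{votes_a}-{votes_b}", votes_a >= 4


unanimous = classify_split(list("AAAAA"))
strong = classify_split(list("AAAAB"))
split_decision = classify_split(list("AAABB"))
reversed_votes = classify_split(list("BBBBB"))
```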
Turk Results 
•Compared damage ($D_M$) and percent missing ($M_M$) 
•M0: Manually damaged mementos 
•D: Internet Archive Mementos 
•M: Percent missing in Internet Archive Mementos 
•$D_M$ vs. Live: 78.9% true positives 
•$M_M$ vs. Live: 47.2% true positives 
•Worse than a 50/50 chance! 
•$D_M$ vs. $D_M$: 58.4% true positives 
Damage in the Internet Archive 
•1,000 URI-Rs from Bitly 
•1,000 URI-Rs from Archive-It 
•Remove non-HTML representations 
•1,861 URI-Rs remaining 
•Sample 1 memento per year from Internet Archive 
•Measure damage 
•Measured Internet Archive mementos 
•Damage generally decreases over time (quality improves) 
•Despite missing more resources over time 
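Sampling one memento per year could look like the sketch below. The slides do not say which memento was picked within a year, so choosing the earliest capture is an assumption, and the `(datetime, URI-M)` tuples and identifiers are placeholders.

```python
from datetime import datetime
from typing import Dict, List, Tuple

Memento = Tuple[datetime, str]  # (capture datetime, URI-M placeholder)


def one_per_year(mementos: List[Memento]) -> Dict[int, Memento]:
    """Keep the earliest memento seen in each calendar year (assumed policy)."""
    sample: Dict[int, Memento] = {}
    for memento in sorted(mementos):
        # setdefault keeps the first (earliest) memento stored for a year.
        sample.setdefault(memento[0].year, memento)
    return sample


captures = [
    (datetime(2010, 6, 1), "urim-b"),
    (datetime(2010, 1, 1), "urim-a"),
    (datetime(2011, 3, 1), "urim-c"),
]
yearly = one_per_year(captures)
```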
Conclusions 
•$D_M$ is a better measure of memento quality than $M_M$ 
•On average, the Internet Archive is improving its quality over time 
•Internet Archive is also missing more embedded resources over time 
•Improved damage weighting (58.4% correct can be improved) 
•Measure cumulative temporal damage ratings 
•E.g., a logo that never changes for 10 years and is used by 100 mementos is more important than the one used in a single memento. 