Leabharlann UCD
An Coláiste Ollscoile, Baile
Átha Cliath,
Belfield, Baile Átha Cliath 4,
Eire
UCD Library
University College Dublin,
Belfield, Dublin 4, Ireland
Joseph Greene
Research Repository Librarian
University College Dublin
joseph.greene@ucd.ie
http://researchrepository.ucd.ie
How accurate are IR
usage statistics?
Open Repositories 2016
Dublin, 16 June
Usage statistics are important for OA
repositories
• How is the service used overall?
• Advocacy
– Connects with authors on what is most important
to them: the use of their research
• KPI for return on investment
– Usage of a Library service
– Visibility of university’s
research
Monthly email sent to all
depositors
Infographic distributed semi-annually
by College Liaison Librarians
How accurate are they? Web robots
• Some follow rules
– Search engines, Internet Archive, link checkers,
Twitterbot, etc.
– robots.txt, naming themselves in the user agent
string
• Others do not
– Email spammers, comment spammers, dictionary
attackers, phishers, etc.
– Often mimic human users
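As a rough illustration of the "naming themselves in the user agent string" case, the sketch below flags requests whose user agent matches a small pattern list. The pattern list and the function name `is_self_declared_robot` are assumptions made for this example, not part of any repository platform; robots that mimic human users would pass this check unnoticed.

```python
import re

# Hypothetical, deliberately short pattern list for illustration only.
ROBOT_UA_PATTERN = re.compile(r"bot|crawl|spider|slurp|archiver", re.IGNORECASE)


def is_self_declared_robot(user_agent: str) -> bool:
    """Return True when the user agent string identifies itself as a robot."""
    return ROBOT_UA_PATTERN.search(user_agent) is not None


googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/47.0"

print(is_self_declared_robot(googlebot))  # a well-behaved crawler names itself
print(is_self_declared_robot(browser))    # a browser-like string is not flagged
```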
Experimental study
• Simple random sample of 2 years of UCD
repository’s download data
– n=341, N=3.3 million; 96.20% certainty
• Manually checked to determine if robot or human
• Compared findings against our robot detection
technique
– U. Minho DSpace Stats Add-on
– Monthly outlier exclusion (manual)
Greene, J. Web robot detection in scholarly Open Access institutional
repositories. Library Hi Tech, July 2016
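A simple random sample of the size described above could be drawn along the following lines. This is a minimal sketch, assuming the download records can be addressed by an integer index; the function name and the seed are illustrative and do not describe how the study's sample was actually drawn.

```python
import random

POPULATION_SIZE = 3_300_000  # N, the approximate figure given on the slide
SAMPLE_SIZE = 341            # n, as given on the slide


def draw_sample_indices(population_size: int, sample_size: int, seed: int) -> list[int]:
    """Draw a simple random sample of record indices without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return sorted(rng.sample(range(population_size), sample_size))


sample_indices = draw_sample_indices(POPULATION_SIZE, SAMPLE_SIZE, seed=2016)
print(len(sample_indices), "records selected for manual checking")
```

Each selected record would then be inspected by hand and labelled robot or human.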
First finding
85% of the Research
Repository UCD’s
unfiltered downloads
come from robots
• A 2013 IRUS-UK white paper on 20 IRs found the same
figure: 85% of downloads came from robots
Catching more robots improves stats
(But how much depends on the number of robots)
[Figure: accuracy of download stats (inverse precision), 0 to 1, plotted against recall (robots), 0 to 1. One curve per site type: typical website, 15% robot traffic; OA journal, 40% robot; OA repositories, 85% robot; Internet Archive, 91% robot. Annotations: "Get better stats", "Catch more robots".]
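The relationship on this slide can be approximated with a little arithmetic. The sketch below assumes, for simplicity, that no human downloads are wrongly removed (perfect precision), so the reported statistics consist of all human downloads plus the robots that were missed. The function name `human_share_of_reported` is hypothetical.

```python
def human_share_of_reported(robot_share: float, recall: float) -> float:
    """Share of reported downloads that are human (inverse precision).

    Assumes no human downloads are filtered out by mistake.
    """
    humans = 1.0 - robot_share
    missed_robots = robot_share * (1.0 - recall)
    denominator = humans + missed_robots
    if denominator == 0:
        raise ValueError("no downloads remain after filtering")
    return humans / denominator


site_types = [
    ("Typical website", 0.15),
    ("OA journal", 0.40),
    ("OA repositories", 0.85),
    ("Internet Archive", 0.91),
]
for label, share in site_types:
    # Evaluate each curve at a few recall values between 0 and 1.
    row = ", ".join(f"{human_share_of_reported(share, r):.2f}" for r in (0.0, 0.5, 0.9, 1.0))
    print(f"{label} ({share:.0%} robot): {row}")
```

With little robot traffic the statistics are fairly accurate even when few robots are caught; at 85% robot traffic, accuracy stays low until recall is close to 1.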
How did we do at UCD?
• What proportion of robot downloads did we
catch? (Recall)
– Our method catches 94% of all robots
• How often were we correct -- how many of the downloads we
labelled as robots are actually human? (Precision)
– 98.9% of downloads that we label robots really
are robots
• How accurate are the download stats -- how
many are actually made by human beings?
(Inverse precision)
– 73% of the download statistics as reported are
human
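The three measures above can be computed from the four outcomes of comparing a detection method against manual checking. This is a minimal sketch; the `Counts` class and the numbers in `example` are made up for illustration and are not the study's data.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    robots_flagged: int  # robots correctly labelled as robots (true positives)
    humans_flagged: int  # humans wrongly labelled as robots (false positives)
    robots_missed: int   # robots left in the statistics (false negatives)
    humans_kept: int     # humans left in the statistics (true negatives)


def recall(c: Counts) -> float:
    """Proportion of all robot downloads that were caught."""
    return c.robots_flagged / (c.robots_flagged + c.robots_missed)


def precision(c: Counts) -> float:
    """Proportion of downloads labelled robot that really are robots."""
    return c.robots_flagged / (c.robots_flagged + c.humans_flagged)


def inverse_precision(c: Counts) -> float:
    """Proportion of reported (unfiltered-out) downloads that are human."""
    return c.humans_kept / (c.humans_kept + c.robots_missed)


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
example = Counts(robots_flagged=90, humans_flagged=1, robots_missed=10, humans_kept=20)
print(f"recall={recall(example):.3f}")
print(f"precision={precision(example):.3f}")
print(f"inverse precision={inverse_precision(example):.3f}")
```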
How does that compare?
• Who knows? There are no other studies like this
on repositories!
• Applied DSpace's and EPrints' web robot
detection algorithms to our data
– Experimental
– Real data
– Same dataset used for each ‘system’
– Algorithms easy to mimic in vitro
– But SEO, crawl behaviour may be different for
different systems
Robot detection techniques used
DSpace | EPrints | Minho DSpace Statistics Add-on
Rate of requests ✓[3]
User agent string ✓ ✓ ✓
robots.txt access ✓
Volume of requests ✓[2] ✓[3]
List of known robot IP addresses ✓ ✓
Reverse DNS name lookup ✓[1]
Trap file ✓
User agents per IP address
Width of traversal in the URL space ✓[3]
[1] Only implemented nominally or experimentally
[2] Via the repeat download or ‘double-click’ filter
[3] Data available as a configurable report for manual decision making
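Several of the techniques listed (volume of requests, user agents per IP address, robots.txt access) amount to summarising log records per IP address so that a person can review the outliers. The sketch below shows one way such a report might be built. The record layout, the sample records and the function name `summarise_by_ip` are assumptions for illustration and do not reflect how DSpace, EPrints or the Minho add-on implement these checks.

```python
from collections import defaultdict

# Hypothetical log records: (ip_address, user_agent, requested_path).
records = [
    ("203.0.113.5", "ExampleCrawler/1.0", "/robots.txt"),
    ("203.0.113.5", "ExampleCrawler/1.0", "/bitstream/1/a.pdf"),
    ("203.0.113.5", "Mozilla/5.0", "/bitstream/2/b.pdf"),
    ("198.51.100.7", "Mozilla/5.0", "/bitstream/1/a.pdf"),
]


def summarise_by_ip(log_records):
    """Count requests, distinct user agents and robots.txt hits per IP address."""
    summary = defaultdict(lambda: {"requests": 0, "user_agents": set(), "robots_txt": False})
    for ip, user_agent, path in log_records:
        entry = summary[ip]
        entry["requests"] += 1
        entry["user_agents"].add(user_agent)
        if path == "/robots.txt":
            entry["robots_txt"] = True
    return dict(summary)


report = summarise_by_ip(records)
# List the busiest IP addresses first so outliers are easy to spot.
for ip, entry in sorted(report.items(), key=lambda kv: kv[1]["requests"], reverse=True):
    print(ip, entry["requests"], len(entry["user_agents"]), entry["robots_txt"])
```

A report like this leaves the decision to a reviewer rather than applying a fixed threshold.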
Results
Robots detected (Recall)
DSpace: 0.897
EPrints: 0.911
Minho (no manual outlier checking): 0.890
Minho plus monthly manual checking (UCD): 0.942
Accuracy of detection (Precision)
DSpace: 1.000
EPrints: 0.940
Minho (no manual outlier checking): 0.989
Minho plus monthly manual checking (UCD): 0.989
Accuracy of download stats (Inverse precision)
DSpace: 0.620
EPrints: 0.552
Minho (no manual outlier checking): 0.590
Minho plus monthly manual checking (UCD): 0.730
Without filtration: 0.144
I.e. 38% of DSpace’s reported downloads are made by robots, etc.
Robot detection in OA IR systems
[Figure: recall, precision and negative precision (accuracy of download stats), 0 to 1, for DSpace, EPrints, Minho, Minho with monthly manual checking (UCD), and no robot detection.]
Thank you!

HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptxHMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
HMCS Vancouver Pre-Deployment Brief - May 2024 (Web Version).pptx
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 

How Accurate are IR Usage Statistics?

  • 1. Leabharlann UCD An Coláiste Ollscoile, Baile Átha Cliath, Belfield, Baile Átha Cliath 4, Eire UCD Library University College Dublin, Belfield, Dublin 4, Ireland Joseph Greene Research Repository Librarian University College Dublin joseph.greene@ucd.ie http://researchrepository.ucd.ie How accurate are IR usage statistics? Open Repositories 2016 Dublin, 16 June
  • 2. Usage statistics are important for OA repositories • How is the service used overall? • Advocacy – Connects with authors on what is most important to them: the use of their research • KPI for return on investment – Usage of a Library service – Visibility of university’s research
  • 3.
  • 4. Monthly email sent to all depositors
  • 5. Infographic distributed semi-annually by College Liaison Librarians
  • 6. How accurate are they? Web robots • Some follow rules – Search engines, Internet Archive, link checkers, Twitterbot, etc. – robots.txt, naming themselves in the user agent string • Others do not – Email spammers, comment spammers, dictionary attackers, phishers, etc. – Often mimic human users
  • 7. Experimental study • Simple random sample of 2 years of UCD repository’s download data – n=341, N=3.3 million; 96.20% certainty • Manually checked to determine if robot or human • Compared findings against our robot detection technique – U. Minho DSpace Stats Add-on – Monthly outlier exclusion (manual) Greene, J. Web robot detection in scholarly Open Access institutional repositories. Library Hi Tech, July 2016
  • 8. First finding: 85% of Research Repository UCD’s unfiltered downloads come from robots • This is confirmed by a 2013 IRUS-UK white paper on 20 IRs, which also found 85% of downloads to be from robots
  • 9. [Chart: accuracy of download stats (inverse precision) plotted against recall (robots), both axes 0 to 1] Catching more robots improves stats (but how much depends on the number of robots) • Curves shown: Typical website, 15% robot traffic; OA journal, 40% robot; OA repositories, 85% robot; Internet Archive, 91% robot
  • 10. How did we do at UCD? • What proportion of robot downloads did we catch? (Recall) – Our method catches 94% of all robots • How often were we correct -- how many are actually human? (Precision) – 98.9% of downloads that we label robots really are robots • How accurate are the download stats -- how many are actually made by human beings? (Inverse precision) – 73% of the download statistics as reported are human
  • 11. How does that compare? • Who knows? There are no other studies like this on repositories! • Applied DSpace's and EPrints' web robot detection algorithms to our data – Experimental – Real data – Same dataset used for each ‘system’ – Algorithms easy to mimic in vitro – But SEO, crawl behaviour may be different for different systems
  • 12. Robot detection techniques used • Systems compared: DSpace, EPrints, Minho DSpace Statistics Add-on
      – Rate of requests: ✓ [3]
      – User agent string: ✓ ✓ ✓ (all three systems)
      – robots.txt access: ✓
      – Volume of requests: ✓ [2] ✓ [3]
      – List of known robot IP addresses: ✓ ✓
      – Reverse DNS name lookup: ✓ [1]
      – Trap file: ✓
      – User agents per IP address: none
      – Width of traversal in the URL space: ✓ [3]
      [1] Only implemented nominally or experimentally
      [2] Via the repeat download or ‘double-click’ filter
      [3] Data available as a configurable report for manual decision making
  • 14. Robots detected (Recall): DSpace 0.897; EPrints 0.911; Minho (no manual outlier checking) 0.890; Minho plus monthly manual checking (UCD) 0.942
  • 15. Accuracy of detection (Precision): DSpace 1.000; EPrints 0.940; Minho (no manual outlier checking) 0.989; Minho plus monthly manual checking (UCD) 0.989
  • 16. Accuracy of download stats (Inverse precision): DSpace 0.620; EPrints 0.552; Minho (no manual outlier checking) 0.590; Minho plus monthly manual checking (UCD) 0.730; Without filtration 0.144. I.e. 38% of DSpace’s reported downloads are made by robots, etc.
  • 17. Robot detection in OA IR systems [Chart: Recall, Precision and Negative precision (accuracy of download stats) for DSpace, EPrints, Minho, Minho with monthly manual checking (UCD), and no robot detection]
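
The three measures reported on slides 10 and 14 to 16 can be computed from a confusion matrix of manually checked downloads. The following is a minimal sketch in Python, not the code used in the study: the class, the function names and the counts are hypothetical, since the slides do not give the per-cell counts of the n=341 sample.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Detector output compared with manual checking (hypothetical field names)."""
    robots_flagged: int  # robots correctly labelled as robots
    humans_flagged: int  # humans wrongly labelled as robots
    robots_missed: int   # robots left in the reported download stats
    humans_kept: int     # humans left in the reported download stats


def recall(c: Confusion) -> float:
    """Proportion of all robot downloads that the detector caught."""
    return c.robots_flagged / (c.robots_flagged + c.robots_missed)


def precision(c: Confusion) -> float:
    """Proportion of downloads labelled robot that really are robots."""
    return c.robots_flagged / (c.robots_flagged + c.humans_flagged)


def inverse_precision(c: Confusion) -> float:
    """Proportion of reported downloads that were made by humans."""
    return c.humans_kept / (c.humans_kept + c.robots_missed)


# Illustrative counts only; they are not taken from the UCD sample.
sample = Confusion(robots_flagged=90, humans_flagged=2,
                   robots_missed=10, humans_kept=48)

print(f"recall            = {recall(sample):.3f}")
print(f"precision         = {precision(sample):.3f}")
print(f"inverse precision = {inverse_precision(sample):.3f}")
```

Inverse precision is the figure that matters to authors and managers, because it describes the downloads that remain in the published statistics after filtering.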

Editor's notes

  1. Download and other usage statistics in an item view
  2. In addition, data is provided to Schools for quality reviews and accreditation
  3. We have been aware of web robots since 2009 and have been using the U. Minho add-on plus a visual check for outliers once a month. After hitting 1 million downloads in 2015, we decided we must know more (how to properly identify robots, how accurate our statistics are); we want to have confidence in the information that we produce
  4. Experiment: simple random sample of 2 years of download data (n=341, N=3.3 million for 96.20% certainty), manually checked to determine if robot or human. DSpace 1.8.2 with U. Minho DSpace Statistics Add-on v. 4. Apache Tomcat behind Apache HTTP server; logs in Apache Combined Log Format. Minho registers every download in the PostgreSQL database. Results to be published in July 2016 issue of Library Hi Tech (Greene 2016)
  5. See: INFORMATION POWER LTD. 2013. IRUS download data: identifying unusual usage [Online]. Available: http://www.irus.mimas.ac.uk/news/IRUS_download_data_Final_report.pdf [Accessed 2015-12-11]. Confirms the 85% figure. DORAN, D. & GOKHALE, S. S. 2011. Web robot detection techniques: overview and limitations. Data Mining and Knowledge Discovery, 22, 183-210. Hypothesizes why the share is so high in OA (p. 191)
  6. Sources for the robot shares shown in the chart:
      – Typical website (15% robot traffic): precision = 0.8727, mean of four studies; robots:total sessions = 0.1516, mean of four studies
      – OA journal (40% robot): HUNTINGTON, P., NICHOLAS, D. & JAMALI, H. R. 2008. Web robot detection in the scholarly information environment. Journal of Information Science, 34, 726-741.
      – OA repositories (85% robot): Greene 2016 and Information Power 2013 (see above)
      – Internet Archive (91% robot): ALNOAMANY, Y., WEIGLE, M. C. & NELSON, M. L. 2013. Access patterns for robots and humans in web archives. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 339-348.
     The reverse is also true: if we fail to catch robots (e.g. deterioration over time as robots improve their capabilities), the accuracy of the stats diminishes.
     Formula (Greene 2016): $P_{inv} = \dfrac{TR(R - PR - 1) + 2TPR - P(T + R - 1)}{R(TR - P - T) + P}$, where R = recall (robot detection), P = precision (robot detection), $P_{inv}$ = inverse precision (human stats), T = ratio of robots to total
  7. Greene 2016
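
The formula in note 6 can be checked against the bar charts on slides 14 to 16. The sketch below is illustrative Python and makes two assumptions that are not stated in the source. First, T is taken as 1 − 0.144 = 0.856, reading the "Without filtration" bar on slide 16 as the human share of unfiltered downloads. Second, the simplified form is our own algebra: it cancels a common factor of (1 − R) from the numerator and denominator. Function names are hypothetical.

```python
def inverse_precision_from_rates(T: float, R: float, P: float) -> float:
    """Inverse precision as written in note 6 (undefined, 0/0, at R = 1)."""
    numerator = T * R * (R - P * R - 1) + 2 * T * P * R - P * (T + R - 1)
    denominator = R * (T * R - P - T) + P
    return numerator / denominator


def inverse_precision_simplified(T: float, R: float, P: float) -> float:
    """Same quantity with the common factor (1 - R) cancelled (our algebra)."""
    return (P * (1 - T) - T * R * (1 - P)) / (P - T * R)


# Assumption: robot share taken from the "Without filtration" bar on slide 16.
T = 1 - 0.144

# (recall, precision) pairs read from slides 14 and 15.
systems = {
    "DSpace": (0.897, 1.000),
    "EPrints": (0.911, 0.940),
    "Minho (no manual outlier checking)": (0.890, 0.989),
    "Minho plus monthly manual checking (UCD)": (0.942, 0.989),
}

results = {name: inverse_precision_from_rates(T, R, P)
           for name, (R, P) in systems.items()}

for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the computed values agree with the inverse precision bars on slide 16 to within about 0.002, which suggests the chart and the formula are consistent with each other.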