1/15
Why repository harvestability
matters
By Lucas Anastasiou,
KMi, The Open University
UKCoRR members meeting
3rd December 2013
2/15
Outline
• Exposing your metadata
• OAI-PMH organic issues
• OAI-PMH misuse
• Open Access principles
• Monitor tools
• Conclusion
3/15
Exposing your metadata
• One of the direct objectives of maintaining an
(institutional) repository is visibility and
dissemination
• OAI-PMH: a low-barrier mechanism for
repository interoperability
4/15
Exposing your content
• OAI-PMH is the protocol for listing your repository’s
resources and providing incremental updates
– Your repository’s ‘RSS feed’
• OAI-PMH is widely adopted
– OpenDOAR reports 2508 repositories worldwide
– Mature ‘out-of-the-box’ repository software supports it:
EPrints, DSpace, bepress
• ResourceSync is on the horizon
– Do we need another standard?
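As a rough illustration of the ‘RSS feed’ idea, an incremental harvest is just an HTTP request with a `from` date. The sketch below builds such a request URL; the endpoint `http://repository.example.org/oai` and the helper name `list_records_url` are assumptions made for this example, while `verb`, `metadataPrefix`, `from`, `until` and `set` are standard OAI-PMH request arguments.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc",
                     from_date=None, until_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords URL for a selective (incremental) harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # Optional arguments are only sent when the caller supplies them.
    if from_date:
        params["from"] = from_date      # e.g. date of the last successful harvest
    if until_date:
        params["until"] = until_date
    if set_spec:
        params["set"] = set_spec        # restrict the harvest to one Set
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint: ask only for records changed since 1 December 2013.
url = list_records_url("http://repository.example.org/oai", from_date="2013-12-01")
print(url)
```

A harvester would store the date of its last successful run and pass it as `from_date` next time, so only new or changed records are transferred.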
5/15
OAI-PMH organic issues
• No search mechanism: records cannot be filtered by entity criteria
– E.g. ‘get me all records of author X’
• Built for metadata harvesting, not for content harvesting
• Weak pagination mechanism
– resumptionToken handling is a common source of problems
• Coarse date granularity (per day)
• No default way to expose content differently according to
licensing
– though there is a workaround (OAI-PMH Sets)
• Harvesting the full corpus is a long procedure; data cannot be processed on
the fly, so a local copy is needed to work with it
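To make the pagination point concrete, here is a minimal sketch of a harvesting loop that follows resumptionTokens and stops if a token repeats. It is not CORE's harvester: `fetch_page` is a hypothetical callable standing in for the HTTP request, and the in-memory `PAGES` dictionary is invented test data.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(fetch_page):
    """Yield record identifiers from successive ListRecords pages.

    fetch_page(token) is a hypothetical callable that returns the XML text
    of one response; token is None for the first request.
    """
    seen_tokens = set()
    token = None
    while True:
        root = ET.fromstring(fetch_page(token))
        for header in root.iter(OAI_NS + "header"):
            yield header.findtext(OAI_NS + "identifier")
        token_el = root.find(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
        token = (token_el.text or "").strip() if token_el is not None else ""
        if not token:
            return  # absent or empty token: the list is complete
        if token in seen_tokens:
            # A repeated token would make the harvester loop forever.
            raise RuntimeError(f"resumptionToken repeated: {token!r}")
        seen_tokens.add(token)


def page(ids, token):
    """Build a tiny fake ListRecords response (test data only)."""
    records = "".join(
        f"<record><header><identifier>{i}</identifier></header></record>"
        for i in ids
    )
    return (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
        f"{records}<resumptionToken>{token}</resumptionToken>"
        "</ListRecords></OAI-PMH>"
    )


# Two fake pages: the first points to token "t1", the second ends the list.
PAGES = {
    None: page(["oai:example:1", "oai:example:2"], "t1"),
    "t1": page(["oai:example:3"], ""),
}
ids = list(harvest(PAGES.get))
print(ids)
```

Because every page must be fetched in sequence, a single bad token stops the whole harvest, which is why the token checks are worth making explicit.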
6/15
OAI-PMH misuse
Common issues identified across CORE’s 609 tracked repositories
[Bar chart: number of repositories affected (x-axis, 0 to 20) per issue]
• Misuse of external resolution service
• Error 500
• Soft/Hard 404
• Exclusion from robots.txt
• Bad ListRecords implementation
• DNS resolution error
• No incremental updates
• Invalid XML characters
• Wrong semantic tag
7/15
Open Access content principles
• Content referencing
– Open repositories should always
establish a link from the metadata record to the item the
metadata record describes, using a dereferenceable
identifier pointing to the version held in the repository.
• Content accessibility to machine agents
– Open repositories must provide universal access to
machines with the same level of access as humans have.
8/15
Common issues (1)
Exclusion from robots.txt
User-agent: *
Sitemap: http://repository.jisc.ac.uk/sitemap.xml
Disallow: /
Since OAI-PMH is used for content harvesting, excluding harvesters
from the content is a major issue.
It breaks the principle that content should be universally
accessible to machines.
Presumption of innocence.
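A repository manager can check the effect of such a robots.txt with Python's standard `urllib.robotparser`. This is only a sketch: the user agent name `ExampleHarvester` and the PDF path are made up for illustration, and the second rule set is a hypothetical permissive alternative.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The rules shown on the slide: every robot is excluded from everything.
BLOCKING = """User-agent: *
Sitemap: http://repository.jisc.ac.uk/sitemap.xml
Disallow: /
"""

# A hypothetical permissive alternative: an empty Disallow allows everything.
OPEN = """User-agent: *
Disallow:
"""

pdf_url = "http://repository.jisc.ac.uk/123/1/paper.pdf"  # made-up path
print(allowed(BLOCKING, "ExampleHarvester", pdf_url))
print(allowed(OPEN, "ExampleHarvester", pdf_url))
```

With the blocking rules a well-behaved harvester may still read the metadata over OAI-PMH, but has to skip the full text that the metadata points to.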
9/15
Common issues (2)
Misuse of external resolving service
. . .
<dc:identifier>
http://hdl.handle.net/123456789/1656
</dc:identifier>
. . .
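One way to spot this problem automatically is to scan the Dublin Core identifiers for handle URLs that use the prefix 123456789. The sketch below assumes that this prefix is an unregistered placeholder, as the example above suggests; the second identifier in the sample record is invented for contrast.

```python
import re
import xml.etree.ElementTree as ET

DC_NS = "{http://purl.org/dc/elements/1.1/}"
# Assumption: the 123456789 prefix marks a handle that was never registered.
PLACEHOLDER_HANDLE = re.compile(r"^https?://hdl\.handle\.net/123456789/")


def suspicious_identifiers(record_xml):
    """Return the dc:identifier values that use the placeholder handle prefix."""
    root = ET.fromstring(record_xml)
    values = [(el.text or "").strip() for el in root.iter(DC_NS + "identifier")]
    return [v for v in values if PLACEHOLDER_HANDLE.match(v)]


RECORD = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier>
    http://hdl.handle.net/123456789/1656
  </dc:identifier>
  <dc:identifier>http://repository.example.org/1656/1/paper.pdf</dc:identifier>
</oai_dc:dc>"""

flagged = suspicious_identifiers(RECORD)
print(flagged)
```

A fuller check would also try to dereference each identifier, since a well-formed URL can still lead nowhere.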
10/15
Common issues (3)
11/15
Discoverability / Harvestability
• Improve discoverability by service providers
• Open Access repositories registries
– OpenDOAR
– ROAR
• Once a document is online and Open Access, it should be
retrievable and processable by both humans and machine
agents
• Follow guidelines / best practices, avoid pitfalls
12/15
Monitor tools
• Validation tools to check your repository’s metadata/content
exposure
– CORE repository analytics
– RIOXX Guidelines
• http://rioxx.net/
• Consistency of metadata fields, tracking of research outputs across
scholarly systems
– OpenAIRE
• https://www.openaire.eu/
• OpenAIRE guidelines, OpenAIRE validator service
13/15
CORE repository analytics
Simple information on how much content has been aggregated
by the CORE service, the repository’s harvesting status and logs of
harvesting attempts (with issues included)
http://core.kmi.open.ac.uk/repository_analytics
14/15
So why does it matter after all?
• To achieve the primary goal of an IR: to ‘open and disseminate
research outputs to a worldwide audience’
• Provide the best quality content to service providers that
make your repository more ‘discoverable’, ‘accessible’ and
‘reusable’. Aggregators can act as ambassadors on your behalf
• Add value (or avoid removing value) by disseminating your
content in the best possible way
15/15
Conclusion
The currently used protocol (OAI-PMH) has limitations
– By design
– By misuse
No need to embrace a new standard (yet); make the best
of the current standard
Unleashing your data to the web makes your organisation’s
research output more visible and expands your audience

Editor’s notes

  1. A low level technique to achieve interoperability across the distributed ecosystem of institutional repositories is OAI-PMH
  2. Built for metadata, not for content; inconsistency for page content; granularity (by day); long procedure
  3. Built for metadata, not for content; inconsistency for page content; granularity (by day), e.g. arXiv has 2700 daily updates; long procedure
  4. X-axis: number of repositories. Similar problems were reported for the PerX project. UK repositories: all 154.
     • Repository uses the URL http://hdl.handle.net/123456789/ in <dc:identifier> (3): misuse of external resolving service (hdl.handle.net)
     • Error 500 (9)
     • Soft/Hard 404s (15): hard 404s are easy to detect, while for soft 404s you need to examine the response and deduce the error
     • Whether CORE is allowed in robots.txt (6)
     • resumptionToken repeats (4): bad OAI-PMH implementation
     • resumptionToken not valid (8)
     • No <dc:identifier> (1)
     • DNS not resolved / timeout (2)
     • 401 Unauthorized (requires authentication) (1)
     • Bad hdl.handle.net links (http://hdl.handle.net/10311/228) (2)
     • Invalid XML characters (16): ideally report the character and its location
     • Connection timeout (2)
     • Records stored in <dc:identifier.uri> rather than <dc:identifier> (2): wrong semantics
     • Detect whether incremental harvesting is supported by the repository: 3 repositories do not support incremental updates
     • Detect when doing incremental updates: <error code="noRecordsMatch">The combination of the values of the from, until, set and metadataPrefix arguments result in an empty set.</error>
     • Detect when the repository is using a default/generic OAI identifier, e.g. oai:generic.eprints.org
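For the invalid XML characters case, reporting the character and its location can be sketched as below. The allowed ranges follow the `Char` production of XML 1.0; the function name and the sample string are invented for this example.

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML_CHAR = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def find_invalid_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 forbids."""
    return [
        (m.start(), f"U+{ord(m.group()):04X}")
        for m in INVALID_XML_CHAR.finditer(text)
    ]


# Invented sample: a form feed and a NUL byte hidden inside a title.
sample = "<dc:title>Open\x0cAccess\x00</dc:title>"
problems = find_invalid_xml_chars(sample)
print(problems)
```

Run against the raw response text before parsing, this gives a repository administrator the exact offsets to look at instead of a generic parser failure.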
  5. Concepts to facilitate the transition from open access metadata to open access content. Principle 1: the identifier must be actionable; if a record contains a URL, this URL must be accessible
  6. The service must be unobtrusively accessible. There are other ways to identify and limit abusive robots
  7. Identifier is not actionable
  8. Make sure these registries hold the latest entry for your endpoint. Follow guidelines and best practices. Avoid pitfalls. Differentiate your licensed content using the OAI protocol’s Sets. Avoid password protection
  9. Raises an alarm to repository administrators if these numbers are not consistent with their datasets
  10. Focus on building services on top of the content rather than on the content itself. Other service providers: CORE, BASE, Google Scholar?, RepUK. As if nobody’s reading