Presentation on the Knowledge Exchange project on metadata standards for exchange between Current Research Information Systems and Open Access Repositories. Presented at EUROCRIS, Aalborg, Denmark, 4 June 2010 by Mogens Sandfaer.
3. +
CRIS
OAR
INTER
OPERA
Knowledge Exchange is an international co-operative effort
that supports the use and development of e-infrastructures
for higher education and research.
Partners are:
Denmark’s Electronic Research Library (DEFF)
German Research Foundation (DFG)
Joint Information Systems Committee (JISC) in the UK
SURF foundation in the Netherlands
4. +
Motivation: Enable broad collaboration in the
information management of research publications
Current Research Information Systems
a label for research management systems of various
types, dealing with many aspects of research activities
contain metadata on research publications
Open Access Repositories
a label for open research output archives aiming at
preservation and dissemination of publications etc.
contain metadata on research publications
They share the challenge of achieving full metadata
coverage for the publications within their scope
5. +
Motivation: Enable broad collaboration in the
information management of research publications
If CRIS and OAR could easily exchange metadata about
publications, they could support each other
But CRIS and OAR have grown out of different
communities and have developed rather different
approaches to publication metadata
If a university has a CRIS and an OAR, generally a
publication must be registered twice to comply with
both systems’ requirements
Both CRIS and OAR strive to be complete in their
coverage of publications – both would benefit from
collaboration – not to mention the authors/researchers.
6. +
Motivation: Enable broad collaboration in the
information management of research publications
CRIS use a variety of formats – some use CERIF
(or variants thereof) and some use various local or
national formats
In many disciplines, publications are of global interest
and are often results of international collaboration
They are often of interest to more than one CRIS
CRIS with different formats would benefit from an easy
and precise mechanism to exchange publication
metadata
7. +
Motivation: Enable broad collaboration in the
information management of research publications
OAR use a variety of formats – some use Dublin Core
(or variants thereof), some use library formats such as
MARC and MODS, and some use various local or
national formats
In many disciplines publications are of global interest
and are often results of international collaboration
They are often of interest to more than one OAR
OAR with different formats would benefit from an easy
and precise mechanism to exchange publication
metadata
8. +
Aim and purpose
To increase the metadata interoperability
between CRIS and OAR systems
and thus also
between CRIS and CRIS with different formats
between OAR and OAR with different formats
by defining and proposing
1. a metadata exchange format for publications
2. a set of common vocabularies for key elements
9. +
Project participants
UK - JISC
Rosemary Russell, UKOLN
Michael Day, UKOLN
Simon Lambert, Rutherford Appleton
DE - DFG
Wolfram Horstmann, Bielefeld University
Najko Jahn, Bielefeld University
Friedrich Summann, Bielefeld University
Max Stempfhuber, Aachen University
NL - SURF
Marga van Meel, KNAW
Arnoud Jippes, KNAW
Ed Simmons, Nijmegen Univ.
DK - DEFF
Adrian Price, Copenhagen Univ. (Project manager)
Mikael Elbaek, Technical Univ. DK (Project director)
Mogens Sandfaer, Technical Univ. DK
10. +
Building new bridges in the old world
Not designing new (and better) worlds
This metadata island knows
well what it is doing. Good
reasons govern its choice
of format and vocabulary
11. +
Building new bridges in the old world
Not designing new (and better) worlds
We (simply) build a bridge
that will enable these islands to communicate
- without changing their language and lifestyle.
That will allow them to exchange publication metadata
without studying and understanding the particularities of the other party.
This metadata island knows
well what it is doing. Good
reasons govern its choice
of format and vocabulary
12. +
Challenges stemming from
different missions of formats
The different nature (and tasks) of
CRIS formats
Repository formats
The granularity challenge
13. +
The different nature of CRIS
and repository formats
Typical CRIS main entities and their relations
(many triples & many detailed fields)
14. +
The different nature of CRIS
and repository formats
Simple Dublin Core
15 fields in a single flat structure
Aimed at the description of some sort of “document”
May be enhanced to provide more granularity and specificity
But mostly isn’t
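As an illustration of the "single flat structure", here is a minimal Python sketch of a Simple Dublin Core record. The element names are the 15 standard ones; the helper `make_dc_record` and the sample values are hypothetical and are not part of the project's format.

```python
# The 15 element names of Simple Dublin Core.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def make_dc_record(**fields):
    """Build a flat record: each element maps to a list of plain strings."""
    unknown = set(fields) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Simple Dublin Core elements: {sorted(unknown)}")
    return {name: list(values) for name, values in fields.items()}


# Hypothetical sample values, for illustration only.
record = make_dc_record(
    title=["An example article"],
    creator=["Doe, Jane", "Roe, Richard"],
    date=["2010"],
    type=["article"],
)
```

Every value is a plain string, so there is no place to say which part of a creator name is the family name or which role the person played. That is the granularity gap the slide points at.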
15. +
Bridging publications metadata
CRIS formats are characterized by their
broader view on research information, depicting research results as well as the actors and various environmental factors in their own right
(often) high level of detail and specificity in describing the various entities (very granular and precise)
ability to handle the dynamics of time – everything except the research publications themselves changes over time, as do the interrelations
16. +
Bridging publications metadata
OAR (DC) formats are characterized by their
narrow view, depicting research results only – generally
publications
(mostly) low level of detail and specificity in describing the
various aspects (less granular)
absence of need to handle the dynamics of time – as they
deal with research publications tied to a specific point in
time
17. +
Bridging publications metadata
Implode the relational/network nature of the
CRIS formats to a single structure – adequate for
describing publications
Design the field/element hierarchy so that highly
granular as well as less granular metadata may be
represented – without loss of information
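The two points above can be sketched in Python under loudly stated assumptions: the entity tables, the relation triples and the functions `implode` and `display_name` are all hypothetical illustrations, not the project's actual exchange format.

```python
# Hypothetical CRIS-style data: separate entities linked by relation triples.
persons = {"p1": {"family_name": "Doe", "given_name": "Jane"}}
publications = {"pub1": {"title": "An example article", "year": 2010}}
# Each triple is (person id, role, publication id).
person_publication = [("p1", "Author", "pub1")]


def implode(pub_id):
    """Collapse the network around one publication into a single nested record."""
    record = dict(publications[pub_id])
    record["persons"] = [
        {**persons[person_id], "role": role}
        for person_id, role, linked_pub in person_publication
        if linked_pub == pub_id
    ]
    return record


def display_name(person):
    """Less granular view: one string instead of separate name parts."""
    return f'{person["family_name"]}, {person["given_name"]}'


record = implode("pub1")
creators = [display_name(person) for person in record["persons"]]
```

The nested record keeps the fine-grained name parts and the role, so a granular system loses nothing, while a less granular system can derive the single creator string it needs from the same record.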
18. + Project approach
[Diagram: "Metadata exchange format and vocabulary" in the centre, linked to the formats CERIF, METIS, DC, ePrints default, DDF-MXD and NARCIS MODS; the label DRIVER is repeated throughout the diagram]
19. + Project approach
1. Analyze metadata practices of CRIS and OAR
Looking at formats in actual use at KE partners
Chart entities and granularities, similarities, differences
20. + Project approach
2. Define entities/elements/attributes to be exchanged
Respecting differences in granularity
So that metadata may be exported without loss of information
So that the format may be used by very granular
environments as well as less granular ones
3. Define/propose common exchange vocabulary
For the identified key concepts/entities
4. Define/propose common exchange syntax
Handle differences in granularity
21. +
Some potential use cases
CRIS → OAR
OAR → CRIS
CRIS → CRIS
OAR → OAR
CRIS/OAR → OpenAIRE (EU Open Access pilot)
Publisher → CRIS/OAR
Subject repository → CRIS/OAR (institutional)
23. +
Based on ideal examples – ”use cases”
24. +
Ideal example of a publication
25. +
Basic idea evolved
To carry both the highest granularity (CRIS) and the lowest
level (OAR?)
26. +
The DC elements are used as a baseline.
Title, Creator, Subject, Description, Publisher, Contributor, Date, Type,
Format, Identifier, Source, Language, Relation, Coverage, Rights
27. +
Main entities of interest
The publication is in focus and other entities are in relation to the publication
34. +
Vocabularies
Person
Role
Description: the role of the person in
relation to the publication.
Terms:
Author
Primary Author
Corresponding Author
Editor
Publisher
Translator
Illustrator
Inventor
Supervisor
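A controlled vocabulary like this one can be enforced in code. The following is a minimal Python sketch using the nine role terms from the slide; the class `PersonRole` and the function `parse_role` are hypothetical names, and the case-insensitive matching is an assumption rather than something the project specifies.

```python
from enum import Enum


class PersonRole(Enum):
    """Role terms as listed on the slide."""
    AUTHOR = "Author"
    PRIMARY_AUTHOR = "Primary Author"
    CORRESPONDING_AUTHOR = "Corresponding Author"
    EDITOR = "Editor"
    PUBLISHER = "Publisher"
    TRANSLATOR = "Translator"
    ILLUSTRATOR = "Illustrator"
    INVENTOR = "Inventor"
    SUPERVISOR = "Supervisor"


def parse_role(label):
    """Return the matching term, ignoring case and surrounding whitespace."""
    wanted = label.strip().lower()
    for role in PersonRole:
        if role.value.lower() == wanted:
            return role
    raise ValueError(f"{label!r} is not in the role vocabulary")
```

For example, `parse_role(" editor ")` returns `PersonRole.EDITOR`, while a label outside the vocabulary raises `ValueError` instead of being passed on silently.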
35. +
Publication in detail – type, review and
36. +
Publication types
Publication
Type
Description: the format provides a comprehensive list of publication
types, based on an analysis of the formats examined in the project.
A mapping between the different systems and formats in the
analysis can be found on a web page.
Mapping between common vocabularies can be found at:
http://weekschild.uci.ru.nl/KE/?select=all
The formats analysed: CERIF2008, MODS/DIDL, DRIVER_DC, DDF-MXD, EPrints, METIS, PURE
37. +
Publication types (terms)
Journal Letter
Journal comment
Journal review article
Journal book review
Book
Book chapter
Book preface
Conference paper
Conference abstract
Conference poster
Conference talk
Thesis Doctoral
Thesis PhD
Thesis Master
Working paper, preprint
Report
Report chapter
Lecture Notes
Lecture
Memorandum
Net publication
Patent
Software
Data set
Newspaper article
Radio/TV broadcast
Exhibition catalogue
Student report
Other
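A mapping from one system's local publication types to these common terms could look like the Python sketch below. The local labels (`chapter`, `inproceedings`, `phdthesis`) and the function `to_common_type` are hypothetical, only a subset of the common terms is listed, and falling back to "Other" for unmapped types is an assumption made for this example.

```python
# A subset of the common terms from the slide.
COMMON_TYPES = {"Book", "Book chapter", "Conference paper", "Thesis PhD", "Report", "Other"}

# Hypothetical local type labels of one system, mapped to common terms.
LOCAL_TO_COMMON = {
    "book": "Book",
    "chapter": "Book chapter",
    "inproceedings": "Conference paper",
    "phdthesis": "Thesis PhD",
}


def to_common_type(local_type):
    """Translate a local label; unmapped labels fall back to 'Other' (an assumption)."""
    return LOCAL_TO_COMMON.get(local_type, "Other")


exported = [to_common_type(label) for label in ("chapter", "phdthesis", "blogpost")]
```

Each system only maintains its own table against the common vocabulary, so no system has to know the type lists of any other system.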
38. +
Vocabularies - Versions
Version
Description: this element and its vocabulary express the version of
the document, e.g. the draft or the published version. The terms
are based on the VERSIONS toolkit excluding the term “updated”.
Important! Different versions should be self-contained and constitute
individual records. This mirrors best practice for repositories but is not
always the case for CRIS.
Terms:
Draft, i.e. working paper
Submitted, i.e. preprint
Accepted, i.e. postprint
Published, i.e. publisher edition
Updated, i.e. reprint
VERSIONS project: http://www2.lse.ac.uk/library/versions/
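The rule that each version constitutes its own record can be illustrated with a short Python sketch. The five terms are the ones listed on the slide; the function `split_versions` and the sample entry are hypothetical and only show one way a CRIS entry holding several versions might be exported.

```python
# Version terms and their glosses as listed on the slide.
VERSION_TERMS = {
    "Draft": "working paper",
    "Submitted": "preprint",
    "Accepted": "postprint",
    "Published": "publisher edition",
    "Updated": "reprint",
}


def split_versions(shared_metadata, versions):
    """Return one self-contained record per version of the same work."""
    records = []
    for version in versions:
        if version not in VERSION_TERMS:
            raise ValueError(f"{version!r} is not in the version vocabulary")
        record = dict(shared_metadata)  # copy, so records stay independent
        record["version"] = version
        records.append(record)
    return records


# Hypothetical CRIS entry that holds two versions of one publication.
cris_entry = {"title": "An example article"}
records = split_versions(cris_entry, ["Accepted", "Published"])
```

Each resulting record repeats the shared metadata, so it can be exchanged and understood on its own, which matches the repository practice described above.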