The FAIR Data Principles are a hot topic in research data management. Their adoption within the H2020 funding programme means researchers now have to pay much more attention to how they share, publish and archive their data.
In this light, how can libraries help their research communities implement the FAIR principles? And write better data management plans?
These questions were addressed in a LIBER webinar offering guidance and reflections on the principles themselves. Presented by Alastair Dunning, Head of Research Data Services at TU Delft (host of the 4TU.Centre for Research Data), it is based on a study of 37 data repositories (from subject-specific repositories, to generic data archives, to national infrastructures) that examined how far they comply with each individual facet of the FAIR Data Principles.
3. Are the FAIR Data Principles Fair?
• Alastair Dunning @alastairdunning
• Jasmin Böhmer @JasminBoehmer
• Madeleine de Smaele @MadeleineSmaele
Technical University of Delft
Hosts of 4TU.Centre for Research Data
4. Are the FAIR Data
Principles Fair?
Blog Post with all the information:
http://bit.ly/2lIgc9p
LIBER Webinar, 10th March
5. Motivation for this Project
● H2020 / EU demands on open data and research data
management.
● Providing insight and support for repositories to improve their
information architecture and digital infrastructure to comply with
H2020 and FAIR demands.
● Own aspiration to offer the best possible service and support for
4TU.Centre for Research Data.
● Working towards practices to improve interoperability and
reuse value of data-sets in research data repositories.
6. Methodology
● Using the FAIR principles and corresponding facets as a scoring
matrix
● Applying a traffic-light rating system
● Using the information available on the repository's web interface
to evaluate the FAIR principles
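A minimal sketch of how such a scoring matrix could be tallied. The slides do not define the colours, so the meanings used here (green = complies, amber = partly or unclear, red = does not comply), the repository names and all ratings are assumptions for illustration only, not the study's data.

```python
from collections import Counter

# Hypothetical ratings: repository -> facet -> traffic-light colour.
# Assumed meaning: "green" = complies, "amber" = partly or unclear,
# "red" = does not comply. None of this is the study's actual data.
ratings = {
    "repo-a": {"F1": "green", "A4": "red", "I2": "red"},
    "repo-b": {"F1": "red", "A4": "red", "I2": "red"},
    "repo-c": {"F1": "green", "A4": "amber", "I2": "red"},
    "repo-d": {"F1": "red", "A4": "red", "I2": "red"},
}

def share_by_colour(ratings, facet):
    """Return {colour: percentage of repositories} for one facet."""
    counts = Counter(scores[facet] for scores in ratings.values())
    total = sum(counts.values())
    return {colour: round(100 * n / total) for colour, n in counts.items()}

for facet in ("F1", "A4", "I2"):
    print(facet, share_by_colour(ratings, facet))
```

Percentages per facet, like those on the following slides, fall out of the matrix by counting colours column by column.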
11. F1 (meta)data are assigned a globally unique and eternally persistent identifier.
49% of the repositories
do not assign DOI,
HANDLE, or URN.
E.g. subject-based
repositories use
project IDs or subject-
specific ID systems.
These links do not work
in public spheres.
12. A4 metadata are accessible, even when the data are no longer available.
97% of the repositories do not
clearly state whether their
metadata persist when the
data are no longer available.
The transparency and integrity
of the repository is improved by
providing metadata-records for
closed, restricted, or
unavailable data-sets.
13. I2 (meta)data use vocabularies that follow FAIR principles.
100% of the repositories do not
have visible ontologies or
(controlled) vocabulary.
Adding a semantic layer that
enables links to unambiguous
terms and definitions needs a
lot of curation effort.
Is e.g. ORCID (Open Researcher
and Contributor ID) a
vocabulary?
14. R1 (meta)data have a plurality of accurate and relevant attributes.
38% of the repositories do not
provide sufficient information
that helps to determine the
value of reuse for the
information seeker.
Specific information is mostly
included in the documentation.
Displaying that information in
appropriate metadata fields
would be beneficial.
21. Narrow -
(meta)data are retrievable by their identifier using
a standardized communications protocol.
the protocol is open, free, and universally
implementable.
the protocol allows for an authentication and
authorization procedure, where necessary.
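As an illustration of these three "narrow" facets, here is a sketch of retrieving metadata by identifier over HTTPS, assuming the identifier is a DOI and that the registration agency supports DOI content negotiation (Crossref and DataCite do). The DOI below is a hypothetical placeholder and the helper name is invented for this example; the request is built but deliberately not sent.

```python
from urllib.request import Request

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver, reached over HTTPS

def build_metadata_request(doi, media_type="application/vnd.citationstyles.csl+json"):
    """Build (but do not send) an HTTP request for the metadata behind a DOI."""
    # The Accept header asks the resolver for a machine-readable record
    # instead of redirecting to the human-readable landing page.
    return Request(DOI_RESOLVER + doi, headers={"Accept": media_type})

# Hypothetical DOI, used only to show the shape of the request.
request = build_metadata_request("10.1234/example-dataset")
print(request.get_method(), request.full_url)
print(request.get_header("Accept"))

# To actually retrieve the record (needs network access and a real DOI):
# import urllib.request
# with urllib.request.urlopen(request) as response:
#     metadata = response.read().decode("utf-8")
```

HTTP(S) is open, free and universally implementable, and it can carry authentication headers where access has to be restricted, which is why these facets are comparatively easy to assess.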
22. Broad -
(meta)data include qualified references to other
(meta)data.
(meta)data meet domain-relevant community
standards (takes a long time to figure out)
23. Technical vs Policy
● (meta)data are retrievable by their identifier
using a standardized communications protocol.
● the protocol is open, free, and universally
implementable.
● the protocol allows for an authentication and
authorization procedure, where necessary.
● metadata are accessible, even when the data
are no longer available.
25. Compliance of Social Science Data Repositories against
FAIR Findable Principles (F1, F2, F3 and F4)
26. Practice for Social Science Repositories Analysed
● Data only available on request
● Licence not visible / clear
● Plenty of free text documentation on collection of data
exists
● No structured metadata per dataset / no machine
readable metadata
● But still seem to work well within the discipline
27. LASA - Longitudinal Aging Study Amsterdam. Aging research and
collecting data on aging in the Netherlands
No global identifier
No structured metadata
But plenty of documentation
28. Practice for Climate Data Repositories Analysed
● Licence sometimes clear (no data protection issues)
● Some free text documentation on the overall collection of data
exists
● No structured metadata per dataset / sometimes the data is
dynamically created following a query
● No global identifiers per dataset
● Meeting existing disciplinary norms but not fully embedded as
machine readable data
29. SACA - Southeast Asian Climate Assessment
No structured metadata
But plenty of documentation
No global identifier
31. ● Create a permanent identifier for each
dataset
● Always use an open licence or another
clearly stated licence
● Make sure each dataset has rich
metadata associated with it (Dublin
Core is a good starting place!)
● Make data available via HTTP
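The recommendations above can be sketched as a small structured record. This is only an illustration, assuming the simple Dublin Core element set serialised as XML; the wrapper element name, the helper function and every value (identifier, title, creator, licence) are hypothetical placeholders rather than anything taken from the study.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element set namespace
ET.register_namespace("dc", DC_NS)

def dublin_core_record(fields):
    """Serialise (element, value) pairs as a simple Dublin Core XML record."""
    root = ET.Element("metadata")  # wrapper element name is an assumption
    for element, value in fields:
        ET.SubElement(root, f"{{{DC_NS}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")

# All values below are hypothetical placeholders.
record = dublin_core_record([
    ("identifier", "https://doi.org/10.1234/example-dataset"),  # permanent identifier
    ("title", "Example rainfall measurements"),
    ("creator", "Doe, Jane"),
    ("date", "2017-03-10"),
    ("rights", "https://creativecommons.org/licenses/by/4.0/"),  # clear licence
    ("description", "Daily rainfall totals, with units and collection method."),
])
print(record)
```

A record like this covers the identifier, the licence and the basic descriptive metadata in one machine-readable unit that can be served over HTTP alongside the data.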
32. Some Final Points (I)
● FAIR principles are deliberately vague -
principles to be interpreted
● Nothing about back-up and preservation.
Relationship to Data Seal of Approval?
● Much more work to be done on relationship
between FAIR data and FAIR repository
33. Some Final Points (II)
● Creating a FAIR dataset demands an alliance
between repository and dataset creator
● Governance? How are the principles updated?
● FAIR principles derive not from libraries /
archives but more from life sciences; but still
require good knowledge of metadata /
archiving practice
34. Questions?
• Type your questions in the chat box.
• Rob Grim (moderator) will select and pose
questions to the speakers
• Unanswered questions will be addressed
by Alastair in a blog post (to be published
following the webinar)
35. WEBINAR: Research Data Services
Thank You!
Final Notes:
1. Blog post with more information
https://openworking.wordpress.com/2017/02/10/fair-principles-connecting-the-dots-for-the-idcc-2017
We’ll email a link to the recording shortly.
Editor's notes
And the ability to give feedback to the EC and reviewers on how to approach evaluating ‘FAIR’ data
For the IDCC practice paper:
- H2020 / EC demands on open data
- Own aspiration to offer the best possible service and support for 4TU
- The ability to give feedback to the EC and reviewers on how to approach evaluating ‘FAIR’ data
Overall:
- providing insight and support for repositories to improve their information architecture and digital infrastructure to comply with H2020 and FAIR demands … (if they are not dependent on the funding benefits, then the pressure comes from interoperability and community acknowledgement in the form of re-use).
- exploring options for a setup in which the archival version is stored in repositories that are trusted and fit for LTS purposes, while the user interface for researchers is kept available online (e.g. TRAILS co-op with DANS).
- working with DANS and DTL towards a solution for NL
- mainly: improving the interoperability and re-use value of 4TU data by improving necessary metadata sections (and perhaps leading by best practice)
*** DO NOT PUBLISH SLIDES WITH THIS SLIDE FULL OF SCREENSHOT LOGOS ***
Sources:
DANS-EASY
https://easy.dans.knaw.nl/ui/home
EUDAT-B2Share
https://b2share.eudat.eu/
Zenodo
https://zenodo.org
PseudoBase
http://www.ekevanbatenburg.nl/PKBASE/PKB.HTML
OpenML
http://www.openml.org/
Profiles-Registry
http://www.profilesregistry.nl/
Mendeley-Data
https://data.mendeley.com/
4TU.Centre for Research Data
http://data.4tu.nl/
CancerData.org
https://www.cancerdata.org
DHS Data Access
http://www.dhsdata.nl
WorldClim
http://worldclim.org/
World Data Centre for Soil
http://www.isric.org/
Infrared Space Observatory
http://www.cosmos.esa.int/web/iso/access-the-archive
Longitudinal Aging Study Amsterdam
http://www.lasa-vu.nl/index.htm
Southeast Asian Climate Assessment & Dataset
http://saca-bmkg.knmi.nl/
TRAILS
https://www.trails.nl/
ICOS Carbon Portal
https://www.icos-cp.eu/node/1
CESSDA
http://cessda.net/
SeaDataNet
http://www.seadatanet.org/
LISS
https://www.lissdata.nl/lissdata/
ORGIDS / RodRep
http://www.orgids.com/ / http://www.rodrep.com/
eartH2Observe
http://www.earth2observe.eu/
EDGAR
http://edgar.jrc.ec.europa.eu/
KNMI
https://data.knmi.nl/datasets
STITCH
http://stitch.embl.de/
ECA&D
http://www.ecad.eu/
Europeana
http://www.europeana.eu/portal/en
MycoBank
http://www.mycobank.org/
AlgaeBase
http://www.algaebase.org/
Amsterdam Cohort Studies
https://www.amsterdamcohortstudies.org/acsc/index.asp
ICTWSS
http://uva-aias.net/en/ictwss
Share ERIC
http://www.share-project.org/
LOVD3
http://databases.lovd.nl/whole_genome/genes
CARIBIC
http://www.caribic-atmospheric.com/
EIDA
http://www.orfeus-eu.org/data/eida/
Sound and Vision
http://www.beeldengeluid.nl/en
Figshare
https://figshare.com/
Accessing the metadata-records of closed, restricted, or unavailable data-sets improves the overall visibility of the stored research data.
Not many subjects have official and standardized ontologies
The minority of repositories provide multifaceted attributes in their metadata to support reusability.
- PID
- Open License / Usage License
- Metadata record (standardized and coherent)
- in general standardized information on every level: repository - data record - data (in best case using universal schemas)