FAIR principles and metrics for evaluation

Michel Dumontier, Ph.D.
Distinguished Professor of Data Science
@micheldumontier::#DANSLOD:2017-05-01

Principles to enhance the value of all digital resources
and their metadata.
data, images, software, web services, repositories
http://www.nature.com/articles/sdata201618

Rapid Adoption of Principles
Developed and
endorsed by
researchers, publishers,
funding agencies,
industry partners.
As of May 2017,
100+ citations since
2016 publication
Included in G20
communique, EOSC,
H2020, NIH, and more…

Hypothesis
Improving the FAIRness of digital
resources will increase their reuse.

What is FAIRness?
FAIRness reflects the extent to which a digital
resource addresses the FAIR principles as per the
expectations defined by a community of
stakeholders.

How do we assess compliance with the
FAIR principles?
• The principles identify what needs to be there, but
they don’t say what is necessary and/or
sufficient
• They also don’t tell you how to achieve FAIR
• Going beyond the principles requires some
thought about what constitutes FAIRness and
how we measure it.

Fundamental Questions
• In what ways can we assess the FAIRness of a digital
resource?
• To what degree can we automate this assessment?
• Must we treat each type of digital resource differently?
• Who will use the metrics? The producers, the funders, or
the users?
• Can one resource be more FAIR than another?
• Will/should FAIRness assessments impact funding
decisions?
• Should only one organization define these metrics? Or can
anybody make their own metrics? What happens if a
digital resource scores well against one set of metrics, but
not another?

Horizon 2020: Data Management Plan
Section 2. FAIR data
1. Making data findable, including provisions for
metadata (5 questions)
2. Making data openly accessible (10 questions)
3. Making data interoperable (4 questions)
4. Increase data re-use (through clarifying
licenses - 4 questions)
Additional sections:
1. Data summary (6 questions, 5 of which also
cover aspects of FAIRness)
2. Allocation of resources (4 questions)
3. Data security (2 questions)
4. Ethical aspects (2 questions)
5. Other issues (2 questions)
Total of 23 + 16 = 39 questions!!
https://goo.gl/Strjua

FAIRness of repositories
• IDCC17 Practice Paper “Are the FAIR Data
Principles fair?” by Alastair Dunning,
Madeleine de Smaele, Jasmin Böhmer
• Web interfaces, help pages and metadata
records of over 40 data repositories were
examined to score each individual
repository against the FAIR principles
• ~2 months of work
Data: http://dx.doi.org/10.4121/uuid:5146dd06-98e4-426c-9ae5-dc8fa65c549f
Paper: https://zenodo.org/record/321423#.WNFNrTvytm8

37 repositories

Scoring the resources

Overall evaluation

Summary of Study
• Impressive first attempt at an assessment of
FAIRness across repositories
• Issues
– Lack of fully described mechanism by which
repository owners can provide the necessary
information.
– Fully manual effort, but AFAIK inter-annotator
agreement not established.
– Not easy to scale, can we automate it?
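The inter-annotator agreement mentioned above could be quantified with Cohen's kappa. The following is a minimal sketch, assuming two annotators each assign one categorical label per repository. The function name and the sample labels are hypothetical and are not taken from the study.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one and the same label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical labels for four repositories on a single FAIR principle.
annotator_1 = ["complies", "complies", "unclear", "fails"]
annotator_2 = ["complies", "unclear", "unclear", "fails"]
kappa = cohens_kappa(annotator_1, annotator_2)
```

A kappa near 1 would suggest the manual scores are reproducible; a value near 0 would suggest agreement no better than chance.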

Measures for Digital Repositories
• Data Seal of Approval
– 6 core requirements
– 16 criteria
• DIN31644: Information and documentation -
Criteria for trustworthy digital archives
– 10 core requirements
– 34 criteria
• ISO16363: Audit and certification of trustworthy
digital repositories
– 100+ criteria

DSA
The data can be found on the Internet
The data are accessible (clear rights
and licences)
The data are in a usable format
The data are reliable
The data are identified in a unique and
persistent way so that they can be
referred to

DSA 16 requirements
1. mission to provide access to and preserve data
2. licenses covering data access and use and monitors compliance.
3. continuity plan
4. ensures that data are created/used in compliance with norms.
5. adequate funding and qualified staff through clear governance
6. mechanism(s) for expert guidance and feedback
7. guarantees the integrity and authenticity of the data
8. accepts data and metadata to ensure relevance and understandability
9. applies documented processes in archival
10. responsibility for preservation that is documented.
11. expertise to address data and metadata quality
12. archiving according to defined workflows.
13. enables discovery and citation.
14. enables reuse with appropriate metadata.
15. technical infrastructure
16. security of the infrastructure
https://www.datasealofapproval.org

Data Seal of Approval
• self-assessment in the DSA online tool. The
online tool takes you through the
16 requirements and provides you with
support.
• Once you have completed your self-
assessment you can submit it for peer review.

• Score data on each FAIR dimension (e.g. from
1 to 5)
• Total score of FAIRness as an indicator of data
quality
• Scoring can only be partly automatic, not all
principles can be established objectively:
– scoring at ingest by data archivists of TDR
– after reuse by data users (community review)
Peter Doorn: https://dans.knaw.nl/nl/actueel/PresentationP.D..pdf
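The scoring idea above can be sketched in a few lines. This assumes each dimension receives an integer from 1 to 5 and that the total is the plain sum of the four scores. The slide does not say how the total is computed, so the aggregation rule and every name below are assumptions.

```python
DIMENSIONS = ("F", "A", "I", "R")

def total_fairness(scores):
    """Sum per-dimension scores after checking each is an integer from 1 to 5."""
    for dim in DIMENSIONS:
        value = scores.get(dim)
        if not isinstance(value, int) or not 1 <= value <= 5:
            raise ValueError(f"score for {dim} must be an integer from 1 to 5")
    # Assumption: the total is an unweighted sum (maximum 20).
    return sum(scores[dim] for dim in DIMENSIONS)

# Hypothetical scores for one dataset.
example = {"F": 4, "A": 5, "I": 2, "R": 3}
total = total_fairness(example)
```

A weighted sum or a per-dimension profile would be equally plausible choices; the unweighted sum is used here only because it is the simplest.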

DANS FAIR metrics proposal

http://www.w3.org/TR/hcls-dataset/

http://hw-swel.github.io/Validata/
VALIDATA DEMO
RDF constraint validation tool
Configurable to any profile
Declarative reusable schema description
Shape Expression (ShEx) constraints
Open source javascript implementation
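To illustrate what declarative constraint validation means, here is a plain-Python stand-in. It is not ShEx and does not use Validata's API. The profile below is hypothetical (it is not the HCLS profile), and records are simplified to dictionaries that map a property name to a list of values.

```python
# Hypothetical profile: property -> (minimum count, maximum count or None).
PROFILE = {
    "dct:title": (1, 1),
    "dct:description": (1, 1),
    "dct:license": (1, None),
    "dcat:keyword": (0, None),
}

def validate(record, profile):
    """Return a list of violations; an empty list means the record conforms."""
    violations = []
    for prop, (lo, hi) in profile.items():
        count = len(record.get(prop, []))
        if count < lo:
            violations.append(f"{prop}: expected at least {lo}, found {count}")
        elif hi is not None and count > hi:
            violations.append(f"{prop}: expected at most {hi}, found {count}")
    return violations

# Made-up record that lacks a license statement.
record = {
    "dct:title": ["Example dataset"],
    "dct:description": ["A made-up record."],
}
problems = validate(record, PROFILE)
```

The point of the declarative style is that the profile can be swapped for another community's requirements without touching the checking code.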

NIH Commons Framework Working Group on
FAIR Metrics
Aim: To identify and prototype methods to
assess the FAIRness of a digital resource.
– Identify and include initial stakeholders
– Develop and discuss potential metrics
– Explore ways in which to report and assess
metrics.

What is a metric?
• A metric is a standard of measurement.
• It must provide a clear definition of what is being
measured and why one wants to measure it.
• It must describe the process by which you
obtain a valid measurement result, so that it
can be reproduced by others. It needs to
specify what a valid result is.

Example of a FAIRness Metric
F1 (meta)data are assigned a globally unique and persistent
identifier
Aspect: Identifier Persistence
Rationale: An identifier can be used to find, access, and reuse a
resource. As such, it must be available to users in the longest term
possible otherwise we will not be able to perform those functions with
the identifier in hand.
Relevant FAIR Principles: F,A,I,R
Metric: Availability of data management plan, which includes a section
dealing with continuity and contingencies related to the persistence of
identifiers. The value of the metric is true or false.
Procedure: Check and verify the URL in the resource metadata points to
a data management plan with continuity section. Document should
follow a community standard, or recommend a basic structure.
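The fields of the example metric map naturally onto a small record type. The sketch below is one possible encoding, not a format proposed in the talk. The metadata key `data_management_plan` is hypothetical, and the check only confirms that a URL is recorded; verifying that the plan has a continuity section would remain a manual step.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Metric:
    principle: str
    aspect: str
    rationale: str
    relevant_principles: tuple
    measure: str
    procedure: str
    evaluate: Callable[[dict], bool]  # returns the true/false metric value

def has_dmp_link(metadata: dict) -> bool:
    # Stand-in for the manual procedure: checks only that an http(s) URL
    # is present under a hypothetical metadata key.
    url = metadata.get("data_management_plan", "")
    return isinstance(url, str) and url.startswith(("http://", "https://"))

identifier_persistence = Metric(
    principle="F1",
    aspect="Identifier Persistence",
    rationale="Identifiers must stay available for as long as possible.",
    relevant_principles=("F", "A", "I", "R"),
    measure="Availability of a data management plan covering identifier persistence.",
    procedure="Check that the URL in the resource metadata points to such a plan.",
    evaluate=has_dmp_link,
)

result = identifier_persistence.evaluate(
    {"data_management_plan": "https://example.org/dmp"}
)
```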

Current Thinking:
FAIRness Index
• A FAIRness Index is a collection of metrics that
are aligned to the FAIR principles and can be
consistently and transparently evaluated.
• A community, comprised of clearly defined
stakeholders (researchers, publishers, users,
etc), may define their own FAIRness Index
that expresses what makes a digital resource
ideally or maximally FAIR.
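Because each community may define its own index, the same resource can score differently under different indices. The sketch below shows this with two made-up indices, each a mapping from a metric name to a true/false check. The metric names, the metadata keys, and the "fraction passed" summary are all assumptions for illustration.

```python
def evaluate_index(index, metadata):
    """Apply every metric in an index; return per-metric results and the fraction passed."""
    results = {name: bool(check(metadata)) for name, check in index.items()}
    return results, sum(results.values()) / len(results)

# Two hypothetical community indices built from simple metadata checks.
index_a = {
    "has_identifier": lambda m: bool(m.get("identifier")),
    "has_license": lambda m: bool(m.get("license")),
}
index_b = {
    "has_identifier": lambda m: bool(m.get("identifier")),
    "has_license": lambda m: bool(m.get("license")),
    "has_citation": lambda m: bool(m.get("citation")),
    "has_provenance": lambda m: bool(m.get("provenance")),
}

# Made-up resource description.
resource = {"identifier": "doi:10.1234/example", "license": "CC0-1.0"}

results_a, score_a = evaluate_index(index_a, resource)
results_b, score_b = evaluate_index(index_b, resource)
```

Keeping the per-metric results alongside the summary number is what makes the evaluation transparent: a reader can see which metrics failed, not only the overall score.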

Stakeholders
People worried about
– Findability
– Accessibility
– Interoperability
– Reuse
– Provenance
– Licensing
– Citation
– Value
People who are
- Potential users
- Resource creators
- Academics
- Publishers
- Industry
- The public
- Funding agencies

Ways we can gather information to
assess FAIRness
A) Self assessment
B) Self-appointed FAIR Assessment Team
C) Automated assessment
D) Crowdsourcing
E) All of the above

Sample Findable Metrics
• Is there structured metadata describing the resource?
– Check for embedded metadata as microdata or linked data
– Check for hyperlinked documents with standardized formats: HCLS dataset
description, DCAT, schema.org annotations, etc.
• Are entries identified with a persistent identifier?
– Is there a DOI with scholarly publications?
– Is there a permanent URL for each item (without query parameters)?
– Is there a resource type specified, and does it use a well-known vocabulary such
as EDAM, identifiers.org, etc.?
• Can the resource be found in a recognized repository?
– E.g. a database in Biosharing
– E.g. a tool in Elixir bio.tools
– E.g. gene expression data in GEO
• Can the resource be found with a web search engine?
– At what rank does the resource appear when using the identifier or title in a
web search?
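Some of the findability checks above lend themselves to automation. The sketch below covers two of them using only the standard library: a syntactic DOI check and a check that a URL carries no query string. The regular expression is a commonly used approximation of DOI syntax, not an official definition, and neither function contacts the network, so neither confirms that an identifier actually resolves.

```python
import re
from urllib.parse import urlsplit

# Approximate DOI syntax: "10.", a 4-9 digit registrant code, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def looks_like_doi(value):
    """Syntactic check only; it does not confirm that the DOI resolves."""
    return bool(DOI_PATTERN.match(value))

def url_without_query(url):
    """True for an http(s) URL that has a host and no query string."""
    parts = urlsplit(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc) and not parts.query

# The dataset DOI cited earlier in these slides.
doi_ok = looks_like_doi("10.4121/uuid:5146dd06-98e4-426c-9ae5-dc8fa65c549f")
url_ok = url_without_query(
    "http://dx.doi.org/10.4121/uuid:5146dd06-98e4-426c-9ae5-dc8fa65c549f"
)
# Hypothetical URL that depends on a query parameter.
url_bad = url_without_query("https://example.org/item?id=7")
```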

Sample FAIR Metrics
Accessible metrics
• Are the (meta)data accessible by permanent URL?
• Can you obtain the resource in a standardized format (e.g. HTML, XML, JSON, JSON-LD)?
• Are the data downloadable in bulk or in part with an application programming interface
(API)? Is the API documented using Swagger or smartAPI, or does it follow the Hydra protocol?
Interoperable metrics
• Are the (meta)data described with a community vocabulary?
• Are the data and metadata linked to other datasets, vocabularies and ontologies?
• Are the data and metadata expressed in universal languages (e.g. XML, JSON, JSON-LD,
RDF/XML)?
Reusable metrics
• Is there a license specified? Is it a standardized license? Is it linked to in the resource
metadata?
• Is it clear how the work should be cited? See the FORCE11 Data Citation Implementation
Pilot and bioCADDIE Working Group 5.
• Is there any indication of reuse beyond its original context and original creators?
• Is there any indication of access through published statistics?
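Two of the questions above can be answered mechanically once metadata is available: whether a license is declared and standardized, and whether the resource is served in a standard format. The sketch below assumes the license is recorded under a hypothetical `license` key as an SPDX identifier and the format under a hypothetical `media_type` key. Both lookup sets are small, non-exhaustive samples.

```python
# Non-exhaustive sample of SPDX license identifiers.
KNOWN_LICENSES = {"CC0-1.0", "CC-BY-4.0", "MIT", "Apache-2.0"}

# Non-exhaustive sample of media types for the formats named on the slide.
STANDARD_MEDIA_TYPES = {
    "text/html",
    "application/xml",
    "application/json",
    "application/ld+json",
}

def reuse_and_access_report(metadata):
    """Answer three yes/no questions from a metadata dictionary."""
    declared = metadata.get("license")
    return {
        "license_specified": bool(declared),
        "license_standardized": declared in KNOWN_LICENSES,
        "standard_format": metadata.get("media_type") in STANDARD_MEDIA_TYPES,
    }

# Made-up metadata record.
report = reuse_and_access_report(
    {"license": "CC-BY-4.0", "media_type": "application/ld+json"}
)
```

Questions such as evidence of reuse or clarity of citation guidance are harder to reduce to a lookup and would likely still need human review.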

michel.dumontier@maastrichtuniversity.nl
Website: http://maastrichtuniversity.nl/ids
Presentations: http://slideshare.com/micheldumontier
Early stages of thinking about assessing the FAIRness of
digital resources. Your input can help shape this emerging
phenomenon.
Questions:
1. Does it make sense to assess the FAIRness of digital resources,
and what are the implications of doing so?
2. What are the barriers to realizing the FAIR vision?