The international FAIR movement: thinkers, doers and dreamers
1. The international FAIR movement
Behind the brand:
thinkers, doers and dreamers
Susanna-Assunta Sansone
ORCiD: 0000-0001-5306-5690 | Twitter: @SusannaASansone
Oxford Open Access Week, 10 March 2020
Slides: https://www.slideshare.net/SusannaSansone
datareadiness.eng.ox.ac.uk
Associate Professor, Engineering Science
Associate Director, Oxford e-Research Centre
2. A set of principles to enhance the
value of all digital resources
2014
2016
Developed and endorsed by
researchers, service
providers, publishers, funding
agencies, industry partners
4. Findable
Accessible
Interoperable
Reusable
• Globally unique, resolvable, and persistent identifiers
▪ To retrieve and connect data
• Community defined descriptive metadata
▪ To enhance discoverability
• Common terminologies
▪ So that the same term means the same thing
• Detailed provenance
▪ To contextualize the data and facilitate reproducibility
• Terms of access
▪ As open as possible, as closed as necessary
• Terms of use
▪ Clear licences, ideally to enable innovation and reuse
Data for humans and for machines
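As an illustration of the elements listed above, here is a minimal Python sketch of a dataset description that a machine can check. The field names, the example record and the `missing_fair_elements` helper are assumptions made for this example; they are not part of the FAIR principles or of any specific standard.

```python
from __future__ import annotations

# Hypothetical field names chosen for this sketch; real communities define their own.
REQUIRED_ELEMENTS = {
    "identifier": "globally unique, resolvable and persistent identifier",
    "metadata": "community-defined descriptive metadata",
    "terminology": "terms taken from common terminologies",
    "provenance": "how the data were produced",
    "access": "terms of access",
    "licence": "terms of use",
}


def missing_fair_elements(record: dict) -> list[str]:
    """Return the required elements that are absent or empty in a record."""
    return [key for key in REQUIRED_ELEMENTS if not record.get(key)]


example = {
    "identifier": "https://doi.org/10.1234/example",  # placeholder, not a real DOI
    "metadata": {"title": "Example dataset", "organism": "Homo sapiens"},
    "terminology": ["NCBITaxon:9606"],
    "provenance": "",  # left empty on purpose to show a failing check
    "access": "open",
    "licence": "CC-BY-4.0",
}

print(missing_fair_elements(example))  # -> ['provenance']
```

A check like this only confirms that the elements are present; whether the metadata are good enough for reuse still needs community standards and curation.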
5. Everybody needs data that are
• Discoverable by humans and machines
• Retrievable and structured in standard format(s)
• Self-described so that third parties can make sense of it
Better data = better and more efficient science
Datasets, SOPs, Figures, Photos, Workflows, Slides, Code, Tools, Databases, Algorithms, Documents
10. The scholarly publishing
ecosystem is changing
Data-related mandates by funders
and institutions are growing
Researchers must be responsible
and accountable for their research, but
they also need recognition and credit
theconversation.com/how-robots-can-help-us-embrace-a-more-human-view-of-disability-76815
Human-machine collaboration is the
future, and AI-ready data is essential
Responding to needs and crisis
Reproducibility of published studies is still problematic:
o 21% pharmacology data (doi.org/10.1038/nrd3439-c1)
o 11% cancer data (doi.org/10.1038/483531a)
o unsatisfactory in ML (openreview.net/pdf?id=By4l2PbQ-)
towardsdatascience.com/scientific-data-analysis-pipelines-and-reproducibility-75ff9df5b4c5
11. We research and develop methods and tools to
improve data reuse;
we work for data transparency, research integrity
and the evolution of scholarly publishing
Recognition and credit for ‘building, training and
driving changes’
12. www.gov.uk/government/publications/open-research-data-task-force-final-report
Prof. Pam Thomas,
Open Research Data Task Force Chair
https://twitter.com/SusannaASansone/status/1227885895426154497/photo/1
• Dialogue, engagement and commitment among
stakeholders
• Infrastructure and services as an integral part of the
research data cycle
• Cooperation with activities outside the UK, also with
industry and business, including SMEs
• Keep under continual review
2018
14. Depends upon several stakeholders in the research ecosystem
actively playing their parts to:
• deliver research infrastructures, tools and standards,
policies, education and training
• address technical, social and cultural challenges
It is not simple and it requires long-term investment
Making FAIR a reality
16. “Most metadata field names and their values
are not standardized or controlled”
“Even simple binary or numeric fields are
often populated with inadequate values of
different data types”
Better metadata for better data
https://doi.org/10.1038/sdata.2019.21
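To make the second quote concrete, here is a minimal Python sketch of how a "simple binary field" filled with mixed spellings and data types could be normalised. The accepted spellings and the `normalise_binary` helper are assumptions for illustration; they are not taken from the cited paper.

```python
from __future__ import annotations

# Hypothetical controlled values for a binary field; the spellings are illustrative.
TRUE_VALUES = {"true", "yes", "y", "1"}
FALSE_VALUES = {"false", "no", "n", "0"}


def normalise_binary(value: object) -> bool | None:
    """Map free-text variants of a binary field to True/False, or None if unrecognised."""
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    return None  # flag for curation rather than guessing


raw_values = ["Yes", "no", 1, "N/A", "TRUE", "male"]
for raw in raw_values:
    print(repr(raw), "->", normalise_binary(raw))
```

Returning `None` for unrecognised values is a deliberate choice in this sketch: such values are left for a curator to resolve instead of being coerced silently.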
17. The importance of knowledge representation and
interoperability
The ability of computer systems or software to understand information
with sufficient accuracy to utilise (e.g. represent and exchange)
that information for an intelligent purpose
Formats: conceptual model, conceptual schema, exchange formats
Terminologies: controlled vocabularies, thesauri, ontologies
Guidelines: minimum information reporting requirements, checklists
Identifiers: unambiguous, persistent and context-independent identifier schema
metadata
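As a small example of an unambiguous and context-independent identifier, the DOIs cited on these slides appear in several spellings ("doi:", "doi.org/", "https://doi.org/"). The Python sketch below normalises them to one resolvable URL. The `doi_to_url` helper and its deliberately loose pattern are assumptions for illustration, not an official DOI validator.

```python
import re

# Loose pattern for illustration only: "10.", a 4-9 digit registrant code, "/", a suffix.
_DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Prefixes seen on the slides, checked in this order.
_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi.org/", "doi:")


def doi_to_url(text: str) -> str:
    """Normalise a DOI written in one of several spellings to a resolvable URL."""
    doi = text.strip()
    for prefix in _PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    if not _DOI_PATTERN.match(doi):
        raise ValueError(f"not recognised as a DOI: {text!r}")
    return f"https://doi.org/{doi}"


print(doi_to_url("doi: 10.1126/science.1180598"))  # -> https://doi.org/10.1126/science.1180598
print(doi_to_url("doi.org/10.1038/nrd3439-c1"))    # -> https://doi.org/10.1038/nrd3439-c1
```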
20. The road to FAIRness
Before FAIR
http://blogs.nature.com/scientificdata/2019/10/22/the-layered-cake
After FAIR
…from chaos,
comes order?
23. The EOSC layered cake of FAIR
http://blogs.nature.com/scientificdata/2019/10/22/the-layered-cake
https://www.eoscsecretariat.eu
And a growing number of FAIR-related EOSC-funded projects in all disciplines; for example:
24. Growing number of reports, surveys and
recommendations
http://blogs.nature.com/scientificdata/2019/10/22/the-layered-cake
https://www.eoscsecretariat.eu/events/webinar-coming-together-map-national-landscape-analysis-eosc-between-5-relevant-projects
26. €3.3 billion programme, 2014 - 2020
€300 million programme, 2018 - 2020
European intergovernmental organisation: 23 member countries and over 180 research organisations, since 2014
1
2
3 Started in 2019
FAIR-enabling EU and USA biomedical infrastructure
programmes and projects, e.g.
Since 2014, several programmes:
2014-2017
2017-2018
27. Organization and structure
• Hub and (national) Nodes
• Community-driven and rooted
• Strong focus on interoperability
• SMEs and Industry links
• Cross-nodes funded activities
28. model and related formats
Initiated in 2003
Open source tools and formats to help researchers to:
• describe multi-modal experiments
• follow community-developed standards
• curate, analyze, release, share and publish
31. Funded by
Part of the
ISA-InterMine project
Reproducibility – FAIR at the first mile
From curated, structured metadata to data paper
datascriptor.org
32. Academics from several ELIXIR Nodes, with Janssen, AstraZeneca, Eli Lilly,
GSK, Novartis, Bayer, Boehringer Ingelheim
Define, document and implement a data FAIRification process:
33. Rocca-Serra and Sansone.
Experiment design driven FAIRification of
omics data matrices, an exemplar
Scientific Data 2019.
https://doi.org/10.1038/s41597-019-0286-0
Practical examples: data FAIRification recipes
https://fairplus.github.io/the-fair-cookbook
36. Implementation of FAIR by a company
Requirements
• Financial investment for technical infrastructure, training & education
• Make use of existing FAIR tools & best practices
• Show business value through transformation to digital drug discovery
A three-pronged approach
1. Build a data catalogue for Findability of all data assets in the company
Tip: Prioritise datasets which will quickly demonstrate benefits
2. Bring data together into a virtual, federated infrastructure, so that data become
instantaneously Accessible to humans and machines with the right credentials
Tip: Build on previous efforts to move to cloud services
3. Make selected datasets Interoperable and Reusable at the metadata level as well as at
the individual data point level
Tip: Most costly, so important to determine likely return on business value
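Step 1 of the approach above can be sketched as a very small in-memory catalogue. This is a hedged illustration only: the `CatalogueEntry` and `DataCatalogue` names, the identifiers and the keyword search are assumptions for this example, and a company catalogue would sit on a real metadata store.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogueEntry:
    """Minimal description of one data asset (hypothetical fields)."""
    identifier: str
    title: str
    keywords: set[str] = field(default_factory=set)


class DataCatalogue:
    """Registers data assets and finds them by keyword."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogueEntry] = {}

    def register(self, entry: CatalogueEntry) -> None:
        # Identifiers must be unique, otherwise assets cannot be told apart.
        if entry.identifier in self._entries:
            raise ValueError(f"identifier already registered: {entry.identifier}")
        self._entries[entry.identifier] = entry

    def find(self, keyword: str) -> list[CatalogueEntry]:
        # Case-insensitive exact match on keywords.
        wanted = keyword.lower()
        return [
            entry
            for entry in self._entries.values()
            if wanted in {k.lower() for k in entry.keywords}
        ]


catalogue = DataCatalogue()
catalogue.register(CatalogueEntry("ds-001", "Assay results", {"Pharmacology", "assay"}))
catalogue.register(CatalogueEntry("ds-002", "Imaging study", {"imaging"}))
print([entry.identifier for entry in catalogue.find("pharmacology")])  # -> ['ds-001']
```

Starting with a handful of high-value datasets, as the tip suggests, keeps such a catalogue small enough to curate by hand before investing in the later steps.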
39. doi: 10.1126/science.1180598
doi: 10.1038/nbt1346
OBO Portal and Foundry Portal and Foundry
doi: 10.1038/nbt.1411
10+ years of community engagement
69 authors: adopters, collaborators and users
Open Access CC-BY
2007 2008 2009
2011 2019
doi.org/10.1038/s41587-019-0080-8
41. Formats Terminologies Guidelines Identifiers
ID
REPOSITORIES,
databases and
knowledgebases
DATA POLICIES
by journals, funders, and
other organizations
COMMUNITY STANDARDS
for metadata and identifiers
informative and educational resource
Curated inter-linked
descriptions
42. Formats Terminologies Guidelines Identifiers
ID
informative and educational resource
Curated inter-linked
descriptions
All records are manually curated
in-house, verified and claimed by the
community behind each resource
Ready for use, implementation, or recommendation
In development
Status uncertain
Deprecated as subsumed or superseded
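The four record statuses above could be modelled as an enumeration, so that consumers can filter for resources that are ready to recommend. This is a hedged sketch: the `Status` enum, the example records and the `ready_for_recommendation` helper are hypothetical names for illustration and do not reflect the actual data model or API of the resource described on this slide.

```python
from enum import Enum


class Status(Enum):
    """Record statuses as worded on the slide."""
    READY = "Ready for use, implementation, or recommendation"
    IN_DEVELOPMENT = "In development"
    UNCERTAIN = "Status uncertain"
    DEPRECATED = "Deprecated as subsumed or superseded"


# Made-up records: (name, status) pairs, not real entries.
records = [
    ("Standard A", Status.READY),
    ("Standard B", Status.IN_DEVELOPMENT),
    ("Repository C", Status.DEPRECATED),
    ("Repository D", Status.READY),
]


def ready_for_recommendation(items):
    """Keep only the names of records whose status is READY."""
    return [name for name, status in items if status is Status.READY]


print(ready_for_recommendation(records))  # -> ['Standard A', 'Repository D']
```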
43. Formats Terminologies Guidelines Identifiers
ID
We guide consumers to discover, select and use these
resources with confidence
We help producers to make their resources more visible, more
widely adopted and cited
50. “The interactive browser will allow us to discover which databases and standards
are not currently included in our author guidelines, enabling us to regularly
monitor and refine our policies as appropriate, in support of our mission to help
our authors enhance the reproducibility of their work.”
H. Murray. Publishing Editor, F1000Research
51. Researchers in
academia, industry,
government
Developers and
curators of resources
Journal publishers or
organizations with data
policy
Research data
facilitators, librarians,
trainers
Learned societies,
unions and
associations
Funders and data
policy makers
A flagship output of and a WG in:
Recommended by funders, e.g.:
53. Analysed the data policies by
journals/publishers, and the standards and
repositories they recommend
Working with journal editors and publishers
Participating
publishers:
54. Discrepancy in recommendation across the data policies
• some repositories are named, but very few standards are
• cautious approach due to the wealth of existing resources
Recommendations are often driven by
• the editor’s familiarity with one or more standards, notably for
journals or publishers focusing on specific disciplines
• the engagement with learned societies and researchers
actively supporting and using certain resources
➢ Consensus: FAIRsharing plays a key role in helping
editors to discover and recommend appropriate resources
What we have learned and are doing now
55. Working on a set of criteria that journals and
publishers believe are important for the
identification and selection of data
repositories, which can be recommended to
researchers when they are preparing to
publish the data underlying their findings
Data Repository Selection: Criteria That Matter
Pre-print:
https://osf.io/m2bce
Collaboration:
56. Participating
publishers:
Data Repository Selection: Criteria That Matter
Pre-print:
https://osf.io/m2bce
Harmonize journals' and publishers' data deposition
guidelines by defining a common set of criteria for
repository selection
Criteria include:
• Access conditions
• Reuse conditions
• Deposition conditions
• Unique ID schema
• User support
• Curation
• …….
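The criteria named above can be expressed as a simple checklist against which a repository description is compared. A minimal sketch, with assumptions: only the criteria spelled out on the slide are included (the list there is open-ended), each criterion is reduced to a yes/no value, and the `unmet_criteria` helper and example repository are hypothetical.

```python
from __future__ import annotations

# Criteria named on the slide; the original list is open-ended, so this is partial.
CRITERIA = [
    "access conditions",
    "reuse conditions",
    "deposition conditions",
    "unique ID schema",
    "user support",
    "curation",
]


def unmet_criteria(repository: dict) -> list[str]:
    """Return the criteria a repository description does not satisfy."""
    return [name for name in CRITERIA if not repository.get(name, False)]


# Made-up repository description for illustration.
example_repository = {
    "access conditions": True,
    "reuse conditions": True,
    "deposition conditions": True,
    "unique ID schema": True,
    "user support": False,
    # "curation" is not documented at all, so it counts as unmet.
}

print(unmet_criteria(example_repository))  # -> ['user support', 'curation']
```

Treating an undocumented criterion as unmet is a design choice in this sketch; a real assessment would likely distinguish "no" from "unknown".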
Collaboration:
57. Increase the number and clarity of journals' and funders'
data policies by classifying the recommendations these policies contain,
to improve their definition and their guidance to researchers
Collaboration:
Workplan – phase 1:
Curate the policies, assess their compliance with the Transparency and Openness Promotion
(TOP) guidelines and display the level in FAIRsharing