The swings and roundabouts
of a decade of fun and games
with Research Objects
Carole Goble
The University of Manchester
Researchobject.org
carole.goble@manchester.ac.uk
DaMaLOS workshop 02 Nov 2020 at ISWC 2020
Special
Acknowledgement
Stian Soiland-Reyes
The University of Manchester, UK
Our RO start up – what, why and how…
ROs in the Large – The Vision
A new form of Scholarly Communication.
RDM support throughout the research cycle.
ROs in the Small – The Implementation
Packaging digital components.
Referencing physical components.
Our World of FAIR Thematic Research Infrastructures
(aka Cyberinfrastructure) – Biology, Biodiversity
“facilities that provide resources and services for
research communities to conduct research and
foster innovation….they may be single-sited,
distributed, or virtual.
• major scientific equipment or sets of instruments
• collections, archives or scientific data
• computing systems and communication networks
• any other research and innovation infrastructure
of a unique nature which is open to external
users”
European Commission
Research
Software
Engineers
https://ec.europa.eu/info/research-and-innovation/strategy/european-research-infrastructures_en
FAIR Data Commons
Diverse Research Objects – models, data, pipelines, lab
protocols and SOPs, provenance... citable, exchangeable,
publishable, preserved, executable objects and
collections of objects.
Assemble and share large scale, multi-
element datasets. Secure referencing
and moving of sensitive data. Zoo of
catalogues & resources.
Across 13 Research Infrastructures.
Reproduce, port, share, and execute analytics & pipelines
NIH Data
Commons
FAIR Digital Objects
PIDs + Metadata
Sounds like Linked
Data!
The FAIR Guiding Principles for
Data Stewardship and
Management
Scientific Data 3, 160018 (2016)
doi:10.1038/sdata.2016.18
Motivation
in 2007.
Still is.
The different objects that are the research
…..documentation
Structured, interrelated objects in context
….documentation
The narrative paper
From Manuscripts to Research Objects
“An article about computational science in a scientific publication is not the scholarship itself, it is
merely advertising of the scholarship. The actual scholarship is the complete software
development environment, [the complete data] and the complete set of instructions which
generated the figures.” David Donoho, “Wavelab and Reproducible Research,” 1995
research outcomes more than just
publications
data, software, models, workflows, SOPs,
lab protocols are first class citizens of
scholarship
added information required to make
research FAIR and Reproducible (FAIR+R) …
Do Research
Project Research
Infrastructure Services
Assemble
Methods, Materials
Analyse
Results
Quality
Assessment
Track and Credit
Disseminate
Deposit &
Licence
Public Marketplace
Services
Publish
Share
Results
Any
research
product
Selected
products
Manage
Results
ROs all the way along,
not just the end
Science 2.0 Repositories: Time for a Change in Scholarly Communication Assante, Candela, Castelli, Manghi, Pagano, D-Lib 2015
Ainsworth and Buchan: e-Labs and Work Objects: Towards Digital Health Economies, EuropeComm 2009, pp 205-216
Experiment
Observe
Simulate
Digital Objects
electric butterflies
Digital twins
Actionable knowledge units
FAIR Digital Objects
courtesy Dimitris Koureas
Coordinator DiSSCo EU Research
Infrastructure
Specimen object image
courtesy of Alex Hardisty
Digital Objects as First Class Entities
FAIR Digital Object Framework
A Knowledge Graph of FDOs
• Schwardmann (2020), Digital Objects – FAIR Digital Objects: Which Services Are Required? Data Science Journal
• EOSC Interoperability Framework Draft (2020)
• Hardisty A, et al (2020) Conceptual design blueprint for the DiSSCo digitization infrastructure. RIO 6: e54280.
• DONA Digital Object Architecture, Digital Object Interface Protocol (2018)
• https://fairdigitalobjectframework.org/
From Manuscripts to FAIR+R Research Objects
research objects related and
bundled together …
one shareable, cite-able,
exchangeable resource that can be
versioned and snapshot …
metadata describing context
and content of objects
dependencies, versions,
relationships, provenance …
enough to be reproducible
virtual objects, links to physical objects (people, specimens, equipment)
integrated view over fragmented & scattered specialised repositories
internal content and references
Bigger on the inside
than the outside Packaging
internal content and references
From Manuscripts to FAIR+R Research Objects
Commons
Currency
Credit
Archival preservation
Reproducibility
Portability
Virtual Witnessing
Releasing
Living Objects
Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, DOI: 10.1007/978-3-642-37186-8_1
RDM role FAIR ROs
Analogous to software
FAIR Enough
Structure: Composite
Dynamic: Versioning
Executable: Portability
Virtual: References
Maintenance: Decay
PID resolution?
Metadata?
Access?
Licences?
Packaging of Digital Objects
Driver: Computational Workflows
http://wf4ever.org/
Preservation of computational workflows in data-
intensive science
• Workflow-centric Research Objects
• Computational workflows, provenance of executions,
interconnections between workflows and related
resources (e.g., datasets, publications, etc.), social
aspects in the experiments.
• Wf-centric RO creation & management best practices
• analysis and management of decay in workflows.
Methods
(..)
De novo assembly and binning
Raw reads from each run were first assembled with SPAdes v.3.10.0 [20] with option --meta [21].
Thereafter, MetaBAT 2 [15] (v.2.12.1) was used to bin the assemblies using a minimum contig
length threshold of 2,000 bp (option --minContig 2000) and default parameters. Depth of
coverage required for the binning was inferred by mapping the raw reads back to their
assemblies with BWA-MEM v.0.7.16 [45] and then calculating the corresponding read depths
of each individual contig with samtools v.1.5 [46] (‘samtools view -Sbu’ followed by ‘samtools
sort’) together with the jgi_summarize_bam_contig_depths function from MetaBAT 2. The
QS of each metagenome-assembled genome (MAG) was estimated with CheckM v.1.0.7 [22]
using the lineage_wf workflow and calculated as: level of completeness − 5 ×
contamination. Ribosomal RNAs (rRNAs) were detected with the cmsearch function from
INFERNAL v.1.1.2 [47] (options -Z 1000 --hmmonly --cut_ga) using the Rfam [48] covariance
models of the bacterial 5S, 16S and 23S rRNAs. Total alignment length was inferred by the
sum of all non-overlapping hits. Each gene was considered present if more than 80% of the
expected sequence length was contained in the MAG. Transfer RNAs (tRNAs) were identified
with tRNAscan-SE v.2.0 [49] using the bacterial tRNA model (option -B) and default
parameters. Classification into high- and medium-quality MAGs was based on the criteria
defined by the minimum information about a metagenome-assembled genome (MIMAG)
standards [23] (high: >90% completeness and <5% contamination, presence of 5S, 16S and 23S
rRNA genes, and at least 18 tRNAs; medium: ≥ 50% completeness and <10% contamination).
(...)
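To show how much of the quoted narrative is really a small computation, here is a minimal Python sketch of the QS formula and the MIMAG thresholds quoted above. It is an illustration, not the authors' code: the `MagStats` container, the function names and the "unclassified" label are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class MagStats:
    """Per-genome statistics (hypothetical container for this sketch)."""
    completeness: float      # completeness, in percent
    contamination: float     # contamination, in percent
    has_5s: bool = False     # 5S rRNA gene detected
    has_16s: bool = False    # 16S rRNA gene detected
    has_23s: bool = False    # 23S rRNA gene detected
    trna_count: int = 0      # number of tRNAs found


def quality_score(mag: MagStats) -> float:
    """QS = completeness - 5 x contamination, as in the quoted Methods."""
    return mag.completeness - 5 * mag.contamination


def mimag_class(mag: MagStats) -> str:
    """Apply the high/medium thresholds quoted above.

    The "unclassified" label is an assumption; the quoted text
    only names the high and medium classes.
    """
    rrnas_present = mag.has_5s and mag.has_16s and mag.has_23s
    if (mag.completeness > 90 and mag.contamination < 5
            and rrnas_present and mag.trna_count >= 18):
        return "high"
    if mag.completeness >= 50 and mag.contamination < 10:
        return "medium"
    return "unclassified"


# Made-up values for illustration only; not data from the paper.
example = MagStats(completeness=95.0, contamination=2.0,
                   has_5s=True, has_16s=True, has_23s=True, trna_count=20)
print(quality_score(example), mimag_class(example))
```

Written as code, the thresholds become checkable; in the narrative they are easy to misread.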
(..)
Assignment of MAGs to reference databases
Four reference databases were used to classify the set of MAGs recovered from the human
gut assemblies: HR, RefSeq, GenBank and a collection of MAGs from public datasets. HR
comprised a total of 2,468 high-quality genomes (>90% completeness, <5% contamination)
retrieved from both the HMP catalogue (https://www.hmpdacc.org/catalog/) and the
HGG [8]. From the RefSeq database, we used all the complete bacterial genomes available (n
= 8,778) as of January 2018. In the case of GenBank, a total of 153,359 bacterial and 4,053
eukaryotic genomes (3,456 fungal and 597 protozoan genomes) deposited as of August
2018 were considered. Lastly, we surveyed 18,227 MAGs from the largest datasets publicly
available as of August 2018 [13,16,17,18,19], including those deposited in the Integrated Microbial
Genomes and Microbiomes (IMG/M) database [52]. For each database, the function ‘mash
sketch’ from Mash v.2.0 [53] was used to convert the reference genomes into a MinHash
sketch with default k-mer and sketch sizes. Then, the Mash distance between each MAG
and the set of references was calculated with ‘mash dist’ to find the best match (that is, the
reference genome with the lowest Mash distance). Subsequently, each MAG and its closest
relative were aligned with dnadiff v.1.3 from MUMmer 3.23 [54] to compare each pair of
genomes with regard to the fraction of the MAG aligned (aligned query, AQ) and ANI.
(..)
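The "best match" step in the quoted text (the reference genome with the lowest Mash distance) can be sketched in a few lines of Python. This is a hedged illustration, not the authors' script: it assumes tab-separated `mash dist` output with the columns reference, query, distance in that order, and the function name and sample rows are invented.

```python
import csv
import io


def best_matches(mash_dist_tsv: str) -> dict:
    """Return {query: (reference, distance)} with the lowest distance per query.

    Assumes tab-separated rows whose first three columns are
    reference, query and Mash distance.
    """
    best = {}
    for row in csv.reader(io.StringIO(mash_dist_tsv), delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        reference, query, distance = row[0], row[1], float(row[2])
        if query not in best or distance < best[query][1]:
            best[query] = (reference, distance)
    return best


# Made-up rows for illustration only; not data from the paper.
sample = (
    "refA.fna\tmag1.fna\t0.12\t0\t100/1000\n"
    "refB.fna\tmag1.fna\t0.03\t0\t600/1000\n"
    "refA.fna\tmag2.fna\t0.25\t0\t20/1000\n"
)
print(best_matches(sample))
```

Even this small step hides choices (tie-breaking, malformed rows) that the narrative never records, which is the gap a packaged workflow closes.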
https://doi.org/10.1038/s41586-019-0965-1
Data pipeline & analysis
reporting & reproducibility
Lots of scripts &
datasets
EBI’s MGnify
metagenomics workflows as
workflows using a WfMS
Workflow description
Input data files
Command line tools,
containerised tools,
workflows
Output data files
Standards-based metadata framework for bundling resources (physically
and logically) with context into citable reproducible packages.
researchobject.org Linked Data!!
Bechhofer et al (2013) Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004
Bechhofer et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/
A Research Object bundles and relates digital resources
of a scientific experiment/investigation + context
Data used and results produced in experimental study
Methods employed to produce and analyse that data
Provenance and settings for the experiments
People, specimens, equipment etc involved in the
investigation
Annotations about these resources, to improve
understanding and interpretation
Package
Manifest
Construction
Manifest
Identification
to locate things
Aggregates
to link things together
Annotations
about things & their
relationships
Package
Research Objects => Metadata Objects
Manifest
Description
Type Checklists
what should be there
Provenance
where it came from
Versioning
its evolution
Dependencies
what else is needed
Manifest
Base Profile,
Profiles for
RO types
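The manifest roles named on these slides (identification, aggregates, annotations, type checklists) can be pictured as a small data structure. The Python sketch below is purely illustrative: the class and field names are assumptions made for this example, and it does not reproduce the Wf4Ever RO ontology or any real serialization.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    about: str   # identifier of the annotated resource
    body: str    # free text, or a link to a fuller description


@dataclass
class Manifest:
    identifier: str                                       # Identification
    aggregates: list = field(default_factory=list)        # Aggregates
    annotations: list = field(default_factory=list)       # Annotations
    resource_types: dict = field(default_factory=dict)    # resource id -> type label

    def add(self, resource_id: str, type_label: str) -> None:
        """Aggregate a resource and record what kind of thing it is."""
        self.aggregates.append(resource_id)
        self.resource_types[resource_id] = type_label

    def missing(self, checklist: set) -> set:
        """Type checklist: required type labels with no aggregated resource."""
        return checklist - set(self.resource_types.values())


# Hypothetical identifiers and labels, for illustration only.
ro = Manifest(identifier="https://example.org/ro/1")
ro.add("workflow.cwl", "workflow")
ro.add("inputs/reads.fastq", "input-data")
ro.annotations.append(Annotation(about="workflow.cwl", body="Assembly pipeline"))

required = {"workflow", "input-data", "provenance"}
print(ro.missing(required))
```

The checklist check ("what should be there") is a plain set difference here; the slides that follow note that the same thing is hard to express in RDF shapes.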
We are in a Semantic Web Conference….
Linked Data Middleware
• Manifests described using Linked Data
– Identifiers to resources, including people (orcid)
– OWL / RDF / SPARQL / JSON-LD
• Mishmash of specialized ontologies
– Construct the manifest itself
• W3C Web Annotation Vocabulary
• OAI Object Exchange and Reuse
– Describe manifest content
• Wf4Ever RO ontology, Wf4Ever ROEvo …
• Dublin Core, FOAF, SIOC, Creative Commons, PROV, PAV…
• RDF shapes (SHACL, ShEx)
– Capture requirements, expectations and validate profiles
– Hard to express checklists
https://view.commonwl.org/workflows/github.com/mnneveau/cancer-genomics-
Manifest
CWL
Annotations
Evidence!
Influence? Publishers…
Howard Ratner,
Chair STM Future Labs Committee, CEO EVP Nature Publishing Group
Director of Development for CHORUS
European Open Science Cloud
Interoperability Framework
EOSC Interoperability Framework (v1.0)
May 2020, Draft for community consultation
Chair: Oscar Corcho, UPM
2021 we start to combine
Used? Yes ….……
NIH Data Commons
transferring and archiving
very large HTS datasets
in a location-
independent way
keep the context of data
content together when
it's scattered. Scalability
https://doi.org/10.1109/BigData.2016.7840618
A framework for
standardizing and
sharing computations
and analyses generated
from High-throughput
genome sequencing.
Standardized as IEEE
2791-2020
https://doi.org/10.1371/journal.pbio.3000099
Virtualized collaborative
working environment for
Earth Science researchers
to share resources (data,
workflows), ideas,
knowledge, and results.
NIH DataSTAGE RO
Composer to
exchange between
Seven Bridges
Platform genomics
platform and the
Mendeley Data
repository
https://doi.org/10.1016/j.future.2019.03.046
Used? Yes ….……
Exchange
Reproducibility
Archival
Active Objects
Activation &
Research
Championing
Adoption
Phase 1
2010 -
2015
Phase 2
2015 –
2018
Phase 3
2017 -
time to reflect….
Machine-processable
Standards
Low tech
Graceful degradation
Commodity tooling
Incremental
Multi-platform
Technology Independent
Keep it
simple
E X A M P L E S
Developer
friendliness
Desiderata & Norms
Balance and prioritise
• "just enough complexity" or “just
enough standards” so…
• sufficient extra benefits from what
already exists (Linked Data,
vocabularies, tooling, validation,
transformation)
• without compromising the developer
entry-level experience so much that
they would rather do their own thing.
Research Object Tensions
Research Infrastructures sit in the middle
Academic
Viewpoint
Infrastructure
Viewpoint
Green field site
Theoretical purity
Use latest thing
Proof of concept
Sophistication
Narrow developer audience
Strive for super generic
The end
Exposing the tech
Pre-existing platforms
Practicality
Use things that work
Production
Simplicity
Wide developer audience
Several specific is ok!
The means
Hiding the tech
Ladder Model of OSS Adoption
(adapted from Carbone P., Value Derived from Open
Source is a Function of Maturity Levels)
[FLOSS@Syracuse]
"it's better, initially, to
make a small number of
users really love you than
a large number kind of
like you"
Paul Buchheit
paulbuchheit.blogspot.com
Not really mortal developer friendly
• “Easy to make, hard to use…”
• Daunting Linked Data tech stack
• Being too clever
– Infer what is in the object and what
kind of object it is
– Massive reuse of ontologies
• Make developers' (and researchers')
lives easier, not more demanding….
Developer friendliness matters
Reinvent with fewer features
Easy to understand and simple
conceptually…
… with strong opinionated
guide to current best practices
… using software stacks
widely used on the Web
Indeed.
Linked Data is not enough.
Research Infrastructures:
“digital technologies (hardware,
software), resources (data, services,
digital libraries, standards), comms
(protocols, access rights, networks),
people and organisational
structures”
Community Use Driver Tools
Developer friendly – so possible
Incentives – so rewarding
Adoption path – so acceptable
Metadata capture
Early benefit
Exchange, reproducibility, executable objects
Portability between platforms, Archiving
Platform & user buy-in & consensus
Passionate, dedicated leadership
Active engaged community, seed Support
Linked Data and a Spec is not enough
and sometimes too much
Guides
Reference
examples
Research Object
Reboot
Community Sponsored
RO2018, Amsterdam
e-Science Conference
BDBags Keynote
Swing Back to Basics
Peter Sefton
BagIt data profile
+ schema.org
+ JSON-LD annotations
Semantic Web world vs Real World
DataCrate
from the Open Repository community
"As a researcher…I’m a bit b****y fed up
with Data Management", Cameron Neylon
Be Humble
Archivist and library people know the
importance of metadata and standards…
… and for things to work 5, 10, 20 years later.
End-users need to have their own way to
“bypass the system”…
…. their field, repositories, institutions, journals
etc. will always be lagging behind the curve
Most who want to make their data FAIR …
… do not have the resources or knowledge to
start championing all of this to all levels & need
tools and ramps.
http://www.lisbdnet.com/
https://ischools.org/
A RO-Crate Community!
https://www.researchobject.org/ro-crate/#contribute
https://github.com/researchobject/ro-crate/issues/1
A Merger
• A diverse set of people
• A variety of stakeholders
• A set of collective norms
• An open platform that facilitates
communication (GitHub, Google Docs,
monthly telcons)
RO-Crate
https://w3id.org/ro/crate
RO-Crate is a community effort to establish a lightweight approach to packaging research data
with their metadata.
It is based on schema.org annotations in JSON-LD, and aims to make best-practice in formal
metadata description accessible and practical for use in a wider variety of situations, from an
individual researcher working with a folder of data, to large data-intensive computational
research environments.
RO-Crate is the marriage of Research Objects with DataCrate. It aims to build on their respective
strengths, but also to draw on lessons learned from those projects and similar research data
packaging efforts. For more details, see RO-Crate background.
The RO-Crate specification details how to capture a set of files and resources as a dataset with
associated metadata – including contextual entities like people, organizations, publishers, funding,
licensing, provenance, workflows, geographical places, subjects and repositories.
A growing list of RO-Crate tools and libraries simplify creation and consumption of RO-Crates,
including the graphical interface Describo.
Join the RO-Crate community to help shape the specification or get help with using it!
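To make "schema.org annotations in JSON-LD" concrete, the Python sketch below builds a deliberately minimal RO-Crate metadata document as a plain dictionary. It is a sketch under assumptions, not a reference implementation: it targets the RO-Crate 1.1 context and file name, the helper name `minimal_crate` and the example file path are invented, and it omits further root properties (for example a licence and publication date) that the specification asks for.

```python
import json


def minimal_crate(name: str, description: str, files: list) -> dict:
    """Build a minimal RO-Crate-style metadata document as a dict."""
    # Root data entity: the dataset (folder) being described.
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "hasPart": [{"@id": path} for path in files],
    }
    # Metadata file descriptor: says which entity is the root dataset.
    descriptor = {
        "@id": "ro-crate-metadata.json",
        "@type": "CreativeWork",
        "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
        "about": {"@id": "./"},
    }
    # One data entity per file, referenced from hasPart above.
    file_entities = [{"@id": path, "@type": "File"} for path in files]
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [descriptor, root, *file_entities],
    }


# Hypothetical crate contents, for illustration only.
crate = minimal_crate("Example crate", "A folder of data", ["data/results.csv"])
print(json.dumps(crate, indent=2))
```

In this sketch the printed JSON would be saved as `ro-crate-metadata.json` at the top of the data folder; contextual entities such as people or organizations would be further objects in the same `@graph`.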
Reinvent as Lightweight Underware
Linked data but Developer Friendly
Easy to understand and simple
conceptually…
… with strong opinionated guide to
current best practices
… using software stacks widely used
on the Web
BagIt data profile, schema.org, JSON-LD, JSON Schema
Flattened compacted JSON-LD, no need for RDF libraries
Example driven rather than strict specification
Implementers add additional metadata using schema.org
types and properties
Data entities are files/directories or web resources
Boundness of elements is explicit
Single graph, data structure depth 1
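A short Python sketch of what "no need for RDF libraries" can mean in practice: because the JSON-LD is flattened into a single `@graph` of depth 1, a plain dictionary keyed by `@id` is enough to follow references. The crate content, the `#alice` identifier and the `resolve` helper are made up for this illustration.

```python
import json

# Made-up flattened crate, for illustration only.
CRATE_JSON = """
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
     "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset", "name": "Example crate",
     "author": {"@id": "#alice"}, "hasPart": [{"@id": "data.csv"}]},
    {"@id": "data.csv", "@type": "File", "name": "Measurements"},
    {"@id": "#alice", "@type": "Person", "name": "Alice Example"}
  ]
}
"""

graph = json.loads(CRATE_JSON)["@graph"]
# Depth-1 graph: every entity sits at the top level, so index it by @id.
by_id = {entity["@id"]: entity for entity in graph}


def resolve(value):
    """Follow a {"@id": ...} reference if its target is in the graph."""
    if isinstance(value, dict) and set(value) == {"@id"}:
        return by_id.get(value["@id"], value)
    return value


root = resolve(by_id["ro-crate-metadata.json"]["about"])
author = resolve(root["author"])
parts = [resolve(p)["name"] for p in root["hasPart"]]
print(root["name"], author["name"], parts)
```

Full JSON-LD processing, graph stores and queries remain available to those who need them, but a consumer like this one only needs the standard `json` module.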
Swung a bit far….
and swung back…
Tooling!
How can I use it?
While we’re mostly focusing on the specification, some tools already exist for working with
RO-Crates:
● Describo interactive desktop application to create, update and export RO-Crates for
different profiles. (~ beta)
● CalcyteJS is a command-line tool to help create RO-Crates and HTML-readable
rendering (~ beta)
● ro-crate - JavaScript/NodeJS library for RO-Crate rendering as HTML. (~ beta)
● ro-crate-js - utility to render HTML from RO-Crate (~ alpha)
● ro-crate-ruby Ruby library to consume/produce RO-Crates (~ alpha)
● ro-crate-py Python library to consume/produce RO-Crates (~ planning)
These applications use or expose RO-Crates:
● Workflow Hub imports and exports Workflow RO-Crates
● OCFL-indexer NodeJS application that walks the Oxford Common File Layout on the
file system, validate RO-Crate Metadata Files and parse into objects registered in
Elasticsearch. (~ alpha)
● ONI indexer
● ocfl-tools
● ocfl-viewer
● Research Object Composer is a REST API for gradually building and depositing
Research Objects according to a pre-defined profile. (RO-Crate support alpha)
● … (yours?)
https://uts-eresearch.github.io/describo/
• RDF and schema.org
but you don't need to
know.
• Extend RO-Crate ….
– Add your own schema.org
types and properties.
– Add in your own ontologies
…and it still works!
https://arkisto-platform.github.io/case-studies/
Under-ware
Driver! Profile for workflows
https://about.workflowhub.eu/Workflow-RO-Crate/
Infrastructure families
Driver! Profile for workflows
On-boarding developers
Web and dev friendly
Workflow-RO-Crate profiles
RO in practice External references – logically & physically contained –
versions, snapshots, multi-typed, active, multi-
stewarded, multi-authored, governance…
More than plain JSON, Just Enough Linked Data
Retain benefits of Linked Data in the toolbox
• querying, graph stores, vocabularies, clickable URIs
as identifiers
• customization and conventions
Plus all the other stuff a developer expects
• documentation, examples, libraries, tools
• simplifications rather than generalizations (less
flexibility frees up developers)
• “Just enough standards” cf. schema.org
Linked Data “exotics” there for when the time is right
if needed by the right people.
Keep your eye on the target…..
How do we make ROs normative?
– Propaganda and incentive models to scientists,
target the Research Infrastructures to deliver.
– Digital library community allies!
Developer friendliness matters
– Underware, incremental, ramps, embed, metadata
automation, persuasive design
Linked Data has a role
– As a means but it is not an end.
– Simpler version of Linked Data makes an adoption
path (cf. Knowledge Graphs, schema.org, JSON-
LD)
FAIR principles for Research Objects….
– Unifying the vision with the practical
Acknowledgements
Barend Mons
Sean Bechhofer
Matthew Gamble
Raul Palma
Jun Zhao
Mark Robinson
Alan Williams
Norman Morrison
Stian Soiland-Reyes
Tim Clark
Alejandra Gonzalez-Beltran
Philippe Rocca-Serra
Ian Cottam
Susanna Sansone
Kristian Garza
Daniel Garijo
Catarina Martins
Iain Buchan
Michael Crusoe
Rob Finn
Stuart Owen
Finn Bacall
Bert Droebeke
Laura Rodríguez Navas
Ignacio Eguinoa
Carl Kesselman
Ian Foster
Kyle Chard
Vahan Simonyan
Ravi Madduri
Raja Mazumder
Gil Alterovitz
Denis Dean II
Durga Addepalli
Wouter Haak
Anita de Waard
Paul Groth
Oscar Corcho
Peter Sefton
Eoghan Ó Carragáin
Frederik Coppens
Jasper Koehorst
Simone Leo
Nick Juty
LJ Garcia Castro
Karl Sebby
Alexander Kanitz
Ana Trisovic
http://researchobject.org
Gavin Kennedy
Mark Graves
José María Fernández
Jose Manuel Gomez-Perez
Jason A. Clark
Salvador Capella-Gutierrez
Alasdair J. G. Gray
Kristi Holmes
Giacomo Tartari
Hervé Ménager
Paul Walk
Brandon Whitehead
Erich Bremer
Mark Wilkinson
Jen Harrow
And many more....

 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICEayushi9330
 

Último (20)

Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate ProfessorThyroid Physiology_Dr.E. Muralinath_ Associate Professor
Thyroid Physiology_Dr.E. Muralinath_ Associate Professor
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
Digital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptxDigital Dentistry.Digital Dentistryvv.pptx
Digital Dentistry.Digital Dentistryvv.pptx
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
Introduction to Viruses
Introduction to VirusesIntroduction to Viruses
Introduction to Viruses
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.Proteomics: types, protein profiling steps etc.
Proteomics: types, protein profiling steps etc.
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 

The swings and roundabouts of a decade of fun and games with Research Objects

  • 1. The swings and roundabouts of a decade of fun and games with Research Objects Carole Goble The University of Manchester Researchobject.org carole.goble@manchester.ac.uk DaMaLOS workshop 02 Nov 2020 at ISWC 2020
  • 3. Our RO start up – what, why and how… ROs in the Large – The Vision A new form of Scholarly Communication. RDM support throughout the research cycle. ROs in the Small – The Implementation Packaging digital components. Referencing physical components.
  • 4. Our World of FAIR Thematic Research Infrastructures (aka Cyberinfrastructure) – Biology, Biodiversity “facilities that provide resources and services for research communities to conduct research and foster innovation….they may be single-sited, distributed, or virtual. • major scientific equipment or sets of instruments • collections, archives or scientific data • computing systems and communication networks • any other research and innovation infrastructure of a unique nature which is open to external users” European Commission Research Software Engineers https://ec.europa.eu/info/research-and-innovation/strategy/european-research-infrastructures_en
  • 5. FAIR Data Commons Diverse Research Objects – models, data, pipelines, lab protocols and SOPs, provenance... citable, exchangeable, publishable, preserved, executable objects and collections of objects. Assemble and share large scale, multi-element datasets. Secure referencing and moving of sensitive data. Zoo of catalogues & resources. Across 13 Research Infrastructures. Reproduce, port, share, and execute analytics & pipelines NIH Data Commons
  • 6. FAIR Digital Objects PIDs + Metadata Sounds like Linked Data! The FAIR Guiding Principles for Data Stewardship and Management Scientific Data 3, 160018 (2016) doi:10.1038/sdata.2016.18
  • 7. Motivation in 2007. Still is. The different objects that are the research …..documentation Structured, interrelated objects in context ….documentation The narrative paper The narrative paper
  • 8. From Manuscripts to Research Objects “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.” David Donoho, “Wavelab and Reproducible Research,” 1995 research outcomes more than just publications data software, models, workflows, SOPs, lab protocols are first class citizens of scholarship added information required to make research FAIR and Reproducible (FAIR+R) …
  • 9. Do Research Project Research Infrastructure Services Assemble Methods, Materials Analyse Results Quality Assessment Track and Credit Disseminate Deposit & Licence Public Marketplace Services Publish Share Results Any research product Selected products Manage Results ROs all the way along, not just the end Science 2.0 Repositories: Time for a Change in Scholarly Communication Assante, Candela, Castelli, Manghi, Pagano, D-Lib 2015 Ainsworth and Buchan: e-Labs and Work Objects: Towards Digital Health Economies, EuropeComm 2009, pp 205-216 Experiment Observe Simulate
  • 10. Digital Objects electric butterflies Digital twins Actionable knowledge units FAIR Digital Objects courtesy Dimitris Koureas Coordinator DiSSCo EU Research Infrastructure Specimen object image courtesy of Alex Hardisty
  • 11. Digital Objects as First Class Entities FAIR Digital Object Framework A Knowledge Graph of FDOs • Schwardmann (2020), Digital Objects – FAIR Digital Objects: Which Services Are Required? Data Science Journal • EOSC Interoperability Framework Draft (2020) • Hardisty A, et al (2020) Conceptual design blueprint for the DiSSCo digitization infrastructure RIO 6: e54280. • DONA Digital Object Architecture Digital Object Interface Protocol (2018) • https://fairdigitalobjectframework.org/
  • 12. From Manuscripts to FAIR+R Research Objects research objects related and bundled together … one shareable, cite-able, exchangeable resource that can be versioned and snapshot … metadata describing context and content of objects dependencies, versions, relationships, provenance … enough to be reproducible virtual objects, links to physical objects (people, specimens, equipment) integrated view over fragmented & scattered specialised repositories internal content and references
  • 13. Bigger on the inside than the outside Packaging internal content and references From Manuscripts to FAIR+R Research Objects
  • 14. Commons Currency Credit Archival preservation Reproducibility Portability Virtual Witnessing Releasing Living Objects Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, DOI: 10.1007/978-3-642-37186-8_1 RDM role FAIR ROs Analogous to software FAIR Enough Structure: Composite Dynamic: Versioning Executable: Portability Virtual: References Maintenance: Decay PID resolution? Metadata? Access? Licences?
  • 15. Packaging of Digital Objects Driver: Computational Workflows http://wf4ever.org/ Preservation of computational workflows in data-intensive science • Workflow-centric Research Objects • Computational workflows, provenance of executions, interconnections between workflows and related resources (e.g., datasets, publications, etc.), social aspects in the experiments. • Wf-centric RO creation & management best practices • analysis and management of decay in workflows.
  • 16. Methods (..) De novo assembly and binning Raw reads from each run were first assembled with SPAdes v.3.10.0 [20] with option --meta [21]. Thereafter, MetaBAT 2 [15] (v.2.12.1) was used to bin the assemblies using a minimum contig length threshold of 2,000 bp (option --minContig 2000) and default parameters. Depth of coverage required for the binning was inferred by mapping the raw reads back to their assemblies with BWA-MEM v.0.7.16 [45] and then calculating the corresponding read depths of each individual contig with samtools v.1.5 [46] (‘samtools view -Sbu’ followed by ‘samtools sort’) together with the jgi_summarize_bam_contig_depths function from MetaBAT 2. The QS of each metagenome-assembled genome (MAG) was estimated with CheckM v.1.0.7 [22] using the lineage_wf workflow and calculated as: level of completeness − 5 × contamination. Ribosomal RNAs (rRNAs) were detected with the cmsearch function from INFERNAL v.1.1.2 [47] (options -Z 1000 --hmmonly --cut_ga) using the Rfam [48] covariance models of the bacterial 5S, 16S and 23S rRNAs. Total alignment length was inferred by the sum of all non-overlapping hits. Each gene was considered present if more than 80% of the expected sequence length was contained in the MAG. Transfer RNAs (tRNAs) were identified with tRNAscan-SE v.2.0 [49] using the bacterial tRNA model (option -B) and default parameters. Classification into high- and medium-quality MAGs was based on the criteria defined by the minimum information about a metagenome-assembled genome (MIMAG) standards [23] (high: >90% completeness and <5% contamination, presence of 5S, 16S and 23S rRNA genes, and at least 18 tRNAs; medium: ≥50% completeness and <10% contamination). (...) (..) Assignment of MAGs to reference databases Four reference databases were used to classify the set of MAGs recovered from the human gut assemblies: HR, RefSeq, GenBank and a collection of MAGs from public datasets.
HR comprised a total of 2,468 high-quality genomes (>90% completeness, <5% contamination) retrieved from both the HMP catalogue (https://www.hmpdacc.org/catalog/) and the HGG [8]. From the RefSeq database, we used all the complete bacterial genomes available (n = 8,778) as of January 2018. In the case of GenBank, a total of 153,359 bacterial and 4,053 eukaryotic genomes (3,456 fungal and 597 protozoan genomes) deposited as of August 2018 were considered. Lastly, we surveyed 18,227 MAGs from the largest datasets publicly available as of August 2018 [13,16,17,18,19], including those deposited in the Integrated Microbial Genomes and Microbiomes (IMG/M) database [52]. For each database, the function ‘mash sketch’ from Mash v.2.0 [53] was used to convert the reference genomes into a MinHash sketch with default k-mer and sketch sizes. Then, the Mash distance between each MAG and the set of references was calculated with ‘mash dist’ to find the best match (that is, the reference genome with the lowest Mash distance). Subsequently, each MAG and its closest relative were aligned with dnadiff v.1.3 from MUMmer 3.23 [54] to compare each pair of genomes with regard to the fraction of the MAG aligned (aligned query, AQ) and ANI. (..) https://doi.org/10.1038/s41586-019-0965-1 Data pipeline & analysis reporting & reproducibility
  • 17. Data pipeline & analysis reporting & reproducibility https://doi.org/10.1038/s41586-019-0965-1 Lots of scripts & datasets
  • 18. EBI’s MGnify metagenomics workflows as workflows using a WfMS Workflow description Input data files Command line tools, containerised tools, workflows Output data files
  • 19. Standards-based metadata framework for bundling resources (physically and logically) with context into citable reproducible packages. researchobject.org Linked Data!! Bechhofer et al (2013) Why linked data is not enough for scientists https://doi.org/10.1016/j.future.2011.08.004 Bechhofer et al (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge, https://eprints.soton.ac.uk/268555/
  • 20. A Research Object bundles and relates digital resources of a scientific experiment/investigation + context Data used and results produced in experimental study Methods employed to produce and analyse that data Provenance and settings for the experiments People, specimens, equipment etc involved in the investigation Annotations about these resources, to improve understanding and interpretation Package
  • 21. Manifest Construction Manifest Identification to locate things Aggregates to link things together Annotations about things & their relationships Package Research Objects => Metadata Objects Manifest Description Type Checklists what should be there Provenance where it came from Versioning its evolution Dependencies what else is needed Manifest Base Profile, Profiles for RO types
  • 22. We are in a Semantic Web Conference…. Linked Data Middleware • Manifests described using Linked Data – Identifiers to resources, including people (ORCID) – OWL / RDF / SPARQL / JSON-LD • Mishmash of specialized ontologies – Construct the manifest itself • W3C Web Annotation Vocabulary • OAI Object Exchange and Reuse – Describe manifest content • Wf4Ever RO ontology, Wf4Ever ROEvo … • Dublin Core, FOAF, SIOC, Creative Commons, PROV, PAV… • RDF shapes (SHACL, ShEx) – Capture requirements, expectations and validate profiles – Hard to express checklists
  • 24. Influence? Publishers… Howard Ratner, Chair STM Future Labs Committee, CEO EVP Nature Publishing Group Director of Development for CHORUS
  • 25. European Open Science Cloud Interoperability Framework EOSC Interoperability Framework (v1.0) May 2020, Draft for community consultation Chair: Oscar Corcho, UPM 2021 we start to combine
  • 26. Used? Yes ….…… NIH DataCommons transferring and archiving very large HTS datasets in a location-independent way keep the context of data content together when it's scattered. Scalability https://doi.org/10.1109/BigData.2016.7840618 A framework for standardizing and sharing computations and analyses generated from High-throughput genome sequencing. Standardized as IEEE 2791-2020 https://doi.org/10.1371/journal.pbio.3000099 Virtualized collaborative working environment for Earth Science researchers to share resources (data, workflows), ideas, knowledge, and results. NIH DataSTAGE RO Composer to exchange between Seven Bridges Platform genomics platform and the Mendeley Data repository https://doi.org/10.1016/j.future.2019.03.046
  • 28. Activation & Research Championing Adoption Phase 1 2010 - 2015 Phase 2 2015 – 2018 Phase 3 2017 -
  • 30. Machine-processable Standards Low tech Graceful degradation Commodity tooling Incremental Multi-platform Technology Independent Keep it simple E X A M P L E S Developer friendliness Desiderata & Norms Balance and prioritise • "just enough complexity" or “just enough standards” so… • sufficient extra benefits from what already exists (Linked Data, vocabularies, tooling, validation, transformation) • without compromising the developer entry-level experience so much that they would rather do their own thing.
  • 31. Research Object Tensions Research Infrastructures sit in the middle Academic Viewpoint Infrastructure Viewpoint Green field site Theoretical purity Use latest thing Proof of concept Sophistication Narrow developer audience Strive for super generic The end Exposing the tech Pre-existing platforms Practicality Use things that work Production Simplicity Wide developer audience Several specific is ok! The means Hiding the tech
  • 32. Ladder Model of OSS Adoption (adapted from Carbone P., Value Derived from Open Source is a Function of Maturity Levels) [FLOSS@Syracuse] "it's better, initially, to make a small number of users really love you than a large number kind of like you" Paul Buchheit paulbuchheit.blogspot.com
  • 33. Not really mortal developer friendly • “Easy to make, hard to use…” • Daunting Linked Data tech stack • Being too clever – Infer what is in the object and what kind of object it is – Massive reuse of ontologies • Make developers' (and researchers') lives easier, not more demanding….
  • 34. Developer friendliness matters Reinvent with fewer features Easy to understand and simple conceptually… … with strong opinionated guide to current best practices … using software stacks widely used on the Web
  • 35. Indeed. Linked Data is not enough. Research Infrastructures: “digital technologies (hardware, software), resources (data, services, digital libraries, standards), comms (protocols, access rights, networks), people and organisational structures”
  • 36. Community Use Driver Tools Developer friendly – so possible Incentives – so rewarding Adoption path – so acceptable Metadata capture Early benefit Exchange, reproducibility, executable objects Portability between platforms, Archiving Platform & user buy-in & consensus Passionate, dedicated leadership Active engaged community, seed Support Linked Data and a Spec is not enough and sometimes too much Guides Reference examples
  • 39. Swing Back to Basics Peter Sefton BagIt data profile + schema.org + JSON-LD annotations Semantic Web world vs Real World DataCrate from the Open Repository community "As a researcher…I’m a bit b****y fed up with Data Management", Cameron Neylon
  • 40. Be Humble Archivist and library people know the importance of metadata and standards… … and for things to work 5, 10, 20 years later. End-users need to have their own way to “bypass the system”… …. their field, repositories, institutions, journals etc. will always be lagging behind the curve Most who want to make their data FAIR … … do not have the resources or knowledge to start championing all of this to all levels & need tools and ramps. http://www.lisbdnet.com/ https://ischools.org/
  • 41. A RO-Crate Community! https://www.researchobject.org/ro-crate/#contribute https://github.com/researchobject/ro-crate/issues/1 A Merger • A diverse set of people • A variety of stakeholders • A set of collective norms • An open platform that facilitates communication (GitHub, Google Docs, monthly telcons)
  • 42. RO-Crate https://w3id.org/ro/crate RO-Crate is a community effort to establish a lightweight approach to packaging research data with their metadata. It is based on schema.org annotations in JSON-LD, and aims to make best-practice in formal metadata description accessible and practical for use in a wider variety of situations, from an individual researcher working with a folder of data, to large data-intensive computational research environments. RO-Crate is the marriage of Research Objects with DataCrate. It aims to build on their respective strengths, but also to draw on lessons learned from those projects and similar research data packaging efforts. For more details, see RO-Crate background. The RO-Crate specification details how to capture a set of files and resources as a dataset with associated metadata – including contextual entities like people, organizations, publishers, funding, licensing, provenance, workflows, geographical places, subjects and repositories. A growing list of RO-Crate tools and libraries simplify creation and consumption of RO-Crates, including the graphical interface Describo. The RO-Crate community help shape the specification or get help with using it!
  • 43.
  • 44. Reinvent as Lightweight Underware Linked data but Developer Friendly Easy to understand and simple conceptually… … with strong opinionated guide to current best practices … using software stacks widely used on the Web BagIt data profile, schema.org, JSON-LD, JSON Schema Flattened compacted JSON-LD, no need for RDF libraries Example driven rather than strict specification Implementers add additional metadata using schema.org types and properties Data entities are files/directories or web resources Boundness of elements is explicit Single graph, data structure depth 1 Swung a bit far…. and swung back…
  • 45. Tooling! How can I use it? While we’re mostly focusing on the specification, some tools already exist for working with RO-Crates: ● Describo interactive desktop application to create, update and export RO-Crates for different profiles. (~ beta) ● CalcyteJS is a command-line tool to help create RO-Crates and HTML-readable rendering (~ beta) ● ro-crate - JavaScript/NodeJS library for RO-Crate rendering as HTML. (~ beta) ● ro-crate-js - utility to render HTML from RO-Crate (~ alpha) ● ro-crate-ruby Ruby library to consume/produce RO-Crates (~ alpha) ● ro-crate-py Python library to consume/produce RO-Crates (~ planning) These applications use or expose RO-Crates: ● Workflow Hub imports and exports Workflow RO-Crates ● OCFL-indexer NodeJS application that walks the Oxford Common File Layout on the file system, validate RO-Crate Metadata Files and parse into objects registered in Elasticsearch. (~ alpha) ● ONI indexer ● ocfl-tools ● ocfl-viewer ● Research Object Composer is a REST API for gradually building and depositing Research Objects according to a pre-defined profile. (RO-Crate support alpha) ● … (yours?) https://uts-eresearch.github.io/describo/
  • 46. • RDF and schema.org but you don't need to know. • Extend RO-Crate …. – Add your own schema.org types and properties. – Add in your own ontologies …and it still works! https://arkisto-platform.github.io/case-studies/ Under-ware
  • 47. Driver! Profile for workflows https://about.workflowhub.eu/Workflow-RO-Crate/
  • 48. Infrastructure families Driver! Profile for workflows On-boarding developers Web and dev friendly Workflow-RO-Crate profiles RO in practice External references – logically & physically contained – versions, snapshots, multi-typed, active, multi- stewarded, multi-authored, governance…
  • 49. More than plain JSON, Just Enough Linked Data Retain benefits of Linked Data in the toolbox • (querying, graph stores, vocabularies, clickable URIs as identifiers) • customization and conventions Plus all the other stuff a developer expects • documentation, examples, libraries, tools • simplifications rather than generalizations (less flexibility frees up developers) • “Just enough standards” cf. schema.org Linked Data “exotics” there for when the time is right if needed by the right people.
  • 50. Keep your eye on the target….. How do we make ROs normative? – Propaganda and incentive models to scientists, target the Research Infrastructures to deliver. – Digital library community allies! Developer friendliness matters – Underware, incremental, ramps, embed, metadata automation, persuasive design Linked Data has a role – As a means but it is not an end. – Simpler version of Linked Data makes an adoption path (cf. Knowledge Graphs, schema.org, JSON-LD) FAIR principles for Research Objects…. – Unifying the vision with the practical
  • 51. Acknowledgements Barend Mons Sean Bechhofer Matthew Gamble Raul Palma Jun Zhao Mark Robinson Alan Williams Norman Morrison Stian Soiland-Reyes Tim Clark Alejandra Gonzalez-Beltran Philippe Rocca-Serra Ian Cottam Susanna Sansone Kristian Garza Daniel Garijo Catarina Martins Iain Buchan Michael Crusoe Rob Finn Stuart Owen Finn Bacall Bert Droebeke Laura Rodríguez Navas Ignacio Eguinoa Carl Kesselman Ian Foster Kyle Chard Vahan Simonyan Ravi Madduri Raja Mazumder Gil Alterovitz Denis Dean II Durga Addepalli Wouter Haak Anita DeWaard Paul Groth Oscar Corcho Peter Sefton Eoghan Ó Carragáin Frederik Coppens Jasper Koehorst Simone Leo Nick Juty LJ Garcia Castro Karl Sebby Alexander Kanitz Ana Trisovic http://researchobject.org Gavin Kennedy Mark Graves José María Fernández Jose Manuel Gomez-Perez Jason A. Clark Salvador Capella-Gutierrez Alasdair J. G. Gray Kristi Holmes Giacomo Tartari Hervé Ménager Paul Walk Brandon Whitehead Erich Bremer Mark Wilkinson Jen Harrow And many more....
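Slides 42 and 44 describe RO-Crate as flattened, compacted JSON-LD built on schema.org, with a single graph of depth 1 and no need for RDF libraries. A minimal sketch of what that can look like, using only the Python standard library, is given below. The context URL, the `ro-crate-metadata.json` file name and the descriptor entity follow the RO-Crate 1.1 conventions as understood here and should be checked against the specification; the function names, file names, title and ORCID are hypothetical placeholders, not taken from the talk.

```python
import json

# Assumed values, following RO-Crate 1.1 conventions (verify against the spec).
CONTEXT = "https://w3id.org/ro/crate/1.1/context"
SPEC = "https://w3id.org/ro/crate/1.1"
METADATA_FILE = "ro-crate-metadata.json"


def build_crate(name, files, author_orcid, author_name):
    """Return a flattened JSON-LD document describing a folder of files."""
    # Metadata descriptor: says which entity is the root dataset.
    descriptor = {
        "@id": METADATA_FILE,
        "@type": "CreativeWork",
        "conformsTo": {"@id": SPEC},
        "about": {"@id": "./"},
    }
    # Root dataset: refers to other entities by @id only, never by nesting.
    root = {
        "@id": "./",
        "@type": "Dataset",
        "name": name,
        "author": {"@id": author_orcid},
        "hasPart": [{"@id": path} for path in files],
    }
    # Contextual entity (a person) and data entities (the files).
    person = {"@id": author_orcid, "@type": "Person", "name": author_name}
    data_entities = [{"@id": path, "@type": "File"} for path in files]
    return {"@context": CONTEXT, "@graph": [descriptor, root, person, *data_entities]}


def find_entity(crate, entity_id):
    """Look up an entity by @id with a plain loop; no RDF library involved."""
    for entity in crate["@graph"]:
        if entity["@id"] == entity_id:
            return entity
    return None


crate = build_crate(
    "Example workflow run",                   # hypothetical title
    ["workflow.cwl", "results.csv"],          # hypothetical file names
    "https://orcid.org/0000-0000-0000-0000",  # placeholder ORCID
    "A. Researcher",                          # placeholder name
)
print(json.dumps(crate, indent=2))
```

Because every entity sits at the top level of `@graph` and cross-references are plain `{"@id": ...}` objects, a consumer can treat the file as ordinary JSON, while the `@context` keeps it valid Linked Data for anyone who later wants to load it into a graph store.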

Editor's notes

  1. More information on our website: https://zbmed.github.io/damalos The swings and roundabouts of a decade of fun and games with Research Objects Over the past decade we have been working on the concept of Research Objects (http://researchobject.org). We worked at the 20,000 feet level as a new form of FAIR+R scholarly communication and at the 5 feet level as a packaging approach for assembling all the elements of a research investigation, like the workflows, data and tools associated with a publication. Even in our earliest papers [1] we recognised that there was more to success than just using Linked Data. We swing from fancy RDF models and SHACL that specialists could use to simple RO-Crates and JSON-LD that mortal programmers could use. Success depends on a use case need, a community and tools which we found particularly in the world of computational workflows right from the outset. All along there have been politics and that has recently got even hotter. All fun and games! [1] Sean Bechhofer, et al (2013) Why Linked Data is Not Enough for Scientists FGCS 29(2)599-611 https://doi.org/10.1016/j.future.2011.08.004
  2. I stopped living in research land a long time ago… RIs not academic …
  3. https://datascience.nih.gov/CancerResearchDataCommons We make the rocket launchers for the rocket science We are the microscope, telescope, datascope instrument makers
  4. Use Linked data Documentation and reporting
  5. https://www.dona.net/sites/default/files/2018-11/DOIPv2Spec_1.pdf https://www.eoscsecretariat.eu/sites/default/files/eosc-interoperability-framework-v1.0.pdf
  6. The Data-Driven Open Science, e-Laboratories Inside and outside Elsewhere, on-date Within, during Reproducible Research Environment Integrated infrastructure for producing and working with reproducible research. Reproducible Research Publication Environment Distributing and reviewing; academic credit; legal licensing etc. Such a merge between research infrastructure and research marketplace would overcome known "cultural barriers" and the "methodological barriers" mentioned above. In the RI scope, research products publishing would: be facilitated by the very ICT services that are generating them; take advantage of research activity awareness and product interlinking; support access rights issues that fit the need of the community; and subsume the major costs of storing and curating products. In other words, by bringing marketplace features within the RI, researchers can finally achieve their best effort in terms of scholarly communication. Science 2.0 Repositories: Time for a Change in Scholarly Communication Massimiliano Assante, Leonardo Candela, Donatella Castelli, Paolo Manghi and Pasquale Pagano Istituto di Scienza e Tecnologie dell'Informazione, Consiglio Nazionale delle Ricerche, Italy {assante, candela, castelli, manghi, pagano}@isti.cnr.it DOI: 10.1045/january2015-assante
  7. Beyond packaging and not just at the end! A knowledge graph of the butterfly
  8. FDO Framework PIDs Byte-stream described by metadata https://www.dona.net/digitalobjectarchitecture https://fairdigitalobjectframework.org/ FDO types and type definition registries https://www.dona.net/sites/default/files/2018-11/DOIPv2Spec_1.pdf https://www.eoscsecretariat.eu/sites/default/files/eosc-interoperability-framework-v1.0.pdf https://www.rd-alliance.org/group/gede-group-european-data-experts-rda/wiki/gede-digital-object-topic-group
  9. A Knowledge Graph of the investigation
  10. Not just encapsulated References
  11. RO in practice Interpretation, Comparison, Portability, Reuse, Credit, Citation Interpretation, Comparison, Preservation, Repair, Portability, Reuse, Execution Active Research, Evolving codes, New data, Portability, Reuse, Execution
  12. A knowledge graph about a workflow.
  13. Most workflows in papers are ad-hoc, artisan So workflows are freedom …. Move fast, break grow it Bring enterprise scale It's so brittle, data science
  14. So we came up with ROs
  15. ROs combine containers and incremental metadata Like DataONE Packages It's all about the metadata
  16. The first implementation of Research Objects in 2009 for sharing workflows in myExperiment https://doi.org/10.1093/nar/gkq429 was based on RDF ontologies http://eprints.soton.ac.uk/id/eprint/267787, building on Dublin Core, FOAF, SIOC, Creative Commons, OAI-ORE and (later) DBpedia to form myExperiment ontologies for describing social networking, attribution and creditation, annotations, aggregation packs, experiments, view statistics, contributions, and workflow components. http://web.archive.org/web/20091115080336/http://rdf.myexperiment.org/ontologies Programmatic access to Research Objects was facilitated with an RDF endpoint that exposed individual myExperiment resources, also queryable from a SPARQL endpoint, both using the myExperiment vocabularies and the RDF formats RDF/XML and Turtle. Re-use of existing vocabularies. Making lots of new vocabularies - too many?? Stack becomes very large to learn and use. RDF as its own worst enemy. Need for simplicity. Cater for "web developer".
  17. JSON-LD
  18. Howard Ratner, Chair STM Future Labs Committee, CEO EVP Nature Publishing Group Director of Development for CHORUS (Clearinghouse for the Open Research of US) STM Innovations Seminar 2012 projects
  19. But really friends, and 1 platform and 1 project
  20. Championing through bubble projects
  21. Desiderata, norms. But did we really do this? It's about prioritizing: say we use a production-ready database but put a bleeding-edge research object application on top, like how Peter Sefton's team are putting things together quickly. Of course it's a much simplified kind of "just enough complexity" (or just enough standards!): enough to get sufficient extra benefits from what already exists (Linked Data, vocabularies, tooling, validation, transformation etc.) without compromising the developer entry-level experience so much that they would rather do their own thing (another triangle wheel).
  22. We started in theory, but we work in practice ROs sit in the middle
  23. Inspection states Conform to a profile – STILL do not have profile validation…
  24. Heath Robinson Base Type nested ROs? Specialised Workflow-RO-Crate profiles for different WfMS: RO-Crate -> WRO-Crate -> Galaxy-WRO-Crate. JSON Schema. Data Crate examples…. You can also point to how RDF graphs have been reinvented as knowledge graphs - just the same thing with fewer features - and then suddenly it took off. Linked Data and Semantic Web are the same, as it became understandable! And schema.org vs. all the mishmash of specialized ontologies.
  25. Sometimes it's too much
  26. For Practice, A Village Sustainability – get some traction….
  27. Sponsored by BioExcel
  28. University of Technology Sydney "As a researcher…I’m a bit bloody fed up with Data Management" https://cameronneylon.net/blog/as-a-researcher-im-a-bit-bloody-fed-up-with-data-management/
  29. Hence desktop solutions like Describo are also important. Not sure if you mentioned Describo! Library & Information Science Network
  30. We did oversimplify to just the DataCrate! Data structure depth 1. Adopt software stacks widely used on the Web: BagIt data profile, schema.org, JSON-LD, JSON Schema. Flattened JSON-LD, no need for RDF libraries. A self-described container using LD as a core principle. Data entities are files/directories or web resources. Easy to understand and simple conceptually. Data structure depth limited to 1. Boundness of elements is explicit. Strong opinionated guide to current best practices. Example driven rather than strict specification. Implementers can add additional metadata using other schema.org types and properties. KGs are simplified and then could be adopted. We swung too far and threw away the remote referencing – hence RO-Crate 1.1. Next: RO-Crate and containers… Conceptual Definition: A key premise of RO-Crate is the existence of a wide variety of resources on the Web that can help describe research. As such RO-Crate relies on concepts from the Web and in particular linked data. Linked Data as a core principle: Linked Data [cite] is a core principle of RO-Crate, and IRIs are therefore used to identify the RO-Crate, its constituent parts and metadata descriptions, as well as the properties and classes used in the metadata. IRIs are a generalization of URIs (which include well-known http/https URLs), permitting international Unicode characters without %-encoding [RFC3987]. Using Linked Data, where consumers can follow the links for more (ideally both human- and machine-readable) information, the RO-Crate can then be sufficiently self-described and related using global identifiers, without needing to recursively fully describe every referenced entity. The base of Linked Data and shared vocabularies also means multiple RO-Crates and other Linked Data resources can be indexed, combined, queried, integrated or transformed using existing Semantic Web technologies like SPARQL and knowledge graph triple stores.
RO-Crate is a self-described container: An RO-Crate is defined as a self-described Root Data Entity, which describes and contains data entities, which are further described using contextual entities. A data entity is primarily a file (bytes on disk somewhere) or a directory (set of named files), while a contextual entity exists outside the information system (e.g. a Person) and its bytes are primarily metadata. The Root Data Entity is a directory, the RO-Crate Root, identified by the presence of the RO-Crate Metadata File ro-crate-metadata.json. This is a JSON-LD file that describes the RO-Crate, its content and related metadata using Linked Data. The minimal requirements for the root data entity metadata are name, description and datePublished, as well as a contextual entity identifying its license; but additional metadata is frequently added depending on the purpose of the particular RO-Crate. Because the RO-Crate is just a set of related files and directories, it can be stored, transferred or published in multiple ways, such as BagIt [RFC8493], Oxford Common File Layout [OCFL], downloadable ZIP archives in Zenodo or dedicated online repositories, as well as published directly on the web, e.g. using GitHub Pages. Combined with Linked Data identifiers this caters for a diverse set of storage and access requirements across different scientific domains, e.g. metagenomics workflows producing ~100 GB of genome data, or cultural heritage records with access restrictions for personally identifiable data. Data Entities are described using Contextual Entities: RO-Crate distinguishes between data and contextual entities in a similar way to the HTTP terminology of information (data) and non-information (contextual) resources [HttpRange-14]. While the data entities in the most common scenario are files and directories located by relative IRI references within the RO-Crate Root, they can also be web resources identified with absolute http/https IRIs.
As both types of entities are identified by IRIs, their distinction is allowed to be blurry; data entities can be located anywhere and be complex, and the contextual entities can have a Web presence beyond their description inside the RO-Crate. For instance, <https://orcid.org/0000-0002-1825-0097> is primarily an identifier for a person, but is secondarily also a web page that describes that person and their academic work. Any particular IRI might appear as a contextual entity in one RO-Crate, and a data entity in another; their distinction is that data entities can be considered to be contained or captured by the RO-Crate, while the contextual entities are mainly explaining the RO-Crate and its entities. It follows that an RO-Crate should have one or more data entities, although this is not a formal requirement. Best Practice Guide rather than strict specification: RO-Crate builds on Linked Data standards and community-evolved vocabularies like JSON-LD and schema.org, but rather than enforce additional constraints and limits using perhaps intimidating technologies like RDF shapes (SHACL, ShEx), RO-Crate aims instead to build a set of best practice guides on how to apply the existing standards in a common way to describe research outputs and their provenance. As such the RO-Crate specification [RO-Crate 1.1] can be seen as an opinionated and example-driven guide to writing schema.org metadata as JSON-LD, and leaves it open for implementers to add additional metadata using other schema.org types and properties, or even using other Linked Data vocabularies/ontologies or their own ad-hoc terms. This does not mean that RO-Crate has no constraints; crucially, the specification is quite strict on the style of the flattened JSON-LD to facilitate lightweight developer consumption and generation of the RO-Crate Metadata file without the need for RDF libraries. Checking Simplicity: As previously discussed, one aim of RO-Crate is to be conceptually simple.
This simplicity has been repeatedly checked and confirmed through a community process. See for example the discussion in GitHub issue #71 on ad-hoc vocabularies. To further verify this idea, we have formalized the RO-Crate definition (see Appendix X). An important result of this exercise is that the underlying data structure of RO-Crate is a depth-limited tree of 1 level. The formalization also emphasizes the boundness of the structure, namely that elements are specifically identified as part of the RO-Crate or are typed as being references external to the structure (i.e. ContextualEntities).
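The notes for slide 30 can be made concrete with a small sketch. Assuming the RO-Crate 1.1 JSON-LD context and a hypothetical data file `data.csv`, the Python below builds a minimal metadata document as a plain dictionary and checks two properties named in the notes: the root data entity carries name, description, datePublished and a license, and the JSON-LD is flattened so that nothing nests deeper than an `{"@id": ...}` reference. The helper names (`build_metadata`, `is_flat`, `missing_root_fields`) and the example values are illustrative and not part of any RO-Crate library; the flatness check is a simplification that ignores JSON-LD value objects.

```python
import json

# Root data entity fields named as minimal requirements in the notes above.
REQUIRED_ROOT_FIELDS = ("name", "description", "datePublished", "license")


def build_metadata():
    """Return a minimal RO-Crate 1.1 style metadata document as a dict."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # Metadata file descriptor: points at the root data entity.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # Root data entity: the crate itself (hypothetical values).
                "@id": "./",
                "@type": "Dataset",
                "name": "Example crate",
                "description": "A hypothetical crate with one data file.",
                "datePublished": "2020-11-02",
                "license": {"@id": "https://creativecommons.org/licenses/by/4.0/"},
                "hasPart": [{"@id": "data.csv"}],
            },
            {   # Data entity: a file located relative to the crate root.
                "@id": "data.csv",
                "@type": "File",
                "name": "Measurements",
                "author": {"@id": "https://orcid.org/0000-0002-1825-0097"},
            },
            {   # Contextual entity: a person, identified by an ORCID IRI.
                "@id": "https://orcid.org/0000-0002-1825-0097",
                "@type": "Person",
                "name": "Josiah Carberry",
            },
            {   # Contextual entity: the license referenced by the root.
                "@id": "https://creativecommons.org/licenses/by/4.0/",
                "@type": "CreativeWork",
                "name": "CC BY 4.0",
            },
        ],
    }


def is_flat(metadata):
    """True if every value is a literal, an {"@id": ...} reference, or a list of those."""
    for entity in metadata["@graph"]:
        for value in entity.values():
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, list):
                    return False  # nested list
                if isinstance(item, dict) and set(item) != {"@id"}:
                    return False  # embedded description instead of a reference
    return True


def missing_root_fields(metadata):
    """List the required fields absent from the root data entity."""
    graph = metadata["@graph"]
    descriptor = next(e for e in graph if e["@id"] == "ro-crate-metadata.json")
    root = next(e for e in graph if e["@id"] == descriptor["about"]["@id"])
    return [field for field in REQUIRED_ROOT_FIELDS if field not in root]


metadata = build_metadata()
# The text that would be written to ro-crate-metadata.json in the crate root.
crate_json = json.dumps(metadata, indent=2)
```

Only the standard `json` module is needed here, which is the point of the strict flattened style: the document is still valid JSON-LD, but a web developer never has to load an RDF library to produce or inspect it.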
  31. Active objects, versioning, nesting, references
  32. STILL A LONG WAY TO GO! Still need better guides. Two draft profiles: ComputationalWorkflow v 0.5 (10 mandatory, 15 recommended, 7 optional fields); FormalParameter v 0.1 (3 mandatory, 1 recommended, 3 optional fields). Two draft types: ComputationalWorkflow v 0.1; FormalParameter v 0.1. Mappings to DataCite & OpenAIRE.
  33. What I would like to see in the end is how we are NOT throwing out the Linked Data foundation - and that we want to retain all the benefits, e.g. querying, graph stores, vocabularies, clickable URIs as identifiers - but they are mere things we want to keep in our toolbox rather than a goal in themselves - and as such the toolbox also has the other things developers expect: documentation, examples, libraries, simplifications rather than generalizations [removing unneeded degrees of freedom like multiple RDF formats]; showing them how to do the Right Thing (tm) almost by accident. Then they can - if they want to and when the time is right - pick up some of those shiny tools in the bottom of the toolbox. Just like in cars - they benefit a lot from reusing standards - like communication buses, agreeing on mounts and screws, almost interchangeable parts so that manufacturers can mix and match when they put it together. But you as a consumer don't care until you realize your exhaust pipe is broken and the mechanic now can choose how to fix it rather than be restrained to one single manufacturer. Some kind of "Why not just plain JSON?" point perhaps. We are not after the exotic features that you will hear about in the rest of the conference, but the bog standard tooling that arguably you could also do in a NoSQL database with enough customization and conventions. But rather than invent our own C&C we retain the Linked Data conventions - "Just enough standards" - following the lead of schema.org
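The "Why not just plain JSON?" point can be sketched from the consumer side. Assuming a flattened metadata document in the RO-Crate style (the sample below is hypothetical and abbreviated, not a complete crate), a developer can treat it as ordinary JSON: index the `@graph` by `@id` and follow references with dictionary lookups. The function names `index_by_id` and `follow` are illustrative only.

```python
import json

# Hypothetical, abbreviated metadata: a root dataset, one local file,
# one remote web resource that is referenced but not described, one person.
SAMPLE = """
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "./", "@type": "Dataset", "name": "Example crate",
     "author": [{"@id": "https://orcid.org/0000-0002-1825-0097"}],
     "hasPart": [{"@id": "data.csv"},
                 {"@id": "https://example.org/remote.csv"}]},
    {"@id": "data.csv", "@type": "File", "name": "Measurements"},
    {"@id": "https://orcid.org/0000-0002-1825-0097", "@type": "Person",
     "name": "Josiah Carberry"}
  ]
}
"""


def as_list(value):
    """Normalise a JSON-LD property value to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def index_by_id(document):
    """Map each entity's @id to the entity, using plain dictionaries."""
    return {entity["@id"]: entity for entity in document["@graph"]}


def follow(entities, entity, prop):
    """Resolve references under entity[prop] one step.

    Entities described in the graph are returned in full; IRIs that are
    not described locally come back as bare {"@id": ...} references, and
    literal values are passed through unchanged.
    """
    resolved = []
    for item in as_list(entity.get(prop)):
        if isinstance(item, dict) and "@id" in item:
            resolved.append(entities.get(item["@id"], item))
        else:
            resolved.append(item)
    return resolved


document = json.loads(SAMPLE)
entities = index_by_id(document)
root = entities["./"]
author_names = [author["name"] for author in follow(entities, root, "author")]
parts = follow(entities, root, "hasPart")
```

Because every identifier is still an IRI, the same file can later be loaded into a graph store or queried with SPARQL; the plain-JSON path is a convenience, not a replacement for the Linked Data foundation.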
  34. Keep your eye on the target. How do we persuade Scientists? We do not. That’s why we need to persuade the infrastructure people. What other things do we need to do? Small Picture – you don’t. Developer Friendliness Matters. It's underware – customers are infrastructures. Marrying the FDO vision with the RO. Embed in Research Infrastructures.