Overview of standards/stakeholders in life science (RDA Engagement Interest Group)
1. Mapping the landscape of stakeholders and standards in the life sciences
Susanna-Assunta Sansone, PhD
Data Consultant, Honorary Academic Editor
Associate Director, Principal Investigator
RDA Engagement IG, Sept. 2013
2. Standards for describing and reporting datasets
§ Researchers and bioinformaticians in both academic and commercial arenas, along with funding agencies and publishers, embrace the concept that community-developed, open, common reporting standards are pivotal to structuring and enriching the annotation of:
• entities of interest (e.g., genes, metabolites, phenotypes) and
• experimental steps (e.g., provenance of study materials, technology and measurement types)
3. A ‘general mobilization’ to develop standards, e.g.:
• report the same core, essential information
• use the same word and refer to the same ‘thing’
• allow data to flow from one system to another
4. A ‘general mobilization’ to develop standards… BUT
§ Fragmentation of the standards is a major issue!
• A focus on particular communities’ interests, be they individual technologies or biological/biomedical disciplines, leads to duplication of effort and, more seriously, to the development of (largely arbitrarily) different standards
• This severely hinders the interoperability of databases and tools and, ultimately, the integration of datasets
5. Growing number of reporting standards
[Figure: counts of reporting standards, with example names]
Counts shown: + 130, + 150, + 303
Source labels shown: Estimated; MIBBI, EQUATOR; BioPortal
• Checklists: MIAME, MIAPA, MIRIAM, MIQAS, MIX, MIGEN, CIMR, MIAPE, MIASE, MIQE, MISFISHIE…, REMARK, CONSORT
• Formats: MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, DICOM, MzML, SBRML, SEDML…, GELML, ISA-Tab, CML, MITAB
• Terminologies: AAO, CHEBI, OBI, PATO, ENVO, MOD, BTO, IDO…, TEDDY, PRO, XAO, DO, VO
Databases, annotation, curation tools
To track provenance of the information and ensure richness of data and experimental metadata descriptions, to maximize reusability
8. • A coherent, curated and searchable registry of standards for describing
and reporting experiments in life science, environmental, biomedical and
biotechnological domains
• Progressively associate standards with data policies and databases
• Develop assessment criteria for usability and popularity of standards
• Help stakeholders make informed decisions on, e.g., which standards or databases to use or recommend
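A minimal sketch of how such a registry could link standards to the databases that implement them. It is only an illustration under stated assumptions, not the registry's actual data model: the class names, the `search` and `adopters` methods, the domain tags and the `ExampleDB-*` databases are all hypothetical; only the standard names (MIAME, ISA-Tab, OBI) come from the slides.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Standard:
    name: str
    kind: str  # "checklist", "format" or "terminology"
    domains: list[str] = field(default_factory=list)  # illustrative tags


@dataclass
class Database:
    name: str
    implements: list[str] = field(default_factory=list)  # standard names


class Registry:
    """Hypothetical in-memory registry; not an actual registry API."""

    def __init__(self) -> None:
        self.standards: dict[str, Standard] = {}
        self.databases: dict[str, Database] = {}

    def add_standard(self, standard: Standard) -> None:
        self.standards[standard.name] = standard

    def add_database(self, database: Database) -> None:
        self.databases[database.name] = database

    def search(self, term: str) -> list[str]:
        """Case-insensitive match on standard name or domain tag."""
        term = term.lower()
        return sorted(
            s.name
            for s in self.standards.values()
            if term in s.name.lower()
            or any(term in d.lower() for d in s.domains)
        )

    def adopters(self, standard_name: str) -> list[str]:
        """Databases that declare the given standard as implemented."""
        return sorted(
            db.name
            for db in self.databases.values()
            if standard_name in db.implements
        )


registry = Registry()
registry.add_standard(Standard("MIAME", "checklist", ["microarray"]))
registry.add_standard(Standard("ISA-Tab", "format", ["general purpose"]))
registry.add_standard(Standard("OBI", "terminology", ["biomedical investigations"]))
# Hypothetical databases, used only to exercise the lookup.
registry.add_database(Database("ExampleDB-A", ["MIAME", "ISA-Tab"]))
registry.add_database(Database("ExampleDB-B", ["MIAME"]))

print(registry.search("microarray"))
print(registry.adopters("MIAME"))
```

Counting adopters per standard is one simple proxy for "popularity"; the assessment criteria themselves remain to be developed, as the slide notes.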
9. The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone
www.ebi.ac.uk/net-project
11. Users can claim entries and maintain them
13. Criteria to be used in evaluating standards for adoption:
§ Existence of a formal specification, with:
• good level of documentation, with scope and use cases
• ease of implementation
• human and machine readability
§ Broad adoption and implementation, outside the initial group by:
• community databases (hence existence of standards-annotated datasets)
• software (e.g. for reporting, editing, curating, submitting to databases)
§ Active user community, also providing:
• support
• responsiveness to community requests
• examples
§ Interoperability with and extensibility to other standards, ranging from:
• compatibility with other standards
• flexibility to cover new domains
• conversion and mapping, if applicable
§ Openness
Jessica D. Tenenbaum, Duke Translational Medicine Institute
Melissa Haendel, OHSU Library
Susanna-Assunta Sansone, University of Oxford
also as part of the NIH Clinical and Translational Science Award (CTSA) program
14. Core attributes to describe databases and assist in evaluating scope and relevance as well as access to data:
§ Database name
§ Main resource URL
§ Contact information
§ Date resource established (year)
§ Conditions of use (free, or type of license)
§ Scope: data types captured, curation policy
§ Standards implemented: checklists, terminologies, formats
§ Taxonomic coverage
§ Data accessibility/output options
§ Data release frequency
§ Versioning period and access to historical files
§ Documentation available
§ User support options
§ Data submission policy
§ Relevant publications
§ Tools available
Gaudet et al. NAR Database, 2011
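The attribute list above maps naturally onto a simple record. The sketch below is an assumption-laden illustration, not a schema from Gaudet et al.: the field names, the `missing_attributes` helper and the `ExampleDB` values are hypothetical, chosen only to mirror the sixteen attributes on the slide and to show how an incomplete description could be flagged.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DatabaseDescription:
    """One field per core attribute on the slide (field names are hypothetical)."""

    name: Optional[str] = None
    url: Optional[str] = None
    contact: Optional[str] = None
    year_established: Optional[int] = None
    conditions_of_use: Optional[str] = None
    scope: Optional[str] = None
    standards_implemented: Optional[str] = None
    taxonomic_coverage: Optional[str] = None
    data_accessibility: Optional[str] = None
    release_frequency: Optional[str] = None
    versioning: Optional[str] = None
    documentation: Optional[str] = None
    user_support: Optional[str] = None
    submission_policy: Optional[str] = None
    publications: Optional[str] = None
    tools: Optional[str] = None


def missing_attributes(description: DatabaseDescription) -> list[str]:
    """Return the names of attributes left empty, in declaration order."""
    return [
        f.name
        for f in fields(description)
        if getattr(description, f.name) in (None, "")
    ]


# Hypothetical, partially filled description.
example = DatabaseDescription(
    name="ExampleDB",
    url="https://example.org",
    year_established=2010,
)
missing = missing_attributes(example)
print(f"{len(missing)} attributes still to describe: {missing}")
```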
15. Besides grass-roots initiatives and formal standardization initiatives, which other stakeholders are relevant and active in the data area?
17. Pre-competitive initiative
§ Pharma R&D has invested heavily in procedures and tools that integrate external information with their own data to enhance the decision-making process
§ Now pre-competitive initiatives and private-public partnerships are blooming as solutions for reducing the costs associated with data management and curation, and for maximizing data interoperability
18. The information landscape in the industrial sector …evolving…
Big Life Science Company: Yesterday, Today, Tomorrow
• Innovation Model: Innovation inside (Yesterday); Searching for Innovation (Today); Heterogeneity of collaborations, part of the wider ecosystem (Tomorrow)
• IT: Internal apps & data (Yesterday); Struggling with change, security and trust (Today); Cloud, services (Tomorrow)
• Data: Mostly inside (Yesterday); In and out (Today); Distributed (Tomorrow)
• Portfolio: Internally driven and owned (Yesterday); Partially shared (Today); Shared portfolio (Tomorrow)
Credit to: Pistoia Alliance
[Diagram: Big Life Science Company shown alongside: Proprietary content provider, Public content provider, Academic group, Software vendor, CRO, Service provider, Regulatory authorities]
19. Our industry needs a Disruptive Innovation.
That Disruption...is Pistoia
Credit to: Pistoia Alliance
If you want to go fast, go alone
If you want to go far, go together