Policies and standards for reproducible research: from theory to practice
§  How do we make a standards-compliant data-sharing culture
    functional and efficient?
   •  Several data management and sharing policies and plans have
      emerged; the number of data journals is growing and guidelines to
      authors for reporting data are being enriched; there are thousands of
      biological databases and a wealth of community standards

   •  Although funders, journal editors, data producers, consumers
      and service providers agree in principle that shared, annotated
      research data and methods offer new discovery opportunities,
      compliance is challenging in practice

§  Starting from the genomics domain and extending to other areas
    of the life sciences, we are looking to highlight the success stories and
    existing problems
About this session - speakers
§  Representatives of stakeholders involved in the complete data cycle
    •  from funding and regulation, to production, release and re-use
§  Setting the scene:
    •  Susanna-Assunta Sansone, University of Oxford, UK
    •  Scott Edmunds, GigaScience BGI Shenzhen, China
§  Funders
    •  Rita Colwell, University of Maryland, USA
    •  Paula J. Olsiewski, Sloan Foundation
§  Service providers and/or data producers
    •  Philippe Rocca-Serra, University of Oxford, UK
    •  Folker Meyer, Argonne National Laboratory, USA
    •  Srikrishna Subramanian, IMTECH, India
§  Editors
    •  Clare Garvey, Genome Biology/BioMed Central
    •  Craig Mak, Nature Biotechnology
About this session - topics
§  Data management, preservation and sharing policies – viewpoints
   •  formulation and enforcement, or
   •  uptake and compliance
§  Reporting standards – experiences and challenges
   •  evolution of standards, costs of compliance, rewards for complying,
      etc.
   •  usability of standards when working across disciplines, each with
      differing community norms
   •  challenges in integrating data types and how standards can help
§  Tackling the challenges – approaches and lessons learned
   •  balancing needs and expectations (data producers, consumers,
      reviewers, service providers, etc.)
   •  potential role of each stakeholder
   •  new ways forward
the evolving portfolio of data sharing enablers

           Susanna-Assunta Sansone, PhD


                    University of Oxford,
          Oxford e-Research Centre, Oxford, UK

            http://uk.linkedin.com/in/sasansone


            GSC13th, Shenzhen, China, March 5-7, 2012
From reusable data to reproducible research




To make the datasets comprehensible, interoperable and reusable,
underpinning future investigations, we need common ways to report and
share the experimental details and the associated results.

Consistent reporting will have a positive and long-lasting impact on the value
of collective scientific outputs.
    The International Conference on Systems Biology (ICSB), 22-28 August, 2008   Susanna-Assunta Sansone
                                                                                   www.ebi.ac.uk/net-project
A ‘general mobilization’ to develop standards, e.g.:
   •  allow data to flow from one system to another
   •  use the same word and refer to the same ‘thing’
   •  report the same core, essential information
A ‘general mobilization’ to develop standards… BUT

§  Fragmentation of the standards is a major issue!
    •    Being focused on particular communities’ interests, be they individual technologies
         or biological/biomedical disciplines, leads to duplication of effort and, more
         seriously, the development of (largely arbitrarily) different standards
    •    This severely hinders the interoperability of databases and tools and ultimately the
         integration of datasets
Growing number of reporting standards
[Word cloud of standards, listed here by slide column:]
   •  MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, CML, DICOM, GELML, SBRML, MITAB, MzML, ISA-Tab, SEDML…
   •  AAO, CHEBI, OBI, VO, PATO, ENVO, MOD, TEDDY, XAO, BTO, DO, PRO, IDO…
   •  MIAME, MIAPA, MIRIAM, MIQAS, MIX, REMARK, MIGEN, MIAPE, MIQE, CIMR, CONSORT, MIASE, MISFISHIE…
   •  + 130 (estimated)
   •  + 303 (source: BioPortal)
   •  + 150 (source: MIBBI, EQUATOR)
   •  Databases, annotation, curation tools
But how much do we know about these standards?
   •  Which tools and databases implement which standards?
   •  I use high-throughput sequencing technologies; which ones are applicable to me?
   •  What are the criteria to evaluate their status and value?
   •  How can I get involved to propose extensions or modifications?
   •  Which ones are mature enough for me to use or recommend?
   •  I work on plants; are these just for biomedical applications?
Often not much…

Several policy documents and guidelines are inconsistent and/or
unclear when recommending use of standards, e.g.:
“..recommend use of appropriate standards...where these exists…....mature,
stable efforts....MIAME format…..standards from accredited standards
organizations…..deposition to public repositories, supporting these
standards…...”
A coherent, curated and searchable catalogue of data sharing resources that
(collaboratively) works to:

1. Centralize community-developed bioscience standards and make them
discoverable; linking to:
     •    data sharing, preservation and management policies
     •    other portals e.g. MIBBI, NCBO’s BioPortal, NIF, BioSiteMaps, OBO Foundry
     •    related open access, published material e.g. BioMed Central, Nature Precedings, F1000
     •    tools and databases implementing the standards e.g. collaboration with NAR Database

2. Identify and maintain a set of (implicit) criteria for assessing usability and
popularity of the standards, including:
     •  implementations by tools and databases
     •  availability of standards-compliant, public datasets
     •  relations among standards

3. Foster communication among groups, in particular to:
     •  address overlaps and duplication of efforts and enhance interoperability of standards
     •  produce ‘best practice’ guidelines for starting new efforts or contributing to existing ones

 Ø  Will allow stakeholders (funders, journals, service providers and
     researchers) to make informed decisions on standards
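
As an illustration only, the assessment criteria listed above (implementations by tools and databases, availability of compliant public datasets, relations among standards) could be recorded per catalogue entry roughly as follows. This is a minimal sketch, not the catalogue's actual data model: the class, the field names, the two helper functions and the placeholder records are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StandardEntry:
    """One catalogue record. All field names are hypothetical."""
    name: str
    implemented_by: List[str] = field(default_factory=list)  # tools and databases
    compliant_datasets: int = 0  # public, standards-compliant datasets
    related_to: List[str] = field(default_factory=list)  # other standards


def criteria_summary(entry: StandardEntry) -> Dict[str, int]:
    """Report the three criteria as plain counts; no scoring is implied."""
    return {
        "implementations": len(entry.implemented_by),
        "compliant_datasets": entry.compliant_datasets,
        "related_standards": len(entry.related_to),
    }


def standards_for(entries: List[StandardEntry], resource: str) -> List[str]:
    """Answer 'which standards does this tool or database implement?'."""
    return [e.name for e in entries if resource in e.implemented_by]


# Placeholder records, not real catalogue data.
catalogue = [
    StandardEntry("standard-A", ["database-X", "tool-Y"], 12, ["standard-B"]),
    StandardEntry("standard-B", ["database-X"]),
]

print(criteria_summary(catalogue[0]))
print(standards_for(catalogue, "database-X"))
```

Keeping the criteria as raw counts rather than a single score leaves the judgement of "mature enough" to the stakeholder, which matches the slide's wording of implicit criteria.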
Over 400 entries (public and in curation)
Smith et al, 2007




Taylor, Field, Sansone et al, 2008

List of databases, linked to standards: a collaboration with the NAR Database Issue
Define groups and relations among standards

 The relationship among popular standard formats for pathway information:
 BioPAX and PSI-MI are designed for data exchange to and from databases and
 pathway and network data integration. SBML and CellML are designed to
 support mathematical simulations of biological systems and SBGN represents
 pathway diagrams.
 CREDIT: Demir, et al., The BioPAX community standard for pathway data sharing, 2010.
E.g. in the genomics context:
          resources from GSC and other communities…
   •  MIxS
   •  GCDML, SRAxml, ISA-Tab, BIOM (data matrices)
   •  EnvO, EnvO-light, OBI, etc…
   •  INSDC, GOLD, MG-RAST, CAMERA, SILVA, etc…
Disclaimer: draft for illustrative purpose; this is a dynamic environment, work in progress…
Acknowledgements:
Philippe Rocca-Serra (University of Oxford)
Eamonn Maguire (University of Oxford)
Annapaola Santarsiero (University of Oxford)
Susanna Sansone (University of Oxford)
Chris Taylor (EMBL-EBI)
Dawn Field (NERC-NEBC)
with contributions from members of our communities and
individuals.


Susanna-Assunta Sansone: An Overview of the Evolving Portfolio of Data Sharing Enablers: BioSharing

  • 1. Policies and standards for reproducible research: from theory to practice §  How do we make standards-compliant data sharing culture functional and efficient? •  Several data management, sharing policies and plans have emerged; the number of data journals is growing and guidelines to authors for reporting data are being enriched; there are thousands of biological databases and a wealth of community standards •  Although, funders, journal editors, data producers, consumers and service providers agree in principle that shared, annotated research data and methods offers new discovery opportunities, compliance is challenging in practice §  Starting from the genomics domain and extending to other areas of life-science, we are looking to highlight the success stories and existing problems
  • 2. About this session - speakers §  Representatives from stakeholders involved in complete cycle of data •  from funding and regulation, to production, release and re-use §  Setting the scene: •  Susanna-Assunta Sansone, University of Oxford, UK •  Scott Edmunds, GigaScience BGI Shenzhen, China §  Funders •  Rita Colwell, University of Maryland, USA •  Paula J. Olsiewski, Sloan Foundation §  Service providers and/or data producers •  Philippe Rocca-Serra, University of Oxford, UK •  Folker Meyer, Argonne National Laboratory, USA •  Srikrishna Subramanian, IMTECH, India §  Editors •  Clare Garvey, Genome Biology/BioMed Central •  Craig Mak, Nature Biotechnology
  • 3. About this session - topics §  Data management, preservation and sharing policies – view points •  formulation and enforcement, or •  uptake and compliance §  Reporting standards – experiences and challenges •  evolutions of standards, costs of compliance, reward for complying etc. •  usability of standards when working across disciplines, also they all have differing community norms •  challenges in integrating data types and how standards can help §  Tackling the challenges – approaches and lessons learned •  balance needs and expectations (data producers, consumers, reviews, service providers etc.) •  potential role of each stakeholder •  new way forwards
  • 4. the evolving portfolio of data sharing enablers Susanna-Assunta Sansone, PhD University of Oxford, Oxford e-Research Centre, Oxford, UK http://uk.linkedin.com/in/sasansone GSC13th, Shenzhen, China, March 5-7, 2012
  • 5. From reusable data to reproducible research To make the datasets comprehensible, interoperable and reusable, underpinning future investigations, we need common ways to report and share the experimental details and the associated results. Consistent reporting will have a positive and long-lasting impact on the value of collective scientific outputs. The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project
  • 6. A ‘general mobilization’ to develop standards, e.g.: •  use the same word and refer to the same ‘thing’ •  allow data to flow from one system to another •  report the same core, essential information
  • 7. A ‘general mobilization’ to develop standards… BUT §  Fragmentation of the standards is a major issue! •  Being focused on particular communities’ interests, be they individual technologies or biological/biomedical disciplines, leads to duplication of effort and, more seriously, the development of (largely arbitrarily) different standards •  This severely hinders the interoperability of databases and tools and ultimately the integration of datasets
  • 8. Growing number of reporting standards: MAGE-Tab, AAO, miame, GCDML, MIAPA, CHEBI, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  • 9. Growing number of reporting standards: + 303, + 150, + 130 (Source: MIBBI, EQUATOR; Source: BioPortal; Estimated). Databases, annotation, curation tools
  • 11. But how much do we know about these standards? •  Which tools and databases implement which standards? •  I use high throughput sequencing technologies, which ones are applicable to me? •  How can I get involved to propose extensions or modifications? •  What are the criteria to evaluate their status and value? •  Which ones are mature enough for me to use or recommend? •  I work on plants, are these just for biomedical applications?
  • 12. Often not much… §  Several policy documents and guidelines are inconsistent and/or unclear when recommending use of standards, e.g.: “..recommend use of appropriate standards...where these exists…....mature, stable efforts....MIAME format…..standards from accredited standards organizations…..deposition to public repositories, supporting these standards…...”
  • 13.
  • 15.
  • 16. A coherent, curated and searchable catalogue of data sharing resources that (collaboratively) works to: 1. Centralize community-developed bioscience standards and make them discoverable, linking to: •  data sharing, preservation and management policies •  other portals e.g. MIBBI, NCBO’s BioPortal, NIF, BioSiteMaps, OBO Foundry •  related open access, published material e.g. BioMed Central, Nature Precedings, F1000 •  tools and databases implementing the standards e.g. collaboration with NAR Database 2. Identify and maintain a set of (implicit) criteria for assessing usability and popularity of the standards, including: •  implementations by tools and databases •  availability of standards-compliant, public datasets •  relations among standards 3. Foster communication among groups, in particular to: •  address overlaps and duplication of efforts and enhance interoperability of standards •  produce ‘best practice’ guidelines for starting new, or contributing to existing, efforts Ø  Will allow stakeholders (funders, journals, service providers and researchers) to make informed decisions on standards
  • 17.
  • 18. Over 400 entries (public and in curation)
  • 19. Smith et al, 2007
  • 20. Smith et al, 2007; Taylor, Field, Sansone et al, 2008
  • 21. List of databases, linked to standards: a collaboration with the NAR Database Issue
  • 24. Define groups and relations among standards §  The relationship among popular standard formats for pathway information: BioPAX and PSI-MI are designed for data exchange to and from databases and pathway and network data integration. SBML and CellML are designed to support mathematical simulations of biological systems and SBGN represents pathway diagrams. CREDIT: Demir, et al., The BioPAX community standard for pathway data sharing, 2010.
  • 25. E.g. in the genomics context: resources from GSC and other communities… INSDC GCDML EnvO GOLD SRAxml MixS EnvO-light MG-RAST ISA-Tab OBI CAMERA BIOM (data matrices) SILVA etc… Disclaimer: draft for illustrative purposes; this is a dynamic environment, work in progress…
  • 30. Acknowledgements: Philippe Rocca-Serra (University of Oxford), Eamonn Maguire (University of Oxford), Annapaola Santarsiero (University of Oxford), Susanna Sansone (University of Oxford), Chris Taylor (EMBL-EBI), Dawn Field (NERC-NEBC), with contributions from members of our communities and individuals.