www.slideshare.net/SusannaSansone




Delivering reproducible bioscience data by enabling
                biocuration at the source


              Susanna-Assunta Sansone, PhD
              Principal Investigator and Team Leader,
       University of Oxford e-Research Centre, Oxford, UK

        Academic Consultant, Open Access Data Products,
                      Nature Publishing Group


                         Digital Curation Centre (DCC)
    13th Regional Data Management Roadshow, London, 20 November 2012
University of Oxford e-Research Centre



                      •  Providing research computing, high-performance computing
                      •  Integrating with national and international infrastructure
                      •  Supporting leading edge facilities through education and training
University of Oxford e-Research Centre



                   Collaborating with European and wider
                   international groups in, e.g.:
                      •    energy,
                      •    radio astronomy,
                      •    biological data federation,
                      •    life sciences simulation,
                      •    biodiversity,
                      •    computational chemistry,
                      •    neuroscience,
                      •    digital humanities tools,
                      •    digital music analysis
                      •    visualization
                      •    …
My team’s activities and the stakeholders we work with
     data management and biocuration, collaborative development of software
                   and databases, standards and ontologies


•    environmental genomics                            •    stem cell discovery
•    metabolomics                                      •    systems biology
•    metagenomics                                      •    transcriptomics
•    nanotechnology                                    •    toxicogenomics
•    proteomics                                        •    environmental health
Outline



        “The buzz around reproducible bioscience data:
                    the communities and the standards”


“The reality from the buzz:
challenges and exemplar project”
COMPREHENSIBLE, INTEROPERABLE, REUSABLE
Image: http://www.flickr.com/photos/notbrucelee/8016189356/   CC BY
experimental design

     sample characteristic(s)

     experimental variable(s)

            technology(ies)

           measurement(s)

             protocol(s)

              data file(s)

                   ......
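
The elements listed above can be pictured as fields of one structured record. The following is a minimal sketch only: the class name, the field names and the `missing_elements` helper are hypothetical, chosen to mirror the slide, and do not come from any published standard or tool.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ExperimentDescription:
    """Hypothetical record mirroring the elements listed on the slide."""
    experimental_design: str = ""
    sample_characteristics: dict = field(default_factory=dict)
    experimental_variables: dict = field(default_factory=dict)
    technologies: list = field(default_factory=list)
    measurements: list = field(default_factory=list)
    protocols: list = field(default_factory=list)
    data_files: list = field(default_factory=list)


def missing_elements(description):
    """Return the names of elements that are still empty, in declaration order."""
    return [f.name for f in fields(description) if not getattr(description, f.name)]


draft = ExperimentDescription(
    experimental_design="time course",
    technologies=["mass spectrometry"],
)
print(missing_elements(draft))
```

Listing what is still empty is one simple way to make gaps in an experiment's description visible before the data are shared.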




11      The International Conference on Systems Biology (ICSB), 22-28 August, 2008   Susanna-Assunta Sansone
                                                                                       www.ebi.ac.uk/net-project
§  We must strike a balance
    between
     •  depth and breadth of
        information; and
     •  sufficient information
        required to reuse the data




§  Capture all salient features of the experimental workflow
§  Make annotation explicit and discoverable
§  Structure the descriptions for consistency, tracking
Growing, worldwide movement for reproducible research

 esoteric formats                                  comprehensible?
 lack of sufficient contextual information         interoperable?
 ad hoc or proprietary terminologies               reusable?
                                            Source: http://ebbailey.wordpress.com

§  Researchers and bioinformaticians in both academic and commercial
    science, along with funding agencies and publishers, embrace the
    concept that community-developed standards are pivotal to structure
    and enrich the annotation of
         •  entities of interest (e.g., genes, metabolites, phenotypes) and
         •  experimental steps (e.g., provenance of study materials,
            technology and measurement types)
Community mobilization to develop standards, e.g.:




     •  allow data to flow from one system to another
     •  use the same word and refer to the same ‘thing’
     •  report the same core, essential information
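
As a rough illustration of two of these aims (using the same word for the same ‘thing’, and reporting the same core information), the sketch below normalizes free-text terms and checks a record against a short list of required fields. The field list and the synonym table are invented for the example; they are assumptions, not taken from any actual checklist or ontology.

```python
# Hypothetical "core, essential information" for the example; not a real checklist.
REQUIRED_FIELDS = {"organism", "protocol", "measurement_type"}

# Hypothetical synonym table standing in for a shared terminology.
SYNONYMS = {
    "human": "Homo sapiens",
    "h. sapiens": "Homo sapiens",
    "homo sapiens": "Homo sapiens",
}


def normalize_term(value):
    """Map a free-text value to its preferred term; unknown values pass through."""
    cleaned = value.strip()
    return SYNONYMS.get(cleaned.lower(), cleaned)


def check_record(record):
    """Return (normalized record, sorted list of required fields left empty)."""
    filled = {key for key, value in record.items() if value}
    missing = sorted(REQUIRED_FIELDS - filled)
    normalized = {key: normalize_term(value) for key, value in record.items() if value}
    return normalized, missing


normalized, missing = check_record(
    {"organism": " Human ", "protocol": "RNA extraction", "measurement_type": ""}
)
print(normalized)
print(missing)
```

Two groups that write "Human" and "h. sapiens" end up with the same term, and both are told which core fields they have not yet reported.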
Is this general mobilization good or bad?




     •  allow data to flow from one system to another
     •  use the same word and refer to the same ‘thing’
     •  report the same core, essential information


§  Fragmentation of the standards is a major issue
     •  A focus on particular communities’ interests, be they individual
        technologies or biological/biomedical disciplines, leads to duplication of effort
        and, more seriously, to the development of (largely arbitrarily) different standards
     •  This severely hinders the interoperability of databases and tools and ultimately
        the integration of datasets
Growing number of reporting standards




     MAGE-Tab, GCDML, SRAxml, SOFT, FASTA, CML, DICOM, GELML, SBRML, MITAB, mzML, ISA-Tab, SEDML, …
     AAO, CHEBI, OBI, VO, PATO, ENVO, MOD, TEDDY, XAO, BTO, DO, PRO, IDO, …
     MIAME, MIAPA, MIRIAM, MIQAS, MIX, REMARK, MIGEN, MIAPE, MIQE, CIMR, CONSORT, MIASE, MISFISHIE, …
Growing number of reporting standards
     + 130 (estimated)          + 303 (source: BioPortal)          + 150 (source: MIBBI, EQUATOR)

     Databases, annotation, curation tools
But how much do we know about these standards?

•  Which tools and databases implement which standards?
•  I use high throughput sequencing technologies; which ones are applicable to me?
•  What are the criteria to evaluate their status and value?
•  How can I get involved to propose extensions or modifications?
•  Which ones are mature enough for me to use or recommend?
•  I work on plants; are these just for biomedical applications?
A catalogue to map the landscape of standards and the systems implementing them:
Over 400 bio-standards (public and in curation)
Field*, Sansone* et al., Omics data sharing. Science 326, 234-36 (2009) doi:10.1126/science.1180598
•    A coherent, curated and searchable catalogue of data sharing resources
•    Bioscience standards and associated data-sharing policies, publications, tools and databases
•    Assessment criteria for usability and popularity of standards
•    Relationships among standards
•    Encouragement for communication & interaction among groups
•    Promoting interoperability & informed decisions about standards
Social engineering




Ownership of open standards can be problematic in broad, grass-roots
collaborations; it requires improved models to encourage maintenance of
and contributions to these efforts, supporting their evolution




The extensive community liaison needs to be managed and funded; rewards
and incentives need to be identified for all contributors




The cost of implementing a standards-supported data sharing vision is as
large as the number of stakeholders that must operate synchronously




Funders are actively developing data policies
Similar trend in the regulatory arena…
… and in the commercial sector
….the rise of data-driven journals, e.g.:




                                            partnering with:
core organization in the

                              UK node
                           work in progress

        UK Node
reasoning, visualization, analysis, browsing, integration, exchange, retrieval

Community Standards                         Software Tools

Well-annotated & Structured Data

Reproducible & Reusable Bioscience Research
An exemplar approach to the status quo

§  A grass-roots collaborative that works to facilitate the collection, curation
    and sharing of experiments using a common, structured representation
    of the experiments that
     •  transcends individual biological and technological domains and
     •  can be ‘configured’ to implement (several of) the community
         standards
metadata tracking framework




                                                           user community



collection, curation and sharing of bioscience experiments
A growing ecosystem of over 30 public and internal resources using the
ISA metadata tracking framework to facilitate standards-compliant
collection, curation, management and reuse of investigations in an
increasingly diverse set of life science domains, including:

•    environmental health                                 •    stem cell discovery
•    environmental genomics                               •    systems biology
•    metabolomics                                         •    transcriptomics
•    metagenomics                                         •    toxicogenomics
•    nanotechnology                                       •    also by communities working to build
•    proteomics                                                a library of cellular signatures



            TOWARDS INTEROPERABLE BIOSCIENCE DATA                            Feb 2012
            Sansone SA, Rocca-Serra P, Field D, Maguire E, Taylor C, Hofmann O, Fang H, Neumann S, Tong W,
            Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B, Clark T, Coleman LA,
            Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S, Evelo C, Forster M, Gaudet P, Gilbert J,
            Goble C, Griffin J, Jacob D, Kleinjans J, Harland L, Haug K, Hermjakob H, Sui S, Laederach A, Liang S,
            Marshall S, Merrill E, McGrath A, Reilly D, Roux M, Shamu C, Shang C, Steinbeck C, Trefethen A,
            Williams-Jones B, Wolstencroft K, Xenarios J, Hide W.
Implementations at Harvard




  Importance of a local community
Implementations at Harvard




data sharing
 in ISA-Tab




               Importance of a local community
Implementation at the EBI




Extensions of the



          Nanotechnology
     Informatics Working Group




We must increase the level of annotation




   •  Notes in Lab Books (information for humans)
   •  Spreadsheets and Tables (the compromise)
   •  Facts as RDF statements (information for machines)

•  Invest in curating and managing data at the source, using:
    •  a common metadata tracking framework, such as ISA
    •  publicly available and community-developed terminologies
    •  sufficient contextual information about the experimental steps
§  Datasets will progressively become more comprehensible, interoperable,
    reproducible and (re)usable, underpinning future investigations
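
To make the step from spreadsheets to machine-readable facts concrete, here is a minimal sketch that turns each cell of a tab-delimited table into a subject-predicate-object statement. It uses plain Python tuples rather than an RDF library. The `http://example.org/` namespace, the column names and the sample values are all assumptions made up for the example; this is not the ISA-Tab layout or the ISA tools' own RDF conversion.

```python
import csv
import io

# Illustrative tab-delimited table; columns and values are invented for the example.
TABLE = (
    "Sample Name\tOrganism\tProtocol\n"
    "sample1\tHomo sapiens\tRNA extraction\n"
)

BASE = "http://example.org/"  # hypothetical namespace, an assumption


def row_to_triples(row, id_column):
    """Turn one table row into (subject, predicate, object) statements."""
    subject = BASE + "sample/" + row[id_column]
    return [
        (subject, BASE + "property/" + column.replace(" ", "_"), value)
        for column, value in row.items()
        if column != id_column and value
    ]


def table_to_triples(text, id_column="Sample Name"):
    """Read a tab-delimited table and emit one statement per non-empty cell."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    triples = []
    for row in reader:
        triples.extend(row_to_triples(row, id_column))
    return triples


for triple in table_to_triples(TABLE):
    print(triple)
```

Each statement stands on its own, so facts from different tables can be pooled and queried together; in practice the predicates would point to community-developed terminologies rather than to ad hoc column names.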
Collaborative approaches are highly valuable but take time


Development timeline, 2007 to 2012 (each track listed in chronological order)

Community involvement and uptake
•  1st, 2nd and 3rd ISA-Tab workshops
•  User workshops/visits start
•  Other tools implement ISA-Tab
•  1st public instance: Harvard Stem Cell Discovery Engine
•  Growing number of systems start to adopt the ISA framework

Core developments
•  Strawman ISA-Tab spec
•  Final ISA-Tab spec
•  ISA software v1
•  Database instance at EBI
•  Conversions to Pride-XML/SRA-XML/MAGE-Tab and more
•  RDF format starts
•  Links to analysis tools start

Publications
•  Workshop reports
•  Omics data sharing (Science)
•  ISA-Tab and ISA software suite (Bioinformatics)
•  Stem Cell Discovery Engine (NAR)
•  ISA Commons (Nature Genetics)
Sa sansone dccroadshow-nov2012Delivering reproducible bioscience data by enabling biocuration at the source

Más contenido relacionado

Similar a Sa sansone dccroadshow-nov2012Delivering reproducible bioscience data by enabling biocuration at the source

Towards an ecosystem of data and ontologies
Towards an ecosystem of data and ontologiesTowards an ecosystem of data and ontologies
Towards an ecosystem of data and ontologiesMathieu d'Aquin
 
Poster Semantic data integration proof of concept
Poster Semantic data integration proof of conceptPoster Semantic data integration proof of concept
Poster Semantic data integration proof of conceptNicolas Bertrand
 
Ontologies neo4j-graph-workshop-berlin
Ontologies neo4j-graph-workshop-berlinOntologies neo4j-graph-workshop-berlin
Ontologies neo4j-graph-workshop-berlinSimon Jupp
 
Invited talk at the GeoClouds Workshop, Indianapolis, 2009
Invited talk at the GeoClouds Workshop, Indianapolis, 2009Invited talk at the GeoClouds Workshop, Indianapolis, 2009
Invited talk at the GeoClouds Workshop, Indianapolis, 2009Paolo Missier
 
2011.11.03.charleston.ldi
2011.11.03.charleston.ldi2011.11.03.charleston.ldi
2011.11.03.charleston.ldiBruce Heterick
 
2011.11.03.charleston.ldi
2011.11.03.charleston.ldi2011.11.03.charleston.ldi
2011.11.03.charleston.ldiBruce Heterick
 
The Evolution of e-Research: Machines, Methods and Music
The Evolution of e-Research: Machines, Methods and MusicThe Evolution of e-Research: Machines, Methods and Music
The Evolution of e-Research: Machines, Methods and MusicDavid De Roure
 
The Symbiotic Nature of Provenance and Workflow
The Symbiotic Nature of Provenance and WorkflowThe Symbiotic Nature of Provenance and Workflow
The Symbiotic Nature of Provenance and WorkflowEric Stephan
 
Towards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data ServicesTowards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data ServicesAnita de Waard
 
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Ross Mounce
 
ICBO 2018 Poster - Current Development in the Evidence and Conclusion Ontolog...
ICBO 2018 Poster - Current Development in the Evidence and Conclusion Ontolog...ICBO 2018 Poster - Current Development in the Evidence and Conclusion Ontolog...
ICBO 2018 Poster - Current Development in the Evidence and Conclusion Ontolog...dolleyj
 
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATIONONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATIONIJwest
 
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR  ONTOLOGY APPLICATION ONTOLOGY SERVICE CENTER: A DATAHUB FOR  ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION dannyijwest
 
Oxford DTP - Sansone curation tools - Dec 2014
Oxford DTP - Sansone curation tools - Dec 2014Oxford DTP - Sansone curation tools - Dec 2014
Oxford DTP - Sansone curation tools - Dec 2014Susanna-Assunta Sansone
 
Where are we going and how are we going to get there?
Where are we going and how are we going to get there?Where are we going and how are we going to get there?
Where are we going and how are we going to get there?David De Roure
 
Taxonomies and Ontologies – The Yin and Yang of Knowledge Modelling
Taxonomies and Ontologies – The Yin and Yang of Knowledge ModellingTaxonomies and Ontologies – The Yin and Yang of Knowledge Modelling
Taxonomies and Ontologies – The Yin and Yang of Knowledge ModellingSemantic Web Company
 
Love for science or 'Academic Prostitution' - ERC talk version
Love for science or 'Academic Prostitution' - ERC talk versionLove for science or 'Academic Prostitution' - ERC talk version
Love for science or 'Academic Prostitution' - ERC talk versionLourdes Verdes-Montenegro
 
G. Regalbuto Bentley Dissertation
G. Regalbuto Bentley DissertationG. Regalbuto Bentley Dissertation
G. Regalbuto Bentley DissertationGloria Bentley
 

Similar a Sa sansone dccroadshow-nov2012Delivering reproducible bioscience data by enabling biocuration at the source (20)

Towards an ecosystem of data and ontologies
Towards an ecosystem of data and ontologiesTowards an ecosystem of data and ontologies
Towards an ecosystem of data and ontologies
 
Poster Semantic data integration proof of concept
Poster Semantic data integration proof of conceptPoster Semantic data integration proof of concept
Poster Semantic data integration proof of concept
 
Ontologies neo4j-graph-workshop-berlin
Ontologies neo4j-graph-workshop-berlinOntologies neo4j-graph-workshop-berlin
Ontologies neo4j-graph-workshop-berlin
 
Doctoring Strange Results
Doctoring Strange ResultsDoctoring Strange Results
Doctoring Strange Results
 
Invited talk at the GeoClouds Workshop, Indianapolis, 2009
Invited talk at the GeoClouds Workshop, Indianapolis, 2009Invited talk at the GeoClouds Workshop, Indianapolis, 2009
Invited talk at the GeoClouds Workshop, Indianapolis, 2009
 
2011.11.03.charleston.ldi
2011.11.03.charleston.ldi2011.11.03.charleston.ldi
2011.11.03.charleston.ldi
 
2011.11.03.charleston.ldi
2011.11.03.charleston.ldi2011.11.03.charleston.ldi
2011.11.03.charleston.ldi
 
The Evolution of e-Research: Machines, Methods and Music
The Evolution of e-Research: Machines, Methods and MusicThe Evolution of e-Research: Machines, Methods and Music
The Evolution of e-Research: Machines, Methods and Music
 
The Symbiotic Nature of Provenance and Workflow
The Symbiotic Nature of Provenance and WorkflowThe Symbiotic Nature of Provenance and Workflow
The Symbiotic Nature of Provenance and Workflow
 
Towards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data ServicesTowards Incidental Collaboratories; Research Data Services
Towards Incidental Collaboratories; Research Data Services
 
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
Specimen-level mining: bringing knowledge back 'home' to the Natural History ...
 
ICBO 2018 Poster - Current Development in the Evidence and Conclusion Ontolog...
ICBO 2018 Poster - Current Development in the Evidence and Conclusion Ontolog...ICBO 2018 Poster - Current Development in the Evidence and Conclusion Ontolog...
ICBO 2018 Poster - Current Development in the Evidence and Conclusion Ontolog...
 
A Clean Slate?
A Clean Slate?A Clean Slate?
A Clean Slate?
 
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATIONONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
 
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR  ONTOLOGY APPLICATION ONTOLOGY SERVICE CENTER: A DATAHUB FOR  ONTOLOGY APPLICATION
ONTOLOGY SERVICE CENTER: A DATAHUB FOR ONTOLOGY APPLICATION
 
Oxford DTP - Sansone curation tools - Dec 2014
Oxford DTP - Sansone curation tools - Dec 2014Oxford DTP - Sansone curation tools - Dec 2014
Oxford DTP - Sansone curation tools - Dec 2014
 
Where are we going and how are we going to get there?
Where are we going and how are we going to get there?Where are we going and how are we going to get there?
Where are we going and how are we going to get there?
 
Taxonomies and Ontologies – The Yin and Yang of Knowledge Modelling
Taxonomies and Ontologies – The Yin and Yang of Knowledge ModellingTaxonomies and Ontologies – The Yin and Yang of Knowledge Modelling
Taxonomies and Ontologies – The Yin and Yang of Knowledge Modelling
 
Love for science or 'Academic Prostitution' - ERC talk version
Love for science or 'Academic Prostitution' - ERC talk versionLove for science or 'Academic Prostitution' - ERC talk version
Love for science or 'Academic Prostitution' - ERC talk version
 
G. Regalbuto Bentley Dissertation
G. Regalbuto Bentley DissertationG. Regalbuto Bentley Dissertation
G. Regalbuto Bentley Dissertation
 

Sa sansone dccroadshow-nov2012: Delivering reproducible bioscience data by enabling biocuration at the source

  • 1. www.slideshare.net/SusannaSansone Delivering reproducible bioscience data by enabling biocuration at the source Susanna-Assunta Sansone, PhD Principal Investigator and Team Leader, University of Oxford e-Research Centre, Oxford, UK Academic Consultant, Open Access Data Products, Nature Publishing Group Data Curation Centre (DCC) 13th Regional Data Management Roadshow, London, 20 November 2012
  • 2. University of Oxford e-Research Centre
  • 3. University of Oxford e-Research Centre Providing research computing, high- performance computing Integrating with national and international infrastructure Supporting leading edge facilities through education and training
  • 4. University of Oxford e-Research Centre Collaborating with European and wider international groups in, e.g.: •  energy, •  radio astronomy, •  biological data federation, •  life sciences simulation, •  biodiversity, •  computational chemistry, •  neuroscience, •  digital humanities tools, •  digital music analysis •  visualization •  …
  • 5. My team’s activities and the stakeholders we work with: data management and biocuration, collaborative development of software and databases, standards and ontologies •  environmental genomics •  stem cell discovery •  metabolomics •  systems biology •  metagenomics •  transcriptomics •  nanotechnology •  toxicogenomics •  proteomics •  environmental health
  • 6. Outline “The buzz around reproducible bioscience data: the communities and the standards” “The reality from the buzz: challenges and exemplar project”
  • 8. COMPREHENSIBLE http://www.flickr.com/photos/notbrucelee/8016189356/ CC BY
  • 9. COMPREHENSIBLE, INTEROPERABLE http://www.flickr.com/photos/notbrucelee/8016189356/ CC BY
  • 10. COMPREHENSIBLE, INTEROPERABLE, REUSABLE http://www.flickr.com/photos/notbrucelee/8016189356/ CC BY
  • 11. experimental design; sample characteristic(s); experimental variable(s); technology(s); measurement(s); protocol(s); data file(s); ...... The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project
  • 12. §  We must strike a balance between •  depth and breadth of information; and •  sufficient information required to reuse the data §  Capture all salient features of the experimental workflow §  Make annotation explicit and discoverable §  Structure the descriptions for consistency, tracking
  • 13. Growing, worldwide movement for reproducible research •  esoteric formats •  lack of sufficient contextual information •  ad hoc or proprietary terminologies •  comprehensible? interoperable? reusable? Source: http://ebbailey.wordpress.com §  Researchers and bioinformaticians in both academic and commercial science, along with funding agencies and publishers, embrace the concept that community-developed standards are pivotal to structure and enrich the annotation of •  entities of interest (e.g., genes, metabolites, phenotypes) and •  experimental steps (e.g., provenance of study materials, technology and measurement types)
  • 14. Community mobilization to develop standards, e.g.: •  use the same word and refer to the same ‘thing’ •  allow data to flow from one system to another •  report the same core, essential information
  • 15. Is this general mobilization good or bad? •  use the same word and refer to the same ‘thing’ •  allow data to flow from one system to another •  report the same core, essential information §  Fragmentation of the standards is a major issue •  Being focused on particular communities’ interests, be their individual technologies or biological/biomedical disciplines, leads to duplication of effort, and more seriously, the development of (largely arbitrarily) different standards •  This severely hinders the interoperability of databases and tools and ultimately the integration of datasets
  • 16. Growing number of reporting standards: MAGE-Tab, AAO, MIAME, GCDML, MIAPA, CHEBI, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  • 17. Growing number of reporting standards: + 303, + 150, + 130 (sources shown: MIBBI, EQUATOR; BioPortal; estimated). Databases, annotation, curation tools. MAGE-Tab, AAO, MIAME, GCDML, MIAPA, CHEBI, SRAxml, OBI, MIRIAM, VO, SOFT, MIQAS, FASTA, PATO, MIX, CML, ENVO, REMARK, DICOM, MIGEN, GELML, MOD, SBRML, MIAPE, MIQE, TEDDY, MITAB, MzML, XAO, CIMR, CONSORT, BTO, ISA-Tab, SEDML…, DO, PRO, IDO…, MIASE, MISFISHIE…
  • 18. But how much do we know about these standards? •  Which tools and databases implement which standards? •  I use high throughput sequencing technologies, which ones are applicable to me? •  How can I get involved to propose extensions or modifications? •  What are the criteria to evaluate their status and value? •  Which ones are mature enough for me to use or recommend? •  I work on plants, are these just for biomedical applications?
  • 19. A catalogue to map the landscape of standards and the systems implementing them: Over 400 bio-standards (public and in curation). Field*, Sansone* et al., Omics data sharing. Science 326, 234-36 (2009) doi:10.1126/science.1180598
  • 20. •  A coherent, curated and searchable catalogue of data sharing resources •  Bioscience standards and associated data-sharing policies, publications, tools and databases •  Assessment criteria for usability and popularity of standards •  Relationships among standards •  Encouragement for communication & interaction among groups •  Promoting interoperability & informed decisions about standards
  • 21.
  • 22. Social engineering
  • 23. Ownership of open standards can be problematic in broad, grass-root collaborations; it requires improved models, to encourage maintenance of and contributions to these efforts, supporting their evolution
  • 24. The extensive community liaison needs to be managed and funded; rewards and incentives need to be identified for all contributors
  • 25. The cost of implementing a standards-supported data sharing vision is as large as the number of stakeholders that must operate synchronously
  • 26. Funders are actively developing data policies
  • 27. Similar trend in the regulatory arena…
  • 28. … and in the commercial sector
  • 29. ….the rise of data-driven journals, e.g.: partnering with:
  • 30.
  • 31. core organization in the UK node work in progress UK Node
  • 32. reasoning visualization analysis browsing integration exchange retrieval Community Software Standards Tools Well-annotated & Structured Data Reproducible & Reusable Bioscience Research
  • 33. An exemplar approach to the status quo §  A grass-root collaborative that works to facilitate collection, curation and sharing of experiments using a common, structured representation of the experiments that •  transcends individual biological and technological domains and •  can be ‘configured’ to implement (several of) the community standards
  • 34. metadata tracking framework user community
  • 35. collection, curation and sharing of bioscience experiments
  • 36. A growing ecosystem of over 30 public and internal resources using the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including: •  environmental health •  stem cell discovery •  environmental genomics •  systems biology •  metabolomics •  transcriptomics •  metagenomics •  toxicogenomics •  nanotechnology •  proteomics •  also by communities working to build a library of cellular signatures. TOWARDS INTEROPERABLE BIOSCIENCE DATA, Feb 2012: Sansone SA, Rocca-Serra P, Field D, Maguire E, Taylor C, Hofmann O, Fang H, Neumann S, Tong W, Amaral-Zettler L, Begley K, Booth T, Bougueleret L, Burns G, Chapman B, Clark T, Coleman LA, Copeland J, Das S, de Daruvar A, de Matos P, Dix I, Edmunds S, Evelo C, Forster M, Gaudet P, Gilbert J, Goble C, Griffin J, Jacob D, Kleinjans J, Harland L, Haug K, Hermjakob H, Sui S, Laederach A, Liang S, Marshall S, Merrill E, McGrath A, Reilly D, Roux M, Shamu C, Shang C, Steinbeck C, Trefethen A, Williams-Jones B, Wolstencroft K, Xenarios J, Hide W.
  • 37. Implementations at Harvard Importance of a local community
  • 38. Implementations at Harvard data sharing in ISA-Tab Importance of a local community
  • 39. Implementations at Harvard data sharing in ISA-Tab Importance of a local community
  • 40. Implementation at the EBI
  • 41. Extensions of the Nanotechnology Informatics Working Group
  • 42. We must increase the level of annotation: Notes in Lab Books (information for humans); Spreadsheets and Tables (the compromise); Facts as RDF statements (information for machines) •  Invest in curating and managing data at the source using: •  a common metadata tracking framework, such as ISA •  publicly available and community-developed terminologies •  recording sufficient contextual information of the experimental steps §  Progressively datasets will become more comprehensible, interoperable, reproducible and (re)usable, underpinning future investigations
  • 43. Collaborative approaches are highly valuable but take time. Development timeline, 2007 2008 2009 2010 2011 2012. Community involvement and uptake: •  1st ISA-Tab workshop •  2nd ISA-Tab workshop •  3rd ISA-Tab workshop •  User workshops/visits start •  1st public instance: Harvard Stem Cell Discovery Engine •  Other tools implement ISA-Tab •  Growing number of systems starts to adopt ISA framework. Core developments: •  Strawman ISA-Tab spec •  Final ISA-Tab spec •  ISA software v1 •  Conversions to Pride-XML/SRA-XML/MAGE-Tab and more •  Database instance at EBI •  Links to analysis tools starts •  RDF format starts. Publications: •  Workshop reports •  Omics data sharing (Science) •  ISA software suite (Bioinformatics) •  Stem Cell Discovery Engine (NAR) •  ISA-Tab and ISA Commons (Nature Genetics)
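
As a rough illustration of the "spreadsheets and tables" compromise on slide 42, the descriptors listed on slide 11 can be held in one structured record and checked for gaps before the data leave the lab. This is a minimal sketch only: the class `ExperimentRecord`, its field names and the `missing_fields` helper are hypothetical names invented here, and the layout is not the ISA-Tab specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExperimentRecord:
    """Hypothetical container for the descriptors listed on slide 11."""

    experimental_design: str = ""
    sample_characteristics: Dict[str, str] = field(default_factory=dict)
    experimental_variables: Dict[str, str] = field(default_factory=dict)
    technologies: List[str] = field(default_factory=list)
    measurements: List[str] = field(default_factory=list)
    protocols: List[str] = field(default_factory=list)
    data_files: List[str] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of descriptors that are still empty, in declaration order."""
        return [name for name, value in vars(self).items() if not value]


# Illustrative values only; they do not come from the talk.
record = ExperimentRecord(
    experimental_design="time course",
    sample_characteristics={"organism": "Mus musculus"},
    technologies=["mass spectrometry"],
    data_files=["run01.raw"],
)
print(record.missing_fields())
```

A check like this, run where the data are produced, is one way to make missing context visible early instead of leaving it for a curator to find later.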