FAIR Data and Model Management for Systems Biology (and SOPs too!)

MultiScale Biology Network Springboard meeting, Nottingham, UK, 1 June 2015

Over the past 5 years we have seen a change in expectations for the management of all the outcomes of research, that is, the “assets” of data, models, codes, SOPs and so forth. Don’t stop reading. Yes, data management isn’t likely to win anyone a Nobel prize. But publications should be supported and accompanied by data, methods, procedures, etc. to assure reproducibility of results. Funding agencies expect data (and increasingly software) management, retention and access plans as part of the proposal process for projects to be funded. Journals are raising their expectations of the availability of data and codes pre- and post-publication. And the multi-component, multi-disciplinary nature of Systems Biology demands the interlinking and exchange of assets and the systematic recording of metadata for their interpretation.
Data and model management for the Systems Biology community is a multi-faceted challenge that includes: the development and adoption of appropriate community standards (and the navigation of the standards maze); the sustaining of international public archives capable of servicing quantitative biology; and the development of the necessary tools and know-how for researchers within their own institutes so that they can steward their assets in a sustainable, coherent and credited manner while minimising burden and maximising personal benefit.
The FAIRDOM (Findable, Accessible, Interoperable, Reusable Data, Operations and Models) Initiative has grown out of several efforts in European programmes (SysMO and EraSysAPP ERANets and the ISBE ESFRI) and national initiatives (de.NBI, German Virtual Liver Network, SystemsX, UK SynBio centres). It aims to support Systems Biology researchers with data and model management, with an emphasis on standards smuggled in by stealth.
This talk will use the FAIRDOM Initiative to discuss the FAIR management of data, SOPs, and models for Sys Bio, highlighting the challenges multi-scale biology presents.
http://www.fair-dom.org
http://www.fairdomhub.org
http://www.seek4science.org

  1. 1. FAIR Data and Model Management for Systems Biology (and SOPs too!) Prof Carole Goble The University of Manchester The Software Sustainability Institute ELIXIR UK, SynBioChem Centre carole.goble@manchester.ac.uk MultiScale Biology Network Springboard meeting, Nottingham, UK, 1 June 2015
  2. 2. • Project-centric data and model management • Respects & expects other systems • Forged in fire of national & international projects • PhDs/postgrads/PIs • Context • FAIRDOM Initiative • Challenges http://www.fair-dom.org http://www.fairdomhub.org
  3. 3. republic of science* regulation of science institutions libraries *Merton’s four norms of scientific behaviour (1942) public archives cloud services
  4. 4. Reproducibility Nature, April 2015
  5. 5. https://sems.uni-rostock.de/reproducible-and-citable-data-and-models/
  6. 6. Publishers • Reproducibility • New publishable assets • New business models and services Funders, Managers • Capitalising • Skills • Justification, Audit & Compliance
  7. 7. UK Funder Data Policies http://www.dcc.ac.uk/resources/policy-and-legal/overview-funders-data-policies
  8. 8. Tools, Standards, Formats, Reporting, Policies, Practices, Initiatives
  9. 9. Data Models SOPs consistency, comparability Samples… ‘omics images, reaction kinetics, samples, specimens… Small: spreadsheets, files… Big: NGS, Mass Spec, specialist repositories… ODE, SBML, Native Matlab, PDE, Fortran, CellML… versioning, provenance tracking, parameter tracking, citation tracking, links to articles STANDARDS Asset Management
  10. 10. public archives cloud services Public-Centric Asset Management
  11. 11. public archives cloud services Public-Centric Asset Management
  12. 12. Challenge: Most quantitative databases provide kinetic constants for enzymes, sometimes binding constants… Little to help building quantitative descriptions, i.e. concentrations, sizes, diffusions… Exceptions: gene expression data, proteomics, metabolomics. Localisation: The average concentration of a protein in a piece of brain is of limited use (mix of tissues and subcellular compartments) [Nicolas Le Novère, 2015] Public-Centric Asset Management public archives
  13. 13. FAIR for the Researcher Collaborative, data/model-driven science Publication Local and Public Resources Skills and Productivity Compliance
  14. 14. Collaboration, asset management Pop-up projects Dynamic groups Internal / external visibility
  15. 15. Pop-up projects Dynamic groups Internal / external visibility Collaboration, asset management
  16. 16. 18
  17. 17. Project-Centric Asset Management Is this data available? What SOP was used for this sample? Where is the validation data for this model? • Retain results beyond a project / the PhD student • Exchange & find assets. • Share, disseminate and publish assets sensitively • Consistent reporting for interpretation, interop & comparison • Promote standardised metadata practices. • Organise and link assets • Reuse results
  18. 18. Find Data, models, protocols, projects, people Catalogued and linked assets Link studies to assets Control sharing, versioning, gateway to scattered public/local archives Access Interoperate Standards (SBML, SED-ML…), vocabs, formats, ids harvesting, export, API Reuse Download assets Run models with experimental data DOI citation
  19. 19. The Neylon Equation
  20. 20. FAIRDOM Provenance 2008 2010 2014 de.NBI 2019
  21. 21. SEEK: Science Commons Web-based cataloguing and rich web interface for describing, finding, linking and promoting ongoing research and outcomes. Small files, aggregates across data archives. openBIS: Scaled local LIMS and analytics Extract, Transform and Load tooling direct from the instrumentation, data analysis pipelines. Automatic archiving. Handles large data. FAIRDOM Suite
  22. 22. Personal Data Local Stores LIMS External Databases Articles Models Standards SOPs Aggregated Commons Infrastructure Über metadata, cataloguing Stores SOPs, Models, data files
  23. 23. NGS Proteomics LIMS iPortal BeeWM
  24. 24. https://doi.org/10.15490/seek.1.investigation.56
  25. 25. [Snoep, 2015] https://doi.org/10.15490/seek.1.investigation.56
  26. 26. Standard Operating Procedures Challenge: Machine processable SOPs
  27. 27. Models simulate and annotate in browser
  28. 28. Metadata standards & templates to link studies and link assets Just Enough Results Model Describes common elements and relationships between things produced and used in experiments. Structured descriptions for consistency and comparison
  29. 29. NuML [Adapted, Le Novère]
  30. 30. FAIRDOM Suite Resource FAIRDOMHub Self-managed, customised local installation. Independent, self-managed private space on shared, hosted installation. Publisher Companion Site FAIRDOMHub.org
  31. 31. FAIRDOM Suite Resource FAIRDOMHub FAIRDOM Initiative Facilities Community Networks Forums Workshops Tools Standards Support Sustainability de.NBI
  32. 32. Sys Bio Developers Foundry, Oct 2014 Heidelberg, Germany EraSysAPP meeting, April 2015, Berlin, Germany
  33. 33. PALs
  34. 34. http://seek.virtual-liver.de/ • Navigation • Single standards at one scale • Multi-type hosting “To integrate the detailed knowledge that we have at the molecular level up to the functional level at tissue/organ/whole body level” Multi-scale? Multi-silos…
  35. 35. Handling/converting data of different levels of detail to make the model run. Representing in the SBML model the DNA bindings at the level of detail that had been measured in the experiments. Whole Cell model by Jonathan Karr (Rostock Summer School, Dagmar Waltemath) Support for aggregating data to find the appropriate level of representation for a given model. Karr JR, Sanghvi JC, Macklin DN, et al. A Whole-Cell Computational Model Predicts Phenotype from Genotype. Cell. 2012;150(2):389-401. doi:10.1016/j.cell.2012.05.044.
  36. 36. Challenge: mismatches • Systems on different scales – incompatible time scales, data may be too sparse or need to be aggregated to work with another module • Different levels of complexity – comparing results from different modelling approaches. • Linking models needs thinking and standards – connecting the single standards – interfacing between the different scales – connecting (experimental/simulation) data to models
  37. 37. Challenge: model evolution BiVeS tool: diff in versions of computational models Provenance, Versioning, Parameter tracking Releasing updated versions into the literature Identifying, Interpreting, and Communicating Changes in XML-encoded Models of Biological Systems Scharm et al. 2015, under revision at Bioinformatics Haus et al, BMC Systems Biology, 2011, 5:10 Solvent production by Clostridium acetobutylicum [Martin Scharm]
  38. 38. F1000Research Living Figures, versioned articles, in-article data manipulation R Lawrence Force2015, Vision Award Runner Up http://f1000.com/posters/browse/summary/1097482 Simply data + code Can change the definition of a figure, and ultimately the journal article Colomb J and Brembs B. Sub-strains of Drosophila Canton-S differ markedly in their locomotor behavior [v1; ref status: indexed, http://f1000r.es/3is] F1000Research 2014, 3:176 Other labs can replicate the study, or contribute their data to a meta-analysis or disease model - figure automatically updates. Data updates time-stamped. New conclusions added via versions.
  39. 39. Challenge: reproducibility bridging from research to FAIR publishing Bergmann, Rodriguez, Le Novère. COMBINE archive specification. <http://identifiers.org/combine.specifications/omex.version-1> (2014) Describe Access Port
  40. 40. Challenge: reproducibility bridging from research to FAIR publishing Deposit Model simulation Differentiated data
  41. 41. Challenge: Samples Descriptions SOP-Centric
  42. 42. Challenge: Releasing
  43. 43. Challenge: Releasing SysMO Projects (2009-2014) me ME my team close colleagues • Self-publication & Journal companionship. • Staged & Selective Hugging & Flirting. Reciprocity. • Tribal & Trading behaviours • Forgetfulness, Embargos • Resources, Benefit • Individuals more likely to share than consortia • Post-hoc rationalised Data/Model Cycles
  44. 44. Challenges: (meta)data wrangling Offsetting curation debt http://rightfield.org.uk
  45. 45. FAIRDOM Challenge: Sustainability Free. Like a Free Puppy.
  46. 46. Enabling multi-scale modelling in systems medicine 1. Exploit existing data for multi-scale modelling 2. Develop SOPs and quality standards for systematic collection of quantitative data and information. 3. Identify required standards and ontologies for models and data repositories in systems medicine. 4. Develop modelling workflows for the integration of data and models; support data management, model construction and analysis. 5. Develop mathematical formalism to analyze and compare multi-scale models (parameter estimation, sensitivity analysis, identifiability analysis and image analysis). Wolkenhauer et al, Enabling multiscale modeling in systems medicine, 2014, Genome Medicine 6(3)
  47. 47. Carole Goble Stuart Owen Finn Bacall Jacky Snoep Wolfgang Mueller Olga Krebs Quyen Nguyen Natalie Stanford Katy Wolstencroft Peter Kunzst Bernd Rinn fairdom@fair-dom.org fair-dom@fair-dom.org http://www.fair-dom.org http://www.fairdomhub.org http://seek4science.org http://www.rightfield.org.uk http://jjj.biochem.sun.ac.za http://sybit.net/software/openBIS Donal Fellows Alan Williams Rostyslav Kuzyakiv Jakub Straszewski Chandrasekhar Ramakrishnan Caterina Barillari Norman Morrison
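
The "model evolution" challenge in the slides (parameter tracking between versions of XML-encoded models) can be illustrated with a very small sketch. This is not the BiVeS algorithm and does not use any FAIRDOM or SEEK API; it is a simplified stand-in, using only the Python standard library, that compares the `parameter` elements of two SBML Level 3 Version 1 documents. The two toy models, the function names `parameter_values` and `diff_parameters`, and the parameter ids are all hypothetical and chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of SBML Level 3 Version 1 core, in ElementTree's {uri}tag form.
SBML_NS = "{http://www.sbml.org/sbml/level3/version1/core}"

# Two hypothetical versions of a toy model (illustration only).
MODEL_V1 = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy">
    <listOfParameters>
      <parameter id="k1" value="0.5" constant="true"/>
      <parameter id="k2" value="2.0" constant="true"/>
    </listOfParameters>
  </model>
</sbml>"""

MODEL_V2 = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy">
    <listOfParameters>
      <parameter id="k1" value="0.7" constant="true"/>
      <parameter id="k3" value="1.0" constant="true"/>
    </listOfParameters>
  </model>
</sbml>"""


def parameter_values(sbml_text):
    """Map each parameter id to its value attribute (kept as a string)."""
    root = ET.fromstring(sbml_text)
    return {p.get("id"): p.get("value") for p in root.iter(SBML_NS + "parameter")}


def diff_parameters(old_text, new_text):
    """Report parameters added, removed, or changed between two versions."""
    old = parameter_values(old_text)
    new = parameter_values(new_text)
    shared = old.keys() & new.keys()
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": {k: (old[k], new[k]) for k in sorted(shared) if old[k] != new[k]},
    }


if __name__ == "__main__":
    print(diff_parameters(MODEL_V1, MODEL_V2))
```

A real model diff also has to account for structural changes (reactions, species, rules, annotations) and for elements that move without changing, which is why a dedicated tool is used in practice; the sketch only shows the parameter-tracking idea.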