1. Data and model management in Systems Biology
Dagmar Waltemath
University of Rostock, Germany
Kinetics on the move – Happy 10th anniversary to SABIO-RK!
Heidelberg, 31st May, 2016
http://www.slideshare.net/dagwa/data-and-model-management-in-systems-biology
2.
Junior research group: Management of simulation studies in systems biology
Tool development: SBGN-ED for the graphical representation of networks
Infrastructure: Data management for systems biology in Germany
Standards and tools for model management
www.sems.uni-rostock.de
3. © 2009 UNIVERSITÄT ROSTOCK 3
NBI-SysBio: Data management for systems biology in Germany
● Sustainable infrastructure for data management
● Access to documented and reproducible results
● Systems Biology Standards
● Tool Development
● Education
www.denbi.de (training – services – jobs)
4.
Photo: NY - http://nyphotographic.com (CC BY-SA 3.0)
Photo: janneke staaks on flickr
Fig. courtesy doi:10.1371/journal.pbio.1001779
5. Data management is …
● Data management describes procedures and actions that help to store, preserve, organize and control the data generated during a (research) project.
● Aspects of data management include:
– Data Ownership;
– Metadata Compilation;
– Data Lifecycle Control;
– Data Quality;
– Data Access and Dissemination
Photo: NY - http://nyphotographic.com (CC BY-SA 3.0)
6. 5/31/16 © 2009 UNIVERSITÄT ROSTOCK 6
Metadata
● Data about data
● Improved understanding of encoded data items
● Descriptive details
● Discovery and search for existing data, online browsing of data
● Standardized and structured information
– Purpose, origin, time references, geographic location, creator, access conditions, and terms of use of your data collection
● Often encoded in ontologies
https://www.libraries.psu.edu/psul/pubcur/what_is_dm.html#data-management
7. Ontologies
● Well-structured, controlled vocabularies
● Capture and convey commonly agreed definitions and concepts in a domain
● Communication across people and software tools
● Enable reuse of domain knowledge
● Make implicit domain knowledge explicit and queryable
● Bio-ontologies
– Gene Ontology, ChEBI, UniProt
– Systems Biology Ontology (concepts and terminology for modeling)
8. Example: Definition of "cell growth" in the Gene Ontology
id: GO:0016049
name: cell growth
namespace: biological_process
def: "The process in which a cell irreversibly increases in size over time by accretion and biosynthetic production of matter similar to that already present."
synonym: "cell expansion" RELATED []
synonym: "cellular growth" EXACT []
synonym: "growth of cell" EXACT []
is_a: GO:0009987 ! cellular process
is_a: GO:0040007 ! growth
relationship: part_of GO:0008361 ! regulation of cell size
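The tag-value layout of the Gene Ontology entry above can be read with a few lines of code. The following is a minimal sketch, not a full OBO parser: the function name `parse_obo_stanza` is hypothetical, the `def:` line is left out for brevity, and the code assumes one `tag: value` pair per line with optional trailing `! name` comments.

```python
def parse_obo_stanza(text):
    """Parse one OBO-style stanza into a dict mapping tag -> list of values."""
    term = {}
    for raw in text.strip().splitlines():
        line = raw.strip()
        if not line or line.startswith("["):
            continue  # skip blank lines and stanza headers such as [Term]
        tag, _, value = line.partition(": ")
        # Drop the trailing "! human-readable name" comment, if present.
        value = value.split(" ! ")[0].strip()
        term.setdefault(tag, []).append(value)
    return term


STANZA = '''
id: GO:0016049
name: cell growth
namespace: biological_process
synonym: "cell expansion" RELATED []
synonym: "cellular growth" EXACT []
synonym: "growth of cell" EXACT []
is_a: GO:0009987 ! cellular process
is_a: GO:0040007 ! growth
relationship: part_of GO:0008361 ! regulation of cell size
'''

term = parse_obo_stanza(STANZA)
parents = term["is_a"]
# Keep only the quoted text of synonyms whose scope is EXACT.
exact = [s.split('"')[1] for s in term["synonym"] if s.endswith("EXACT []")]
print(term["id"][0], parents, exact)
```

Keeping every tag as a list reflects that tags such as `synonym` and `is_a` may repeat within one term.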
9. Advantages of careful & planned data management
● Increased confidence and trust in the data
● Better understanding of how to use the data, and of the data itself
● Better data quality
● Coherent data when standards are used
● Improved business processes (saving time, guaranteeing high quality)
● Improved access to data and improved reproducibility
● Better exploitation of data through easier data exchange and integration
10. Advantages of standardised data
● Reusable
● Exchangeable
● Interoperable
● Long-term available (in open repositories)
● Curateable
● Shareable
11.
Photo: janneke staaks on flickr
12. Research data in the modeling life cycle
– Models: equations, parameters, data tables
– Ideas: text, drawings
– Experimental results: text, data tables
– Publications: text, figures
– Analyses: configuration files, data tables
Fig. courtesy Martin Scharm (adapted)
13. Research data in the modeling life cycle
● Mathematical formulae
● Networks, diagrams
● Image data
● Publications
● Experiment descriptions
● Experimental results (both lab and simulation)
● Definitions of things (e.g., gene functions, chemical structures...)
Figures top to bottom: (1) By Noah A. Rosenberg et al., slightly modified by User:Wobble. Public Library of Science, CC BY 3.0, https://commons.wikimedia.org/w/index.php?curid=2839383; (2) By http://rsb.info.nih.gov/ij/images/, Public Domain, https://commons.wikimedia.org/w/index.php?curid=655748; (3) BIOM005, generated using CellDesigner 4; (4,5) PMID:18669651
14. Research data in the modeling life cycle
● Heterogeneous
● Highly connected
● Context-dependent
● Distributed
● Big
15. The model
● Mathematical equations
● Biological entities
● Kinetic information
● Encoding: SBML & semantic annotations
<bqmodel:isDescribedBy>
  <rdf:Bag>
    <rdf:li rdf:resource="http://identifiers.org/pubmed/18669651"/>
  </rdf:Bag>
</bqmodel:isDescribedBy>
<parameter id="parameter_49" name="L" metaid="metaid_0000078" value="20670"/>
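To illustrate how such annotations become machine-readable, here is a minimal sketch that reads the fragment above with Python's standard library. The `<wrapper>` element is an assumption added only so that the fragment parses on its own; in a real SBML file the namespace declarations sit on enclosing elements. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for RDF and the BioModels.net model qualifiers.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "bqmodel": "http://biomodels.net/model-qualifiers/",
}

# The slide's fragment, wrapped in a hypothetical root element that
# declares the two namespace prefixes.
FRAGMENT = """
<wrapper xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bqmodel="http://biomodels.net/model-qualifiers/">
  <bqmodel:isDescribedBy>
    <rdf:Bag>
      <rdf:li rdf:resource="http://identifiers.org/pubmed/18669651"/>
    </rdf:Bag>
  </bqmodel:isDescribedBy>
  <parameter id="parameter_49" name="L" metaid="metaid_0000078" value="20670"/>
</wrapper>
""".strip()


def described_by(root):
    """Return the resource URIs listed under bqmodel:isDescribedBy."""
    key = "{%s}resource" % NS["rdf"]
    path = "bqmodel:isDescribedBy/rdf:Bag/rdf:li"
    return [li.get(key) for li in root.findall(path, NS)]


def parameters(root):
    """Map parameter id -> (name, numeric value)."""
    return {p.get("id"): (p.get("name"), float(p.get("value")))
            for p in root.findall("parameter")}


root = ET.fromstring(FRAGMENT)
refs = described_by(root)
params = parameters(root)
print(refs, params)
```

The identifiers.org URI points at the publication describing the model, so a tool can follow it without any model-specific knowledge.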
16. SBML – Standard for model encoding
● Systems Biology Markup Language
● Community-driven de facto standard
● Free & open source: www.sbml.org
● Supported by many organizations and tools
● Encodes computational models of biological processes (compartments, species, reactions, parameters)
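The four building blocks named above can be seen in a toy document. This is an illustrative skeleton only: the model `toy` and all ids in it are made up, the document has not been validated against the SBML specification and omits attributes a validator would require, and the helper `component_ids` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace of SBML Level 3 Version 1 core.
SBML_NS = "http://www.sbml.org/sbml/level3/version1/core"

# Hypothetical, unvalidated skeleton showing the four component lists.
TOY_MODEL = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="toy">
    <listOfCompartments>
      <compartment id="cell"/>
    </listOfCompartments>
    <listOfSpecies>
      <species id="A" compartment="cell"/>
      <species id="B" compartment="cell"/>
    </listOfSpecies>
    <listOfParameters>
      <parameter id="k1" value="0.1"/>
    </listOfParameters>
    <listOfReactions>
      <reaction id="A_to_B"/>
    </listOfReactions>
  </model>
</sbml>"""


def component_ids(sbml_text):
    """Collect the ids of compartments, species, reactions and parameters."""
    root = ET.fromstring(sbml_text)
    ns = {"s": SBML_NS}
    kinds = {
        "compartments": "compartment",
        "species": "species",
        "reactions": "reaction",
        "parameters": "parameter",
    }
    return {plural: [el.get("id") for el in root.findall(".//s:%s" % tag, ns)]
            for plural, tag in kinds.items()}


ids = component_ids(TOY_MODEL)
print(ids)
```

For real models, a dedicated SBML library is the safer choice, since it also checks consistency rules that plain XML parsing ignores.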
17. SBGN – Standard for visual representation
● Systems Biology Graphical Notation
● Standardised glyphs for biological entities
● Three levels
– SBGN-AF | SBGN-ER | SBGN-PD
● Free & open source: www.sbgn.org
● Tool support
● Interpretable format: SBGN-ML
Fig.: http://sbgn.org
18. SBGN – Standard for visual representation
Fig.: SBGN map for BIOM183, CellDesigner
Fig.: SBGN map for BIOM005, CellDesigner
19. The analysis
● Reproduce behaviour of the model
● Publish and share virtual experiments
– Simulation setup / conditions
– Pre- and post-processing
– Observations
● Encoding: SED-ML & [logo] & result data in Excel, CSV files
<listOfSimulations>
  <uniformTimeCourse id="sim1" initialTime="0" outputStartTime="0"
                     outputEndTime="100" numberOfPoints="100">
    <algorithm kisaoID="KISAO:0000019"/>
  </uniformTimeCourse>
</listOfSimulations>
Fig. M. Stefan et al, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2596252/
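The simulation setup above is compact enough to interpret by hand. The sketch below reads the `uniformTimeCourse` element and derives the output time grid. It assumes that `numberOfPoints` counts the intervals after the start point, which is how I understand SED-ML Level 1 to define it, so 100 points yield 101 time values; please check the specification before relying on this. The function name `read_time_courses` is hypothetical.

```python
import xml.etree.ElementTree as ET

SEDML_FRAGMENT = """<listOfSimulations>
  <uniformTimeCourse id="sim1" initialTime="0" outputStartTime="0"
                     outputEndTime="100" numberOfPoints="100">
    <algorithm kisaoID="KISAO:0000019"/>
  </uniformTimeCourse>
</listOfSimulations>"""


def read_time_courses(xml_text):
    """Return one dict per uniformTimeCourse with id, algorithm and time grid."""
    root = ET.fromstring(xml_text)
    setups = []
    for sim in root.findall("uniformTimeCourse"):
        start = float(sim.get("outputStartTime"))
        end = float(sim.get("outputEndTime"))
        n = int(sim.get("numberOfPoints"))
        step = (end - start) / n
        setups.append({
            "id": sim.get("id"),
            # KiSAO term naming the simulation algorithm to use.
            "algorithm": sim.find("algorithm").get("kisaoID"),
            # Assumption: n intervals, hence n + 1 output time points.
            "times": [start + i * step for i in range(n + 1)],
        })
    return setups


setups = read_time_courses(SEDML_FRAGMENT)
print(setups[0]["id"], setups[0]["algorithm"], len(setups[0]["times"]))
```

Because the algorithm is referenced by an ontology term rather than a tool-specific name, different simulators can map it to their own solver.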
20. SED-ML – Standard for model analysis
● Links to models used in an analysis
● Pre- and post-processing of models
● Type of simulation
● Definition of output
● Free and open source: www.sed-ml.org
● Tool support
→ Showcase your tool support online ←
21. SED-ML – Standard for model analysis
Fig. M. Stefan et al, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2596252/
Simulation of BIOM183 in SED-ML Web Tools without simulation description
22.
Coordinate annual meetings
- Next HARMONY: Auckland, June 7-11, 2016
- Next COMBINE: Newcastle, Sep 19-23, 2016
Coordinate standards development
- Common procedures
- Interoperable software tools
- Discussion forums, mailing lists...
Represent community
- Funders
- Other communities
Provide standards resources
- Single entry point
- Resolvable URI
- Web infrastructure
Figure labels: Simulation, Guidelines, Ontologies
23. Standard-compliant software tools for modeling
The path2models project integrated data from different databases into more than 140,000 SBML models.
Fig.: Büchel et al., BMC Sys Biol (2013), http://www.ebi.ac.uk/biomodels-main/path2models
24. Standard-compliant software tools for modeling
The Systems Biology Workbench is a software framework to help heterogeneous application components communicate with each other.
Modeling
Editing
Simulating
Analysing
http://sbw.sourceforge.net
25.
The decision whether and how to share data often rests with researchers.
Roche DG, Lanfear R, Binning SA, Haff TM, Schwanz LE, et al. (2014) Troubleshooting Public Data Archiving: Suggestions to Increase Participation. PLoS Biol 12(1): e1001779. doi:10.1371/journal.pbio.1001779
26. COMBINE Archive
● Bundling files
● Shipping results
● Exchanging data
● Keeping provenance
● Encoding: zip-like file with a manifest (meta-data)
● Generate, modify & share through WebCAT
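The "zip-like file with a manifest" idea can be sketched with Python's standard library. This is an illustration under assumptions, not a reference implementation: the manifest layout and the format URIs follow the OMEX manifest conventions as I recall them and should be checked against the COMBINE Archive specification, the function `build_archive` is hypothetical, and the file contents are placeholders rather than valid models.

```python
import io
import zipfile
from xml.sax.saxutils import quoteattr

# Assumed namespace and format-URI prefix of the OMEX manifest.
OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"
SPEC = "http://identifiers.org/combine.specifications/"


def build_manifest(files):
    """files maps archive path -> (format key, content bytes)."""
    entries = [(".", SPEC + "omex"), ("./manifest.xml", SPEC + "omex-manifest")]
    entries += [("./" + path, SPEC + fmt) for path, (fmt, _) in files.items()]
    lines = ['<?xml version="1.0" encoding="UTF-8"?>',
             '<omexManifest xmlns="%s">' % OMEX_NS]
    for location, fmt in entries:
        lines.append("  <content location=%s format=%s/>"
                     % (quoteattr(location), quoteattr(fmt)))
    lines.append("</omexManifest>")
    return "\n".join(lines)


def build_archive(files):
    """Bundle the files and their manifest into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.xml", build_manifest(files))
        for path, (_, content) in files.items():
            archive.writestr(path, content)
    return buffer.getvalue()


# Placeholder contents, not valid SBML or SED-ML documents.
FILES = {
    "model.xml": ("sbml", b"<sbml/>"),
    "simulation.sedml": ("sed-ml", b"<sedML/>"),
}
manifest = build_manifest(FILES)
data = build_archive(FILES)
names = zipfile.ZipFile(io.BytesIO(data)).namelist()
print(names)
```

The manifest is what lets a receiving tool decide which file is the model and which is the simulation description without guessing from file names.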
27. COMBINE Archive
– Original publication
– SBGN map
– SBML model versions
– SED-ML files
– Open in WebCAT
– Open in SEEK
28. Model curation & publication
29. Model curation & publication
30. Model curation, simulation & publication
31. Introduction to SEEK & FAIRDOM by Olga Krebs.
32. Thank you for your attention.
http://www.denbi.de/ @SemsProject
http://co.mbine.org