Model management tools for improved reproducibility in systems biology
1. Model management tools for improved reproducibility in systems biology
Dagmar Waltemath,
on behalf of the SEMS team
University of Rostock, Germany
10th International CellML Workshop
Auckland, June 2016
2. On models and simulations
Model Simulation
Figs: BioModels (top) and DOI: 10.1073/pnas.88.16.7328 (bottom)
3. Most scientific discoveries rely on previous findings.
Model
Fig.: Tyson 2001 (BIOM195)
Fig.: Tyson 1991 (BIOM005)
Successor
Fig.: History of Cell Cycle models in BioModels
4. Can we rely on findings that we ourselves cannot evaluate? (Probably not!)
“only in ~20–25% of the projects were the relevant published data completely
in line with our in-house findings (Fig. 1c). In almost two-thirds of the projects,
there were inconsistencies [..] that either considerably prolonged the duration
of the target validation process or, in most cases, resulted in termination of the
projects because the evidence [..] was insufficient to justify further investments
into these projects.” Prinz et al (2011)
5. We identified key challenges of reproducibility in systems biology and systems medicine.
– Lack of data standards
– Lack of data quality and quantity
– Lack of data availability
– Lack of transparency
6. A lack of data availability makes it impossible for researchers to reproduce results.
● Model code in BioModels, including
supplementary material with a how-to for
reproducing the figures given in the original paper
● Online tool makes data available
and browseable
TriplexRNA
Recon 2
● Publication backed up with a website
containing the supplemental material
● Model code in (non-curated) BioModels
● Visualisation of the model can easily
be explored
● References to original works
How can we support scientists
who wish to share model-based results?
Issues
– Simulation studies comprise
several files
– Data is heterogeneous,
distributed, complex
– Data changes over time
– Documentation of how
the study was performed
is often missing
7. A lack of data availability makes it impossible for researchers to reproduce results.
Our solutions
– Tool support for the
COMBINE Archive –
lowering the effort to share
reproducible models
– Graph-based storage of
model-related files –
integrated & searchable
virtual experiments
– Model version control –
towards a provenance of
models
8. The COMBINE archive bundles all files necessary to reproduce a simulation study.
COMBINE archive toolkit
● manage COMBINE archives
– Explore
– Edit
– Share
– Publish
● Used in: PMR2, JWS Online,
SED-ML Web Tools, OpenCOR …
WebCAT, Scharm et al 2014
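To make the bundling idea concrete, here is a minimal sketch of writing a COMBINE archive by hand, assuming only the Python standard library. A COMBINE archive is a ZIP file whose manifest.xml lists every entry together with a format identifier. The file names, placeholder file contents, and the helper names build_manifest and write_archive are illustrative assumptions; this is not the CombineArchive Toolkit's API.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the OMEX manifest; also used as the manifest's own format identifier.
OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"
SPEC = "http://identifiers.org/combine.specifications/"


def build_manifest(entries):
    """Return manifest.xml as bytes, listing the archive, the manifest and all entries."""
    ET.register_namespace("", OMEX_NS)
    root = ET.Element(f"{{{OMEX_NS}}}omexManifest")
    listed = [(".", SPEC + "omex"), ("./manifest.xml", OMEX_NS)]
    listed += [(f"./{name}", fmt) for name, fmt, _ in entries]
    for location, fmt in listed:
        ET.SubElement(root, f"{{{OMEX_NS}}}content",
                      {"location": location, "format": fmt})
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def write_archive(target, entries):
    """Write the manifest plus all (name, format, data) entries into one ZIP file."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.xml", build_manifest(entries))
        for name, _, data in entries:
            zf.writestr(name, data)


# Hypothetical study: one model file and one simulation description (placeholder content).
entries = [
    ("model.xml", SPEC + "sbml", b"<sbml/>"),
    ("simulation.sedml", SPEC + "sed-ml", b"<sedML/>"),
]
buffer = io.BytesIO()
write_archive(buffer, entries)
archive_names = sorted(zipfile.ZipFile(buffer).namelist())
```

Writing to an in-memory buffer keeps the sketch free of side effects; passing a path ending in .omex to write_archive would produce a file on disk instead.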
9. STON, SED-ML DB & MASYMOS
Integrated storage & retrieval
system (MASYMOS)
doi: 10.1093/database/bau130
doi: 10.1186/s13326-015-0014-4
Search across heterogeneous data,
ontologies, and structures (see poster)
Tailor-made storage systems
(STON, SED-ML DB)
Using graph databases to integrate
standardised model-based data
https://dx.doi.org/10.6084/m9.figshare.3382993.v1
SED-ML DB in JWS Online
BioModels, Physiome Model Repository
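The idea of linking models, simulation descriptions, and ontology terms in one graph can be sketched with a small in-memory structure. The following is only an illustration under stated assumptions: the class TinyGraph, the node kinds, the relation names SIMULATES and ANNOTATED_WITH, and the example data are all hypothetical, and the sketch does not reproduce the MASYMOS schema or use a graph database.

```python
from collections import defaultdict


class TinyGraph:
    """Minimal property graph: nodes carry a kind and properties, edges carry a relation."""

    def __init__(self):
        self.nodes = {}
        self.edges = defaultdict(list)  # source id -> [(relation, target id)]

    def add_node(self, node_id, kind, **props):
        self.nodes[node_id] = {"kind": kind, **props}

    def add_edge(self, source, relation, target):
        self.edges[source].append((relation, target))

    def neighbours(self, source, relation):
        return [t for r, t in self.edges[source] if r == relation]


def simulations_for_term(graph, term_label):
    """Find simulation nodes whose model is annotated with the given ontology term."""
    hits = []
    for node_id, node in graph.nodes.items():
        if node["kind"] != "Simulation":
            continue
        for model_id in graph.neighbours(node_id, "SIMULATES"):
            labels = [graph.nodes[t]["label"]
                      for t in graph.neighbours(model_id, "ANNOTATED_WITH")]
            if term_label in labels:
                hits.append(node_id)
    return hits


# Hypothetical content: one model, one simulation description, one ontology term.
graph = TinyGraph()
graph.add_node("model:1", "Model", name="cell cycle model")
graph.add_node("sim:1", "Simulation", name="time course")
graph.add_node("term:1", "OntologyTerm", label="cell cycle")
graph.add_edge("sim:1", "SIMULATES", "model:1")
graph.add_edge("model:1", "ANNOTATED_WITH", "term:1")

result = simulations_for_term(graph, "cell cycle")
```

The query walks from simulation descriptions through models to annotations, which is the kind of cross-format traversal that is awkward when each file type lives in a separate store.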
10. BiVeS & COMODI
Model version control
(BiVeS, COMODI)
Provenance-to-be (COMODI)
Tracking the evolution of a
CellML/SBML model over time
doi: 10.1093/bioinformatics/btv484
Tracking the evolution of simulation
studies and biological systems.
https://dx.doi.org/10.6084/m9.figshare.2543059.v5
Physiome Model repository
doi: 10.1093/bioinformatics/btv484
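What tracking a model's evolution means at the simplest level can be shown by comparing the species declared in two versions of an SBML-like document. This is a deliberately naive sketch based on set comparison of identifiers; it is not the BiVeS algorithm, and the function names, the two toy model versions, and the species identifiers are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def species_ids(xml_text):
    """Collect the id of every <species> element, ignoring XML namespaces."""
    root = ET.fromstring(xml_text)
    return {el.get("id") for el in root.iter()
            if el.tag.rsplit("}", 1)[-1] == "species"}


def diff_versions(old_xml, new_xml):
    """Report which species ids were deleted, inserted, or kept between versions."""
    old, new = species_ids(old_xml), species_ids(new_xml)
    return {
        "deleted": sorted(old - new),
        "inserted": sorted(new - old),
        "kept": sorted(old & new),
    }


# Two hypothetical versions of the same model; the second adds one species.
V1 = """<sbml><model><listOfSpecies>
<species id="cyclin"/><species id="cdc2"/>
</listOfSpecies></model></sbml>"""
V2 = """<sbml><model><listOfSpecies>
<species id="cyclin"/><species id="cdc2"/><species id="wee1"/>
</listOfSpecies></model></sbml>"""

changes = diff_versions(V1, V2)
```

A comparison by identifier alone misses renamed or moved elements and changes to reactions, parameters, and mathematics, which is where a dedicated, structure-aware difference detection becomes necessary.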
11. What's next? Models for the clinic, or: Bridging the gap between standards for systems biology & systems medicine
Fig. courtesy Atalag et al (2015) http://hdl.handle.net/2292/27911
12. Thank you for your attention.
@SemsProject
Martin Scharm: BiVeS, COMODI, COMBINE Archive, Video master
Fabienne Lambusch: Pattern & structure search in SBML models
Mariam Nassar: Rank aggregation
Tom Gebhardt: SBGN-compliant diffs
Martin Peters: M2CAT, COMBINE Archive, SED-ML database
Vasundra Toure: STON, SBGN-ED, SBGN symbol of the month
Ron Henkel: MASYMOS, MORRE
www.sems.uni-rostock.de
13. References
Atalag et al. (2015) http://hdl.handle.net/2292/27911
Bergmann et al. (2014) F.T. Bergmann, R. Adams, S. Moodie, J. Cooper, M. Glont et al.: COMBINE archive and OMEX format: one file to share all information to reproduce a modeling project. BMC Bioinformatics (2014).
Prinz et al. (2011) Prinz, Florian, Thomas Schlange, and Khusru Asadullah. "Believe it or not: how much can we rely on published data on potential drug targets?" Nature Reviews Drug Discovery 10.9 (2011): 712.
Schmitz et al. (2014) Schmitz, Ulf, et al. "Cooperative gene regulation by microRNA pairs and their identification using a computational workflow." Nucleic Acids Research (2014): gku465.
Thiele et al. (2013) Thiele, Ines, et al. "A community-driven global reconstruction of human metabolism." Nature Biotechnology 31.5 (2013): 419-425.
Waltemath & Scharm (2014) D. Waltemath and M. Scharm: Extracting reproducible simulation studies from model repositories using the CombineArchive Toolkit. Workshop on Data Management for the Life Sciences (2014), Hamburg, BTW 2014.
Waltemath & Wolkenhauer (2016) D. Waltemath and O. Wolkenhauer: How modeling standards, software, and initiatives support reproducibility in systems biology and systems medicine. IEEE Transactions on Biomedical Engineering (2016), in press.