Lecture from whole-cell modeling summer school March 3-9, 2015 at the University of Rostock. See http://sites.google.com/site/vwwholecellsummerschool/ for more information.
15. Whole-cell modeling
A grand challenge of the 21st century
– Masaru Tomita
Biology urgently needs a theoretical basis to unify it
– Sydney Brenner
The ultimate test of understanding a simple cell,
more than being able to build one, would be to build
a computer model of the cell
– Clyde Hutchison
32. 2. Choose model scope
• Explicitly represent each metabolite, gene, RNA, and protein species
• Explicitly model the function of every characterized gene product
• Account for the metabolic cost of every uncharacterized gene product
• Represent important, well-characterized molecules individually
45. 5. Identify parameters
• Large parameter space
• Stochastic model
• Large computational cost
• Heterogeneous data
• Little dynamic, single-cell data
46. Model reduction enables parameter identification
1. Reduce model
2. Identify reduced model parameters using traditional methods
3. Manually tune parameters using full model
[Figure: model and experiment time courses per molecule; x-axis: Time]
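As a minimal sketch of step 2, assume the reduced model for a single RNA or protein species is a linear birth-death process with dilution by growth. This reduced form is an illustrative assumption, not necessarily the reduction used in the lecture. Under it, the synthesis rate follows in closed form from the measured mean copy number, half-life, and doubling time. The function name and the numbers in the usage lines are hypothetical.

```python
import math

def synthesis_rate(mean_copies, half_life_s, doubling_time_s):
    """Steady-state synthesis rate (molecules/s) of an assumed reduced model.

    Reduced model (assumption): dx/dt = k_syn - (k_deg + k_dil) * x,
    with k_deg = ln 2 / half-life and k_dil = ln 2 / doubling time.
    Setting dx/dt = 0 at x = mean_copies and solving gives k_syn.
    """
    k_deg = math.log(2) / half_life_s
    k_dil = math.log(2) / doubling_time_s
    return mean_copies * (k_deg + k_dil)

# Hypothetical inputs: 10 copies per cell, 5 min half-life, 9 h doubling time.
k_syn = synthesis_rate(mean_copies=10, half_life_s=300, doubling_time_s=9 * 3600)
print(f"{k_syn:.4f} molecules/s")
```

A value identified this way would only be a starting point; step 3 still tunes it against the full model.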
61. Validate model against experiments and theory
Matches training data
• Cell mass, volume
• Biomass composition
• RNA, protein expression, half-lives
• Superhelicity
Matches published data
• Metabolite concentrations
• DNA-bound protein density
• Gene essentiality
Matches new data
• Wild-type growth rate
• Disruption strain growth rates
Matches theory
• Mass conservation
• Central dogma
• Cell theory
• Evolution
No obvious errors
• Plot model predictions
• Manually inspect data
• Compare to known biology
Software stable
• Simulation code is stable
• Tests passing
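One of the theory checks listed on this slide, mass conservation, is straightforward to automate. The sketch below is a hypothetical helper, not code from the lecture: it sums element counts over a reaction and reports any element that is not balanced. The species formulas are the neutral forms and ignore protonation state, which is a simplifying assumption.

```python
from collections import Counter

def element_imbalance(reaction, formulas):
    """Return net element counts (products minus reactants) for one reaction.

    reaction: {species: stoichiometric coefficient}, negative for reactants.
    formulas: {species: {element: count}}.
    An empty result means every element is conserved.
    """
    net = Counter()
    for species, coeff in reaction.items():
        for element, count in formulas[species].items():
            net[element] += coeff * count
    return {element: n for element, n in net.items() if n != 0}

# Neutral formulas (protonation ignored) for ATP + H2O -> ADP + Pi.
formulas = {
    "ATP": {"C": 10, "H": 16, "N": 5, "O": 13, "P": 3},
    "ADP": {"C": 10, "H": 15, "N": 5, "O": 10, "P": 2},
    "H2O": {"H": 2, "O": 1},
    "Pi": {"H": 3, "O": 4, "P": 1},
}
balanced = {"ATP": -1, "H2O": -1, "ADP": 1, "Pi": 1}
missing_water = {"ATP": -1, "ADP": 1, "Pi": 1}

print(element_imbalance(balanced, formulas))
print(element_imbalance(missing_water, formulas))
```

Running such a check over every reaction in a model is one way to keep "Tests passing" and "Mass conservation" tied together.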
87. Open challenges
• How can we model more complex physiology?
  • Transcriptional regulation
  • Translational regulation
  • Stochastic death, failure modes
  • Higher-order meta-stable states
  • Resource distribution
  • Aging
  • Evolution
  • Populations
• How can we model more complex organisms?
  • Larger bacteria
  • Eukaryotes
  • Multicellularity
  • Humans
• How can we use models to direct engineering?
88. Whole-cell modeling course
1. Teach whole-cell modeling
• Model biological systems
• Construct dynamical models
• Integrate models
2. Improve implementation
• Reusable
• Standard
• Open
3. Improve methodology
90. Recommended reading
• Karr JR et al. (2012). A Whole-Cell Computational Model Predicts Phenotype from Genotype. Cell, 150, 389-401.
• Macklin DN, Ruggero NA, Covert MW (2014). The future of whole-cell modeling. Curr Opin Biotechnol, 28C, 111-115.
• Shuler ML, Foley P, Atlas J (2012). Modeling a minimal cell. Methods Mol Biol, 881, 573-610.
• Joyce AR, Palsson BØ (2007). Toward whole cell modeling and simulation: comprehensive functional genomics through the constraint-based approach. Prog Drug Res, 64, 267-309.
• Tomita M (2001). Whole-cell simulation: a grand challenge of the 21st century. Trends Biotechnol, 19(6), 205-10.
• Surovtsev IV et al. (2009). Mathematical modeling of a minimal protocell with coordinated growth and division. J Theor Biol, 260, 422-9.
91. Recommended reading: FBA
• Thiele I et al. (2009). Genome-scale reconstruction of Escherichia coli's transcriptional and translational machinery: a knowledge base, its mathematical formulation, and its functional characterization. PLoS Comput Biol, 5, e1000312.
• Orth JD, Thiele I, Palsson BØ (2010). What is flux balance analysis? Nat Biotechnol, 28, 245-8.
• Covert MW et al. (2008). Integrated Flux Balance Analysis Model of Escherichia coli. Bioinformatics, 24, 2044-50.
• Covert MW et al. (2004). Integrating high-throughput and computational data elucidates bacterial networks. Nature, 429, 92-6.
Editor's notes
Toward this goal, we have built a gene-complete computational model of a single bacterial cell, which:
Integrates all cellular processes into a single computational model, providing a unified understanding of cellular physiology
Predicts the dynamics of every molecule and process
Can be used to guide experimental design and inform data analysis
In the future, in concert with emerging genome-scale DNA synthesis techniques, could be used to guide the rational engineering of biological systems
Protein expression: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2778844/figure/F4/
A central challenge in cell biology is to understand the molecular basis of cellular behaviors, that is:
to understand how cellular behaviors arise from the interactions of biomolecules,
to understand how cellular processes are controlled and coordinated at the molecular level,
to understand how cells allocate limited molecular resources for growth and maintenance,
and more.
Lack of data
Time scales
Heterogeneity of experimental and computational methods across cell biology subfields
Lack of correlated measurements (cell properties measured independently)
Measurements are bulk (not single cells) (also obscures correlations, causation)
Havugimana et al., 2012; Yan et al., 2010; Sachs et al., 2005; Orth et al., 2010
M. genitalium is a tractable model organism
Modular architecture integrates 28 processes
Model broadly predicts single-cell physiology
Model reproduces previously observed data
Model provides insights into complex phenotypes
http://www.pbs.org/wgbh/nova/sciencenow/dispatches/images/050707-mgenitalium.jpg
Tractable genome
75% annotated
Little overlapping function
Genomic synthesis
Genome-scale datasets
To maximize the tractability of our model, we made a few simplifying assumptions. First, we chose to model Mycoplasma genitalium, the smallest known free-living organism, which is believed to have evolved by massive degenerative evolution from Gram-positive bacteria to a tractably sized genome of 580 kb containing just 525 genes, 75% of which are functionally annotated.
Second, recognizing the modularity of biology and the separation of time scales of biological processes, we built our model by composition. This enables us, at short time scales, to model each cellular process independently, using the most appropriate mathematical representation and experimental data for that process.
- Separation of time scales
- Choose appropriate representations and parameterizations
- Creatively decouple representations
- Computationally reconcile and decouple parameters
A module is:
Independent physiologic function
Independent enzyme complement
Internal time scale faster than time scales of interactions with other modules
Factorization of state space and transfer functions
Module methods:
Time evolution
Interface to core simulation -- references to metabolites, enzymes, RNAs, proteins, etc.
Resource (energy) requirement during simulation
Initial conditions
Fit growth rate
Expected resource requirements (contribution metabolism objective)
Pre-processing, memory allocation
Options, parameters, indices, pre-processed data, predicted time courses
Plotting, printing, saving, loading
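The module methods listed above suggest a common interface that every process submodel implements. The sketch below is a hypothetical Python rendering of that idea, covering only initial conditions, resource requirement, and time evolution. The class and method names, and the toy module itself, are assumptions for illustration and are not taken from the lecture's code.

```python
from abc import ABC, abstractmethod

class ProcessModule(ABC):
    """Hypothetical interface for one cellular-process submodel."""

    @abstractmethod
    def initialize(self, state):
        """Set this module's initial conditions in the shared state."""

    @abstractmethod
    def resource_demand(self, state, dt):
        """Return the resources this module would like for the next step."""

    @abstractmethod
    def evolve(self, state, allocated, dt):
        """Advance by dt using the allocated resources; return what was used."""

class ToyPolymerization(ProcessModule):
    """Toy module: each elongation step consumes one ATP (illustrative only)."""

    def __init__(self, steps_per_second=5):
        self.steps_per_second = steps_per_second

    def initialize(self, state):
        state.setdefault("polymer_length", 0)

    def resource_demand(self, state, dt):
        return {"ATP": int(self.steps_per_second * dt)}

    def evolve(self, state, allocated, dt):
        wanted = int(self.steps_per_second * dt)
        steps = min(allocated.get("ATP", 0), wanted)
        state["polymer_length"] += steps
        return {"ATP": steps}

# Usage: the core simulation grants at most what is available.
state = {"ATP": 3}
module = ToyPolymerization()
module.initialize(state)
demand = module.resource_demand(state, dt=1.0)
granted = {"ATP": min(demand["ATP"], state["ATP"])}
used = module.evolve(state, granted, dt=1.0)
state["ATP"] -= used["ATP"]
print(state)
```

Separating the demand from the evolution step is one possible way to let a core simulation arbitrate shared resources between modules before any of them runs.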
Our model is then executed by first running the 28 individual models of cellular processes at a 1 s time scale, second integrating the inputs and outputs of the individual models, and finally repeating this process tens of thousands of times across the length of the Mycoplasma genitalium cell cycle.
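The execution scheme described here (run each submodel for one time step, integrate the results, repeat) can be sketched as a short loop. This is a simplified illustration under stated assumptions: the two submodels are hypothetical and deterministic, and integration is reduced to summing the changes each submodel returns.

```python
def simulate(submodels, state, n_steps, dt=1.0):
    """Advance a shared state by repeatedly running independent submodels.

    Each submodel reads a snapshot of the state taken at the start of the
    step and returns a dict of changes. The changes are summed and applied
    together, so submodels do not see each other's updates within a step.
    """
    for _ in range(n_steps):
        snapshot = dict(state)
        deltas = [model(snapshot, dt) for model in submodels]
        for delta in deltas:
            for key, change in delta.items():
                state[key] = state.get(key, 0) + change
    return state

# Two toy submodels (hypothetical rates, deterministic for clarity).
def transcription(state, dt):
    return {"rna": 2 * dt}

def rna_decay(state, dt):
    return {"rna": -0.1 * state["rna"] * dt}

final = simulate([transcription, rna_decay], {"rna": 0.0}, n_steps=3)
print(final)
```

Because every submodel sees the same snapshot, the order in which they run within a step does not affect the result in this sketch.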
Our 28 models are each based on extensive curation of the literature and are implemented separately using different mathematical representations. For example, we constructed the replication module by first considering the molecular mechanism of replication, interactions between replication and DNA, the enzymes involved in catalyzing replication, and the metabolic resources required for replication.
This motivated us to build a model of replication which would account for:
The formation of the replication bubble and DnaA complex disassembly at the oriC
DNA unwinding, replication bubble progression, and leading strand polymerization toward the terC
Discontinuous lagging strand primer and DNA polymerization
Okazaki fragment ligation
Interactions between DNA polymerase and other DNA-bound proteins and DNA damage
Next we built data structures which enable us to represent the specific location and size of all of the replication machinery and other DNA-bound proteins and DNA modifications and damages at each point in the cell cycle.
Finally we modeled the initiation, progression, and termination of replication as a set of rules governing the evolution of the state of the DNA, proteins, and metabolites.
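To make the data-structure and rule ideas concrete, the sketch below tracks the position and footprint of DNA-bound proteins and applies a single rule: a polymerase advances toward the terminus but stops in front of any bound protein. It is a hypothetical simplification. Coordinates are linear rather than circular, only one strand is tracked, and obstacles always block, none of which is claimed to match the lecture's model.

```python
from dataclasses import dataclass

@dataclass
class BoundProtein:
    name: str
    start: int      # first base covered
    footprint: int  # number of bases covered

    @property
    def end(self):
        return self.start + self.footprint  # exclusive end coordinate

def advance(polymerase, others, max_step, terminus):
    """Move the polymerase toward the terminus, stopping before any obstacle.

    Returns the number of bases actually traversed in this step.
    """
    target = min(polymerase.end + max_step, terminus)
    for protein in others:
        if polymerase.end <= protein.start < target:
            target = min(target, protein.start)
    moved = target - polymerase.end
    polymerase.start += moved
    return moved

# Hypothetical layout: a polymerase at the origin and one bound obstacle.
pol = BoundProtein("DNA polymerase", start=0, footprint=10)
obstacle = BoundProtein("hypothetical DNA-bound protein", start=50, footprint=20)

first = advance(pol, [obstacle], max_step=100, terminus=1000)
second = advance(pol, [obstacle], max_step=100, terminus=1000)
print(first, second, pol.start)
```

A fuller rule set would also decide when an obstacle is displaced and would debit the nucleotides consumed per base, which is where the metabolic cost enters.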
50 million ATP / cell
80 amol ATP / cell
60 mmol ATP / gDCW
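The first two figures are the same quantity in different units, and the conversion is a one-line calculation with Avogadro's number. The helper name below is hypothetical. The result is close to, but not exactly, the rounded 80 amol quoted in the notes.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

def molecules_to_amol(n_molecules):
    """Convert a molecule count to attomoles (1 amol = 1e-18 mol)."""
    return n_molecules / AVOGADRO * 1e18

atp_amol = molecules_to_amol(50e6)
print(f"{atp_amol:.1f} amol ATP per cell")
```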
3151 simulations (192 wt, 2959 deletions)