The document proposes a new Markov Chain Monte Carlo (MCMC) based method to generate randomized metabolic networks that impose biochemical and functional constraints. The method successively constrains the networks by (1) fixing the number of reactions, (2) fixing the number of metabolites, (3) excluding blocked reactions, and (4) requiring growth on a specified environment. Imposing these constraints causes the randomized networks to more closely match properties of real metabolic networks like E. coli. The approach generates an ensemble of diverse yet meaningful randomized networks to help identify design principles in metabolic networks.
1. Laboratoire de Physique Théorique et Modèles Statistiques (LPTMS), CNRS and Univ Paris-Sud, Orsay, France and Max Planck Institute for Mathematics in the Sciences, Leipzig, Germany Areejit Samal Randomizing genome-scale metabolic networks
2. Architectural features of real-world networks Structure of real-world networks is very different from random graphs Small Average Path Length & High Local Clustering Watts and Strogatz (1998) Small-World Ravasz et al (2002) Power law degree distribution & Hierarchical modularity Barabasi and Albert (1999) Scale-free and Hierarchical Organization Scale-free graphs can be generated using a simple model incorporating: Growth & Preferential attachment scheme
3. Network motifs and evolved design Shen-Orr et al (2002); Milo et al (2002,2004) Sub-graphs that are over-represented in real networks compared to randomized networks with same degree sequence. Certain network motifs were shown to perform important information processing tasks. For example, the coherent Feed Forward Loop (FFL) can be used to detect persistent signals. The preponderance of certain sub-graphs in a real network may reflect evolutionary design principles favored by selection.
4.
5. Edge-exchange algorithm: widely used null model Randomized networks preserve the degree at each node and the degree sequence. Investigated Network After 1 exchange After 2 exchanges Reference: Shen-Orr et al (2002); Milo et al (2002,2004); Maslov & Sneppen (2002) …
6.
7. Edge-exchange algorithm: case of bipartite metabolic networks ASPT: asp-L -> fum + nh4 CITL: cit -> oaa + ac ASPT*: asp-L -> ac + nh4 CITL*: cit -> oaa + fum Preserves degree of each node in the network But generates fictitious reactions that violate mass , charge and atomic balance satisfied by real chemical reactions!! Note that fum has 4 carbon atoms while ac has 2 carbon atoms in the example shown. Example: Guimera & Amaral (2005); Samal et al (2006) Biochemically meaningless randomization inappropriate for metabolic networks ASPT CITL fum asp-L cit ac oaa nh4 ASPT* CITL* fum asp-L cit ac oaa nh4
8. Null models for metabolic networks: Blind-watchmaker network Blind to basic constraints satisfied by biochemical reactions!!
9. Motivation of this work We propose a new Markov Chain Monte Carlo (MCMC) based method to generate meaningful randomized ensembles for metabolic networks by successively imposing biochemical and functional constraints. Ensemble R Given number of valid biochemical reactions Ensemble Rm Additionally, fix the No. of metabolites Ensemble uRm Additionally, exclude blocked reactions Ensemble uRm-V1 Additionally, functional constraint of growth on a specified environments Increasing level of constraints Constraint I Constraint II Constraint III Constraint IV
10.
11. Global Reaction Set + Biomass composition The E. coli biomass composition of iJR904 was used to determine viability of genotypes. Nutrient metabolites The set of external metabolites in iJR904 were allowed for uptake/excretion in the global reaction set. KEGG Database + E. coli iJR904 We can implement Flux Balance Analysis (FBA) within this database. Reference: Rodrigues and Wagner (2009)
12.
13. g6p glc-D f6p fdp glc-D g6p f6p fdp HEX1: atp + glc-D -> adp + g6p + h PGI: g6p ↔ f6p PFK: atp + f6p -> adp + fdp + h Currency Metabolite Other Metabolite Reversible Reaction Irreversible Reaction (a) Bipartite Representation (b) Unipartite Representation Degree Distribution Clustering coefficient Average Path Length Largest Strong Component Different Graph-theoretic Representations of the Metabolic Network HEX1 PGI atp adp h PFK
14. Ensemble R : Random networks within KEGG with same number of reactions as E. coli The E. coli metabolic network or any random network of reactions within KEGG can be represented as a bit string of length N. Present - 1 Absent - 0 N = 5870 reactions within KEGG Bit string representation of metabolic networks Ensemble R of random networks To compare with E. coli , we generate an ensemble R of random networks: Pick random subsets with exactly r (~900) reactions within KEGG database of N (~6000) reactions. r is the no. of reactions in E. coli . Huge no. of networks possible within KEGG with exactly r reactions ~ Fixing the no. of reactions is equivalent to fixing genome size. Contains r reactions (1’s in the bit string) KEGG with N reactions Random networks with r reactions KEGG Database E. coli R 1 : 3pg + nad -> 3php + h + nadh R 2 : 6pgc + nadp -> co2 + nadph + ru5p-D R 3 : 6pgc -> 2ddg6p + h2o . . . R N : --------------------------------- 1 0 1 . . . .
15. Metabolite degree distribution Complete KEGG: = 2.31 E. coli : = 2.17; Most organisms: is 2.1-2.3 Reference: Jeong et al (2000); Wagner and Fell (2001) Random networks in ensemble R : = 2.51 Any (large enough) random subset of reactions in KEGG has Power Law degree distribution !! A degree of a metabolite is the number of reactions in which the metabolite participates in the network.
16.
17. Network Properties: Given number of valid biochemical reactions No. of metabolites in E. coli < No. of metabolites in any random network
18. Constraint II: Fix the number of metabolites Direct sampling of randomized networks with same no. of reactions and metabolites as in E. coli is not possible. Simple MCMC algorithm to sample networks within KEGG which have same no. of reactions and metabolites as that in E. coli . Accept/Reject Criterion of reaction swap: Accept if the no. of metabolites in the new network is less than or equal to E. coli . Erase memory of initial network Saving Frequency Ensemble Rm KEGG Database R 1 : 3pg + nad -> 3php + h + nadh R 2 : 6pgc + nadp -> co2 + nadph + ru5p-D R 3 : 6pgc -> 2ddg6p + h2o . . . R N : --------------------------------- 1 0 1 0 1 . . N = 5870 reactions within KEGG reaction universe 1 1 0 0 1 . . 1 0 1 1 0 . . 1 1 1 0 0 . . 0 0 1 0 1 . . E. coli Accept Reject Step 1 Swap Swap Reject 10 6 Swaps Step 2 10 3 Swaps store store Reaction swap keeps the number of reactions in network constant
19. Network Properties: After fixing number of metabolites Randomized network properties become closer to E. coli
20.
21. MCMC Sampling: Exclusion of blocked reactions Restrict the reaction swaps to the set of unblocked reactions within KEGG. Accept/Reject Criterion of reaction swap: Accept if the no. of metabolites in the new network is less than or equal to E. coli . Erase memory of initial network Saving Frequency Ensemble uRm KEGG Database R 1 : 3pg + nad -> 3php + h + nadh R 2 : 6pgc + nadp -> co2 + nadph + ru5p-D R 3 : 6pgc -> 2ddg6p + h2o . . . R N : --------------------------------- 1 0 1 0 1 . . N = 5870 reactions within KEGG reaction universe 1 1 0 0 1 . . 1 0 1 1 0 . . 1 1 1 0 0 . . 0 0 1 0 1 . . E. coli Accept Reject Step 1 Swap Swap Reject 10 6 Swaps Step 2 10 3 Swaps store store Reaction swap keeps the number of reactions in network constant
22. Network Properties: After excluding blocked reactions Randomized network properties become still closer to E. coli
23. Constraint IV: Growth Phenotype 0 1 1 . . . . 0 0 1 . . . . G A G B Flux Balance Analysis (FBA) Viable genotype Unviable genotype Bit string and equivalent network representation of genotypes Genotypes Ability to synthesize Viability biomass components Growth No Growth Definition of Phenotype For FBA Kauffman et al (2003) for a review x Deleted reaction
24.
25. Network Properties: Growth phenotype in one environment Further improvement obtained by imposing phenotypic constraint on randomized networks
26. Network Properties: Growth phenotype in multiple environments Still further improvement obtained by increasing phenotypic constraints on randomized networks
27. Global structural properties of real metabolic networks: Consequence of simple biochemical and functional constraints Increasing level of constraints Global network structure could be an indirect consequence of biochemical and functional constraints Samal and Martin, PLoS ONE (2011) 6(7): e22295 Conjectured by: A. Wagner (2007) B. Papp et al. (2008)
28. High level of genetic diversity in our randomized ensembles Any two random networks in our most constrained ensemble differ in ~ 60% of their reactions.
29. Necessity of MCMC sampling Constraint of having super-essential reactions The convex curve indicates that the effect on viability of successive reaction removals becomes more and more severe. Reference: Samal, Rodrigues, Jost, Martin, Wagner BMC Systems Biology (2010)
30.
31.
32. Neutral networks: A many-to-one genotype to phenotype map Genotype Phenotype RNA Sequence Secondary structure of pairings folding Reference: Fontana & Schuster (1998) Lipman & Wilbur (1991) Ciliberti, Martin & Wagner(2007) Neutral network or genotype network refers to the set of (many) genotypes that have a given phenotype, and their “organization” in genotype space. Protein Sequence Three dimensional structure Gene network Gene expression levels
33. Common approaches in the study of design principles of metabolic networks See also:
34. Acknowledgement CNRS & Max Planck Society Joint Program in Systems Biology (CNRS MPG GDRE 513) for a Postdoctoral Fellowship. Reference Related Work Genotype networks in metabolic reaction spaces Areejit Samal , João F Matias Rodrigues , Jürgen Jost , Olivier C Martin * and Andreas Wagner * BMC Systems Biology 2010, 4: 30 Environmental versatility promotes modularity in genome-scale metabolic networks Areejit Samal , Andreas Wagner * and Olivier C Martin * BMC Systems Biology 2011, 5: 135 Funding Andreas Wagner João Rodrigues University of Zurich Jürgen Jost Max Planck Institute for Mathematics in the Sciences Olivier C. Martin Université Paris-Sud Thanks for your attention!! -- Questions??