We developed BSGrid, an application that simulates the behavior of bacterial populations using stochastic methods and runs on high performance computing infrastructures (HPCIs) such as clusters and/or grids.
Mario Jose Villamizar Cano, Cloud Computing Specialist / Head of Digital Products at Universidad de los Andes
BacteriumSimulatorGrid (BSGrid) - Tool for Simulating the Behavior of the Bacillus thuringiensis
1. 2009
Mesoscale Modeling of the Bacillus thuringiensis Sporulation Network
Based on Stochastic Kinetics and Its Application for in Silico Scale-down
Harold Castro, Andrés González, Mario Villamizar, Nicolás Cuervo,
Gabriel Lozano, Silvia Restrepo
Departments of Chemical Engineering, Biological Sciences and Systems and
Computing Engineering
Universidad de los Andes
Bogotá, Colombia
Sergio Orduz
School of Biosciences
Universidad Nacional de Colombia
Medellín, Colombia
2. Introduction to Bacillus thuringiensis
Bacillus thuringiensis is a gram-positive bacterium widely known for its
capacity to synthesize δ-endotoxins (parasporal crystal proteins) during the
sporulation process; these proteins are used as biopesticides.
These δ-endotoxins are used in some products, and no toxic effects of B.
thuringiensis on humans have been detected in its years of use.
3. Motivation
These biopesticides are used in countries that require organic agriculture.
For instance, in Colombia they can be used against a typical problem, insect
control in maize crops: a B. thuringiensis subspecies such as kurstaki can
help combat Lepidoptera in this kind of crop.
4. Problem
This kind of biopesticide represents 90% of the total biopesticide market
but only 5% of the total pesticide market.
Industrial-scale fermentation cannot reach a high concentration of
δ-endotoxins, so the production of biopesticides has a high cost.
The δ-endotoxins are produced during the sporulation process of B.
thuringiensis.
It is necessary to analyze the relationship between the sporulation
process and δ-endotoxin production to determine the optimum conditions
under which the δ-endotoxins are produced.
The sporulation process is affected by intrinsic and extrinsic variables
that cannot be modeled using deterministic models.
5. Project objectives
Develop a mesoscale stochastic model that predicts the sporulation
process in B. thuringiensis, so that the relationship between the sporulation
process and δ-endotoxin production can be analyzed in order to increase
δ-endotoxin production by fermentation at industrial levels.
Determine the effect of oxygen oscillations on the sporulation process in
order to analyze the evolution of protein synthesis at industrial scale
(in silico scale-down).
Validate the stochastic model results with experimental results.
6. Work Areas
Definition of a mesoscale stochastic
model for B. thuringiensis
BSGrid - An application for executing
simulations using stochastic algorithms
UnaGrid – An Opportunistic High
Performance Computing Infrastructure
Comparisons with experimental data
8. A mesoscale stochastic model for B. thuringiensis
Five proteins are considered: SigmaH, AbrB, KinA, Spo0A and phosphorylated
Spo0A.
The evolution of these proteins is determined based on 27 events classified
in four categories (gene transcription, protein transduction, protein
degradation, degradation of messenger RNA).
Messenger RNA expression is regulated with the use of the Hill equation.
In the stochastic simulations the Stochastic Simulation Algorithm (SSA) of
Gillespie is used.
B. thuringiensis shows bimodal behavior: a planktonic population and a
spore-forming population (which includes the spore population).
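The simulation loop described above can be illustrated with a minimal sketch of Gillespie's direct method. This is not the 27-event BSGrid model: it is a toy one-gene system with one event of each kind (Hill-regulated transcription, protein synthesis, mRNA degradation, protein degradation), and every name, rate constant and regulator level in it is a hypothetical value chosen for illustration.

```python
import math
import random


def hill(x, k, n):
    """Activating Hill function: fraction of maximal expression at level x."""
    return x**n / (k**n + x**n)


def gillespie(t_end, seed=0):
    """Direct-method SSA for a toy one-gene model (all rates are illustrative)."""
    rng = random.Random(seed)
    t, mrna, protein = 0.0, 0, 0
    activator = 50.0                      # fixed regulator level (assumed)
    k_tx, k_tl, d_m, d_p = 0.5, 2.0, 0.1, 0.05
    while True:
        # Propensities, one event of each kind.
        a = [
            k_tx * hill(activator, k=40.0, n=2),  # gene transcription
            k_tl * mrna,                          # protein synthesis from mRNA
            d_m * mrna,                           # mRNA degradation
            d_p * protein,                        # protein degradation
        ]
        a0 = sum(a)
        if a0 == 0.0:
            break
        # Time to the next event is exponential with rate a0.
        t += -math.log(1.0 - rng.random()) / a0
        if t > t_end:
            break
        # Pick which event fires, with probability a[i] / a0.
        r, i, acc = rng.random() * a0, 0, a[0]
        while acc < r:
            i += 1
            acc += a[i]
        if i == 0:
            mrna += 1
        elif i == 1:
            protein += 1
        elif i == 2:
            mrna -= 1
        else:
            protein -= 1
    return mrna, protein


final_mrna, final_protein = gillespie(200.0, seed=1)
```

Each call simulates one individual; a population is obtained by repeating the call with different seeds, which is why the runs are independent of one another.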
9. Sporulation regulatory network and the Spo0A-P role
The phosphorylated Spo0A protein plays an important role because, when it
reaches high concentrations, it activates the whole sporulation process.
We therefore considered that, once the protein reaches a threshold value, it is
highly probable that the sporulation process begins.
10. Sporulation regulatory network - Bimodal population
The simulation results seem to predict a bimodal population.
To find the distribution of the populations we developed a simple
Monte Carlo simulation based on a probability function.
$f(\mu_1, \sigma_1, \mu_2, \sigma_2, p) = (1 - p)\, N(\mu_1, \sigma_1) + p\, N(\mu_2, \sigma_2)$
We used reverse engineering to find the parameters of this distribution
through an algorithm based on sum-of-squares minimization.
For each time t, the parameters were fitted by regression using the
Microsoft Excel 2007® Solver tool.
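A minimal sketch of the ingredients of such a fit, assuming a two-component Gaussian mixture with weight p on the second component. It shows Monte Carlo sampling from the mixture and the sum-of-squares objective evaluated against a histogram; the minimizer itself (Excel Solver in the slides) is not reproduced, and all parameter values and function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, mu1, sigma1, mu2, sigma2, p):
    """Two-component density: weight (1 - p) on component 1, p on component 2."""
    return (1.0 - p) * normal_pdf(x, mu1, sigma1) + p * normal_pdf(x, mu2, sigma2)


def sample_mixture(n, mu1, sigma1, mu2, sigma2, p, seed=0):
    """Monte Carlo draw: choose a component with probability p, then sample it."""
    rng = random.Random(seed)
    return [
        rng.gauss(mu2, sigma2) if rng.random() < p else rng.gauss(mu1, sigma1)
        for _ in range(n)
    ]


def histogram_density(samples, lo, hi, bins):
    """Bin centres and normalised bin heights (a density estimate)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for s in samples:
        if lo <= s < hi:
            counts[min(int((s - lo) / width), bins - 1)] += 1
    centres = [lo + (i + 0.5) * width for i in range(bins)]
    heights = [c / (len(samples) * width) for c in counts]
    return centres, heights


def sum_of_squares(params, centres, heights):
    """Objective to minimise when fitting the mixture to a histogram."""
    return sum((h - mixture_pdf(x, *params)) ** 2 for x, h in zip(centres, heights))


true_params = (20.0, 5.0, 80.0, 10.0, 0.4)   # illustrative values only
samples = sample_mixture(20000, *true_params, seed=42)
centres, heights = histogram_density(samples, -10.0, 130.0, 70)
sse_true = sum_of_squares(true_params, centres, heights)
sse_wrong = sum_of_squares((20.0, 5.0, 80.0, 10.0, 0.9), centres, heights)
```

A fit would search the five parameters for the smallest `sum_of_squares` value at each saved time point; here the objective is only compared at the generating parameters and at a deliberately wrong weight.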
12. BSGrid – Operation on Personal Computers
An application for executing simulations using stochastic methods.
Java J2SE.
User-friendly.
1. Bacterium structure definition through GUIs
13. BSGrid – Operation on Personal Computers
2. Configuration and execution of the simulations through GUIs
14. BSGrid – Operation on Personal Computers
3. Visualization and analysis of results, so that the user can decide to
modify the bacterium structure and run the simulations again.
15. BSGrid – Problems for Larger Simulations on PCs
1 individual ≈ 63 seconds
150000 individuals ≈ 54 days ≈ 2 months
Do simulations with large populations require greater processing capabilities?
16. Solution: BSGrid as a Grid-Enabled application
[Diagram: on the user side, BSGrid covers (1) bacterium structure definition
through GUIs, (2) configuration and execution of simulations, and (3)
visualization and analysis of results. A master XML document describes the
simulation; a batch process submits it as independent BSGrid jobs to the slaves
(Slave 1 … Slave N) of the cluster/grid infrastructure.]
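Because the individuals are simulated independently, the population can be divided into separate jobs. The slides do not describe how BSGrid partitions the work, so the following is only a sketch of one plausible scheme; the job fields, the seed rule and the job count of 70 are assumptions made for illustration.

```python
def split_into_jobs(total_individuals, n_jobs):
    """Partition a population into near-equal, independent job descriptions."""
    base, extra = divmod(total_individuals, n_jobs)
    jobs, start = [], 0
    for job_id in range(n_jobs):
        # The first `extra` jobs take one additional individual.
        size = base + (1 if job_id < extra else 0)
        jobs.append({
            "job_id": job_id,
            "first_individual": start,
            "individuals": size,
            "seed": 1000 + job_id,   # distinct seed per job (hypothetical scheme)
        })
        start += size
    return jobs


jobs = split_into_jobs(150000, 70)
```

Giving each job its own seed keeps the random streams of the jobs distinct, so no job needs to communicate with another while it runs.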
17. Solution: BSGrid as a Grid-Enabled application (2)
[Diagram: a batch process submits the independent BSGrid jobs described by the
master XML document to the slaves (Slave 1 … Slave N) of the cluster/grid
infrastructure. The jobs store their results in a relational database server,
which users query for their analyses (Analysis 1 … Analysis N).]
Displaying the global statistics takes a long time.
18. Solution: BSGrid as a Grid-Enabled application (3)
[Diagram: the same arrangement as in the previous slide, with the relational
database server now holding both relational tables and materialized views,
which users query for their analyses (Analysis 1 … Analysis N).]
The time is reduced from minutes to seconds.
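The idea behind the materialized views can be sketched as follows: global statistics are aggregated once and stored, so that each analysis reads a small precomputed table instead of rescanning every result row. The slides do not name the database product or its schema; this sketch uses SQLite, which has no materialized views, so an explicitly created summary table stands in for one, and all table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (job_id INTEGER, t INTEGER, spo0a_p INTEGER)")

# Synthetic rows standing in for the output written by 50 jobs.
rows = [(job, t, (job * 7 + t) % 100) for job in range(50) for t in range(0, 5000, 500)]
conn.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)

# Precompute the global statistics once, after the jobs have written their rows.
conn.execute(
    "CREATE TABLE stats_by_time AS "
    "SELECT t, COUNT(*) AS n, AVG(spo0a_p) AS mean_spo0a_p "
    "FROM results GROUP BY t"
)

# Analyses read the small summary table instead of aggregating all result rows.
summary = conn.execute(
    "SELECT t, n, mean_spo0a_p FROM stats_by_time ORDER BY t"
).fetchall()
direct = conn.execute(
    "SELECT t, COUNT(*), AVG(spo0a_p) FROM results GROUP BY t ORDER BY t"
).fetchall()
```

The summary holds one row per saved time point, so its size does not grow with the number of simulated individuals.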
20. Tools of the BSGrid Application
[Diagram: two execution paths in BSGrid.
PC execution: GUI definition → bacterium structure model → execution of
simulations in PCs (stochastic algorithms) → output data in RAM memory → GUI
results.
Grid/cluster execution: XML input file → bacterium structure model → execution
of simulations in grid/cluster → output data in the database server → GUI
results.]
22. A High Performance Computing Infrastructure (HPCI)
This type of simulation requires large processing capabilities.
Cluster and grid infrastructures usually rely on dedicated computational
resources, so implementing them requires large financial investments.
23. A High Performance Computing Infrastructure (2)
Dedicated infrastructures are not a viable option for organizations or
countries with limited financial resources. However, these organizations have
many computer labs that are not fully utilized by employees or university
students.
24. Solution: Opportunistic virtual clusters
[Figure: a Linux processing virtual machine running on a physical machine of a
computer room, with X cores: a. when an end user is using the physical machine;
b. when no end user is using the physical machine.]
A virtual cluster is a set of interconnected commodity desktops that run
virtual machines (VMs) in the background and at low priority through
virtualization technologies. These VMs take advantage of the idle
processing capacity available in computer labs on a university campus.
25. Solution: Opportunistic virtual clusters (2)
[Figure: a computer lab in which every computer runs a virtual machine (VM).]
A virtual machine runs on each computer of a lab and acts as a cluster slave;
together, the running virtual machines make up a virtual processing cluster.
Each virtual cluster also needs a dedicated node that acts as the cluster
master.
27. Solution: Opportunistic virtual clusters (2)
[Figure: the computers in the computer lab each run a VM and are the virtual
cluster slaves; the master is a dedicated computer outside the computer lab.]
28. Opportunistic virtual clusters - Features
[Figure: three virtual clusters, one each for Research Groups A, B and C. Each
virtual cluster has one master and several slaves and is accessed by a
cluster/grid user.]
A virtual infrastructure composed of virtual clusters.
The virtual clusters take advantage of unused physical resources.
A general-purpose infrastructure, not only for biological simulations.
29. Opportunistic virtual clusters – Features (2)
[Figure: grid community. The virtual clusters of Research Groups A, B and C,
each with one master and several slaves, are joined by grid middleware and a
Certificate Authority (CA), and are accessed by cluster/grid users.]
Each research group can define its own virtual clusters with custom
application environments (middleware, applications, configurations, etc.).
A grid solution (several virtual clusters) can be deployed to provide
the processing capabilities required by some applications.
30. Opportunistic Grid Virtual Infrastructure Proposed
Our strategy addresses the problems associated with the lack or
underutilization of existing computer laboratories and opens new opportunities:
Collaborative work among research groups.
The development of research projects that require large processing
capabilities at low cost.
Limitations
Best-effort approach.
No quality of service (QoS) is guaranteed.
The capabilities of a virtual cluster depend on its configuration.
Best suited to bag-of-tasks applications.
31. Opportunistic Grid Virtual Infrastructure Deployed
Three computer labs, each one with 35 computers and Windows XP as the base
operating system.
Core 2 Duo processor (1.86 GHz) and 4 GB of RAM.
Three virtual clusters.
Condor scheduler.
VMware virtualization software.
Globus middleware.
[Figure: cluster/grid users submit jobs via Globus middleware to the master
virtual machines (Master Cluster Turing, Master Cluster Wuaira1, Master Cluster
Wuaira2; VMWare ESX Server), each serving a virtual cluster in a computer lab
(Cluster Virtual Turing, Cluster Virtual Wuaira).]
How to deploy the virtual machines?
If the virtual machines are always running, they always consume energy, even
when no cluster/grid users are using the virtual infrastructure.
A green solution is necessary.
32. Opportunistic Grid Virtual Infrastructure Deployed
[Figure: the deployment of the previous slide (three computer labs, three
virtual clusters, Condor, VMware, Globus), extended with the data center: the
GUMA web server, two domain controllers (Windows 2008 Server and Windows 2003
Server) and the ADMONSIS and CAPRICA administrative domains.]
33. Deployment on Demand of the Virtual Infrastructure
Virtual clusters are deployed on demand through GUMA.
This application makes it possible to run and manage virtual clusters on
demand, and it provides multiple services for managing the grid from
light clients. It also allows the physical and virtual machines to be
monitored.
35. Experimental tests
Three fermentations were carried out using B. thuringiensis subsp.
kurstaki HD1-1999.
A single colony was inoculated in 50 mL of culture at 30 °C for 72 h.
Oxygen was controlled by adding a mix of air and pure oxygen. pH and
temperature were maintained at 6.5 and 30 °C, respectively.
The planktonic, spore-forming and spore populations were evaluated using a
phase contrast microscope.
36. Experimental results
Our results seem to indicate that the sporulation process is triggered around
the 20th hour, possibly influenced by intrinsic and extrinsic noise. We believe
that, due to poor oxygen transfer in Bogotá (2600 m AMSL), the spore
content did not exceed 60%, contrary to several reports.
37. In silico results - Bimodal population
The model was run for 150000 cells. The analysis was carried out for 2900
cells up to 80000 seconds. To save computational resources, results
were saved every 500 s.
To confirm the presence of two subpopulations in the proposed
mesoscale model, we fitted continuous Gaussian distribution curves to our
histograms; the bimodal population reflects the presence of
planktonic cells (low Spo0A-P) and spores (high Spo0A-P) over time.
38. In silico results
Interestingly, the high Spo0A-P population increases over time, clearly
indicating that spores increase until a steady state is reached (right
figure). These results show dynamics similar to those of the spore
concentration in the fermentor (left figure).
Our in silico analysis predicts that the sporulation process takes around 8 h
to complete, while the experimental results show that it takes around
20 h. A deeper study is required.
39. System response to oxygen oscillations
Oxygen tension partially controls KinA activity and therefore affects the
Spo0A phosphorylation rate, described by:
Spo0A → Spo0A-P, with stochastic kinetic constant
$c = k_{asp} \, \frac{KinA^{n}}{KinA^{n} + K_{Msp}^{n}}$
The stochastic kinetic constant c was modified according to:
$c^{*} = c \left[ A \sin\left( \frac{2 \pi t}{T} \right) + d \right]$
A: wave amplitude
T: oscillation period
d: mean value of the sinusoidal function
Five hundred simulations were performed for each of these conditions.
Simulation  A      T    d
1           0.5    0.5  0.5
2           0.5    1.0  0.5
3           0.625  1.0  0.625
4           0.25   1.0  1.0
5           0.5    1.0  1.0
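A minimal sketch of a sinusoidal modulation of the kinetic constant, assuming that the oscillation multiplies the base constant by A·sin(2πt/T) + d. The base value of c used here is a placeholder, not a value from the slides, and the function name is hypothetical. The loop evaluates the modulated constant over one period for each of the five (A, T, d) conditions.

```python
import math


def modulated_constant(c, t, amplitude, period, mean):
    """Scale c by a sinusoid: c * (A * sin(2*pi*t / T) + d)."""
    return c * (amplitude * math.sin(2.0 * math.pi * t / period) + mean)


# (A, T, d) for the five simulated conditions.
conditions = {
    1: (0.5, 0.5, 0.5),
    2: (0.5, 1.0, 0.5),
    3: (0.625, 1.0, 0.625),
    4: (0.25, 1.0, 1.0),
    5: (0.5, 1.0, 1.0),
}

c = 1.0  # placeholder base constant, not a value from the slides
ranges = {}
for sim, (A, T, d) in conditions.items():
    # Sample one full period at 201 evenly spaced time points.
    values = [modulated_constant(c, k * T / 200.0, A, T, d) for k in range(201)]
    ranges[sim] = (min(values), max(values))
```

In all five conditions A does not exceed d, so under this assumption the modulated constant never becomes negative, which a stochastic rate constant requires.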
40. Spo0A-P response to oscillations in the oxygen tension
The results of these simulations with oscillations in the oxygen tension
predict a reduction in the size of the high Spo0A-P population demonstrating
the effects of the industrial-scale oscillations on the sporulation process.
41. Results of processing time and data generated
Processing time required on a personal computer:
Model name        Number of   Time per        Number    Total time
                  bacteria    bacterium (s)   of CPUs   (days)
B. thuringiensis  150000      63              2         54.69
Processing time required on the opportunistic virtual cluster infrastructure:
Model name        Number of   Time per        Number    Total time
                  bacteria    bacterium (s)   of CPUs   (days)
B. thuringiensis  150000      111             70        2.75
These results confirm the benefits of our strategy, and performance tests
confirm the transparency of our model.
The simulated model generated 10 GB of data.
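The totals in both tables are consistent with spreading independent per-bacterium simulations evenly over the available CPUs. The short calculation below reproduces them under that assumption; the function and variable names are hypothetical.

```python
SECONDS_PER_DAY = 86400


def total_days(n_bacteria, seconds_per_bacterium, n_cpus):
    """Wall-clock days if independent simulations are spread evenly over CPUs."""
    return n_bacteria * seconds_per_bacterium / n_cpus / SECONDS_PER_DAY


pc_days = total_days(150000, 63, 2)          # personal computer figures
cluster_days = total_days(150000, 111, 70)   # opportunistic virtual cluster figures
speedup = pc_days / cluster_days
```

Under this assumption the virtual cluster run is roughly 20 times faster overall, even though each bacterium takes longer to simulate on a virtual machine (111 s versus 63 s).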
42. Conclusions
Stochastic model
With the developed model we demonstrate the presence of multistability in B.
thuringiensis, and we also show that cycling the oxygen decreases the
population of spore-forming cells.
BSGrid application
BSGrid is a tool for simulating biological systems using stochastic
methods and algorithms on PCs and HPCIs.
Virtual infrastructure and parallel computing
Parallel computing benefits this type of simulation through the
generation of a large number of independent jobs.
The proposed infrastructure allows this and other applications to run using
an opportunistic strategy (cost close to zero).
43. Future work
Stochastic model
The proposed model predicts an elapsed time of 8 h for the sporulation
process. Nevertheless, our experimental results indicate a longer process,
so more studies are required to understand the triggering
process.
Analyses with new model parameters are required to study the
relationship between the sporulation process and δ-endotoxin
production.
Experimental results
In the fermentation process it was not possible to differentiate between spore
populations and spore-forming populations, so a more detailed analysis,
using reporter genes related to sporulation, should be used to validate the
mesoscale model.
44. Future work
BSGrid application
Adapt and publish BSGrid as an open source application.
Given its modular design, BSGrid is ready to be extended to handle new
stochastic methods and algorithms.
Infrastructure
Researchers now want to work with larger populations and more complex
structures, and to get more accurate answers.