World-wide In Silico Drug Discovery on Grids Against Malaria and Avian Flu
1. World-wide in silico drug discovery
against neglected and emerging
diseases on grid infrastructures
Dr Nicolas Jacq
HealthGrid association
Credit: the WISDOM collaboration
http://wisdom.healthgrid.org
International Symposium on Grids for Science and Business
12 June 2007
www.healthgrid.org
2. The HealthGrid
association
• The vision of HealthGrid is the deployment of e-infrastructures
that make geographically distributed repositories of health-related
data interoperable and integrate high-end processing services
on top of them.
• Some key aspects are:
– The integration of health-related actors in grid projects
– The integration of grid standards and medical informatics standards for
interoperability
– The deployment of pilots for new ways of research and new methods
– The integration of the bioinformatics and medical informatics communities
• The mission of HealthGrid is to foster communication among
the different key actors and to catalyse joint research actions at
international level
Jacq - 12 June 2007 2
3. Main achievements
• Publication of the HealthGrid Whitepaper in 2005, outlining the
concept, benefits and opportunities offered by applying grids in
different applications in biomedicine and healthcare
– http://whitepaper.healthgrid.org
• Involvement as full partner in several projects
– SHARE (SSA): http://www.eu-share.org
– EGEE II (I3): http://www.eu-egee.org
– ACGT (IP): http://www.eu-acgt.org
• Organisation of the HealthGrid conference since 2003
– HealthGrid.US Alliance will host the 6th International HealthGrid
Conference in Chicago – Spring 2008
• Development of the health grids knowledge base
– http://kb.healthgrid.org
4. Content
• WISDOM, an initiative for grid-enabled drug discovery
against neglected and emerging diseases
• Deployment and results of grid-enabled large scale
virtual screening against malaria and avian influenza
• Deployment method
• Conclusion and perspectives
5. Goal of the
WISDOM initiative
• WISDOM stands for World-wide In Silico Docking On Malaria
• Goal: contribute to the development of new drugs for neglected and
emerging diseases, with a particular focus on malaria and avian flu
• Specificity: rely extensively on emerging information technologies
to provide new tools and environments for drug discovery
• Initial focus: virtual screening
• Web site: http://wisdom.healthgrid.org
6. WISDOM
collaboration
• LPC Clermont-Ferrand: Biomedical grid, Web service
• SCAI Fraunhofer: Knowledge extraction, Chemoinformatics
• CEA, Acamba project: Malaria biology, Chemogenomics
• Univ. Modena: Malaria biology, Molecular Dynamics
• HealthGrid: Biomedical grid, Dissemination
• ITB CNR: Bioinformatics, Molecular modelling
• Academia Sinica: Grid user interface, Avian flu biology, In vitro testing
• Univ. Los Andes: Bioinformatics, Malaria biology
• Chonnam Nat. Univ. (new): In vitro testing
• Univ. Pretoria: Bioinformatics, Malaria biology
• Mahidol Univ. Bangkok: In vitro testing
7 partners, 4 associated laboratories providing targets
and/or in vitro facilities
7. Benefits from using the
grid (1/2)
• World-wide distribution of malaria resistance
• 1975-2004: only 21 of the 1,556 new drugs marketed were for tropical
diseases (Chirac P., Torreele E., Lancet, May 2006)
• Neglected diseases continue to suffer from a lack of R&D
• Grids allow reduced costs
8. Benefits from using the
grid (2/2)
• H5N1 virus has the potential to cause a large-scale pandemic
• H5N1 may mutate and acquire drug resistance
• Time is a critical factor for handling emerging diseases
• Grids provide an accelerating factor
[Figure: deaths from all causes each week, expressed as an annual rate per 1000, plotted over months. Source: Ross E.G. Upshur, BA (Hons), MA, MD, MSc, CCFP, FRCPC]
9. In silico drug
discovery
• Problem: development of a drug takes 12 to 15 years
and costs approximately 800 million dollars
• Target discovery: Target Identification → Target Validation
• Lead discovery: Lead Identification → Lead Optimization
• Clinical Phases (I-III)
10. Grid impact on drug
discovery workflow down
to drug delivery (1/2)
• Grids provide the necessary tools and data to identify new
biological targets
– Bioinformatics services (database replication, workflow…)
– Resources for CPU intensive tasks (genomics comparative analysis,
inverse docking…)
• Grids provide the resources to speed up lead discovery
– Large scale in silico docking to identify potentially promising
compounds
– Molecular dynamics computations to refine virtual screening and further
assess selected compounds
• Grids offer promising prospects for collaboration
between public and private partners
– Platform for information and knowledge sharing
11. Grid impact on drug
discovery workflow down
to drug delivery (2/2)
• Grids provide environments for epidemiology
– Federation of databases to collect data in endemic areas, to
study a disease and to evaluate the impact of vaccines and vector
control measures
– Resources for data analysis and mathematical modelling
• Grids provide the services needed for clinical trials
– Federation of databases to collect data in the centres
participating in the clinical trials
• Grids provide the tools to monitor drug delivery
– Federation of databases to monitor drug delivery
12. Content
• WISDOM, an initiative for grid-enabled drug discovery
against neglected and emerging diseases
• Deployment and results of grid-enabled large scale
virtual screening against malaria and avian influenza
• Deployment method
• Conclusion and perspectives
13. Virtual screening by
docking
• Docking: predict how small molecules bind to a receptor
of known 3D structure
• Workflow: compound database + target structure model → DOCKING →
predicted binding models → post-analysis → compounds for assay
14. Grid-enabled high
throughput virtual
screening by docking
• Millions of potential drugs to test against interesting proteins!
• High Throughput Screening: $1-10 per compound, several hours
– Too costly for neglected diseases!
• Compounds: ZINC (4.3M), Chembridge (500,000)
• Targets: 3D structures from the PDB
• Molecular docking (FlexX, Autodock): ~1 to 15 minutes
• Data challenge on EGEE: ~2 to 30 days on ~5,000 computers
– Cheap and fast!
• Pipeline: selection of the best hits → hits → screening using assays
performed on living cells → leads → clinical testing → drug
15. Statistics of
deployment
• First Data Challenge: July 1st - August 15th 2005
– Target: malaria
– 80 CPU years, 1 TB of data produced, 1,700 CPUs used in parallel
– 1st large scale docking deployment world-wide on an e-infrastructure
• Second Data Challenge: April 15th - June 30th 2006
– Target: avian flu
– 100 CPU years, 800 GB of data produced, 1,700 CPUs used in parallel
– Collaboration initiated on March 1st: deployment preparation achieved in 45
days
• Third Data Challenge: October 1st - December 15th 2006
– Target: malaria
– 400 CPU years, 1.6 TB of data produced, up to 5,000 CPUs used in parallel
– Very high docking throughput: > 100,000 compounds per hour
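The CPU-year totals above can be related to the peak CPU counts by estimating how many CPUs were busy on average during each challenge. The following is a rough sketch only: it assumes one CPU year equals 365.25 days of one CPU and that work was spread over the full calendar window quoted on the slide. The names `challenges` and `average_cpus` are illustrative and not part of any WISDOM tool.

```python
from datetime import date

# Dates and CPU-year totals as quoted on the slide.
challenges = {
    "first (malaria)": (date(2005, 7, 1), date(2005, 8, 15), 80),
    "second (avian flu)": (date(2006, 4, 15), date(2006, 6, 30), 100),
    "third (malaria)": (date(2006, 10, 1), date(2006, 12, 15), 400),
}


def average_cpus(start, end, cpu_years):
    """Average number of CPUs kept busy between start and end.

    Assumption: one CPU year is 365.25 CPU days.
    """
    elapsed_days = (end - start).days
    if elapsed_days <= 0:
        raise ValueError("end must be after start")
    return cpu_years * 365.25 / elapsed_days


averages = {name: average_cpus(*row) for name, row in challenges.items()}

for name, avg in averages.items():
    print(f"{name}: about {avg:.0f} CPUs busy on average")
```

Under these assumptions every average stays below the corresponding peak figure (1,700, 1,700 and 5,000 CPUs), which is what one would expect if the peaks were only reached for part of each challenge.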
16. A huge international effort
for the third data challenge
[Pie chart: contribution of each grid federation or infrastructure, with slices ranging from 1% to 38%. Legend: EGEE Germany Switzerland, EGEE Asia Pacific, EGEE Russia, Auvergrid, EuChinaGrid, EELA, EGEE South Western Europe, EGEE Central Europe, EGEE Northern Europe, EGEE Italy, EGEE South Eastern Europe, EGEE France, EGEE UKI]
Over 420 CPU years in 10 weeks
A record throughput of 100,000 docked compounds per hour
WISDOM calculations used FlexX from BioSolveIT
(6k free, floating licenses)
18. Results from avian flu
data challenge (1/2)
• 5 out of 6 known effective inhibitors can be identified in the first
15% of the ranking and in the first 5% reranked (2,250 compounds)
– Enrichment: (5/6)/(15%x5%) = 111 (<1 in most cases)
• Most known effective inhibitors lose their affinity in binding with a
mutated target
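The enrichment figure quoted above is simple arithmetic: the fraction of known inhibitors recovered, divided by the fraction of the library that was kept. A minimal sketch follows; the function name `enrichment` and its parameters are illustrative and do not come from the WISDOM software.

```python
def enrichment(found, known, selected_fraction):
    """Recovery rate of known actives divided by the fraction of the library kept."""
    if known <= 0:
        raise ValueError("known must be positive")
    if not 0 < selected_fraction <= 1:
        raise ValueError("selected_fraction must be in (0, 1]")
    return (found / known) / selected_fraction


# Values from the slide: 5 of 6 known inhibitors are recovered after keeping
# the first 15% of the ranking and then the first 5% of the reranked subset.
selected = 0.15 * 0.05
ef = enrichment(found=5, known=6, selected_fraction=selected)
print(round(ef))  # 111
```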
[Figure: docking rank of GNA (zanamivir) against the original target type (2.4%) and the E119A mutated type (11.5%), relative to the 15% cut-off]
19. Results from avian flu
data challenge (2/2)
• Experimental assay confirms 7 actives out of 123 purchased
“potential hits” (interacting complexes with higher affinities and
proper docked poses) = 6%
• Average success rate of in vitro testing = 0.1%
• To be confirmed on more hits: tests are running at the Univ. of
Chonnam (South Korea)
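The comparison between the two success rates above can be checked directly. This is a small sketch using only the numbers on the slide; the variable names are illustrative.

```python
confirmed_actives = 7   # actives confirmed by experimental assay
purchased_hits = 123    # "potential hits" purchased for testing
typical_rate = 0.001    # average in vitro success rate quoted on the slide (0.1%)

hit_rate = confirmed_actives / purchased_hits
improvement = hit_rate / typical_rate

print(f"hit rate: {hit_rate:.1%}")
print(f"improvement over the average rate: about {improvement:.0f}x")
```

The computed hit rate rounds to the 6% quoted above, roughly 57 times the average in vitro success rate.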
20. Results from first
malaria data challenge
1,000,000 chemical compounds
→ Sorting based on scoring in different parameter sets; consensus scoring
10,000 compounds selected
→ Selection based on key interactions, binding modes, etc.
1,000 compounds
→ MD
100 compounds will be tested in July by Univ. of Chonnam (South Korea)
Credit: V. Kasam, Fraunhofer Institute
21. Content
• WISDOM, an initiative for grid-enabled drug discovery
against neglected and emerging diseases
• Deployment and results of grid-enabled large scale
virtual screening against malaria and avian influenza
• Deployment method
• Conclusion and perspectives
22. Requirements for a
deployment on grid
• Adaptation of the application to the grid
• Access to a large infrastructure providing maintained
resources
• Use of a production system providing automated and
fault-tolerant job and file management
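To illustrate what "automated and fault-tolerant" job management means in practice, the sketch below resubmits failed tasks up to a fixed number of attempts. It is a generic illustration, not the WISDOM production system: `run_with_resubmission`, `flaky_docking` and the task names are hypothetical, and the failure is simulated.

```python
def run_with_resubmission(tasks, run_task, max_attempts=3):
    """Run every task, resubmitting the ones that fail.

    Returns (results, unfinished): results maps each completed task to its
    output, unfinished lists the tasks that still failed after max_attempts.
    """
    results = {}
    pending = list(tasks)
    for _ in range(max_attempts):
        failed = []
        for task in pending:
            try:
                results[task] = run_task(task)
            except Exception:
                # A real system would log the error and possibly pick another site.
                failed.append(task)
        pending = failed
        if not pending:
            break
    return results, pending


# Simulated docking task: "subset_2" fails on its first attempt only.
attempts = {}


def flaky_docking(task):
    attempts[task] = attempts.get(task, 0) + 1
    if task == "subset_2" and attempts[task] == 1:
        raise RuntimeError("simulated worker node failure")
    return f"{task}: done"


results, unfinished = run_with_resubmission(
    ["subset_1", "subset_2", "subset_3"], flaky_docking
)
print(sorted(results), unfinished)
```

Returning the unfinished tasks separately, rather than raising, lets the caller decide whether to retry later or report the gap.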
23. Adaptation of the application to the grid
• The application codes cannot be modified and are not designed
for grid computing.
• A common strategy is to split the application into shorter tasks
• License management for commercial software is not adapted to
large infrastructures
[Diagram: embarrassingly parallel application. Input data (DB subsets, parameters) → Docking software → Output]
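The splitting strategy described on this slide can be sketched as follows: the compound database is cut into fixed-size subsets, and each (target, parameter set, subset) combination becomes one independent task. This is a minimal illustration under assumed names; `split_into_tasks`, `build_jobs` and the identifiers in the example are hypothetical and do not reflect the actual WISDOM environment.

```python
from itertools import islice


def split_into_tasks(compound_ids, chunk_size):
    """Yield successive lists of at most chunk_size compound identifiers."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(compound_ids)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def build_jobs(compound_ids, targets, parameter_sets, chunk_size):
    """One job per (target, parameter set, compound subset); jobs share no state."""
    jobs = []
    for target in targets:
        for params in parameter_sets:
            for chunk in split_into_tasks(compound_ids, chunk_size):
                jobs.append(
                    {"target": target, "parameters": params, "compounds": chunk}
                )
    return jobs


# Toy example: 10 compounds, 1 target, 2 parameter sets, subsets of 4.
compounds = [f"compound_{i}" for i in range(10)]
jobs = build_jobs(compounds, ["target_A"], ["params_1", "params_2"], chunk_size=4)
print(len(jobs))  # 6
```

Because the jobs are independent, they can be run in any order and on any site, which is what makes the application embarrassingly parallel. The subset size trades per-job overhead against the cost of rerunning a failed job.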
24. Grid Added Value
[Figure: Real Time Monitor (Imperial College London), http://gridportal.hep.ph.ic.ac.uk/rtm/]
• Large number of CPUs available
• Reliable and secured Data Management Services
– Sharing of results
– Replication of the data
– ACLs
• Availability of the resources
25. Grid infrastructures and
projects contributing to the
data challenges
• Infrastructures and projects shown: EMBRACE, BioinfoGrid, SHARE, EGEE,
Auvergrid, EUMedGrid, EUChinaGrid, TWGrid, EELA
• Legend: European grid infrastructure; European grid project;
regional/national grid infrastructure
27. GUI designed by biologists
Compound selection
Complex visualization
Target selection
Energy table
Docking parameter setter
Credit: H-C Lee (ASGC)
28. Content
• WISDOM, an initiative for grid-enabled drug discovery
against neglected and emerging diseases
• Deployment and results of grid-enabled large scale
virtual screening against malaria and avian influenza
• Deployment method
• Conclusion and perspectives
29. Conclusion
• WISDOM proposes a new approach to drug discovery
thanks to the grid
– Rapid deployment of large scale virtual screening
– Collaborative environment for the sharing of data in the
research community
• First biochemical results demonstrate grid relevance
to the drug discovery community
30. Perspectives
• Summer 2007
– 2nd data challenge against avian flu
– In vitro tests of the best molecules from the data challenges
• Winter 2007
– Discussion with WHO and Novartis
Targets provided by the Drug Target Portfolio Network from the
Tropical Disease Research initiative
– Discussion with Africa@home initiative
WISDOM deployment on a desktop grid
31. Thank you
• To all members of the WISDOM collaboration for their
contribution to the project (CNRS-IN2P3, ASGC, ITB-CNR,
SCAI Fraunhofer, Univ of Modena…)
• To all grid nodes which committed resources and allowed
the success of the initiative
• To all projects which supported the initiative by providing
either computing resources or manpower to develop the
WISDOM environment (EGEE, BioinfoGRID, Embrace,
SHARE…)
• To BioSolveIT for offering up to 6,000 free licenses of FlexX