This document summarizes an approach using cheminformatics and bioinformatics to analyze big data related to neglected tropical diseases, specifically applying it to Chagas disease. Key aspects included curating the Trypanosoma cruzi metabolome, developing machine learning models to predict active compounds from screening data, screening over 7,500 compounds and identifying hits, and validating the top 5 hits in vitro and in vivo in a mouse model. One particularly promising hit was pyronaridine, which showed strong anti-trypanosomal activity and is an approved antimalarial, highlighting its potential for repurposing for Chagas disease.
Applying cheminformatics and bioinformatics approaches to neglected tropical disease big data
1. Applying Cheminformatics and Bioinformatics Approaches to Neglected Tropical Disease Big Data
Sean Ekins1,2*ǂ, Jair Lage de Siqueira-Neto3ǂ, Laura-Isobel McCall3, Malabika Sarker4, Maneesh Yadav4, Elizabeth L. Ponder5, E. Adam Kallel1,$, Danielle Kellar6,§, Steven Chen7, Michelle Arkin7, Barry A. Bunin1, James H. McKerrow3 and Carolyn Talcott4.
1 Collaborative Drug Discovery, 1633 Bayshore Highway, Suite 342, Burlingame, CA 94010, USA.
2 Collaborations in Chemistry, 5616 Hilltop Needmore Road, Fuquay-Varina, NC 27526, USA.
3 Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, San Diego, CA 92093, USA.
4 SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025, USA.
5 ChEM-H, Shriram Center, 443 Via Ortega, Room 279, MC 5082, Stanford, CA 94305-4125, USA.
6 Department of Pathology, University of California San Francisco, San Francisco, CA 94158, USA.
7 Small Molecule Discovery Center and Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, CA 94158, USA.
$ Retrophin Inc., 12255 El Camino Real, Suite 250, San Diego, CA 92130, USA.
§ Present address: Five Prime Therapeutics, San Francisco, CA, USA.
ǂ contributed equally
2. Chagas Disease
• An estimated 7 to 8 million people are infected worldwide.
• Vector-borne transmission occurs in the Americas.
• A triatomine bug carries the parasite Trypanosoma cruzi, which causes the disease.
• The disease is curable if treatment is initiated soon after infection.
Hotez et al., PLoS Negl Trop Dis. 2013 Oct 31;7(10):e2300
11. CDD & CDIPD & SRI Collaboration
• Develop a novel combined cheminformatics-systems biology approach to predict metabolic enzyme targets of HTS hits
• Curate the T. cruzi metabolome
• Identify interesting targets
• Identify novel metabolic enzyme-compound hit pairs for T. cruzi
- Analyze hits in CDD, e.g. Broad hits, literature, etc.
- Compare to known compounds with known targets, e.g. CYP51
What we actually did:
• Developed machine learning models
• Identified compounds for in vitro testing
• Tested hits in vivo
Ekins et al., PLoS Negl Trop Dis. 2015 Jun 26;9(6):e0003878
12. Curating T. cruzi metabolome
Pathway/Genome Database (biocyc.org)
TCruCyc was created using the complete genome sequence of the Dm28c strain, following the PathoLogic workflow.
• 11,349 distinct gene products
• 88 were enzymes, 16 transporters
• Inferred 1030 enzymatic reactions, 122 pathways
• 806 metabolic compounds (set filtered to 358 for use in similarity searching)
Ekins et al., PLoS Negl Trop Dis. 2015 Jun 26;9(6):e0003878
13. T. cruzi Machine Learning models
• Dataset from PubChem AID 2044 (Broad Institute data)
• Dose response data (1853 actives and 2203 inactives)
• Dose response and cytotoxicity data (1698 actives and 2363 inactives)
• Compounds with EC50 values less than 1 µM were selected as actives.
• For the cytotoxicity model, actives also required a greater than 10-fold difference between cytotoxicity and EC50.
• Models were generated using molecular function class fingerprints of maximum diameter 6 (FCFP_6), AlogP, molecular weight, number of rotatable bonds, number of rings, number of aromatic rings, number of hydrogen bond acceptors, number of hydrogen bond donors, and molecular fractional polar surface area.
• 5-fold cross validation or leave-out-50% × 100-fold cross validation was used to calculate the ROC for the models generated.
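The labeling rules for the two training sets can be sketched as below. This is a minimal illustration, not the authors' pipeline: it assumes the potency cutoff is expressed in µM (the unit used for the EC50 values reported elsewhere in these slides) and that the 10-fold criterion is the ratio of cytotoxicity CC50 to EC50. The function and constant names are hypothetical.

```python
# Hypothetical helper names; thresholds are taken from the slide,
# with the EC50 cutoff assumed to be in micromolar (µM).
ACTIVE_EC50_UM = 1.0     # EC50 below this value counts as active
MIN_SELECTIVITY = 10.0   # required fold difference between CC50 and EC50


def is_active(ec50_um: float) -> bool:
    """Label for the dose response model: potent compounds are active."""
    return ec50_um < ACTIVE_EC50_UM


def is_active_and_selective(ec50_um: float, cc50_um: float) -> bool:
    """Label for the dose response and cytotoxicity model.

    A compound is active only if it is potent and its cytotoxicity CC50
    is more than 10-fold higher than its EC50 (an assumed reading of the
    10-fold criterion on the slide).
    """
    return is_active(ec50_um) and (cc50_um / ec50_um) > MIN_SELECTIVITY


if __name__ == "__main__":
    # Toy values for illustration only.
    print(is_active(0.5), is_active_and_selective(0.5, 2.0))
```

The second label is stricter than the first, which is consistent with the smaller active count reported for the combined dataset.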
15. T. cruzi Dose Response Machine Learning model features
Good: Tertiary amines, piperidines and aromatic fragments with basic nitrogen
Bad: Cyclic hydrazines and electron-poor chlorinated aromatics
Ekins et al., PLoS Negl Trop Dis. 2015 Jun 26;9(6):e0003878
16. T. cruzi Dose Response and cytotoxicity Machine Learning model features
Good: Tertiary amines, piperidines and aromatic fragments with basic nitrogen
Bad: Cyclic hydrazines and electron-poor chlorinated aromatics
Ekins et al., PLoS Negl Trop Dis. 2015 Jun 26;9(6):e0003878
17. Bayesian Machine Learning Models
Ekins et al, PLoS NTD, 2015 (in press)
Libraries scored with the models:
- Selleck Chemicals natural product library (139 molecules)
- GSK kinase library (367 molecules)
- Malaria box (400 molecules)
- Microsource Spectrum (2320 molecules)
- CDD FDA drugs (2690 molecules)
- Prestwick Chemical library (1280 molecules)
- Traditional Chinese Medicine components (373 molecules)
Total: 7569 molecules scored, from which 99 molecules were selected
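To illustrate how a Bayesian model can rank library compounds from substructure features, here is a minimal sketch of a Laplacian-corrected naive Bayes scorer. The slides do not state the exact formulation used, so this estimator is an assumption, and the feature names are toy stand-ins for fingerprint bits such as FCFP_6, not real descriptors.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def train(samples: List[Tuple[Set[str], bool]]) -> Dict[str, float]:
    """Return a per-feature weight: log((A_f + 1) / (N_f * P + 1)).

    N_f is the number of samples containing the feature, A_f the number
    of those that are active, and P the overall fraction of actives.
    """
    base_rate = sum(1 for _, active in samples if active) / len(samples)
    seen: Counter = Counter()
    seen_active: Counter = Counter()
    for features, active in samples:
        for feature in features:
            seen[feature] += 1
            if active:
                seen_active[feature] += 1
    return {
        f: math.log((seen_active[f] + 1) / (seen[f] * base_rate + 1))
        for f in seen
    }


def score(weights: Dict[str, float], features: Iterable[str]) -> float:
    """Sum the weights of known features; unseen features contribute 0."""
    return sum(weights.get(f, 0.0) for f in features)


# Toy training set (illustrative only, not data from the study).
TRAINING = [
    ({"tertiary_amine", "piperidine"}, True),
    ({"tertiary_amine"}, True),
    ({"cyclic_hydrazine"}, False),
    ({"chloro_aromatic", "cyclic_hydrazine"}, False),
]
WEIGHTS = train(TRAINING)

if __name__ == "__main__":
    for name, value in sorted(WEIGHTS.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {value:+.3f}")
```

Features seen mostly in actives receive positive weights and features seen mostly in inactives receive negative ones, so compounds in a new library can be ranked by their summed score.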
19. In vitro and in vivo data for compounds selected

| Synonyms | Infection ratio | EC50 (µM) | EC90 (µM) | Hill slope | Cytotoxicity CC50 (µM) | In vivo efficacy (%), Chagas mouse model (4 days treatment, luciferase), 50 mg/kg bid (IP) |
|---|---|---|---|---|---|---|
| (±)-Verapamil hydrochloride, 715730, SC-0011762 | 0.02, 0.02 | 0.0383 | 0.143 | 1.67 | >10.0 | 55.1 |
| 29781612, Pyronaridine | 0.00, 0.00 | 0.225 | 0.665 | 2.03 | 3.0 | 85.2 |
| 511176, Furazolidone | 0.00, 0.00 | 0.257 | 0.563 | 2.81 | >10.0 | 100.5 |
| 501337, SC-0011777, Tetrandrine | 0.00, 0.00 | 0.508 | 1.57 | 1.95 | 1.3 | 43.6 |
| SC-0011754, Nitrofural | 0.01, 0.01 | 0.775 | 6.98 | 1.00 | >10.0 | 78.5* |

* Used hydroxymethylnitrofurazone for in vivo study (nitrofural pro-drug)
[Figure: chemical structures of the selected compounds]
Ekins et al., PLoS Negl Trop Dis. 2015 Jun 26;9(6):e0003878
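One way to read the EC50 and CC50 columns together is a selectivity index (CC50 divided by EC50). The sketch below computes it from the tabulated values. The selectivity index is not reported on the slide, so it is an added illustration, and the helper names are hypothetical. CC50 entries given as ">10.0" are treated as lower bounds, so the resulting index is also only a lower bound.

```python
from typing import Dict, Tuple

# (EC50 in µM, CC50 text in µM) copied from the table on this slide.
COMPOUNDS: Dict[str, Tuple[float, str]] = {
    "Verapamil": (0.0383, ">10.0"),
    "Pyronaridine": (0.225, "3.0"),
    "Furazolidone": (0.257, ">10.0"),
    "Tetrandrine": (0.508, "1.3"),
    "Nitrofural": (0.775, ">10.0"),
}


def parse_cc50(text: str) -> Tuple[float, bool]:
    """Return (value, is_lower_bound) for entries such as '3.0' or '>10.0'."""
    text = text.strip()
    if text.startswith(">"):
        return float(text[1:]), True
    return float(text), False


def selectivity_index(ec50_um: float, cc50_text: str) -> Tuple[float, bool]:
    """Return (CC50 / EC50, is_lower_bound)."""
    cc50_um, is_lower_bound = parse_cc50(cc50_text)
    return cc50_um / ec50_um, is_lower_bound


if __name__ == "__main__":
    for name, (ec50, cc50) in COMPOUNDS.items():
        si, bound = selectivity_index(ec50, cc50)
        prefix = ">" if bound else ""
        print(f"{name}: SI {prefix}{si:.1f}")
```

Keeping the lower-bound flag separate from the number avoids presenting a censored CC50 as an exact measurement.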
20. What do we know about the hits?
Verapamil: EC50 < 0.1 µM in the Broad data; others have shown IC50 > 50 µM.
Pyronaridine: EC50 < 0.587 µM in the Broad dose response data, but never tested in mouse.
Furazolidone (H. pylori treatment): only in the bigger Broad primary screen.
Tetrandrine: a P-gp inhibitor used in combination with chloroquine; in the Broad primary screen it was classed as negative.
Nitrofural (known active, Beveridge et al. 1980): not in the training set or Broad dataset; predicted active by us, EC50 = 0.77 µM.
This study used a different strain (CA-I/72) from the Broad data (Tulahuen). The latter seems to bias hits towards CYP51 etc., which can account for differences in activity.
21. In vivo efficacy of the 5 tested compounds
7,569 cpds => 99 cpds => 17 hits (5 in nM range)
[Figure: timeline from 0 to 7 marked Infection, Treatment and Reading; in vivo efficacy plot for Pyronaridine, Furazolidone, Verapamil, Nitrofural, Tetrandrine, Benznidazole and Vehicle]
Ekins et al., PLoS Negl Trop Dis. 2015 Jun 26;9(6):e0003878
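The screening funnel can be checked with a few lines of arithmetic. The sketch below sums the library sizes listed on the Bayesian models slide and derives the selection and hit rates. The rates themselves are not stated in the slides and are computed here only for illustration; the variable names are hypothetical.

```python
# Library sizes as listed on the Bayesian Machine Learning Models slide.
LIBRARY_SIZES = {
    "Selleck Chemicals natural products": 139,
    "GSK kinase library": 367,
    "Malaria box": 400,
    "Microsource Spectrum": 2320,
    "CDD FDA drugs": 2690,
    "Prestwick Chemical library": 1280,
    "Traditional Chinese Medicine components": 373,
}
N_SELECTED = 99   # compounds selected after scoring
N_HITS = 17       # hits among the selected compounds

# Total number of molecules across all scored libraries.
total_scored = sum(LIBRARY_SIZES.values())
# Fraction of scored molecules that were selected for testing.
selection_rate = N_SELECTED / total_scored
# Fraction of selected molecules that turned out to be hits.
hit_rate = N_HITS / N_SELECTED

if __name__ == "__main__":
    print(f"scored: {total_scored}")
    print(f"selected: {selection_rate:.2%} of scored")
    print(f"hits: {hit_rate:.2%} of selected")
```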
22. Sharing in vitro and in vivo data in CDD Vault
Ekins et al., PLoS Negl Trop Dis. 2015 Jun 26;9(6):e0003878
CDD and UCSD used Vault to securely share data
In vitro and in vivo data captured
Screening and dose response data
23. Pyronaridine: New anti-Chagas and known anti-Malarial
EMA approved in combination with artesunate
IC50 value of 2 nM against the growth of KT1 and KT3 P. falciparum
Known P-gp inhibitor
Active against the tick-transmitted parasites Babesia and Theileria
24. Pyronaridine: target hunting for Chagas disease
Similarity search with pyronaridine in the literature dataset we curated on Chagas disease: GAPDH
Similarity search on ChEMBL using the MMDS: trypanothione disulfide reductase
Most similar metabolite (Tanimoto, MDL keys = 0.67): S-adenosyl 3-(methylthio)propylamine (polyamine biosynthesis)
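The Tanimoto coefficient quoted above compares two fingerprints as the number of shared keys divided by the number of keys present in either. A minimal sketch follows, using plain Python sets of key indices. The toy key sets are invented for illustration and are not the MDL keys of pyronaridine or of any metabolite.

```python
from typing import Set


def tanimoto(keys_a: Set[int], keys_b: Set[int]) -> float:
    """Tanimoto coefficient: |A and B| / |A or B| for two sets of 'on' keys."""
    union = keys_a | keys_b
    if not union:
        # Convention chosen here: two empty fingerprints score 0.0.
        return 0.0
    return len(keys_a & keys_b) / len(union)


if __name__ == "__main__":
    # Toy fingerprints: two keys shared out of three present overall.
    query = {12, 45, 160}
    candidate = {12, 45}
    print(round(tanimoto(query, candidate), 2))
```

With two shared keys out of three in total, the toy pair scores about 0.67, the same magnitude as the value reported for the most similar metabolite.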
25. Conclusions
Bayesian models and training sets were provided as supplemental data
Managed to find an overlooked compound from the Broad data
Future work:
• Use models to score other libraries
• Combinations of molecules
• Longer term efficacy studies
• Target identification
• Test pyronaridine vs other parasites, bacteria, viruses
Ekins et al., PLoS Negl Trop Dis. 2015 Jun 26;9(6):e0003878
26. Drugs in use and in development for Chagas Disease
27. Acknowledgments
NIH NIAID grant R41-AI108003-01 “Identification and validation of targets of phenotypic high throughput screening”
Mike Pollastri
Ni Ai
Alex Clark
Dr. Martin John Rogers