3D virtual screening of PknB inhibitors using data fusion
1. Abhik Seal
PhD Student (Chemical Informatics)
Indiana University Bloomington
http://chemin-abs.blogspot.com/
mypage.iu.edu/~abseal/
10/16/2012 abseal@indiana.edu 1
2. What is PknB?
• Ser/Thr protein kinase (STPK) highly
conserved in Gram-positive bacteria and
apparently essential for Mycobacterial
viability.
• Essential for cell division and metabolism;
expressed during exponential growth, and its
overexpression causes defects in cell wall
synthesis and cell division.
4. Kinase inhibitor and pharmacophores
Targeting cancer with small molecule kinase inhibitors. Nature Reviews Cancer, 2009
Through the “Gatekeeper Door”: Exploiting the Active Kinase Conformation. J. Med. Chem. 2010, 53, 2681–2694
5. Properties of Kinase Inhibitors
Through the “Gatekeeper Door”: Exploiting the Active Kinase Conformation J. Med. Chem. 2010, 53, 2681–2694
8. • A data fusion algorithm accepts two or more ranked lists
and merges them into a single ranked list, with the
aim of being more effective than any of the individual
systems used for data fusion (Croft, 2000, Chapter 1; Meng et al.,
2002).
• Another aim of the data fusion is to group existing search
services under one umbrella, as the number of existing
search services increases (Selberg & Etzioni, 1996)
• Fusion in automatic ranking of IR systems
Automatic ranking of information retrieval systems using data
fusion, Nuray & Can ’06
• Merging the retrieval results of multiple systems.
See more on Wikipedia (http://en.wikipedia.org/wiki/Data_fusion)
9. Used By
Metasearch engines, for example:
(http://en.wikipedia.org/wiki/List_of_search_engines#Metasearch_engines)
e.g. www.dogpile.com, www.copernic.com, www.hotbot.com
[Diagram: Meta search, Engine 1, Engine 2, Engine 3, D1, D2, D3, Information Resource]
10. Workflow of meta-search
• Execute a database search for some particular target
structure using different similarity measures
• Note the rank position, R(i), of each database
structure in the ranking for the i-th similarity
measure using similarity coefficients
• Combine the various positions using a fusion rule to
give a new rank position for each database structure
• Use these fused positions to generate the final
output ranking for the search.
http://www.his.se/PageFiles/6884/Peter%20Willet%20presentation.pdf
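The four workflow steps can be sketched in a few lines of Python. This is only an illustration, not the implementation used in this work: the function names `rank_positions` and `fuse`, the compound IDs, and the similarity values are all made up, and every score table is assumed to cover the same set of database structures.

```python
from typing import Callable, Dict, List, Sequence


def rank_positions(scores: Dict[str, float]) -> Dict[str, int]:
    """Step 2: map each structure to its 1-based rank (highest similarity = rank 1)."""
    ordered = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return {cid: pos for pos, cid in enumerate(ordered, start=1)}


def fuse(score_tables: Sequence[Dict[str, float]],
         rule: Callable[[List[int]], float] = min) -> List[str]:
    """Steps 3 and 4: combine the per-measure ranks with `rule`, then rerank.

    Assumes every table scores the same set of structures.
    """
    per_measure = [rank_positions(table) for table in score_tables]
    fused = {cid: rule([ranks[cid] for ranks in per_measure])
             for cid in per_measure[0]}
    # A smaller fused value means a better final position; ties break by ID.
    return sorted(fused, key=lambda cid: (fused[cid], cid))


# Step 1 (the searches themselves) is represented by hypothetical score tables.
tanimoto_scores = {"mol_a": 0.9, "mol_b": 0.6, "mol_c": 0.3}
cosine_scores = {"mol_a": 0.5, "mol_b": 0.8, "mol_c": 0.7}

best_rank_order = fuse([tanimoto_scores, cosine_scores], rule=min)
sum_rank_order = fuse([tanimoto_scores, cosine_scores], rule=sum)
```

Passing the fusion rule as a function keeps the ranking step separate from the choice of rule, so best-rank, sum-rank, or any other rule can be swapped in without touching the rest.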
11. Types of fusion for 2D similarity search
a) Similarity fusion (SF):
SF involves searching a single reference structure against a database using
multiple different similarity measures, and the output is obtained by
combining the rankings resulting from these different measures.
b) Group fusion (GF):
GF involves searching multiple reference structures against a database using a
single similarity measure, and the output is obtained by combining the
rankings resulting from these different reference structures.
Holliday et al.: Multiple search methods for similarity-based virtual screening: analysis of search overlap and
precision. Journal of Cheminformatics 2011, 3:29
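The difference between SF and GF is only in what varies between the rankings that get fused. A minimal sketch, assuming fingerprints are represented as sets of "on" bit positions; the toy database, the reference structures, and the helper names are hypothetical:

```python
from typing import Callable, Dict, FrozenSet, List

Fingerprint = FrozenSet[int]
Measure = Callable[[Fingerprint, Fingerprint], float]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a: Fingerprint, b: Fingerprint) -> float:
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def rank_database(ref: Fingerprint, database: Dict[str, Fingerprint],
                  measure: Measure) -> List[str]:
    """Return database IDs ordered from most to least similar to `ref`."""
    return sorted(database, key=lambda cid: (-measure(ref, database[cid]), cid))


def similarity_fusion_rankings(ref, database, measures):
    """SF: one reference structure, several similarity measures."""
    return [rank_database(ref, database, m) for m in measures]


def group_fusion_rankings(refs, database, measure):
    """GF: several reference structures, one similarity measure."""
    return [rank_database(r, database, measure) for r in refs]


# Hypothetical toy data.
database = {
    "c1": frozenset({1, 2, 3}),
    "c2": frozenset({2, 3, 4, 5}),
    "c3": frozenset({7, 8}),
}
reference = frozenset({1, 2, 3})

sf_rankings = similarity_fusion_rankings(reference, database, [tanimoto, dice])
gf_rankings = group_fusion_rankings([reference, frozenset({7, 8, 9})],
                                    database, tanimoto)
```

Either list of rankings can then be passed to a fusion rule to produce the final output.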
14. Reciprocal Rank method
• Merge compounds using only rank positions
• Rank score of compound i (j: system index)
$$ r(d_i) = \frac{1}{\sum_{j} \frac{1}{\mathrm{pos}(d_{ij})}} $$
15. Reciprocal rank example
• 4 systems: A, B, C, D
documents: a, b, c, d, e, f, g
• Query results:
A={a,b,c,d}, B={a,d,b,e},
C={c,a,f,e}, D={b,g,e,f}
• r(a)=1/(1+1+1/2)=0.4
r(b)=1/(1/2+1/3+1)=0.55
• Final ranking of compounds (lower r is more relevant):
(most relevant) a > b > c > e > d > f > g (least relevant)
Nuray, R.;Can,F. Automatic ranking of information retrieval systems using data
fusion. Information Processing and Management 42 (2006) 595–614
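The example can be reproduced with a short script. This is a sketch of the reciprocal rank rule as written on the previous slide, assuming that a document missing from a system's list simply contributes nothing to the sum and that ties are broken alphabetically; the function names are illustrative.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_scores(rankings: Sequence[Sequence[str]]) -> Dict[str, float]:
    """r(d) = 1 / sum_j (1 / pos_j(d)), over the systems j that returned d."""
    reciprocal_sums: Dict[str, float] = {}
    for ranking in rankings:
        for pos, item in enumerate(ranking, start=1):
            reciprocal_sums[item] = reciprocal_sums.get(item, 0.0) + 1.0 / pos
    return {item: 1.0 / total for item, total in reciprocal_sums.items()}


def fuse_reciprocal_rank(rankings: Sequence[Sequence[str]]) -> List[str]:
    """Order items by ascending r: a smaller score means more relevant."""
    scores = reciprocal_rank_scores(rankings)
    return sorted(scores, key=lambda item: (scores[item], item))


# The four systems A, B, C, D from the example.
rankings = [
    ["a", "b", "c", "d"],
    ["a", "d", "b", "e"],
    ["c", "a", "f", "e"],
    ["b", "g", "e", "f"],
]

scores = reciprocal_rank_scores(rankings)
fused = fuse_reciprocal_rank(rankings)

for item in fused:
    print(f"{item}: r = {scores[item]:.2f}")
```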
16. Sum score
The normalized scores of each ranking are
summed to get the fused score of a compound
             Ranking 1   Ranking 2   Ranking 3   Sum score   Rank
Compound 1   1           0.9         0.7         2.6         1
Compound 2   0.8         0.5         1           2.3         2
Compound 3   0.7         1           0.5         2.2         3
Compound 4   0.2         0           0.1         0.3         4
Compound 5   0           0.3         0           0.3         5
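A minimal sketch of the sum score rule applied to the table values. The scores are taken as already normalized, since the normalization scheme is not specified here. Rounding the sums and breaking ties by input order are choices made for this illustration only.

```python
from typing import Dict, List

# Normalized scores per compound, one value per ranking (values from the table).
normalized_scores: Dict[str, List[float]] = {
    "Compound 1": [1.0, 0.9, 0.7],
    "Compound 2": [0.8, 0.5, 1.0],
    "Compound 3": [0.7, 1.0, 0.5],
    "Compound 4": [0.2, 0.0, 0.1],
    "Compound 5": [0.0, 0.3, 0.0],
}


def sum_score_fusion(table: Dict[str, List[float]]) -> Dict[str, float]:
    """Fused score = sum of the normalized scores (rounded to hide float noise)."""
    return {compound: round(sum(values), 6) for compound, values in table.items()}


sum_scores = sum_score_fusion(normalized_scores)

# Highest fused score gets rank 1; sorted() is stable, so ties keep input order.
order = sorted(sum_scores, key=lambda compound: -sum_scores[compound])
sum_score_ranks = {compound: pos for pos, compound in enumerate(order, start=1)}

for compound in order:
    print(compound, sum_scores[compound], sum_score_ranks[compound])
```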
17. Sum rank
• In sum rank, the scores in each ranking are first
converted to ranks, with the maximum score receiving
the minimum rank. The ranks are then summed for each
compound, and the compounds are reranked by ascending sum.
             Ranking 1   Ranking 2   Ranking 3   Sum rank   Rank
Compound 1   1           10          4           15         5
Compound 2   2           5           6           13         3
Compound 3   7           4           3           14         4
Compound 4   2           3           3           8          2
Compound 5   3           2           1           6          1
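A short sketch of sum rank using the per-ranking ranks from the table. The helper `scores_to_ranks` shows the score-to-rank conversion on made-up scores; both function names are illustrative.

```python
from typing import Dict, List


def scores_to_ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """Convert scores to ranks: the maximum score receives rank 1."""
    ordered = sorted(scores, key=lambda compound: -scores[compound])
    return {compound: pos for pos, compound in enumerate(ordered, start=1)}


def sum_rank_fusion(rank_table: Dict[str, List[int]]) -> Dict[str, int]:
    """Sum the per-ranking ranks for each compound."""
    return {compound: sum(ranks) for compound, ranks in rank_table.items()}


# Per-ranking ranks as given in the table.
rank_table = {
    "Compound 1": [1, 10, 4],
    "Compound 2": [2, 5, 6],
    "Compound 3": [7, 4, 3],
    "Compound 4": [2, 3, 3],
    "Compound 5": [3, 2, 1],
}

rank_sums = sum_rank_fusion(rank_table)

# The smallest rank sum gets final rank 1.
final_order = sorted(rank_sums, key=lambda compound: rank_sums[compound])
final_ranks = {compound: pos for pos, compound in enumerate(final_order, start=1)}

# Score-to-rank conversion on hypothetical scores.
example_ranks = scores_to_ranks({"x": 0.2, "y": 0.9, "z": 0.5})
```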
19. Pharmacophore design
To generate the pharmacophoric features we used the energetic
pharmacophore (E-pharmacophore) approach developed by Salam et al.,
with exclusion spheres.
Pharmacophoric sites were automatically generated with Phase using the
default set of six chemical features: hydrogen bond acceptor (A), hydrogen
bond donor (D), hydrophobic (H), negative ionizable (N), positive ionizable
(P), and aromatic ring (R).
21. Validation of Pharmacophores
• To assess the quality of a hit list for a query
compound or a pharmacophore, we considered the yield of
active compounds, the enrichment factor, the percentage of
actives, and the goodness of hit list (GH score).
• We also measured how well a pharmacophore or any other
screening method ranks active compounds “early” in a virtual
screening process, using the Boltzmann-enhanced
discrimination of receiver operating characteristic
(BEDROC, Truchon et al.) and the RIE metric (Sheridan et al.).
• 35 active compounds were randomly sampled from 62 actives
and combined with 1000 decoys
(www.schrodinger.com/glide_decoy_set).
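The hit-list metrics can be computed from four counts. The sketch below uses the commonly cited definitions (yield = Ha/Ht, percentage of actives = Ha/A, EF = (Ha/Ht)/(A/D), and the Güner-Henry GH score), which are assumed here to be the ones intended. The database size follows the 35 actives plus 1000 decoys above, while the hit-list counts (50 hits, 25 of them active) are invented purely for illustration.

```python
from typing import Dict


def hit_list_metrics(hits_active: int, hits_total: int,
                     actives_total: int, db_size: int) -> Dict[str, float]:
    """Hit-list quality metrics.

    hits_active   (Ha): actives in the hit list
    hits_total    (Ht): size of the hit list
    actives_total (A):  actives in the database
    db_size       (D):  compounds in the database
    """
    ha, ht, a, d = hits_active, hits_total, actives_total, db_size
    yield_pct = 100.0 * ha / ht              # yield of actives
    pct_actives = 100.0 * ha / a             # percentage of actives retrieved
    enrichment_factor = (ha / ht) / (a / d)  # enrichment over random selection
    # Guner-Henry score: weighted yield/recall term times a false-positive penalty.
    gh_score = (ha * (3 * a + ht)) / (4 * ht * a) * (1 - (ht - ha) / (d - a))
    return {
        "yield_pct": yield_pct,
        "pct_actives": pct_actives,
        "enrichment_factor": enrichment_factor,
        "gh_score": gh_score,
    }


# 35 actives + 1000 decoys; the hit-list counts below are hypothetical.
metrics = hit_list_metrics(hits_active=25, hits_total=50,
                           actives_total=35, db_size=1035)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```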
23. Why BEDROC?
• Despite its sensitivity to early recognition, the Enrichment Factor has
the drawback of being insensitive to the relative ranking of the
compounds in the top X% and of ignoring the complete ranking of the
remaining data set.
• The ROC measure does not single out the compounds ranked early in a
virtual screening process.
• The BEDROC metric uses an exponential decay function to reduce
the influence of lower ranked compounds on the final score. The
score has a parameter α that allows the user to adjust the definition
of the early recognition problem.
• BEDROC values were computed for the three VS methods at α=20. A value of
α=20 implies that 80% of the final BEDROC score is based on the first 8% of the
ranked data set.
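A compact sketch of RIE and BEDROC for a ranked list of active/decoy labels. It writes BEDROC as RIE rescaled between its minimum and maximum possible values, which as far as I know is equivalent to the closed form given by Truchon et al.; the function names and the toy ranked list are illustrative and are not results from this study.

```python
import math
from typing import Sequence


def rie(is_active: Sequence[bool], alpha: float = 20.0) -> float:
    """Robust initial enhancement; `is_active` is ordered best-ranked first."""
    n_total = len(is_active)
    n_actives = sum(is_active)
    if n_actives == 0 or n_actives == n_total:
        raise ValueError("need at least one active and one decoy")
    # Exponentially weighted sum over the (1-based) ranks of the actives.
    observed = sum(math.exp(-alpha * rank / n_total)
                   for rank, active in enumerate(is_active, start=1) if active)
    # Expected value of the same sum if the actives were placed at random.
    expected = (n_actives / n_total) * (1 - math.exp(-alpha)) \
        / (math.exp(alpha / n_total) - 1)
    return observed / expected


def bedroc(is_active: Sequence[bool], alpha: float = 20.0) -> float:
    """BEDROC as RIE rescaled to [0, 1] between its worst and best values."""
    ratio = sum(is_active) / len(is_active)
    rie_max = (1 - math.exp(-alpha * ratio)) / (ratio * (1 - math.exp(-alpha)))
    rie_min = (1 - math.exp(alpha * ratio)) / (ratio * (1 - math.exp(alpha)))
    return (rie(is_active, alpha) - rie_min) / (rie_max - rie_min)


# Hypothetical ranked list: 100 compounds, actives at ranks 1, 2, 5, 20 and 80.
active_ranks = {1, 2, 5, 20, 80}
ranked_labels = [rank in active_ranks for rank in range(1, 101)]

print(f"RIE    = {rie(ranked_labels):.3f}")
print(f"BEDROC = {bedroc(ranked_labels):.3f}")
```

Raising α concentrates the weight on a smaller top fraction of the list, so the same ranking can score quite differently under different α values.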
24. Validation of virtual screening
a) E-pharmacophore
E-pharmacophore III was selected based on the performance measures: the
number of retrieved compounds with a fitness score above 2, and a high
Goodness of Hit score, yield of actives, and specificity.
b) ROCS
All compounds were scored and ranked by TanimotoCombo score;
parameters were selected as described by Bostrom et al.
c) Glide XP
All compounds were scored with the Glide XP docking score and
ranked in descending order of score.
25. [Figure: E-pharmacophore I, E-pharmacophore II, and E-pharmacophore III, with sites D8 and R13 labeled]
Which pharmacophore is good?
Are sites D8 and R13 important?
29. Architecture
[Diagram components: Data Preprocessing; Rescoring and Ranking (System 1, System 2, System 3, System 4); Fusion Algorithms; Validation; Decision]
30. Virtual Screening of Asinex 400K compounds
Workflow
• Chemical structure collection: 400K compounds from Asinex, optimized using LigPrep
• 3D virtual screening and ranking: Phase E-pharmacophore selects the top 5000 compounds for VS in vROCS and Glide SP; conformer generation followed by ROCS; Glide SP docking
• Virtual screening using data fusion: data fusion using the Reciprocal Rank algorithm
• Post processing: top 10% of the database selected for Glide XP docking
• Compound selection: 45 compounds selected after visual inspection and pharmacophore mapping
31. Machine learning models in progress
• Tools used:
a) PowerMV descriptors: 2D pharmacological fingerprints,
Weighted Burden Number, and 8 properties
b) MACCS (166 keys)
c) rcdk extended graph-based
d) jCompoundMapper library: PHAP2PT3D, PHAP3PT3D,
CATS3D, CATS2D
So far, none of the descriptors has been efficient at retrieving the 3D
screening results.
However, the ML model is promising because it classifies actives and
decoys well with a polykernel SVM.
32. PCA Analysis of predicted compounds
• 12 different physicochemical properties are calculated using CDK (http://rguha.net/code/java/cdkdesc.html),
including molecular refractivity, atom polarizabilities, bond polarizabilities, hydrogen bond donors
and acceptors, Petitjean number, topological polar surface area, number of rotatable bonds, lipophilicity
(XLogP), molecular weight, topological shape, and geometrical shape.
35. Tools Used
• Docking and pharmacophore modeling:
Schrödinger's Glide and Phase
• Shape-based screening: vROCS
• Performance calculation and visualization: R
statistics, ggplot2, enrichVS package.
36. More work
• Working on the design of PknG inhibitors
• Enhanced ranking systems for better
prediction
• Automated protocol for developing enhanced
virtual screening using open source tools.
37. Acknowledgements
• Indo-US Science and Technology Forum
• Prof. P. Yogeshwari and Prof. D. Sriram (BITS
Hyderabad)
• Computer Aided Drug Design Lab, BITS Pilani
Hyderabad.
• Prof. David J. Wild
• OSDD Team