3D virtual screening of PknB inhibitors using data fusion

Abhik Seal
PhD Student (Chemical Informatics)
Indiana University Bloomington
http://chemin-abs.blogspot.com/
mypage.iu.edu/~abseal/
10/16/2012 abseal@indiana.edu

What is PknB?
• Ser/Thr protein kinase (STPK), highly
conserved in Gram-positive bacteria and
apparently essential for mycobacterial
viability.
• Essential for cell division and metabolism;
expressed during exponential growth, and its
overexpression causes defects in cell wall
synthesis and cell division.


PknB ATP-binding pocket

[Figure: ATP-binding pocket of PknB with the gatekeeper labelled]

Wehenkel, FEBS Letters 580 (2006) 3018–3022

Kinase inhibitor and pharmacophores

Targeting cancer with small molecule kinase inhibitors. Nature Reviews Cancer, 2009
Through the “Gatekeeper Door”: Exploiting the Active Kinase Conformation. J. Med. Chem. 2010, 53, 2681–2694

Properties of Kinase Inhibitors

Through the “Gatekeeper Door”: Exploiting the Active Kinase Conformation J. Med. Chem. 2010, 53, 2681–2694


Some PknB inhibitors


• A data fusion algorithm accepts two or more ranked lists
and merges them into a single ranked list, with the aim of
being more effective than any of the individual systems
being fused (Croft, 2000, Chapter 1; Meng et al., 2002).
• Another aim of data fusion is to group existing search
services under one umbrella as the number of such
services increases (Selberg & Etzioni, 1996).
• Fusion for automatic ranking of IR systems:
Automatic ranking of information retrieval systems using data
fusion, Nuray & Can ’06.
• Merging the retrieval results of multiple systems.
See more on Wikipedia (http://en.wikipedia.org/wiki/Data_fusion)

Used by
Metasearch engines, for example:
(http://en.wikipedia.org/wiki/List_of_search_engines#Metasearch_engines)

e.g. www.dogpile.com, www.copernic.com, www.hotbot.com

Meta search

[Diagram: a meta search layer sends the query to Engine 1, Engine 2 and Engine 3, which return result lists D1, D2 and D3 from the information resource]

Workflow of meta-search

• Execute a database search for some particular target
structure using different similarity measures
• Note the rank position, R(i), of each database
structure in the ranking for the i-th similarity
measure using similarity coefficients
• Combine the various positions using a fusion rule to
give a new rank position for each database structure
• Use these fused positions to generate the final
output ranking for the search.
http://www.his.se/PageFiles/6884/Peter%20Willet%20presentation.pdf
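
The four workflow steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: fingerprints are modelled as sets of "on" bit positions, Tanimoto and Dice stand in for the "different similarity measures", the fusion rule is a plain sum of rank positions, and the molecule names are hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List

Fingerprint = FrozenSet[int]  # assumed representation: set of "on" bit positions


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(a: Fingerprint, b: Fingerprint) -> float:
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def rank_positions(scores: Dict[str, float]) -> Dict[str, int]:
    # Highest similarity gets rank position 1; ties are broken by name.
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: pos for pos, name in enumerate(ordered, start=1)}


def similarity_fusion(
    target: Fingerprint,
    database: Dict[str, Fingerprint],
    measures: List[Callable[[Fingerprint, Fingerprint], float]],
) -> List[str]:
    fused = {name: 0 for name in database}
    for measure in measures:
        # Steps 1 and 2: search with one measure, note each rank position R(i).
        scores = {name: measure(target, fp) for name, fp in database.items()}
        for name, pos in rank_positions(scores).items():
            # Step 3: fusion rule (assumed here: sum of rank positions).
            fused[name] += pos
    # Step 4: final output ranking, smallest fused position first.
    return sorted(fused, key=lambda name: (fused[name], name))


target = frozenset({1, 2, 3, 4})
database = {
    "mol_a": frozenset({1, 2, 3, 4}),
    "mol_b": frozenset({1, 2, 9}),
    "mol_c": frozenset({7, 8}),
}
ranking = similarity_fusion(target, database, [tanimoto, dice])
print(ranking)
```

Any other fusion rule (for example the reciprocal rank rule) could replace the summation inside the loop without changing the rest of the sketch.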

Types of fusion for 2D similarity search
a) Similarity fusion (SF):
SF involves searching a single reference structure against a database using
multiple different similarity measures, and the output is obtained by
combining the rankings resulting from these different measures.

b) Group fusion (GF):
GF involves searching multiple reference structures against a database using a
single similarity measure, and the output is obtained by combining the
rankings resulting from these different reference structures.

Holliday et al.: Multiple search methods for similarity-based virtual screening: analysis of search overlap and
precision. Journal of Cheminformatics 2011, 3:29
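
As a contrast to similarity fusion, group fusion can be sketched as follows. The assumptions are loud ones: fingerprints are sets of bit positions, Tanimoto is the single similarity measure, and the MAX rule (keep each database compound's best score over all reference structures) is used as the fusion rule; the slides do not say which rule was applied, and the molecule names are hypothetical.

```python
from typing import Dict, FrozenSet, List

Fingerprint = FrozenSet[int]  # assumed representation: set of "on" bit positions


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def group_fusion(
    references: List[Fingerprint], database: Dict[str, Fingerprint]
) -> List[str]:
    # One similarity measure, several reference structures.
    # Assumed fusion rule: maximum score over the references (MAX rule).
    fused = {
        name: max(tanimoto(ref, fp) for ref in references)
        for name, fp in database.items()
    }
    return sorted(fused, key=lambda name: (-fused[name], name))


references = [frozenset({1, 2, 3}), frozenset({7, 8, 9})]
database = {
    "mol_a": frozenset({1, 2, 3, 4}),
    "mol_b": frozenset({7, 8, 9}),
    "mol_c": frozenset({1, 7, 20, 21}),
}
gf_ranking = group_fusion(references, database)
print(gf_ranking)
```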


Similarity fusion (SF)

[Figures: (a) WOMBAT top-1% searches; (b) WOMBAT top-5% searches.
(a) MDDR top-1% searches; (b) MDDR top-5% searches.]

Holliday et al.: Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision.
Journal of Cheminformatics 2011, 3:29

Group fusion (GF)

[Figures: (a) WOMBAT top-1% searches; (b) WOMBAT top-5% searches.
(a) MDDR top-1% searches; (b) MDDR top-5% searches.]


Reciprocal Rank method

• Merge compounds using only rank positions

• Rank score of compound i (j: system index)

$$ r(d_i) = \frac{1}{\sum_j \frac{1}{\mathrm{pos}(d_{ij})}} $$


Reciprocal rank example
• 4 systems: A, B, C, D
documents: a, b, c, d, e, f, g
• Query results:
A={a,b,c,d}, B={a,d,b,e},
C={c,a,f,e}, D={b,g,e,f}
• r(a)=1/(1+1+1/2)=0.4
r(b)=1/(1/2+1/3+1)=0.55
r(d)=1/(1/4+1/2)=1.33
r(e)=1/(1/4+1/4+1/3)=1.2
• Final ranking of compounds (smaller r is more relevant):
(most relev) a > b > c > e > d > f > g (least relev)

Nuray, R.; Can, F. Automatic ranking of information retrieval systems using data
fusion. Information Processing and Management 42 (2006) 595–614
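
The reciprocal rank rule can be checked against the four-system example with a short script. This is a minimal sketch, assuming that a document missing from a system's list simply contributes nothing to the sum and that ties are broken alphabetically; the function names are illustrative.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_scores(rankings: Sequence[Sequence[str]]) -> Dict[str, float]:
    # Accumulate 1/pos(d_ij) over every system j that returned document d_i.
    inverse_sums: Dict[str, float] = {}
    for ranking in rankings:
        for position, item in enumerate(ranking, start=1):
            inverse_sums[item] = inverse_sums.get(item, 0.0) + 1.0 / position
    # r(d_i) = 1 / sum_j 1/pos(d_ij); a smaller value means more relevant.
    return {item: 1.0 / total for item, total in inverse_sums.items()}


def fuse(rankings: Sequence[Sequence[str]]) -> List[str]:
    scores = reciprocal_rank_scores(rankings)
    return sorted(scores, key=lambda item: (scores[item], item))


systems = [list("abcd"), list("adbe"), list("cafe"), list("bgef")]  # A, B, C, D
scores = reciprocal_rank_scores(systems)
fused = fuse(systems)

for item in fused:
    print(item, round(scores[item], 3))
```

Here r(a) evaluates to 0.4; a, b and c take the first three places and g the last.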


Sum score

The normalized scores of each ranking are
summed to get the fused score of a compound

              Ranking 1   Ranking 2   Ranking 3   Sum score   Rank
Compound 1    1           0.9         0.7         2.6         1
Compound 2    0.8         0.5         1           2.3         2
Compound 3    0.7         1           0.5         2.2         3
Compound 4    0.2         0           0.1         0.3         4
Compound 5    0           0.3         0           0.3         5
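
A small sketch of the sum score rule, applied to the values in the table. The slide does not state how the scores were normalized; min-max scaling to [0, 1] is assumed here, which leaves the table's values unchanged because every column already spans 0 to 1.

```python
from typing import Dict, List


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    # Assumed normalization: rescale each ranking's scores to [0, 1].
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in scores.items()}


def sum_score(rankings: List[Dict[str, float]]) -> Dict[str, float]:
    fused: Dict[str, float] = {}
    for scores in rankings:
        for name, value in min_max_normalize(scores).items():
            fused[name] = fused.get(name, 0.0) + value
    return fused


names = ["Compound 1", "Compound 2", "Compound 3", "Compound 4", "Compound 5"]
ranking_1 = dict(zip(names, [1.0, 0.8, 0.7, 0.2, 0.0]))
ranking_2 = dict(zip(names, [0.9, 0.5, 1.0, 0.0, 0.3]))
ranking_3 = dict(zip(names, [0.7, 1.0, 0.5, 0.1, 0.0]))

fused_scores = sum_score([ranking_1, ranking_2, ranking_3])
# Highest fused score first.
fused_order = sorted(fused_scores, key=lambda n: (-fused_scores[n], n))
print(fused_order)
```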

Sum rank

• In sum rank, each individual ranking assigns rank
positions so that the highest score receives the
lowest rank. The ranks of each compound are then
summed, and the compounds are re-ranked by this
sum (smallest sum first).

              Ranking 1   Ranking 2   Ranking 3   Sum rank   Rank
Compound 1    1           10          4           15         5
Compound 2    2           5           6           13         3
Compound 3    7           4           3           14         4
Compound 4    2           3           3           8          2
Compound 5    3           2           1           6          1
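
The sum rank rule can be sketched in the same way. The helper names are illustrative, and the rank positions are taken directly from the table; `scores_to_ranks` shows the assumed conversion from raw scores to rank positions (highest score gets rank 1).

```python
from typing import Dict, List, Tuple


def scores_to_ranks(scores: Dict[str, float]) -> Dict[str, int]:
    # Highest score receives rank 1; ties are broken by name.
    ordered = sorted(scores, key=lambda name: (-scores[name], name))
    return {name: position for position, name in enumerate(ordered, start=1)}


def sum_rank(rank_lists: List[Dict[str, int]]) -> Tuple[Dict[str, int], List[str]]:
    totals: Dict[str, int] = {}
    for ranks in rank_lists:
        for name, rank in ranks.items():
            totals[name] = totals.get(name, 0) + rank
    # Smallest rank sum is the best compound.
    ordered = sorted(totals, key=lambda name: (totals[name], name))
    return totals, ordered


names = ["Compound 1", "Compound 2", "Compound 3", "Compound 4", "Compound 5"]
ranking_1 = dict(zip(names, [1, 2, 7, 2, 3]))
ranking_2 = dict(zip(names, [10, 5, 4, 3, 2]))
ranking_3 = dict(zip(names, [4, 6, 3, 3, 1]))

totals, fused_order = sum_rank([ranking_1, ranking_2, ranking_3])
print(totals)
print(fused_order)
```

Compound 5 (rank sum 6) comes out first and Compound 1 (rank sum 15) last.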

Pharmacophore design

To generate the pharmacophoric features we used the energetic
pharmacophore (e-pharmacophore) approach developed by Salam et al.,
with exclusion spheres included.

Pharmacophoric sites were automatically generated with Phase using the
default set of six chemical features: hydrogen bond acceptor (A), hydrogen
bond donor (D), hydrophobic (H), negative ionizable (N), positive ionizable
(P), and aromatic ring (R).

E-Pharmacophores

[Figure panels: E-pharmacophore I, E-pharmacophore II, E-pharmacophore III]

Validation of Pharmacophores
• To judge the quality of a hit list for a query
compound or a pharmacophore, we considered the yield of
active compounds, the enrichment factor, the percentage of
actives and the goodness of hit list (GH score).

• To judge how well a pharmacophore or any other screening
method ranks active compounds “early” in a virtual
screening run, we used the Boltzmann-enhanced
discrimination of receiver operating characteristic
(BEDROC, Truchon et al.) and the RIE metric (Sheridan et al.).
• The validation set consisted of 35 active compounds
randomly sampled from 62 actives, along with 1000 decoys
(www.schrodinger.com/glide_decoy_set).
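
The hit-list measures named above can be written out as a hedged sketch using the commonly cited definitions, which are assumed here rather than taken from the slides: Ht is the number of compounds in the hit list, Ha the number of actives among them, A the number of actives in the database and D the database size. The function and key names are illustrative.

```python
from typing import Dict


def hit_list_metrics(ht: int, ha: int, a: int, d: int) -> Dict[str, float]:
    """ht: hit-list size, ha: actives in hit list, a: actives in database, d: database size."""
    yield_of_actives = ha / ht            # fraction of the hit list that is active
    percent_actives = ha / a              # fraction of all actives that was retrieved
    enrichment = (ha / ht) / (a / d)      # enrichment factor relative to random picking
    # Guner-Henry goodness-of-hit score (standard form, assumed here).
    gh = (ha * (3 * a + ht)) / (4 * ht * a) * (1 - (ht - ha) / (d - a))
    return {
        "yield": yield_of_actives,
        "percent_actives": percent_actives,
        "EF": enrichment,
        "GH": gh,
    }


# Validation set from the slides: 35 actives + 1000 decoys = 1035 compounds.
# Hypothetical hit list: the top 10 compounds (about 1%), all of them active.
metrics = hit_list_metrics(ht=10, ha=10, a=35, d=1035)
print(metrics)
```

With 35 actives among 1035 compounds the enrichment factor cannot exceed 1035/35 ≈ 29.57; that ceiling is reached whenever every selected compound is active.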


Some formulas


Why BEDROC?
• Despite its early recognition sensitivity, the Enrichment Factor has
the drawback of being insensitive to the relative ranking of the
compounds in the top X% and ignoring the complete ranking of the
remaining data set.
• The ROC measure cannot identify the compounds ranked early in a
virtual screening process.

• The BEDROC metric uses an exponential decay function to reduce
the influence of lower-ranked compounds on the final score. Its
parameter α lets the user adjust the definition of the early
recognition problem.

• BEDROC values for the three VS methods were computed at α=20,
which implies that 80% of the final BEDROC score is based on the
first 8% of the ranked data set.
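
The exponential weighting can be made concrete with a minimal sketch. It assumes the usual definitions: RIE is the mean weight exp(-α·rank/N) of the actives divided by the mean weight over all N ranks, and BEDROC rescales RIE between its worst and best possible values. The function names and the example ranks are illustrative.

```python
import math
from typing import Sequence


def rie(active_ranks: Sequence[int], n_total: int, alpha: float = 20.0) -> float:
    # Mean exponential weight of the actives ...
    observed = sum(math.exp(-alpha * r / n_total) for r in active_ranks) / len(active_ranks)
    # ... relative to the mean weight of a uniformly random rank.
    expected = sum(math.exp(-alpha * i / n_total) for i in range(1, n_total + 1)) / n_total
    return observed / expected


def bedroc(active_ranks: Sequence[int], n_total: int, alpha: float = 20.0) -> float:
    n = len(active_ranks)
    best = rie(range(1, n + 1), n_total, alpha)                       # all actives on top
    worst = rie(range(n_total - n + 1, n_total + 1), n_total, alpha)  # all actives at the bottom
    return (rie(active_ranks, n_total, alpha) - worst) / (best - worst)


def early_weight(fraction: float, alpha: float = 20.0) -> float:
    # Share of the total exponential weight carried by the top `fraction` of the list.
    return (1 - math.exp(-alpha * fraction)) / (1 - math.exp(-alpha))


n_total = 1035  # 35 actives + 1000 decoys, as in the validation set
perfect = bedroc(range(1, 36), n_total)
reversed_list = bedroc(range(1001, 1036), n_total)
print(round(perfect, 3), round(reversed_list, 3), round(early_weight(0.08), 3))
```

`early_weight(0.08)` is about 0.8 at α=20, which is the "80% of the score from the first 8%" statement in numerical form.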


Validation of virtual screening

a) E-pharmacophore
E-pharmacophore III was selected based on the performance measures, the
number of retrieved compounds with a fitness above 2, and its high
goodness-of-hit score, yield of actives and specificity.

b) ROCS
All compounds were scored and ranked by TanimotoCombo score;
parameters were selected as described by Bostrom et al.

c) Glide XP
All compounds were scored with the Glide XP docking score and
ranked in descending order of score.

Which pharmacophore is good?

Are sites D8 and R13 important?

[Figure panels: E-pharmacophore I, E-pharmacophore II and E-pharmacophore III, with sites D8 and R13 labelled]

Performance measures

Method                EF(1%)   EF(2%)   EF(5%)   EF(10%)   BEDROC (α=20)   RIE
E-pharmacophore I     11.71    11       10.51    6.8       0.538           7.81
E-pharmacophore II    29.57    27.51    12.14    6.9       0.716           10.40
E-pharmacophore III   29.57    27.14    13.71    7.42      0.744           10.81
vROCS                 29.57    26.71    13.14    7.42      0.749           10.89
GlideXP               26.71    21       11.42    6.28      0.629           9.14
Sum score             29.57    28.57    14.85    7.42      0.785           11.42
Sum rank              29.57    24.28    12       7.42      0.703           10.21
Reciprocal rank       29.57    29.57    17.14    8.85      0.875           12.73

AUC ROC results

Methods               AUC(1%)   AUC(2%)   AUC(5%)   AUC(100%)
E-pharmacophore III   0.56      0.602     0.649     0.832
vROCS                 0.58      0.62      0.62      0.89
GlideXP               0.39      0.44      0.51      0.84
Sum score             0.64      0.6780    0.717     0.90
Sum rank              0.47      0.49      0.565     0.91
Reciprocal rank       0.72      0.75      0.81      0.96

Architecture

[Diagram with the components: Data Preprocessing; System 1, System 2, System 3, System 4; Rescoring and Ranking; Fusion Algorithms; Validation; Decision]


Virtual Screening of Asinex 400K compounds
Workflow

Chemical structure collection
• 400K compounds from Asinex, optimized using LigPrep

3D virtual screening and ranking
• Phase E-pharmacophore: select top 5000 compounds for VS in vROCS and Glide SP
• Conformer generation and ROCS screening
• Glide SP docking

Virtual screening using data fusion
• Data fusion using the reciprocal rank algorithm

Post-processing
• Top 10% of the database selected for Glide XP docking

Compound selection
• 45 compounds selected after visual inspection and pharmacophore mapping

Machine learning models in progress

• Tools used:
a) PowerMV descriptors: 2D pharmacological fingerprints,
Weighted Burden Number and 8 properties
b) MACCS (166 keys)
c) rcdk extended graph-based
d) jCompoundMapper library: PHAP2PT3D, PHAP3PT3D,
CATS3D, CATS2D

None of the descriptors tried so far reproduces the 3D
screening results well.
However, the ML model is promising because it classifies actives and
decoys well with a polynomial-kernel SVM.

PCA Analysis of predicted compounds

• 12 different physicochemical properties were calculated using CDK (http://rguha.net/code/java/cdkdesc.html),
including molecular refractivity, atom polarizabilities, bond polarizabilities, hydrogen bond donors
and acceptors, Petitjean number, topological polar surface area, number of rotatable bonds, lipophilicity
(XLogP), molecular weight, topological shape and geometrical shape.

Hits retrieved after visual inspection and pharmacophore
mapping

Docking of predicted compounds

Tools Used

• Docking and pharmacophore modelling:
Schrödinger's Glide and Phase
• Shape-based screening: vROCS
• Performance calculation and visualization: R
statistics, ggplot2, enrichVS package.

More work

• Working on the design of PknG inhibitors
• Enhanced ranking systems for better
prediction
• Automated protocol for enhanced
virtual screening using open source tools.

Acknowledgements

• Indo-US Science and Technology Forum
• Prof. P. Yogeshwari and Prof. D. Sriram (BITS
Hyderabad)
• Computer Aided Drug Design Lab, BITS Pilani
Hyderabad.
• Prof. David J. Wild
• OSDD Team
