1) The document discusses using molecular fields to select a diverse set of compounds from a large commercial library to create a screening library of 10,000 compounds.
2) A pilot study found some added value to 3D molecular field similarity comparisons over 2D, but calculating a full similarity matrix was impractical.
3) The authors propose an approach using a reduced set of "probe" compounds to identify molecules with different fields, potentially eliminating the need for a full matrix calculation. However, more work is needed to optimize the method and evaluate if it can successfully generate a diverse library.
Tim Cheeseright, Assessing the Similarities of Compound collections using molecular fields: Does it add value?
1. Tim Cheeseright, Mark Mackey, Rob Scoffin, Martin Slater
Assessing the similarity of compound collections using molecular fields: Does it add value?
2. Conclusions
> It works brilliantly
> All synthetic steps gave yields of 100%
> All enrichments were perfect
> All new molecules were sub nM
> All QSARs were totally predictive, q2 = 1.0
> We expect the call from Sweden any day now
3. Conclusions
> Work in progress
> 3D similarity can add value to compound selection
> Full matrix of similarities possibly unnecessary
> Using probes looks like a possible solution
> Not a panacea
4. Agenda & Background
> Fields & similarity
> Generating screening compounds using Fields
> Selecting a 10K “diverse” library for screening from commercial compounds
> Initial thoughts
> Problems
> More Initial thoughts
> A solution but not a complete one
> Conclusions
5. Field Points
Condensed representation of electrostatic, hydrophobic and shape properties (“protein’s view”)
> Molecular Field Extrema (“Field Points”)
[Figure: 2D structure, 3D molecular electrostatic potential (MEP) and Field Points. Field Point legend: electrostatic positive, electrostatic negative, shape, hydrophobic]
6. Improved MM Electrostatics
> Field patterns from XED force field reproduce experimental results
[Figure panels: Experimental; Using XEDs; Not using XEDs]
Interaction of acetone and any -OH from small molecule crystal structures
XED adds ‘p-orbitals’ to get better representation of atoms
12. An Opportunity & a Challenge
> Provide a small, diverse 10K screening library for a small biotech company
> Diversity in potential biological targets to be hit
> Minimum redundancy in the set
> Maximum chance of success in finding a lead within available budget and screening resources
13. Initial thoughts
> Customised design not an option: commercial compounds only
> Using Fields to successfully select compounds for screening performed many times
> Virtual screening
> Always in a specific biological context
> What about using Fields to choose a “diverse” set?
> Possible problem with numbers
> A 10,000-compound library is small
> 9,000,000 commercially available molecules is very large for 3D diversity
14. Initial thoughts
> Compare 3D and 2D similarities for compound collections: are we wasting our time?
> Take a small compound collection
> Full NxN calculation
> 3D method = Fields & Shape
> 2D method = atom pairs
> Compare and Contrast
15. Conformations
> 3D method requires conformations: which one(s) to use?
> What is the similarity of 2 compounds in 3D ?
> Context is important!
> Highest across all conformations?
> Average ?
> Lowest ?
> For 3D, similarity calculation is Nconfs x Nconfs
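The choice raised on this slide (highest, average or lowest similarity across conformations) can be sketched as a reduction over an Nconfs x Nconfs grid of conformer-pair scores. This is only an illustration: the function names and the example scores are hypothetical, and the deck does not state which reduction was finally used.

```python
from statistics import mean


def conformer_pair_count(n_confs_a, n_confs_b):
    """Number of conformer-pair comparisons needed for one pair of compounds."""
    return n_confs_a * n_confs_b


def aggregate_similarity(grid, mode="max"):
    """Collapse a grid of conformer-pair similarities into a single value.

    grid[i][j] is the similarity between conformer i of compound A and
    conformer j of compound B (hypothetical scores, not real field data).
    """
    values = [score for row in grid for score in row]
    if not values:
        raise ValueError("need at least one conformer pair")
    reducers = {"max": max, "mean": mean, "min": min}
    if mode not in reducers:
        raise ValueError(f"unknown mode: {mode!r}")
    return reducers[mode](values)


# Hypothetical scores: 2 conformers of A (rows) x 3 conformers of B (columns)
grid = [
    [0.8, 0.5, 0.4],
    [0.6, 0.3, 0.4],
]

for mode in ("max", "mean", "min"):
    print(mode, aggregate_similarity(grid, mode))
```

The three modes can rank the same pair of compounds very differently, which is why the slide stresses that context matters.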
16. Compound Collection
> BIONET 'Rule of Three' ('Ro3') Fragment Library: “7,907 'Ro3'-compliant fragments”
> Conformation hunt on every fragment
> Maximum of 5 conformations (!)
> Full N x N similarity matrix, 3D & 2D (60 million data points)
> ~30 compounds failed conformation hunting
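As a rough check on the scale quoted above, the size of the full matrix can be worked out directly. The sketch below assumes a symmetric similarity measure and takes the 5-conformer cap as an upper bound for every fragment; `matrix_cost` is a hypothetical helper, not part of any tool named in the deck.

```python
def matrix_cost(n_compounds, max_confs):
    """Return (matrix cells, unique pairs, upper bound on conformer comparisons)."""
    cells = n_compounds * n_compounds
    # A symmetric measure only needs the pairs above the diagonal.
    unique_pairs = n_compounds * (n_compounds - 1) // 2
    # Upper bound: every compound has max_confs conformers.
    conf_comparisons = unique_pairs * max_confs * max_confs
    return cells, unique_pairs, conf_comparisons


cells, pairs, conf_cmp = matrix_cost(7907, 5)
print(f"{cells:,} cells, {pairs:,} unique pairs, up to {conf_cmp:,} conformer comparisons")
```

For 7,907 fragments this gives about 62.5 million cells, in line with the rounded 60 million on the slide, and up to roughly 780 million conformer-pair comparisons for the 3D method.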
17. Problems
> 400Mb of data
> Tedious to use and examine
Pilot study just using the first 500 compounds
> Some chemical families in this area
> Still a large dataset to deal with (250,000 data points)
> 2D similarities and fragments
> Small changes cause disproportionately high changes
> Atom pairs particularly bad
> Switch to KNIME fingerprints
All 2D values lower than “normal”
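One way to see why small changes hit fragments disproportionately is to compute a Tanimoto coefficient on feature sets of different sizes. This is a minimal sketch with made-up integer feature IDs; it assumes a set-based Tanimoto score, which the deck does not confirm as the measure actually used.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient between two sets of fingerprint features."""
    a, b = set(features_a), set(features_b)
    union = a | b
    if not union:
        return 1.0  # two empty fingerprints are treated as identical
    return len(a & b) / len(union)


def with_changed_features(n_features, n_changed):
    """Build a hypothetical pair of feature sets differing in n_changed features."""
    original = set(range(n_features))
    shared = set(range(n_features - n_changed))
    replacements = set(range(n_features, n_features + n_changed))
    return original, shared | replacements


# The same 3-feature change applied to a small fragment and to a larger molecule
fragment = with_changed_features(8, 3)
larger = with_changed_features(40, 3)
print(round(tanimoto(*fragment), 3))
print(round(tanimoto(*larger), 3))
```

With only 8 features the score falls to 5/11, while the same change in a 40-feature molecule leaves it at 37/43, so fragment-sized molecules give systematically lower 2D values.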
22. Example - Higher 3D Sim
Compounds 437 and 440
2D sim = 0.3 (other methods 0.55)
3D field sim = 0.8
23. So…
> Pilot study suggests some added value
> Full matrix painful even if we could calculate it
> What about a reduced matrix?
> Use “Probe” compounds to tease out molecules that are different in Field space
How many probes?
Across how many molecules?
> We were running out of time…
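The reduced-matrix idea can be sketched as scoring every molecule against a short list of probes, so each molecule gets one row of probe similarities instead of a row in the full N x N matrix. Everything named here is hypothetical: `toy_similarity` is a stand-in on plain coordinate tuples, not a field similarity calculation, and the 20,000 x 100 figures are the ones used in the workflow proposed later in the deck.

```python
import math


def toy_similarity(a, b):
    """Hypothetical stand-in for a field similarity score: 1 / (1 + distance)."""
    return 1.0 / (1.0 + math.dist(a, b))


def probe_fingerprints(molecules, probes, similarity=toy_similarity):
    """Return one row per molecule holding its similarity to each probe."""
    return [[similarity(m, p) for p in probes] for m in molecules]


def comparison_counts(n_molecules, n_probes):
    """Pairwise comparisons for the full matrix versus the probe matrix."""
    full = n_molecules * (n_molecules - 1) // 2
    reduced = n_molecules * n_probes
    return full, reduced


full, reduced = comparison_counts(20_000, 100)
print(f"full matrix: {full:,} comparisons; probe matrix: {reduced:,}")
```

For 20,000 molecules and 100 probes the probe matrix needs 2 million comparisons against roughly 200 million for the full matrix, which is what makes the open questions on this slide (how many probes, across how many molecules) worth answering.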
24. Compound selection by Field Diversity
> Proposed workflow for generation of a field diverse library:
9M commercial compounds
> Property filters (to 1.2M)
> Calc. shape diversity by PMI
> Pick 20K sub-set
> Pick 200 sub-set
> Calc. 200 x 200 2D similarity matrix
> Pick 100 diverse Field probes
> Calc. 20K x 100 Field similarity matrix
> 3D PCA on Field matrix
> Pick 12K Field diverse set
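The last two steps of the workflow (PCA on the Field similarity matrix, then a diverse pick) could look roughly like the sketch below. It uses NumPy, random numbers in place of real Field similarities, and a greedy MaxMin picker; MaxMin is an assumption made here for illustration, since the deck does not say how the 12K set was chosen from the 3D space.

```python
import numpy as np


def pca_project(fingerprints, n_components=3):
    """Project rows onto their first principal components via SVD."""
    x = np.asarray(fingerprints, dtype=float)
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def maxmin_pick(coords, n_pick, first=0):
    """Greedy MaxMin: repeatedly take the point farthest from those already picked."""
    coords = np.asarray(coords, dtype=float)
    if not 0 < n_pick <= len(coords):
        raise ValueError("n_pick must be between 1 and the number of points")
    picked = [first]
    min_dist = np.linalg.norm(coords - coords[first], axis=1)
    while len(picked) < n_pick:
        nxt = int(np.argmax(min_dist))
        picked.append(nxt)
        dist_to_new = np.linalg.norm(coords - coords[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return picked


# Hypothetical data: 200 molecules x 100 probe similarities (random stand-in)
rng = np.random.default_rng(0)
fingerprints = rng.random((200, 100))
coords = pca_project(fingerprints)   # 200 x 3 "field space"
subset = maxmin_pick(coords, 12)     # indices of 12 mutually distant molecules
print(coords.shape, len(subset))
```

At the scale in the workflow the same calls would take a 20,000 x 100 matrix and pick 12,000 rows; the loop is then O(N x n_pick), which is still small next to the cost of the similarity calculations themselves.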
25. Field Diverse library: Outcome
12K “Field Diverse” library mapped by 3D PCA on the 100 x 20,000 “Field Similarity Fingerprint”
Distinct separation of charged species within this space
….so what!!
[Figure labels: Ammoniums; Piperidines; Benzoic and aliphatic acids]
26. Field Diverse library: Outcome
12K “Field Diverse” library mapped by 3D PCA
Distinct separation of molecules by size within this space
….so what!!
[Figure label: Decreasing size]
30. Is the chemical space sensible?
Small sulphonamides
Large esters
Two example clusters
31. Conclusions
> Work in progress
> Full similarity matrix shows potential of 3D sim to add value
> Full matrix difficult to handle and possibly unnecessary
> Using probes looks like a possible solution
> Not a panacea - still need to play the numbers game
32. Acknowledgements
> Cresset
> Martin Slater
> Rob Scoffin
> Mark Mackey
> James Melville
> Mission Therapeutics
> Keith Menear
Editor's notes
Notes: The 2D drawing of a molecule gives limited information about its nature – in real life, molecules take on a 3D geometry whose nature can’t be truly represented by a flat cartoon. Consider the electrostatic potential surrounding a molecule and map that potential out to a surface as shown in the second figure. Field Points are points that are placed at the extrema of the MEP, with the point size governed by the size of the electrostatic contribution. Spatial points are also included at the van der Waals radii extrema.
(1) Commercial databases (9 million compounds) filtered for heavy atom count > 11 and < 30, corresponding roughly to Mwt > 140 and < 500 (4,655,051 cpds).
(2) Further filtered for rotatable bond count < 5; reactive group filters applied (removes nasties like aldehydes, ketones, hydrazones, alkyl halides, isocyanates, nitrosyl etc… see below for full list); charge filters < 3 formal charges, negative or positive (1,282,042 mols passed these filters).
(3) For this list of compounds we intend to calculate logP, HBA, HBD, PMI and shadow indices and select 20K on shape diversity. I believe this is going to be a reasonable approximation of field similarity since fields are also heavily dependent on 3D conformation.
(4) From this data we also intend to pick 100 probe molecules and use these to calculate similarity against the 20K set. This gives a 20K set, each compound with a 100-bit field fingerprint. This is the equivalent of completing a 2M virtual screen (20K x 100 comparisons).
(5) This fingerprint can be subjected to a PCA analysis to reduce the data effectively to a 3-dimensional ‘field space’ from which a diverse 12K set can be chosen. From a practical point of view it will be difficult to expand this process to a bigger data set, although if 3D shape sim correlates well with Field sim then the PMI selection may be enough – we simply don’t know until we do the experiment.
(6) We will provide the 12K SD file set for you to purchase, with 2000 cpd redundancy for those which are not available or too expensive etc.
(a) Filtered on properties and nasty functionality to obtain a 1.2 million compound data set.
(b) On this set we ran a PMI shape descriptor calculation on a single ‘lowest energy’ conformation for each molecule in the set.
(c) From this we picked a 20K shape diverse set using the PMI defined shape space.
(d) From the 20K set I picked a diverse 200 cpd set in the same way.
(e) We calculated an all-by-all (200 by 200) 2D similarity matrix on these 200 so that we could ensure 2D dissimilarity in the choice of a set of 100 probe molecules.
(f) These 100 probe molecules were used as templates to measure Field similarity against each of the 20K cpds and thus produce a 100-bit number for each of the 20K cpds.
(g) From the Field similarity matrix we collapsed the ‘~20000 x 100’ matrix to ~20000 x 3 dimensions using PCA to define the 3D fieldspace.
(h) 12K Field diverse compounds were selected from this 3D fieldspace.
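The property filters described in these notes can be sketched as a simple predicate over precomputed descriptors. The `Descriptors` record, its field names and the example molecules are hypothetical. The thresholds are the ones quoted in the notes, and reading “< 3 formal charges” as a count of charged atoms is an assumption. The reactive group check is reduced to a precomputed flag because the full list of groups is not given here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Hypothetical precomputed descriptors for one molecule."""
    name: str
    heavy_atoms: int
    rotatable_bonds: int
    formal_charges: int       # assumed: number of formally charged atoms
    has_reactive_group: bool  # assumed: result of a separate substructure check


def passes_filters(d):
    """Apply the thresholds quoted in the notes."""
    return (
        11 < d.heavy_atoms < 30
        and d.rotatable_bonds < 5
        and d.formal_charges < 3
        and not d.has_reactive_group
    )


# Made-up examples, one failing each filter in turn
library = [
    Descriptors("ok", 20, 3, 0, False),
    Descriptors("too_small", 9, 1, 0, False),
    Descriptors("too_flexible", 25, 7, 0, False),
    Descriptors("too_charged", 22, 2, 3, False),
    Descriptors("reactive", 18, 2, 0, True),
]
kept = [d.name for d in library if passes_filters(d)]
print(kept)
```

In practice the descriptors would come from a cheminformatics toolkit; they are kept as plain fields here so the sketch does not depend on any particular library.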
Should probably have done a 200 x 200 field similarity at this stage to ensure picking field diverse probes? But 2D dissimilarity also ensured we were avoiding picking too similar chemotypes for the probes – probably doesn’t matter. Theoretically, field based metrics should be a good way to assess the similarity/diversity of fragment collections?? Diversity of fragment databases?? In Fieldstere
Never tried using a smaller number of probes – could increase/decrease discrimination?
Picked a cluster set from the space 3D PCA – selected an arbitrary conformer then flexibly aligned (Falign) the rest – plot surface. Bottom 8 Fsim less than 6
Again selected an arbitrary conformer (different one this time) then flexibly aligned (Falign) the rest – plot surface. Bottom 5 Fsim less than 6
Picked a second cluster and repeated with another arbitrary template – Fsims all > 6; discarded 4 which were below 6. Cluster still OK. Conclude: even in this space, clusters of close field similarity are still fairly diverse!!
Separation of chemically intuitive groupings – DHP-like esters/lactones………….compact sulphonamides – clusters on periphery are truly Field dissimilar.