Tim Cheeseright, Mark Mackey, Rob Scoffin, Martin Slater

Assessing the similarity of compound collections
using molecular fields: Does it add value?



Conclusions

> It works brilliantly
> All synthetic steps gave yields of 100%
> All enrichments were perfect
> All new molecules were sub nM
> All QSARs were totally predictive, q2 = 1.0


> We expect the call from Sweden any day now


Conclusions

> Work in progress
> 3D similarity can add value to compound
  selection
> Full matrix of similarities possibly unnecessary
> Using probes looks like a possible solution
> Not a panacea




Agenda & Background

> Fields & similarity
> Generating screening compounds using Fields
> Selecting a 10K “diverse” library for screening
  from commercial compounds
   > Initial thoughts
   > Problems
   > More initial thoughts
   > A solution but not a complete one
> Conclusions
Field Points

Condensed representation of electrostatic, hydrophobic
and shape properties (“protein's view”)
   > Molecular Field Extrema (“Field Points”)

Figure: 2D structure; 3D Molecular Electrostatic Potential (MEP); Field Points
(legend: positive, negative, shape, hydrophobic)
Improved MM Electrostatics

> Field patterns from the XED force field reproduce
  experimental results

Figure: interaction of acetone with any -OH group, from small-molecule
crystal structures. Panels: Experimental; Using XEDs; Not using XEDs
XED adds ‘p-orbitals’ to atoms to give a better representation of them
Non-Classical Comparisons




Molecular Alignment



Figure: example molecular alignments with similarity scores 0.82, 0.66 and 0.98

Cheeseright et al., J. Chem. Inf. Model., 2006, 665
Using Fields

>   Bioisosteric groups
>   Virtual Screening
>   Pharmacophore hypothesis
>   Qualitative SAR interpretation
>   3D QSAR
>   Library Design




Field-based library design success
Libraries from Fields

> Small, custom-synthesised libraries (~100s to
  1000s of compounds)
> Low scaffold diversity
> Highly targeted
> Lots of manual design
An Opportunity & a Challenge

> Provide a small, diverse 10K screening library for
  a small biotech company

  > Diversity in potential biological targets to be hit

  > Minimum redundancy in the set

  > Maximum chance of success in finding a lead within
    available budget and screening resources
Initial thoughts

> Customised design not an option: commercial
  compounds only
> Fields have been used successfully many times to select
  compounds for screening
   > Virtual screening
   > Always in a specific biological context
> What about using Fields to choose a “diverse” set?
> Possible problem with numbers
   > A 10,000-compound library is small
   > 9,000,000 commercially available molecules is very large for 3D
     diversity calculations
Initial thoughts

> Compare 3D and 2D similarities for compound
  collections: are we wasting our time?
> Take a small compound collection
> Full NxN calculation
> 3D method = Fields & Shape
> 2D method = atom pairs
> Compare and contrast
Conformations

> The 3D method requires conformations: which
  one(s) to use?
> What is the similarity of 2 compounds in 3D?
  > Context is important!
  > Highest across all conformations?
  > Average?
  > Lowest?
> For 3D, each compound pair needs Nconfs x Nconfs
  conformer comparisons
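The slide leaves open how a compound-level 3D similarity should be derived from conformer-level scores. The following is a minimal Python sketch. It assumes a hypothetical `conformer_sim` callable that scores one conformer pair and stands in for a Fields & Shape comparison, which is not shown. The sketch builds the Nconfs x Nconfs matrix and reduces it to the highest, average or lowest value:

```python
from statistics import mean
from typing import Callable, Sequence


def conformer_matrix(confs_a: Sequence, confs_b: Sequence,
                     conformer_sim: Callable[[object, object], float]) -> list[list[float]]:
    """Score every conformer of A against every conformer of B (Nconfs x Nconfs)."""
    return [[conformer_sim(a, b) for b in confs_b] for a in confs_a]


def compound_similarity(matrix: list[list[float]], mode: str = "max") -> float:
    """Collapse a conformer-pair matrix to one compound-level score."""
    values = [v for row in matrix for v in row]
    reducers = {"max": max, "mean": mean, "min": min}
    if mode not in reducers:
        raise ValueError(f"unknown mode: {mode!r}")
    return reducers[mode](values)


# Hypothetical toy stand-in: conformers are plain numbers and
# similarity decays with the distance between them.
def toy_sim(a: float, b: float) -> float:
    return 1.0 / (1.0 + abs(a - b))


m = conformer_matrix([0.0, 1.0], [0.0, 3.0], toy_sim)
for mode in ("max", "mean", "min"):
    print(mode, round(compound_similarity(m, mode), 3))
```

Each reducer answers a different question, which is why context matters. "max" asks whether the two compounds can look alike at all. "mean" and "min" reward pairs whose conformers agree broadly.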
Compound Collection

> BIONET 'Rule of Three' ('Ro3') Fragment
  Library: “7,907 'Ro3'-compliant fragments”
> Conformation hunt on every fragment
  > Maximum of 5 conformations (!)
> Full N x N similarity matrix, 3D & 2D (~60 million
  data points)

> ~30 compounds failed conformation hunting
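The "60 million data points" figure follows directly from N squared. Below is a small counting sketch. It assumes every fragment has the full 5 conformers, so the conformer figure is an upper bound. It also ignores the ~30 conformation-hunt failures.

```python
def full_matrix_cost(n_compounds: int, max_confs: int) -> dict[str, int]:
    """Rough size of an all-against-all 3D similarity calculation."""
    full_entries = n_compounds ** 2                        # every cell of the N x N matrix
    unique_pairs = n_compounds * (n_compounds - 1) // 2    # upper triangle, no self-pairs
    conformer_comparisons = unique_pairs * max_confs ** 2  # worst case: Nconfs x Nconfs per pair
    return {
        "full_entries": full_entries,
        "unique_pairs": unique_pairs,
        "conformer_comparisons": conformer_comparisons,
    }


bionet = full_matrix_cost(7907, 5)   # the BIONET Ro3 collection
pilot = full_matrix_cost(500, 5)     # the 500-compound pilot set
print(bionet)
print(pilot["full_entries"])
```

Results for the three collection sizes:

- **BIONET (7,907 fragments):** 62,520,649 matrix cells per metric and at most 781,409,275 conformer-pair comparisons.
- **500-compound pilot set:** 250,000 cells.
- **9,000,000 commercial compounds:** about 4 x 10^13 unique pairs.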
Problems

> 400 MB of data
> Tedious to use and examine
> Pilot study using just the first 500 compounds
   > Some chemical families in this area
   > Still a large dataset to deal with (250,000 data points)
> 2D similarities and fragments
   > Small structural changes cause disproportionately large changes in 2D similarity
   > Atom pairs particularly bad
   > Switched to KNIME fingerprints
      > All 2D values lower than “normal”
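The point about fragments can be made concrete with the Tanimoto coefficient commonly used to compare 2D fingerprints. The bit sets below are invented for illustration and are not atom-pair or KNIME fingerprints. The sketch only assumes that a small fragment sets far fewer bits than a larger molecule.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) coefficient between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Invented bit sets: a fragment sets few bits, a larger molecule sets many.
fragment = set(range(5))
fragment_analogue = (fragment - {4}) | {100}      # one feature swapped
drug_like = set(range(50))
drug_like_analogue = (drug_like - {49}) | {100}   # the same single swap

print(round(tanimoto(fragment, fragment_analogue), 3))
print(round(tanimoto(drug_like, drug_like_analogue), 3))
```

The same one-feature swap behaves very differently at the two sizes. It drops the fragment pair to 4/6 (about 0.67), but the larger pair only to 49/51 (about 0.96).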
Comparing 2D and 3D metrics


Figure: 2D vs 3D similarity scores, with a region labelled “Agreement”
Example - Similar Scores



Compounds 101 and 104: 2D sim = 0.9; 3D field sim = 0.87
Example - Higher 3D Sim



2D sim = 0.1 (other methods = 0.3); 3D field sim = 0.82
Example - Higher 3D Sim



Compounds 141 and 454: 2D sim = 0.2; 3D sim = 0.7
Example - Higher 3D Sim



Compounds 437 and 440: 2D sim = 0.3 (other methods 0.55); 3D field sim = 0.8
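The example pairs can be sorted into "agreement" and "higher 3D" programmatically. Below is a sketch using the scores quoted on these slides. The margin of 0.2 is a hypothetical cut-off, since the slides do not state one. The 2D (KNIME fingerprint) and 3D field scores are also not on identical scales, so any threshold is a judgement call.

```python
def compare_metrics(sim_2d: float, sim_3d: float, margin: float = 0.2) -> str:
    """Label a compound pair by how far its 3D score departs from its 2D score."""
    delta = sim_3d - sim_2d
    if delta > margin:
        return "higher 3D"
    if delta < -margin:
        return "higher 2D"
    return "agreement"


# (2D sim, 3D field sim) pairs as quoted on the example slides.
examples = {
    ("101", "104"): (0.9, 0.87),
    ("141", "454"): (0.2, 0.7),
    ("437", "440"): (0.3, 0.8),
}
for pair, (s2d, s3d) in examples.items():
    print(pair, compare_metrics(s2d, s3d))
```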
So…

> Pilot study suggests some added value
> Full matrix painful even if we could calculate it

> What about a reduced matrix?
   > Use “probe” compounds to tease out molecules that are
     different in Field space
      > How many probes?
      > Across how many molecules?


> We were running out of time…
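The probe idea replaces the full N x N matrix with an N x P one. Each compound is described by its similarity to a small set of P probe compounds. Below is a minimal sketch with the following hypothetical choices, none of which the slides specify:

- **Similarity:** `toy_sim` stands in for a 3D field comparison.
- **Distance:** probe profiles are compared by Euclidean distance.

```python
import math
from typing import Callable, Optional, Sequence


def probe_fingerprint(compound: object, probes: Sequence[object],
                      sim: Callable[[object, object], float]) -> list[float]:
    """One row of the N x P matrix: the compound's similarity to each probe."""
    return [sim(compound, p) for p in probes]


def fingerprint_distance(fp_a: Sequence[float], fp_b: Sequence[float]) -> float:
    """Euclidean distance between two probe fingerprints (hypothetical choice)."""
    return math.dist(fp_a, fp_b)


def similarity_calls(n_compounds: int, n_probes: Optional[int] = None) -> int:
    """3D comparisons needed: N x P with probes, N x N for the full matrix."""
    return n_compounds * (n_compounds if n_probes is None else n_probes)


# Hypothetical toy stand-in for a 3D comparison: compounds and probes are numbers.
def toy_sim(a: float, b: float) -> float:
    return 1.0 / (1.0 + abs(a - b))


probes = [0.0, 2.0, 4.0]
fps = {c: probe_fingerprint(c, probes, toy_sim) for c in (0.0, 0.1, 5.0)}
near = fingerprint_distance(fps[0.0], fps[0.1])
far = fingerprint_distance(fps[0.0], fps[5.0])
print(near < far)
print(similarity_calls(20_000, 100), similarity_calls(20_000))
```

With 20K compounds and 100 probes, this is 2,000,000 comparisons instead of 400,000,000 for a full 20K x 20K matrix.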
Compound selection by Field Diversity

> Proposed workflow for generation of a field diverse library:
   > 9M commercial compounds
   > Property filters: 1.2M compounds remain
   > Calc. shape diversity by PMI (principal moments of inertia);
     pick a 20K sub-set
   > Pick a 200-compound sub-set; calc. 200 x 200 2D similarity matrix
   > Pick 100 diverse Field probes
   > Calc. 20K x 100 Field similarity matrix
   > 3D PCA on Field matrix
   > Pick 12K Field diverse set
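The workflow picks 100 diverse probes from a 200 x 200 2D similarity matrix but does not say how. Below is a sketch using greedy MaxMin selection, a common diversity picker swapped in here as an assumption; it is not necessarily what was used. The seed compound and the tie-breaking rule (lowest index wins) are arbitrary.

```python
from typing import Sequence


def maxmin_pick(sim: Sequence[Sequence[float]], n_pick: int, seed: int = 0) -> list[int]:
    """Greedy MaxMin selection on a square similarity matrix.

    Repeatedly adds the compound whose highest similarity to anything
    already picked is lowest, i.e. the one least like the current set.
    """
    n = len(sim)
    if n == 0 or n_pick <= 0:
        return []
    picked = [seed]
    chosen = {seed}
    nearest = list(sim[seed])  # best similarity of each compound to the picked set
    while len(picked) < min(n_pick, n):
        nxt = min((i for i in range(n) if i not in chosen), key=lambda i: nearest[i])
        picked.append(nxt)
        chosen.add(nxt)
        nearest = [max(a, b) for a, b in zip(nearest, sim[nxt])]
    return picked


# Toy matrix: compounds 0/1 are near-duplicates, as are 2/3.
toy = [[1.0, 0.9, 0.1, 0.1],
       [0.9, 1.0, 0.1, 0.1],
       [0.1, 0.1, 1.0, 0.9],
       [0.1, 0.1, 0.9, 1.0]]
print(maxmin_pick(toy, 2))
```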
Field Diverse library: Outcome

12K “Field Diverse” library mapped by 3D PCA on the
100 x 20,000 “Field Similarity Fingerprint”

Figure labels: Ammoniums; Piperidines; Benzoic and aliphatic acids

Distinct separation of charged species within
this space

…so what!!
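The map on this slide comes from a 3D PCA of the probe-similarity fingerprint. Below is a dependency-free sketch of such a projection. It assumes rows are compounds and columns are probe similarities. It finds the leading components by power iteration with Hotelling deflation. That is one way to compute a PCA, not necessarily what the authors' software does. For 20,000 rows a linear-algebra library would be much faster.

```python
import math
import random


def center(rows):
    """Subtract each column's mean so the data are centred on the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(rows):
    """Sample covariance (d x d) of already-centred rows (needs at least 2 rows)."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(mat, v):
    return [sum(a * b for a, b in zip(row, v)) for row in mat]


def top_eigenpair(mat, iters=500, seed=0):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in mat]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = mat_vec(mat, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance left to extract
            break
        v = [x / norm for x in w]
    value = sum(a * b for a, b in zip(v, mat_vec(mat, v)))
    return value, v


def pca_project(rows, k=3):
    """Coordinates of each row on its first k principal components."""
    x = center(rows)
    cov = covariance(x)
    components = []
    for _ in range(k):
        value, vec = top_eigenpair(cov)
        components.append(vec)
        # Hotelling deflation: remove this component before finding the next one.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(len(vec))]
               for i in range(len(vec))]
    return [[sum(a * b for a, b in zip(r, c)) for c in components] for r in x]


# Toy data: 5 "compounds" x 3 "probe similarities", varying along one direction.
rows = [[float(t), 2.0 * t, 0.0] for t in range(5)]
coords = pca_project(rows, k=3)
print([round(c[0], 3) for c in coords])
```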
Field Diverse library: Outcome

12K “Field Diverse” library mapped by 3D PCA

Figure label: Decreasing Size

Distinct separation of molecules by size within
this space

…so what!!
Deeper - Moderate “Field Similarity”

Figure: alignment to “template1”
Deeper - Moderate “Field Similarity”

Figure panels: random selection of mols; alignment to “template1”
Deeper - Moderate “Field Similarity”

Figure: alignment to “template”
Is the chemical space sensible?

Figure: two example clusters, small sulphonamides and large esters
Conclusions

> Work in progress
> Full similarity matrix shows the potential of 3D similarity to
  add value
> Full matrix difficult to handle and possibly
  unnecessary
> Using probes looks like a possible solution
> Not a panacea: still need to play the numbers
  game
Acknowledgements

> Cresset
  > Martin Slater
  > Rob Scoffin
  > Mark Mackey
  > James Melville
> Mission Therapeutics
  > Keith Menear





🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 

Tim Cheeseright, Assessing the Similarities of Compound collections using molecular fields: Does it add value?

  • 1. Tim Cheeseright, Mark Mackey, Rob Scoffin, Martin Slater. Assessing the similarity of compound collections using molecular fields: Does it add value?
  • 2. Conclusions > It works brilliantly > All synthetic steps gave yields of 100% > All enrichments were perfect > All new molecules were sub-nM > All QSARs were totally predictive, q² = 1.0 > We expect the call from Sweden any day now
  • 3. Conclusions > Work in progress > 3D similarity can add value to compound selection > Full matrix of similarities possibly unnecessary > Using probes looks like a possible solution > Not a panacea
  • 4. Agenda & Background > Fields & similarity > Generating screening compounds using Fields > Selecting a 10K “diverse” library for screening from commercial compounds > Initial thoughts > Problems > More initial thoughts > A solution, but not a complete one > Conclusions
  • 5. Field Points: condensed representation of electrostatic, hydrophobic and shape properties (“protein's view”) > Molecular Field Extrema (“Field Points”) [Figure: 2D structure → 3D molecular electrostatic potential (MEP) → field points; legend: positive, negative, shape, hydrophobic]
  • 6. Improved MM Electrostatics > Field patterns from the XED force field reproduce experimental results [Figure: interaction of acetone and any-OH from small-molecule crystal structures; panels: experimental, using XEDs, not using XEDs] > XED adds ‘p-orbitals’ to get a better representation of atoms
  • 8. Molecular Alignment [Figure: example alignments with scores 0.82, 0.66 and 0.98] (Cheeseright et al., J. Chem. Inf. Model., 2006, 665)
  • 9. Using Fields > Bioisosteric groups > Virtual screening > Pharmacophore hypothesis > Qualitative SAR interpretation > 3D QSAR > Library design
  • 10. Field-based library design success
  • 11. Libraries from Fields > Small, custom-synthesised libraries (~100s to 1000s of compounds) > Low scaffold diversity > Highly targeted > Lots of manual design
  • 12. An Opportunity & a Challenge > Provide a small (10K) diverse screening library for a small biotech company > Diversity in potential biological targets to be hit > Minimum redundancy in the set > Maximum chance of success in finding a lead within the available budget and screening resources
  • 13. Initial thoughts > Customised design not an option: commercial compounds only > Fields have been used successfully many times to select compounds for screening > Virtual screening > Always in a specific biological context > What about using Fields to choose a ‘diverse’ set? > Possible problem with numbers > A 10,000-compound library is small > 9,000,000 commercially available molecules is very large for 3D diversity
  • 14. Initial thoughts > Compare 3D and 2D similarities for compound collections: are we wasting our time? > Take a small compound collection > Full N × N calculation > 3D method = Fields & shape > 2D method = atom pairs > Compare and contrast
  • 15. Conformations > 3D method requires conformations: which one(s) to use? > What is the similarity of 2 compounds in 3D? > Context is important! > Highest across all conformations? > Average? > Lowest? > For 3D, the similarity calculation is Nconfs × Nconfs
  • 16. Compound Collection > BIONET 'Rule of Three' ('Ro3') Fragment Library: “7,907 'Ro3'-compliant fragments” > Conformation hunt on every fragment > Maximum of 5 conformations (!) > Full N × N similarity matrix, 3D & 2D (60 million data points) > ~30 compounds failed conformation hunting
  • 17. Problems > 400 MB of data > Tedious to use and examine > Pilot study using just the first 500 compounds > Some chemical families in this area > Still a large dataset to deal with (250,000 data points) > 2D similarities and fragments > Small changes cause disproportionately large changes > Atom pairs particularly bad > Switch to KNIME fingerprints > All 2D values lower than ‘normal’
  • 18. Comparing 2D and 3D metrics [Figure: agreement between 2D and 3D similarity]
  • 19. Example: similar scores > Compounds 101 and 104 > 2D sim = 0.9 > 3D field sim = 0.87
  • 20. Example: higher 3D sim > 2D sim = 0.1 (other methods = 0.3) > 3D field sim = 0.82
  • 21. Example: higher 3D sim > Compounds 141 and 454 > 2D sim = 0.2 > 3D sim = 0.7
  • 22. Example: higher 3D sim > Compounds 437 and 440 > 2D sim = 0.3 (other methods = 0.55) > 3D field sim = 0.8
  • 23. So… > Pilot study suggests some added value > Full matrix painful even if we could calculate it > What about a reduced matrix? > Use ‘probe’ compounds to tease out molecules that are different in Field space > How many probes? > Across how many molecules? > We were running out of time…
  • 24. Compound selection by Field Diversity > Proposed workflow for generation of a field-diverse library: 9M commercial compounds → property filters → 1.2M subset → calc. PMI shape descriptors → pick 20K shape-diverse subset → pick 200 subset → calc. 200 × 200 2D similarity matrix → pick 100 diverse probes → calc. 20K × 100 field similarity matrix → 3D PCA on field matrix → pick 12K field-diverse set
  • 25. Field Diverse library: Outcome > 12K ‘Field Diverse’ library mapped by 3D PCA on the 100 × 20,000 ‘Field Similarity Fingerprint’ > Distinct separation of charged species within this space …so what!! [Figure labels: ammoniums, piperidines, benzoic and aliphatic acids]
  • 26. Field Diverse library: Outcome > 12K ‘Field Diverse’ library mapped by 3D PCA > Distinct separation of molecules by size within this space …so what!! [Figure label: decreasing size]
  • 27. Deeper: moderate ‘Field Similarity’ > Alignment to ‘template1’
  • 28. Deeper: moderate ‘Field Similarity’ > Random selection of mols > Alignment to ‘template1’
  • 29. Deeper: moderate ‘Field Similarity’ > Alignment to ‘template’
  • 30. Is the chemical space sensible? > Two example clusters: small sulphonamides, large esters
  • 31. Conclusions > Work in progress > Full similarity matrix shows potential of 3D sim to add value > Full matrix difficult to handle and possibly unnecessary > Using probes looks like a possible solution > Not a panacea: still need to play the numbers game
  • 32. Acknowledgements > Cresset > Martin Slater > Rob Scoffin > Mark Mackey > James Melville > Mission Therapeutics > Keith Menear
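
Slides 15 and 16 raise two practical questions: how to turn Nconfs × Nconfs conformer comparisons into a single compound-pair score, and how large the full matrix becomes. The Python sketch below is a hypothetical illustration, not Cresset's implementation: `compound_similarity` and `matrix_cost` are made-up names, the conformer-pair similarities are invented numbers, and the only real inputs are the 7,907-fragment collection (up to 5 conformations each) and the 500-compound pilot from the slides.

```python
"""Hypothetical sketch: collapsing conformer-pair similarities into one
compound-pair score, and sizing the full N x N calculation."""

from statistics import mean


def compound_similarity(conf_sims, mode="max"):
    """Collapse an Nconfs_a x Nconfs_b grid of conformer-pair similarities
    into a single compound-pair value using 'max', 'mean' or 'min'."""
    flat = [s for row in conf_sims for s in row]
    if not flat:
        raise ValueError("need at least one conformer pair")
    if mode == "max":
        return max(flat)
    if mode == "mean":
        return mean(flat)
    if mode == "min":
        return min(flat)
    raise ValueError(f"unknown mode: {mode!r}")


def matrix_cost(n_compounds, max_confs):
    """Return (full N x N entries, unique off-diagonal pairs,
    worst-case conformer comparisons for the full matrix)."""
    full = n_compounds * n_compounds
    unique = n_compounds * (n_compounds - 1) // 2
    conf_comparisons = full * max_confs * max_confs
    return full, unique, conf_comparisons


if __name__ == "__main__":
    # Invented similarities for a 2-conformer x 3-conformer compound pair
    grid = [[0.62, 0.71, 0.55],
            [0.48, 0.80, 0.66]]
    for mode in ("max", "mean", "min"):
        print(mode, round(compound_similarity(grid, mode), 3))
    print(matrix_cost(7907, 5))  # full BIONET Ro3 collection
    print(matrix_cost(500, 5))   # pilot study
```

With 7,907 compounds the full matrix has 62,520,649 entries (the "60 million data points" of slide 16), and allowing up to 5 conformations per compound multiplies the 3D work by as much as 25. Which aggregation is appropriate is exactly the open "context is important!" question on slide 15.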

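Slides 17 to 22 make two points about 2D similarity: fingerprint scores for small fragments swing sharply on small structural changes, and some pairs score much higher in 3D than in 2D. The sketch below illustrates both with a plain Tanimoto coefficient on sets of on-bits; it is not the atom-pair or KNIME fingerprint used in the talk, and the 0.4 gap used to flag "higher 3D sim" pairs is an arbitrary assumption. The 2D/3D score pairs are those quoted on slides 19 to 22, labelled by the compound numbers shown there.

```python
"""Sketch: Tanimoto similarity on small vs large bit sets, and flagging pairs
whose 3D field similarity is much higher than their 2D similarity."""


def tanimoto(a, b):
    """Tanimoto coefficient |A & B| / |A | B| for fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0  # two empty fingerprints: treat as identical


def flag_3d_gain(pairs, min_gap=0.4):
    """Labels of pairs whose 3D similarity exceeds the 2D similarity by >= min_gap.

    min_gap = 0.4 is a hypothetical cut-off, not a value from the talk."""
    return [label for label, sim_2d, sim_3d in pairs if sim_3d - sim_2d >= min_gap]


# The same single-feature swap in a fragment-sized (10-bit) and a larger (100-bit) fingerprint
small_a = set(range(10))
small_b = (small_a - {9}) | {10}
large_a = set(range(100))
large_b = (large_a - {99}) | {100}

# (label, 2D sim, 3D field sim) as quoted on slides 19 to 22
SLIDE_EXAMPLES = [
    ("101/104", 0.9, 0.87),
    ("slide 20 pair", 0.1, 0.82),
    ("141/454", 0.2, 0.7),
    ("437/440", 0.3, 0.8),
]

if __name__ == "__main__":
    print(round(tanimoto(small_a, small_b), 3))
    print(round(tanimoto(large_a, large_b), 3))
    print(flag_3d_gain(SLIDE_EXAMPLES))
```

Swapping one feature in the 10-bit fingerprint drops the score to 9/11 ≈ 0.82, while the same swap in the 100-bit fingerprint leaves it at 99/101 ≈ 0.98, which is one way small changes produce disproportionately large changes for fragments.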
Speaker notes

  1. Notes: The 2D drawing of a molecule gives limited information about its nature: in real life, molecules take on a 3D geometry whose nature can't be truly represented by a flat cartoon. Consider the electrostatic potential surrounding a molecule and map that potential out to a surface, as shown in the second figure. Field Points are placed at the extrema of the MEP, with the point size governed by the size of the electrostatic contribution. Spatial points are also included at the van der Waals radii extrema.
  2. (1) Commercial databases (9 million compounds) filtered for heavy atom count > 11 and < 30, corresponding roughly to MW > 140 and < 500 (4,655,051 cpds). (2) Further filtered for rotatable bond count < 5; reactive group filters applied (removes nasties like aldehydes, ketones, hydrazones, alkyl halides, isocyanates, nitrosyl etc.; see below for full list); charge filter of fewer than 3 formal charges, negative or positive (1,282,042 mols passed these filters). (3) For this list of compounds we intend to calculate logP, HBA, HBD, PMI and shadow indices and select 20K on shape diversity. I believe this is going to be a reasonable approximation of field similarity, since fields are also heavily dependent on 3D conformation. (4) From this data we also intend to pick 100 probe molecules and use these to calculate similarity against the 20K set. This gives a 20K set, each with a 100-element field fingerprint. This is the equivalent of completing a 2M-compound virtual screen. (5) This fingerprint can be subjected to PCA to reduce the data effectively to a 3-dimensional ‘field space’ from which a diverse 12K set can be chosen. From a practical point of view it will be difficult to expand this process to a bigger data set, although if 3D shape similarity correlates well with field similarity then the PMI selection may be enough; we simply don't know until we do the experiment. (6) We will provide the 12K set as an SD file for you to purchase, with 2,000-compound redundancy for those which are not available or too expensive etc.
     (a) Filtered on properties and nasty functionality to obtain a 1.2 million compound data set. (b) On this set we ran a PMI shape descriptor calculation on a single ‘lowest energy’ conformation for each molecule. (c) From this we picked a 20K shape-diverse set using the PMI-defined shape space. (d) From the 20K set I picked a diverse 200-compound set in the same way. (e) We calculated an all-by-all (200 × 200) 2D similarity matrix for these 200, so that we could ensure 2D dissimilarity in the choice of a set of 100 probe molecules. (f) These 100 probe molecules were used as templates to measure field similarity against each of the 20K compounds, producing a 100-element vector for each of them. (g) Using PCA, we collapsed the ~20,000 × 100 field similarity matrix to ~20,000 × 3 dimensions to define the 3D field space. (h) 12K field-diverse compounds were selected from this 3D field space.
  3. Theoretically, field-based metrics should be a good way to assess the similarity/diversity of fragment collections? Diversity of fragment databases? In Fieldstere
  4. Should probably have done a 200 × 200 field similarity at this stage to ensure picking field-diverse probes? But 2D dissimilarity also ensured we were avoiding picking too-similar chemotypes for the probes; probably doesn't matter. Theoretically, field-based metrics should be a good way to assess the similarity/diversity of fragment collections? Diversity of fragment databases? In Fieldstere. Never tried using a smaller number of probes: could increase/decrease discrimination?
  5. Picked a cluster from the 3D PCA space, selected an arbitrary conformer as template, then flexibly aligned (Falign) the rest and plotted surfaces. Bottom 8: Fsim less than 6.
  6. Picked a cluster from the 3D PCA space, selected an arbitrary conformer as template, then flexibly aligned (Falign) the rest and plotted surfaces. Bottom 8: Fsim less than 6.
  7. Again selected an arbitrary conformer (a different one this time), then flexibly aligned (Falign) the rest and plotted surfaces. Bottom 5: Fsim less than 6.
  8. Picked a second cluster and repeated with another arbitrary template: Fsims all > 6 after discarding 4 which were below 6. Cluster still OK. Conclusion: even in this space, clusters of close field similarity are still fairly diverse!!
  9. Separation of chemically intuitive groupings: DHP-like esters/lactones … compact sulphonamides. Clusters on the periphery are truly field-dissimilar.
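
Note 2 describes the selection pipeline in steps (a) to (h). The NumPy sketch below mirrors its shape at toy scale under several assumptions not stated in the talk: the PMI shape space is taken to be the normalised PMI ratios (I1/I3, I2/I3), diversity picking uses greedy MaxMin, PCA is done by SVD, the 2D-dissimilarity probe choice of step (e) is replaced by a second MaxMin pick, and the field similarities of step (f), computed with Cresset's software in the original work, are replaced by random stand-in numbers. All function names are hypothetical.

```python
"""Toy-scale sketch of note 2, steps (b)-(h), on synthetic data (assumptions above)."""
import numpy as np


def npr(coords, masses=None):
    """Normalised PMI ratios (I1/I3, I2/I3) for one conformation (n_atoms x 3)."""
    coords = np.asarray(coords, dtype=float)
    m = np.ones(len(coords)) if masses is None else np.asarray(masses, dtype=float)
    r = coords - np.average(coords, axis=0, weights=m)  # move centre of mass to origin
    outer = np.einsum("i,ij,ik->jk", m, r, r)            # sum_i m_i r_i r_i^T
    tensor = np.trace(outer) * np.eye(3) - outer         # inertia tensor
    i1, i2, i3 = np.sort(np.linalg.eigvalsh(tensor))     # principal moments, ascending
    return i1 / i3, i2 / i3                              # rod (0, 1), disc (0.5, 0.5), sphere (1, 1)


def maxmin_pick(points, k, start=0):
    """Greedy MaxMin: repeatedly pick the point farthest from everything picked so far."""
    points = np.asarray(points, dtype=float)
    k = min(k, len(points))
    picked = [start]
    dist = np.linalg.norm(points - points[start], axis=1)
    dist[start] = -np.inf                                # never re-pick a chosen point
    while len(picked) < k:
        nxt = int(np.argmax(dist))
        picked.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(points - points[nxt], axis=1))
        dist[nxt] = -np.inf
    return picked


def pca_project(matrix, n_components=3):
    """Project rows of an (n_compounds x n_probes) matrix onto its top principal components."""
    x = np.asarray(matrix, dtype=float)
    x = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:n_components].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # (a)-(b) Stand-in "lowest-energy conformations": random 15-atom point clouds,
    # stretched along the axes so that rod-, disc- and sphere-like shapes all occur
    confs = [rng.normal(size=(15, 3)) * rng.uniform(0.2, 2.0, size=3) for _ in range(600)]
    shape_space = np.array([npr(c) for c in confs])
    # (c) Shape-diverse subset (20K in the talk; 120 here)
    shape_set = maxmin_pick(shape_space, 120)
    # (d)-(e) Probe molecules (100 in the talk, chosen by 2D dissimilarity; 10 here)
    probes = maxmin_pick(shape_space[shape_set], 10)
    # (f) Field-similarity fingerprint, one column per probe (random stand-in values)
    field_fp = rng.uniform(0.0, 1.0, size=(len(shape_set), len(probes)))
    # (g) Collapse to a 3D "field space" with PCA
    field_space = pca_project(field_fp, 3)
    # (h) Field-diverse final pick (12K in the talk; 50 here), as indices into confs
    final = [shape_set[i] for i in maxmin_pick(field_space, 50)]
    print(shape_space.shape, field_space.shape, len(final), len(set(final)))
```

Each MaxMin pick above costs one distance evaluation per compound, so selecting k compounds from N takes O(N·k) work; the notes already flag that expanding the process to bigger data sets would be difficult.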