Library design and series selection by Pareto ranking. Willem P. van Hoorn & Robert T. Smith, Pfizer Global Research and Development, Sandwich, United Kingdom
Content
Pareto ranking, the art of compromise. Axes: X and Y. You want to be here (high X, high Y), but these are your compounds. Two gold compounds are better than the black compound: it is 'dominated', as would be all other compounds in the shaded area. Five compounds are special (gold): going from one to the other, you can improve X or Y but not both. They are the best compromises (the Pareto front).
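The dominance rule on this slide can be sketched in a few lines. This is a minimal illustration, assuming higher is better for both X and Y; the function names and the example scores are hypothetical and are not taken from the slides.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b in X and Y, and strictly better in one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(points: Sequence[Point]) -> List[Point]:
    """Return the points that no other point dominates (higher is better)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (X, Y) scores: five compromises plus two dominated compounds.
compounds = [(1.0, 9.0), (3.0, 8.0), (5.0, 6.0), (7.0, 3.0), (9.0, 1.0),
             (4.0, 5.0), (2.0, 2.0)]
front = pareto_front(compounds)
print(front)
```

The pairwise comparison is quadratic in the number of compounds, which is acceptable for a sketch but not for a large virtual library.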
Why not apply cut-offs? Cut-off region: X > cut-off and Y > cut-off. Applying cut-offs results in settling five times for nearly the same (mediocre) compromise. The 5 compounds on the Pareto front make, on average, the same compromise, but sample much more space.
Pareto ranking, a summary
Increasing the speed of Pareto ranking (1)
Increasing the speed of Pareto ranking (2). Approximate the 2D Pareto front as a circle. The best compounds are furthest away from the worst (X, Y), the point marked by the worst value of X and the worst value of Y. Linear-scaling calculation: R² = ΔX² + ΔY², where ΔX and ΔY are the distances from the worst values of X and Y. The distance is also defined for higher dimensions.
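A minimal sketch of the R2 score described on this slide, assuming that higher scores are better, so the worst value of each objective is its minimum over the set. The function names are hypothetical, and the code takes any number of objectives because the slide notes that the distance is also defined in higher dimensions.

```python
from typing import List, Sequence


def r2_scores(points: Sequence[Sequence[float]]) -> List[float]:
    """Squared distance of each point from the worst value in every dimension."""
    n_dims = len(points[0])
    # Assumption: higher is better, so the worst value is the minimum.
    worst = [min(p[d] for p in points) for d in range(n_dims)]
    return [sum((p[d] - worst[d]) ** 2 for d in range(n_dims)) for p in points]


def rank_by_r2(points: Sequence[Sequence[float]]) -> List[int]:
    """Indices of the points, best (largest R2) first."""
    scores = r2_scores(points)
    return sorted(range(len(points)), key=lambda i: scores[i], reverse=True)


# Hypothetical (X, Y) scores.
points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (4.0, 0.0)]
scores = r2_scores(points)
order = rank_by_r2(points)
print(scores, order)
```

Scoring needs one pass to find the worst values and one pass to compute the distances, which is where the linear scaling comes from; the final sort is only for convenience here.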
How the R2 approximation works in practice
How the R2 approximation works in practice. What the R2 approximation missed (blue); what was included instead (red).
Dealing with large virtual libraries. Timings shown, in slide order: 10M in ~42 min; 10M in ~59 min; 10M in ~9.5 hours. Labels shown: Mw filter; one Bayesian model.
Dealing with large virtual libraries by random sampling
Example of R2 ranking / random product sampling
Random 3.2% subset of VRXN-3-00352, coloured by R2. 1116 B ranked by average R2; 314 C ranked by average R2. Best 100 x 100.
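One way to read this slide is: score a random fraction of the B x C products, average the R2 of the sampled products that contain each monomer, and keep the best monomers from each list. The sketch below follows that reading. The function name, the `product_r2` callback and the toy qualities are hypothetical stand-ins, not the authors' implementation.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def rank_monomers_by_sampled_r2(
    monomers_b: Sequence[str],
    monomers_c: Sequence[str],
    product_r2: Callable[[str, str], float],
    fraction: float,
    top_n: int,
    seed: int = 0,
) -> Tuple[List[str], List[str]]:
    """Rank B and C monomers by the average R2 of their sampled products."""
    rng = random.Random(seed)
    n_b, n_c = len(monomers_b), len(monomers_c)
    n_sample = max(1, int(n_b * n_c * fraction))
    sums_b: Dict[str, float] = defaultdict(float)
    sums_c: Dict[str, float] = defaultdict(float)
    counts_b: Dict[str, int] = defaultdict(int)
    counts_c: Dict[str, int] = defaultdict(int)
    # Sample product indices without enumerating the full library.
    for index in rng.sample(range(n_b * n_c), n_sample):
        i, j = divmod(index, n_c)
        b, c = monomers_b[i], monomers_c[j]
        r2 = product_r2(b, c)
        sums_b[b] += r2
        counts_b[b] += 1
        sums_c[c] += r2
        counts_c[c] += 1

    def top(sums: Dict[str, float], counts: Dict[str, int]) -> List[str]:
        # Monomers that were never sampled do not appear in the ranking.
        return sorted(sums, key=lambda m: sums[m] / counts[m], reverse=True)[:top_n]

    return top(sums_b, counts_b), top(sums_c, counts_c)


# Toy stand-in for a model-based R2 score; the slide samples 3.2%, here 100%.
quality_b = {"b0": 1.0, "b1": 2.0, "b2": 3.0}
quality_c = {"c0": 1.0, "c1": 5.0}
top_b, top_c = rank_monomers_by_sampled_r2(
    list(quality_b), list(quality_c),
    lambda b, c: quality_b[b] + quality_c[c],
    fraction=1.0, top_n=2,
)
print(top_b, top_c)
```

With 1116 x 314 monomers, taking the best 100 of each list would reduce roughly 350k products to 10k that can be enumerated and Pareto ranked in full.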
Full VL of VRXN-3-00352 with the top 14 Pareto fronts. In red: Pareto ranked compounds (11327). In blue: the rest (Pareto front >= 15). This takes about a weekend to calculate. 1116 B ranked by average R2 of all; 314 C ranked by average R2 of all.
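Ranking into successive Pareto fronts can be sketched as repeated peeling: find the non-dominated compounds, remove them, and repeat on what is left. This is a simple illustration under the assumption that higher is better in both objectives; the function names and data are hypothetical, and the authors' actual algorithm is not given in the slides.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b in both objectives and better in one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_fronts(points: Sequence[Point], max_fronts: int) -> List[List[int]]:
    """Peel off up to max_fronts successive Pareto fronts, as lists of indices."""
    remaining = list(range(len(points)))
    fronts: List[List[int]] = []
    while remaining and len(fronts) < max_fronts:
        # Front k is whatever is non-dominated among the points still remaining.
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining)]
        fronts.append(front)
        in_front = set(front)
        remaining = [i for i in remaining if i not in in_front]
    return fronts


# Hypothetical (X, Y) scores.
points = [(3.0, 3.0), (2.0, 2.0), (1.0, 1.0), (3.0, 1.0), (1.0, 3.0)]
fronts = pareto_fronts(points, max_fronts=14)
print(fronts)
```

Each front costs a quadratic number of comparisons in this naive form, which gives a sense of why full ranking of a large library is slow.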
Top 100 monomers: R2 sampled versus true ranks. First panel, R2 rank B (full 350k) versus R2 subset sampled rank B: in common 83; best rank of missing 52; worst rank of replacement 242. Second panel, R2 rank C (full 350k) versus R2 subset sampled rank C: in common 93; best rank of missing 83; worst rank of replacement 145.
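The three figures quoted per panel can be computed from two rankings. The sketch below assumes this interpretation: "missing" monomers are in the true top N but not in the sampled top N, "replacements" are in the sampled top N but not in the true top N, and both are reported by their true rank. The function name and the example rankings are hypothetical.

```python
from typing import Dict, List, Optional, Sequence


def compare_top_n(true_ranking: Sequence[str], sampled_ranking: Sequence[str],
                  n: int) -> Dict[str, Optional[int]]:
    """Compare the top n of a sampled ranking with the top n of the true ranking."""
    true_rank = {m: r for r, m in enumerate(true_ranking, start=1)}
    true_top = set(true_ranking[:n])
    sampled_top = set(sampled_ranking[:n])
    missing = true_top - sampled_top        # lost by sampling
    replacements = sampled_top - true_top   # picked instead
    return {
        "in_common": len(true_top & sampled_top),
        "best_rank_of_missing": min((true_rank[m] for m in missing), default=None),
        "worst_rank_of_replacement": max((true_rank[m] for m in replacements),
                                         default=None),
    }


# Hypothetical rankings, best first.
true_order: List[str] = ["a", "b", "c", "d", "e", "f"]
sampled_order: List[str] = ["a", "c", "e", "b", "d", "f"]
summary = compare_top_n(true_order, sampled_order, n=3)
print(summary)
```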
Compounds found by R2 sampling per Pareto front. Legend: compounds found by sampling and enumerating 100 x 100 monomers; compounds not found by the above. A typical design contains ~100-200 compounds.
Pareto front:   1     2     3     4     5
Contains:       171   246   337   445   523
Success rate:   94%   82%   78%   71%   58%
Recent libraries designed using Pareto ranking
What has Pareto done for me? Panels: X score, Y score, Z score. Top graphs: model score distribution of the full VL; lower graphs: first Pareto front.
VRXN-3-00582 results. WP001398: difficult chemistry, only 22 compounds were made. Plots: X / Y selectivity and X / Z selectivity. Legend: > 5fold selective (5); inverse selective (2); inactive (14); non-selective (1). Highlighted compounds: X IC50 = 770 nM, Y IC50 = 9 µM, Z IC50 = 3.7 µM; and X IC50 = 997 nM, Y IC50 = 7.5 µM, Z IC50 = 11 µM. Lorna Mitchell, Nunzio Sciammetta, Ian Marsh.
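As a worked check of the selectivity figures, fold selectivity for X over another target is taken here as the ratio of the two IC50 values. Two assumptions are made: the garbled units on the slide are read as µM, and selectivity is defined as IC50(off-target) / IC50(X). The function name is hypothetical.

```python
from typing import Dict, List


def fold_selectivity(ic50_target_nm: float, ic50_off_target_nm: float) -> float:
    """Fold selectivity for the target: off-target IC50 divided by target IC50."""
    return ic50_off_target_nm / ic50_target_nm


# IC50 values from the slide, converted to nM (units assumed to be µM for Y and Z).
highlighted: List[Dict[str, float]] = [
    {"X": 770.0, "Y": 9000.0, "Z": 3700.0},
    {"X": 997.0, "Y": 7500.0, "Z": 11000.0},
]

folds = [
    {"X over Y": fold_selectivity(c["X"], c["Y"]),
     "X over Z": fold_selectivity(c["X"], c["Z"])}
    for c in highlighted
]
for f in folds:
    print({name: round(value, 1) for name, value in f.items()})
```

Under these assumptions the first compound is roughly 12-fold selective over Y and 5-fold over Z, and the second roughly 8-fold over Y and 11-fold over Z.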
VRXN-3-00352 results. WP001524: 77 compounds were made. Plots: X / Y selectivity and X / Z selectivity. Highlighted compounds: X over Y = 40 fold, X over Z = 5.6 fold; and X over Y = 7 fold, X over Z = 13 fold. 8 compounds with the desired profile. Lorna Mitchell, Nunzio Sciammetta, Ian Marsh.
How were these series found from HTS hits? Nightly scheduled download:
Measured activity tracks with Bayesian model score. Axes: Bayesian score for target X and for target Y. Two panels: colour by X activity and colour by Y activity. Red: IC50 > 2500 nM (inactive). Yellow: IC50 <= 2500 nM (moderately active). Blue: IC50 <= 250 nM (active).
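The colour legend amounts to a three-way binning of IC50. A minimal sketch using the thresholds from the slide; the function and dictionary names are hypothetical.

```python
def activity_class(ic50_nm: float) -> str:
    """Bin an IC50 (in nM) using the thresholds from the slide legend."""
    if ic50_nm <= 250:
        return "active"
    if ic50_nm <= 2500:
        return "moderately active"
    return "inactive"


# Colours as used on the slide.
COLOUR = {"active": "blue", "moderately active": "yellow", "inactive": "red"}

for ic50 in (100.0, 770.0, 9000.0):
    label = activity_class(ic50)
    print(ic50, label, COLOUR[label])
```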
Predicted vs experimental selectivity, X over Y. Axes: Bayesian score for X and for Y. Light grey: outside the top 10k. Dark grey: top 10k Pareto ranked, but no experimental selectivity. Red: < 10fold selective. Yellow: < 50fold selective. Blue: >= 50fold selective. The area with the highest predicted selectivity includes multiple greys.
Predicted vs experimental selectivity, X over Y. Axes: Bayesian score for X and for Y. The area with the highest predicted selectivity has the highest proportion of truly selective compounds. Classes: good (>50), moderate (<50), bad (<10).
Nearly identical picture for X over Z (X/Z selectivity).
Ranking of series. Plot labels: X, Y; compounds; series; best series; colour by library ID.
Results. Axes: X, Y.
Conclusions
More of science and decision making can be automated. New York Times, July 18, 2006: http://www.nytimes.com/2006/07/18/technology/18model.html. Science, 3 April 2009: Vol. 324, no. 5923, pp. 85-89: http://www.sciencemag.org/cgi/content/abstract/324/5923/85. Pfizer sponsors a PhD position here. Interested? Contact me or Ross King (rdk@aber.ac.uk).
Spare slides etc
Pareto library design (Pfool) workflow. PGVL hub. Engine: Pipeline Pilot services monomer processing and enumeration
Pfool is started as a Pipeline Pilot web service
Pfool input. Screen labels: TargetX Bayesian model; Send to Spotfire.
Filter monomers in Spotfire. Filters shown: Mw; AlogP; availability in Pfizer stores; protecting groups.
Product enumeration parameters
Start the enumeration / Pareto ranking of products. Screen messages: "Job has successfully started"; "Job can be retrieved via:". You can log off when you see this.
Accessing existing designs. Options: (re)send original monomers to Spotfire; (re)send filtered monomers to Spotfire; send designed products to Spotfire. Models listed: TargetX, TargetY and TargetZ Bayesian models.
Accessing existing jobs. Create file for Pfizer in-house tool. Models listed: TargetX Bayesian model; TargetY Bayesian model.
Finish / register design in PGVL hub
