The document discusses using Pareto ranking to efficiently sample large virtual libraries for drug design. It describes how R2 approximation and random product sampling can speed up Pareto ranking for libraries with millions of compounds. The method was used to design selective libraries for targets X, Y and Z. Several selective series were identified that would not have been found otherwise.
Library design and series selection by Pareto ranking
1. Library design and series selection by Pareto ranking Willem P. van Hoorn & Robert T Smith Pfizer Global Research and Development Sandwich United Kingdom [email_address]
2.
3. Pareto ranking, the art of compromise. Axes: X, Y. You want to be here (high X, high Y), but these are your compounds. Two gold compounds are better than the black compound: it is 'dominated', as would be any other compound in the shaded area. Five compounds are special (gold): going from one to another, you can improve X or Y, but not both. They are the best compromises (the Pareto front).
4. Why not apply cut-offs? Y > cut-off, X > cut-off. Applying cut-offs results in settling five times for nearly the same (mediocre) compromise. The five compounds on the Pareto front make the same compromise on average, but sample much more of the space.
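The dominance rule on this slide can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes higher values are better on every objective, the function names `dominates` and `pareto_fronts` are hypothetical, and the (X, Y) values are made up. The naive all-pairs comparison is quadratic in the number of compounds, which is why faster approximations matter for large libraries.

```python
def dominates(a, b):
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_fronts(points):
    """Peel off successive Pareto fronts; returns a list of lists of point indices."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        # A point is on the current front if no other remaining point dominates it.
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


# Made-up (X, Y) scores: five compromise compounds plus two dominated ones.
compounds = [(1, 9), (3, 8), (5, 6), (7, 4), (9, 1), (4, 5), (2, 2)]
fronts = pareto_fronts(compounds)
print(fronts)
```

The first list holds the "gold" compounds; later lists are the second, third, and subsequent fronts.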
5.
6.
7. Increasing the speed of Pareto ranking. 2 Axes: worst value of X, worst value of Y. Approximate the 2D Pareto front as a circle. The best compounds are furthest away from the worst (X, Y). Linear-scaling calculation: $R^2 = X^2 + Y^2$. The distance is also defined for higher dimensions.
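The R2 approximation replaces pairwise dominance checks with a single pass over the compounds. The sketch below assumes that X and Y are measured from the worst value observed on each objective, that higher is better, and that the objectives are on comparable scales; the function name `r2_scores` and the data are illustrative only.

```python
def r2_scores(points):
    """Squared distance of each point from the worst value on every objective."""
    worst = [min(column) for column in zip(*points)]
    return [sum((value - w) ** 2 for value, w in zip(point, worst)) for point in points]


# Made-up (X, Y) scores; works unchanged for three or more objectives.
compounds = [(1, 9), (3, 8), (5, 6), (7, 4), (9, 1), (4, 5), (2, 2)]
scores = r2_scores(compounds)

# Rank compounds from largest R2 (best) to smallest (worst).
ranked = sorted(range(len(compounds)), key=lambda i: scores[i], reverse=True)
print(scores)
print(ranked)
```

Each compound is scored independently, so the cost grows linearly with library size, at the price of occasionally ranking a dominated compound above a true front member.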
8.
9. How the R2 approximation works in practice. Blue: what the R2 approximation missed. Red: what was included instead.
10.
11.
12.
13. Random 3.2% subset of VRXN-3-00352, coloured by R2. 1116 ranked B (by avg R2); 314 ranked C (by avg R2). Best 100 x 100.
14. Full VL of VRXN-3-00352 with the top 14 Pareto fronts. In red: Pareto ranked compounds (11327). In blue: the rest (Pareto front >= 15). (This takes about a weekend to calculate.) 1116 ranked B (by avg R2 of all); 314 ranked C (by avg R2 of all).
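One way to read this slide: score a random fraction of the virtual library, average the R2 of the sampled products per monomer, and carry only the best monomers of each set forward to full enumeration. The sketch below follows that reading under stated assumptions: monomers are identified by index, `score_product` is a hypothetical callback standing in for the model scores, and `toy_scores` is invented for illustration.

```python
import random
from collections import defaultdict


def rank_monomers_by_sampled_r2(n_b, n_c, score_product, fraction, seed=0):
    """Rank B and C monomers by the average R2 of a random sample of their products.

    n_b, n_c       number of monomers in each set (monomers are identified by index)
    score_product  hypothetical callback: (b, c) -> tuple of scores, higher is better
    fraction       fraction of the n_b * n_c virtual library to sample
    """
    rng = random.Random(seed)
    total = n_b * n_c
    n_sample = max(1, round(total * fraction))
    # Sample product indices without enumerating the whole library.
    picks = [divmod(i, n_c) for i in rng.sample(range(total), n_sample)]
    scores = [score_product(b, c) for b, c in picks]

    # R2 = squared distance from the worst value seen on each objective.
    worst = [min(column) for column in zip(*scores)]
    r2 = [sum((v - w) ** 2 for v, w in zip(s, worst)) for s in scores]

    by_b, by_c = defaultdict(list), defaultdict(list)
    for (b, c), value in zip(picks, r2):
        by_b[b].append(value)
        by_c[c].append(value)

    def ranked(groups):
        # Monomers that never appeared in the sample are left unranked.
        return sorted(groups, key=lambda m: sum(groups[m]) / len(groups[m]), reverse=True)

    return ranked(by_b), ranked(by_c)


# Toy model (an assumption for illustration): scores rise with the monomer index.
def toy_scores(b, c):
    return (b + c, 2 * b + c)


best_b, best_c = rank_monomers_by_sampled_r2(40, 20, toy_scores, fraction=0.25)
shortlist_b, shortlist_c = best_b[:10], best_c[:10]  # enumerate only 10 x 10 products
```

Only the shortlisted monomers are then enumerated in full and Pareto ranked, which keeps the expensive step on a library of a manageable size.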
15. Top 100 monomers: R2 sampled versus true ranks
    B: R2 rank (full 350k), top 100, versus R2 subset sampled rank, top 100. In common: 83. Best rank of missing: 52. Worst rank of replacement: 242.
    C: R2 rank (full 350k), top 100, versus R2 subset sampled rank, top 100. In common: 93. Best rank of missing: 83. Worst rank of replacement: 145.
16. Compounds found by R2 sampling per Pareto front
    Pareto front:  1    2    3    4    5
    Contains:      171  246  337  445  523
    Success rate:  94%  82%  78%  71%  58%
    Legend: compounds found by sampling and enumerating 100 x 100 monomers; compounds not found by the above. A typical design contains ~100-200 compounds.
17.
18. What has Pareto done for me? Panels: X score, Y score, Z score. Top graphs: model score distribution of the full VL. Lower graphs: first Pareto front.
19. VRXN-3-00582 results. WP001398: difficult chemistry, only 22 compounds were made. Lorna Mitchell, Nunzio Sciammetta, Ian Marsh. Panels: X / Z selectivity; X / Y selectivity. Highlighted compounds: X IC50 = 770 nM, Y IC50 = 9 µM, Z IC50 = 3.7 µM; X IC50 = 997 nM, Y IC50 = 7.5 µM, Z IC50 = 11 µM. Legend: > 5-fold selective (5), inverse selective (2), inactive (14), non-selective (1).
20. VRXN-3-00352 results. WP001524: 77 compounds were made. Lorna Mitchell, Nunzio Sciammetta, Ian Marsh. Panels: X / Z selectivity; X / Y selectivity. Highlighted compounds: X over Y = 40-fold, X over Z = 5.6-fold; X over Y = 7-fold, X over Z = 13-fold. 8 compounds with the desired profile.
21.
22. Measured activity tracks with Bayesian model score. Axes: Bayesian score for target X versus Bayesian score for target Y. Panels: colour by X activity; colour by Y activity. Red: IC50 > 2500 nM (inactive). Yellow: IC50 <= 2500 nM (moderately active). Blue: IC50 <= 250 nM (active).
23. Predicted vs experimental selectivity, X over Y. Axes: Bayesian score for X versus Y. Light grey: outside top 10k. Dark grey: top 10k Pareto ranked, but no experimental selectivity. Red: < 10-fold selective. Yellow: < 50-fold selective. Blue: >= 50-fold selective. The area with the highest predicted selectivity includes multiple greys.
24. Predicted vs experimental selectivity, X over Y. Axes: Bayesian score for X versus Y. Good (> 50), moderate (< 50), bad (< 10). The area with the highest predicted selectivity has the highest proportion of truly selective compounds.
36. Start the enumeration / Pareto ranking of products. Job has successfully started; you can log off when you see this. Job can be retrieved via:
37. Accessing existing designs. (Re)send original monomers to Spotfire. (Re)send filtered monomers to Spotfire. Send designed products to Spotfire. Models: TargetX Bayesian model, TargetY Bayesian model, TargetZ Bayesian model.
38. Accessing existing jobs. Create file for Pfizer in-house tool. Models: TargetX Bayesian model, TargetY Bayesian model.