Library design and series selection by Pareto ranking

Willem P. van Hoorn & Robert T Smith, Pfizer Global Research and Development, Sandwich, United Kingdom

Pareto ranking, the art of compromise. Plotting score X against score Y, you want compounds in the top-right corner (high X, high Y), but real compounds are scattered across the plot. A compound is 'dominated' when another compound is at least as good on both axes and better on at least one: the black compound is dominated by two gold compounds, as would be any other compound in the shaded area. Five compounds are special (gold): going from one to the other, you can improve X or Y but not both. They are the best compromises (the Pareto front).
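The dominance rule described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `dominates` and `pareto_front` and the toy (X, Y) scores are assumptions made for the example, and higher is taken to be better on every axis.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (quadratic in len(points))."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (X, Y) scores: five compromises plus two dominated compounds.
compounds = [(1.0, 9.0), (3.0, 8.0), (5.0, 6.0), (7.0, 4.0), (9.0, 1.0),
             (4.0, 5.0), (2.0, 2.0)]
front = pareto_front(compounds)
```

In this toy set the first five compounds form the front, while (4, 5) and (2, 2) are dominated, for example by (5, 6).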

Why not apply cut-offs? Applying cut-offs (X > cut-off and Y > cut-off) results in settling five times for nearly the same (mediocre) compromise. The five compounds on the Pareto front make the same compromise on average, but sample much more of the space.

Pareto ranking, a summary

Increasing the speed of Pareto ranking. 1

Increasing the speed of Pareto ranking. 2 Approximate the 2D Pareto front as a circle around the worst value of X and the worst value of Y. The best compounds are those furthest away from the worst (X,Y). Linear scaling calculation: $R^2 = \Delta X^2 + \Delta Y^2$, where $\Delta X$ and $\Delta Y$ are the distances from the worst values. The distance is also defined for higher dimensions.
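As a rough sketch of the R2 approximation, the score of each compound is its squared distance from the worst value seen in each dimension, which takes a single pass over the compounds. The function name `r2_scores` and the toy data are assumptions for illustration; the slide does not say whether the scores are rescaled first, so objectives on very different scales would probably need normalising before this is applied.

```python
from typing import List, Sequence


def r2_scores(points: Sequence[Sequence[float]]) -> List[float]:
    """Squared distance of every point from the worst (minimum) value per dimension."""
    n_dims = len(points[0])
    worst = [min(p[d] for p in points) for d in range(n_dims)]
    return [sum((p[d] - worst[d]) ** 2 for d in range(n_dims)) for p in points]


# Hypothetical (X, Y) model scores, higher is better.
points = [(1.0, 9.0), (3.0, 8.0), (5.0, 6.0), (4.0, 5.0), (2.0, 2.0)]
scores = r2_scores(points)

# Indices of the compounds, best (largest R2) first.
ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
```

The same function works unchanged for three or more objectives, since the sum simply runs over every dimension.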

How the R2 approximation works in practice

How the R2 approximation works in practice. Figure legend: blue, what the R2 approximation missed; red, what was included instead.

Dealing with large virtual libraries. Timings shown on the slide: 10M in ~42 min, 10M in ~59 min, and 10M in ~9.5 hours. Labels shown with them: Mw filter; one Bayesian model.

Dealing with large virtual libraries by random sampling

Example of R2 ranking / random product sampling

Figure: random 3.2% subset of VRXN-3-00352, coloured by R2. 1116 monomers ranked for B (by average R2) and 314 ranked for C (by average R2). The best 100 x 100 are taken forward.
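The sampling idea on this slide can be sketched as follows: score a random fraction of the B x C products, convert the scores to R2, average R2 per monomer, and keep the top monomers of each set. This is only an illustration under stated assumptions: `sample_and_rank`, `score_product`, and the toy monomer names are hypothetical, and in practice the scores would come from the Bayesian models rather than from a toy function.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def sample_and_rank(monomers_b: Sequence[str], monomers_c: Sequence[str],
                    score_product: Callable[[str, str], Tuple[float, ...]],
                    fraction: float, top_n: int,
                    seed: int = 0) -> Tuple[List[str], List[str]]:
    """Rank monomers by the average R2 of a random sample of their products."""
    rng = random.Random(seed)
    total = len(monomers_b) * len(monomers_c)
    n_sample = max(1, round(fraction * total))
    # Sample product indices without enumerating the whole library.
    picks = rng.sample(range(total), n_sample)
    pairs = [(monomers_b[i // len(monomers_c)], monomers_c[i % len(monomers_c)])
             for i in picks]
    scored = [score_product(b, c) for b, c in pairs]

    # R2: squared distance from the worst sampled value in each dimension.
    n_dims = len(scored[0])
    worst = [min(s[d] for s in scored) for d in range(n_dims)]
    r2 = [sum((s[d] - worst[d]) ** 2 for d in range(n_dims)) for s in scored]

    by_b: Dict[str, List[float]] = defaultdict(list)
    by_c: Dict[str, List[float]] = defaultdict(list)
    for (b, c), value in zip(pairs, r2):
        by_b[b].append(value)
        by_c[c].append(value)

    def top(groups: Dict[str, List[float]]) -> List[str]:
        means = {m: sum(v) / len(v) for m, v in groups.items()}
        return sorted(means, key=means.get, reverse=True)[:top_n]

    return top(by_b), top(by_c)


# Toy library: the score of a product is just the two monomer indices.
def toy_score(b: str, c: str) -> Tuple[float, float]:
    return float(b[1:]), float(c[1:])


b_set = [f"b{i}" for i in range(20)]
c_set = [f"c{j}" for j in range(20)]
top_b, top_c = sample_and_rank(b_set, c_set, toy_score, fraction=1.0, top_n=3)
```

With `fraction=1.0` the toy run scores every product; lowering the fraction (the slide uses 3.2%) trades accuracy of the monomer ranking for speed.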

Figure: full virtual library (VL) of VRXN-3-00352 with the top 14 Pareto fronts. Red: Pareto-ranked compounds (11327). Blue: the rest (Pareto front >= 15). This takes about a weekend to calculate. 1116 monomers ranked for B and 314 for C (by average R2 of all products).
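Ranking compounds into successive Pareto fronts can be sketched by repeatedly peeling off the non-dominated set: front 1 is removed, front 2 is the front of what remains, and so on. The names `dominates` and `pareto_ranks` and the toy scores are assumptions for this example, and this naive version is quadratic per front, which is one plausible reason an exact ranking of a large library is slow.

```python
from typing import Dict, Optional, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def pareto_ranks(points: Sequence[Sequence[float]],
                 max_fronts: Optional[int] = None) -> Dict[int, int]:
    """Map point index -> Pareto front number (1 = best), peeling front by front."""
    remaining = set(range(len(points)))
    ranks: Dict[int, int] = {}
    front_no = 0
    while remaining and (max_fronts is None or front_no < max_fronts):
        front_no += 1
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i])
                            for j in remaining if j != i)}
        for i in front:
            ranks[i] = front_no
        remaining -= front
    return ranks


# Hypothetical (X, Y) scores, higher is better.
points = [(1.0, 9.0), (3.0, 8.0), (5.0, 6.0), (4.0, 5.0), (2.0, 2.0), (2.0, 7.0)]
ranks = pareto_ranks(points)
top_two_fronts = pareto_ranks(points, max_fronts=2)
```

Points left unranked when `max_fronts` is set correspond to "the rest" in the figure.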

Top 100 monomers: R2 sampled versus true ranks. Each panel plots the R2 rank from the full 350k library against the R2 rank from the sampled subset, one panel for B and one for C (top 100 on each axis). Statistics in slide order:
First panel: in common 83; best rank of missing 52; worst rank of replacement 242.
Second panel: in common 93; best rank of missing 83; worst rank of replacement 145.
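The three statistics on this slide could be computed as below. The interpretation is an assumption, since the slide does not define the terms: "missing" is taken to mean monomers in the true top N that the sampled ranking left out, "replacement" the monomers that took their place, and both are reported by their rank in the full ranking. The function name `compare_top_n` and the toy monomer lists are hypothetical.

```python
from typing import Dict, Optional, Sequence


def compare_top_n(true_order: Sequence[str], sampled_order: Sequence[str],
                  n: int) -> Dict[str, Optional[int]]:
    """Compare the top n of a sampled ranking with the top n of the true ranking."""
    true_rank = {m: r for r, m in enumerate(true_order, start=1)}
    true_top = set(true_order[:n])
    sampled_top = set(sampled_order[:n])
    missing = true_top - sampled_top        # should have been picked, were not
    replacements = sampled_top - true_top   # picked instead
    return {
        "in_common": len(true_top & sampled_top),
        "best_rank_of_missing": min((true_rank[m] for m in missing), default=None),
        "worst_rank_of_replacement": max((true_rank[m] for m in replacements),
                                         default=None),
    }


# Toy rankings of eight monomers, best first.
true_order = ["A", "B", "C", "D", "E", "F", "G", "H"]
sampled_order = ["A", "B", "F", "C", "H", "D", "E", "G"]
stats = compare_top_n(true_order, sampled_order, n=4)
```

In the toy case three of the top four agree, the missed monomer has true rank 4, and its replacement has true rank 6.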

Compounds found by R2 sampling per Pareto front

Pareto front    1     2     3     4     5
Contains        171   246   337   445   523
Success rate    94%   82%   78%   71%   58%

Legend: compounds found by sampling and enumerating 100 x 100 monomers; compounds not found by the above. A typical design contains ~100-200 compounds.

Recent libraries designed using Pareto ranking

What has Pareto done for me? Panels: X score, Y score, Z score. Top graphs: model score distribution of the full VL. Lower graphs: first Pareto front.

VRXN-3-00582 results. WP001398: difficult chemistry, only 22 compounds were made. Names on slide: Lorna Mitchell, Nunzio Sciammetta, Ian Marsh. Plots: X / Z selectivity and X / Y selectivity. Two highlighted compounds: X IC50 = 770 nM, Y IC50 = 9 µM, Z IC50 = 3.7 µM; and X IC50 = 997 nM, Y IC50 = 7.5 µM, Z IC50 = 11 µM. Legend: > 5-fold selective (5), inverse selective (2), inactive (14), non-selective (1).
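As a worked example, fold selectivity is commonly taken as the ratio of the off-target IC50 to the target IC50. The sketch below applies that to the two highlighted compounds, assuming the Y and Z values on the slide are in µM (the unit symbol is damaged in the transcript) and converting everything to nM. The function name `fold_selectivity` is an assumption, and the slide does not state how its selectivity classes were derived.

```python
def fold_selectivity(ic50_target_nm: float, ic50_off_target_nm: float) -> float:
    """Fold selectivity for the target: off-target IC50 divided by target IC50."""
    return ic50_off_target_nm / ic50_target_nm


# IC50 values in nM; Y and Z converted from the (assumed) µM values on the slide.
compound_1 = {"X": 770.0, "Y": 9000.0, "Z": 3700.0}
compound_2 = {"X": 997.0, "Y": 7500.0, "Z": 11000.0}

for compound in (compound_1, compound_2):
    compound["X over Y"] = fold_selectivity(compound["X"], compound["Y"])
    compound["X over Z"] = fold_selectivity(compound["X"], compound["Z"])

# compound_1: X over Y is about 11.7 fold, X over Z about 4.8 fold
# compound_2: X over Y is about 7.5 fold, X over Z about 11.0 fold
```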

VRXN-3-00352 results. WP001524: 77 compounds were made. Names on slide: Lorna Mitchell, Nunzio Sciammetta, Ian Marsh. Plots: X / Z selectivity and X / Y selectivity. Two highlighted compounds: X over Y = 40 fold, X over Z = 5.6 fold; and X over Y = 7 fold, X over Z = 13 fold. 8 compounds with the desired profile.

How were these series found from HTS hits? Nightly scheduled download:

Measured activity tracks with Bayesian model score. Axes: Bayesian score for target X against Bayesian score for target Y. One panel is coloured by X activity, the other by Y activity. Red: IC50 > 2500 nM (inactive). Yellow: IC50 <= 2500 nM (moderately active). Blue: IC50 <= 250 nM (active).
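The colour coding on this slide amounts to a three-way binning of IC50. A minimal sketch follows, with the thresholds taken from the legend and the function name `activity_class` chosen for illustration.

```python
def activity_class(ic50_nm: float) -> str:
    """Bin a measured IC50 (in nM) using the thresholds from the slide legend."""
    if ic50_nm <= 250:
        return "active"             # blue
    if ic50_nm <= 2500:
        return "moderately active"  # yellow
    return "inactive"               # red


classes = [activity_class(v) for v in (100.0, 250.0, 770.0, 2500.0, 9000.0)]
```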

Predicted vs experimental selectivity, X over Y. Axes: Bayesian score X against Bayesian score Y. Light grey: outside top 10k. Dark grey: top 10k Pareto ranked, but no experimental selectivity. Red: < 10-fold selective. Yellow: < 50-fold selective. Blue: >= 50-fold selective. The area with the highest predicted selectivity includes multiple greys.

Predicted vs experimental selectivity, X over Y. The area with the highest predicted selectivity has the highest proportion of truly selective compounds. Axes: Bayesian score X against Bayesian score Y. Legend: good (> 50), moderate (< 50), bad (< 10).
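The selectivity legend can be expressed as a similar binning of the fold value. The sketch uses the 10-fold and 50-fold thresholds from the slides and treats exactly 50-fold as good, following the ">= 50fold" wording of the earlier legend; the function name `selectivity_class` is an assumption.

```python
def selectivity_class(fold: float) -> str:
    """Bin an experimental fold selectivity using the 10x and 50x thresholds."""
    if fold >= 50:
        return "good"
    if fold >= 10:
        return "moderate"
    return "bad"


# Fold values reported for the VRXN-3-00352 compounds elsewhere in the deck.
labels = {fold: selectivity_class(fold) for fold in (40.0, 5.6, 7.0, 13.0)}
```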

Nearly identical picture for X over Z (both panels show X/Z selectivity).

Ranking of series. Figure: X against Y, with panels for compounds, series, and best series; colour by library ID.

Results. Figure axes: X and Y.

Conclusions

More of science and decision making can be automated. Science 3 April 2009: Vol. 324, no. 5923, pp. 85-89, http://www.sciencemag.org/cgi/content/abstract/324/5923/85. July 18, 2006: http://www.nytimes.com/2006/07/18/technology/18model.html. Pfizer sponsors a PhD position here. Interested? Contact me or Ross King (rdk@aber.ac.uk)

Pareto library design (Pfool) workflow. PGVL hub. Engine: Pipeline Pilot services monomer processing and enumeration

Pfool is started as a Pipeline Pilot web service

Pfool input. Screenshot labels: TargetX Bayesian model; Send to Spotfire.

Filter monomers in Spotfire. Screenshot labels: Mw; AlogP; availability in Pfizer stores; protecting groups.

Product enumeration parameters

Start the enumeration / Pareto ranking of products. Screenshot labels: "Job has successfully started" (you can log off when you see this); "Job can be retrieved via:".

Accessing existing designs. Options: (re)send original monomers to Spotfire; (re)send filtered monomers to Spotfire; send designed products to Spotfire. Models listed: TargetX, TargetY, and TargetZ Bayesian models.

Accessing existing jobs. Option: create file for Pfizer in-house tool. Models listed: TargetX and TargetY Bayesian models.

Finish / register design in PGVL hub
