This document describes how the Discovery Bus manages the QSAR (quantitative structure-activity relationship) process by applying different modelling approaches and algorithms to generate many model paths from data automatically. It handles tasks such as partitioning data into training and test sets, calculating and selecting descriptors, building models with methods such as linear regression and neural networks, and adding results to a database. This allows industrial-scale QSAR: in one project, 10,000 datasets yielded 750,000 models in 3 weeks on 100 Azure cloud servers. The ambition of the Discovery Bus is a step change in drug discovery productivity by carrying out drug design independently of any user.
1. QSAR Process requires many choices: Which descriptors? Which modelling algorithm? What model testing strategy? The quality of the result depends on making the correct choices. All runs are different. The Discovery Bus manages this process with an "apply everything" approach. (Diagram label: QSAR choices)
2. The Discovery Bus manages the many model generation paths. Partition training & test data: random split, 80:20 split. Calculate descriptors: Java CDK descriptors, C++ CDL descriptors. Select descriptors: correlation analysis, genetic algorithms, random selection. Build model: linear regression, neural network, partial least squares, classification trees. Add to database.
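The "apply everything" idea on this slide can be illustrated by enumerating one path per combination of stage options. This is a minimal sketch, not the Discovery Bus implementation: the `STAGES` dictionary and the `enumerate_paths` function are hypothetical names, and it assumes every option at one stage can be combined with every option at the next, which the slides do not confirm.

```python
from itertools import product

# Stage options copied from the slide. The dictionary layout and the
# assumption that all combinations are valid are illustrative only.
STAGES = {
    "partition": ["random split", "80:20 split"],
    "descriptors": ["Java CDK", "C++ CDL"],
    "selection": ["correlation analysis", "genetic algorithms", "random selection"],
    "model": [
        "linear regression",
        "neural network",
        "partial least squares",
        "classification trees",
    ],
}


def enumerate_paths(stages):
    """Yield one dict per model generation path (one option per stage)."""
    names = list(stages)
    for combo in product(*(stages[name] for name in names)):
        yield dict(zip(names, combo))


paths = list(enumerate_paths(STAGES))
print(len(paths))  # 2 * 2 * 3 * 4 = 48 paths for the options listed here
print(paths[0])
```

Adding an option to any stage multiplies the number of paths, which is why managing the paths automatically matters.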
3. Diagram: the QSAR Agent exchanges messages with three services (Filter Features, Model Build, Calculate Descriptors). It issues a filter feature request, a model build request and a calculate descriptors request, and each request receives responses.
4. Diagram: the QSAR Agent with the Filter Features, Model Build and Calculate Descriptors services. The filter feature request, model build request and calculate descriptors request are each shown receiving several responses.
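The request and response pattern in these two diagrams, where one request can draw several responses, can be sketched as a simple in-process bus. The slides do not describe the actual Discovery Bus messaging protocol, so the `Bus` class, its `register` and `request` methods, and the placeholder handlers below are all hypothetical.

```python
from collections import defaultdict


class Bus:
    """Toy message bus: every handler registered for a request type responds."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def register(self, request_type, handler):
        # Several services may register for the same request type.
        self._handlers[request_type].append(handler)

    def request(self, request_type, payload):
        # One request fans out to all registered handlers;
        # the caller gets back one response per handler.
        return [handler(payload) for handler in self._handlers[request_type]]


bus = Bus()
# Placeholder services: they return labels, not real descriptor values.
bus.register("calculate descriptors", lambda mol: ("Java CDK", f"descriptors for {mol}"))
bus.register("calculate descriptors", lambda mol: ("C++ CDL", f"descriptors for {mol}"))

responses = bus.request("calculate descriptors", "CCO")
for source, result in responses:
    print(source, "->", result)
```

In this sketch the agent does not need to know which services exist; it only names the kind of work it wants done and collects whatever comes back.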
5. Industrial Scale QSAR: predict likely properties based on similar molecules. ChEMBL Database: data on 622,824 compounds, collected from 33,956 publications. WOMBAT Database: data on 251,560 structures, for over 1,966 targets. WOMBAT-PK Database: data on 1,230 compounds, for over 13,000 clinical measurements. Project Junior (Newcastle University & Microsoft Research): 10,000 datasets gave 750,000 QSAR models in 3 weeks using 100 Azure Cloud Servers.
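The Project Junior figures imply some rough averages, worked out below. The inputs come from the slide; treating 3 weeks as 21 days of continuous running with work spread evenly across the servers is an assumption made only for this estimate.

```python
# Figures quoted on the slide.
datasets = 10_000
models = 750_000
servers = 100
days = 3 * 7  # assumption: 3 weeks of continuous running

models_per_dataset = models / datasets
models_per_server = models / servers
# Assumption: the load is spread evenly over servers and days.
models_per_server_per_day = models_per_server / days

print(f"models per dataset: {models_per_dataset:.0f}")
print(f"models per server: {models_per_server:.0f}")
print(f"models per server per day: {models_per_server_per_day:.0f}")
```

On these assumptions each dataset yields 75 models on average, and each server builds roughly 357 models per day.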
6. “The Discovery Bus is not a tool for users. It is a system for doing drug design independent of any user.” The ambition is a step change in productivity, arising from breaking the link between human effort and drug discovery output.