Prediction of novel targets using disease association data from Open Targets


  1. Prediction of novel targets using disease association data from Open Targets. Enrico Ferrero, PhD, Associate GSK Fellow; Scientific Leader, Computational Biology, Target Sciences, GSK. @enricoferrero
  2. Data + AI = drugs? (BBC News, 2017; Nature Biotechnology, 2017)
  3. The pharma AI space is getting crowded.
  4. Developing a new drug: 15+ years, $2B+.
  5. So, what’s wrong? (Harrison, Nat Rev Drug Discov, 2016; Cook et al., Nat Rev Drug Discov, 2014)
  6. Rethink the drug discovery pipeline (Manhattan Institute, 2012). Late-phase failures cost (a lot) more. Spend more time and resources on target discovery. Reduce attrition in later phases.
  7. But how do we find good targets? (Nelson et al., Nat Genet, 2015)
  8. Open Targets (Koscielny et al., 2016)
  9. Could it be as easy as spotting spam emails?
     ▪ Is it possible to predict novel therapeutic targets using available gene-disease association data?
     ▪ Is Open Targets just a catalogue of gene-disease associations, or can we learn from it what makes a good target?
  10. A positive-unlabelled (PU) semi-supervised learning approach
     ▪ Obtain all gene-disease associations and supporting evidence from the Open Targets platform. For every gene, create one numeric feature per evidence type by taking the mean score across all diseases:
        ▪ Genetic associations (germline)
        ▪ Somatic mutations
        ▪ Significant gene expression changes
        ▪ Disease-relevant phenotype in animal model
        ▪ Pathway-level evidence
     ▪ Gather positive labels from Pharmaprojects: only consider targets with drugs currently on the market, in clinical trials or in preclinical studies. A semi-supervised framework with only positive labels is used: targets listed in Pharmaprojects constitute the positive class (P), while the rest of the proteome is used as the unlabelled class (U), which contains both negatives and yet-to-be-discovered positives.
     ▪ All 1421 positive cases and an equal number of randomly selected unlabelled cases (2842 in total) are set apart for training (80%) and testing (20%). The remaining unlabelled genes form the prediction set, on which the final model makes its predictions.
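The data preparation described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the evidence-type names, the `(gene, disease, evidence_type, score)` input layout, and the functions `mean_scores_per_gene` and `pu_split` are all hypothetical. It also assumes one score per gene, disease and evidence type, and that a missing evidence type counts as zero when averaging.

```python
import random
from collections import defaultdict

# Hypothetical evidence-type names; the platform's real field names may differ.
EVIDENCE_TYPES = ["genetic_association", "somatic_mutation", "rna_expression",
                  "animal_model", "affected_pathway"]

def mean_scores_per_gene(associations):
    """associations: iterable of (gene, disease, evidence_type, score) tuples.
    Returns {gene: [mean score per evidence type across the gene's diseases]}.
    Assumption: a disease with no evidence of a given type contributes 0."""
    totals = defaultdict(lambda: defaultdict(float))
    diseases = defaultdict(set)
    for gene, disease, ev_type, score in associations:
        totals[gene][ev_type] += score
        diseases[gene].add(disease)
    return {
        gene: [totals[gene][ev] / len(diseases[gene]) for ev in EVIDENCE_TYPES]
        for gene in totals
    }

def pu_split(features, positives, train_fraction=0.8, seed=0):
    """Pair every positive (P) with one randomly drawn unlabelled gene (U),
    split the labelled pool into train/test, and keep the rest of U aside.
    Assumes there are at least as many unlabelled genes as positives."""
    rng = random.Random(seed)
    pos = sorted(g for g in features if g in positives)
    unl = sorted(g for g in features if g not in positives)
    sampled_unl = rng.sample(unl, len(pos))
    labelled = [(g, 1) for g in pos] + [(g, 0) for g in sampled_unl]
    rng.shuffle(labelled)
    cut = int(len(labelled) * train_fraction)
    sampled = set(sampled_unl)
    prediction_set = [g for g in unl if g not in sampled]
    return labelled[:cut], labelled[cut:], prediction_set
```

With 1421 positives, `pu_split` would return a labelled pool of 2842 genes split 80/20, matching the numbers on the slide, and the remaining unlabelled genes as the prediction set.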
  11. Finding structure and most important features. t-SNE dimensionality reduction reveals structured observations. Most important features according to chi-squared test and information gain.
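As a rough illustration of one of the two feature-ranking criteria named on this slide, information gain can be computed as the drop in label entropy after grouping cases by a feature's value. The sketch below assumes the feature has already been discretised (for example, binned scores); how the original analysis discretised its scores is not stated, and the function names are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy within each
    group of cases that share the same (discrete) feature value."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

A feature that separates the classes perfectly yields a gain equal to the label entropy, while a feature unrelated to the labels yields a gain of zero.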
  12. Nested cross-validation and bagging for tuning and model selection (Bischl et al., 2012; Wikipedia)
     Four classifiers are independently tuned, trained and tested on the training set using a nested cross-validation strategy (4 inner rounds for parameter tuning and 4 outer rounds to assess performance):
     ▪ Random forest
     ▪ Feed-forward neural network with a single hidden layer
     ▪ Support vector machine with radial kernel
     ▪ Gradient boosting machine with AdaBoost exponential loss function
     In PU learning, U contains both positive and negative cases, which makes classifiers unstable. Bagging (bootstrap aggregating) can improve the performance of unstable classifiers by randomly resampling P and U with replacement (bootstrap) and then aggregating the results by majority voting:
     ▪ Bagging with 100 iterations was applied to the neural network, the support vector machine and the gradient boosting machine.
     ▪ Random forests are already a special case of bagging.
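The two ideas on this slide, nested cross-validation splits and bagging over bootstrap resamples of P and U, can be sketched as follows. This is an illustrative outline under stated assumptions: all function names are hypothetical, and `centroid_learner` is a toy nearest-centroid stand-in for the neural network, SVM and gradient boosting models, which are not reimplemented here.

```python
import random

def k_folds(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nested_cv_splits(n, outer_k=4, inner_k=4, seed=0):
    """Yield (outer_train, outer_test, inner_splits). The inner splits are
    (fit, valid) pairs drawn only from outer_train, for parameter tuning."""
    outer = k_folds(n, outer_k, seed)
    for i, test in enumerate(outer):
        train = [j for f, fold in enumerate(outer) if f != i for j in fold]
        inner = k_folds(len(train), inner_k, seed + i + 1)
        inner_splits = []
        for v, fold in enumerate(inner):
            valid = [train[p] for p in fold]
            fit = [train[p] for w, other in enumerate(inner) if w != v
                   for p in other]
            inner_splits.append((fit, valid))
        yield train, test, inner_splits

def centroid_learner(X, y):
    """Toy base learner (assumption): predict the class of the nearest centroid."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c0, c1 = centroid(0), centroid(1)
    def predict(x):
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        return 1 if d1 < d0 else 0
    return predict

def bagged_pu_predict(P, U, X_new, fit=centroid_learner, n_iter=100, seed=0):
    """Resample P and U with replacement n_iter times, fit one model per
    resample (U treated as class 0), and aggregate by majority vote."""
    rng = random.Random(seed)
    votes = [0] * len(X_new)
    for _ in range(n_iter):
        p_boot = rng.choices(P, k=len(P))
        u_boot = rng.choices(U, k=len(U))
        model = fit(p_boot + u_boot, [1] * len(p_boot) + [0] * len(u_boot))
        for i, x in enumerate(X_new):
            votes[i] += model(x)
    return [1 if 2 * v > n_iter else 0 for v in votes]
```

Keeping the inner folds strictly inside each outer training set is what lets the outer folds give a performance estimate that is not biased by parameter tuning.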
  13. Assessing performance and investigating results. The neural network classifier achieves 71% accuracy (0.76 AUC) on the test set. More advanced targets have higher disease association evidence.
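For readers unfamiliar with the two metrics quoted on this slide, the sketch below shows how accuracy and AUC can be computed from test-set labels and classifier scores. It is a generic illustration with hypothetical function names, using the pairwise-ranking definition of AUC, and is not the evaluation code behind the reported figures.

```python
def accuracy(y_true, y_pred):
    """Fraction of cases where the predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def roc_auc(y_true, scores):
    """Probability that a randomly chosen positive scores higher than a
    randomly chosen negative, with ties counted as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Accuracy depends on the decision threshold applied to the scores, whereas AUC summarises ranking quality across all thresholds, which is why the slide reports both.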
  14. Validation of predictions with literature mining. Significant overlap between neural network predictions and text mining results (p = 5.05e-172).
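The slide does not say which test produced this p-value. One common way to assess the overlap between two gene sets is a one-sided hypergeometric test, sketched below as an assumption rather than as the method actually used; the function name and arguments are hypothetical, and no real counts from the study are shown.

```python
from math import comb

def overlap_p_value(universe, set_a, set_b, overlap):
    """One-sided hypergeometric test: probability of seeing at least
    `overlap` shared genes when set_b genes are drawn at random from a
    universe of `universe` genes, set_a of which belong to the other set."""
    upper = min(set_a, set_b)
    total = comb(universe, set_b)
    tail = sum(comb(set_a, k) * comb(universe - set_a, set_b - k)
               for k in range(overlap, upper + 1))
    return tail / total
```

Here `universe` would be the number of genes considered, `set_a` and `set_b` the sizes of the predicted and text-mined target sets, and `overlap` the number of genes they share.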
  15. Automating drug target discovery with machine learning
     ▪ The gene-disease association data from Open Targets contains enough information to predict whether a protein can make a therapeutic target with decent accuracy (71% accuracy and 0.76 AUC on the test set for the neural network).
     ▪ According to our model, the most informative evidence types are animal models showing disease-relevant phenotypes, dysregulated gene expression in disease tissue and genetic associations between gene and disease.
     ▪ The ability to predict late-stage targets with greater accuracy confirms that a clear link between target and disease is essential to maximise the chances of success in the clinic.
     ▪ Limitations:
        ▪ No prediction of the indication.
        ▪ No tractability considerations.
  16. Thank you!
     ▪ Philippe Sanseau
     ▪ Ian Dunham
     ▪ Gautier Koscielny
     ▪ Giovanni Dall’Olio
     ▪ Pankaj Agarwal
     ▪ Mark Hurle
     ▪ Steven Barrett
     ▪ Nicola Richmond
     ▪ Jin Yao
