Prediction of Animal Clearance Using Naïve Bayesian Classification and Extended Connectivity Fingerprints
Timothy A McIntyre
Introduction
Pharmacokinetics and Clearance
Experimental ADME and lead optimization
In Silico ADME – Background of our approach
Experimental Data
Bayesian Modeling
Descriptive Statistics: Experimental CL and CLi
Summary of animal in vivo clearance
Chemical Diversity
[Figure labels: Benzimidazole, Quinoline, Pyrido-pyrimidine]
Near Neighbors
[Figure panels: Rat CL, Mouse CL, Dog CL, Monkey CL]
Dog Model: Summary of Performance Diagnostics for Methods Used (a)

Predictor     N     ACC                 PPV                 NPV                 TPR                 FPR                 ROC AUC
Dog Model     490   0.76 (0.72 – 0.79)  0.72 (0.68 – 0.77)  0.80 (0.76 – 0.85)  0.83 (0.79 – 0.87)  0.32 (0.27 – 0.37)  0.81 (0.78 – 0.85)
Rat CL        1417  0.70 (0.68 – 0.72)  0.77 (0.75 – 0.80)  0.58 (0.54 – 0.61)  0.77 (0.75 – 0.79)  0.42 (0.38 – 0.45)  0.74 (0.71 – 0.76)
Mouse CL      202   0.65 (0.60 – 0.71)  0.72 (0.65 – 0.78)  0.56 (0.47 – 0.65)  0.72 (0.65 – 0.78)  0.44 (0.35 – 0.53)  0.68 (0.62 – 0.75)
Dog CLi       610   0.71 (0.68 – 0.74)  0.71 (0.67 – 0.74)  0.74 (0.67 – 0.80)  0.91 (0.89 – 0.94)  0.60 (0.55 – 0.66)  0.75 (0.70 – 0.79)

(a) The upper and lower limits of the 90% confidence intervals for each diagnostic are included in parentheses.
FPR Comparisons
ROC AUC Comparisons
NPV Comparisons
Effect of Optimization on Rat CL
Key Messages
Conclusions
Acknowledgements
Backup Slides
Near Neighbors
Rat Model: Summary of Performance Diagnostics for Methods Used (a)

Predictor     N     ACC                 PPV                 NPV                 TPR                 FPR                 ROC AUC
Rat Model     3077  0.74 (0.73 – 0.75)  0.73 (0.71 – 0.75)  0.76 (0.74 – 0.78)  0.79 (0.77 – 0.81)  0.31 (0.29 – 0.33)  0.82 (0.80 – 0.83)
Mouse CL      409   0.74 (0.70 – 0.77)  0.74 (0.70 – 0.79)  0.72 (0.66 – 0.78)  0.84 (0.80 – 0.88)  0.42 (0.35 – 0.48)  0.80 (0.76 – 0.84)
Rat CLi       5947  0.61 (0.60 – 0.62)  0.58 (0.57 – 0.59)  0.66 (0.64 – 0.68)  0.79 (0.78 – 0.80)  0.58 (0.57 – 0.60)  0.65 (0.64 – 0.67)

(a) The upper and lower limits of the 90% confidence intervals for each diagnostic are included in parentheses.
Monkey Model: Summary of Performance Diagnostics for Methods Used (a)

Predictor     N     ACC                 PPV                 NPV                 TPR                 FPR                 ROC AUC
Monkey Model  206   0.76 (0.71 – 0.81)  0.92 (0.88 – 0.96)  0.42 (0.31 – 0.52)  0.77 (0.72 – 0.83)  0.29 (0.17 – 0.41)  0.81 (0.75 – 0.87)
Rat CL        835   0.70 (0.67 – 0.73)  0.90 (0.87 – 0.92)  0.35 (0.31 – 0.40)  0.71 (0.68 – 0.74)  0.35 (0.29 – 0.41)  0.74 (0.71 – 0.77)
Dog CL        569   0.67 (0.64 – 0.71)  0.84 (0.81 – 0.87)  0.36 (0.30 – 0.42)  0.71 (0.67 – 0.74)  0.44 (0.37 – 0.52)  0.73 (0.69 – 0.77)
Monkey CLi    486   0.60 (0.57 – 0.64)  0.85 (0.81 – 0.89)  0.35 (0.30 – 0.40)  0.58 (0.53 – 0.62)  0.31 (0.24 – 0.37)  0.69 (0.66 – 0.73)

(a) The upper and lower limits of the 90% confidence intervals for each diagnostic are included in parentheses.


Tam June 2009


Editor's Notes

  1. Welcome to my talk about predicting animal CL using Bayesian analysis and extended connectivity fingerprints. Today I’ll give a brief introduction and describe pharmacokinetics. I’ll then give some background on why we chose this technique, present our results, and try to convince you why you should care!
  2. The pharmaceutical industry is undergoing a vigorous evolution. The industry faces unprecedented pressure from payers, regulators, ethicists and the general public to reduce the cost of drug development, which is now approaching $1 billion. At the same time there is increasing demand for new medicines with well-established safety and efficacy. In addition, reducing the high levels of attrition during drug discovery and development has long been a financial challenge for the pharmaceutical industry. These increased pressures are occurring at the same time as the pharmaceutical sector is contracting its resources and changing the way it does business in order to remain profitable. Routine use of absorption, distribution, metabolism and excretion (ADME) screening has been found to significantly reduce attrition due to poor pharmacokinetics in humans.
  3. What is pharmacokinetics (PK)? In a nutshell, it is the study of what the body does to a drug. This is done by characterizing the drug concentration-time profile. Drug clearance (CL) is a primary determinant of drug PK. The rate at which a drug is eliminated is proportional to the clearance and the drug concentration. CL measures the efficiency of irreversible elimination: it is the volume of plasma or blood from which a drug is completely removed per unit time. CL can be attributed to a particular organ or metabolic pathway, or to the whole body. However, CL is often a result of metabolism by the liver. High clearance is known to limit systemic exposure and oral bioavailability. Oral bioavailability, for those of you who don’t know, is a measure of the rate and extent to which a drug reaches the systemic circulation and is then available at the site of action. High clearance therefore leads to low bioavailability, and low bioavailability yields less drug at the point where you need it to do its job.
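The two relationships in note 3 can be sketched in a few lines. The first function is the proportionality stated in the note (rate of elimination = CL × concentration). The second is not from the talk: it is the standard first-pass bound F = 1 − CL_H/Q_H, which assumes complete absorption and purely hepatic elimination, and is included only to illustrate why high clearance limits oral bioavailability. Function names and units are illustrative assumptions.

```python
def elimination_rate(cl_ml_min_kg, conc_ug_ml):
    """Rate of elimination (ug/min/kg) = CL (mL/min/kg) * concentration (ug/mL)."""
    return cl_ml_min_kg * conc_ug_ml


def max_oral_bioavailability(cl_hepatic, liver_blood_flow):
    """Upper bound on oral bioavailability, F = 1 - CL_H / Q_H.

    Assumption (not from the talk): complete absorption and elimination
    by the liver only, so the hepatic extraction ratio is CL_H / Q_H.
    Both arguments must use the same units (e.g. mL/min/kg).
    """
    extraction = cl_hepatic / liver_blood_flow
    if not 0.0 <= extraction <= 1.0:
        raise ValueError("hepatic CL must lie between 0 and liver blood flow")
    return 1.0 - extraction


if __name__ == "__main__":
    # A dog compound cleared at 70% of a 30 mL/min/kg liver blood flow
    # can reach at most about 30% oral bioavailability under these assumptions.
    print(max_oral_bioavailability(21.0, 30.0))
```

Under these assumptions, a compound at the 70% liver blood flow threshold used later in the talk has a ceiling of roughly 30% oral bioavailability.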
  4. OK, clearance seems to be important, but what are some of the strategies for optimizing drugs through PK? In vitro screens, such as intrinsic CL in liver microsomes, are commonly used to prioritize compounds for progression into in vivo studies. Animal studies are more resource intensive, more expensive, and lower throughput than in vitro screens. Rodent pharmacokinetics are typically evaluated prior to higher species. These rodent studies use smaller amounts of compound than studies in the higher species. For example, rodent studies generally use in the range of 1 – 30 mg of compound, whereas studies in higher species can routinely exceed 1 g of drug. As alluded to above, rodent studies may then guide strategy around more intensive PK studies in higher species, such as the dog or monkey. Appropriate PK is essential for pharmacological or toxicological models, where understanding the relationship between concentration and effect or adverse event is critical. Animal PK is used to predict PK in humans, usually through allometric scaling or a similarly derived method, and this in turn sets appropriate starting doses for initial clinical trials.
  5. So we can see how drugs can be optimized through PK, but are there any better, aka cheaper, methods? Now I would like to talk a little about in silico ADME and the background of our approach. Theoretically, in silico models have the potential to provide unlimited information. In silico models are often based on physchem properties. To our knowledge there is limited precedent for modeling animal CL directly using detailed structural information such as fingerprints, which is one driver for our approach: Bayesian classification and extended connectivity fingerprints. It made a lot of sense to us to use something that was tangible, something we could relate to, such as structure. Since various in vitro and in vivo ADME assays are commonly employed during lead optimization to progress a compound, we compared Bayesian model performance to methods such as intrinsic clearance and rodent PK.
  6. Now, with enough background in place, let’s talk about how we got this data. Data for compounds having mouse, rat, dog and monkey CL and CLi were acquired from the GSK corporate database. All PK studies were performed in accordance with local regulatory authorities as well as the GSK Animal Care and Use Committee. The GSK corporate database for the in vivo experiments contains ~20K unique compounds, with the bulk of these compounds coming from rat data. In order to make comparisons across species, animal CL was normalized to liver blood flow, which ranged from 30 mL/min/kg in the dog to 90 mL/min/kg in the mouse. A threshold of 70% of liver blood flow was applied to these normalized values. Compounds below 70% of liver blood flow were classified as “Low CL” (the positive class) and compounds above 70% as “High CL” (the negative class). 70% of liver blood flow is a physiologically relevant threshold, as it represents the point at which CL really starts to impact oral bioavailability. For microsomal intrinsic clearance we used a threshold of 5 mL/min/g tissue. A value of 5 for this assay is the mid-point of the dynamic range, which normally runs from 0.5 to 50 mL/min/g tissue.
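The normalization and thresholding described in note 6 can be sketched as follows. Only the dog and mouse liver blood flow values come from the notes; values for other species are not given there and would have to be supplied by the caller. Treating a value exactly at the threshold as "High CL", and treating CLi below 5 mL/min/g tissue as a "Low CL" prediction, are assumptions made here for illustration.

```python
# Liver blood flow in mL/min/kg. Only the dog and mouse values are quoted
# in the notes; rat and monkey values are not given there, so they are
# deliberately left out rather than guessed.
LIVER_BLOOD_FLOW = {"dog": 30.0, "mouse": 90.0}


def percent_liver_blood_flow(cl, species, lbf=LIVER_BLOOD_FLOW):
    """Express in vivo CL (mL/min/kg) as a percentage of liver blood flow."""
    return 100.0 * cl / lbf[species]  # KeyError if the species is not supplied


def classify_cl(cl, species, threshold_pct=70.0, lbf=LIVER_BLOOD_FLOW):
    """Return 'Low CL' (positive class) or 'High CL' (negative class).

    Assumption: a value exactly at the threshold is treated as 'High CL';
    the notes only describe 'less than' and 'greater than' 70%.
    """
    if percent_liver_blood_flow(cl, species, lbf) < threshold_pct:
        return "Low CL"
    return "High CL"


def classify_cli(cli, threshold=5.0):
    """Classify microsomal intrinsic clearance (mL/min/g tissue).

    Assumption: CLi below the threshold is read as a 'Low CL' prediction.
    """
    return "Low CL" if cli < threshold else "High CL"


if __name__ == "__main__":
    print(classify_cl(15.0, "dog"))    # 50% of liver blood flow
    print(classify_cl(81.0, "mouse"))  # 90% of liver blood flow
    print(classify_cli(2.0))
```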
  7. Moving on to a description of the models we built. Binary classification models were built using naïve Bayesian analysis in Pipeline Pilot with extended connectivity fingerprints (six-bond diameter). Compounds were randomly assigned to training or test sets in a 5:1 ratio; this process typically took less than a minute on a standard Dell desktop. We studied a number of physchem descriptors, such as ALogP, in multiple combinations, either separate from or in addition to the extended connectivity fingerprint. In the end, however, they failed to improve model performance, and in some instances made it substantially worse. Pipeline Pilot and the modeling process have been described in detail in a couple of really good previous publications (Xia et al. 2004, Rogers et al. 2005). Model predictions for each test set were compared to the experimental data for that test set. Diagnostics such as Accuracy, Positive Predictive Value, Negative Predictive Value, True Positive Rate, False Positive Rate and Receiver Operating Characteristic AUC were used to cross-compare the various methods. 90% confidence intervals as well as p-values were calculated using standard equations.
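To make the modeling step in note 7 concrete, here is a minimal sketch of a binary naïve Bayes classifier over sparse fingerprint features with a random 5:1 train/test split. It is not the Pipeline Pilot implementation (which the cited publications describe); it is a plain Bernoulli naïve Bayes with add-one smoothing, and each compound is assumed to be represented as a set of integer feature IDs standing in for fingerprint bits. All names are hypothetical.

```python
import math
import random
from collections import Counter


def split_5_to_1(items, seed=0):
    """Randomly split items into (train, test) in roughly a 5:1 ratio."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = len(shuffled) // 6
    return shuffled[n_test:], shuffled[:n_test]


class BernoulliNaiveBayes:
    """Bernoulli naive Bayes over sets of feature IDs (stand-in for fingerprints).

    Labels are booleans: True = 'Low CL' (positive), False = 'High CL'.
    Both classes must be present in the training data.
    """

    def fit(self, feature_sets, labels):
        self.vocab = set().union(*feature_sets)
        self.n = Counter(labels)                      # compounds per class
        self.counts = {c: Counter() for c in self.n}  # feature counts per class
        for feats, y in zip(feature_sets, labels):
            self.counts[y].update(feats)
        return self

    def log_odds(self, feats):
        """log P(low | feats) - log P(high | feats) under the naive assumption."""
        score = math.log(self.n[True] / self.n[False])
        for f in self.vocab:
            # Add-one smoothing keeps probabilities away from 0 and 1.
            p_pos = (self.counts[True][f] + 1) / (self.n[True] + 2)
            p_neg = (self.counts[False][f] + 1) / (self.n[False] + 2)
            if f in feats:
                score += math.log(p_pos / p_neg)
            else:
                score += math.log((1 - p_pos) / (1 - p_neg))
        return score

    def predict(self, feats):
        return self.log_odds(feats) > 0


if __name__ == "__main__":
    # Toy data: features 1-3 are associated with low CL, 4-6 with high CL.
    X = [{1, 2}, {1, 3}, {4, 5}, {4, 6}]
    y = [True, True, False, False]
    model = BernoulliNaiveBayes().fit(X, y)
    print(model.predict({1}), model.predict({4}))
```

The log-odds value can also serve as a continuous score for a ROC analysis of the model.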
  8. In order to have a high level of confidence in the Bayesian models, as well as in the other comparisons, we ensured that sufficient biological and chemical diversity existed. In this slide we’ll take a look at the biological diversity in the in vivo and in vitro datasets. In vivo CL data were available for a number of compounds ranging from at least 1,000 in the monkey to more than 17,500 in the rat. Median normalized CL values ranged from 38% of liver blood flow in the monkey to 68% in the rat. Overall, 90% of CL data were less than twice liver blood flow. For all datasets there was a good range of biological data available, which can be seen in the three-fold difference between the 1st and 3rd quartiles. Intrinsic clearance data were available for a number of compounds ranging from at least 3,500 in the monkey to 42,000 in the rat. Median CLi values ranged from 2.0 mL/min/g tissue in the dog to 7.2 mL/min/g tissue in the monkey.
  9. This slide is a graphical representation of some of the content on the previous slide, in which I wanted to highlight one point. Notice that the CL distributions for the rat, dog and mouse are highly similar. However, monkey CL is markedly lower than in the other species. This is due to something we call optimization bias, and we’ll touch more on this in the upcoming slides.
  10. As with the biological data, we ensured chemical diversity by detailed analysis. There were 20,000 unique compounds in the in vivo datasets, representing hundreds of lead optimization programs. Chemical diversity was assessed using self-similarity (near neighbor) tests, ring analysis, and the distribution of Murcko assemblies. These tests demonstrated substantial structural diversity. For example, looking at the most common ring in each species, benzimidazole appeared in 11% and 8% of compounds in the mouse and dog CL sets respectively, while quinoline was the most common ring in the rat CL set (7% of compounds). The most common ring in monkey CL compounds was pyrido-pyrimidine (8%). The top 20 rings accounted for 45 – 51% of the compounds, depending on the species. The median frequency of any particular Murcko assembly was less than 0.1%; this corresponds to ~18 compounds with rat CL sharing the same assembly. Since Murcko assemblies represent interconnected rings, their distribution is more representative of chemical series within the database.
  11. Near neighbor analysis was conducted using a fixed-length, 1024-bit Daylight-style fingerprint, with a Tanimoto distance of less than 0.15 required for a compound to be classified as a near neighbor. At this threshold, near neighbors differed by at most two atoms in pairwise comparisons. Very good distribution was also found in the near neighbor analysis. Focusing on the in vivo datasets, 45 – 62% of compounds had fewer than two near neighbors and 11 – 25% of compounds had more than ten near neighbors. The mouse and monkey CL datasets have a greater proportion of compounds with no near neighbors, reflecting that these species are less likely to be used as the primary species for a CL screen.
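The near neighbor count in note 11 can be sketched as below, assuming each fingerprint is available as the set of its "on" bit positions. The Tanimoto distance is taken as one minus the Tanimoto similarity; the all-pairs loop is for illustration only and would be slow on tens of thousands of compounds. Function names are hypothetical.

```python
def tanimoto_distance(a, b):
    """1 - |A & B| / |A | B| for two sets of 'on' bit positions."""
    union = len(a | b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / union


def near_neighbor_counts(fingerprints, max_distance=0.15):
    """For each fingerprint, count the others closer than max_distance."""
    n = len(fingerprints)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if tanimoto_distance(fingerprints[i], fingerprints[j]) < max_distance:
                counts[i] += 1
                counts[j] += 1
    return counts


if __name__ == "__main__":
    fps = [set(range(20)), set(range(19)), {100, 101}]
    # The first two differ by one bit out of 20 (distance 0.05);
    # the third shares no bits with either.
    print(near_neighbor_counts(fps))
```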
  12. Now, stepping into the comparisons of the various methods, I wanted to show you an example of the type of data that was collected and compared. There is a lot of data on this slide. On datasets similar to this one, we looked for non-overlapping 90% CIs and also conducted p-value analysis. Performance diagnostics for the Bayesian model predicting dog CL were compared to experimental rat CL, mouse CL and dog CLi as potential predictors of experimental dog CL. ACC, PPV, NPV, TPR and ROC AUC were all high for the dog model. ROC AUC for the dog CL model was greater than for both rat CL and mouse CL. NPV for the Bayesian dog model was substantially higher than for both rodent species. FPR was substantially lower for the Bayesian dog model compared with dog CLi. Higher accuracy of the Bayesian dog model is also seen, although its 90% CI overlaps with some of the other methods.
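The diagnostics compared in note 12 all derive from a 2×2 confusion matrix, with "Low CL" as the positive class. The sketch below computes them and checks whether two confidence intervals overlap. The talk only says that intervals were calculated "using standard equations"; the normal-approximation (Wald) interval used here is an assumption, not necessarily the authors' method.

```python
import math

Z_90 = 1.645  # two-sided 90% quantile of the standard normal distribution


def diagnostics(tp, fp, tn, fn):
    """Classification diagnostics from confusion-matrix counts."""
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "PPV": tp / (tp + fp),   # predicted low CL that really is low
        "NPV": tn / (tn + fn),   # predicted high CL that really is high
        "TPR": tp / (tp + fn),   # low-CL compounds correctly identified
        "FPR": fp / (fp + tn),   # high-CL compounds wrongly called low
    }


def wald_interval(p, n, z=Z_90):
    """Normal-approximation confidence interval for a proportion p from n cases."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return (max(0.0, p - half_width), min(1.0, p + half_width))


def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


if __name__ == "__main__":
    print(diagnostics(tp=40, fp=10, tn=30, fn=20))
    # NPV intervals from the dog table: Bayesian dog model vs. rat CL.
    print(intervals_overlap((0.76, 0.85), (0.54, 0.61)))
```

Note that each diagnostic has its own denominator (for example, NPV is a proportion of predicted negatives, not of all N compounds), so the `n` passed to `wald_interval` should match the diagnostic being bounded.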
  13. In the next few slides I will talk about statistically significant differences between the Bayesian models and the other methods. Each chart plots the observed values, their confidence intervals, and any pairwise p-values. Looking at some key comparisons for false positive rate (FPR), the rate at which high-CL molecules are falsely predicted to be low CL, we see that the Bayesian dog model has a lower rate than rat CL or dog CLi. This also holds true for the Bayesian rat and mouse models when compared to CLi in their respective species. In addition, the Bayesian rat model has a lower false positive rate than mouse CL. The impact of a high FPR would be the progression of compounds into more intensive studies in which PK was not as favorable as the screen indicated, thus perhaps increasing resource expenditure.
  14. This slide displays the values for ROC AUC. ROC AUC is a general measure of a test’s diagnostic capability, obtained by examining the trade-offs between hit rates and false alarm rates. As shown here, the Bayesian dog model has better overall diagnostic capability than rat and mouse CL in predicting experimental dog CL. The Bayesian rat model is a better performer than rat intrinsic clearance in predicting rat CL, and the same holds true for the mouse and monkey Bayesian models compared to CLi in their respective species. Dog CLi versus the Bayesian dog model is not listed; however, the 90% confidence intervals are barely overlapping. What’s particularly interesting about this comparison is that CLi is one of the most widely used in vitro ADME screens for compound prioritization. Here we see, at least through ROC AUC, that the Bayesian models seem to have better diagnostic capability.
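One way to compute the ROC AUC described in note 14 is through its rank interpretation: the probability that a randomly chosen positive (low-CL) compound receives a higher score than a randomly chosen negative (high-CL) one, with ties counted as half. The sketch below uses that definition directly; how the talk's software computed AUC is not stated. For an experimental predictor such as rat CL, the score could be the negated normalized CL so that higher scores mean "more likely low CL"; that convention is an assumption here.

```python
def roc_auc(scores, labels):
    """ROC AUC as P(score of a positive > score of a negative), ties count 0.5.

    scores: higher means more likely positive (low CL).
    labels: booleans, True for positive.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.4, 0.3]
    labels = [True, False, True, False]
    print(roc_auc(scores, labels))
```

An AUC of 0.5 corresponds to a predictor no better than chance, and 1.0 to perfect ranking of low-CL above high-CL compounds.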
  15. The Bayesian dog model was a substantially better predictor of high-CL compounds in the dog than either rat or mouse CL. This is important, because compounds with higher rodent CL are likely to be de-prioritized during lead optimization. Because of this, there is high potential for promising compounds to be overlooked if one relies too heavily on the experimental determination of rodent CL. The higher NPV suggests the Bayesian dog model is superior to rodent CL in this respect. The same holds true for the Bayesian rat model versus rat CLi, which also indicates that the Bayesian method is better at avoiding erroneous de-selection early in the discovery process.
  16. This slide demonstrates the natural and desirable consequence of optimization bias. This is something that occurs as compounds are progressed through lead optimization. Compounds with favorable pharmacokinetics in lower species are more likely to have good pharmacokinetics in higher species. Looking at a plot of rat CL values grouped by percentiles, we can clearly see that the compounds that moved on into higher species, such as dog or monkey, had a lower overall distribution of rat CL. Since compounds with lower rodent CL generally have lower CL in higher species, this optimization bias also affects the performance of the Bayesian monkey models. This was noticed in particular in the relatively low NPV, as the Bayesian model has fewer high-CL compounds to differentiate between.
  17. We consider the Bayesian model performance to be exceptional. Relatively good ROC AUC, ACC, PPV, NPV and TPR were found for the rat and dog models, ranging from 0.72 to 0.82. This was also found for the monkey, with the exception of NPV. In predicting dog CL, the Bayesian model was better than experimental rat or mouse CL. In predicting rat CL, the Bayesian model performed just as well as mouse CL. Bayesian models outperformed mouse, rat and monkey CLi for predicting mouse, rat and monkey CL, respectively. CLi has a lower negative predictive value than the Bayesian models (compounds with high experimental CLi can have low CL). Lead optimization bias can affect modeling success (monkey), as there was likely insufficient fingerprint information to predict this attribute optimally. Recently published monkey and human datasets display this same bias. Monkey CL data from 124 compounds had a median CL of about 26% of liver blood flow. A large compilation of human CL for more than 600 compounds had a median CL of about 20% of liver blood flow. For these published datasets, as well as our monkey dataset, structural features associated with high CL may be limited; therefore the classification approach we describe may not be ideal for these species. This lack of CL diversity in monkey or human datasets would also address the question “Why not just predict human CL?”
  18. Near neighbor analysis was conducted using a fixed-length, 1024-bit Daylight-style fingerprint, with a Tanimoto distance of less than 0.15 required for a compound to be classified as a near neighbor. At this threshold, near neighbors differed by at most two atoms in pairwise comparisons. Very good distribution was also found in the near neighbor analysis. Focusing on the in vivo datasets, 45 – 62% of compounds had fewer than two near neighbors, 27 – 29% had three to nine near neighbors, and 11 – 25% had more than ten near neighbors. The mouse and monkey CL datasets have a greater proportion of compounds with no near neighbors, reflecting that these species are less likely to be used as the primary species for a CL screen.
  19. Diagnostics of the Bayesian rat model were compared to experimental mouse CL and rat CLi as determinants of experimental rat CL. ROC AUC, ACC, PPV, NPV and TPR for the Bayesian rat model were high (range = 0.73 – 0.82). Diagnostic performance was generally similar between the Bayesian rat model and mouse CL; however, the FPR was notably higher for mouse CL (0.42 versus 0.31). The Bayesian rat model outperformed rat CLi for all performance diagnostics with the exception of TPR, for which the predictive performance was similar. NPV was substantially higher for the Bayesian rat model compared to rat CLi (0.76 versus 0.66). FPR was substantially higher for rat CLi (0.58) compared to the Bayesian model (0.31).
  20. Diagnostics for the Bayesian monkey model were compared to experimental rat CL, dog CL and monkey CLi. Mouse data were excluded from this analysis due to the relatively small number of compounds (n = 100) with monkey CL and the wide 90% confidence intervals. ACC for the Bayesian monkey model was higher than for dog CL (0.76 versus 0.67), as was PPV (0.92 versus 0.84, p = 0.007). The monkey model outperformed monkey CLi for ROC AUC, ACC, PPV and TPR, with non-overlapping confidence intervals and p-values less than 0.014. Notice in these results that each of the in vivo predictors (rat and dog) is a noticeably better predictor of low CL in the monkey than of high CL. The relatively low monkey CL in this database is a natural sign of the selection bias that occurs as compounds progress through lead optimization. Compounds with favorable pharmacokinetics in lower species are more likely to have good pharmacokinetics in higher species. This also affects the performance of the Bayesian monkey model, notably its relatively low NPV, as the Bayesian model has fewer high-CL compounds.