• Features are acid, base, hydrogen
bond donor, acceptor, hydrophobe,
aromatic attachment, aliphatic
attachment and halogen. Definitions
are highly engineered.†
• Each descriptor has the form: Feature 1 – topological distance – Feature 2
• Engineered for chemical relevance –
features can be superimposed or
directly linked, e.g. enables a group
to be both a hydrogen bond
acceptor and a base
• A bit identifies a pharmacophore pair, e.g. Aromatic – 3 bonds – Base
• Used as unfolded 280-bit fingerprints
• Regression forest as the ML method
• Build models with 10-fold CV; report CV Pearson's R² and CV RMSE
• Build RF error model to generate
predicted error for each compound
using the same descriptors
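The pair descriptors described above can be sketched as follows. This is a minimal illustration, not the MedChemica implementation: the feature assignment is passed in by hand (the poster derives it from SMARTS definitions), and the key format and the alphabetical ordering of the two feature names are assumptions made here so that each pair maps to one key.

```python
from collections import deque


def bond_distances(adjacency, start):
    """Breadth-first search: number of bonds from `start` to each reachable atom."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def pair_bits(adjacency, features):
    """Return the set of 'FeatureA-<bonds>-FeatureB' keys found in one molecule.

    adjacency: dict of atom index -> list of bonded atom indices
    features:  dict of atom index -> set of feature names (hypothetical input;
               one atom may carry several features, e.g. acceptor and base)
    """
    bits = set()
    atoms = sorted(features)
    for i, a in enumerate(atoms):
        dist = bond_distances(adjacency, a)
        for b in atoms[i:]:
            if b not in dist:
                continue  # atoms in disconnected fragments have no distance
            for fa in features[a]:
                for fb in features[b]:
                    if a == b and fa == fb:
                        continue  # a feature paired with itself adds nothing
                    first, second = sorted((fa, fb))
                    bits.add(f"{first}-{dist[b]}-{second}")
    return bits


# Toy four-atom chain: atom 0 is aromatic, atom 3 is both a base and an acceptor.
toy_adjacency = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
toy_features = {0: {"Aromatic"}, 3: {"Base", "Acceptor"}}
toy_bits = pair_bits(toy_adjacency, toy_features)
```

Because the keys are kept as names rather than hashed into a fixed-length vector, each bit can be read back as a pharmacophore pair, which is the point of an unfolded fingerprint. The distance-0 key covers the superimposed case mentioned above.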
†Taylor, R.; Cole, J. C.; Cosgrove, D. A.; Gardiner, E. J.; Gillet, V. J.; Korb, O. J Comput Aided Mol Des 2012, 26 (4), 451–472.
†Acid and base definitions are SMARTS (MedChemica definitions) covering C, N and heteroaromatic acids, and bases including amidines and guanidines but excluding weak aniline bases.
Regression forest models
Strategy                   | Number of compounds generated | Number of matches to D2 known set | Maximum pIC50 (actual) | Maximum pIC50 (predicted [error])
Hit-to-Lead                | 682                           | 10                                | 7.8                    | 5.5 [0.21]
Dopamine class             | 469                           | 8                                 | 7.9                    | 5.5 [0.23]
Solubility                 | 10148                         | 10                                | 7.8                    | 5.5 [0.21]
Metabolism                 | 12729                         | 19                                | 7.9                    | 5.5 [0.21]
Permutative MMPA (env = 4) | 5                             | 3                                 | 7.9                    | 6.1 [?]
Accelerating lead optimisation with active learning by exploiting MMPA based
ADMET knowledge with regression forest potency models
A. G. Dossetter•, E. Griffen•, A. Leach•+, P. de Sousa•.
•MedChemica Ltd, Macclesfield, UK; +Pharmacy and Biomolecular Sciences, Liverpool John Moores University
Problem
How can we reduce the number of compounds made in going from a small set of confirmed hits to
compounds we can test in vivo? For example: can we go from 30 hits to potent in vivo available leads in 10
rounds of synthesizing 30 compounds?
Learning
Combining focused generative approaches with
explainable QSAR models shows initial promise.
The pinch point is the second set of compounds.
MedChemica
contact@medchemica.com
Approach Case Study
Dopamine D2 dataset
• Well studied target, ligand based design,
• >5200 measured compounds known
• Simulate hit optimization process
• Use known compounds as validation
The Startpoints
30 compounds: 5 <= pIC50 <= 6, -1 < AlogP < 3.5, selected by LLE sort
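The start-point filter can be sketched as below. LLE is taken in its usual form, pIC50 minus logP (AlogP here). The poster does not state the sort direction or tie handling, so the descending sort and the top-n cut are assumptions, and the field names are hypothetical.

```python
def select_start_points(compounds, n=30):
    """Filter to the potency and lipophilicity window, then rank by LLE.

    compounds: list of dicts with hypothetical keys 'id', 'pIC50' and 'logP'.
    """
    in_window = [
        c for c in compounds
        if 5.0 <= c["pIC50"] <= 6.0 and -1.0 < c["logP"] < 3.5
    ]
    # LLE = pIC50 - logP; highest LLE first (assumed direction)
    ranked = sorted(in_window, key=lambda c: c["pIC50"] - c["logP"], reverse=True)
    return ranked[:n]


# Made-up values, for illustration only.
toy_hits = [
    {"id": "A", "pIC50": 5.5, "logP": 2.0},
    {"id": "B", "pIC50": 5.9, "logP": 3.4},
    {"id": "C", "pIC50": 6.5, "logP": 1.0},   # too potent for the window
    {"id": "D", "pIC50": 5.2, "logP": 4.0},   # too lipophilic
    {"id": "E", "pIC50": 5.0, "logP": -0.5},
]
start_ids = [c["id"] for c in select_start_points(toy_hits, n=2)]
```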
Generate virtual compounds from MedChemica Knowledge database
• Hit-to-Lead transformations – the most used in medicinal chemistry
• ADMET transformations for metabolism and solubility
• Target class transformations learning from target analogues
Permutative MMPA
• generate compounds from data already gained
Regression forest models
• Accurate pharmacophore features with topological distance
• Unfolded fingerprints connect feature importance to pharmacophores
• Error models give accuracy of prediction for each compound
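The statistics the poster reports for these models (CV Pearson's R², CV RMSE, and the fraction of predictions within 0.5 log of the measured value) can be computed as in the sketch below. It assumes the predictions are the pooled held-out predictions from the cross-validation folds. The function names are illustrative, and the model fitting itself is not shown.

```python
import math


def pearson_r2(measured, predicted):
    """Squared Pearson correlation between measured and predicted values.

    Raises ZeroDivisionError if either list is constant.
    """
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_m * var_p)


def rmse(measured, predicted):
    """Root mean squared error in the same units as the data (log units here)."""
    squared = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    return math.sqrt(squared / len(measured))


def fraction_within(measured, predicted, tolerance=0.5):
    """Fraction of predictions within `tolerance` log units of the measurement."""
    close = sum(1 for m, p in zip(measured, predicted) if abs(m - p) <= tolerance)
    return close / len(measured)


# Made-up pIC50 values, for illustration only.
toy_measured = [5.0, 6.0, 7.0, 8.0]
toy_predicted = [5.2, 5.9, 7.4, 7.3]
toy_rmse = rmse(toy_measured, toy_predicted)
toy_fraction = fraction_within(toy_measured, toy_predicted)
```

Pearson's R² measures only linear association, so a model can score well on it while being systematically offset. Reporting RMSE alongside it covers that case.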
Active Learning
• Explore from predicted high potency, high error
• Exploit from predicted high potency, low error
• Take all compounds in a data set
• Find all matched pairs; extract ΔpIC50
and the transforms between them
• Aggregate transformations with
median ΔpIC50 and count of pairs
• Apply all transformations back to the
initial data set (at what environment
level?)
• Predicted pIC50 = substrate pIC50 +
median ΔpIC50
• Remove existing compounds
• Prioritise new compounds by pIC50
estimate
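The steps above can be sketched as follows. Finding matched pairs and applying a transform to a substrate both need a chemistry toolkit, so this sketch assumes those results are supplied as plain tuples (a hypothetical input format). In practice the transform identifier would also encode the chosen environment level. All pIC50 values are made up.

```python
from collections import defaultdict
from statistics import median


def aggregate_transforms(pic50, matched_pairs):
    """Return {transform: (median delta pIC50, number of pairs)}.

    matched_pairs: iterable of (mol_a, mol_b, transform), meaning that
    `transform` converts mol_a into mol_b.
    """
    deltas = defaultdict(list)
    for mol_a, mol_b, transform in matched_pairs:
        deltas[transform].append(pic50[mol_b] - pic50[mol_a])
    return {t: (median(d), len(d)) for t, d in deltas.items()}


def propose(pic50, stats, applications):
    """Rank new compounds by estimated pIC50, highest first.

    applications: iterable of (substrate, transform, product) tuples,
    assumed to come from applying each transform to each matching substrate.
    """
    proposals = {}
    for substrate, transform, product in applications:
        if product in pic50 or transform not in stats:
            continue  # remove existing compounds and unknown transforms
        median_delta, count = stats[transform]
        estimate = pic50[substrate] + median_delta
        # If several routes reach the same product, keep the highest estimate.
        if product not in proposals or estimate > proposals[product][0]:
            proposals[product] = (estimate, transform, count)
    return sorted(proposals.items(), key=lambda item: item[1][0], reverse=True)


toy_pic50 = {"M1": 5.0, "M2": 5.8, "M3": 5.2, "M4": 5.6, "M5": 5.3}
toy_pairs = [("M1", "M2", "t1"), ("M3", "M4", "t1")]
toy_stats = aggregate_transforms(toy_pic50, toy_pairs)
toy_ranked = propose(
    toy_pic50,
    toy_stats,
    [("M5", "t1", "M*"), ("M1", "t1", "M2")],  # M2 already exists and is dropped
)
```

Keeping the pair count next to each median makes it possible to down-weight or discard transforms supported by very few examples, although the poster does not say whether a threshold is applied at this step.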
Permutative MMPA
[Figure: matched pairs M1/M2 and M3/M4 share transform t1; applying t1 to M5 gives M*]
• M1 → M2: transform t1
• M3 → M4: transform t1
• M5 matches t1 and generates M*
• Predicted pIC50 of M*: pIC50(M5) + median ΔpIC50(t1)
[Diagram: Substrate molecules and the MedChemica Transformation Database feed a Generator, which outputs Virtual molecules]
Generate molecules from Knowledge Database
• Hit-to-Lead transformations: 689 transformations with >= 250 example pairs
• Dopamine receptor transformations (not D2!): 1027 transformations
• Solubility: 6320 transformations
• Metabolism: 12719 transformations
Generating new structures is not an issue…
Conclusions
• Good starting points are key(!)
• There is no free lunch – good models need data
• Make best use of the data you already have – focused permutative MMPA finds SAR you may have missed by eye
• Target class based enumeration is most efficient, but we still need a better method for round 2 synthesis
• The first set of compounds after the hits is critical if you want to move fast…
Experiment: Fully automated active learning
• Build RF model CV-R2 -0.26, small data set, is it useful?
• Enumerate from all compounds:
• what’s the best enumeration strategy?
• how to pick the (few) compounds to make from the enumerated set?
[Figure: 90% of predictions within 0.5 log of measured]
• Enumeration generates high potency compounds, but early models are too coarse to correctly prioritize the best small set for synthesis, either by high error or by high potency
7.9!
• Permutative MMPA with a tight definition of the MMPA environment generates an excellent first set of follow-up compounds, learning from the SAR within the hits
• The second batch of compounds is more of a challenge…
[Figure caption: Most potent compound (measured) from HtL enumeration]
Active Learning
[Flowchart: Hits → Build model with error estimates → Enumerate → Select for Explore and Exploit → Synthesise & Test → Compounds with data → Compounds meet criteria? (Yes / No)]
Explore: prioritize high error
Exploit: prioritize high potency & low error
Ratio of explore to exploit varies with stage
Select enumeration strategy by stage:
Hit-to-Lead, target class, solubility, metabolism
For in silico simulation, match to
known and measured compounds
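One way to turn the explore and exploit rules into a batch selection is sketched below. The scoring is an assumption, not taken from the poster: explore picks are ranked by predicted error alone, and exploit picks by predicted pIC50 minus predicted error, which rewards high potency and low error together. The `explore_fraction` parameter stands in for the stage-dependent ratio.

```python
def select_batch(candidates, batch_size, explore_fraction):
    """Split one synthesis batch into explore and exploit picks.

    candidates: list of (compound_id, predicted_pIC50, predicted_error).
    Returns (explore, exploit) as two lists of candidate tuples.
    """
    n_explore = round(batch_size * explore_fraction)

    # Explore: compounds the error model is least sure about.
    by_error = sorted(candidates, key=lambda c: c[2], reverse=True)
    explore = by_error[:n_explore]
    chosen = {c[0] for c in explore}

    # Exploit: high predicted potency with low predicted error,
    # scored here as prediction minus error (an assumed scoring rule).
    remaining = [c for c in candidates if c[0] not in chosen]
    by_score = sorted(remaining, key=lambda c: c[1] - c[2], reverse=True)
    exploit = by_score[:batch_size - n_explore]
    return explore, exploit


# Made-up predictions, for illustration only.
toy_candidates = [
    ("a", 6.5, 0.2),
    ("b", 6.0, 0.9),
    ("c", 7.0, 0.6),
    ("d", 5.5, 0.1),
    ("e", 6.8, 0.3),
]
toy_explore, toy_exploit = select_batch(toy_candidates, batch_size=3, explore_fraction=1 / 3)
```

A variant closer to the "explore from predicted high potency, high error" wording would first filter the candidates by a potency threshold before ranking by error. The right threshold would depend on the project stage.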

