SlideShare una empresa de Scribd logo
1 de 11
Descargar para leer sin conexión
Data analysis in QSAR

        Noel O’Boyle

  Dave Palmer, John Mitchell
Quantitative Structure-Activity
             Relationship (QSAR)
Also QSPR (Property)



                       Perceive
                                  Physical
          Structure    Predict
                                  property
                       Propose
Quantitative Structure-Activity
              Relationship (QSAR)
Also QSPR (Property)



                             Perceive
                                        Physical
          Structure          Predict
                                        property
                             Propose

    Descriptors

    Molecular weight
    No.of H-bonding groups
    Surface area

    MOE: 180 descriptors
Quantitative Structure-Activity
              Relationship (QSAR)
                                           Need to use an estimate of the error
 Optimise model on training set (2/3)      of prediction (CV or bootstrap, but not
     Test model on test set (1/3)          resubstitution)

                  or

Build model on training set (2/3 of 2/3)
   Optimise on test set (1/3 of 2/3)       Using the error of prediction
  Test model on validation set (1/3)
Quantitative Structure-Activity
                   Relationship (QSAR)
                                                         Need to use an estimate of the error
  Optimise model on training set (2/3)                   of prediction (CV or bootstrap, but not
      Test model on test set (1/3)                       resubstitution)

                         or

Build model on training set (2/3 of 2/3)
   Optimise on test set (1/3 of 2/3)                     Using the error of prediction
  Test model on validation set (1/3)


Testing must be carried out on a hold-out set from the same distribution
as the the set of molecules used for optimisation
=> don't optimise model on diverse set
Assessing Model Fit by Cross-Validation, JCICS, 2003, 43, 579.
Beware of q2, J.Mol.Graph.Model., 2002, 20, 269.
Feature selection




                  Number of descriptors →

Features = descriptors                2n different subsets
Parameter optimisation

– Models have parameters which should be tuned
   • Support vector machines
        – gamma, cost, epsilon
QSAR: Feature selection and parameter
             optimisation
• Aim: To find a robust method of simultaneously
  optimising the parameters and performing feature
  selection
• Prediction of solubility for drug-like molecules
   – David Palmer
   – Support vector machines
       • gamma, cost, epsilon
   – 127 descriptors
• Large search space
   – stochastic algorithm required
Ant Colony Optimisation (ACO)
•   Insipired by behaviour of ants foraging for food
•   Ants lay down pheromones, which influence the path taken by other
    ants. Meanwhile pheromones are evaporating.
•   Ants’ trails converge to shortest path between nest and food




•   Ant Colony Optimisation (ACO) – Marco Dorigo, PhD Thesis, 1992.
•   ACO for feature selection – Shen et al, JCIM, 2005, 45, 1024.
•   We have extended it to perform simultaneous parameter optimisation
Ant Colony Optimisation (ACO)

• population of ants (typically 50 to 100)
    – each ant represents a model – i.e. a subset of descriptors and
      values for the parameters

    – each ant has a fitness score, e.g. 10-fold cross validation rmse

• it is more likely that an ant will choose a particular
  descriptor/parameter value in the next iteration if
    – many ants have chosen it in this iteration (local search), or

    – many ants have chosen it in their best models to date (global
      search)

• for descriptors/parameter values that are not chosen in the
  current or best models, the probability that they will be chosen
  decreases (evaporation)
Does it work?

Más contenido relacionado

Destacado

Hyperchem Ma, badbarcode en_1109_nocomment-final
Hyperchem Ma, badbarcode en_1109_nocomment-finalHyperchem Ma, badbarcode en_1109_nocomment-final
Hyperchem Ma, badbarcode en_1109_nocomment-final
PacSecJP
 
Quantum pharmacology. Basics
Quantum pharmacology. BasicsQuantum pharmacology. Basics
Quantum pharmacology. Basics
Mobiliuz
 
2 d qsar model of dihydrofolate reductase (dhfr) inhibitors with activity in ...
2 d qsar model of dihydrofolate reductase (dhfr) inhibitors with activity in ...2 d qsar model of dihydrofolate reductase (dhfr) inhibitors with activity in ...
2 d qsar model of dihydrofolate reductase (dhfr) inhibitors with activity in ...
Prasanthperceptron
 
Automating Drug Design Nov 13th 2009 97
Automating Drug Design Nov 13th 2009 97Automating Drug Design Nov 13th 2009 97
Automating Drug Design Nov 13th 2009 97
David Leahy
 
Qsar introduction beginners
Qsar introduction beginnersQsar introduction beginners
Qsar introduction beginners
Antonio Cassano
 
Introduction to OECD QSAR Toolbox
Introduction to OECD QSAR ToolboxIntroduction to OECD QSAR Toolbox
Introduction to OECD QSAR Toolbox
guestcfca1eb1
 
QSAR Study on Antitubercular Drug Derivatives
QSAR Study on Antitubercular Drug DerivativesQSAR Study on Antitubercular Drug Derivatives
QSAR Study on Antitubercular Drug Derivatives
Lydia Yeshitla
 
Qsar studies of saponine analogues for anticancer activity by sagar alone
Qsar studies of saponine analogues for anticancer activity by sagar aloneQsar studies of saponine analogues for anticancer activity by sagar alone
Qsar studies of saponine analogues for anticancer activity by sagar alone
sagar alone
 

Destacado (17)

Computational Chemistry Robots
Computational Chemistry RobotsComputational Chemistry Robots
Computational Chemistry Robots
 
Making the most of a QM calculation
Making the most of a QM calculationMaking the most of a QM calculation
Making the most of a QM calculation
 
Hyperchem Ma, badbarcode en_1109_nocomment-final
Hyperchem Ma, badbarcode en_1109_nocomment-finalHyperchem Ma, badbarcode en_1109_nocomment-final
Hyperchem Ma, badbarcode en_1109_nocomment-final
 
Quantum pharmacology. Basics
Quantum pharmacology. BasicsQuantum pharmacology. Basics
Quantum pharmacology. Basics
 
JAI
JAIJAI
JAI
 
Focus on Reading Structure Activity
Focus on Reading Structure Activity Focus on Reading Structure Activity
Focus on Reading Structure Activity
 
Computer Aided Drug Design QSAR Related Methods
Computer Aided Drug Design QSAR Related MethodsComputer Aided Drug Design QSAR Related Methods
Computer Aided Drug Design QSAR Related Methods
 
2 d qsar model of dihydrofolate reductase (dhfr) inhibitors with activity in ...
2 d qsar model of dihydrofolate reductase (dhfr) inhibitors with activity in ...2 d qsar model of dihydrofolate reductase (dhfr) inhibitors with activity in ...
2 d qsar model of dihydrofolate reductase (dhfr) inhibitors with activity in ...
 
Electronegativity
ElectronegativityElectronegativity
Electronegativity
 
Discovery Bus: UK QSAR meeting at GSK
Discovery Bus: UK QSAR meeting at GSKDiscovery Bus: UK QSAR meeting at GSK
Discovery Bus: UK QSAR meeting at GSK
 
Molecular design of life
Molecular design of lifeMolecular design of life
Molecular design of life
 
Automating Drug Design Nov 13th 2009 97
Automating Drug Design Nov 13th 2009 97Automating Drug Design Nov 13th 2009 97
Automating Drug Design Nov 13th 2009 97
 
Molecular design
Molecular designMolecular design
Molecular design
 
Qsar introduction beginners
Qsar introduction beginnersQsar introduction beginners
Qsar introduction beginners
 
Introduction to OECD QSAR Toolbox
Introduction to OECD QSAR ToolboxIntroduction to OECD QSAR Toolbox
Introduction to OECD QSAR Toolbox
 
QSAR Study on Antitubercular Drug Derivatives
QSAR Study on Antitubercular Drug DerivativesQSAR Study on Antitubercular Drug Derivatives
QSAR Study on Antitubercular Drug Derivatives
 
Qsar studies of saponine analogues for anticancer activity by sagar alone
Qsar studies of saponine analogues for anticancer activity by sagar aloneQsar studies of saponine analogues for anticancer activity by sagar alone
Qsar studies of saponine analogues for anticancer activity by sagar alone
 

Similar a Data Analysis in QSAR

IGARSS2011-I-Ling.ppt
IGARSS2011-I-Ling.pptIGARSS2011-I-Ling.ppt
IGARSS2011-I-Ling.ppt
grssieee
 
Automated parameter optimization should be included in future 
defect predict...
Automated parameter optimization should be included in future 
defect predict...Automated parameter optimization should be included in future 
defect predict...
Automated parameter optimization should be included in future 
defect predict...
Chakkrit (Kla) Tantithamthavorn
 
A comparison of three chromatographic retention time prediction models
A comparison of three chromatographic retention time prediction modelsA comparison of three chromatographic retention time prediction models
A comparison of three chromatographic retention time prediction models
Andrew McEachran
 
Data Mining Protein Structures' Topological Properties to Enhance Contact Ma...
Data Mining Protein Structures' Topological Properties  to Enhance Contact Ma...Data Mining Protein Structures' Topological Properties  to Enhance Contact Ma...
Data Mining Protein Structures' Topological Properties to Enhance Contact Ma...
jaumebp
 
structure analysis.ppt
structure analysis.pptstructure analysis.ppt
structure analysis.ppt
RaviJangid77
 
Small Molecules and siRNA: Methods to Explore Bioactivity Data
Small Molecules and siRNA: Methods to Explore Bioactivity DataSmall Molecules and siRNA: Methods to Explore Bioactivity Data
Small Molecules and siRNA: Methods to Explore Bioactivity Data
Rajarshi Guha
 

Similar a Data Analysis in QSAR (20)

ADMET.pptx
ADMET.pptxADMET.pptx
ADMET.pptx
 
IGARSS2011-I-Ling.ppt
IGARSS2011-I-Ling.pptIGARSS2011-I-Ling.ppt
IGARSS2011-I-Ling.ppt
 
BIM_2010_20_Bioinformatics_Project
BIM_2010_20_Bioinformatics_ProjectBIM_2010_20_Bioinformatics_Project
BIM_2010_20_Bioinformatics_Project
 
SVM-PSO based Feature Selection for Improving Medical Diagnosis Reliability u...
SVM-PSO based Feature Selection for Improving Medical Diagnosis Reliability u...SVM-PSO based Feature Selection for Improving Medical Diagnosis Reliability u...
SVM-PSO based Feature Selection for Improving Medical Diagnosis Reliability u...
 
Automated parameter optimization should be included in future 
defect predict...
Automated parameter optimization should be included in future 
defect predict...Automated parameter optimization should be included in future 
defect predict...
Automated parameter optimization should be included in future 
defect predict...
 
A comparison of particle swarm optimization and the genetic algorithm by Rani...
A comparison of particle swarm optimization and the genetic algorithm by Rani...A comparison of particle swarm optimization and the genetic algorithm by Rani...
A comparison of particle swarm optimization and the genetic algorithm by Rani...
 
A comparison of three chromatographic retention time prediction models
A comparison of three chromatographic retention time prediction modelsA comparison of three chromatographic retention time prediction models
A comparison of three chromatographic retention time prediction models
 
Data Mining Protein Structures' Topological Properties to Enhance Contact Ma...
Data Mining Protein Structures' Topological Properties  to Enhance Contact Ma...Data Mining Protein Structures' Topological Properties  to Enhance Contact Ma...
Data Mining Protein Structures' Topological Properties to Enhance Contact Ma...
 
T180203125133
T180203125133T180203125133
T180203125133
 
A parsimonious SVM model selection criterion for classification of real-world ...
A parsimonious SVM model selection criterion for classification of real-world ...A parsimonious SVM model selection criterion for classification of real-world ...
A parsimonious SVM model selection criterion for classification of real-world ...
 
An Automatic Clustering Technique for Optimal Clusters
An Automatic Clustering Technique for Optimal ClustersAn Automatic Clustering Technique for Optimal Clusters
An Automatic Clustering Technique for Optimal Clusters
 
AI 바이오 (4일차).pdf
AI 바이오 (4일차).pdfAI 바이오 (4일차).pdf
AI 바이오 (4일차).pdf
 
structure analysis.ppt
structure analysis.pptstructure analysis.ppt
structure analysis.ppt
 
structure analysis.ppt
structure analysis.pptstructure analysis.ppt
structure analysis.ppt
 
structure analysis.ppt
structure analysis.pptstructure analysis.ppt
structure analysis.ppt
 
structure analysis.ppt
structure analysis.pptstructure analysis.ppt
structure analysis.ppt
 
Improved Particle Swarm Optimization
Improved Particle Swarm OptimizationImproved Particle Swarm Optimization
Improved Particle Swarm Optimization
 
an improver particle optmizacion plan de negocios
an improver particle optmizacion plan de negociosan improver particle optmizacion plan de negocios
an improver particle optmizacion plan de negocios
 
Small Molecules and siRNA: Methods to Explore Bioactivity Data
Small Molecules and siRNA: Methods to Explore Bioactivity DataSmall Molecules and siRNA: Methods to Explore Bioactivity Data
Small Molecules and siRNA: Methods to Explore Bioactivity Data
 
Cohen
CohenCohen
Cohen
 

Más de baoilleach

Universal Smiles: Finally a canonical SMILES string
Universal Smiles: Finally a canonical SMILES stringUniversal Smiles: Finally a canonical SMILES string
Universal Smiles: Finally a canonical SMILES string
baoilleach
 
What's New and Cooking in Open Babel 2.3.2
What's New and Cooking in Open Babel 2.3.2What's New and Cooking in Open Babel 2.3.2
What's New and Cooking in Open Babel 2.3.2
baoilleach
 
Large-scale computational design and selection of polymers for solar cells
Large-scale computational design and selection of polymers for solar cellsLarge-scale computational design and selection of polymers for solar cells
Large-scale computational design and selection of polymers for solar cells
baoilleach
 
Improving the quality of chemical databases with community-developed tools (a...
Improving the quality of chemical databases with community-developed tools (a...Improving the quality of chemical databases with community-developed tools (a...
Improving the quality of chemical databases with community-developed tools (a...
baoilleach
 
Why multiple scoring functions can improve docking performance - Testing hypo...
Why multiple scoring functions can improve docking performance - Testing hypo...Why multiple scoring functions can improve docking performance - Testing hypo...
Why multiple scoring functions can improve docking performance - Testing hypo...
baoilleach
 
Improving enrichment rates
Improving enrichment ratesImproving enrichment rates
Improving enrichment rates
baoilleach
 

Más de baoilleach (20)

We need to talk about Kekulization, Aromaticity and SMILES
We need to talk about Kekulization, Aromaticity and SMILESWe need to talk about Kekulization, Aromaticity and SMILES
We need to talk about Kekulization, Aromaticity and SMILES
 
Open Babel project overview
Open Babel project overviewOpen Babel project overview
Open Babel project overview
 
So I have an SD File... What do I do next?
So I have an SD File... What do I do next?So I have an SD File... What do I do next?
So I have an SD File... What do I do next?
 
Chemistrify the Web
Chemistrify the WebChemistrify the Web
Chemistrify the Web
 
Universal Smiles: Finally a canonical SMILES string
Universal Smiles: Finally a canonical SMILES stringUniversal Smiles: Finally a canonical SMILES string
Universal Smiles: Finally a canonical SMILES string
 
What's New and Cooking in Open Babel 2.3.2
What's New and Cooking in Open Babel 2.3.2What's New and Cooking in Open Babel 2.3.2
What's New and Cooking in Open Babel 2.3.2
 
Intro to Open Babel
Intro to Open BabelIntro to Open Babel
Intro to Open Babel
 
Protein-ligand docking
Protein-ligand dockingProtein-ligand docking
Protein-ligand docking
 
Cheminformatics
CheminformaticsCheminformatics
Cheminformatics
 
Large-scale computational design and selection of polymers for solar cells
Large-scale computational design and selection of polymers for solar cellsLarge-scale computational design and selection of polymers for solar cells
Large-scale computational design and selection of polymers for solar cells
 
My Open Access papers
My Open Access papersMy Open Access papers
My Open Access papers
 
Improving the quality of chemical databases with community-developed tools (a...
Improving the quality of chemical databases with community-developed tools (a...Improving the quality of chemical databases with community-developed tools (a...
Improving the quality of chemical databases with community-developed tools (a...
 
De novo design of molecular wires with optimal properties for solar energy co...
De novo design of molecular wires with optimal properties for solar energy co...De novo design of molecular wires with optimal properties for solar energy co...
De novo design of molecular wires with optimal properties for solar energy co...
 
Cinfony - Bring cheminformatics toolkits into tune
Cinfony - Bring cheminformatics toolkits into tuneCinfony - Bring cheminformatics toolkits into tune
Cinfony - Bring cheminformatics toolkits into tune
 
Density functional theory calculations on Ruthenium polypyridyl complexes inc...
Density functional theory calculations on Ruthenium polypyridyl complexes inc...Density functional theory calculations on Ruthenium polypyridyl complexes inc...
Density functional theory calculations on Ruthenium polypyridyl complexes inc...
 
Application of Density Functional Theory to Scanning Tunneling Microscopy
Application of Density Functional Theory to Scanning Tunneling MicroscopyApplication of Density Functional Theory to Scanning Tunneling Microscopy
Application of Density Functional Theory to Scanning Tunneling Microscopy
 
Towards Practical Molecular Devices
Towards Practical Molecular DevicesTowards Practical Molecular Devices
Towards Practical Molecular Devices
 
Why multiple scoring functions can improve docking performance - Testing hypo...
Why multiple scoring functions can improve docking performance - Testing hypo...Why multiple scoring functions can improve docking performance - Testing hypo...
Why multiple scoring functions can improve docking performance - Testing hypo...
 
Why multiple scoring functions can improve docking performance - Testing hypo...
Why multiple scoring functions can improve docking performance - Testing hypo...Why multiple scoring functions can improve docking performance - Testing hypo...
Why multiple scoring functions can improve docking performance - Testing hypo...
 
Improving enrichment rates
Improving enrichment ratesImproving enrichment rates
Improving enrichment rates
 

Último

Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
KarakKing
 

Último (20)

How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Google Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptxGoogle Gemini An AI Revolution in Education.pptx
Google Gemini An AI Revolution in Education.pptx
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
Wellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptxWellbeing inclusion and digital dystopias.pptx
Wellbeing inclusion and digital dystopias.pptx
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
NO1 Top Black Magic Specialist In Lahore Black magic In Pakistan Kala Ilam Ex...
 
Interdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptxInterdisciplinary_Insights_Data_Collection_Methods.pptx
Interdisciplinary_Insights_Data_Collection_Methods.pptx
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
80 ĐỀ THI THỬ TUYỂN SINH TIẾNG ANH VÀO 10 SỞ GD – ĐT THÀNH PHỐ HỒ CHÍ MINH NĂ...
 

Data Analysis in QSAR

  • 1. Data analysis in QSAR Noel O’Boyle Dave Palmer, John Mitchell
  • 2. Quantitative Structure-Activity Relationship (QSAR) Also QSPR (Property) Perceive Physical Structure Predict property Propose
  • 3. Quantitative Structure-Activity Relationship (QSAR) Also QSPR (Property) Perceive Physical Structure Predict property Propose Descriptors Molecular weight No.of H-bonding groups Surface area MOE: 180 descriptors
  • 4. Quantitative Structure-Activity Relationship (QSAR) Need to use an estimate of the error Optimise model on training set (2/3) of prediction (CV or bootstrap, but not Test model on test set (1/3) resubstitution) or Build model on training set (2/3 of 2/3) Optimise on test set (1/3 of 2/3) Using the error of prediction Test model on validation set (1/3)
  • 5. Quantitative Structure-Activity Relationship (QSAR) Need to use an estimate of the error Optimise model on training set (2/3) of prediction (CV or bootstrap, but not Test model on test set (1/3) resubstitution) or Build model on training set (2/3 of 2/3) Optimise on test set (1/3 of 2/3) Using the error of prediction Test model on validation set (1/3) Testing must be carried out on a hold-out set from the same distribution as the the set of molecules used for optimisation => don't optimise model on diverse set Assessing Model Fit by Cross-Validation, JCICS, 2003, 43, 579. Beware of q2, J.Mol.Graph.Model., 2002, 20, 269.
  • 6. Feature selection Number of descriptors → Features = descriptors 2n different subsets
  • 7. Parameter optimisation – Models have parameters which should be tuned • Support vector machines – gamma, cost, epsilon
  • 8. QSAR: Feature selection and parameter optimisation • Aim: To find a robust method of simultaneously optimising the parameters and performing feature selection • Prediction of solubility for drug-like molecules – David Palmer – Support vector machines • gamma, cost, epsilon – 127 descriptors • Large search space – stochastic algorithm required
  • 9. Ant Colony Optimisation (ACO) • Insipired by behaviour of ants foraging for food • Ants lay down pheromones, which influence the path taken by other ants. Meanwhile pheromones are evaporating. • Ants’ trails converge to shortest path between nest and food • Ant Colony Optimisation (ACO) – Marco Dorigo, PhD Thesis, 1992. • ACO for feature selection – Shen et al, JCIM, 2005, 45, 1024. • We have extended it to perform simultaneous parameter optimisation
  • 10. Ant Colony Optimisation (ACO) • population of ants (typically 50 to 100) – each ant represents a model – i.e. a subset of descriptors and values for the parameters – each ant has a fitness score, e.g. 10-fold cross validation rmse • it is more likely that an ant will choose a particular descriptor/parameter value in the next iteration if – many ants have chosen it in this iteration (local search), or – many ants have chosen it in their best models to date (global search) • for descriptors/parameter values that are not chosen in the current or best models, the probability that they will be chosen decreases (evaporation)