Comparative Assessment of Machine-Learning Scoring Functions on PDBbind 2013 (Demo)
Dataset · July 2015
Mohamed AbdElAziz Khamis, Egypt-Japan University of Science and Technology
Available at: http://www.researchgate.net/publication/280066187 (retrieved 15 July 2015)
Mohamed A. Khamis, Walid Gomaa, Comparative assessment of machine-learning
scoring functions on PDBbind 2013, Engineering Applications of Artificial Intelligence
(2015), http://dx.doi.org/10.1016/j.engappai.2015.06.021
Objective
• We present a comparative assessment of machine-learning scoring
  functions on PDBbind 2013 in computational docking.
• Computational docking is the process of predicting the best pose
  (orientation + conformation) of a small molecule (drug candidate)
  when bound to a larger target receptor molecule (protein) to form
  a stable complex.
• A scoring function is a mathematical predictive model that produces
  a score representing the binding free energy of a binding pose.
• The result of the docking process is a set of ligands ranked according
  to their predicted binding scores.
Powers of Scoring Functions
• Scoring Power: Produce binding scores for protein-ligand complexes
  that correlate with experimental binding affinities.
• Ranking Power: Rank different ligands bound to the
  same target protein.
• Docking Power: Identify the native binding pose among
  computer-generated decoys.
• Screening Power: Distinguish the true binders from the
  negative binders (random molecules).
Powers of Scoring Functions - Measurements
• Scoring Power: Pearson linear correlation coefficient
  between predicted and experimentally determined binding
  affinities.
• Ranking Power: Ranking success percentages (high-level ranking,
  low-level ranking) and Spearman rank correlation coefficient.
• Docking Power: Root-mean-square deviation (RMSD) value
  between the native binding pose and the best-scored binding pose.
• Screening Power: Total number of true binders among the
  top-ranked 1%, 5%, and 10% of ligands.
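As a rough illustration of the two correlation measures named above, the following sketch computes the Pearson and Spearman coefficients in plain Python. The function names are hypothetical and the affinity values are made up for the example; they are not taken from PDBbind or from the authors' code.

```python
import math

def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Made-up binding affinities (e.g. in pK units) for five complexes.
experimental = [4.2, 6.8, 9.1, 5.5, 7.3]
predicted = [5.0, 6.1, 8.2, 5.9, 6.5]

r_p = pearson(predicted, experimental)   # scoring power measure
r_s = spearman(predicted, experimental)  # rank-based measure
print(f"Pearson = {r_p:.3f}, Spearman = {r_s:.3f}")
```

In this toy data the predicted and experimental values have the same rank order, so the Spearman coefficient is 1 (up to rounding) even though the Pearson coefficient is lower.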
Molecular Features
• For the scoring and ranking powers, the proposed ML scoring functions
  depend on a wide range of features that together characterize the
  protein-ligand complexes.
• These features include geometrical features of RF-Score (Ballester and
  Mitchell, 2010) (36 features), energy terms of the BALL software
  (Hildebrandt et al., 2010) (5 features), energy terms of X-Score
  (Wang et al., 2002) (8 features), and pharmacophore features of the SLIDE
  software (Zavodszky et al., 2002) (59 features).
• We perform dimensionality reduction using principal component
  analysis (PCA).
• For the docking and screening powers, the proposed ML scoring functions
  depend only on the geometrical features of RF-Score (Ballester and
  Mitchell, 2010) (36 features).
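For readers unfamiliar with PCA, the dimensionality reduction step can be written out as below. This is the textbook formulation, not necessarily the exact procedure the authors used; in particular, whether the features are standardized before PCA is not stated on the slides. Here X is the n × d feature matrix with one row per complex, d = 36 + 5 + 8 + 59 = 108 for the scoring and ranking powers, and k is the number of retained components (the comparison slides use k = 17).

```latex
% X \in \mathbb{R}^{n \times d}: one row x_i^\top per protein-ligand complex
\begin{align}
\bar{x} &= \frac{1}{n}\sum_{i=1}^{n} x_i,
  & X_c &= X - \mathbf{1}\,\bar{x}^{\top}
  && \text{(center each feature)} \\
C &= \frac{1}{n-1}\, X_c^{\top} X_c = V \Lambda V^{\top},
  & \lambda_1 &\ge \lambda_2 \ge \dots \ge \lambda_d \ge 0
  && \text{(eigendecomposition of the covariance)} \\
Z &= X_c V_k,
  & V_k &= [\,v_1, \dots, v_k\,] \in \mathbb{R}^{d \times k}
  && \text{(project onto the top } k \text{ components)}
\end{align}
```

The rows of Z are the reduced feature vectors passed to the learning algorithm in place of the original d features.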
Summary of the scoring functions evaluated in CASF-2013
Optimal parameter values of the 12 ML scoring functions
for the scoring, ranking, docking, and screening powers
• Random Forests (RF), Boosted Regression Trees (BRT), K-Nearest Neighbours
  (kNN), Multivariate Adaptive Regression Splines (MARS), Neural Network (NN),
  Partial Least Squares Regression (PLSR), Principal Component Regression
  (PCR), Logistic Regression (LR), Multiple Linear Regression (MLR), Regression
  with Regularization (RR), Support Vector Machines (SVM), Decision Tree (DT).
Performance of the 20 classical scoring functions versus the 12 ML scoring
functions (marked with the @ML suffix) using the 17 most important principal
components in the scoring power test
Protein-family-dependent scoring power of
the top ML scoring functions
Effect of changing the number of principal components
on the scoring power of the top 5 ML scoring functions
Effect of applying the PCA technique on the scoring power
of the top 10 ML scoring functions
Performance of the 20 classical scoring functions versus the 12 ML
scoring functions (marked with the @ML suffix) using the 17 most important
principal components in the ranking power test
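The ranking power percentages compared in this test can be sketched as follows. The sketch assumes the usual CASF-2013 convention, which the slides do not spell out: each target protein has a small set of ligands with known affinities, "high-level" success means the predicted scores reproduce the full experimental order, and "low-level" success means only the best ligand is ranked first. The function name and the data are hypothetical.

```python
def ranking_success(targets):
    """Return (high_level_rate, low_level_rate).

    targets: one list per target protein, each holding
             (experimental_affinity, predicted_score) pairs for its ligands.
    Assumes higher values mean stronger binding for both quantities.
    """
    high = low = 0
    for ligands in targets:
        idx = range(len(ligands))
        true_order = sorted(idx, key=lambda i: ligands[i][0], reverse=True)
        pred_order = sorted(idx, key=lambda i: ligands[i][1], reverse=True)
        if pred_order == true_order:        # whole order reproduced
            high += 1
        if pred_order[0] == true_order[0]:  # best ligand ranked first
            low += 1
    n = len(targets)
    return high / n, low / n

# Made-up data: four targets with three ligands each.
targets = [
    [(9.0, 8.1), (6.5, 6.0), (4.0, 5.2)],  # fully correct
    [(8.2, 7.5), (6.0, 4.9), (3.5, 5.8)],  # best correct, other two swapped
    [(7.7, 5.0), (5.9, 6.3), (2.8, 4.1)],  # best ligand missed
    [(9.5, 7.9), (7.0, 7.1), (5.1, 6.4)],  # fully correct
]
high_rate, low_rate = ranking_success(targets)
print(f"high-level: {high_rate:.0%}, low-level: {low_rate:.0%}")
```

A high-level success always implies a low-level success, so the low-level rate can never be smaller than the high-level rate.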
Protein-family-dependent ranking power of
the top ML scoring functions
Effect of changing the number of principal components on
the high-level ranking power of the top 5 ML scoring functions
Effect of changing the number of principal components on
the low-level ranking power of the top 5 ML scoring functions
Effect of applying the PCA technique on the ranking power
of the top 10 ML scoring functions
Success rates in the docking power test when one or more best-scored ligand binding poses
are considered. The acceptance cutoff is that the RMSD value between a best-scored
binding pose and the true binding pose is lower than 2.0 Å. The scoring functions are
ranked by the success rate obtained when the top three best-scored ligand poses are considered.
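A minimal sketch of how such a success rate could be computed is shown below. It assumes that higher scores mean better poses, that RMSD is taken over matched atoms without superposition or symmetry correction, and that each pose's RMSD to the native pose is already known. The function names and the numbers are hypothetical.

```python
import math

def rmsd(pose_a, pose_b):
    """RMSD between two poses given as equal-length lists of (x, y, z)
    coordinates for the same atoms in the same order (no superposition)."""
    sq = sum((a - b) ** 2
             for atom_a, atom_b in zip(pose_a, pose_b)
             for a, b in zip(atom_a, atom_b))
    return math.sqrt(sq / len(pose_a))

def docking_success_rate(complexes, top_n=1, cutoff=2.0):
    """Fraction of complexes for which at least one of the top_n best-scored
    poses lies within `cutoff` angstroms of the native pose.

    complexes: one list per complex of (score, rmsd_to_native) pairs.
    """
    hits = 0
    for poses in complexes:
        best = sorted(poses, key=lambda p: p[0], reverse=True)[:top_n]
        if any(r < cutoff for _, r in best):
            hits += 1
    return hits / len(complexes)

# Made-up (score, RMSD) pairs for four complexes with three decoy poses each.
complexes = [
    [(7.1, 0.8), (6.5, 3.2), (5.9, 1.5)],  # success with the top pose
    [(6.8, 4.1), (6.2, 1.1), (5.0, 6.0)],  # success from the top two poses
    [(8.0, 5.5), (7.4, 3.9), (7.0, 2.6)],  # no pose under the cutoff
    [(5.5, 2.5), (5.1, 2.2), (4.9, 1.9)],  # success only with the top three
]
for n in (1, 2, 3):
    print(n, docking_success_rate(complexes, top_n=n))
```

Considering more top-scored poses can only raise the success rate, which is why the rates reported for one, two, and three poses are non-decreasing.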
Enrichment factors of all 20 scoring functions versus the 12 ML
scoring functions in the screening power test. The scoring functions are
ranked by their average enrichment factor obtained at the top 1% level.
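The enrichment factor is not defined on the slide. The sketch below assumes the common definition EF_x = (true binders found in the top x fraction) / (total true binders × x), and assumes that higher scores mean stronger predicted binding. The function name and the toy library are hypothetical.

```python
def enrichment_factor(scores, is_binder, fraction):
    """Enrichment factor at the given top fraction (e.g. 0.01 for 1%).

    scores:    predicted score per ligand (higher = better, by assumption)
    is_binder: parallel list of booleans marking the true binders
    """
    n_top = max(1, round(len(scores) * fraction))
    ranked = sorted(zip(scores, is_binder), key=lambda t: t[0], reverse=True)
    found = sum(1 for _, binder in ranked[:n_top] if binder)
    total = sum(1 for binder in is_binder if binder)
    return found / (total * fraction)

# Toy library: 200 ligands, already in descending score order,
# with four true binders at positions 0, 5, 50, and 150.
scores = [200 - i for i in range(200)]
is_binder = [i in {0, 5, 50, 150} for i in range(200)]

ef_1 = enrichment_factor(scores, is_binder, 0.01)   # top 2 ligands
ef_5 = enrichment_factor(scores, is_binder, 0.05)   # top 10 ligands
ef_10 = enrichment_factor(scores, is_binder, 0.10)  # top 20 ligands
print(ef_1, ef_5, ef_10)
```

An enrichment factor of 1 corresponds to random selection; the average over all target proteins gives the per-function value used for ranking.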
Success rates of finding the best ligand molecule of all 20 scoring functions versus the 12 ML
scoring functions in the screening power test. Scoring functions are ranked by their success
rates obtained at the top 1% level. Numbers in brackets are the number of successful cases,
for which the upper limit is 65 (for ML scoring functions the upper limit is 62).
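Counting the successful cases could look like the sketch below, assuming a case is a success when the best known ligand of a target appears among the top x% of the ligands ranked by predicted score (higher is better). The names and the toy data are hypothetical.

```python
def found_in_top(scores, best_ligand, fraction):
    """True if best_ligand is among the top `fraction` of ligands.

    scores: dict mapping ligand id to predicted score (higher = better).
    """
    n_top = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return best_ligand in ranked[:n_top]

def success_rate(targets, fraction):
    """Return (number of successful targets, success rate).

    targets: list of (scores_dict, best_ligand_id) pairs, one per target.
    """
    hits = sum(found_in_top(s, best, fraction) for s, best in targets)
    return hits, hits / len(targets)

# Two toy targets screened against the same 100 ligands.
library = {f"lig{i}": 100 - i for i in range(100)}
targets = [
    (library, "lig0"),  # best ligand ranked first
    (library, "lig3"),  # best ligand ranked fourth
]
print(success_rate(targets, 0.01))
print(success_rate(targets, 0.05))
```

The first element of the returned pair corresponds to the bracketed counts in the caption, whose upper limit is the number of targets evaluated.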
Conclusion
• Machine-learning techniques make it possible to use as many
  relevant molecular features (e.g., geometric features,
  pharmacophore features) as possible.
• In particular, ensemble-based machine-learning approaches
  (e.g., random forests, boosted regression trees) are
  resilient to overfitting.
• For the docking and screening powers, machine-learning
  techniques need to be more target-specific: they should be trained
  on a larger number of known binders for each target protein and use
  an SVM classifier to discriminate actives from decoys instead of
  an SVR regressor.
Acknowledgement
This work is supported:
• Mainly by the Information Technology Industry
  Development Agency (ITIDA) under ITAC Program
  grant number CFP#58
• In part by an E-JUST Research Fellowship
Publications
• Mohamed A. Khamis, Walid Gomaa, Walaa A. Fathy,
  Machine Learning in Computational Docking,
  Artificial Intelligence in Medicine, Elsevier, Volume 63,
  Feb 2015, Pages 135–152.
• Mohamed A. Khamis, Walid Gomaa, Basem Galal,
  Deep Learning Competes Random Forest in
  Computational Docking, Artificial Intelligence in
  Medicine, Elsevier, 2015 (submitted).
Supplemental Material & Questions
• Supplemental Material:
  • Source code of machine learning techniques, feature
    extraction scripts, PDB IDs, molecular features, etc.
  • https://www.researchgate.net/profile/Mohamed_Khamis4
• E-mail:
  • mohamed.khamis@ejust.edu.eg
  • mohamed.abdelaziz.khamis@gmail.com
