SlideShare una empresa de Scribd logo
1 de 22
Christos Kannas
University of Cyprus
Department of Computer
Science
Outline
• Introduction
• Related Work
• ESOL
• RDKit based Implementation
• Results
• Correlation Table & Chart
• Conclusion

3rd October, 2013

2nd RDKit UGM

2
Introduction
• Need to estimate the solubility of
molecules in:
• DMSO (CS(=O)C), and
• Water.

• Predictive Models for DMSO and Water
Solubility.

3rd October, 2013

2nd RDKit UGM

3
3rd October, 2013

2nd RDKit UGM

4
Related Work
• J. S. Delaney, “ESOL: Estimating Aqueous
Solubility Directly from Molecular
Structure,” Journal of Chemical
Information and Modeling, vol. 44, no. 3,
pp. 1000–1005, May 2004.

3rd October, 2013

2nd RDKit UGM

5
Related Work: ESOL
• ESOL – Estimated SOLubility
• Linear Regression Model
• 8 Molecular Properties (Initially)
• Preeminent Method: General Solubility
Equation (GSE), logP and melting point
(Tm)

3rd October, 2013

2nd RDKit UGM

6
ESOL: Molecular Properties
(Initial) 1/3
• clogP – Daylight CLOGP v4.72
• MolWeight
• RotBonds – Rotatable Bonds, Daylight
SMARTS structures define rotatable bonds

3rd October, 2013

2nd RDKit UGM

7
ESOL: Molecular Properties
(Initial) 2/3
• Aromatic Proportion (AromProp) – The
proportion of heavy atoms in the molecule
that are in an aromatic ring. Daylight
SMARTS ([a]) aromatic atoms.

• Non-Carbon Proportion – The proportion
of heavy atoms in a molecule that are not
carbon. Daylight SMARTS ([!#6])

3rd October, 2013

2nd RDKit UGM

8
ESOL: Molecular Properties
(Initial) 3/3
• H-bond Donors
• H-bond Acceptors
• Polar Surface Area – Peter Ertl’s Polar
Surface Area

3rd October, 2013

2nd RDKit UGM

9
ESOL: Methodology
• Multiple Linear Regression
• Significance of each parameter based in
terms of its absolute t-statistic.

3rd October, 2013

2nd RDKit UGM

10
ESOL: Train Dataset
• Training Set: 2874 molecules
• Small – Low MolWeight organic
compounds
• Medium – Pesticide products,
MolWeight 200-300
• Large – Sygenta compounds,
MolWeight 300-400

3rd October, 2013

2nd RDKit UGM

11
ESOL: Results
• 4 parameters with t-statistic > 2
• clogP
• MolWeight
• RotBonds
• AromProp

Log(Sw) = 0.16
- 0.63 x clogP
- 0.0062 x MolWeight
+ 0.066 x RotBonds
- 0.74 x AromProp

3rd October, 2013

J. S. Delaney, “ESOL: Estimating Aqueous Solubility Directly from Molecular
Structure,” Journal of Chemical Information and Modeling, vol. 44, no. 3, pp.
1000–1005, May 2004.
2nd RDKit UGM
12
3rd October, 2013

2nd RDKit UGM

13
RDKit Based Implementation 1/2
• Use Regression Equation:
Log(Sw) = 0.16
- 0.63 x clogP
- 0.0062 x MolWeight
+ 0.066 x RotBonds
- 0.74 x AromProp
• Calculate properties using RDKit.

3rd October, 2013

2nd RDKit UGM

14
RDKit Based Implementation 2/2

3rd October, 2013

2nd RDKit UGM

15
RDKit Based Implementation 2/2

3rd October, 2013

2nd RDKit UGM

16
RDKit Based Implementation 2/2

3rd October, 2013

2nd RDKit UGM

17
3rd October, 2013

2nd RDKit UGM

18
Testing…
• Supplementary Dataset:
• 1143 molecules with:
• Measured Water Solubility (logSw)
• ESOL

• Correlation Charts:
• Measured vs ESOL
• Measured vs RDKit_clogSw
• ESOL vs RDKit_clogSw
• Measured vs ESOL vs RDKit_clogSw
3rd October, 2013

2nd RDKit UGM

19
Correlation Table & Chart
IMPORTED_measured log(solubility:mol/L)
IMPORTED_measured log(solubility:mol/L)

IMPORTED_ESOL predicted
log(solubility:mol/L)

clogSw

1

IMPORTED_ESOL predicted log(solubility:mol/L)

0.90794375
0.864718601

clogSw

Predicted log(solubility:mol/L)

0.964683313

Predicted vs Measured

IMPORTED_ESOL predicted
log(solubility:mol/L)
clogSw

-12

1

4

2

Linear (IMPORTED_ESOL predicted
log(solubility:mol/L))
Linear (clogSw)

0
-10

-8

-6

-4

-2

0

2

-2

-4

-6

-8

Measured log(solubility:mol/L)

3rd October, 2013

2nd RDKit UGM

-10

20

1
Conclusion
• Comparable results.
• Easy, fast and relatively accurate.
• What is importance of adding Hydrogens
prior to Aromatic Proportion calculation?

3rd October, 2013

2nd RDKit UGM

21
3rd October, 2013

2nd RDKit UGM

22

Más contenido relacionado

Similar a Estimate Water Solubility

Pin mặt trời chất màu nhạy quang www.mientayvn.com
Pin mặt trời chất màu nhạy quang www.mientayvn.comPin mặt trời chất màu nhạy quang www.mientayvn.com
Pin mặt trời chất màu nhạy quang www.mientayvn.comwww. mientayvn.com
 
Fast Scanning Chip Calorimetry
Fast Scanning Chip CalorimetryFast Scanning Chip Calorimetry
Fast Scanning Chip CalorimetryInsideScientific
 
Experimental investigation of a double slope solar still with a latent heat
Experimental investigation of a double slope solar still with a latent heatExperimental investigation of a double slope solar still with a latent heat
Experimental investigation of a double slope solar still with a latent heatiaemedu
 
Experimental investigation of a double slope solar still with a latent
Experimental investigation of a double slope solar still with a latentExperimental investigation of a double slope solar still with a latent
Experimental investigation of a double slope solar still with a latentIAEME Publication
 
Multi-Element Determination of Cu, Mn, and Se using Electrothermal Atomic Abs...
Multi-Element Determination of Cu, Mn, and Se using Electrothermal Atomic Abs...Multi-Element Determination of Cu, Mn, and Se using Electrothermal Atomic Abs...
Multi-Element Determination of Cu, Mn, and Se using Electrothermal Atomic Abs...IOSR Journals
 
Making solubility models with reaxy
Making solubility models with reaxyMaking solubility models with reaxy
Making solubility models with reaxyAnn-Marie Roche
 
Making solubility models with reaxy
Making solubility models with reaxyMaking solubility models with reaxy
Making solubility models with reaxyAnn-Marie Roche
 
CS Biliyok ESCAPE22 Presentation
CS Biliyok ESCAPE22 PresentationCS Biliyok ESCAPE22 Presentation
CS Biliyok ESCAPE22 Presentationchetsean7
 
Formulas and Equations
Formulas and EquationsFormulas and Equations
Formulas and EquationsLumen Learning
 
Lesson 1_Interrelated Scientific Principles.pdf
Lesson 1_Interrelated Scientific Principles.pdfLesson 1_Interrelated Scientific Principles.pdf
Lesson 1_Interrelated Scientific Principles.pdfssuser71bc9c
 
Slides for NSBE Oral Presentation.pptx
Slides for NSBE Oral Presentation.pptxSlides for NSBE Oral Presentation.pptx
Slides for NSBE Oral Presentation.pptxOlabanji3
 
To study the behavior of nanofluids in heat transfer applications a review
To study the behavior of nanofluids in heat transfer applications  a reviewTo study the behavior of nanofluids in heat transfer applications  a review
To study the behavior of nanofluids in heat transfer applications a revieweSAT Journals
 
5th International Conference : Garvin Heath
5th International Conference : Garvin Heath5th International Conference : Garvin Heath
5th International Conference : Garvin Heathicarb
 
Investigation on the activating effect of na2 co3 and naoh on slag
Investigation on the activating effect of na2 co3 and naoh on slagInvestigation on the activating effect of na2 co3 and naoh on slag
Investigation on the activating effect of na2 co3 and naoh on slageSAT Publishing House
 
PVT Correlations for Gas Calculations.pptx
PVT Correlations for Gas Calculations.pptxPVT Correlations for Gas Calculations.pptx
PVT Correlations for Gas Calculations.pptxAhmedTalaatEinar
 
Experimental Study on Phase Change Material based Thermal Energy Storage System
Experimental Study on Phase Change Material based Thermal Energy Storage SystemExperimental Study on Phase Change Material based Thermal Energy Storage System
Experimental Study on Phase Change Material based Thermal Energy Storage SystemIRJET Journal
 

Similar a Estimate Water Solubility (20)

Pin mặt trời chất màu nhạy quang www.mientayvn.com
Pin mặt trời chất màu nhạy quang www.mientayvn.comPin mặt trời chất màu nhạy quang www.mientayvn.com
Pin mặt trời chất màu nhạy quang www.mientayvn.com
 
Fast Scanning Chip Calorimetry
Fast Scanning Chip CalorimetryFast Scanning Chip Calorimetry
Fast Scanning Chip Calorimetry
 
Experimental investigation of a double slope solar still with a latent heat
Experimental investigation of a double slope solar still with a latent heatExperimental investigation of a double slope solar still with a latent heat
Experimental investigation of a double slope solar still with a latent heat
 
Experimental investigation of a double slope solar still with a latent
Experimental investigation of a double slope solar still with a latentExperimental investigation of a double slope solar still with a latent
Experimental investigation of a double slope solar still with a latent
 
Multi-Element Determination of Cu, Mn, and Se using Electrothermal Atomic Abs...
Multi-Element Determination of Cu, Mn, and Se using Electrothermal Atomic Abs...Multi-Element Determination of Cu, Mn, and Se using Electrothermal Atomic Abs...
Multi-Element Determination of Cu, Mn, and Se using Electrothermal Atomic Abs...
 
Making solubility models with reaxy
Making solubility models with reaxyMaking solubility models with reaxy
Making solubility models with reaxy
 
Making solubility models with reaxy
Making solubility models with reaxyMaking solubility models with reaxy
Making solubility models with reaxy
 
Meso- and Microporous Carbon Electrode and Its Effect on the Capacitive, Ene...
Meso- and Microporous Carbon Electrode and Its Effect on the  Capacitive, Ene...Meso- and Microporous Carbon Electrode and Its Effect on the  Capacitive, Ene...
Meso- and Microporous Carbon Electrode and Its Effect on the Capacitive, Ene...
 
Sampling Techniques
Sampling TechniquesSampling Techniques
Sampling Techniques
 
CS Biliyok ESCAPE22 Presentation
CS Biliyok ESCAPE22 PresentationCS Biliyok ESCAPE22 Presentation
CS Biliyok ESCAPE22 Presentation
 
Formulas and Equations
Formulas and EquationsFormulas and Equations
Formulas and Equations
 
Kinetics Project
Kinetics ProjectKinetics Project
Kinetics Project
 
Lesson 1_Interrelated Scientific Principles.pdf
Lesson 1_Interrelated Scientific Principles.pdfLesson 1_Interrelated Scientific Principles.pdf
Lesson 1_Interrelated Scientific Principles.pdf
 
Slides for NSBE Oral Presentation.pptx
Slides for NSBE Oral Presentation.pptxSlides for NSBE Oral Presentation.pptx
Slides for NSBE Oral Presentation.pptx
 
To study the behavior of nanofluids in heat transfer applications a review
To study the behavior of nanofluids in heat transfer applications  a reviewTo study the behavior of nanofluids in heat transfer applications  a review
To study the behavior of nanofluids in heat transfer applications a review
 
5th International Conference : Garvin Heath
5th International Conference : Garvin Heath5th International Conference : Garvin Heath
5th International Conference : Garvin Heath
 
Investigation on the activating effect of na2 co3 and naoh on slag
Investigation on the activating effect of na2 co3 and naoh on slagInvestigation on the activating effect of na2 co3 and naoh on slag
Investigation on the activating effect of na2 co3 and naoh on slag
 
PVT Correlations for Gas Calculations.pptx
PVT Correlations for Gas Calculations.pptxPVT Correlations for Gas Calculations.pptx
PVT Correlations for Gas Calculations.pptx
 
Novak2012
Novak2012Novak2012
Novak2012
 
Experimental Study on Phase Change Material based Thermal Energy Storage System
Experimental Study on Phase Change Material based Thermal Energy Storage SystemExperimental Study on Phase Change Material based Thermal Energy Storage System
Experimental Study on Phase Change Material based Thermal Energy Storage System
 

Más de Christos Kannas

CKannas PhD Thesis Slides
CKannas PhD Thesis SlidesCKannas PhD Thesis Slides
CKannas PhD Thesis SlidesChristos Kannas
 
CKannas_ACS_MOST_Transfomation_Based_DnD_20150818
CKannas_ACS_MOST_Transfomation_Based_DnD_20150818CKannas_ACS_MOST_Transfomation_Based_DnD_20150818
CKannas_ACS_MOST_Transfomation_Based_DnD_20150818Christos Kannas
 
LiSIs: a Galaxy based platform for Life Sciences Research
LiSIs: a Galaxy based platform for Life Sciences ResearchLiSIs: a Galaxy based platform for Life Sciences Research
LiSIs: a Galaxy based platform for Life Sciences ResearchChristos Kannas
 
LiSIs Poster Presentation
LiSIs Poster PresentationLiSIs Poster Presentation
LiSIs Poster PresentationChristos Kannas
 
GCC2013 LiSIs Lightning Talk
GCC2013 LiSIs Lightning TalkGCC2013 LiSIs Lightning Talk
GCC2013 LiSIs Lightning TalkChristos Kannas
 
Granatum_LiSIs_BIBE_2012_presentation_v4.0
Granatum_LiSIs_BIBE_2012_presentation_v4.0Granatum_LiSIs_BIBE_2012_presentation_v4.0
Granatum_LiSIs_BIBE_2012_presentation_v4.0Christos Kannas
 
20120615_Granatum_COST_v2
20120615_Granatum_COST_v220120615_Granatum_COST_v2
20120615_Granatum_COST_v2Christos Kannas
 
2009 MSc Presentation for Parallel-MEGA
2009 MSc Presentation for Parallel-MEGA2009 MSc Presentation for Parallel-MEGA
2009 MSc Presentation for Parallel-MEGAChristos Kannas
 
9th ITAB 2009 Parallel-MEGA
9th ITAB 2009 Parallel-MEGA9th ITAB 2009 Parallel-MEGA
9th ITAB 2009 Parallel-MEGAChristos Kannas
 

Más de Christos Kannas (13)

CKannas PhD Thesis Slides
CKannas PhD Thesis SlidesCKannas PhD Thesis Slides
CKannas PhD Thesis Slides
 
CKannas_ACS_MOST_Transfomation_Based_DnD_20150818
CKannas_ACS_MOST_Transfomation_Based_DnD_20150818CKannas_ACS_MOST_Transfomation_Based_DnD_20150818
CKannas_ACS_MOST_Transfomation_Based_DnD_20150818
 
CSC2013_LiSIs_poster
CSC2013_LiSIs_posterCSC2013_LiSIs_poster
CSC2013_LiSIs_poster
 
LiSIs: a Galaxy based platform for Life Sciences Research
LiSIs: a Galaxy based platform for Life Sciences ResearchLiSIs: a Galaxy based platform for Life Sciences Research
LiSIs: a Galaxy based platform for Life Sciences Research
 
Diversity Filtering
Diversity FilteringDiversity Filtering
Diversity Filtering
 
LiSIs platform
LiSIs platformLiSIs platform
LiSIs platform
 
LiSIs Poster Presentation
LiSIs Poster PresentationLiSIs Poster Presentation
LiSIs Poster Presentation
 
GCC2013 LiSIs poster
GCC2013 LiSIs posterGCC2013 LiSIs poster
GCC2013 LiSIs poster
 
GCC2013 LiSIs Lightning Talk
GCC2013 LiSIs Lightning TalkGCC2013 LiSIs Lightning Talk
GCC2013 LiSIs Lightning Talk
 
Granatum_LiSIs_BIBE_2012_presentation_v4.0
Granatum_LiSIs_BIBE_2012_presentation_v4.0Granatum_LiSIs_BIBE_2012_presentation_v4.0
Granatum_LiSIs_BIBE_2012_presentation_v4.0
 
20120615_Granatum_COST_v2
20120615_Granatum_COST_v220120615_Granatum_COST_v2
20120615_Granatum_COST_v2
 
2009 MSc Presentation for Parallel-MEGA
2009 MSc Presentation for Parallel-MEGA2009 MSc Presentation for Parallel-MEGA
2009 MSc Presentation for Parallel-MEGA
 
9th ITAB 2009 Parallel-MEGA
9th ITAB 2009 Parallel-MEGA9th ITAB 2009 Parallel-MEGA
9th ITAB 2009 Parallel-MEGA
 

Último

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 

Último (20)

Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 

Estimate Water Solubility

  • 1. Christos Kannas University of Cyprus Department of Computer Science
  • 2. Outline • Introduction • Related Work • ESOL • RDKit based Implementation • Results • Correlation Table & Chart • Conclusion 3rd October, 2013 2nd RDKit UGM 2
  • 3. Introduction • Need to estimate the solubility of molecules in: • DMSO (CS(=O)C), and • Water. • Predictive Models for DMSO and Water Solubility. 3rd October, 2013 2nd RDKit UGM 3
  • 4. 3rd October, 2013 2nd RDKit UGM 4
  • 5. Related Work • J. S. Delaney, “ESOL: Estimating Aqueous Solubility Directly from Molecular Structure,” Journal of Chemical Information and Modeling, vol. 44, no. 3, pp. 1000–1005, May 2004. 3rd October, 2013 2nd RDKit UGM 5
  • 6. Related Work: ESOL • ESOL – Estimated SOLubility • Linear Regression Model • 8 Molecular Properties (Initially) • Preeminent Method: General Solubility Equation (GSE), logP and melting point (Tm) 3rd October, 2013 2nd RDKit UGM 6
  • 7. ESOL: Molecular Properties (Initial) 1/3 • clogP – Daylight CLOGP v4.72 • MolWeight • RotBonds – Rotatable Bonds, Daylight SMARTS structures define rotatable bonds 3rd October, 2013 2nd RDKit UGM 7
  • 8. ESOL: Molecular Properties (Initial) 2/3 • Aromatic Proportion (AromProp) – The proportion of heavy atoms in the molecule that are in an aromatic ring. Daylight SMARTS ([a]) aromatic atoms. • Non-Carbon Proportion – The proportion of heavy atoms in a molecule that are not carbon. Daylight SMARTS ([!#6]) 3rd October, 2013 2nd RDKit UGM 8
  • 9. ESOL: Molecular Properties (Initial) 3/3 • H-bond Donors • H-bond Acceptors • Polar Surface Area – Peter Ertl’s Polar Surface Area 3rd October, 2013 2nd RDKit UGM 9
  • 10. ESOL: Methodology • Multiple Linear Regression • Significance of each parameter based in terms of its absolute t-statistic. 3rd October, 2013 2nd RDKit UGM 10
  • 11. ESOL: Train Dataset • Training Set: 2874 molecules • Small – Low MolWeight organic compounds • Medium – Pesticide products, MolWeight 200-300 • Large – Sygenta compounds, MolWeight 300-400 3rd October, 2013 2nd RDKit UGM 11
  • 12. ESOL: Results • 4 parameters with t-statistic > 2 • clogP • MolWeight • RotBonds • AromProp Log(Sw) = 0.16 - 0.63 x clogP - 0.0062 x MolWeight + 0.066 x RotBonds - 0.74 x AromProp 3rd October, 2013 J. S. Delaney, “ESOL: Estimating Aqueous Solubility Directly from Molecular Structure,” Journal of Chemical Information and Modeling, vol. 44, no. 3, pp. 1000–1005, May 2004. 2nd RDKit UGM 12
  • 13. 3rd October, 2013 2nd RDKit UGM 13
  • 14. RDKit Based Implementation 1/2 • Use Regression Equation: Log(Sw) = 0.16 - 0.63 x clogP - 0.0062 x MolWeight + 0.066 x RotBonds - 0.74 x AromProp • Calculate properties using RDKit. 3rd October, 2013 2nd RDKit UGM 14
  • 15. RDKit Based Implementation 2/2 3rd October, 2013 2nd RDKit UGM 15
  • 16. RDKit Based Implementation 2/2 3rd October, 2013 2nd RDKit UGM 16
  • 17. RDKit Based Implementation 2/2 3rd October, 2013 2nd RDKit UGM 17
  • 18. 3rd October, 2013 2nd RDKit UGM 18
  • 19. Testing… • Supplementary Dataset: • 1143 molecules with: • Measured Water Solubility (logSw) • ESOL • Correlation Charts: • Measured vs ESOL • Measured vs RDKit_clogSw • ESOL vs RDKit_clogSw • Measured vs ESOL vs RDKit_clogSw 3rd October, 2013 2nd RDKit UGM 19
  • 20. Correlation Table & Chart IMPORTED_measured log(solubility:mol/L) IMPORTED_measured log(solubility:mol/L) IMPORTED_ESOL predicted log(solubility:mol/L) clogSw 1 IMPORTED_ESOL predicted log(solubility:mol/L) 0.90794375 0.864718601 clogSw Predicted log(solubility:mol/L) 0.964683313 Predicted vs Measured IMPORTED_ESOL predicted log(solubility:mol/L) clogSw -12 1 4 2 Linear (IMPORTED_ESOL predicted log(solubility:mol/L)) Linear (clogSw) 0 -10 -8 -6 -4 -2 0 2 -2 -4 -6 -8 Measured log(solubility:mol/L) 3rd October, 2013 2nd RDKit UGM -10 20 1
  • 21. Conclusion • Comparable results. • Easy, fast and relatively accurate. • What is importance of adding Hydrogens prior to Aromatic Proportion calculation? 3rd October, 2013 2nd RDKit UGM 21
  • 22. 3rd October, 2013 2nd RDKit UGM 22

Notas del editor

  1. Preeminent == Best, Leading
  2. Rhombi