SlideShare a Scribd company logo
1 of 33
Download to read offline
Introduction to Chemoinformatics




Dr. Igor V. Tetko
Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH)
Institute of Bioinformatics & Systems Biology (HMGU)


Kyiv, 13 August 2010, V Summer School AACIMP 2010
Helmholtz Zentrum München statistics

•    Part of Helmholtz Network (€3 Milliards, 30000 people)
•    Leading center for Environmental Health in Germany
•    25 institutes (1789 people, 569 scientists & 280 PhD students)*
•    70 contracts with EU
•    Disciplines
       •    Biology 40%
       •    Chemistry/biochemistry 15%
       •    Physics 10%
       •    Medicine 8%


•    Chemoinformatics group, Institute for Bioinformatics & Systems Biology
         9 peoples, strong expertise in in silico data analysis, machine learning methods,
          chemoinfomatics software development, data dissemination

                                         *January 2010
Layout


•  Chemoiformatics??? What does it mean?
•  Role of chemoinformatics
    •  in drug discovery
    •  Chemical Biology, NIH Roadmaps
    •  REACH, chemical safety
•  Definitions of chemoinformatics
•  OCHEM
•  Overview of the demonstration course
Cheminformatics & Bioinformatics keywords in
 SCOPUS
250                                                70000
                                            2000
200                                         2001   60000                        2000
                                            2002                                2001
                                                   50000
                                                                                2002
150                                         2003
                                                   40000                        2003
                                            2004                                2004
100                                         2005   30000                        2005
                                            2006                                2006
                                                   20000
                                                                                2007
 50                                         2007
                                                                                2008
                                                   10000
                                            2008                                2009
  0                                         2009      0
      Chemoinfomatics Cheminformatics                         Bioinfomatics




      There was no dedicated journal,                  about 10x lower if journals
      many authors does not explicitly use world       and references are exclued
      chemoinformatics
Dmitrii Ivanovich Mendeleev,
1834-1907

   Discoverer of the Periodic Table — An Early “Chemoinformatician”
Why Mendeleev?
Faced with a large amount of data, with many gaps, Mendeleev:
 •  Sought patterns where none were obvious,
 •  Made predictions about properties of unknown chemical substances,
    based on observed properties of known substances,
 •  Created a great visualization tool!


 Why he? Why at that time?
 “The Law of transformation of Quantity into Quality”
    by Karl Marx
Quantity aspects of chemoinformatics
Major challenges of Chemoinformatics


  Millions of structures
  Thousand of publications

       Storage, organization and search of information
       QSAR / QSPR studies: prediction of properties and activities


                Chemical biology & drug discovery
                In silico environmental toxicology, REACH




  QSAR/QSPR - Quantitative Structure-Activity (Property) Relationship Studies
Goal of chemistry: Synthesis of Properties




                       The most fundamental and lasting objective
                                    of synthesis is
                         not production of new compounds
                                          but
                              production of properties



                                   George S. Hammond
                                   Norris Award Lecture, 1968
Where Chemoinformatics is required?

Pharma companies and public organisation
 •    data collection and handling
 •    model development QSAR/QSPR, in silico design
 •    In vitro, in vivo data analysis and interpretation


REACH
 •    > 140,000 compounds
 •    Development of in silico methods to predict toxicity of chemical compounds


Chemical industry
 •    Design of new properties; chemical synthesis
 •    Prediction of toxicity of chemicals BEFORE they enter market
Pharma R&D: Cost and Productivity issues
Compound numbers
1.000.000 Cpds          <5000 Cpds   < 500 Cpds      < 5 Cpds
                                                                      Up to 15 Years:
                                                                                         1 drug

                   Lead          Lead Optimisation   Lead Profiling       Clinics       Approval
Target Discovery   Discovery     ADME/T              ADME/T




 In vitro and in vivo property determination:
 Millions of screens for solubility, stability, absorption, metabolism, transport,
 reactive products, drug interactions, etc etc
        Preclinics Costs: > $300m PER COMPOUND to reach approval
Pharma: Declining R&D productivity in the
drug discovery industry




                                    Approved medicine




  See http://www.frost.com/prod/servlet/market-insight-top.pag?docid=128394740
Pharma R&D Cost and productivity:
Reasons for compound failure

                                                 Pharmacokinetics
                                                 Animal toxicity
                                                 Adverse effects
                                                 Lack of efficacy
                                                 Commercial reasons
                                                 Miscellaneous




TOP four reasons are connected to compound
Absorption, Distribution, Metabolism and Excretion, all of which may
contribute to lack of efficacy and Toxicity: ADME/T issues
    Experimental measurements or computational predictions?

                        Chemoinformatics!
Public organizations: NIH Roadmaps




                            Dr. Elias Zerhouni
                            Former NIH director (2002-2008)
                            $29.5 billions in 2008 for
                            27 Institutes

                            Ukraine GDP is ca $127 billions
     Chemical Biology       Belarus GDP is ca $48 billions
Public organizations: NIH Roadmaps


New pathways to discovery                   Molecular Libraries Screening Center Network
 •    understand complex biological         (MLSCN)
      systems                                 •    10 centers were created
 •    understand their connections            •    250 assays to screen >200,000 molecules
 •    build better "toolbox" for
      researchers
                                            Chemoinfomatics!
Research teams of the future
                                              •    PubChem database to handle data
Re-engineering the clinical research team
REACH

Registration, Evaluation, Authorisation and
  Restriction of Chemical substances




European Chemicals Agency (ECHA) in Helsinki
REACH and Environmental Chemoinformatics


> 140,000 chemicals to be registered
It is expensive to measure all of them ($200,000 per compound), a lot of
    animal testing => €24 billions over next 10 years
QSAR models can be used to prioritize compounds
    •  Compound is predicted to be toxic
        •  Biological testing will be done to prove/
           disprove the models
    •  Compound is predicted to be not toxic
        •  tests can be avoided, saving money, animals
        •  but ... only if we are confident in the predictions
"One can not embrace the unembraceable.”

Possible: 1060 - 10100 molecules theoretically exist

Achievable: 1020 - 1024 can be synthesized now
by companies (weight of the Moon is ca 1023 kg)

Available: 2*107 molecules are on the market

Measured: 102 - 105 molecules with ADME/T data
                                                                Kozma Prutkov
Problem: To predict ADME/T properties of just molecules   1080 atoms in the Universe
on the market we must extrapolate data from one to
1,000 - 100,000 molecules!
Applicability domain: Key points


      There are NO universal computational models that work well
       If x is small, on the whole chemical space
       Sin(x) ≈ x
Applicability domain: Key points

      There are NO universal computational models that work well
                    on the whole chemical space




     Performance on the test set        Performance on a dissimilar set
Making predictions with the experimental accuracy:
      case study of 96,000 Pfizer molecules



       Non-reliable Reliable

                                                              60% of molecules
                                                              are predicted
                                                              with average accuracy of
                                                              0.3 log units




                   Tetko et al, Chem. & Biodiver., 2009, 6(11), 197-202.
Definitions of Chemoinformatics

Chemoinformatics is a generic term that encompasses the design, creation, organization,
management, retrieval, analysis, dissemination, visualization, and use of chemical
information. G. Paris, 1988
Chemoinformatics is the mixing of those information resources to transform data in to
information and information in to knowledge for the intended purpose of making better
decisions faster in the area of drug lead identification and optimization. F.K. Brown, 1998


Chemoinformatics is the application of informatics methods to solve chemical problems.
J. Gasteiger, 2004

Chemoinformatics is a field dealing with molecular objects (graphs, vectors) in
multidimensional chemical space. A. Varnek & A. Tropsha, 2007.
Chemoinfomatics: data to knowledge
What is required to create a good
(chemoinformatics) model?

 •  Data is the most essential part!
 •  Appropriate representation of molecules (choice of
    descriptors is crucial)
 •  Good statistical methods
>100,000 lines of code in Java
Database schema
Simplified overview
      Property                                            Condition
                                 Filtering
                             Toxicology, Biology,         Temperature,
       logP = 0.5            Partition coefficient.         pH, species,
  Melting Point = 100
                                                          tissue, method
           °C




           Tags                Data Point                   Introducer
                                                           Bill G., Sergey B.
    Toxicology, Biology,
    Partition coefficient.
                                                       Date of modification
                                                       Informationsystem




      Structure                                             Article
                              Manipulation
    Benzene, Urea, ...              Editing                  Garberg, P
                              Organization            “In vitro models for …”
                                Working sets<
Practical course on using OCHEM and data
modeling on-line

Tomorrow 10:00-12:00 auditorium 235-6


Advice: make yourself familiar with the on-line tutorials available as
   http://ochem/tutorials


If you have data (SMILES/sdf file and molecular activities) – bring it with you
    – we can try to build model.
HMGU Team
Interested in Short-Term stays (3-12 months)?
                    Apply!!!




              http://www.eco-itn.eu

More Related Content

What's hot

Applications Of Bioinformatics In Drug Discovery And Process
Applications Of Bioinformatics In Drug Discovery And ProcessApplications Of Bioinformatics In Drug Discovery And Process
Applications Of Bioinformatics In Drug Discovery And Process
Prof. Dr. Basavaraj Nanjwade
 
Drug Discovery and Development Using AI
Drug Discovery and Development Using AIDrug Discovery and Development Using AI
Drug Discovery and Development Using AI
Databricks
 
Computational Biology Methods for Drug Discovery_Phase 1-5_November 2015
Computational Biology Methods for Drug Discovery_Phase 1-5_November 2015Computational Biology Methods for Drug Discovery_Phase 1-5_November 2015
Computational Biology Methods for Drug Discovery_Phase 1-5_November 2015
Mathew Varghese
 
A seminar on design of ligands for known
A seminar on design of ligands for knownA seminar on design of ligands for known
A seminar on design of ligands for known
shishirkawde
 

What's hot (20)

Bio inspiring computing and its application in cheminformatics
Bio inspiring computing and its application in cheminformaticsBio inspiring computing and its application in cheminformatics
Bio inspiring computing and its application in cheminformatics
 
Cheminformatics in drug design
Cheminformatics in drug designCheminformatics in drug design
Cheminformatics in drug design
 
Pallavi gupta
Pallavi guptaPallavi gupta
Pallavi gupta
 
Cheminformatics: An overview
Cheminformatics: An overviewCheminformatics: An overview
Cheminformatics: An overview
 
Chemoinformatic
Chemoinformatic Chemoinformatic
Chemoinformatic
 
Chemical database preparation ppt
Chemical database preparation pptChemical database preparation ppt
Chemical database preparation ppt
 
Chemo informatics scope and applications
Chemo informatics scope and applicationsChemo informatics scope and applications
Chemo informatics scope and applications
 
Overview of cheminformatics
Overview of cheminformaticsOverview of cheminformatics
Overview of cheminformatics
 
Applications Of Bioinformatics In Drug Discovery And Process
Applications Of Bioinformatics In Drug Discovery And ProcessApplications Of Bioinformatics In Drug Discovery And Process
Applications Of Bioinformatics In Drug Discovery And Process
 
Drug Discovery and Development Using AI
Drug Discovery and Development Using AIDrug Discovery and Development Using AI
Drug Discovery and Development Using AI
 
(2018.9) 分子のグラフ表現と機械学習
(2018.9) 分子のグラフ表現と機械学習(2018.9) 分子のグラフ表現と機械学習
(2018.9) 分子のグラフ表現と機械学習
 
Canonicalized systematic nomenclature in cheminformatics
Canonicalized systematic nomenclature in cheminformaticsCanonicalized systematic nomenclature in cheminformatics
Canonicalized systematic nomenclature in cheminformatics
 
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MINING
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MININGANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MINING
ANALYSIS OF PROTEIN MICROARRAY DATA USING DATA MINING
 
COMPUTER AIDED DRUG DESIGN
COMPUTER AIDED DRUG DESIGNCOMPUTER AIDED DRUG DESIGN
COMPUTER AIDED DRUG DESIGN
 
Computational Biology Methods for Drug Discovery_Phase 1-5_November 2015
Computational Biology Methods for Drug Discovery_Phase 1-5_November 2015Computational Biology Methods for Drug Discovery_Phase 1-5_November 2015
Computational Biology Methods for Drug Discovery_Phase 1-5_November 2015
 
A seminar on design of ligands for known
A seminar on design of ligands for knownA seminar on design of ligands for known
A seminar on design of ligands for known
 
STRUCTURE BASED DRUG DESIGN - MOLECULAR MODELLING AND DRUG DISCOVERY
STRUCTURE BASED DRUG DESIGN - MOLECULAR MODELLING AND DRUG DISCOVERYSTRUCTURE BASED DRUG DESIGN - MOLECULAR MODELLING AND DRUG DISCOVERY
STRUCTURE BASED DRUG DESIGN - MOLECULAR MODELLING AND DRUG DISCOVERY
 
II-SDV 2017: The "International Chemical Ontology Network"
II-SDV 2017: The "International Chemical Ontology Network" II-SDV 2017: The "International Chemical Ontology Network"
II-SDV 2017: The "International Chemical Ontology Network"
 
Research summary
Research summaryResearch summary
Research summary
 
Omics Logic - Bioinformatics 2.0
Omics Logic - Bioinformatics 2.0Omics Logic - Bioinformatics 2.0
Omics Logic - Bioinformatics 2.0
 

Viewers also liked

Σχορτσιανίτης
ΣχορτσιανίτηςΣχορτσιανίτης
Σχορτσιανίτης
haddadhlias
 
Ahmed Mohamed Maher Shafik
Ahmed Mohamed Maher ShafikAhmed Mohamed Maher Shafik
Ahmed Mohamed Maher Shafik
Ahmed Shafik
 

Viewers also liked (20)

What is Gluten? By Kerry Karl
What is Gluten? By Kerry KarlWhat is Gluten? By Kerry Karl
What is Gluten? By Kerry Karl
 
The simple-power-of-the-doodle
The simple-power-of-the-doodleThe simple-power-of-the-doodle
The simple-power-of-the-doodle
 
προσ δημο κω
προσ δημο κωπροσ δημο κω
προσ δημο κω
 
Horario 3ª fase formativa
Horario   3ª fase formativaHorario   3ª fase formativa
Horario 3ª fase formativa
 
Σχορτσιανίτης
ΣχορτσιανίτηςΣχορτσιανίτης
Σχορτσιανίτης
 
Open Id
Open IdOpen Id
Open Id
 
By Michał M.
By Michał M.By Michał M.
By Michał M.
 
Hoja de vida
Hoja de vidaHoja de vida
Hoja de vida
 
Neglected Tendo-Achilles Rupture Repair by Fhl Augmentation Using Bio-Screw a...
Neglected Tendo-Achilles Rupture Repair by Fhl Augmentation Using Bio-Screw a...Neglected Tendo-Achilles Rupture Repair by Fhl Augmentation Using Bio-Screw a...
Neglected Tendo-Achilles Rupture Repair by Fhl Augmentation Using Bio-Screw a...
 
44 joão 21 - pedro vc me ama
44 joão 21 - pedro vc me ama44 joão 21 - pedro vc me ama
44 joão 21 - pedro vc me ama
 
(الهيموفيليا)
(الهيموفيليا)(الهيموفيليا)
(الهيموفيليا)
 
Ahmed Mohamed Maher Shafik
Ahmed Mohamed Maher ShafikAhmed Mohamed Maher Shafik
Ahmed Mohamed Maher Shafik
 
Ate irekiak kartela 2015.16 baja (1) (1)(1)
Ate irekiak kartela 2015.16 baja (1) (1)(1)Ate irekiak kartela 2015.16 baja (1) (1)(1)
Ate irekiak kartela 2015.16 baja (1) (1)(1)
 
Marcos Group Cosma Tour
Marcos Group  Cosma TourMarcos Group  Cosma Tour
Marcos Group Cosma Tour
 
επικουρικος
επικουρικοςεπικουρικος
επικουρικος
 
Storytelling i tecnologia
Storytelling i tecnologiaStorytelling i tecnologia
Storytelling i tecnologia
 
Interpreting CES 2014
Interpreting CES 2014Interpreting CES 2014
Interpreting CES 2014
 
Ey hot topic_robotics
Ey hot topic_roboticsEy hot topic_robotics
Ey hot topic_robotics
 
Sustaining Traction for your Start-Up
Sustaining Traction for your Start-UpSustaining Traction for your Start-Up
Sustaining Traction for your Start-Up
 
Property key terms you may not know
Property key terms you may not knowProperty key terms you may not know
Property key terms you may not know
 

Similar to Introduction to Chemoinformatics

Presentation-French-clusters-Adebiotech-BMsystems-manuel-gea09052016-vonline
Presentation-French-clusters-Adebiotech-BMsystems-manuel-gea09052016-vonlinePresentation-French-clusters-Adebiotech-BMsystems-manuel-gea09052016-vonline
Presentation-French-clusters-Adebiotech-BMsystems-manuel-gea09052016-vonline
Manuel GEA - Bio-Modeling Systems
 
Applying Royal Society of Chemistry cheminformatics skills to support the Pha...
Applying Royal Society of Chemistry cheminformatics skills to support the Pha...Applying Royal Society of Chemistry cheminformatics skills to support the Pha...
Applying Royal Society of Chemistry cheminformatics skills to support the Pha...
US Environmental Protection Agency (EPA), Center for Computational Toxicology and Exposure
 
Medical imaging Seminar Session 1
Medical imaging Seminar Session 1Medical imaging Seminar Session 1
Medical imaging Seminar Session 1
Space IDEAS Hub
 

Similar to Introduction to Chemoinformatics (20)

Assignment 105B.pptx
Assignment 105B.pptxAssignment 105B.pptx
Assignment 105B.pptx
 
Automated Extraction of Reactions from the Patent Literature
Automated Extraction of Reactions from the Patent LiteratureAutomated Extraction of Reactions from the Patent Literature
Automated Extraction of Reactions from the Patent Literature
 
Chromatography: Complete Inorganic Elemental Speciation Analysis Solutions fo...
Chromatography: Complete Inorganic Elemental Speciation Analysis Solutions fo...Chromatography: Complete Inorganic Elemental Speciation Analysis Solutions fo...
Chromatography: Complete Inorganic Elemental Speciation Analysis Solutions fo...
 
INTRODUCTION TO BIOTECHNOLOGY.pptx
INTRODUCTION TO BIOTECHNOLOGY.pptxINTRODUCTION TO BIOTECHNOLOGY.pptx
INTRODUCTION TO BIOTECHNOLOGY.pptx
 
Mercachem - eelco ebbers 22092011
Mercachem - eelco ebbers 22092011Mercachem - eelco ebbers 22092011
Mercachem - eelco ebbers 22092011
 
Crofton Evolution of Toxicology
Crofton Evolution of ToxicologyCrofton Evolution of Toxicology
Crofton Evolution of Toxicology
 
Lab'InSight Toxicological Risk Assessment présentation UNamur NNC 131024
Lab'InSight Toxicological Risk Assessment présentation UNamur NNC  131024Lab'InSight Toxicological Risk Assessment présentation UNamur NNC  131024
Lab'InSight Toxicological Risk Assessment présentation UNamur NNC 131024
 
How to place your research questions or results into the context of the "Lega...
How to place your research questions or results into the context of the "Lega...How to place your research questions or results into the context of the "Lega...
How to place your research questions or results into the context of the "Lega...
 
MODULE 01(1).pdf
MODULE 01(1).pdfMODULE 01(1).pdf
MODULE 01(1).pdf
 
RAPIDARC- NEW ERA IN RADIOTHERAPY
RAPIDARC- NEW ERA IN RADIOTHERAPYRAPIDARC- NEW ERA IN RADIOTHERAPY
RAPIDARC- NEW ERA IN RADIOTHERAPY
 
Progress in Using Big Data in Chemical Toxicity Research at the National Cent...
Progress in Using Big Data in Chemical Toxicity Research at the National Cent...Progress in Using Big Data in Chemical Toxicity Research at the National Cent...
Progress in Using Big Data in Chemical Toxicity Research at the National Cent...
 
Presentation-French-clusters-Adebiotech-BMsystems-manuel-gea09052016-vonline
Presentation-French-clusters-Adebiotech-BMsystems-manuel-gea09052016-vonlinePresentation-French-clusters-Adebiotech-BMsystems-manuel-gea09052016-vonline
Presentation-French-clusters-Adebiotech-BMsystems-manuel-gea09052016-vonline
 
Applying Royal Society of Chemistry cheminformatics skills to support the Pha...
Applying Royal Society of Chemistry cheminformatics skills to support the Pha...Applying Royal Society of Chemistry cheminformatics skills to support the Pha...
Applying Royal Society of Chemistry cheminformatics skills to support the Pha...
 
Applying Royal Society of Chemistry cheminformatics skills to support the Pha...
Applying Royal Society of Chemistry cheminformatics skills to support the Pha...Applying Royal Society of Chemistry cheminformatics skills to support the Pha...
Applying Royal Society of Chemistry cheminformatics skills to support the Pha...
 
New Approach Methods - What is That?
New Approach Methods - What is That?New Approach Methods - What is That?
New Approach Methods - What is That?
 
Toxikon by kevin breesch
Toxikon by kevin breeschToxikon by kevin breesch
Toxikon by kevin breesch
 
PBIO Investor Presentation
PBIO Investor PresentationPBIO Investor Presentation
PBIO Investor Presentation
 
Ερευνητικό Κέντρο Βιοϊατρικών Επιστημών "Αλέξανδρος Φλέμινγκ"
Ερευνητικό Κέντρο Βιοϊατρικών Επιστημών "Αλέξανδρος Φλέμινγκ"Ερευνητικό Κέντρο Βιοϊατρικών Επιστημών "Αλέξανδρος Φλέμινγκ"
Ερευνητικό Κέντρο Βιοϊατρικών Επιστημών "Αλέξανδρος Φλέμινγκ"
 
Medical imaging Seminar Session 1
Medical imaging Seminar Session 1Medical imaging Seminar Session 1
Medical imaging Seminar Session 1
 
Clean Chemicals
Clean Chemicals Clean Chemicals
Clean Chemicals
 

More from SSA KPI

Germany presentation
Germany presentationGermany presentation
Germany presentation
SSA KPI
 
Grand challenges in energy
Grand challenges in energyGrand challenges in energy
Grand challenges in energy
SSA KPI
 
Engineering role in sustainability
Engineering role in sustainabilityEngineering role in sustainability
Engineering role in sustainability
SSA KPI
 
Consensus and interaction on a long term strategy for sustainable development
Consensus and interaction on a long term strategy for sustainable developmentConsensus and interaction on a long term strategy for sustainable development
Consensus and interaction on a long term strategy for sustainable development
SSA KPI
 
Competences in sustainability in engineering education
Competences in sustainability in engineering educationCompetences in sustainability in engineering education
Competences in sustainability in engineering education
SSA KPI
 
Introducatio SD for enginers
Introducatio SD for enginersIntroducatio SD for enginers
Introducatio SD for enginers
SSA KPI
 

More from SSA KPI (20)

Germany presentation
Germany presentationGermany presentation
Germany presentation
 
Grand challenges in energy
Grand challenges in energyGrand challenges in energy
Grand challenges in energy
 
Engineering role in sustainability
Engineering role in sustainabilityEngineering role in sustainability
Engineering role in sustainability
 
Consensus and interaction on a long term strategy for sustainable development
Consensus and interaction on a long term strategy for sustainable developmentConsensus and interaction on a long term strategy for sustainable development
Consensus and interaction on a long term strategy for sustainable development
 
Competences in sustainability in engineering education
Competences in sustainability in engineering educationCompetences in sustainability in engineering education
Competences in sustainability in engineering education
 
Introducatio SD for enginers
Introducatio SD for enginersIntroducatio SD for enginers
Introducatio SD for enginers
 
DAAD-10.11.2011
DAAD-10.11.2011DAAD-10.11.2011
DAAD-10.11.2011
 
Talking with money
Talking with moneyTalking with money
Talking with money
 
'Green' startup investment
'Green' startup investment'Green' startup investment
'Green' startup investment
 
From Huygens odd sympathy to the energy Huygens' extraction from the sea waves
From Huygens odd sympathy to the energy Huygens' extraction from the sea wavesFrom Huygens odd sympathy to the energy Huygens' extraction from the sea waves
From Huygens odd sympathy to the energy Huygens' extraction from the sea waves
 
Dynamics of dice games
Dynamics of dice gamesDynamics of dice games
Dynamics of dice games
 
Energy Security Costs
Energy Security CostsEnergy Security Costs
Energy Security Costs
 
Naturally Occurring Radioactivity (NOR) in natural and anthropic environments
Naturally Occurring Radioactivity (NOR) in natural and anthropic environmentsNaturally Occurring Radioactivity (NOR) in natural and anthropic environments
Naturally Occurring Radioactivity (NOR) in natural and anthropic environments
 
Advanced energy technology for sustainable development. Part 5
Advanced energy technology for sustainable development. Part 5Advanced energy technology for sustainable development. Part 5
Advanced energy technology for sustainable development. Part 5
 
Advanced energy technology for sustainable development. Part 4
Advanced energy technology for sustainable development. Part 4Advanced energy technology for sustainable development. Part 4
Advanced energy technology for sustainable development. Part 4
 
Advanced energy technology for sustainable development. Part 3
Advanced energy technology for sustainable development. Part 3Advanced energy technology for sustainable development. Part 3
Advanced energy technology for sustainable development. Part 3
 
Advanced energy technology for sustainable development. Part 2
Advanced energy technology for sustainable development. Part 2Advanced energy technology for sustainable development. Part 2
Advanced energy technology for sustainable development. Part 2
 
Advanced energy technology for sustainable development. Part 1
Advanced energy technology for sustainable development. Part 1Advanced energy technology for sustainable development. Part 1
Advanced energy technology for sustainable development. Part 1
 
Fluorescent proteins in current biology
Fluorescent proteins in current biologyFluorescent proteins in current biology
Fluorescent proteins in current biology
 
Neurotransmitter systems of the brain and their functions
Neurotransmitter systems of the brain and their functionsNeurotransmitter systems of the brain and their functions
Neurotransmitter systems of the brain and their functions
 

Recently uploaded

The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 

Recently uploaded (20)

Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.This PowerPoint helps students to consider the concept of infinity.
This PowerPoint helps students to consider the concept of infinity.
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
How to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptxHow to setup Pycharm environment for Odoo 17.pptx
How to setup Pycharm environment for Odoo 17.pptx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 

Introduction to Chemoinformatics

  • 1. Introduction to Chemoinformatics Dr. Igor V. Tetko Helmholtz Zentrum München - German Research Center for Environmental Health (GmbH) Institute of Bioinformatics & Systems Biology (HMGU) Kyiv, 13 August 2010, V Summer School AACIMP 2010
  • 2. Helmholtz Zentrum München statistics •  Part of Helmholtz Network (€3 Milliards, 30000 people) •  Leading center for Environmental Health in Germany •  25 institutes (1789 people, 569 scientists & 280 PhD students)* •  70 contracts with EU •  Disciplines •  Biology 40% •  Chemistry/biochemistry 15% •  Physics 10% •  Medicine 8% •  Chemoinformatics group, Institute for Bioinformatics & Systems Biology   9 peoples, strong expertise in in silico data analysis, machine learning methods, chemoinfomatics software development, data dissemination *January 2010
  • 3.
  • 4. Layout •  Chemoiformatics??? What does it mean? •  Role of chemoinformatics •  in drug discovery •  Chemical Biology, NIH Roadmaps •  REACH, chemical safety •  Definitions of chemoinformatics •  OCHEM •  Overview of the demonstration course
  • 5. Cheminformatics & Bioinformatics keywords in SCOPUS 250 70000 2000 200 2001 60000 2000 2002 2001 50000 2002 150 2003 40000 2003 2004 2004 100 2005 30000 2005 2006 2006 20000 2007 50 2007 2008 10000 2008 2009 0 2009 0 Chemoinfomatics Cheminformatics Bioinfomatics There was no dedicated journal, about 10x lower if journals many authors does not explicitly use world and references are exclued chemoinformatics
  • 6. Dmitrii Ivanovich Mendeleev, 1834-1907 Discoverer of the Periodic Table — An Early “Chemoinformatician”
  • 7.
  • 8. Why Mendeleev? Faced with a large amount of data, with many gaps, Mendeleev: •  Sought patterns where none were obvious, •  Made predictions about properties of unknown chemical substances, based on observed properties of known substances, •  Created a great visualization tool! Why he? Why at that time? “The Law of transformation of Quantity into Quality” by Karl Marx
  • 9. Quantity aspects of chemoinformatics
  • 10. Major challenges of Chemoinformatics Millions of structures Thousand of publications Storage, organization and search of information QSAR / QSPR studies: prediction of properties and activities Chemical biology & drug discovery In silico environmental toxicology, REACH QSAR/QSPR - Quantitative Structure-Activity (Property) Relationship Studies
  • 11. Goal of chemistry: Synthesis of Properties The most fundamental and lasting objective of synthesis is not production of new compounds but production of properties George S. Hammond Norris Award Lecture, 1968
  • 12. Where Chemoinformatics is required? Pharma companies and public organisation •  data collection and handling •  model development QSAR/QSPR, in silico design •  In vitro, in vivo data analysis and interpretation REACH •  > 140,000 compounds •  Development of in silico methods to predict toxicity of chemical compounds Chemical industry •  Design of new properties; chemical synthesis •  Prediction of toxicity of chemicals BEFORE they enter market
  • 13. Pharma R&D: Cost and Productivity issues Compound numbers 1.000.000 Cpds <5000 Cpds < 500 Cpds < 5 Cpds Up to 15 Years: 1 drug Lead Lead Optimisation Lead Profiling Clinics Approval Target Discovery Discovery ADME/T ADME/T In vitro and in vivo property determination: Millions of screens for solubility, stability, absorption, metabolism, transport, reactive products, drug interactions, etc etc Preclinics Costs: > $300m PER COMPOUND to reach approval
  • 14. Pharma: Declining R&D productivity in the drug discovery industry Approved medicine See http://www.frost.com/prod/servlet/market-insight-top.pag?docid=128394740
  • 15. Pharma R&D Cost and productivity: Reasons for compound failure Pharmacokinetics Animal toxicity Adverse effects Lack of efficacy Commercial reasons Miscellaneous TOP four reasons are connected to compound Absorption, Distribution, Metabolism and Excretion, all of which may contribute to lack of efficacy and Toxicity: ADME/T issues Experimental measurements or computational predictions? Chemoinformatics!
  • 16. Public organizations: NIH Roadmaps Dr. Elias Zerhouni Former NIH director (2002-2008) $29.5 billions in 2008 for 27 Institutes Ukraine GDP is ca $127 billions Chemical Biology Belarus GDP is ca $48 billions
  • 17. Public organizations: NIH Roadmaps New pathways to discovery Molecular Libraries Screening Center Network •  understand complex biological (MLSCN) systems •  10 centers were created •  understand their connections •  250 assays to screen >200,000 molecules •  build better "toolbox" for researchers Chemoinfomatics! Research teams of the future •  PubChem database to handle data Re-engineering the clinical research team
  • 18. REACH Registration, Evaluation, Authorisation and Restriction of Chemical substances European Chemicals Agency (ECHA) in Helsinki
  • 19. REACH and Environmental Chemoinformatics > 140,000 chemicals to be registered It is expensive to measure all of them ($200,000 per compound), a lot of animal testing => €24 billions over next 10 years QSAR models can be used to prioritize compounds •  Compound is predicted to be toxic •  Biological testing will be done to prove/ disprove the models •  Compound is predicted to be not toxic •  tests can be avoided, saving money, animals •  but ... only if we are confident in the predictions
  • 20. "One can not embrace the unembraceable.” Possible: 1060 - 10100 molecules theoretically exist Achievable: 1020 - 1024 can be synthesized now by companies (weight of the Moon is ca 1023 kg) Available: 2*107 molecules are on the market Measured: 102 - 105 molecules with ADME/T data Kozma Prutkov Problem: To predict ADME/T properties of just molecules 1080 atoms in the Universe on the market we must extrapolate data from one to 1,000 - 100,000 molecules!
  • 21. Applicability domain: Key points There are NO universal computational models that work well If x is small, on the whole chemical space Sin(x) ≈ x
  • 22. Applicability domain: Key points There are NO universal computational models that work well on the whole chemical space Performance on the test set Performance on a dissimilar set
  • 23. Making predictions with the experimental accuracy: case study of 96,000 Pfizer molecules Non-reliable Reliable 60% of molecules are predicted with average accuracy of 0.3 log units Tetko et al, Chem. & Biodiver., 2009, 6(11), 197-202.
  • 24. Definitions of Chemoinformatics Chemoinformatics is a generic term that encompasses the design, creation, organization, management, retrieval, analysis, dissemination, visualization, and use of chemical information. G. Paris, 1988 Chemoinformatics is the mixing of those information resources to transform data in to information and information in to knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and optimization. F.K. Brown, 1998 Chemoinformatics is the application of informatics methods to solve chemical problems. J. Gasteiger, 2004 Chemoinformatics is a field dealing with molecular objects (graphs, vectors) in multidimensional chemical space. A. Varnek & A. Tropsha, 2007.
  • 26. What is required to create a good (chemoinformatics) model? •  Data is the most essential part! •  Appropriate representation of molecules (choice of descriptors is crucial) •  Good statistical methods
  • 27.
  • 28. >100,000 lines of code in Java
  • 29. Database schema Simplified overview Property Condition Filtering Toxicology, Biology, Temperature, logP = 0.5 Partition coefficient. pH, species, Melting Point = 100 tissue, method °C Tags Data Point Introducer Bill G., Sergey B. Toxicology, Biology, Partition coefficient. Date of modification Informationsystem Structure Article Manipulation Benzene, Urea, ... Editing Garberg, P Organization “In vitro models for …” Working sets<
  • 30.
  • 31. Practical course on using OCHEM and data modeling on-line Tomorrow 10:00-12:00 auditorium 235-6 Advice: make yourself familiar with the on-line tutorials available as http://ochem/tutorials If you have data (SMILES/sdf file and molecular activities) – bring it with you – we can try to build model.
  • 33. Interested in Short-Term stays (3-12 months)? Apply!!! http://www.eco-itn.eu