SlideShare una empresa de Scribd logo
1 de 31
Descargar para leer sin conexión
F T ra n sf o                                                                          F T ra n sf o
          PD                   rm                                                                PD                   rm
      Y                                                                                      Y
 Y




                                                                                        Y
                                er




                                                                                                                       er
ABB




                                                                                       ABB
                          y




                                                                                                                 y
                       bu




                                                                                                              bu
                                    2.0




                                                                                                                           2.0
                     to




                                                                                                            to
                  re




                                                                                                         re
                he




                                                                                                       he
           k




                                                                                                  k
          lic




                                                                                                 lic
      C




                                                                                             C
      w                        om                                                            w                        om
  w




                                                                                         w
          w.                                                                                     w.
               A B B Y Y.c                                                                            A B B Y Y.c




                                          Chemical Similarity


                                               Willie Peijnenburg
                                               RIVM – Laboratory for Ecological Risk
                                               Assessment
F T ra n sf o                                                                                                   F T ra n sf o
          PD                   rm                                                                                         PD                   rm
      Y                                                                                                               Y
 Y




                                                                                                                 Y
                                er




                                                                                                                                                er
ABB




                                                                                                                ABB
                          y




                                                                                                                                          y
                       bu




                                                                                                                                       bu
                                    2.0




                                                                                                                                                    2.0
                     to




                                                                                                                                     to
                  re




                                                                                                                                  re
                he




                                                                                                                                he
           k




                                                                                                                           k
          lic




                                                                                                                          lic
      C




                                                                                                                      C
      w                        om                                                                                     w                        om
  w




                                                                                                                  w
          w.                                                                                                              w.
               A B B Y Y.c                                                                                                     A B B Y Y.c




                                          Similarity : philosophers’ view
                                             exploiting the similarity concept is a sign of immature
                                             science (Quine)
                                             “it is ill defined to say “A is similar to B” and it is only
                                             meaningful to say “A is similar to B with respect to C”




                                             A chemical “A” cannot be similar to a chemical “B”
                                                               in absolute terms
                                             but only with respect to some measurable key feature

                                                                                                            2
F T ra n sf o                                                                                                                   F T ra n sf o
          PD                   rm                                                                                                         PD                   rm
      Y                                                                                                                               Y
 Y




                                                                                                                                 Y
                                er




                                                                                                                                                                er
ABB




                                                                                                                                ABB
                          y




                                                                                                                                                          y
                       bu




                                                                                                                                                       bu
                                    2.0




                                                                                                                                                                    2.0
                     to




                                                                                                                                                     to
                  re




                                                                                                                                                  re
                he




                                                                                                                                                he
           k




                                                                                                                                           k
          lic




                                                                                                                                          lic
      C




                                                                                                                                      C
      w                        om                                                                                                     w                        om
  w




                                                                                                                                  w
          w.                                                                                                                              w.
               A B B Y Y.c                                                                                                                     A B B Y Y.c




                                          Similarity : chemists’ view

                                               Intuitively, based on expert judgment
                                              A chemist would describe “similar” compounds in
                                                 terms of “approximately similar backbone and
                                                 almost the same functional groups”.
                                               Chemists have different views on similarity
                                              Experience, context
                                              Lajiness et al. (2004). Assessment of the Consistency of Medicinal Chemists
                                                   in Reviewing Sets of Compounds, J. Med. Chem., 47(20), 4891-4896.




                                                                                                                            3
F T ra n sf o                                                                                            F T ra n sf o
          PD                   rm                                                                                  PD                   rm
      Y                                                                                                        Y
 Y




                                                                                                          Y
                                er




                                                                                                                                         er
ABB




                                                                                                         ABB
                          y




                                                                                                                                   y
                       bu




                                                                                                                                bu
                                    2.0




                                                                                                                                             2.0
                     to




                                                                                                                              to
                  re




                                                                                                                           re
                he




                                                                                                                         he
           k




                                                                                                                    k
          lic




                                                                                                                   lic
      C




                                                                                                               C
      w                        om                                                                              w                        om
  w




                                                                                                           w
          w.                                                                                                       w.
               A B B Y Y.c                                                                                              A B B Y Y.c




                                          Chemical similarity

                                            Computerized similarity assessment needs
                                            unambiguous definitions
                                            Structurally similar molecules have similar biological
                                            activities
                                              The basic tenet of chemical similarity
                                              Long supporting experience
                                              Many exceptions Exceptions are important!
                                            Identification of the most informative representation
                                            of molecular structures Avoiding information loss is
                                            important!
                                            Similarity measures

                                                                                                     4
F T ra n sf o                                                                                    F T ra n sf o
          PD                   rm                                                                          PD                   rm
      Y                                                                                                Y
 Y




                                                                                                  Y
                                er




                                                                                                                                 er
ABB




                                                                                                 ABB
                          y




                                                                                                                           y
                       bu




                                                                                                                        bu
                                    2.0




                                                                                                                                     2.0
                     to




                                                                                                                      to
                  re




                                                                                                                   re
                he




                                                                                                                 he
           k




                                                                                                            k
          lic




                                                                                                           lic
      C




                                                                                                       C
      w                        om                                                                      w                        om
  w




                                                                                                   w
          w.                                                                                               w.
               A B B Y Y.c                                                                                      A B B Y Y.c




                                          Chemical similarity quantified
                                            Numerical representation of chemical structure
                                              Structural similarity
                                              Descriptor –based similarity
                                              3D similarity
                                              Field –based
                                              Spectral
                                              Quantum mechanics
                                              More…
                                            Comparison between numerical representations
                                              Distance-like
                                              Association,
                                              Correlation


                                                                                             5
F T ra n sf o                                                                                                           F T ra n sf o
          PD                   rm                                                                                                 PD                   rm
      Y                                                                                                                       Y
 Y




                                                                                                                         Y
                                er




                                                                                                                                                        er
ABB




                                                                                                                        ABB
                          y




                                                                                                                                                  y
                       bu




                                                                                                                                               bu
                                    2.0




                                                                                                                                                            2.0
                     to




                                                                                                                                             to
                  re




                                                                                                                                          re
                he




                                                                                                                                        he
           k




                                                                                                                                   k
          lic




                                                                                                                                  lic
      C




                                                                                                                              C
      w                        om                                                                                             w                        om
  w




                                                                                                                          w
          w.                                                                                                                      w.
               A B B Y Y.c                                                                                                             A B B Y Y.c




                                          Structural similarity

                                            Substructure searching
                                            Maximum Common Substructure
                                            Fragment approach
                                              Atom, bond or ring counts, degree of connectivity
                                              Atom-centred, bond-centred, ring-centred fragments
                                              Fingerprints, molecular holograms, atom environments
                                            Topological descriptors
                                              Hosoya’ Z, Wiener number, Randic index, indices on distance
                                              matrices of graph (Bonchev & Trinajstic), bonding connectivity
                                              indices (Basak), Balaban J indices, etc.
                                              Initially designed to account for branching, linearity, presence of
                                              cycles and other topological features
                                              Attempts to include 3D information (e.g. distance matrices
                                              instead of adjacency matrices)



                                                                                                                    6
F T ra n sf o                                                                                                                                                       F T ra n sf o
          PD                   rm                                                                                                                                             PD                   rm
      Y                                                                                                                                                                   Y
 Y




                                                                                                                                                                     Y
                                er




                                                                                                                                                                                                    er
ABB




                                                                                                                                                                    ABB
                          y




                                                                                                                                                                                              y
                       bu




                                                                                                                                                                                           bu
                                    2.0




                                                                                                                                                                                                        2.0
                     to




                                                                                                                                                                                         to
                  re




                                                                                                                                                                                      re
                he




                                                                                                                                                                                    he
           k




                                                                                                                                                                               k
          lic




                                                                                                                                                                              lic
      C




                                                                                                                                                                          C
      w                        om
                                                                                                                                  So: A single group makes                w                        om
  w




                                                                                                                                                                      w
          w.                                                                                                                                                                  w.
               A B B Y Y.c                                                                                                                                                         A B B Y Y.c




                                             Structural similarity                                                                   difference …but…

                                                                                                                               Isosteric replacements of
                                                                                          Oral LD50 for male                   groups
                                                                                          rats = 2.5g/kg
                                                                                          Dermal LD50 for male
                                                                                          rats = 3.54g/kg                      •Substituents:
                                          3-(2-chloro-4-(trifluoromethyl)phenoxy-)
                                                      phenyl acetate,                     Not irritating to eyes                   •F, Cl, Br, I, CF3,NO2
                                                    CAS# 50594-77-9
                                                                                          of rabbits                               •Methyl,Ethyl, Isoprpyl,
                                                                                          Slightly irritating to
                                                                                          skin of rabbits                          Cyclopropyl, t-Butyl,-OH,-
                                                                                                                                   SH,-NH2,-OMe,-N(Me)2
                                                                                          Not mutagenic in
                                                                                          Salmonella strains                   •Atoms and groups in rings:
                                                                                          Higher potential                         •-CH=,-N=
                                                                                          binding affinity to
                                                                                          the estrogen                             •-CH2-,-NH-,-O-,-S-
                                                                                          receptor than the
                                                                                          nitrophenyl acetate                  •More …
                                                                                          Higher potential to
                                                                                          cause cancer than
                                          5-(2-chloro-4-(trifluoromethyl)phenoxy)-2-
                                                     nitrophenyl acetate,                 the phenyl acetate                   Depends on the endpoint!
                                                      CAS# 50594-44-0
                                                     Walker . J. (2003) ,QSARs for pollution prevention, Toxicity Screening,
                                                                                                                               (e.g. lipophilicity, receptor
                                                             Risk Assessment and Web Applications, SETAC Press
                                                                                                                               binding)

                                                                                                                                                                7
F T ra n sf o                                                                                                                                                F T ra n sf o
          PD                   rm                                                                                                                                      PD                   rm
      Y                                                                                                                                                            Y
 Y




                                                                                                                                                              Y
                                er




                                                                                                                                                                                             er
ABB




                                                                                                                                                             ABB
                          y




                                                                                                                                                                                       y
                       bu




                                                                                                                                                                                    bu
                                    2.0




                                                                                                                                                                                                 2.0
                     to




                                                                                                                                                                                  to
                  re




                                                                                                                                                                               re
                he




                                                                                                                                                                             he
           k




                                                                                                                                                                        k
          lic




                                                                                                                                                                       lic
      C




                                                                                                                                                                   C
      w                        om                                                                                                                                  w                        om
  w




                                                                                                                                                               w
          w.                                                                                                                                                           w.
               A B B Y Y.c                                                                                                                                                  A B B Y Y.c




                                          Structural similarity

                                            Rosenkranz H.S., Cunningham A.R.
                                            (2001) Chemical Categories for Health
                                            Hazard Identification: A feasibility Study,
                                            Regulatory Toxicology and
                                            Pharmacology 33, 313-318.
                                            Examined the reliability of using
                                            chemical categories to classify HPV
                                            chemicals as toxic or nontoxic
                                            Found: “most often only a proportion
                                            of chemicals in a category were toxic”
                                            Conclusion: "traditional organic
                                            chemical categories do not
                                            encompass groups of chemical that
                                            are predominately either toxic or
                                            nontoxic across a number of
                                            toxicological endpoints or even for
                                            specific toxic activities”
                                                                                            The bold portion of the chemical in the Category column
                                                                                               defined the fragment used to query each data set.
                                                                                             Abbreviations: EyI,eye irritation;LD50, rat LD50; Dev,
                                                                                                       developmental toxicity;CA, rodent
                                                                                           carcinogenesis; Mnt, in vivo induction of micronuclei; Sal,
                                                                                                                   Salmonella
                                                                                          mutagenesis; MLA, mutagenesis in cultured mouse lymphoma
                                                                                                                      cells.
                                                                                                                                                         8
F T ra n sf o                                                                                                             F T ra n sf o
          PD                   rm                                                                                                   PD                   rm
      Y                                                                                                                         Y
 Y




                                                                                                                           Y
                                er




                                                                                                                                                          er
ABB




                                                                                                                          ABB
                          y




                                                                                                                                                    y
                       bu




                                                                                                                                                 bu
                                    2.0




                                                                                                                                                              2.0
                     to




                                                                                                                                               to
                  re




                                                                                                                                            re
                he




                                                                                                                                          he
           k




                                                                                                                                     k
          lic




                                                                                                                                    lic
      C




                                                                                                                                C
      w                        om                                                                                               w                        om
  w




                                                                                                                            w
          w.                                                                                                                        w.
               A B B Y Y.c                                                                                                               A B B Y Y.c




                                          3D Similarity

                                            Distance-based and angle-based descriptors (e.g. inter-atomic distance)
                                            Field similarity (not exhaustive list)
                                               Comparative Molecular Field Analysis (CoMFA), CoMSIA
                                               Electrostatic potential
                                               Shape
                                               Electron density
                                               Test probe
                                               Any grid-based structural property
                                            Molecular multi-pole moments (CoMMA)
                                            Shape descriptors (not exhaustive list)
                                               van der Waals volume and surface (reflect the size of substituents)
                                               Taft steric parameter
                                               STERIMOL
                                               Molecular Shape Analysis
                                               4D QSAR
                                               WHIM descriptors
                                            Receptor binding


                                                                                                                      9
F T ra n sf o                                                                                                                      F T ra n sf o
          PD                   rm                                                                                                            PD                   rm
      Y                                                                                                                                  Y
 Y




                                                                                                                                    Y
                                er




                                                                                                                                                                   er
ABB




                                                                                                                                   ABB
                          y




                                                                                                                                                             y
                       bu




                                                                                                                                                          bu
                                    2.0




                                                                                                                                                                       2.0
                     to




                                                                                                                                                        to
                  re




                                                                                                                                                     re
                he




                                                                                                                                                   he
           k




                                                                                                                                              k
          lic




                                                                                                                                             lic
      C




                                                                                                                                         C
      w                        om                                                                                                        w                        om
  w




                                                                                                                                     w
          w.                                                                                                                                 w.
               A B B Y Y.c                                                                                                                        A B B Y Y.c




                                          Structurally similar compounds can have very
                                          different 3D properties




                                                                        Kubinyi, H., Chemical Similarity and Biological activity



                                                                                                                              10
F T ra n sf o                                                                           F T ra n sf o
          PD                   rm                                                                 PD                   rm
      Y                                                                                       Y
 Y




                                                                                         Y
                                er




                                                                                                                        er
ABB




                                                                                        ABB
                          y




                                                                                                                  y
                       bu




                                                                                                               bu
                                    2.0




                                                                                                                            2.0
                     to




                                                                                                             to
                  re




                                                                                                          re
                he




                                                                                                        he
           k




                                                                                                   k
          lic




                                                                                                  lic
      C




                                                                                              C
      w                        om                                                             w                        om
  w




                                                                                          w
          w.                                                                                      w.
               A B B Y Y.c                                                                             A B B Y Y.c




                                          Physicochemical properties

                                           Molecular weight
                                           Octanol - water partition coefficient
                                           Total energy
                                           Heat of formation
                                           Ionization potential
                                           Molar refractivity
                                           More…


                                                                                   11
F T ra n sf o                                                                                               F T ra n sf o
          PD                   rm                                                                                     PD                   rm
      Y                                                                                                           Y
 Y




                                                                                                             Y
                                er




                                                                                                                                            er
ABB




                                                                                                            ABB
                          y




                                                                                                                                      y
                       bu




                                                                                                                                   bu
                                    2.0




                                                                                                                                                2.0
                     to




                                                                                                                                 to
                  re




                                                                                                                              re
                he




                                                                                                                            he
           k




                                                                                                                       k
          lic




                                                                                                                      lic
      C




                                                                                                                  C
      w                        om                                                                                 w                        om
  w




                                                                                                              w
          w.                                                                                                          w.
               A B B Y Y.c                                                                                                 A B B Y Y.c




                                          Quantum chemistry approaches

                                           The wave function and the density function contain
                                           all the information of a system.
                                             All the information about any molecule could be extracted
                                             from the electron density. Bond creation and bond breaking
                                             in chemical reactions, as well as the shape changes in
                                             conformational processes, are expressed by changes in
                                             the electronic density of molecules. The electronic density
                                             fully determines the nuclear distribution, hence the
                                             electronic density and its changes account for all the
                                             relevant chemical information about the molecule.
                                             In principle, quantum-chemical theory should be able
                                                           quantum-
                                             to provide precise quantitative descriptions of
                                             molecular structures and their chemical properties.




                                                                                                       12
F T ra n sf o                                                                                                  F T ra n sf o
          PD                   rm                                                                                        PD                   rm
      Y                                                                                                              Y
 Y




                                                                                                                Y
                                er




                                                                                                                                               er
ABB




                                                                                                               ABB
                          y




                                                                                                                                         y
                       bu




                                                                                                                                      bu
                                    2.0




                                                                                                                                                   2.0
                     to




                                                                                                                                    to
                  re




                                                                                                                                 re
                he




                                                                                                                               he
           k




                                                                                                                          k
          lic




                                                                                                                         lic
      C




                                                                                                                     C
      w                        om                                                                                    w                        om
  w




                                                                                                                 w
          w.                                                                                                             w.
               A B B Y Y.c                                                                                                    A B B Y Y.c




                                          Quantum chemistry approaches

                                           Quantum chemical descriptors - characterize the
                                           reactivity, shape and binding properties of a
                                           complete molecule or molecular fragments and
                                           substituents:
                                           HOMO and LUMO energies, total energy, number of filled
                                             orbitals, standard deviation of partial atomic charges and
                                             electron densities, dipole moment, partial atomic charges
                                           Approaches from The Theory of Atom in Molecules
                                           – BCP space, TAE/RECON, MEDLA, QShAR
                                           (additive density fragments)
                                           Quantum chemistry calculations depend on several
                                           levels of approximation
                                           Computationally intensive

                                                                                                          13
F T ra n sf o                                                                                F T ra n sf o
          PD                   rm                                                                      PD                   rm
      Y                                                                                            Y
 Y




                                                                                              Y
                                er




                                                                                                                             er
ABB




                                                                                             ABB
                          y




                                                                                                                       y
                       bu




                                                                                                                    bu
                                    2.0




                                                                                                                                 2.0
                     to




                                                                                                                  to
                  re




                                                                                                               re
                he




                                                                                                             he
           k




                                                                                                        k
          lic




                                                                                                       lic
      C




                                                                                                   C
      w                        om                                                                  w                        om
  w




                                                                                               w
          w.                                                                                           w.
               A B B Y Y.c                                                                                  A B B Y Y.c




                                          Reactivity

                                            Similarity between reactions
                                            Similarity of chemical structures assessed by
                                            generalized reaction types and by gross
                                            structural features. Two structures are
                                            considered similar if they can be converted
                                            by reactions belonging to the same
                                            predefined groups (for example oxidation or
                                            substitution reactions).


                                                                                        14
F T ra n sf o                                                                                                     F T ra n sf o
          PD                   rm                                                                                           PD                   rm
      Y                                                                                                                 Y
 Y




                                                                                                                   Y
                                er




                                                                                                                                                  er
ABB




                                                                                                                  ABB
                          y




                                                                                                                                            y
                       bu




                                                                                                                                         bu
                                    2.0




                                                                                                                                                      2.0
                     to




                                                                                                                                       to
                  re




                                                                                                                                    re
                he




                                                                                                                                  he
           k




                                                                                                                             k
          lic




                                                                                                                            lic
      C




                                                                                                                        C
      w                        om                                                                                       w                        om
  w




                                                                                                                    w
          w.                                                                                                                w.
               A B B Y Y.c                                                                                                       A B B Y Y.c




                                          Similarity indices

                                            Association, correlation, distance coefficients
                                            Most popular :
                                              Tanimoto distance (fingerprints)      TAB
                                                                                                  N AB

                                              Euclidean distance (descriptors)               NA   N B N AB

                                              Carbo index (fields)        C AB
                                                                                  Z AB
                                                                                 Z AA Z BB


                                            Essentially a classification problem has to be solved
                                            (decide if a query compound is closer to one or
                                            another set of compounds)
                                              Many methods available (Discriminant Analysis, Neural
                                              networks, SVM, Bayesian classification, etc.)
                                              Statistical assumptions and statistical error is involved


                                                                                                             15
F T ra n sf o                                                                                                                                          F T ra n sf o
          PD                   rm                                                                                                                                PD                   rm
      Y                                                                                                                                                      Y
 Y




                                                                                                                                                        Y
                                er




                                                                                                                                                                                       er
ABB




                                                                                                                                                       ABB
                          y




                                                                                                                                                                                 y
                       bu




                                                                                                                                                                              bu
                                    2.0




                                                                                                                                                                                           2.0
                     to




                                                                                                                                                                            to
                  re




                                                                                                                                                                         re
                he




                                                                                                                                                                       he
           k




                                                                                                                                                                  k
          lic




                                                                                                                                                                 lic
      C




                                                                                                                                                             C
      w                        om                                                                                                                            w                        om
  w




                                                                                                                                                         w
          w.                                                                                                                                                     w.
               A B B Y Y.c                                                                                                                                            A B B Y Y.c




                                          Similarity indices




                                                          Association indices                                         Correlation indices

                                          J. D. Holliday, C-Y. Hu† and P. Willett,(2002) Grouping of Coefficients for the Calculation of Inter-
                                          Molecular Similarity and Dissimilarity using 2D Fragment Bit-Strings, Combinatorial Chemistry & High
                                          Throughput Screening,5, 155-166 155

                                                                                                                                                  16
F T ra n sf o                                                                                                                    F T ra n sf o
          PD                   rm                                                                                                          PD                   rm
      Y                                                                                                                                Y
 Y




                                                                                                                                  Y
                                er




                                                                                                                                                                 er
ABB




                                                                                                                                 ABB
                          y




                                                                                                                                                           y
                       bu




                                                                                                                                                        bu
                                    2.0




                                                                                                                                                                     2.0
                     to




                                                                                                                                                      to
                  re




                                                                                                                                                   re
                he




                                                                                                                                                 he
           k




                                                                                                                                            k
          lic




                                                                                                                                           lic
      C




                                                                                                                                       C
      w                        om                                                                                                      w                        om
  w




                                                                                                                                   w
          w.                                                                                                                               w.
               A B B Y Y.c                                                                                                                      A B B Y Y.c




                                          Fingerprint similarity

                                                                                             Information loss – fragments
                                                                                             presence and absence instead
                                                                                             of counts
                                                                                             Bit string saturation – within a
                                                                                             large database almost all bits
                                                                                             are set
                                                                                             Can give nonintuitive results
                                                                                             The average similarity appears
                                                                                             to increase with the
                                                                                             complexity of the query
                                           The distribution of Tanimoto values               compound
                                            found in database searches with a                Larger queries are more
                                                range of query molecules                     discriminating (flatter curve,
                                                                                             Tanimoto values spread wider)
                                          Flower D., On the Properties of             Bit    Smaller queries have sharp
                                          String-Based Measures of Chemical Similarity, J.   peak, unable to distinguish
                                          Chem. Inf. Comput. Sci., Vol. 38, No. 3, 1998      between molecules

                                                                                                                            17
F T ra n sf o                                                                                                              F T ra n sf o
          PD                   rm                                                                                                    PD                   rm
      Y                                                                                                                          Y
 Y




                                                                                                                            Y
                                er




                                                                                                                                                           er
ABB




                                                                                                                           ABB
                          y




                                                                                                                                                     y
                       bu




                                                                                                                                                  bu
                                    2.0




                                                                                                                                                               2.0
                     to




                                                                                                                                                to
                  re




                                                                                                                                             re
                he




                                                                                                                                           he
           k




                                                                                                                                      k
          lic




                                                                                                                                     lic
      C




                                                                                                                                 C
      w                        om                                                                                                w                        om
  w




                                                                                                                             w
          w.                                                                                                                         w.
               A B B Y Y.c                                                                                                                A B B Y Y.c




                                          Distance indices
                                                                                 Euclidean distance



                                                                                 City-block distance



                                                                                 Mahalanobis distance

                                                                                 Distances obey triangle inequality


                                          Equidistant contours = Points on the
                                          equal distance from the query point


                                                                                                                      18
F T ra n sf o                                                                                                                   F T ra n sf o
          PD                   rm                                                                                                         PD                   rm
      Y                                                                                                                               Y
 Y




                                                                                                                                 Y
                                er




                                                                                                                                                                er
ABB




                                                                                                                                ABB
                          y




                                                                                                                                                          y
                       bu




                                                                                                                                                       bu
                                    2.0




                                                                                                                                                                    2.0
                     to




                                                                                                                                                     to
                  re




                                                                                                                                                  re
                he




                                                                                                                                                he
           k




                                                                                                                                           k
          lic




                                                                                                                                          lic
      C




                                                                                                                                      C
      w                        om                                                                                                     w                        om
  w




                                                                                                                                  w
          w.                                                                                                                              w.
               A B B Y Y.c                                                                                                                     A B B Y Y.c




                                          Similarity in descriptor space




                                           Comparison between a point and groups of points is a classification problem.
                                             Euclidean distance performs very well if groups are separable (left). Other
                                                            classification methods help in other cases.


                                                                                                                           19
F T ra n sf o                                                                    F T ra n sf o
          PD                   rm                                                          PD                   rm
      Y                                                                                Y
 Y




                                                                                  Y
                                er




                                                                                                                 er
ABB




                                                                                 ABB
                          y




                                                                                                           y
                       bu




                                                                                                        bu
                                    2.0




                                                                                                                     2.0
                     to




                                                                                                      to
                  re




                                                                                                   re
                he




                                                                                                 he
           k




                                                                                            k
          lic




                                                                                           lic
      C




                                                                                       C
      w                        om                                                      w                        om
  w




                                                                                   w
          w.                                                                               w.
               A B B Y Y.c                                                                      A B B Y Y.c




                                          What do we measure
                                           We compare numerical
                                           representations of
                                           chemical compounds
                                             The numerical
                                             representation is not unique
                                             The numerical
                                             representation includes only
                                             part of all the information
                                             about the compound
                                             A distance measure reflects
                                             “closeness” only if the data
                                             holds specific assumptions



                                                                            20
F T ra n sf o                                                                                                       F T ra n sf o
          PD                   rm                                                                                             PD                   rm
      Y                                                                                                                   Y
 Y




                                                                                                                     Y
                                er




                                                                                                                                                    er
ABB




                                                                                                                    ABB
                          y




                                                                                                                                              y
                       bu




                                                                                                                                           bu
                                    2.0




                                                                                                                                                        2.0
                     to




                                                                                                                                         to
                  re




                                                                                                                                      re
                he




                                                                                                                                    he
           k




                                                                                                                               k
          lic




                                                                                                                              lic
      C




                                                                                                                          C
      w                        om                                                                                         w                        om
  w




                                                                                                                      w
          w.                                                                                                                  w.
               A B B Y Y.c                                                                                                         A B B Y Y.c




                                          Example: Y. Martin et al ( 2002)
                                          Do structurally similar molecules have similar
                                          biological activity ?
                                               Set of 1645 chemicals with IC50s for monoamine
                                               oxidase inhibition
                                               Daylight fingerprints 1024 bits long ( 0-7 bonds)
                                               When using Tanimoto coefficient with a cut off value
                                               of 0.85 only 30 % of actives were detected

                                          Cutoff values   % of actives detected         % False positives

                                                                                  J. Med. Chem. 2002,45,4350-4358

                                                                                                               21
F T ra n sf o                                                                                                        F T ra n sf o
          PD                   rm                                                                                              PD                   rm
      Y                                                                                                                    Y
 Y




                                                                                                                      Y
                                er




                                                                                                                                                     er
ABB




                                                                                                                     ABB
                          y




                                                                                                                                               y
                       bu




                                                                                                                                            bu
                                    2.0




                                                                                                                                                         2.0
                     to




                                                                                                                                          to
                  re




                                                                                                                                       re
                he




                                                                                                                                     he
           k




                                                                                                                                k
          lic




                                                                                                                               lic
      C




                                                                                                                           C
                                          Chemical similarity caveats
      w                        om                                                                                          w                        om
  w




                                                                                                                       w
          w.                                                                                                                   w.
               A B B Y Y.c                                                                                                          A B B Y Y.c




                                            The similarity computation may not correctly
                                            represent the intuitive similarity between two
                                            chemical structures
                                              The properties of a chemical might not be implicit in its
                                              molecular structure
                                              Molecular structure might not be fully measured and
                                              represented by a set of numbers (information loss)
                                              Comparison by similarity indices may be counterintuitive
                                            Intuitively similar chemical structures may not have
                                            similar biological activity
                                              Bioisosteric compounds
                                              Structurally similar molecules may have different mechanisms of
                                              action

                                                                                                                22
F T ra n sf o                                                                                                   F T ra n sf o
          PD                   rm                                                                                         PD                   rm
      Y                                                                                                               Y
 Y




                                                                                                                 Y
                                er




                                                                                                                                                er
ABB




                                                                                                                ABB
                          y




                                                                                                                                          y
                       bu




                                                                                                                                       bu
                                    2.0




                                                                                                                                                    2.0
                     to




                                                                                                                                     to
                  re




                                                                                                                                  re
                he




                                                                                                                                he
           k




                                                                                                                           k
          lic




                                                                                                                          lic
      C




                                                                                                                      C
      w                        om                                                                                     w                        om
  w




                                                                                                                  w
          w.                                                                                                              w.
               A B B Y Y.c                                                                                                     A B B Y Y.c




                                          Similarity and Activity “Neighbourhood principle”

                                            Proximity with respect to       Similar activity
                                            descriptors does not necessary        values
                                            mean proximity with respect to
                                            the activity
                                            Depends on the relationship
                                            between descriptor and activity
                                               True if a continuous &
                                               monotonous (e.g. linear,…)
                                               relationship holds between
                                               descriptors and activity
                                               The linear relationship is only a
                                               special case, given the
                                               complexity of biochemical
                                               interactions. Its use should be
                                               justified in every specific case     Neighbourhood in the
                                               and/or used only locally                 descriptor space


                                                                                                           23
F T ra n sf o                                                                                                                                                      F T ra n sf o
          PD                   rm                                                                                                                                            PD                   rm
      Y                                                                                                                                                                  Y
 Y




                                                                                                                                                                    Y
                                er




                                                                                                                                                                                                   er
ABB




                                                                                                                                                                   ABB
                          y




                                                                                                                                                                                             y
                       bu




                                                                                                                                                                                          bu
                                    2.0




                                                                                                                                                                                                       2.0
                     to




                                                                                                                                                                                        to
                  re




                                                                                                                                                                                     re
                he




                                                                                                                                                                                   he
           k




                                                                                                                                                                              k
          lic




                                                                                                                                                                             lic
      C




                                                                                                                                                                         C
      w                        om                                                                                                                                        w                        om
  w




                                                                                                                                                                     w
          w.                                                                                                                                                                 w.
               A B B Y Y.c                                                                                                                                                        A B B Y Y.c




                                          Similarity vs. Activity
                                            Black square: Salmonella mutagenicity of aromatic amines [Debnath et al. 1992] (log TA98)
                                            Red circle: Glende et al. 2001 set: alkyl-substituted (ortho to the amino function) derivatives not included in
                                            original Debnath data set




                                          logP, Ehomo, Elumo




                                             logP, Ehomo, Elumo               Similar compounds, Relatively small data set


                                                                                                                                                              24
Chemical Similarity
Chemical Similarity
Chemical Similarity
Chemical Similarity
Chemical Similarity
Chemical Similarity
Chemical Similarity

Más contenido relacionado

La actualidad más candente

Eye's Delight (Qurratul-^Ayn)
Eye's Delight (Qurratul-^Ayn)Eye's Delight (Qurratul-^Ayn)
Eye's Delight (Qurratul-^Ayn)Da3wah
 
Hands Tool
Hands Tool Hands Tool
Hands Tool mokhtar
 
Search and Structure Design of Physiologically Active Compounds
Search and Structure Design of Physiologically Active CompoundsSearch and Structure Design of Physiologically Active Compounds
Search and Structure Design of Physiologically Active CompoundsSSA KPI
 
Unit0 J3103
Unit0 J3103Unit0 J3103
Unit0 J3103mokhtar
 
Practicing English
Practicing EnglishPracticing English
Practicing Englishguesteec4f8b
 
Unit7 Shielded Gas Arc Welding
Unit7 Shielded Gas Arc WeldingUnit7 Shielded Gas Arc Welding
Unit7 Shielded Gas Arc Weldingmokhtar
 
Unit6 Computer Numerical Control
Unit6 Computer Numerical ControlUnit6 Computer Numerical Control
Unit6 Computer Numerical Controlmokhtar
 
Unit1 Screw Thread
Unit1 Screw ThreadUnit1 Screw Thread
Unit1 Screw Threadmokhtar
 
Unit2 Gear
Unit2 GearUnit2 Gear
Unit2 Gearmokhtar
 
IBM Lotus Notes&Domino today
IBM Lotus Notes&Domino todayIBM Lotus Notes&Domino today
IBM Lotus Notes&Domino todayDiana Emely
 
Unit4 Surface Texture
Unit4 Surface TextureUnit4 Surface Texture
Unit4 Surface Texturemokhtar
 
Unit3 Gear
Unit3 GearUnit3 Gear
Unit3 Gearmokhtar
 
Unit5 Power Press Machine
Unit5 Power Press MachineUnit5 Power Press Machine
Unit5 Power Press Machinemokhtar
 

La actualidad más candente (14)

Eye's Delight (Qurratul-^Ayn)
Eye's Delight (Qurratul-^Ayn)Eye's Delight (Qurratul-^Ayn)
Eye's Delight (Qurratul-^Ayn)
 
Hands Tool
Hands Tool Hands Tool
Hands Tool
 
Search and Structure Design of Physiologically Active Compounds
Search and Structure Design of Physiologically Active CompoundsSearch and Structure Design of Physiologically Active Compounds
Search and Structure Design of Physiologically Active Compounds
 
Unit0 J3103
Unit0 J3103Unit0 J3103
Unit0 J3103
 
Practicing English
Practicing EnglishPracticing English
Practicing English
 
Budget2009
Budget2009Budget2009
Budget2009
 
Unit7 Shielded Gas Arc Welding
Unit7 Shielded Gas Arc WeldingUnit7 Shielded Gas Arc Welding
Unit7 Shielded Gas Arc Welding
 
Unit6 Computer Numerical Control
Unit6 Computer Numerical ControlUnit6 Computer Numerical Control
Unit6 Computer Numerical Control
 
Unit1 Screw Thread
Unit1 Screw ThreadUnit1 Screw Thread
Unit1 Screw Thread
 
Unit2 Gear
Unit2 GearUnit2 Gear
Unit2 Gear
 
IBM Lotus Notes&Domino today
IBM Lotus Notes&Domino todayIBM Lotus Notes&Domino today
IBM Lotus Notes&Domino today
 
Unit4 Surface Texture
Unit4 Surface TextureUnit4 Surface Texture
Unit4 Surface Texture
 
Unit3 Gear
Unit3 GearUnit3 Gear
Unit3 Gear
 
Unit5 Power Press Machine
Unit5 Power Press MachineUnit5 Power Press Machine
Unit5 Power Press Machine
 

Similar a Chemical Similarity

Unit1 Screw Thread
Unit1 Screw ThreadUnit1 Screw Thread
Unit1 Screw Threadguestb9b7f4
 
Unit7 Shielded Gas Arc Welding
Unit7 Shielded Gas Arc WeldingUnit7 Shielded Gas Arc Welding
Unit7 Shielded Gas Arc Weldingguestb9b7f4
 
Unit5 Power Press Machine
Unit5 Power Press MachineUnit5 Power Press Machine
Unit5 Power Press Machineguestb9b7f4
 
Unit4 Surface Texture
Unit4 Surface TextureUnit4 Surface Texture
Unit4 Surface Textureguestb9b7f4
 
Unit6 Computer Numerical Control
Unit6 Computer Numerical ControlUnit6 Computer Numerical Control
Unit6 Computer Numerical Controlguestb9b7f4
 
A Testers Role On Agile Projects - Janet Gregory
A Testers Role On Agile Projects - Janet GregoryA Testers Role On Agile Projects - Janet Gregory
A Testers Role On Agile Projects - Janet GregoryAGILEMinds
 
Egkekrimena sxedia comenius 2012
Egkekrimena sxedia comenius 2012Egkekrimena sxedia comenius 2012
Egkekrimena sxedia comenius 2012sfikasp
 
55种用google找乐子的方法
55种用google找乐子的方法55种用google找乐子的方法
55种用google找乐子的方法macyang1216
 
Testing Techniques For Agile Testers - Janet Gregory
Testing Techniques For Agile Testers - Janet GregoryTesting Techniques For Agile Testers - Janet Gregory
Testing Techniques For Agile Testers - Janet GregoryAGILEMinds
 
Islamic archi.ppt [compatibility mode]
Islamic archi.ppt [compatibility mode]Islamic archi.ppt [compatibility mode]
Islamic archi.ppt [compatibility mode]Karan Saharan
 
Domagoj Margetic
Domagoj MargeticDomagoj Margetic
Domagoj MargeticEmil Čić
 
Social Media for Social Change | SVP Tokyo 2/21/10
Social Media for Social Change | SVP Tokyo 2/21/10Social Media for Social Change | SVP Tokyo 2/21/10
Social Media for Social Change | SVP Tokyo 2/21/10SocialCompany, Inc.
 

Similar a Chemical Similarity (17)

Unit0 J3103
Unit0 J3103Unit0 J3103
Unit0 J3103
 
Unit2 Gear
Unit2 GearUnit2 Gear
Unit2 Gear
 
Unit1 Screw Thread
Unit1 Screw ThreadUnit1 Screw Thread
Unit1 Screw Thread
 
Unit7 Shielded Gas Arc Welding
Unit7 Shielded Gas Arc WeldingUnit7 Shielded Gas Arc Welding
Unit7 Shielded Gas Arc Welding
 
Unit3 Gear
Unit3 GearUnit3 Gear
Unit3 Gear
 
Unit5 Power Press Machine
Unit5 Power Press MachineUnit5 Power Press Machine
Unit5 Power Press Machine
 
Unit4 Surface Texture
Unit4 Surface TextureUnit4 Surface Texture
Unit4 Surface Texture
 
Unit6 Computer Numerical Control
Unit6 Computer Numerical ControlUnit6 Computer Numerical Control
Unit6 Computer Numerical Control
 
13
1313
13
 
Pri of phs 9th
Pri of phs 9thPri of phs 9th
Pri of phs 9th
 
A Testers Role On Agile Projects - Janet Gregory
A Testers Role On Agile Projects - Janet GregoryA Testers Role On Agile Projects - Janet Gregory
A Testers Role On Agile Projects - Janet Gregory
 
Egkekrimena sxedia comenius 2012
Egkekrimena sxedia comenius 2012Egkekrimena sxedia comenius 2012
Egkekrimena sxedia comenius 2012
 
55种用google找乐子的方法
55种用google找乐子的方法55种用google找乐子的方法
55种用google找乐子的方法
 
Testing Techniques For Agile Testers - Janet Gregory
Testing Techniques For Agile Testers - Janet GregoryTesting Techniques For Agile Testers - Janet Gregory
Testing Techniques For Agile Testers - Janet Gregory
 
Islamic archi.ppt [compatibility mode]
Islamic archi.ppt [compatibility mode]Islamic archi.ppt [compatibility mode]
Islamic archi.ppt [compatibility mode]
 
Domagoj Margetic
Domagoj MargeticDomagoj Margetic
Domagoj Margetic
 
Social Media for Social Change | SVP Tokyo 2/21/10
Social Media for Social Change | SVP Tokyo 2/21/10Social Media for Social Change | SVP Tokyo 2/21/10
Social Media for Social Change | SVP Tokyo 2/21/10
 

Más de SSA KPI

Germany presentation
Germany presentationGermany presentation
Germany presentationSSA KPI
 
Grand challenges in energy
Grand challenges in energyGrand challenges in energy
Grand challenges in energySSA KPI
 
Engineering role in sustainability
Engineering role in sustainabilityEngineering role in sustainability
Engineering role in sustainabilitySSA KPI
 
Consensus and interaction on a long term strategy for sustainable development
Consensus and interaction on a long term strategy for sustainable developmentConsensus and interaction on a long term strategy for sustainable development
Consensus and interaction on a long term strategy for sustainable developmentSSA KPI
 
Competences in sustainability in engineering education
Competences in sustainability in engineering educationCompetences in sustainability in engineering education
Competences in sustainability in engineering educationSSA KPI
 
Introducatio SD for enginers
Introducatio SD for enginersIntroducatio SD for enginers
Introducatio SD for enginersSSA KPI
 
DAAD-10.11.2011
DAAD-10.11.2011DAAD-10.11.2011
DAAD-10.11.2011SSA KPI
 
Talking with money
Talking with moneyTalking with money
Talking with moneySSA KPI
 
'Green' startup investment
'Green' startup investment'Green' startup investment
'Green' startup investmentSSA KPI
 
From Huygens odd sympathy to the energy Huygens' extraction from the sea waves
From Huygens odd sympathy to the energy Huygens' extraction from the sea wavesFrom Huygens odd sympathy to the energy Huygens' extraction from the sea waves
From Huygens odd sympathy to the energy Huygens' extraction from the sea wavesSSA KPI
 
Dynamics of dice games
Dynamics of dice gamesDynamics of dice games
Dynamics of dice gamesSSA KPI
 
Energy Security Costs
Energy Security CostsEnergy Security Costs
Energy Security CostsSSA KPI
 
Naturally Occurring Radioactivity (NOR) in natural and anthropic environments
Naturally Occurring Radioactivity (NOR) in natural and anthropic environmentsNaturally Occurring Radioactivity (NOR) in natural and anthropic environments
Naturally Occurring Radioactivity (NOR) in natural and anthropic environmentsSSA KPI
 
Advanced energy technology for sustainable development. Part 5
Advanced energy technology for sustainable development. Part 5Advanced energy technology for sustainable development. Part 5
Advanced energy technology for sustainable development. Part 5SSA KPI
 
Advanced energy technology for sustainable development. Part 4
Advanced energy technology for sustainable development. Part 4Advanced energy technology for sustainable development. Part 4
Advanced energy technology for sustainable development. Part 4SSA KPI
 
Advanced energy technology for sustainable development. Part 3
Advanced energy technology for sustainable development. Part 3Advanced energy technology for sustainable development. Part 3
Advanced energy technology for sustainable development. Part 3SSA KPI
 
Advanced energy technology for sustainable development. Part 2
Advanced energy technology for sustainable development. Part 2Advanced energy technology for sustainable development. Part 2
Advanced energy technology for sustainable development. Part 2SSA KPI
 
Advanced energy technology for sustainable development. Part 1
Advanced energy technology for sustainable development. Part 1Advanced energy technology for sustainable development. Part 1
Advanced energy technology for sustainable development. Part 1SSA KPI
 
Fluorescent proteins in current biology
Fluorescent proteins in current biologyFluorescent proteins in current biology
Fluorescent proteins in current biologySSA KPI
 
Neurotransmitter systems of the brain and their functions
Neurotransmitter systems of the brain and their functionsNeurotransmitter systems of the brain and their functions
Neurotransmitter systems of the brain and their functionsSSA KPI
 

Más de SSA KPI (20)

Germany presentation
Germany presentationGermany presentation
Germany presentation
 
Grand challenges in energy
Grand challenges in energyGrand challenges in energy
Grand challenges in energy
 
Engineering role in sustainability
Engineering role in sustainabilityEngineering role in sustainability
Engineering role in sustainability
 
Consensus and interaction on a long term strategy for sustainable development
Consensus and interaction on a long term strategy for sustainable developmentConsensus and interaction on a long term strategy for sustainable development
Consensus and interaction on a long term strategy for sustainable development
 
Competences in sustainability in engineering education
Competences in sustainability in engineering educationCompetences in sustainability in engineering education
Competences in sustainability in engineering education
 
Introducatio SD for enginers
Introducatio SD for enginersIntroducatio SD for enginers
Introducatio SD for enginers
 
DAAD-10.11.2011
DAAD-10.11.2011DAAD-10.11.2011
DAAD-10.11.2011
 
Talking with money
Talking with moneyTalking with money
Talking with money
 
'Green' startup investment
'Green' startup investment'Green' startup investment
'Green' startup investment
 
From Huygens odd sympathy to the energy Huygens' extraction from the sea waves
From Huygens odd sympathy to the energy Huygens' extraction from the sea wavesFrom Huygens odd sympathy to the energy Huygens' extraction from the sea waves
From Huygens odd sympathy to the energy Huygens' extraction from the sea waves
 
Dynamics of dice games
Dynamics of dice gamesDynamics of dice games
Dynamics of dice games
 
Energy Security Costs
Energy Security CostsEnergy Security Costs
Energy Security Costs
 
Naturally Occurring Radioactivity (NOR) in natural and anthropic environments
Naturally Occurring Radioactivity (NOR) in natural and anthropic environmentsNaturally Occurring Radioactivity (NOR) in natural and anthropic environments
Naturally Occurring Radioactivity (NOR) in natural and anthropic environments
 
Advanced energy technology for sustainable development. Part 5
Advanced energy technology for sustainable development. Part 5Advanced energy technology for sustainable development. Part 5
Advanced energy technology for sustainable development. Part 5
 
Advanced energy technology for sustainable development. Part 4
Advanced energy technology for sustainable development. Part 4Advanced energy technology for sustainable development. Part 4
Advanced energy technology for sustainable development. Part 4
 
Advanced energy technology for sustainable development. Part 3
Advanced energy technology for sustainable development. Part 3Advanced energy technology for sustainable development. Part 3
Advanced energy technology for sustainable development. Part 3
 
Advanced energy technology for sustainable development. Part 2
Advanced energy technology for sustainable development. Part 2Advanced energy technology for sustainable development. Part 2
Advanced energy technology for sustainable development. Part 2
 
Advanced energy technology for sustainable development. Part 1
Advanced energy technology for sustainable development. Part 1Advanced energy technology for sustainable development. Part 1
Advanced energy technology for sustainable development. Part 1
 
Fluorescent proteins in current biology
Fluorescent proteins in current biologyFluorescent proteins in current biology
Fluorescent proteins in current biology
 
Neurotransmitter systems of the brain and their functions
Neurotransmitter systems of the brain and their functionsNeurotransmitter systems of the brain and their functions
Neurotransmitter systems of the brain and their functions
 

Último

Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
week 1 cookery 8 fourth - quarter .pptx
week 1 cookery 8  fourth  -  quarter .pptxweek 1 cookery 8  fourth  -  quarter .pptx
week 1 cookery 8 fourth - quarter .pptxJonalynLegaspi2
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxMichelleTuguinay1
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseCeline George
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17Celine George
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1GloryAnnCastre1
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWQuiz Club NITW
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataBabyAnnMotar
 
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxMan or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxDhatriParmar
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdfMr Bounab Samir
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationdeepaannamalai16
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptxmary850239
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSMae Pangan
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsPooky Knightsmith
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxkarenfajardo43
 

Último (20)

Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
week 1 cookery 8 fourth - quarter .pptx
week 1 cookery 8  fourth  -  quarter .pptxweek 1 cookery 8  fourth  -  quarter .pptx
week 1 cookery 8 fourth - quarter .pptx
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 Database
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1
 
Mythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITWMythology Quiz-4th April 2024, Quiz Club NITW
Mythology Quiz-4th April 2024, Quiz Club NITW
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptxMan or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
Man or Manufactured_ Redefining Humanity Through Biopunk Narratives.pptx
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdf
 
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of EngineeringFaculty Profile prashantha K EEE dept Sri Sairam college of Engineering
Faculty Profile prashantha K EEE dept Sri Sairam college of Engineering
 
Congestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentationCongestive Cardiac Failure..presentation
Congestive Cardiac Failure..presentation
 
4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx
 
Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"Mattingly "AI & Prompt Design: Large Language Models"
Mattingly "AI & Prompt Design: Large Language Models"
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHS
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young minds
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 

Chemical Similarity

  • 1. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C w om w om w w w. w. A B B Y Y.c A B B Y Y.c Chemical Similarity Willie Peijnenburg RIVM – Laboratory for Ecological Risk Assessment
  • 2. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C w om w om w w w. w. A B B Y Y.c A B B Y Y.c Similarity : philosophers’ view exploiting the similarity concept is a sign of immature science (Quine) “it is ill defined to say “A is similar to B” and it is only meaningful to say “A is similar to B with respect to C” A chemical “A” cannot be similar to a chemical “B” in absolute terms but only with respect to some measurable key feature 2
  • 3. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C w om w om w w w. w. A B B Y Y.c A B B Y Y.c Similarity : chemists’ view Intuitively, based on expert judgment A chemist would describe “similar” compounds in terms of “approximately similar backbone and almost the same functional groups”. Chemists have different views on similarity Experience, context Lajiness et al. (2004). Assessment of the Consistency of Medicinal Chemists in Reviewing Sets of Compounds, J. Med. Chem., 47(20), 4891-4896. 3
  • 4. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C w om w om w w w. w. A B B Y Y.c A B B Y Y.c Chemical similarity Computerized similarity assessment needs unambiguous definitions Structurally similar molecules have similar biological activities The basic tenet of chemical similarity Long supporting experience Many exceptions Exceptions are important! Identification of the most informative representation of molecular structures Avoiding information loss is important! Similarity measures 4
  • 5. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C w om w om w w w. w. A B B Y Y.c A B B Y Y.c Chemical similarity quantified Numerical representation of chemical structure Structural similarity Descriptor –based similarity 3D similarity Field –based Spectral Quantum mechanics More… Comparison between numerical representations Distance-like Association, Correlation 5
  • 6. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C w om w om w w w. w. A B B Y Y.c A B B Y Y.c Structural similarity Substructure searching Maximum Common Substructure Fragment approach Atom, bond or ring counts, degree of connectivity Atom-centred, bond-centred, ring-centred fragments Fingerprints, molecular holograms, atom environments Topological descriptors Hosoya’ Z, Wiener number, Randic index, indices on distance matrices of graph (Bonchev & Trinajstic), bonding connectivity indices (Basak), Balaban J indices, etc. Initially designed to account for branching, linearity, presence of cycles and other topological features Attempts to include 3D information (e.g. distance matrices instead of adjacency matrices) 6
  • 7. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C w om So: A single group makes w om w w w. w. A B B Y Y.c A B B Y Y.c Structural similarity difference …but… Isosteric replacements of Oral LD50 for male groups rats = 2.5g/kg Dermal LD50 for male rats = 3.54g/kg •Substituents: 3-(2-chloro-4-(trifluoromethyl)phenoxy-) phenyl acetate, Not irritating to eyes •F, Cl, Br, I, CF3,NO2 CAS# 50594-77-9 of rabbits •Methyl,Ethyl, Isoprpyl, Slightly irritating to skin of rabbits Cyclopropyl, t-Butyl,-OH,- SH,-NH2,-OMe,-N(Me)2 Not mutagenic in Salmonella strains •Atoms and groups in rings: Higher potential •-CH=,-N= binding affinity to the estrogen •-CH2-,-NH-,-O-,-S- receptor than the nitrophenyl acetate •More … Higher potential to cause cancer than 5-(2-chloro-4-(trifluoromethyl)phenoxy)-2- nitrophenyl acetate, the phenyl acetate Depends on the endpoint! CAS# 50594-44-0 Walker . J. (2003) ,QSARs for pollution prevention, Toxicity Screening, (e.g. lipophilicity, receptor Risk Assessment and Web Applications, SETAC Press binding) 7
  • 8. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C w om w om w w w. w. A B B Y Y.c A B B Y Y.c Structural similarity Rosenkranz H.S., Cunningham A.R. (2001) Chemical Categories for Health Hazard Identification: A feasibility Study, Regulatory Toxicology and Pharmacology 33, 313-318. Examined the reliability of using chemical categories to classify HPV chemicals as toxic or nontoxic Found: “most often only a proportion of chemicals in a category were toxic” Conclusion: "traditional organic chemical categories do not encompass groups of chemical that are predominately either toxic or nontoxic across a number of toxicological endpoints or even for specific toxic activities” The bold portion of the chemical in the Category column defined the fragment used to query each data set. Abbreviations: EyI,eye irritation;LD50, rat LD50; Dev, developmental toxicity;CA, rodent carcinogenesis; Mnt, in vivo induction of micronuclei; Sal, Salmonella mutagenesis; MLA, mutagenesis in cultured mouse lymphoma cells. 8
  • 9. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C w om w om w w w. w. A B B Y Y.c A B B Y Y.c 3D Similarity Distance-based and angle-based descriptors (e.g. inter-atomic distance) Field similarity (not exhaustive list) Comparative Molecular Field Analysis (CoMFA), CoMSIA Electrostatic potential Shape Electron density Test probe Any grid-based structural property Molecular multi-pole moments (CoMMA) Shape descriptors (not exhaustive list) van der Waals volume and surface (reflect the size of substituents) Taft steric parameter STERIMOL Molecular Shape Analysis 4D QSAR WHIM descriptors Receptor binding 9
  • 10. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C w om w om w w w. w. A B B Y Y.c A B B Y Y.c Structurally similar compounds can have very different 3D properties Kubinyi, H., Chemical Similarity and Biological activity 10
  • 11. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C w om w om w w w. w. A B B Y Y.c A B B Y Y.c Physicochemical properties Molecular weight Octanol - water partition coefficient Total energy Heat of formation Ionization potential Molar refractivity More… 11
  • 12. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C w om w om w w w. w. A B B Y Y.c A B B Y Y.c Quantum chemistry approaches The wave function and the density function contain all the information of a system. All the information about any molecule could be extracted from the electron density. Bond creation and bond breaking in chemical reactions, as well as the shape changes in conformational processes, are expressed by changes in the electronic density of molecules. The electronic density fully determines the nuclear distribution, hence the electronic density and its changes account for all the relevant chemical information about the molecule. In principle, quantum-chemical theory should be able quantum- to provide precise quantitative descriptions of molecular structures and their chemical properties. 12
  • 13. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C w om w om w w w. w. A B B Y Y.c A B B Y Y.c Quantum chemistry approaches Quantum chemical descriptors - characterize the reactivity, shape and binding properties of a complete molecule or molecular fragments and substituents: HOMO and LUMO energies, total energy, number of filled orbitals, standard deviation of partial atomic charges and electron densities, dipole moment, partial atomic charges Approaches from The Theory of Atom in Molecules – BCP space, TAE/RECON, MEDLA, QShAR (additive density fragments) Quantum chemistry calculations depend on several levels of approximation Computationally intensive 13
  • 14. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C w om w om w w w. w. A B B Y Y.c A B B Y Y.c Reactivity Similarity between reactions Similarity of chemical structures assessed by generalized reaction types and by gross structural features. Two structures are considered similar if they can be converted by reactions belonging to the same predefined groups (for example oxidation or substitution reactions). 14
  • 15. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C w om w om w w w. w. A B B Y Y.c A B B Y Y.c Similarity indices Association, correlation, distance coefficients Most popular : Tanimoto distance (fingerprints) TAB N AB Euclidean distance (descriptors) NA N B N AB Carbo index (fields) C AB Z AB Z AA Z BB Essentially a classification problem has to be solved (decide if a query compound is closer to one or another set of compounds) Many methods available (Discriminant Analysis, Neural networks, SVM, Bayesian classification, etc.) Statistical assumptions and statistical error is involved 15
  • 16. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C w om w om w w w. w. A B B Y Y.c A B B Y Y.c Similarity indices Association indices Correlation indices J. D. Holliday, C-Y. Hu† and P. Willett,(2002) Grouping of Coefficients for the Calculation of Inter- Molecular Similarity and Dissimilarity using 2D Fragment Bit-Strings, Combinatorial Chemistry & High Throughput Screening,5, 155-166 155 16
  • 17. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C w om w om w w w. w. A B B Y Y.c A B B Y Y.c Fingerprint similarity Information loss – fragments presence and absence instead of counts Bit string saturation – within a large database almost all bits are set Can give nonintuitive results The average similarity appears to increase with the complexity of the query The distribution of Tanimoto values compound found in database searches with a Larger queries are more range of query molecules discriminating (flatter curve, Tanimoto values spread wider) Flower D., On the Properties of Bit Smaller queries have sharp String-Based Measures of Chemical Similarity, J. peak, unable to distinguish Chem. Inf. Comput. Sci., Vol. 38, No. 3, 1998 between molecules 17
  • 18. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C w om w om w w w. w. A B B Y Y.c A B B Y Y.c Distance indices Euclidean distance City-block distance Mahalanobis distance Distances obey triangle inequality Equidistant contours = Points on the equal distance from the query point 18
  • 19. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C w om w om w w w. w. A B B Y Y.c A B B Y Y.c Similarity in descriptor space Comparison between a point and groups of points is a classification problem. Euclidean distance performs very well if groups are separable (left). Other classification methods help in other cases. 19
  • 20. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C w om w om w w w. w. A B B Y Y.c A B B Y Y.c What do we measure We compare numerical representations of chemical compounds The numerical representation is not unique The numerical representation includes only part of all the information about the compound A distance measure reflects “closeness” only if the data holds specific assumptions 20
  • 21. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C w om w om w w w. w. A B B Y Y.c A B B Y Y.c Example: Y. Martin et al ( 2002) Do structurally similar molecules have similar biological activity ? Set of 1645 chemicals with IC50s for monoamine oxidase inhibition Daylight fingerprints 1024 bits long ( 0-7 bonds) When using Tanimoto coefficient with a cut off value of 0.85 only 30 % of actives were detected Cutoff values % of actives detected % False positives J. Med. Chem. 2002,45,4350-4358 21
  • 22. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C Chemical similarity caveats w om w om w w w. w. A B B Y Y.c A B B Y Y.c The similarity computation may not correctly represent the intuitive similarity between two chemical structures The properties of a chemical might not be implicit in its molecular structure Molecular structure might not be fully measured and represented by a set of numbers (information loss) Comparison by similarity indices may be counterintuitive Intuitively similar chemical structures may not have similar biological activity Bioisosteric compounds Structurally similar molecules may have different mechanisms of action 22
  • 23. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C w om w om w w w. w. A B B Y Y.c A B B Y Y.c Similarity and Activity “Neighbourhood principle” Proximity with respect to Similar activity descriptors does not necessary values mean proximity with respect to the activity Depends on the relationship between descriptor and activity True if a continuous & monotonous (e.g. linear,…) relationship holds between descriptors and activity The linear relationship is only a special case, given the complexity of biochemical interactions. Its use should be justified in every specific case Neighbourhood in the and/or used only locally descriptor space 23
  • 24. F T ra n sf o F T ra n sf o PD rm PD rm Y Y Y Y er er ABB ABB y y bu bu 2.0 2.0 to to re re he he k k lic lic C C w om w om w w w. w. A B B Y Y.c A B B Y Y.c Similarity vs. Activity Black square: Salmonella mutagenicity of aromatic amines [Debnath et al. 1992] (log TA98) Red circle: Glende et al. 2001 set: alkyl-substituted (ortho to the amino function) derivatives not included in original Debnath data set logP, Ehomo, Elumo logP, Ehomo, Elumo Similar compounds, Relatively small data set 24