
Integrative Biochemistry Workshop [Computational Biology]

488 views

Published on

Sep 9-10, 2017
At Kamnoetvidya Science Academy

Published in: Science

Integrative Biochemistry Workshop [Computational Biology]

  1. INTEGRATIVE BIOCHEMISTRY WORKSHOP (Section B: Computational Biology). Kamnoetvidya Science Academy, 9-10 Sep 2017. By Bundit Boonyarit.
  2. BIOINFORMATICS
  3. BIOINFORMATICS [diagram]. Life sciences: structure, RNA, proteome, sequence, metabolic, biochemistry, genetics. Computer sciences: algorithm, database, software, data mining, modelling. Other labels: computational science, computational biology.
  4. BIOINFORMATICS [diagram]. Two panels, "Bioinformatics past" and "Bioinformatics now and future", each showing mathematics, informatics, biology, physics, chemistry and medicine.
  5. Genome: the "manual of life".
  6. The human genome has 3.1 billion base pairs. About 2.9% of the bases encode genes; the remaining ~97% of the genome was previously called "junk".
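To put the percentages on this slide into absolute numbers, here is a small arithmetic sketch. It uses only the slide's rounded figures (3.1 billion base pairs, ~2.9% gene-encoding); the variable names are illustrative.

```python
# Rounded values taken from the slide.
GENOME_BP = 3_100_000_000   # 3.1 billion base pairs
CODING_PER_MILLE = 29       # ~2.9% of bases encode genes (29 per 1000)

# Integer arithmetic avoids floating-point rounding noise.
coding_bp = GENOME_BP * CODING_PER_MILLE // 1000
other_bp = GENOME_BP - coding_bp

print(f"gene-encoding bases: ~{coding_bp:,}")
print(f"remaining bases:     ~{other_bp:,}")
```

With these rounded inputs, roughly 90 million bases encode genes and about 3 billion do not.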
  7. [Image: DNA detail] http://www.joelertola.com/grfx/dna/dna_dtl1.jpg
  8. [Video] https://www.youtube.com/watch?v=Q_WRFw8KQk4
  9. CELLULAR ORGANIZATION [image] http://anatomyandphysiologyi.com/wp-content/uploads/2013/05/levels-of-structural-organization.jpg
  10. THREE DISTINCT DOMAINS OF LIFE (Lehninger Principles of Biochemistry, "The Foundations of Biochemistry").
      Tree labels in Figure 1–4:
      • Bacteria: Proteobacteria (purple bacteria), Cyanobacteria, Flavobacteria, Thermotogales, Gram-positive bacteria, green nonsulfur bacteria.
      • Archaea: Pyrodictium, Thermoproteus, Thermococcus celer, Methanococcus, Methanosarcina, Methanobacterium, halophiles.
      • Eukarya: Microsporidia, flagellates, trichomonads, plants, ciliates, fungi, diplomonads, animals, slime molds, Entamoebae.
      FIGURE 1–4 Phylogeny of the three domains of life. Phylogenetic relationships are often illustrated by a "family tree" of this type. The basis for this tree is the similarity in nucleotide sequences of the ribosomal RNAs of each group; the more similar the sequence, the closer the location of the branches, with the distance between branches representing the degree of difference between two sequences. Phylogenetic trees can also be constructed from similarities across species of the amino acid sequences of a single protein. For example, sequences of the protein GroEL (a bacterial protein that assists in protein folding) were compared to generate the tree in Figure 3–32. The tree in Figure 3–33 is a "consensus" tree, which uses several comparisons such as these to make the best estimates of evolutionary relatedness of a group of organisms.
  11. BIOLOGICAL DATABASE. Overlay on the textbook page: 10^15 base pairs in > 165,000 organisms; SAVE in a biological database.
      Textbook page shown (Lehninger, "The Foundations of Biochemistry"): [...] by O2 diffusing into the cell. With increasing cell size, however, surface-to-volume ratio decreases, until metabolism consumes O2 faster than diffusion can supply it. Metabolism that requires O2 thus becomes impossible as cell size increases beyond a certain point, placing a theoretical upper limit on the size of cells.
      There Are Three Distinct Domains of Life. All living organisms fall into one of three large groups (domains) that define three branches of evolution from a common progenitor (Fig. 1–4). Two large groups of single-celled microorganisms can be distinguished on genetic and biochemical grounds: Bacteria and Archaea. Bacteria inhabit soils, surface waters, and the tissues of other living or decaying organisms. Many of the Archaea, recognized as a distinct domain by Carl Woese in the 1980s, inhabit extreme environments: salt lakes, hot springs, highly acidic bogs, and the ocean depths. The available evidence suggests that the Archaea and Bacteria diverged early in evolution. All eukaryotic organisms, which make up the third domain, Eukarya, [...]
      The page also shows Figure 1–4 (see slide 10) and part of a second figure with the labels: reduced fuel, all organisms, phototrophs, chemotrophs.
  12. THE FLOOD OF BIOLOGICAL DATA. [Six growth plots, x-axis: year, 1999 to 2009. Panels and y-axes: InterPro (entries), PDBe (structures), UniProt (entries), ArrayExpress (hybridisations), Ensembl and Ensembl Genomes (genomes), EMBL-Bank (gigabases).]
  13. BIOINFORMATICS = CONSTRUCT DATABASE + CREATE SOFTWARE + USING DATABASE AND SOFTWARE TO ANSWER
  14. BIO- IN BIOINFORMATICS (source: Peking University). From GENOTYPE to PHENOTYPE: DNA/genome, RNA, proteins, molecular networks, cells, physiology/disease. Example double-stranded sequence shown: AGCTAGCTG / TCGATCGAC.
      Topics shown: sequence alignment; database similarity search; motif finding; gene finding; computational and comparative genomics; evolution; DNA; differential expression; co-expression; ncRNA; mass spectroscopy protein identification; structure prediction; structure alignment; protein interaction networks; transcriptional regulation networks; metabolic and signalling networks; network dynamics; population genetics; human genetics; virtual cell simulations.
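The two strands shown on this slide (AGCTAGCTG over TCGATCGAC) are base-by-base complements of each other. A minimal Python sketch of that relationship follows; the function names are illustrative and not taken from the workshop material.

```python
# Watson-Crick base pairing: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Return the base-by-base complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)


def reverse_complement(seq: str) -> str:
    """Return the complementary strand read in the opposite direction."""
    return complement(seq)[::-1]


top = "AGCTAGCTG"
print(complement(top))          # TCGATCGAC, the second strand on the slide
print(reverse_complement(top))  # CAGCTAGCT
```

Sequence databases usually store a single strand, so the reverse complement is the form most tools need when working with the opposite strand.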
  15. -INFORMATICS IN BIOINFORMATICS (source: Peking University). From DATA to DISCOVERY. Stages: data management, data computation, data mining, modeling/simulation. Labels: databases, ontologies, meta-data, predictive models, systems simulation, algorithms, software tools, web servers, biological discoveries.
  16. SUPPORTING INDUSTRY. Bioinformatics supports: fisheries, forestry, environmental sciences, computing, biotechnology, pharmaceutical sciences, medical sciences, cosmetics, biofuels, agri-food.
  17. PROTEIN FUNCTION: BINDING and CATALYSIS (textbook section "1-0 Overview: Protein Function and Architecture").
      Binding: Specific recognition of other molecules is central to protein function. The molecule that is bound (the ligand) can be as small as the oxygen molecule that coordinates to the heme group of myoglobin, or as large as the specific DNA sequence (called the TATA box) that is bound, and distorted, by the TATA binding protein. Specific binding is governed by shape complementarity and polar interactions such as hydrogen bonding.
      Catalysis: Essentially every chemical reaction in the living cell is catalyzed, and most of the catalysts are protein enzymes. The catalytic efficiency of enzymes is remarkable: reactions can be accelerated by as much as 17 orders of magnitude over simple buffer catalysis. Many structural features contribute to the catalytic power of enzymes: holding reacting groups together in an orientation favorable for reaction (proximity); binding the transition state of the reaction more tightly than ground state complexes (transition state stabilization); acid-base catalysis, and so on.
      Figure examples:
      • TATA binding protein (PDB 1tgh): binds a specific DNA sequence and serves as the platform for a complex that initiates transcription of genetic information.
      • Myoglobin (PDB 1a6k): binds a molecule of oxygen reversibly to the iron atom in its heme group (shown in grey with the iron in green). It stores oxygen for use in muscle tissues.
      • DNA polymerase (PDB 1pbx): DNA replication is catalyzed by a specific polymerase that copies the genetic material and edits the product for errors in the copy.
      • HIV protease (PDB 1a8k): replication of the AIDS virus HIV depends on the action of a protein-cleaving enzyme called HIV protease. This enzyme is the target for protease-inhibitor drugs (shown in grey).
  18. PROTEIN FUNCTION: SWITCHING and STRUCTURAL PROTEINS.
      Switching: Proteins are flexible molecules and their conformation can change in response to changes in pH or ligand binding. Such changes can be used as molecular switches to control cellular processes. One example, which is critically important for the molecular basis of many cancers, is the conformational change that occurs in the small GTPase Ras when GTP is hydrolyzed to GDP. The GTP-bound conformation is an "on" state that signals cell growth; the GDP-bound structure is the "off" signal.
      Structural proteins: Protein molecules serve as some of the major structural elements of living systems. This function depends on specific association of protein subunits with themselves as well as with other proteins, carbohydrates, and so on, enabling even complex systems like actin fibrils to assemble spontaneously. Structural proteins are also important sources of biomaterials, such as silk, collagen, and keratin.
      Figure 1-1 Four examples of biochemical functions performed by proteins (Chapter 1, From Sequence to Structure, ©2004 New Science Press Ltd). Examples on this slide:
      • Ras: the GDP-bound ("off"; PDB 1pll) state of Ras differs significantly from the GTP-bound ("on"; PDB 121p) state. This difference causes the two states to be recognized by different proteins in signal transduction pathways.
      • Silk (PDB 1slk): silk derives its strength and flexibility from its structure: it is a giant stack of antiparallel beta sheets. Its strength comes from the covalent and hydrogen bonds within each sheet; the flexibility from the van der Waals interactions that hold the sheets together.
      • F-actin (courtesy of Ken Holmes): actin fibers are important for muscle contraction and for the cytoskeleton. They are helical assemblies of actin and actin-associated proteins.
  19. OXYHEMOGLOBIN (PDB: 1GZX)
  20. RAS PROTEIN (PDB: 4Q21)
  21. RAS PROTEIN
  22. RAS PROTEIN. Inactive form (Ras + GDP), PDB: 4Q21. Active form (Ras + GTP), PDB: 121P. The active form carries one more Pi (phosphate ion).
      • Plays an important role in the cell division process.
      • The active site contains a magnesium (Mg) ion.
      • Binds guanosine diphosphate (GDP) and guanosine triphosphate (GTP) in order to function.
  23. RAS PROTEIN (PDB: 121P). Switch cycle [diagram]: a signal acts through RasGEF (Sos), which exchanges GDP for GTP and turns RasGDP ('off') into RasGTP ('on'); RasGAP (GAP1) returns RasGTP to RasGDP, releasing Pi. RasGTP acts on downstream signaling molecules, leading to cell growth and differentiation.
      GEF: guanine-nucleotide exchange factor. GAP: GTPase-activating protein.
      Caption (partial): [...] cell-surface receptor to Ras. In the activated GTP-bound state Ras interacts with and activates several target proteins involved in intracellular signaling pathways. Ras is switched off by hydrolysis of the bound GTP. This reaction is facilitated by the action of specific GTPase-activating proteins (GAPs), the best studied of which are GAP1 (illustrated here), p120GAP and neurofibromin, the product of a tumor suppressor gene.
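The on/off cycle on this slide can be read as a two-state switch. The sketch below is a toy model of that cycle only: the class and function names are hypothetical, and it ignores kinetics, concentrations and every other regulator of Ras.

```python
from enum import Enum
from typing import Tuple


class RasState(Enum):
    OFF = "Ras-GDP"  # inactive form
    ON = "Ras-GTP"   # active form, signals downstream


def apply_gef(state: RasState) -> RasState:
    """A GEF (e.g. Sos) promotes exchange of bound GDP for GTP: off -> on."""
    return RasState.ON if state is RasState.OFF else state


def apply_gap(state: RasState) -> Tuple[RasState, int]:
    """A GAP speeds up hydrolysis of bound GTP: on -> off, releasing one Pi.

    Returns the new state and the number of phosphate ions released.
    """
    if state is RasState.ON:
        return RasState.OFF, 1
    return state, 0


state = RasState.OFF
state = apply_gef(state)           # signal arrives, GEF switches Ras on
print(state.value)                 # Ras-GTP
state, released = apply_gap(state)  # GAP switches Ras off again
print(state.value, released)       # Ras-GDP 1
```

Modelling the two regulators as separate functions mirrors the slide: GEF and GAP act on opposite directions of the same switch.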
  24. GDP and GTP [structures]. Atom labels: nitrogen, oxygen, phosphate, carbon.
  25. PROTEIN BUILDING BLOCK.
      CPK (Corey-Pauling-Koltun) drawing method: Nitrogen (N) = blue; Carbon (C) = cyan/black; Sulphur/Sulfur (S) = yellow; Oxygen (O) = red; Hydrogen (H) = white; Phosphorus (P) = brown.
      Amino acid structure: The chemical structure of an amino acid [...] backbone is the same for all amino acids and consists of the amino group (NH2), the alpha carbon and the carboxylic acid group (COOH). Different amino acids are distinguished by their different side chains, R. The neutral form of an amino acid is shown: in solution at pH 7 the amino and carboxylic acid groups ionize, to NH3+ and COO–. Except for glycine, where R = H, amino acids are chiral (that is, they have a left–right asymmetry). The form shown is the L-configuration, which is most common. Atom key: hydrogen, carbon, oxygen, sulfur, nitrogen.
      Peptide bonds are planar: In a pair of amino acids linked by a peptide bond, six atoms lie in the same plane: the α-carbon atom and CO group of the first amino acid and the NH group and α-carbon atom of the second amino acid. The nature of the chemical bonding within a peptide accounts for the bond's planarity. The bond resonates between a single bond and a double bond. Because of this double-bond character, rotation about this bond is prevented and thus the conformation of the peptide backbone is constrained.
      Figure 2.18 Peptide bonds are planar. In a pair of linked amino acids, six atoms (Cα, C, O, N, H, and Cα) lie in a plane. Side chains are shown as green balls.
      [Diagram: peptide-bond resonance structures]
      The double-bond character is also expressed in the length of the bond between the CO and the NH groups. The C–N distance in a peptide bond is typically 1.32 Å, which is between the values expected for a C–N single bond (1.49 Å) and a C=N double bond (1.27 Å), as shown in Figure 2.19. Finally, the peptide bond is uncharged, allowing polymers of amino acids linked by peptide bonds to form tightly packed globular structures.
      Two configurations are possible for a planar peptide bond. In the trans configuration, the two α-carbon atoms are on opposite sides of the peptide bond. In the cis configuration, these groups are on the same side of the peptide bond. Almost all peptide bonds in proteins are trans. This preference for trans over cis can be explained by the fact that steric clashes between groups attached to the α-carbon atoms hinder the formation of the cis form but do not arise in the trans configuration (Figure 2.20). By far the most common cis peptide bonds are X–Pro linkages. Such bonds show less preference for the trans configuration [...]
      Figure 2.19 Typical bond lengths within a peptide unit. The peptide unit is shown in the trans configuration. Values shown: C–N 1.32 Å, C=O 1.24 Å, N–Cα 1.45 Å, Cα–C 1.51 Å, N–H 1.0 Å.
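The CPK colour scheme listed on this slide is a lookup from element to colour. A minimal sketch of that lookup is shown below; the colours are the slide's, while the dictionary and function names are illustrative (molecular viewers ship their own, more complete tables).

```python
# Element symbol -> colour, as listed on the slide.
CPK_COLORS = {
    "N": "blue",
    "C": "cyan/black",
    "S": "yellow",
    "O": "red",
    "H": "white",
    "P": "brown",
}


def color_atoms(elements):
    """Pair each element symbol with its slide colour (None if not listed)."""
    return [(el, CPK_COLORS.get(el.upper())) for el in elements]


# Backbone heavy atoms of one residue: N, C-alpha, C, O.
print(color_atoms(["N", "C", "C", "O"]))
```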
  26. ZWITTERIONIC FORM (textbook section "2.1 Proteins Are Built from a Rep[...]"; the right-hand edge of the page is cut off on the slide, so gaps are marked [...]).
      Amino acids are the building blocks of proteins [...] of a central carbon atom, called the α carbon [...] carboxylic acid group, a hydrogen atom, [...] R group is often referred to as the side chain [...] connected to the tetrahedral α-carbon atom may exist in one or the other of two mirror [...] and the D isomer (Figure 2.4).
      Only L amino acids are constituents o[...] acids, the L isomer has S (rather [...] (Figure 2.5). What is the basis for the pr[...] answer is not known, but evidence show[...] more soluble than is a racemic mixture of [...] to form crystals. This small solubility difference [...] over time so that the L isomer became dominant in solution.
      Amino acids in solution at neutral pH exist predominantly as dipolar ions (also called zwitterions). In the dipolar form, the amino group is protonated (–NH3+) and the carboxyl group is deprotonated (–COO−). The ionization state of an amino acid varies with pH (Figure 2.6). In acid [...]
      Figure 2.4 The L and D isomers of amino acids. The L and D isomers are mirror images of each other.
      Figure 2.5 Only L amino acids ar[...] in proteins. Almost all L amino acids [...] S absolute configuration. The counte[...] direction of the arrow from highest- [...] priority substituents indicates that the [...] center is of the S configuration.
      Figure 2.6 Ionization state as a function of pH. The ionization state of amino acids is altered by a change in pH. The zwitterionic form predominates near physiological pH. [Plot: concentration against pH from 0 to 14 for three forms: both groups protonated (+H3N–CHR–COOH), zwitterionic form (+H3N–CHR–COO−), both groups deprotonated (H2N–CHR–COO−); each step releases one H+.]
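Figure 2.6 on this slide describes how the three ionization forms trade places as pH changes. The sketch below reproduces that behaviour with a standard two-step acid-base equilibrium. The pKa values (2.3 for the carboxyl group, 9.6 for the amino group) are assumed typical textbook values and do not come from the slide; side-chain ionization is ignored.

```python
def ionization_fractions(ph, pka_carboxyl=2.3, pka_amino=9.6):
    """Fractions of the three backbone ionization forms at a given pH.

    The pKa defaults are assumed typical values, not taken from the slide.
    """
    h = 10.0 ** -ph             # proton concentration
    k1 = 10.0 ** -pka_carboxyl  # COOH <-> COO- + H+
    k2 = 10.0 ** -pka_amino     # NH3+ <-> NH2 + H+

    # Relative weights of the fully protonated, zwitterionic and
    # fully deprotonated forms for a diprotic acid.
    weights = {
        "both_protonated": h * h,
        "zwitterion": h * k1,
        "both_deprotonated": k1 * k2,
    }
    total = sum(weights.values())
    return {form: w / total for form, w in weights.items()}


for ph in (1.0, 7.0, 12.0):
    fractions = ionization_fractions(ph)
    dominant = max(fractions, key=fractions.get)
    print(f"pH {ph:4.1f}: dominant form = {dominant}")
```

Under these assumed pKa values the zwitterion dominates near neutral pH, the fully protonated form dominates in strong acid, and the fully deprotonated form dominates in strong base, matching the shape of the figure.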
  27. AMINO ACID charge mnemonic. (+) R, H, K: "Royal Hong Kong". (–) D, E, C, Y.
  28. AMINO ACID (part 1). Shown: Glycine (Gly, G); Alanine (Ala, A); Valine (Val, V); Leucine (Leu, L); Isoleucine (Ile, I); Serine (Ser, S); Aspartic acid (Asp, D); Glutamic acid (Glu, E); Proline (Pro, P); Arginine (Arg, R); Phenylalanine (Phe, F).
      Group labels: hydrophobic, hydrophilic. Bond legend: bond to functional group (R), double bond, partial double bond, single bond. Atom key: hydrogen, carbon, oxygen, sulfur, nitrogen.
      Caption (partial): [...] backbone is the same for all amino acids and consists of the amino group (NH2), the alpha carbon and the carboxylic acid group (COOH). Different amino acids are distinguished by their different side chains, R. The neutral form of an amino acid is shown: in solution at pH 7 the amino and carboxylic acid groups ionize, to NH3+ and COO–. Except for glycine, where R = H, amino acids are chiral (that is, they have a left–right asymmetry). The form shown is the L-configuration, which is most common.
      [...] polypeptide chain. The R group is the side chain. The 20 different side chains that occur in proteins are depicted below. For proline, the side chain is fused back to the nitrogen of the backbone. The configuration about the alpha carbon is L for most amino acids in proteins.
  29. AMINO ACID (part 2). Shown: Alanine (Ala, A); Valine (Val, V); Leucine (Leu, L); Isoleucine (Ile, I); Serine (Ser, S); Asparagine (Asn, N); Aspartic acid (Asp, D); Glutamic acid (Glu, E); Glutamine (Gln, Q); Threonine (Thr, T); Cysteine (Cys, C); Proline (Pro, P); Lysine (Lys, K); Arginine (Arg, R); Histidine (His, H); Phenylalanine (Phe, F); Tyrosine (Tyr, Y); Methionine (Met, M); Tryptophan (Trp, W).
      Group labels: hydrophilic, amphipathic.
      Figure 1-3 Amino-acid structure and the chemical characters of the amino-acid side chains. Charged side chains are shown in the form that predominates at pH 7. For proline, the nitrogen and alpha carbon are shown because the side chain is joined to the nitrogen atom to form a ring that includes these atoms. (From Sequence to Structure, Chapter 1, ©2004 New Science Press Ltd.)
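The one-letter and three-letter codes listed on slides 28 and 29 form a simple lookup table. The sketch below converts a one-letter sequence into three-letter codes; the table is built from the names on the slides, while the function name and the hyphen separator are illustrative choices.

```python
# One-letter -> three-letter codes for the 20 amino acids on slides 28-29.
THREE_LETTER = {
    "G": "Gly", "A": "Ala", "V": "Val", "L": "Leu", "I": "Ile",
    "S": "Ser", "T": "Thr", "C": "Cys", "M": "Met", "P": "Pro",
    "D": "Asp", "E": "Glu", "N": "Asn", "Q": "Gln",
    "K": "Lys", "R": "Arg", "H": "His",
    "F": "Phe", "Y": "Tyr", "W": "Trp",
}


def to_three_letter(seq: str) -> str:
    """Convert a one-letter protein sequence to hyphen-joined three-letter codes.

    Raises KeyError for letters that are not one of the 20 standard codes.
    """
    return "-".join(THREE_LETTER[aa] for aa in seq.upper())


# The two groups from the charge mnemonic on slide 27.
print(to_three_letter("RHK"))   # Arg-His-Lys
print(to_three_letter("DECY"))  # Asp-Glu-Cys-Tyr
```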
  30. ATOMIC ANNOTATION: alpha (α), beta (β), gamma (γ), delta (δ), epsilon (ε), zeta (ζ).
  31. PEPTIDE BOND (textbook section "The Peptide Bond 1-3", ©2004 New Science Press Ltd).
      The properties of the peptide bond have important effects on the stability and flexibility of polypeptide chains in water.
      The properties of the amide bond account for several important properties of polypeptide chains in water. The stability of the peptide bond, as well as other properties important for the behavior of polypeptides, is due to resonance, the delocalization of electrons over several atoms. Resonance has two other important consequences. First, it increases the polarity of the peptide bond: the dipole moment of each peptide bond is shown in Figure 1-8. The polarity of the peptide bond can make an important contribution to the behavior of folded proteins, as discussed later in section 1-6. Second, because of resonance, the peptide bond has partial double-bond character, which means that the three non-hydrogen atoms that make up the bond (the carbonyl oxygen O, the carbonyl carbon C and the amide nitrogen N) are coplanar, and that free rotation about the bond is limited (Figure 1-9). The other two bonds in the basic repeating unit of the polypeptide backbone, the N–Cα and Cα–C bonds (where Cα is the carbon atom to which the side chain is attached), are single bonds and free rotation is permitted about them provided there is no steric interference from, for example, the side chains. The angle of the N–Cα bond to the adjacent peptide bond is known as the phi torsion angle, and the angle of the C–Cα bond to [...]
      Figure 1-8 Schematic diagram of an extended polypeptide chain. The repeating backbone is shown, with schematized representations of the different side chains (R1, R2 and so on). Each peptide bond is shown in a shaded box. Also shown are the individual dipole moments (arrows) associated with each bond. The dashed lines indicate the resonance of the peptide bond.
      Figure 1-9 Extended polypeptide chain showing the typical backbone bond lengths and angles. The planar peptide groups are indicated as shaded regions and the backbone torsion angles are indicated with circular arrows, with the phi and psi torsion angles marked. Values shown: bond lengths 1.23 Å (C=O), 1.33 Å (C–N), 1.45 Å (N–Cα), 1.52 Å (Cα–C); bond angles 116°, 118°, 120°, 121°, 122°, 123°.
      [Diagram labels: water; amino terminus (N terminus); carboxyl terminus (C terminus).]
      Definitions:
      • dipole moment: [...] between two separated charges that may be full or partial. Molecules or functional groups having a dipole moment are said to be polar.
      • hydrolysis: breaking a covalent bond by addition of a molecule of water.
      • peptide bond: another name for amide bond, a chemical bond formed when a carboxylic acid condenses with an amino group with the expulsion of a water molecule. The term peptide bond is used only when both groups come from amino acids.
      • phi torsion angle: see torsion angle.
      • polypeptide: a polymer of amino acids joined together by peptide bonds.
      • psi torsion angle: see torsion angle.
      • resonance: delocalization of bonding electrons over more than one chemical bond in a molecule. Resonance greatly increases the stability of a molecule. It can be represented, conceptually, as if the properties of the molecule were an average of several structures in which the chemical bonds differ.
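The phi and psi torsion angles introduced on this slide can be computed from atomic coordinates as dihedral angles over four consecutive atoms. The sketch below uses the usual convention (phi over C(i-1), N(i), Cα(i), C(i); psi over N(i), Cα(i), C(i), N(i+1)), which is standard practice and not spelled out on the slide. The function names and test coordinates are illustrative.

```python
import math


def sub(a, b):
    """Vector difference a - b."""
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_angle(p0, p1, p2, p3):
    """Signed dihedral angle in degrees about the p1-p2 bond.

    0 degrees is cis, 180 degrees is trans; the sign is positive for a
    clockwise rotation when looking from p1 toward p2.
    """
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    y = dot(b2, cross(n1, n2))
    x = math.sqrt(dot(b2, b2)) * dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates with the central bond along the z axis.
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))   # a quarter turn
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # trans
```

Using `atan2` rather than `acos` keeps the sign of the angle, which is what distinguishes, for example, a phi of -57° from +57°.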
  32. chemical bonds

In a randomly coiled polypeptide chain the dipole moments of the individual backbone amide …

Average Conformational Parameters of Helical Elements

Conformation      Phi    Psi    Omega   Residues per turn   Translation per residue (Å)
Alpha helix       –57    –47    180     3.6                 1.5
3-10 helix        –49    –26    180     3.0                 2.0
Pi-helix          –57    –70    180     4.4                 1.15
Polyproline I     –83    +158   0       3.33                1.9
Polyproline II    –78    +149   180     3.0                 3.12
Polyproline III   –80    +150   180     3.0                 3.1

Figure 1-9 Extended polypeptide chain showing the typical backbone bond lengths and angles. The planar peptide groups are indicated as shaded regions and the backbone torsion angles are indicated with circular arrows, with the phi and psi torsion angles marked. The omega torsion angle about the …

The structure of each amino acid in a polypeptide can be adjusted by rotation about two single bonds. The rotations about these bonds can be specified by torsion angles (Figure 2.22). The angle of rotation about the bond between the nitrogen and the α-carbon atoms is called phi (φ). The angle of rotation about the bond between the α-carbon and the carbonyl carbon atoms is called psi (ψ). A clockwise rotation about either bond as viewed from the nitrogen atom toward the α-carbon atom or from the carbonyl group toward the α-carbon atom corresponds to a positive value. The φ and ψ angles determine the path of the polypeptide chain.

Torsion angle: a measure of the rotation about a bond, usually taken to lie between –180 and +180 degrees. Torsion angles are sometimes called dihedral angles.

Figure 2.22 Rotation about bonds in a polypeptide. The structure of each amino acid in a polypeptide can be adjusted by rotation about two single bonds. (A) Phi (φ) is the angle of rotation about the bond between the nitrogen and the α-carbon atoms, whereas psi (ψ) is the angle of rotation about the bond between the α-carbon and the carbonyl carbon atoms. (B) A view down the bond between the nitrogen and the α-carbon atoms, showing how φ is measured (example: φ = –80°). (C) A view down the bond between the α-carbon and the carbonyl carbon atoms, showing how ψ is measured (example: ψ = +85°).
  33. chemical bonds

Are all combinations of φ and ψ possible? Gopalasamudram Ramachandran recognized that many combinations are forbidden because of steric collisions between atoms. The allowed values can be visualized on a two-dimensional plot called a Ramachandran diagram (Figure 2.23). Three-quarters of the possible (φ, ψ) combinations are excluded simply by local steric clashes. Steric exclusion, the fact that two atoms cannot be in the same place at the same time, can be a powerful organizing principle.

[Figure 2.23: plot of ψ against φ, both axes running from –180° to +180°; the example conformation (φ = 90°, ψ = –90°) is labeled as disfavored.]

Figure 2.23 A Ramachandran diagram showing the values of φ and ψ. Not all φ and ψ values are possible without collisions between atoms. The most favorable regions are shown in dark green; borderline regions are shown in light green. The structure on the right is disfavored because of steric clashes.
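The torsion angles plotted in a Ramachandran diagram can be computed from atomic coordinates. The sketch below is a minimal, standard-library illustration and not part of the original slides: the function name `torsion_angle` and the test coordinates are assumptions made for the example. It takes four points and returns the signed angle about the bond between the second and third. For φ the four atoms would be C(i–1), N, Cα, C; for ψ they would be N, Cα, C, N(i+1).

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def torsion_angle(p1, p2, p3, p4):
    """Signed torsion angle in degrees, in [-180, 180], about the p2-p3 bond.

    0 degrees is the cis (eclipsed) arrangement and +/-180 is trans.
    A positive value means that, looking from p2 toward p3, the far bond
    (p3-p4) is rotated clockwise relative to the near bond (p2-p1).
    """
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)          # normal of the plane p1, p2, p3
    n2 = _cross(b2, b3)          # normal of the plane p2, p3, p4
    b2_len = math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    y = _dot(b2, _cross(n1, n2)) / b2_len
    return math.degrees(math.atan2(y, x))

# Hypothetical coordinates chosen so the answers are easy to check by eye.
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))   # quarter turn
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))   # cis
print(torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))  # trans
```

Applied to every residue of a structure, the resulting (φ, ψ) pairs are the points that would be scattered on the diagram.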
  34. chemical bonds

Any form of structural asymmetry in a molecule gives rise to differences in absorption of left-handed versus right-handed circularly polarized light. Measurement of this difference is called circular dichroism (CD).

[Figure (a): Ramachandran plot of ψ (degrees) against φ (degrees), both axes from –180 to +180, with labeled regions for antiparallel β sheets, parallel β sheets, right-twisted β sheets, the collagen triple helix, the right-handed α helix and the left-handed α helix.]
  35. chemical reaction

… more uncompensated partial or full charges. Thus, in protein structure nearly all potential hydrogen-bond donors and acceptors are participating in such interactions, either between polar groups of the protein itself or with water molecules. In a polypeptide chain of indeterminate sequence the most common hydrogen-bond groups are the peptide C=O and N–H; in the interior of a protein these groups cannot make hydrogen bonds with water, so they tend to hydrogen bond with one another, leading to the secondary structure which stabilizes the folded state.

Chemical Interactions that Stabilize Polypeptides

Covalent bond (example: –Cα–C–)
  Distance dependence: -
  Typical distance: 1.5 Å
  Free energy: 356 kJ/mole (610 kJ/mole for a C=C bond)

Disulfide bond (example: –Cys–S–S–Cys–)
  Distance dependence: -
  Typical distance: 2.2 Å
  Free energy: 167 kJ/mole

Salt bridge (example: carboxylate paired with an ammonium group)
  Distance dependence: donor (here N) and acceptor (here O) atoms <3.5 Å
  Typical distance: 2.8 Å
  Free energy: 12.5–17 kJ/mole; may be as high as 30 kJ/mole for fully or partially buried salt bridges (see text), less if the salt bridge is external

Hydrogen bond (example: N–H ··· O=C)
  Distance dependence: donor (here N) and acceptor (here O) atoms <3.5 Å
  Typical distance: 3.0 Å
  Free energy: 2–6 kJ/mole in water; 12.5–21 kJ/mole if either donor or acceptor is charged

Long-range electrostatic interaction (example: separated charged groups)
  Distance dependence: depends on dielectric constant of medium; screened by water; 1/r dependence
  Typical distance: variable
  Free energy: depends on distance and environment; can be very strong in a nonpolar region but very weak in water

Van der Waals interaction (example: two methyl groups in contact)
  Distance dependence: short range; falls off rapidly beyond 4 Å separation; 1/r^6 dependence
  Typical distance: 3.5 Å
  Free energy: 4 kJ/mole (4–17 in protein interior) depending on the size of the group (for comparison, the average thermal energy of molecules at room temperature is 2.5 kJ/mole)

The covalent-bond free energies are bond dissociation enthalpies.
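The two distance laws in the table (1/r for long-range electrostatics, 1/r^6 for van der Waals attraction) behave very differently as groups move apart. The following sketch only compares the relative fall-off of the two power laws; the function name `relative_strength` and the reference distance of 3.5 Å are choices made for illustration, and no absolute energies are implied.

```python
def relative_strength(r, r_ref, power):
    """Strength of a 1/r**power interaction at distance r, relative to r_ref."""
    return (r_ref / r) ** power

# Compare the two laws at a few separations (in Å), normalized to 3.5 Å.
for r in (3.5, 4.0, 7.0, 14.0):
    electrostatic = relative_strength(r, 3.5, 1)   # 1/r dependence
    dispersion = relative_strength(r, 3.5, 6)      # 1/r^6 dependence
    print(f"r = {r:5.1f} Å   1/r: {electrostatic:.4f}   1/r^6: {dispersion:.6f}")
```

Doubling the separation halves a 1/r interaction but reduces a 1/r^6 interaction to under 2% of its starting value, which is why the table describes van der Waals contacts as short range.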
  36. chemical reaction

STANDARD BOND LENGTHS

Bond   Structure     Length (Å)
C–H    R2CH2         1.07
C–H    Aromatic      1.08
C–H    RCH3          1.10
C–C    Hydrocarbon   1.54
C–C    Aromatic      1.40
C=C    Ethylene      1.33
C≡C    Acetylene     1.20
C–N    RNH2          1.47
C–N    O=C–N         1.34
C–O    Alcohol       1.43
C–O    Ester         1.36
C=O    Aldehyde      1.22
C=O    Amide         1.24
C–S    R2S           1.82
N–H    Amide         0.99
O–H    Alcohol       0.97
O–O    O2            1.21
P–O    Ester         1.56
S–H    Thiol         1.33
S–S    Disulfide     2.05
  37. protein structure: Primary structure
  38. protein structure: Secondary structure

The simplest secondary structure element is the beta turn

The simplest secondary structure element usually involves four residues but sometimes requires only three. It consists of a hydrogen bond between the carbonyl oxygen of one residue (n) and the amide N–H of residue n+3, reversing the direction of the chain (Figure 1-12). This pattern of hydrogen bonding cannot ordinarily continue because the turn is too tight. This tiny element of secondary structure is called a beta turn or reverse turn or, sometimes, a hairpin turn based on its shape. In a few cases, this interaction can be made between residue n and n+2, but such a turn is strained. Although the reverse turn represents a simple way to satisfy the hydrogen-bonding capability of a peptide group, inspection of this structure reveals that most of the C=O and N–H groups in the four residues that make up the turn are not making hydrogen bonds with other backbone atoms (Figure 1-12). Water molecules can donate and accept hydrogen bonds to these groups if the turn is not buried. Therefore, beta turns are found on the surfaces of folded proteins, where they are in contact with the aqueous environment, and by reversing the direction of the chain they can limit the size of the molecule and maintain a compact state.

Figure 1-12 Typical beta turn. Schematic diagram showing the interresidue backbone hydrogen bonds that stabilize the reversal of the chain direction. Side chains are depicted as large light-purple spheres. The tight geometry of the turn means that some residues, such as glycine, are found more commonly in turns than others.

[Figure: Ramachandran plot of Ψ (degrees) against Φ (degrees), axes from –180 to 180, with labeled regions for extended chain, parallel β sheet, 3-10 helix and α helix. Caption fragment: … pink alpha-helical region on the right is actually for a left-handed helix, which is only rarely observed in short segments in proteins. The zero values of phi and psi are defined as the trans configuration.]

[Figure (a) to (c): the alpha helix, with labels for the 1.5-Å rise and 100° rotation per residue, 5 Å, the backbone H bond to residue n+4, the partial charges δ+ and δ–, and side chains R1 to R9.]

Beta sheets are extended structures that sometimes form barrels

In contrast to the alpha helix, the beta pleated sheet, whose name derives from the corrugated appearance of the extended polypeptide chain (Figure 1-17), involves hydrogen bonds between backbone groups from residues distant from each other in the linear sequence. In beta sheets, two or more strands that may be widely separated in the protein sequence are arranged side by side, with hydrogen bonds between the strands (Figure 1-17). The strands can run in the same direction (parallel beta sheet) or antiparallel to one another; mixed sheets with both parallel and antiparallel strands are also possible (Figure 1-17).

Nearly all polar amide groups are hydrogen bonded to one another in a beta-sheet structure, except for the N–H and C=O groups on the outer sides of the two edge strands. Edge strands may make hydrogen bonds in any of several ways. They may simply make hydrogen bonds to water, if they are exposed to solvent; or they may pack against polar side chains in, for example, a neighboring alpha helix; or they may make hydrogen bonds to an edge strand in another protein chain, forming an extended beta structure that spans more than one subunit and thereby stabilizes quaternary structure (Figure 1-18). Or the sheet may curve round on itself to form a barrel structure, with the two edge strands hydrogen bonding to one another to complete the closed cylinder (Figure 1-19). Such beta barrels are a common feature of protein architecture.

Parallel sheets are always buried and small parallel sheets almost never occur. Antiparallel sheets by contrast are frequently exposed to the aqueous environment on one face. These observations suggest that antiparallel sheets are more stable, which is consistent with their hydrogen bonds being more linear (see Figure 1-17). Silk, which is notoriously strong, is made up of stacks of antiparallel beta sheets. Antiparallel sheets most commonly have beta turns connecting the strands, although sometimes the strands may come from discontiguous regions of the linear sequence, in which case the connections are more complex and may include segments of alpha helix.

Definitions
antiparallel beta sheet: a beta sheet, often formed from contiguous regions of the polypeptide chain, in which each strand runs in the opposite direction to its neighbors.
parallel beta sheet: a beta sheet, formed from noncontiguous regions of the polypeptide chain, in which every strand runs in the same direction.

1-7 Properties of the Beta Sheet

Figure 1-17 The structure of the beta sheet. The left figure shows a mixed beta sheet, that is, one containing both parallel and antiparallel segments. Note that the hydrogen bonds are more linear in the antiparallel sheet. On the right are edge-on views of antiparallel (top) and parallel sheets (bottom). The corrugated appearance gives rise to the name "pleated sheet" for these elements of secondary structure. Consecutive side chains, indicated here as numbered geometric symbols, point from alternate faces of both types of sheet.
  39. protein structure

… the interactions of the alpha helix both with other parts of a folded protein chain and with other protein molecules.

The alpha helix is a compact structure, with approximate phi, psi values of –60° and –50° respectively: the distance between successive residues along the helical axis (translational rise) is only 1.5 Å (Figure 1-13a). It would take a helix 20 residues long to span a distance of 30 Å, the thickness of the hydrophobic portion of a lipid bilayer (alpha helices are common in the transmembrane portions of proteins that span the lipid bilayer in cell membranes; see section 1-11). Alpha helices can be right-handed (clockwise spiral staircase) or left-handed (counterclockwise), but because all amino acids except glycine in proteins have the L-configuration, steric constraints favor the right-handed helix, as the Ramachandran plot indicates (see Figure 1-11), and only a turn or so of left-handed alpha helix has ever been observed in the structure of a real protein. There appears to be no practical limit to the length of an alpha helix; helices hundreds of Ångstroms long have been observed, such as in the keratin fibers that make up human hair. There are variants of the alpha helix with slightly different helical parameters (Figure 1-14), but they are much less common and are not very long because they are slightly less stable.

Definitions
… the positive end of the dipole is at the beginning (amino terminus) of the helix; the negative end is at the carboxyl terminus of the helical rod.
lipid bilayer: the structure of cellular membranes, formed when two sheets of lipid molecules pack against each other with their hydrophobic tails forming the interior of the sandwich and their polar headgroups covering the outside.

[Figure (a) to (c): the alpha helix, with labels for the 1.5-Å rise and 100° rotation per residue, 5 Å, the backbone H bond to residue n+4, the partial charges δ+ and δ–, and side chains R1 to R9.]

[Figure 2 from BMC Microbiology (2001) 1:1, caption cropped in the source: Concentric helical wheel projections … for 18 amino acid segments of the proposed helices of Lassa and … (… bank U31033). Rectangles indicate identical or highly similar residues … charge exclusion (hydrophobic) subtending an angle of 160°.]
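The helix dimensions quoted on this slide follow from two numbers, the 1.5 Å rise and the 100° rotation per residue. The sketch below simply redoes that arithmetic; the constant and function names are illustrative choices, not part of the slides.

```python
import math

RISE_PER_RESIDUE = 1.5        # Å along the helix axis, from the slide
ROTATION_PER_RESIDUE = 100.0  # degrees about the helix axis, from the slide

def helix_length(n_residues):
    """Length in Å of an ideal alpha helix with n_residues residues."""
    return n_residues * RISE_PER_RESIDUE

def residues_to_span(distance):
    """Smallest whole number of residues whose helix spans `distance` Å."""
    return math.ceil(distance / RISE_PER_RESIDUE)

residues_per_turn = 360.0 / ROTATION_PER_RESIDUE
pitch = residues_per_turn * RISE_PER_RESIDUE  # axial advance per full turn, Å

print(helix_length(20), residues_to_span(30.0), residues_per_turn, pitch)
```

The result of 3.6 residues per turn agrees with the alpha-helix row of the helical parameter table on slide 32.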
  40. protein structure

[Figure: helical wheel projection. Residues 1 to 18 are placed around the helix axis at successive steps of 100 degrees.]

Sequence: MLQSMVSLLQSLVSLIIQ
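A helical wheel places residue i at an angle of (i – 1) × 100° around the helix axis, so the positions for the 18-residue sequence on this slide can be tabulated directly. This is a minimal sketch: the function name `helical_wheel` and the set of one-letter codes treated as hydrophobic are assumptions for illustration, not a classification taken from the slides.

```python
# Assumed hydrophobic set, for illustration only.
HYDROPHOBIC = set("AVLIMFWC")

def helical_wheel(sequence, step_deg=100.0):
    """Return (position, residue, angle in degrees) for each residue."""
    return [(i + 1, aa, (i * step_deg) % 360.0) for i, aa in enumerate(sequence)]

wheel = helical_wheel("MLQSMVSLLQSLVSLIIQ")

# List residues in order of their angle around the wheel.
for position, aa, angle in sorted(wheel, key=lambda item: item[2]):
    kind = "hydrophobic" if aa in HYDROPHOBIC else "polar"
    print(f"{angle:5.0f}°  {aa}{position:<3d} {kind}")
```

Sorting by angle makes it easy to see whether hydrophobic residues cluster on one face of the helix, which is what a wheel drawing is usually read for.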
  41. protein structure: Tertiary structure
Four interactions stabilize the tertiary structure of a protein: (a) ionic bonding, (b) hydrogen bonding, (c) disulfide linkages, and (d) dispersion forces.
  42. protein structure
  43. protonation state of amino acid

[Figure: structural formulas of four amino acids, each drawn in two side-chain forms: aspartate (–CH2–COOH and –CH2–COO–), glutamate (–CH2–CH2–COOH and –CH2–CH2–COO–), lysine (–(CH2)4–NH2 and –(CH2)4–NH3+), and histidine (two forms of the imidazole ring).]
  44. protonation state of amino acid

HA ⇌ H+ + A–
HA is the protonated form; A– is the unprotonated form (conjugate base).

Henderson-Hasselbalch equation: $\mathrm{pH} = \mathrm{p}K_a + \log_{10}\dfrac{[\mathrm{A^-}]}{[\mathrm{HA}]}$
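For reference, the Henderson-Hasselbalch equation follows from the definition of the acid dissociation constant for the equilibrium on this slide, and it can be rearranged to give the fraction of a group that is protonated at a given pH. The symbol $f_{\mathrm{HA}}$ is introduced here for convenience and does not appear on the slides.

```latex
\begin{align}
K_a &= \frac{[\mathrm{H^+}][\mathrm{A^-}]}{[\mathrm{HA}]}
  && \text{definition for } \mathrm{HA} \rightleftharpoons \mathrm{H^+} + \mathrm{A^-} \\
-\log_{10} K_a &= -\log_{10}[\mathrm{H^+}] - \log_{10}\frac{[\mathrm{A^-}]}{[\mathrm{HA}]}
  && \text{take } -\log_{10} \text{ of both sides} \\
\mathrm{pH} &= \mathrm{p}K_a + \log_{10}\frac{[\mathrm{A^-}]}{[\mathrm{HA}]}
  && \text{Henderson-Hasselbalch} \\
f_{\mathrm{HA}} &= \frac{[\mathrm{HA}]}{[\mathrm{HA}] + [\mathrm{A^-}]}
  = \frac{1}{1 + 10^{\,\mathrm{pH} - \mathrm{p}K_a}}
  && \text{fraction protonated}
\end{align}
```

When pH equals pKa the group is half protonated; one pH unit below the pKa it is about 91% protonated, and one unit above it is about 9% protonated.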
  45. protonation state of amino acid

TABLE 3–1 Properties and Conventions Associated with the Common Amino Acids Found in Proteins

Columns: amino acid; abbreviation/symbol; Mr*; pK1 (–COOH); pK2 (–NH3+); pKR (R group); pI; hydropathy index†; occurrence in proteins (%)‡

Nonpolar, aliphatic R groups
Glycine         Gly G   75    2.34   9.60    -       5.97    –0.4   7.2
Alanine         Ala A   89    2.34   9.69    -       6.01    1.8    7.8
Proline         Pro P   115   1.99   10.96   -       6.48    –1.6   5.2
Valine          Val V   117   2.32   9.62    -       5.97    4.2    6.6
Leucine         Leu L   131   2.36   9.60    -       5.98    3.8    9.1
Isoleucine      Ile I   131   2.36   9.68    -       6.02    4.5    5.3
Methionine      Met M   149   2.28   9.21    -       5.74    1.9    2.3
Aromatic R groups
Phenylalanine   Phe F   165   1.83   9.13    -       5.48    2.8    3.9
Tyrosine        Tyr Y   181   2.20   9.11    10.07   5.66    –1.3   3.2
Tryptophan      Trp W   204   2.38   9.39    -       5.89    –0.9   1.4
Polar, uncharged R groups
Serine          Ser S   105   2.21   9.15    -       5.68    –0.8   6.8
Threonine       Thr T   119   2.11   9.62    -       5.87    –0.7   5.9
Cysteine¶       Cys C   121   1.96   10.28   8.18    5.07    2.5    1.9
Asparagine      Asn N   132   2.02   8.80    -       5.41    –3.5   4.3
Glutamine       Gln Q   146   2.17   9.13    -       5.65    –3.5   4.2
Positively charged R groups
Lysine          Lys K   146   2.18   8.95    10.53   9.74    –3.9   5.9
Histidine       His H   155   1.82   9.17    6.00    7.59    –3.2   2.3
Arginine        Arg R   174   2.17   9.04    12.48   10.76   –4.5   5.1
Negatively charged R groups
Aspartate       Asp D   133   1.88   9.60    3.65    2.77    –3.5   5.3
Glutamate       Glu E   147   2.19   9.67    4.25    3.22    –3.5   6.3

* Mr values reflect the structures as shown in Figure 3–5. The elements of water (Mr 18) are deleted when the amino acid is incorporated into a polypeptide.
† A scale combining hydrophobicity and hydrophilicity of R groups. The values reflect the free energy (ΔG) of transfer of the amino acid side chain from a hydrophobic solvent to water. This transfer is favorable (ΔG < 0; negative value in the index) for charged or polar amino acid side chains, and unfavorable (ΔG > 0; positive value in the index) for amino acids with nonpolar or more hydrophobic side chains. See Chapter 11. From Kyte, J. & Doolittle, R.F. (1982) A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105–132.
‡ Average occurrence in more than 1,150 proteins. From Doolittle, R.F. (1989) Redundancies in protein sequences. In Prediction of Protein Structure and the Principles of Protein Conformation (Fasman, G.D., ed.), pp. 599–623, Plenum Press, New York.
¶ Cysteine is generally classified as polar despite having a positive hydropathy index. This reflects the ability of the sulfhydryl group to act as a weak acid and to form a weak hydrogen bond with oxygen or nitrogen.

… chains of alanine, valine, leucine, and isoleucine tend to cluster together within proteins, stabilizing protein structure by means of hydrophobic interactions. Glycine has the simplest structure. Although it is most easily grouped with the nonpolar amino acids, its very small side chain makes no real contribution to hydrophobic interactions. Methionine, one of the two sulfur-containing amino acids, has a slightly nonpolar thioether group in its side chain. Proline has an aliphatic side …

Aromatic R Groups. Phenylalanine, tyrosine, and tryptophan, with their aromatic side chains, are relatively nonpolar (hydrophobic). All can participate in hydrophobic interactions. The hydroxyl group of tyrosine can form hydrogen bonds, and it is an important functional group in some enzymes. Tyrosine and tryptophan are significantly more polar than phenylalanine, because of the tyrosine hydroxyl group and the nitrogen of the tryptophan indole ring.

[Figure 3–5: structural formulas of the 20 amino acids, grouped as nonpolar aliphatic (glycine, alanine, proline, valine, leucine, isoleucine, methionine), aromatic (phenylalanine, tyrosine, tryptophan), polar uncharged (serine, threonine, cysteine, asparagine, glutamine), positively charged (lysine, arginine, histidine) and negatively charged (aspartate, glutamate).]

FIGURE 3–5 The 20 common amino acids of proteins. The structural formulas show the state of ionization that would predominate at pH 7.0. The unshaded portions are those common to all the amino acids; the shaded portions are the R groups. Although the R group of histidine is shown uncharged, its pKa (see Table 3–1) is such that a small but significant fraction of these groups are positively charged at pH 7.0. The protonated form of histidine is shown above the graph in Figure 3–12b.
  46. protonation state of amino acid: Lysine

[Figure: structural formula of lysine, with side chain –(CH2)4–NH2.]

From Table 3–1: lysine has pK1 (–COOH) = 2.18, pK2 (–NH3+) = 8.95, pKR (R group) = 10.53 and pI = 9.74.

Charge at pH 4, 7, 9?
pH < pKa = protonated state
pH > pKa = deprotonated state
  47. protonation state of amino acid: Cysteine

[Figure: structural formula of cysteine, with side chain –CH2–SH.]

From Table 3–1: cysteine has pK1 (–COOH) = 1.96, pK2 (–NH3+) = 10.28, pKR (R group) = 8.18 and pI = 5.07.

Charge at pH 4, 7, 9?
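The "Charge at pH 4, 7, 9?" exercises for lysine and cysteine can be checked numerically. The sketch below is one possible way to do it, using the pKa values from Table 3–1 for the free amino acids; the function names and the representation of each ionizable group as a (pKa, charge when protonated) pair are choices made for this example. It gives both the whole-number answer from the slide's rule (protonated below the pKa, deprotonated above it) and the fractional average charge from the Henderson-Hasselbalch equation.

```python
def fraction_protonated(ph, pka):
    """Fraction of a group in its protonated form (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

def net_charge(ph, groups):
    """Average net charge; groups is a list of (pKa, charge_when_protonated)."""
    total = 0.0
    for pka, charge_protonated in groups:
        f = fraction_protonated(ph, pka)
        # Losing a proton lowers the charge of the group by one unit.
        total += f * charge_protonated + (1.0 - f) * (charge_protonated - 1)
    return total

def rule_of_thumb_charge(ph, groups):
    """Whole-number charge: protonated if pH < pKa, else deprotonated."""
    return sum(c if ph < pka else c - 1 for pka, c in groups)

# (pKa, charge when protonated): alpha-carboxyl, alpha-amino, side chain.
LYSINE = [(2.18, 0), (8.95, +1), (10.53, +1)]
CYSTEINE = [(1.96, 0), (10.28, +1), (8.18, 0)]

for name, groups in (("Lysine", LYSINE), ("Cysteine", CYSTEINE)):
    for ph in (4, 7, 9):
        print(f"{name:8s} pH {ph}: rule {rule_of_thumb_charge(ph, groups):+d}, "
              f"average {net_charge(ph, groups):+.2f}")
```

The two answers differ most when the pH lies close to a pKa, as for lysine at pH 9, where the α-amino group (pKa 8.95) is roughly half protonated. In a protein the α-carboxyl and α-amino groups are tied up in peptide bonds, so only the side chain and the chain termini would be counted.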
  48. Bioinformatics 01554571: Molecular Docking and Virtual Screening
 Orathai Sawatdichaikul, Ph.D.
 Functional Food Unit, Department of Nutrition and Health, Institute of Food Research and Product Development, Kasetsart University
 Email: ifrots@ku.ac.th
  49. Molecular docking
 • Docking is used to find the binding modes of a protein with ligands or inhibitors
 • In molecular docking, we attempt to predict the structure of the intermolecular complex formed between two or more molecules
 • Docking algorithms are able to generate a large number of possible structures
 • We use a force-field-based strategy to carry out docking
  50. What can molecular docking do?
 • Find potential drugs
 • Find the active site of an enzyme
 • Find a potential inhibitor binding site
 • Find the conformation of ligands in the bound state
 What can molecular docking not do?
 • Predict the change of conformation upon binding
 • Only the side chains can be made flexible; the backbone conformation cannot change
  51. 51. Types of docking studies
• Protein-protein docking
  – Both molecules are usually considered rigid
  – First apply steric constraints to limit the search space, then examine the energetics of possible binding conformations
• Protein-ligand docking
  – Flexible ligand, rigid receptor
  – Search space is much larger
  – Either reduce the flexible ligand to rigid fragments connected by one or several hinges, or search the conformational space using Monte Carlo methods or molecular dynamics
  52. 52. Why ligand-protein docking?
• Molecular recognition is a central phenomenon in biology
  – Enzymes ↔ Substrates
  – Receptors ↔ Signal-inducing ligands
  – Antibodies ↔ Antigens
• Classifying docking problems in biology
  – Protein-ligand docking (rigid-body docking, flexible docking)
  – Protein-protein docking
  – Protein-DNA docking
  – DNA-ligand docking
• Ligand-protein docking
  – Proteins ↔ Drugs
  – Proteins ↔ Natural small-molecule substrates
  53. 53. The molecular docking problem
• Given two molecules with 3D conformations in atomic detail:
  – Do the molecules bind to each other? If yes:
  – What does the molecule-molecule complex look like?
  – How strong is the binding affinity?
• Structures of protein-ligand complexes
  – X-ray
  – NMR
• Importance of the protein 3D structure
  – Resolution < 2.5 Å
  – Homology modeling is problematic
  54. 54. Basic principles
• The association of molecules is based on interactions
  – H-bonds, salt bridges, hydrophobic contacts, electrostatics
  – Very strong repulsive (van der Waals) interactions at short distances
• Association interactions are weak and short-ranged
• Strong binding implies surface complementarity
• Most molecules are flexible
  55. 55. Steps of molecular docking
• Three steps:
  (1) Definition of the structure of the target molecule
  (2) Location of the binding site
  (3) Determination of the binding mode
  56. 56. Details of search: level of detail
• Atom types
• Terms of the force field
  – Bond stretching
  – Bond-angle bending
  – Torsional potentials
  – Polarizability terms
  – Implicit solvation
  57. 57. Kinds of search
• Several algorithms can be used to do the docking
  – Monte Carlo
  – Simulated annealing
  – Genetic algorithms
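To make the Monte Carlo and simulated annealing entries concrete, here is a minimal sketch of simulated annealing with the Metropolis acceptance rule. It is not how any particular docking program implements its search: the one-dimensional "pose", the quadratic score, the cooling schedule, and all parameter names are assumptions chosen for illustration.

```python
import math
import random


def metropolis_accept(delta_e, temperature, rng):
    """Always accept downhill moves; accept uphill moves with Boltzmann probability."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def simulated_annealing(score, x0, t_start=10.0, t_end=0.01, cooling=0.95,
                        steps_per_t=50, step_size=1.0, seed=0):
    """Minimize score(x) over a single coordinate x with a geometric cooling schedule."""
    rng = random.Random(seed)
    x, e = x0, score(x0)
    best_x, best_e = x, e
    t = t_start
    while t > t_end:
        for _ in range(steps_per_t):
            candidate = x + rng.uniform(-step_size, step_size)  # random trial move
            e_candidate = score(candidate)
            if metropolis_accept(e_candidate - e, t, rng):
                x, e = candidate, e_candidate
                if e < best_e:
                    best_x, best_e = x, e
        t *= cooling  # lower the temperature
    return best_x, best_e


def toy_score(x):
    """Hypothetical score with a single minimum at x = 3."""
    return (x - 3.0) ** 2


best_x, best_e = simulated_annealing(toy_score, x0=-5.0)
print(f"best x = {best_x:.3f}, best score = {best_e:.5f}")
```

A plain Monte Carlo search corresponds to keeping the temperature fixed; annealing lowers it gradually so that early steps explore broadly and late steps refine.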
  58. 58. Genetic algorithm
• Initially, a population of conformations is generated
• A scoring algorithm evaluates the fitness of each conformation
  – conformation = chromosome
• Genetic operations occur
  – Crossing-over: fit members of the population cross over and replace the worst member of the population
  – Migration
  – Mutation
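The loop described on this slide can be sketched as follows. This is a toy illustration only: the chromosome is a list of torsion angles, the score is a made-up quadratic function rather than a docking score, migration is left out, and the parameter names and defaults are assumptions.

```python
import random


def toy_score(conformation):
    """Hypothetical fitness: lower is better, minimum when every torsion is 0 degrees."""
    return sum(angle ** 2 for angle in conformation)


def run_ga(n_torsions=4, pop_size=20, generations=100,
           crossover_rate=0.8, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    # Initial population of random conformations (chromosomes)
    population = [[rng.uniform(-180.0, 180.0) for _ in range(n_torsions)]
                  for _ in range(pop_size)]
    initial_best = min(toy_score(c) for c in population)
    for _ in range(generations):
        population.sort(key=toy_score)            # fittest first, worst last
        parent_a, parent_b = population[0], population[1]
        child = list(parent_a)
        if rng.random() < crossover_rate:
            cut = rng.randrange(1, n_torsions)    # one-point crossover
            child = parent_a[:cut] + parent_b[cut:]
        # Mutation: perturb each gene with a small probability
        child = [g + rng.gauss(0.0, 10.0) if rng.random() < mutation_rate else g
                 for g in child]
        population[-1] = child                    # child replaces the worst member
    best = min(population, key=toy_score)
    return initial_best, toy_score(best), best


initial_best, final_best, best = run_ga()
print(f"best score: {initial_best:.1f} -> {final_best:.1f}")
```

Because the child only ever replaces the worst member, the best conformation found so far is never lost, so the best score cannot get worse from one generation to the next.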
  59. 59. Search parameters
• Population size
• Crossover rate
• Mutation rate
• Local search
  – Energy evaluations
• Termination criteria
  – Energy evaluations
  – Generations
  60. 60. Scoring function
• We would like to have a function which, given a configuration of the protein and the ligand, returns a number representing the "goodness" or "energy" of the configuration.
• Desired properties:
  – (Ideally) lowest value when the ligand is naturally docked
  – Higher value everywhere else
  – Should be able to distinguish between correctly and incorrectly docked structures
  – Should be fast to compute
  61. 61. Thermodynamic cycle for the binding of enzyme and inhibitor
  62. 62. Scoring functions
• Shape and chemical complementarity scores
• Empirical scoring
• Force field scoring
• Knowledge-based scoring
• Consensus scoring
  63. 63. Shape and chemical complementarity scores
• Divide the accessible protein surface into zones:
  – Hydrophobic
  – Hydrogen-bond donating
  – Hydrogen-bond accepting
• Do the same for the ligand surface
• Find the ligand orientation with the best complementarity score
  64. 64. Docking accuracy
• Check by self-docking
  – Choose a PDB structure of the protein in complex with a ligand
  – Dock the ligand back into its own protein
  – Check the result by computing the RMSD between the docked pose and the real crystal conformation
• Many factors cause deviations
  – Water effects
  – Crystal effects
  – Temperature effects
  – Force field effects
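The RMSD check in self-docking can be sketched in a few lines. The example assumes both poses list the same atoms in the same order and sit in the same receptor frame, so no superposition is applied; the coordinates are invented, and the 2 Å cutoff mirrors the "RMSD <= 2" decision in the methodology overview of this deck.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally ordered lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical heavy-atom coordinates (angstroms) of a crystal pose and a docked pose
crystal = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0)]
docked = [(0.3, 0.1, 0.0), (1.7, 0.2, 0.1), (1.4, 1.9, 0.2)]

value = rmsd(crystal, docked)
print(f"RMSD = {value:.2f} A, self-docking accepted: {value <= 2.0}")
```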
  65. 65. Docking software
➢ Docking programs: DOCK (Kuntz et al. 1982); DOCK 4.0 (Ewing & Kuntz 1997); AutoDOCK (Goodsell & Olson 1990); AutoDOCK 3.0 (Morris et al. 1998); GOLD (Jones et al. 1997); FlexX (Rarey et al. 1996); GLIDE (Friesner et al. 2004); ADAM (Mizutani et al. 1994); CDOCKER (Wu et al. 2003); CombiDOCK (Sun et al. 1998); DIVALI (Clark & Ajay 1995); DockVision (Hart & Read 1992); FLOG (Miller et al. 1994); GEMDOCK (Yang & Chen 2004); Hammerhead (Welch et al. 1996); LIBDOCK (Diller & Merz 2001); MCDOCK (Liu & Wang 1999); PRO_LEADS (Baxter et al. 1998); SDOCKER (Wu et al. 2004); QXP (McMartin & Bohacek 1997); Validate (Head et al. 1996)
➢ De novo design tools: LUDI (Boehm 1992); BUILDER (Roe & Kuntz 1995); SMOG (DeWitte et al. 1997); CONCEPTS (Pearlman & Murcko 1996); DLD/MCSS (Stultz & Karplus 2000); Genstar (Rotstein & Murcko 1993); Group-Build (Rotstein & Murcko 1993); Grow (Moon & Howe 1991); HOOK (Eisen et al. 1994); Legend (Nishibata & Itai 1993); MCDNLG (Gehlhaar et al. 1995); SPROUT (Gillet et al. 1993)
  66. 66. Other docking programs
AutoDock
• AutoDock was designed to dock flexible ligands into receptor binding sites
• The strongest feature of AutoDock is the range of powerful optimization algorithms available
GOLD
• Genetic Optimization for Ligand Docking
• Uses a genetic algorithm
• Two scoring functions: GoldScore or ChemScore
  67. 67. Sialic acid-Hemagglutinin
  68. 68. DRUG DISCOVERY PROCESS (flowchart)
Target preparation: genomics; identify drug target; clone and express; purify drug target
Traditional route: random screening of corporate compounds, natural products or combinatorial libraries; LEAD compound; traditional medicinal chemistry and/or template-based combinatorial libraries; in vitro optimization; in vivo optimization
Structure-based drug design: X-ray / NMR; 3D coordinates of drug target; structure-function analysis of drug target; structure-based de novo design / 3D database searching; in vitro optimization; in vivo optimization
Outcome: potent compound; drug candidate; drug
  69. 69. DRUG DEVELOPMENT PROCESS (figure; source: http://digitalunion.osu.edu/r2/summer09/jaeger/images/DDevelopment.jpg)
Stages: PRECLINICAL; PHASE 0; PHASE 1; PHASE 2; PHASE 3; FDA APPROVAL; market introduction
Timeline labels: YEAR 1-4; YEAR 3-6; YEAR 5-9; YEAR 8-12
Compound and subject counts: >10,000-20,000 compounds; 5-20 compounds (animal trials); 2-5 compounds (~50 healthy subjects); 2 compounds (~250 patients); 1 compound (~3,000 patients)
Activities: molecular discovery and characterisation; molecular biology studies; in vitro screening; assess toxicity; determine safe dosage; evaluate route of administration; determine side effects; evaluate effectiveness of treatment; validate effectiveness of treatment
  70. 70. “DRUG-LIKENESS” AND COMPOUND FILTERS
• Which features of drug molecules confer biological activity?
• Substructure filters eliminate molecules known to have problems
• For a specific target, the filters may have to be modified or extended
• Analyze the values of simple properties (MW, logP, number of rotatable bonds)
  71. 71. Lipinski Rule of Five
• Poor absorption or permeation is more likely when:
  – MW > 500
  – LogP > 5
  – More than 5 H-bond donors (sum of OH and NH groups)
  – More than 10 H-bond acceptors (sum of N and O atoms)
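As a rough illustration, the four criteria can be coded as a simple filter over precomputed descriptors. The class name, field names, and example values below are hypothetical; in practice the descriptors would come from a cheminformatics toolkit rather than being typed in by hand.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed properties of one compound (illustrative field names)."""
    mw: float     # molecular weight
    logp: float   # octanol-water partition coefficient
    hbd: int      # H-bond donors (sum of OH and NH groups)
    hba: int      # H-bond acceptors (sum of N and O atoms)


def lipinski_violations(d):
    """Return the list of Rule-of-Five criteria that the compound violates."""
    violations = []
    if d.mw > 500:
        violations.append("MW > 500")
    if d.logp > 5:
        violations.append("LogP > 5")
    if d.hbd > 5:
        violations.append("more than 5 H-bond donors")
    if d.hba > 10:
        violations.append("more than 10 H-bond acceptors")
    return violations


# Hypothetical compounds, not real measurements
small = Descriptors(mw=350.0, logp=2.5, hbd=2, hba=5)
large = Descriptors(mw=650.0, logp=6.1, hbd=6, hba=12)
print(lipinski_violations(small))
print(lipinski_violations(large))
```

Returning the list of violated criteria, rather than a single pass/fail flag, makes it easy to relax the filter for a specific target, as the previous slide suggests may be necessary.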
  72. 72. Other findings
• 70% of drug-like molecules have:
  – Between 0 and 2 H-bond donors
  – Between 2 and 9 H-bond acceptors
  – Between 2 and 8 rotatable bonds
  – Between 1 and 4 rings
• Other techniques (neural networks, genetic algorithms, decision trees) consider more complex possibilities
  73. 73. PREDICTION OF ADMET PROPERTIES
• Requirements for a drug:
  – Must bind tightly to the biological target in vivo
  – Must pass through one or more physiological barriers (cell membrane or blood-brain barrier)
  – Must remain long enough to take effect
  – Must be removed from the body by metabolism, excretion, or other means
• ADMET: Absorption, Distribution, Metabolism, Excretion (Elimination), Toxicity
  74. 74. THE OVERVIEW METHODOLOGY (flowchart)
• X-ray crystal structure: validation of the protein structure
• Ligand structures: validation of the ligands, then optimization
• Molecular self-docking with the co-crystal ligand
• Decision: RMSD <= 2 (No / Yes); on Yes, molecular docking with ligands
• Analyses
  75. 75. MOLECULAR DOCKING Force field
The ligand and protein start in an unbound conformation. In the first step, the intramolecular energetics are estimated for the transition from these unbound states to the conformation of the ligand and protein in the bound state. The second step then evaluates the intermolecular energetics of combining the ligand and protein in their bound conformation.
The force field includes six pair-wise evaluations (V) and an estimate of the conformational entropy lost upon binding (ΔS_conf):
$$\Delta G = \left(V^{L-L}_{bound} - V^{L-L}_{unbound}\right) + \left(V^{P-P}_{bound} - V^{P-P}_{unbound}\right) + \left(V^{P-L}_{bound} - V^{P-L}_{unbound} + \Delta S_{conf}\right)$$
where L refers to the “ligand” and P refers to the “protein” in a ligand-protein docking calculation.
Each of the pair-wise energetic terms includes evaluations for dispersion/repulsion, hydrogen bonding, electrostatics, and desolvation:
$$V = W_{vdw}\sum_{i,j}\left(\frac{A_{ij}}{r_{ij}^{12}} - \frac{B_{ij}}{r_{ij}^{6}}\right) + W_{hbond}\sum_{i,j} E(t)\left(\frac{C_{ij}}{r_{ij}^{12}} - \frac{D_{ij}}{r_{ij}^{10}}\right) + W_{elec}\sum_{i,j}\frac{q_i q_j}{\varepsilon(r_{ij})\, r_{ij}} + W_{sol}\sum_{i,j}\left(S_i V_j + S_j V_i\right) e^{-r_{ij}^{2}/2\sigma^{2}}$$
The weighting constants W have been optimized to calibrate the empirical free energy based on a set of experimentally determined binding constants.
• The first term is a typical 6/12 potential for dispersion/repulsion interactions. The parameters are based on the Amber force field.
• The second term is a directional H-bond term based on a 10/12 potential. The parameters C and D are assigned to give a maximal well depth of 5 kcal/mol at 1.9 Å for hydrogen bonds with oxygen and nitrogen, and a well depth of 1 kcal/mol at 2.5 Å for hydrogen bonds with sulfur. The function E(t) provides directionality based on the angle t from ideal H-bonding geometry.
• The third term is a screened Coulomb potential for electrostatics.
• The final term is a desolvation potential based on the volume of atoms (V) that surround a given atom and shelter it from solvent, weighted by a solvation parameter (S) and an exponential term with distance-weighting factor σ = 3.5 Å.
For a detailed presentation of these functions, please see our published reports.
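To see how the parameters of the first two terms relate to a well depth and an equilibrium distance, the sketch below evaluates a single atom pair. It is a simplified illustration, not AutoDock's code: the weights W and the directional factor E(t) are set aside, and the coefficients are derived here by requiring the potential minimum to sit at the stated distance with the stated depth. The 12-6 example values (0.15 kcal/mol at 4.0 Å) are invented.

```python
def lj_12_6(r, well_depth, r_eq):
    """12-6 dispersion/repulsion: A/r^12 - B/r^6 with A = eps*r_eq^12, B = 2*eps*r_eq^6."""
    a = well_depth * r_eq ** 12
    b = 2.0 * well_depth * r_eq ** 6
    return a / r ** 12 - b / r ** 6


def hbond_12_10(r, well_depth, r_eq):
    """12-10 H-bond term: C/r^12 - D/r^10 with C = 5*eps*r_eq^12, D = 6*eps*r_eq^10."""
    c = 5.0 * well_depth * r_eq ** 12
    d = 6.0 * well_depth * r_eq ** 10
    return c / r ** 12 - d / r ** 10


# O/N hydrogen bond from the slide: well depth 5 kcal/mol at 1.9 angstroms
for r in (1.7, 1.9, 2.1, 2.5):
    print(f"r = {r:.1f} A  V_hbond = {hbond_12_10(r, 5.0, 1.9):+.3f} kcal/mol")

# Hypothetical dispersion/repulsion pair: 0.15 kcal/mol at 4.0 angstroms
print(f"V_vdw(4.0 A) = {lj_12_6(4.0, 0.15, 4.0):+.3f} kcal/mol")
```

With these coefficients both functions equal minus the well depth at r_eq and rise on either side, which matches the description of C and D being "assigned to give a maximal well depth" at a given distance.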
  76. 76. Molecular dynamics simulations (source: http://www.slideshare.net/qchemforqespresso/lecture5-46436732)
MD is a computer simulation technique that predicts the time evolution of a system of interacting particles (atoms, molecules, granules, etc.), using a force field for the calculation; the system uses thermal energy to move smoothly over the energy surface.
Simulations can provide the ultimate detail concerning individual particle motions as a function of time. For example, by what pathways does oxygen enter into and exit from the heme pocket in myoglobin?
The first protein simulations appeared in 1977 with the simulation of the bovine pancreatic trypsin inhibitor (BPTI) (McCammon, et al., 1977).
  77. 77. MD: force field
The force field is a collection of equations and associated constants designed to reproduce molecular geometry and selected properties of tested structures.
Classical dynamics (Newton's equations): properties of molecules follow directly from the underlying interactions between the molecules.
• Newton's equations of motion
• Position, speed and acceleration are functions of time: r_i(t), v_i(t), a_i(t)
• The force is related to the acceleration and, in turn, to the potential energy
• Integration of the equations of motion requires an initial structure, r_i(t=0), and an initial distribution of velocities, v_i(t=0)
$$F_i = m_i a_i = m_i \frac{dv_i}{dt} = m_i \frac{d^2 r_i}{dt^2}, \qquad F_i = -\nabla_i E$$
where F is the force on an atom, m is the mass of the atom, a is the atom's acceleration, R represents the coordinates of all atoms, and U is the potential energy function (written E in the equation above). Velocity (v) is the derivative of position (r), and acceleration (a) is the derivative of velocity (v).
The idea
• Mimic what atoms do in real life, assuming a given potential energy function
  – The energy function allows us to calculate the force experienced by any atom given the positions of the other atoms
  – Newton's laws tell us how those forces will affect the motions of the atoms
(Figure: energy U plotted against position.)
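A minimal sketch of integrating Newton's equations is given below. The slide does not name an integration scheme; velocity Verlet is used here as one common choice, and the one-dimensional harmonic "bond" with unit mass and unit force constant is an assumption made so that the result can be compared with the exact solution r(t) = cos(t).

```python
import math


def velocity_verlet(r, v, mass, force, dt, n_steps):
    """Integrate m * d2r/dt2 = F(r) in one dimension; returns a list of (r, v) pairs."""
    a = force(r) / mass
    trajectory = [(r, v)]
    for _ in range(n_steps):
        r = r + v * dt + 0.5 * a * dt * dt   # update position
        a_new = force(r) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # update velocity with averaged acceleration
        a = a_new
        trajectory.append((r, v))
    return trajectory


K = 1.0     # force constant of the toy harmonic potential U = 0.5 * K * r^2
MASS = 1.0


def harmonic_force(r):
    """F = -dU/dr for the harmonic potential."""
    return -K * r


def total_energy(r, v):
    return 0.5 * MASS * v * v + 0.5 * K * r * r


traj = velocity_verlet(r=1.0, v=0.0, mass=MASS, force=harmonic_force, dt=0.01, n_steps=1000)
r_end, v_end = traj[-1]
print(f"r(t=10) = {r_end:.4f}, exact = {math.cos(10.0):.4f}")
print(f"energy drift = {total_energy(r_end, v_end) - total_energy(1.0, 0.0):.2e}")
```

The initial structure and initial velocity play the roles of r_i(t=0) and v_i(t=0) from the slide; a real MD code does the same thing for every atom, with forces taken from the full force field.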
  78. 78. MD: force field
Molecular potential energy U: a simple sum over many terms (all bonds, all angles, all torsion angles, all nonbonded pairs, all partial charges). Names shown with the terms on the slide: Hooke 1635, Fourier 1768, Van der Waals 1837, Coulomb 1736.
• All bonds = oscillations about the equilibrium bond length
• All angles = oscillations of 3 atoms about an equilibrium angle
• All torsion angles = torsional rotation of 4 atoms about a central bond
• All nonbonded pairs = non-bonded energy terms (electrostatics and Lennard-Jones)
Source: Michael Levitt, "Birth and Future of Multiscale Modeling for Macromolecular Systems", Nobel Lecture slides.
  79. 79. Molecular docking approach
Procedure:
1. Preparing a protein
2. Preparing a ligand
3. Generating a grid parameter file
4. Generating maps and grid data files
5. Generating a docking parameter file
6. Running docking
  80. 80. Step 1: Preparing a protein
In this step, we create a PDBQT file of the protein (or receptor) that contains hydrogen atoms as well as partial charges called “Gasteiger charges”. Gasteiger charges are based on an empirical scheme that requires only knowledge of the topology of a molecule, which makes them popular for their simple and fast computation (Gasteiger & Marsili, 1980).
Input: protein.pdb
Output: protein.pdbqt
Command: >> python prepare_receptor4.py -r protein.pdb -o protein.pdbqt
  81. 81. Step 2: Preparing a ligand
This step is quite similar to the preparation of the receptor (step 1): we create a PDBQT file from a ligand molecule. AutoDock uses a scoring function based on the AMBER force field and estimates the free energy of binding of a ligand to its target.
Input: ligand.pdb
Output: ligand.pdbqt
Command: >> python prepare_ligand4.py -l ligand.pdb -A bonds_hydrogens -U lps -o ligand.pdbqt
  82. 82. Step 3: Generating a grid parameter file
Define the 3D space for docking, typically a volume around the potential binding site of the receptor. In this step, we create the input file for “AutoGrid4”, which will create the different “map” files and the grid data file.
Input: ligand.pdbqt, protein.pdbqt
Output: protein.gpf
Command: >> python prepare_gpf4.py -l ligand.pdbqt -r protein.pdbqt -i ref.gpf -o protein.gpf
Excerpt of protein.gpf:
    npts 60 60 60
    gridfld protein.maps.fld
    spacing 0.375
    receptor_types A C HD N NA OA SA
    ligand_types A C Cl F HD N NA OA
    receptor protein.pdbqt
    gridcenter 16.387 17.394 26.218
    smooth 0.5
    map protein.A.map
    map protein.C.map
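It can help to translate the npts, spacing, and gridcenter values into the physical box that will be searched. The small sketch below does this for the values in the excerpt, assuming AutoGrid's convention that npts counts grid intervals along each axis, so each edge spans npts × spacing. The function name is made up for this example.

```python
def grid_box(center, npts, spacing):
    """Return (min_corner, max_corner, edge_lengths) of the grid box in angstroms."""
    edges = tuple(n * spacing for n in npts)
    min_corner = tuple(c - e / 2.0 for c, e in zip(center, edges))
    max_corner = tuple(c + e / 2.0 for c, e in zip(center, edges))
    return min_corner, max_corner, edges


# Values taken from the protein.gpf excerpt
center = (16.387, 17.394, 26.218)
npts = (60, 60, 60)
spacing = 0.375

min_corner, max_corner, edges = grid_box(center, npts, spacing)
print("edge lengths (A):", edges)
print("min corner (A):", tuple(round(x, 3) for x in min_corner))
print("max corner (A):", tuple(round(x, 3) for x in max_corner))
```

For these settings the box is a 22.5 Å cube centred on the grid centre. Checking that the co-crystal ligand's coordinates fall inside the two corners is a quick way to catch a misplaced grid before running AutoGrid.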
  83. 83. Step 4: Generating maps and grid data files
We created the grid parameter file in the previous step; now we can use AutoGrid to generate the different map files and the main grid data file.
Input: protein.pdbqt, protein.gpf
Output: protein.*.map, protein.maps.fld, protein.d.map, protein.e.map
Command: >> autogrid4 -p protein.gpf -l protein.glg
  84. 84. Step 5: Generating a docking parameter file
The last step before we can run the actual docking is to prepare the docking parameter file, which bundles the information required by AutoDock.
Input: ligand.pdbqt, protein.pdbqt
Output: ligand.dpf
Command: >> python prepare_dpf4.py -l ligand.pdbqt -r protein.pdbqt -i ref.dpf -o ligand.dpf
Excerpt of ligand.dpf:
    set_ga
    sw_max_its 300
    sw_max_succ 4
    sw_max_fail 4
    sw_rho 1.0
    sw_lb_rho 0.01
    ls_search_freq 0.06
    set_psw1
    unbound_model extended
    ga_run 50
    analysis
  85. 85. Step 6: Running AutoDock
By now we should have created all the required files and be ready to dock the ligand and score the protein-ligand complex.
Input: ligand.dpf
Output: ligand.dlg
Command: >> autodock4 -p ligand.dpf -l ligand.dlg
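The six commands from steps 1 to 6 can be collected into one script. The sketch below only builds and prints the command lines, using the file names from the slides; it does not run anything. Actually executing them (for example with `subprocess.run(cmd, check=True)`) assumes that the AutoDockTools scripts, `autogrid4`, `autodock4`, and the reference files `ref.gpf` and `ref.dpf` are available in the working directory or on the path. The function name is hypothetical.

```python
import shlex


def docking_commands(protein="protein", ligand="ligand"):
    """Return the command lines of the six-step procedure, in order."""
    return [
        # 1. Prepare the receptor
        ["python", "prepare_receptor4.py", "-r", f"{protein}.pdb", "-o", f"{protein}.pdbqt"],
        # 2. Prepare the ligand
        ["python", "prepare_ligand4.py", "-l", f"{ligand}.pdb",
         "-A", "bonds_hydrogens", "-U", "lps", "-o", f"{ligand}.pdbqt"],
        # 3. Generate the grid parameter file
        ["python", "prepare_gpf4.py", "-l", f"{ligand}.pdbqt", "-r", f"{protein}.pdbqt",
         "-i", "ref.gpf", "-o", f"{protein}.gpf"],
        # 4. Generate maps and grid data files
        ["autogrid4", "-p", f"{protein}.gpf", "-l", f"{protein}.glg"],
        # 5. Generate the docking parameter file
        ["python", "prepare_dpf4.py", "-l", f"{ligand}.pdbqt", "-r", f"{protein}.pdbqt",
         "-i", "ref.dpf", "-o", f"{ligand}.dpf"],
        # 6. Run the docking
        ["autodock4", "-p", f"{ligand}.dpf", "-l", f"{ligand}.dlg"],
    ]


for step, cmd in enumerate(docking_commands(), start=1):
    print(f"step {step}: {shlex.join(cmd)}")
```

Keeping each command as a list of arguments avoids shell-quoting problems if a file name contains spaces, and makes it straightforward to loop the same procedure over many ligands in a virtual screening run.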
  86. 86. Problem PDB: 1M17 - EPIDERMAL GROWTH FACTOR RECEPTOR (EGFR) [LUNG CANCER]
  87. 87. The basic structure of EGFR family proteins (figure). Labels: extracellular segment; plasma membrane; cytoplasm; juxtamembrane segment (JM-A, JM-B); kinase domain (N-lobe, C-lobe, activation loop); C-terminal tail.
  88. 88. The EGFR heterodimer form (figure: Egfr and ErbB2). Stages: 1. Inactive receptor monomer; 2. Trans-autophosphorylation; 3. Active phosphorylated receptor dimer. Labels: EGF; ATP; extracellular domains I-IV; dimerization arms; plasma membrane; transmembrane helices; JM-A helical dimer; JM-B; N and C termini; tyrosyl phosphorylation sites (P).
  89. 89. The basic structure of the protein kinase domain of the EGFR protein (figure). Labels: N-lobe; C-lobe; active site; P-loop; catalytic loop; hinge loop; activation loop; strands β1-β5; helices αC-αI; SYR127063.
  90. 90. PHOSPHORYLATION (figure). Labels: protein tyrosine kinase (PTK); protein tyrosine phosphatase (PTP); ATP; substrate; phosphorylated substrate (P); signal in; signal out.
  91. 91. HER family dimerization and signaling (figure).
Input layer: growth factor ligands (EGF, TGF-α, AR/amphiregulin, EPR/epiregulin, EPG, HB-EGF, β-cellulin, NRG1-NRG4; grouped as HER1, HER3 and HER4 ligands; also LPA, thrombin, ET, cytokine) and transmembrane receptors (EGFR/HER1, HER2, HER3, HER4) at the plasma membrane.
Signal processing (cytoplasm): adaptors and enzymes (SRC, CBL, PLC, PKC, BAD, S6K, AKT, PI3K, SHP2, GRB7, GRB2, GAP, SHC, SOS, NCK, VAV, CRK, JAK) and signaling cascades (RAS-GDP/RAS-GTP, RAF, MEK, MAPK, RAC, PAK, ABL, JNKK, JNK).
Nucleus: transcription factors (STAT, ERG1, MYC, SP1, FOS, JUN, ELK).
  92. 92. Inhibitor binding in the kinase pocket (figure; panels: ATP ANALOG, AFATINIB, LAPATINIB, VANDETANIB). Labels: ATP; hydrophobic pocket I; hydrophobic pocket II; allosteric site.
  93. 93. Problem
I would like to select candidate herbal compounds that can inhibit this protein before taking them to the wet lab. These compounds are extracted from natural products. Propose your own ideas for solving this problem; note that you may choose only one compound for molecular docking.
  94. 94. Herbal compounds
