Computational Protein Design
                         2. Computational Protein Design Techniques


                                           Pablo Carbonell
                           pablo.carbonell@issb.genopole.fr

                               iSSB, Institute of Systems and Synthetic Biology
                              Genopole, University d’Évry-Val d’Essonne, France



                                     mSSB: December 2010




Pablo Carbonell (iSSB)                    Computational Protein Design            mSSB: December 2010   1 / 45
Outline



1   Introduction

2   Computational Protein Descriptors

3   Sequence-based CPD

4   Structure-based CPD

5   Search Algorithms in CPD

6   De Novo Design

7   Challenges in Sequence and Structure-Based CPD




Computational Protein Design




A Blueprint of CPD Approaches




∗ RS : research studies
Molecular Signature Descriptors



   A 2D representation of the molecular graph as an undirected colored graph G(V, E, C),
   with V: atoms, E: bonds, C: atom types
   The signature descriptor of height h of atom x in the molecular graph G, written ^hσ(x),
   is a canonical representation of the subgraph of G containing all atoms within
   distance h of x
   Summing the atomic signatures over all atoms gives the signature of the molecule:

       \[ {}^{h}\sigma(G) = \sum_{x \in V} {}^{h}\sigma(x) \tag{1} \]

   The signature is a systematic codification of the molecular graph [Faulon et al., 2004]


                                            σ(methylcyclopropane) =
                                            1   [C]([H][C]([H][H][C,0])[C,0]([H][H])[C]([H][H][H]))
                                            2   [C]([H][H][C]([H][C,0][C]([H][H][H]))[C,0]([H][H]))
                                            1   [C]([H][H][H][C]([H][C]([H][H][C,0])[C,0]([H][H])))
                                            1   [H]([C]([C]([H][H][C,0])[C,0]([H][H])[C]([H][H][H])))
                                            4   [H]([C]([H][C]([H][C,0][C]([H][H][H]))[C,0]([H][H])))
                                            3   [H]([C]([H][H][C]([H][C]([H][H][C,0])[C,0]([H][H]))))




Molecular Signature of Reactions and Proteins


   Signature of a reaction. The signature of reaction R

       \[ S_1 + S_2 + \dots + S_n \rightarrow P_1 + P_2 + \dots + P_m \tag{2} \]

   that transforms n substrates into m products is given by the difference between the
   signature of the products and the signature of the substrates:

       \[ {}^{h}\sigma(R) = \sum_{p \in P} {}^{h}\sigma(p) - \sum_{s \in S} {}^{h}\sigma(s) \tag{3} \]
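
   The difference of signatures can be sketched in a few lines. This is only an illustration: molecular signatures are assumed to be given as multisets (Counter objects) of atomic signature strings, and the function name reaction_signature is hypothetical. The example uses hand-written height-1 signatures for H2 + Cl2 → 2 HCl.

```python
from collections import Counter

def reaction_signature(substrates, products):
    """Signature of a reaction: sum over products minus sum over substrates.

    Each molecule is a Counter mapping an atomic signature string to its
    number of occurrences (assumed input format for this sketch).
    """
    total = Counter()
    for sig in products:
        total.update(sig)      # add product signatures
    for sig in substrates:
        total.subtract(sig)    # subtract substrate signatures (may go negative)
    # Keep only atomic signatures whose net count is non-zero.
    return {key: count for key, count in total.items() if count != 0}

# Height-1 atomic signatures written by hand for three small molecules.
h2 = Counter({"[H]([H])": 2})
cl2 = Counter({"[Cl]([Cl])": 2})
hcl = Counter({"[H]([Cl])": 1, "[Cl]([H])": 1})

# H2 + Cl2 -> 2 HCl
sigma_r = reaction_signature([h2, cl2], [hcl, hcl])
```

   Positive entries are atomic environments created by the reaction, negative entries are environments destroyed by it.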

   Signature of protein sequences. The protein P is represented by the linear
   chain given by its collapsed graph at residue level, a reduced molecular graph
   representation G(V, E, C) known as string signature, where V: residues a ∈ A,
   E: contiguity in sequence, C: amino acid type

       \[ {}^{h}\sigma(P) = \sum_{a \in A} {}^{h}\sigma(a) \tag{4} \]
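
   A minimal sketch of a string signature, under stated assumptions: each residue signature is taken as the residue type plus its two sequence branches of length up to h, and the two branches are sorted so that the result does not depend on reading direction (the chain is treated as undirected, as in the graph definition). The canonical form used in the original work may differ; string_signature is a hypothetical name.

```python
from collections import Counter

def string_signature(sequence, h):
    """Height-h signature of a protein sequence as a multiset of residue signatures."""
    sig = Counter()
    for i, root in enumerate(sequence):
        # Neighbours read outward from residue i, up to h steps on each side.
        left = sequence[max(0, i - h):i][::-1]
        right = sequence[i + 1:i + 1 + h]
        # Sort the two branches: assumed canonical form for an undirected chain.
        branches = tuple(sorted((left, right)))
        sig[(root, branches)] += 1
    return sig

sig = string_signature("ACA", h=1)
```

   Both terminal alanines share the same residue signature here, so the multiset has two distinct entries for three residues.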




Protein Contact Maps




   The protein contact map is a graph
   representation of the 3D interactions
   at residue level G(V , E, C) where V :
   residues, E : contacts, C : amino acid
   type
   Two residues are considered to
   interact when any pair of atoms from
   the two residues lies closer than a
   predetermined threshold (typically
   4.5 to 5 Å)
   Contact maps can account for
   long-range interactions and
   conformational states

                                                 Song et al. [2010]
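
   The contact definition above can be sketched as follows. The input format (a list of residues, each a list of atom coordinates in Å) and the name contact_map are assumptions made for the example; a real structure would be read from a PDB file.

```python
import math
from itertools import combinations

def contact_map(residues, cutoff=4.5):
    """Return the set of residue pairs (i, j), i < j, with any atom pair closer than cutoff."""
    contacts = set()
    for i, j in combinations(range(len(residues)), 2):
        if any(math.dist(a, b) < cutoff
               for a in residues[i] for b in residues[j]):
            contacts.add((i, j))
    return contacts

# Toy structure: four residues with a single atom each (coordinates in Å).
toy = [[(0.0, 0.0, 0.0)], [(3.8, 0.0, 0.0)], [(7.6, 0.0, 0.0)], [(3.8, 3.0, 0.0)]]
edges = contact_map(toy)
```

   The returned pairs are the edges E of the contact graph G(V, E, C). Pairs that are adjacent in sequence are always in contact, so they are often filtered out when long-range interactions are of interest.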




Sequence and Structure-Based CPD




   Sequence-based CPD methods are in some cases a good trade-off between
   complexity of the model and accuracy of the predictions



Sequence-based Knowledge-based potentials



   The simplest way to score a protein and to identify active regions is through amino
   acid scales or indexes
   AAindex is a database of
        544 amino acid indexes
        94 amino acid mutation matrices
        47 amino acid pair-wise contact potentials

  Examples: hydrophobicity,
  accessibility, van der Waals volume,
  secondary structure propensity,
  flexibility
  This approach is widely used when
  analyzing conserved motifs and
  correlated mutations in protein fold
  families through multiple alignments
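
  As an illustration of scoring with an amino acid scale, the sketch below averages the Kyte-Doolittle hydropathy index over a sliding window. The window length and the function name window_profile are choices made for the example, not part of the slides.

```python
# Kyte-Doolittle hydropathy index (one of the hydrophobicity scales in AAindex).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def window_profile(sequence, scale, window=5):
    """Average a per-residue scale over a centred sliding window (odd length)."""
    half = window // 2
    return [
        sum(scale[aa] for aa in sequence[i - half:i + half + 1]) / window
        for i in range(half, len(sequence) - half)
    ]

profile = window_profile("AILVGDEKR", KYTE_DOOLITTLE, window=5)
```

  Stretches with a high average mark candidate buried or membrane-spanning regions, while low values suggest solvent-exposed segments.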




Quantitative Structure-Activity Relationship (QSAR) Techniques

   QSAR is a statistical method used extensively by the chemical and
   pharmaceutical industries in small-molecule and peptide optimization
   The goal is to model causal relationships between
        structures of interacting molecules
        measurable properties of scientific or commercial interest, such as
        ADME/Tox (absorption, distribution, metabolism, excretion, and toxicity) of drugs




QSAR Model Evaluation




   Model predictability is generally evaluated through the leave-one-out (LOO)
   cross-validation correlation coefficient q²
   Partial least-squares (PLS) regression is commonly used
   Additional nonlinear terms can be added through the use of nonlinear regression
   or machine learning techniques (kernel methods, random forests, etc.)
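
   A minimal sketch of the leave-one-out q², computed as 1 − PRESS / SS_tot. For brevity it uses ordinary least squares on a single descriptor instead of PLS, and the data values are made up for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept (stand-in for PLS)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def loo_q2(xs, ys):
    """Leave-one-out cross-validated q2 = 1 - PRESS / SS_tot."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without sample i, then predict sample i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot

descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]      # hypothetical descriptor values
activity = [2.1, 3.9, 6.2, 7.8, 10.1]       # hypothetical measured activities
q2 = loo_q2(descriptor, activity)
```

   Unlike the fitted r², q² is computed only from predictions on held-out samples, so it penalizes overfitted models.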




QSAR Modeling Workflow




The ProSAR Algorithm




   An extension of SAR-based approaches to CPD
   It formalizes the decision-making processes about which mutations to include in
   combinatorial libraries

       \[ y = \sum_{i=1}^{N} \sum_{j \in A} c_{ij} x_{ij} \tag{5} \]

        y: the predicted function (activity) of the protein sequence
        c_ij: the regression coefficients corresponding to the mutational effect of having residue
        j among the 20 amino acids A at position i
        x_ij: binary variable indicating the presence or absence of residue j at position i
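
   The additive model of equation (5) can be sketched as below. The coefficient values are invented for the example; in practice they would be fitted by regression on measured variants. The names one_hot and predict_activity are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(sequence):
    """Binary variables x_ij: 1 if residue j is present at position i, else 0."""
    return {(i, aa): int(sequence[i] == aa)
            for i in range(len(sequence)) for aa in AMINO_ACIDS}

def predict_activity(sequence, coefficients):
    """y = sum_i sum_j c_ij * x_ij; coefficients missing from the dict count as 0."""
    x = one_hot(sequence)
    return sum(coefficients.get(key, 0.0) * value for key, value in x.items())

# Hypothetical regression coefficients for a two-position library.
c = {(0, "A"): 0.5, (1, "L"): -0.2, (1, "V"): 0.3}
y_av = predict_activity("AV", c)
y_gl = predict_activity("GL", c)
```

   Mutations with large positive coefficients are candidates to keep in the next library, those with negative coefficients are candidates to drop.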




Improving Catalytic Function by ProSAR-driven Enzyme Evolution




   Statistical analysis of protein sequence-activity relationships
   Application: bacterial biocatalysis of atorvastatin (Lipitor), a cholesterol-lowering drug (Codexis Inc.)


Structure-based CPD




  Energy functions and molecular force fields
  Local conformational restrictions
  Predicting entropic factors
  Protein topological properties




                                                                  From Narasimhan et al. [2010]




Energy Functions and Molecular Force Fields




   In structure-based CPD, folds are usually
   represented by the spatial coordinates of the
   backbone atoms or design scaffold
   Protein design is done by placing amino acid
   side chains along the scaffold

   Side chains are only permitted to assume a
   discrete set of statistically preferred
   conformations: rotamers
   Rotamer/backbone and rotamer/rotamer
   interaction energies are tabulated
   These potential energies can then be
   approximated by using any of the standard
   force fields : CHARMM, AMBER, GROMOS



Molecular Force Fields

AMBER: a classical force field for energy and MD calculations:

    \[
    V(r^N) = \sum_{\text{bonds}} \tfrac{1}{2} k_b (l - l_0)^2
           + \sum_{\text{angles}} \tfrac{1}{2} k_a (\theta - \theta_0)^2
           + \sum_{\text{torsions}} \tfrac{1}{2} V_n \left[ 1 + \cos(n\omega - \gamma) \right]
           + \sum_{j=1}^{N-1} \sum_{i=j+1}^{N} \left\{ \epsilon_{i,j} \left[ \left( \frac{r_{0ij}}{r_{ij}} \right)^{12} - 2 \left( \frac{r_{0ij}}{r_{ij}} \right)^{6} \right] + \frac{q_i q_j}{4\pi\epsilon_0 r_{ij}} \right\}
    \tag{6}
    \]

 1   Σ_bonds(·): energy between covalently bonded atoms.
 2   Σ_angles(·): energy due to the geometry of electron orbitals involved in covalent
     bonding.
 3   Σ_torsions(·): energy for twisting a bond due to bond order (e.g. double bonds) and
     neighboring bonds or lone pairs of electrons.
 4   Σ_{j=1}^{N−1} Σ_{i=j+1}^{N}(·): non-bonded energy between all atom pairs:
         1      van der Waals energies
         2      Electrostatic energies
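
The individual terms of equation (6) can be written as small functions. This is a sketch, not AMBER code: the function names are hypothetical, angles are in radians, and the Coulomb constant of about 332.06 kcal·Å/(mol·e²) is an assumed unit convention standing in for 1/(4πε0).

```python
import math

COULOMB_K = 332.0637  # assumed value of 1/(4*pi*eps0) in kcal*Å/(mol*e^2)

def bond_energy(k_b, l, l0):
    """Harmonic bond stretching: 1/2 * k_b * (l - l0)^2."""
    return 0.5 * k_b * (l - l0) ** 2

def angle_energy(k_a, theta, theta0):
    """Harmonic angle bending: 1/2 * k_a * (theta - theta0)^2."""
    return 0.5 * k_a * (theta - theta0) ** 2

def torsion_energy(v_n, n, omega, gamma):
    """Periodic torsion: 1/2 * V_n * [1 + cos(n*omega - gamma)]."""
    return 0.5 * v_n * (1.0 + math.cos(n * omega - gamma))

def van_der_waals(eps, r0, r):
    """12-6 term: eps * [(r0/r)^12 - 2 * (r0/r)^6], with minimum -eps at r = r0."""
    ratio6 = (r0 / r) ** 6
    return eps * (ratio6 ** 2 - 2.0 * ratio6)

def coulomb(q_i, q_j, r):
    """Electrostatic term: q_i * q_j / (4*pi*eps0 * r)."""
    return COULOMB_K * q_i * q_j / r

def nonbonded_energy(coords, charges, eps, r0):
    """Double sum over all atom pairs j < i, with one eps and r0 for every pair (simplification)."""
    total = 0.0
    for j in range(len(coords) - 1):
        for i in range(j + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            total += van_der_waals(eps, r0, r) + coulomb(charges[i], charges[j], r)
    return total
```

In this form of the 12-6 term, r0 is the distance of the energy minimum and eps its depth, which is why the factor 2 appears in front of the sixth-power term.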



Structure-based Knowledge-based Potentials

       They are built by performing a large-scale statistical study of structural databases
       such as PDB (Protein Data Bank)
               Rotamer libraries (∼ 150 rotameric states)
               Binary patterning: only some type of amino acids are allowed based on the
               hydrophobic environment
               An implicit solvation model
               Secondary structure propensity
               Frequency of small segments in the PDB
               Pairwise potentials
               van der Waals interactions
               Hydrogen bonding
               Electrostatics
               Entropy-based penalties for flexible side-chains




From Boas and Harbury [2007]

Energy Functions



   Design along the backbone or scaffold
   Rotamer/backbone and rotamer/rotamer interaction energies are tabulated
   Precomputed from molecular force fields: CHARMM, AMBER, GROMOS

Total energy of the protein

       \[ E_{TOT} = \sum_{k} E_k(r_k) + \sum_{k \neq l} E_{kl}(r_k, r_l) \tag{7} \]

   N: length of the protein
   r_k: the rotamer of the kth side chain
   E_k(r_k): the self-energy of a particular rotamer r_k
   E_kl(r_k, r_l): the pair energy of rotamers r_k, r_l
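
   Equation (7) maps directly onto a lookup in precomputed tables. The sketch below uses made-up energies for three positions with two rotamers each. As an assumption of this sketch, each unordered pair of positions is counted once (k < l); a literal sum over ordered pairs k ≠ l would count each pair twice.

```python
def total_energy(rotamers, self_e, pair_e):
    """E_TOT = sum_k E_k(r_k) + sum_{k<l} E_kl(r_k, r_l) from tabulated energies."""
    n = len(rotamers)
    energy = sum(self_e[k][rotamers[k]] for k in range(n))
    for k in range(n):
        for l in range(k + 1, n):
            energy += pair_e[(k, l)][rotamers[k]][rotamers[l]]
    return energy

# Hypothetical tables: self_e[k][r] and pair_e[(k, l)][r_k][r_l].
self_e = [[0.0, 5.0], [0.5, 0.0], [0.2, 0.3]]
pair_e = {
    (0, 1): [[2.0, -1.0], [0.0, 0.5]],
    (0, 2): [[0.0, 0.4], [0.3, 0.0]],
    (1, 2): [[0.1, 0.0], [-0.5, 0.6]],
}

e_010 = total_energy((0, 1, 0), self_e, pair_e)
e_000 = total_energy((0, 0, 0), self_e, pair_e)
```

   Because every term is a table lookup, evaluating one rotamer assignment costs O(N²) operations regardless of the force field used to fill the tables.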




The Role of Dynamics

   Besides protein structure, protein dynamics can play a direct role in molecular
   recognition
   Flexible proteins recognize their targets through induced fit or conformational
   selection, likely showing promiscuity
   Binding is commonly enthalpy-driven, but in some cases entropy is important, for
   instance:
          Proteins with multiple binding sites
          Small hydrophobic molecules
   Two sources of protein motion:
          Protein flexibility: intraconformational dynamics (fast time scale motions)
          Conformational heterogeneity: interconformational dynamics

  Gibbs free energy:

       \[ \Delta G = \Delta H - T \Delta S \tag{8} \]
       \[ \Delta S = \Delta S_{solv} + \Delta S_{conf} + \Delta S_{rt} \tag{9} \]

  ΔS_conf: conformational entropy of protein and ligand

  ΔS_rt: rotational and translational degrees of freedom
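
  A short worked example of equation (8), with hypothetical values chosen only to show the bookkeeping: an enthalpy-driven binding event that pays an entropic penalty.

```latex
% Hypothetical values: \Delta H = -10.0 kcal/mol, \Delta S = -0.020 kcal/(mol K), T = 298 K
\begin{align*}
T \Delta S &= 298 \times (-0.020) = -5.96 \ \text{kcal/mol} \\
\Delta G   &= \Delta H - T \Delta S = -10.0 - (-5.96) = -4.04 \ \text{kcal/mol}
\end{align*}
```

  Binding remains favorable (ΔG < 0), but more than half of the enthalpic gain is offset by the loss of entropy.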


Predicting Side-chain Dynamics from Structural Descriptors




   The Lipari-Szabo model-free approach quantifies motions from
   NMR experiments by computing the generalized order parameter S²
         Protein backbone dynamics: ¹⁵N-H and ¹³Cα-H NMR relaxation methods
         Protein side-chain methyl dynamics: ¹³Cα-H NMR relaxation methods (side-chain
         motions in the picosecond-to-nanosecond time regime)
   From the BMRB we compiled S² data for 18 proteins, including 10 proteins in 2 or
   more different states: calmodulin, barnase, PDZ, MUP, DHFR, staphylococcal
   nuclease, Pin1, SH3 domain, MSG
   This technique provides measurements only for the Cα of methyl groups in side
   chains: ALA, LEU, ILE, MET, THR, VAL




Structural Descriptors of Methyl Dynamics



   We consider the following parameters influencing side-chain dynamics:
         Packing density at the methyl site i and its neighboring residues j within a sphere of
         r = 5 Å

         \[ P_i = \sum_{r_{ij} < 5\,\text{Å}} C_j \, e^{-r_{ij}} = \sum_{r_{ij} < 5\,\text{Å}} \left( \sum_{r_{jk} < 5\,\text{Å}} e^{-r_{jk}} \right) e^{-r_{ij}} \tag{10} \]

         Side-chain stiffness: number of dihedral angles separating the backbone from the
         methyl carbon, weighted by the side-chain packing
         Rotameric state: angular distance Δχ = χ − χ0 to the closest rotameric state χ0 in
         the library
         Elongation: distance from the methyl site to the Cα
         Pairwise contact potential: a knowledge-based potential of the frequency of contacts
         between residues at several distances computed from the PDB
         Solvation effect: DSSP accessibility and residue hydrophobicity
         Van der Waals contacts
         Hydrogen bonds (in the case of threonine)
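
   The packing density of equation (10) can be sketched as follows. Two points are assumptions of this sketch and not stated on the slide: a site is not counted as its own neighbor, and every site is represented by a single coordinate in Å.

```python
import math

def packing_density(coords, cutoff=5.0):
    """P_i = sum_{r_ij < cutoff} C_j * exp(-r_ij), with C_j = sum_{r_jk < cutoff} exp(-r_jk)."""
    n = len(coords)

    def neighbours(a):
        # (index, distance) of every other site closer than the cutoff.
        found = []
        for b in range(n):
            if b != a:
                r = math.dist(coords[a], coords[b])
                if r < cutoff:
                    found.append((b, r))
        return found

    # First-shell contribution C_j of each site.
    c = [sum(math.exp(-r) for _, r in neighbours(j)) for j in range(n)]
    # Packing density: neighbours weighted by their own local crowding.
    return [sum(c[j] * math.exp(-r) for j, r in neighbours(i)) for i in range(n)]

# Toy example: three sites on a line, 3 Å apart.
p = packing_density([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)])
```

   The nested sum means that a neighbor contributes more when it is itself densely surrounded, so the descriptor reaches beyond the first contact shell.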




Predicting Methyl Side-chain Dynamics
Algorithm: neural network
Cross-validation: r = 0.71 ± 0.029 (p-value = 4.6 × 10⁻⁸⁷)
Example: experimental and predicted changes ΔS² of barnase after binding barstar
(ΔS² > 0: rigidification; ΔS² < 0: flexibilization)

           Protein        MD method   r (MD)   r (nnet)

           ubiquitin      AMBER99SB   0.81     0.81
           TNfn3          CHARMM 22   0.62     0.79
           FNfn10         CHARMM 22   0.51     0.64
           barnase        OPLS-AA/L   0.55     0.64
           calmodulin     FDPB        0.60     0.72

[Carbonell and del Sol, 2009]

Search Algorithms in CPD




Search Algorithms



   Objective: finding the best design within the space of all possible amino
   acid/rotameric states
   A vast search space: 20^N or p^N
        N: number of positions to mutate
        p: number of rotameric states
   Strategies
        Deterministic algorithms
                Dead-end elimination (DEE) algorithm: a pruning method.
                Some accelerations of the DEE algorithm: upper-bound estimation; the “magic bullet” metric;
                conformational splitting; background optimization
        Stochastic algorithms
                Monte Carlo
                Simulated annealing
                Genetic algorithms
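
   To make the size of the search space concrete, the sketch below computes 20^N and runs an exhaustive search on a toy problem. The energy tables are invented for the example (three positions, two rotamers each, each pair of positions counted once), and the function names are hypothetical.

```python
from itertools import product

def design_space_size(n_positions, states_per_position=20):
    """Number of candidate designs: states^N."""
    return states_per_position ** n_positions

def total_energy(rotamers, self_e, pair_e):
    """Sum of self energies plus pair energies over positions k < l."""
    n = len(rotamers)
    energy = sum(self_e[k][rotamers[k]] for k in range(n))
    for k in range(n):
        for l in range(k + 1, n):
            energy += pair_e[(k, l)][rotamers[k]][rotamers[l]]
    return energy

def brute_force_minimum(self_e, pair_e):
    """Enumerate every rotamer assignment; feasible only for tiny problems."""
    choices = (range(len(energies)) for energies in self_e)
    best = min(product(*choices), key=lambda rot: total_energy(rot, self_e, pair_e))
    return best, total_energy(best, self_e, pair_e)

# Hypothetical energy tables.
self_e = [[0.0, 5.0], [0.5, 0.0], [0.2, 0.3]]
pair_e = {
    (0, 1): [[2.0, -1.0], [0.0, 0.5]],
    (0, 2): [[0.0, 0.4], [0.3, 0.0]],
    (1, 2): [[0.1, 0.0], [-0.5, 0.6]],
}

best, best_energy = brute_force_minimum(self_e, pair_e)
n_designs = design_space_size(10)
```

   Ten mutable positions already give about 10^13 sequences before rotamers are considered, which is why pruning or stochastic strategies are needed.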




The DEE Algorithm



   It assumes that the energy of the protein can be written as

       \[ E_{TOT} = \sum_{k} E_k(r_k) + \sum_{k \neq l} E_{kl}(r_k, r_l) \tag{11} \]

   N: length of the protein
   r_k: the rotamer of the kth side chain
   E_k(r_k): the self-energy of a particular rotamer r_k
   E_kl(r_k, r_l): the pair energy of the rotamers r_k, r_l
   Complexity:
         Singles search scales quadratically with the total number of rotamers: O((p × N)²)
         Pairs search scales cubically: O((p × N)³)
         Brute-force enumeration: O(p^N)




The DEE Algorithm


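   Single rotamers and rotamer pairs are eliminated during the computational cycles
        Singles elimination: rotamer r_k^A at position k is eliminated if some other rotamer
        r_k^B at the same position always gives a better (lower) energy, where the sums run
        over the other positions l ≠ k:

        \[ E_k(r_k^A) + \sum_{l=1}^{N} \min_X E_{kl}(r_k^A, r_l^X) > E_k(r_k^B) + \sum_{l=1}^{N} \max_X E_{kl}(r_k^B, r_l^X) \tag{12} \]

        Pairs elimination: the pair of rotamers (r_k^A, r_l^B) at two positions is eliminated
        if there exists another pair (r_k^C, r_l^D) that always gives a better energy, where
        the sums run over the remaining positions i:

        \[ U_{kl}^{AB} \overset{\text{def}}{=} E_k(r_k^A) + E_l(r_l^B) + E_{kl}(r_k^A, r_l^B) \tag{13} \]

        \[ U_{kl}^{AB} + \sum_{i=1}^{N} \min_X \left( E_{ki}(r_k^A, r_i^X) + E_{li}(r_l^B, r_i^X) \right) > U_{kl}^{CD} + \sum_{i=1}^{N} \max_X \left( E_{ki}(r_k^C, r_i^X) + E_{li}(r_l^D, r_i^X) \right) \tag{14} \]

   Values are precomputed and stored in energy matrices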
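
   The singles criterion can be sketched as an iterative pruning loop over precomputed energy matrices. This is an illustration with made-up energies, not a production DEE implementation: it applies only the singles test, repeats until nothing more can be removed, and skips the term l = k in the sums.

```python
def pair_energy(pair_e, k, a, l, b):
    """Look up E_kl(a, b) in tables stored once per position pair with k < l."""
    return pair_e[(k, l)][a][b] if k < l else pair_e[(l, k)][b][a]

def dee_singles(self_e, pair_e):
    """Return, for each position, the set of rotamers that survive singles elimination."""
    n = len(self_e)
    alive = [set(range(len(energies))) for energies in self_e]
    changed = True
    while changed:
        changed = False
        for k in range(n):
            others = [l for l in range(n) if l != k]
            for a in sorted(alive[k]):
                # Best case for candidate a: most favorable partner at every other position.
                best_a = self_e[k][a] + sum(
                    min(pair_energy(pair_e, k, a, l, x) for x in alive[l]) for l in others)
                for b in sorted(alive[k] - {a}):
                    # Worst case for competitor b: least favorable partner everywhere.
                    worst_b = self_e[k][b] + sum(
                        max(pair_energy(pair_e, k, b, l, x) for x in alive[l]) for l in others)
                    if best_a > worst_b:
                        alive[k].discard(a)   # a can never be part of the optimum
                        changed = True
                        break
    return alive

# Hypothetical tables: self_e[k][r] and pair_e[(k, l)][r_k][r_l].
self_e = [[0.0, 5.0], [0.5, 0.0], [0.2, 0.3]]
pair_e = {
    (0, 1): [[2.0, -1.0], [0.0, 0.5]],
    (0, 2): [[0.0, 0.4], [0.3, 0.0]],
    (1, 2): [[0.1, 0.0], [-0.5, 0.6]],
}

survivors = dee_singles(self_e, pair_e)
```

   On this toy problem the pruning leaves a single rotamer per position. In general several rotamers survive and the remaining combinations still have to be searched.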



Stochastic Algorithms

   Search in the space of feasible designs by making a series of combinations of
   random and directed moves
   Monte Carlo Metropolis: a move consists of exchanging one rotamer for another
   at a randomly chosen position; a modification is accepted if it lowers the energy,
   and otherwise with probability exp(−ΔE/kT)
   Simulated annealing: starts at a high temperature, which allows a broad exploration
   of solutions in the initial cycles of the search, and then cools gradually
   Genetic algorithms: a population of models is propagated (evolved) throughout
   the course of the run, and genetic operators, such as recombination, are used to
   create new models from existing parents
   They are fast and can be scaled up to problems of large complexity
   They are not guaranteed to converge to the optimal solution
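
   A minimal sketch of Metropolis Monte Carlo with simulated annealing over rotamer assignments. The energy tables, the geometric cooling schedule, the temperature range (in the same units as the energies, i.e. kT) and the function names are all assumptions made for the example.

```python
import math
import random

def total_energy(rotamers, self_e, pair_e):
    """Sum of self energies plus pair energies over positions k < l."""
    n = len(rotamers)
    energy = sum(self_e[k][rotamers[k]] for k in range(n))
    for k in range(n):
        for l in range(k + 1, n):
            energy += pair_e[(k, l)][rotamers[k]][rotamers[l]]
    return energy

def anneal(self_e, pair_e, start, steps=2000, t_start=2.0, t_end=0.05, seed=0):
    """Return the lowest-energy assignment visited and its energy."""
    rng = random.Random(seed)
    state = list(start)
    energy = total_energy(state, self_e, pair_e)
    best, best_energy = tuple(state), energy
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        k = rng.randrange(len(state))            # random position
        new_rotamer = rng.randrange(len(self_e[k]))
        if new_rotamer == state[k]:
            continue
        trial = state[:]
        trial[k] = new_rotamer
        delta = total_energy(trial, self_e, pair_e) - energy
        # Metropolis rule: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            state, energy = trial, energy + delta
            if energy < best_energy:
                best, best_energy = tuple(state), energy
    return best, best_energy

# Hypothetical tables: self_e[k][r] and pair_e[(k, l)][r_k][r_l].
self_e = [[0.0, 5.0], [0.5, 0.0], [0.2, 0.3]]
pair_e = {
    (0, 1): [[2.0, -1.0], [0.0, 0.5]],
    (0, 2): [[0.0, 0.4], [0.3, 0.0]],
    (1, 2): [[0.1, 0.0], [-0.5, 0.6]],
}

start = (1, 0, 1)
best, best_energy = anneal(self_e, pair_e, start)
```

   The run is reproducible for a fixed seed, but different seeds or schedules can end in different minima on larger problems, which is the price paid for speed.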




The SCHEMA Algorithm




  Equivalent to an in silico directed evolution
  Consists of scoring libraries of hybrid protein
  sequences against the parental sequences
  Scoring:
       Calculate the number of interactions between residues
       (contacts within 4.5 Å) that are disrupted in the creation
       of hybrid proteins
       Hybrids are scored for stability by counting the number of
       disruptions
       Protein is partitioned into blocks that should not be
       interrupted by crossovers (analogous to genetic algorithms)
  From [Meyer et al., 2006]
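
  The disruption count can be sketched as below, assuming aligned parent sequences of equal length and a precomputed contact list. A contact counts as disrupted when the residue pair found in the hybrid at those two positions occurs in none of the parents. Sequences, contacts and the function name are invented for the example.

```python
def schema_disruption(hybrid, parents, contacts):
    """Count contacts (i, j) whose residue pair in the hybrid appears in no parent."""
    disrupted = 0
    for i, j in contacts:
        hybrid_pair = (hybrid[i], hybrid[j])
        if all((parent[i], parent[j]) != hybrid_pair for parent in parents):
            disrupted += 1
    return disrupted

# Hypothetical aligned parents and residue contacts (0-based positions).
parents = ["ACDE", "GCHK"]
contacts = [(0, 2), (1, 3), (0, 3)]

# Hybrid taking positions 0-1 from the first parent and 2-3 from the second.
e_hybrid = schema_disruption("ACHK", parents, contacts)
e_parent = schema_disruption("ACDE", parents, contacts)
```

  Crossover points are then chosen so that blocks keep as many contacts as possible inside a single parent, which lowers the count for the whole library.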




The OPTCOM and IPRO Algorithms for Library Design

       The OPTCOM algorithm:
               Balances size and quality of the library
       The IPRO algorithm:
               Identifies point mutations in the parent sequences
               using energy-based scoring functions
               Residue and rotamer choices are driven by a
               mixed-integer linear programming (MILP) formulation

From [Saraf et al., 2006]


Some Web Resources


   IPRO: Iterative Protein Redesign and Optimization.
   http://maranas.che.psu.edu/IPRO.htm
   EGAD: A Genetic Algorithm for protein Design.
   http://egad.ucsd.edu/software.php
   RosettaDesign: A software package.
   http://rosettadesign.med.unc.edu/
   SCHEMA: A pair-wise energy function for scoring protein chimeras made from
   homologous proteins.
   http://www.che.caltech.edu/groups/fha/schema-tools/schema-overview.html
   SHARPEN: Systematic Hierarchical Algorithms for Rotamers and Proteins on
   an Extended Network.
   http://koko.che.caltech.edu/sharpenabout.html
   WHAT IF: Software for protein modelling, design, validation, and
   visualisation. http://swift.cmbi.ru.nl/whatif/
   FoldX: A force field for energy calculations and protein design.
   http://foldx.crg.es/


De Novo-Designed Proteins



   In de novo designs, some assumptions are needed in order to make the search
   space tractable
   Usually we start from some basic motifs or domains as scaffolds for the design
   Examples:
        βαβ motif resembling a zinc finger
        3 and 4 helix bundles
        Helical coiled-coils
   Helix bundle motifs can be parametrized using a few global variables that
   describe the global structure
   Applications:
        New metal-binding sites
        Nonbiological cofactors for novel biomaterials and electromechanical devices
        Novel enzymatic activities




Example: De Novo Design of a Metalloprotein




   Computational de novo design of a four-helix (108 residues) bundle containing the
   non-biological cofactor iron diphenyl porphyrin (DPP-Fe) [Bender et al., 2007]
         The initial helix bundle was selected as a low-energy structure computed with Monte Carlo
         simulated annealing (MCSA)
         STITCH: a program to select loops connecting helices from PDB Select
         CHARMM and PROCHECK for removing overlaps
         4 His and 4 Thr residues to support the 6-point coordination of the Fe(III) cations
         SCADS: provides site-specific amino acid probabilities in each round



Challenges in Sequence and Structure-Based CPD



Modeling
    Greater availability of 3D protein structural information
    More accurate energy functions
    Improvement of rigid and flexible docking


Design
    Improvement in search algorithms
    Parametrization for non-natural amino acids

Prediction
    Beyond additive models: using machine-learning algorithms
    More complete environment descriptors




Bibliography I



Gretchen M. Bender, Andreas Lehmann, Hongling Zou, Hong Cheng, H. Christopher Fry, Don Engel, Michael J. Therien, J. Kent Blasie, Heinrich Roder,
    Jeffrey G. Saven, and William F. DeGrado. De Novo Design of a Single-Chain Diphenylporphyrin Metalloprotein. Journal of the American Chemical
    Society, 129(35):10732–10740, September 2007. ISSN 0002-7863. doi: 10.1021/ja071199j. URL http://dx.doi.org/10.1021/ja071199j.
F. Edward Boas and Pehr B. Harbury. Potential energy functions for protein design. Current opinion in structural biology, 17(2):199–204, April 2007. ISSN
    0959-440X. doi: 10.1016/j.sbi.2007.03.006. URL http://dx.doi.org/10.1016/j.sbi.2007.03.006.
Pablo Carbonell and Antonio del Sol. Methyl side-chain dynamics prediction based on protein structure. Bioinformatics, pages btp463+, July 2009. doi:
    10.1093/bioinformatics/btp463. URL http://dx.doi.org/10.1093/bioinformatics/btp463.
Jean-Loup L. Faulon, Michael J. Collins, and Robert D. Carr. The signature molecular descriptor. 4. Canonizing molecules using extended valence
   sequences. Journal of chemical information and computer sciences, 44(2):427–436, 2004. ISSN 0095-2338. doi: 10.1021/ci0341823. URL
   http://dx.doi.org/10.1021/ci0341823.
Michelle M. Meyer, Lisa Hochrein, and Frances H. Arnold. Structure-guided SCHEMA recombination of distantly related β-lactamases. Protein Engineering
    Design and Selection, 19(12):563–570, December 2006. ISSN 1741-0126. doi: 10.1093/protein/gzl045. URL
    http://dx.doi.org/10.1093/protein/gzl045.
Diwahar Narasimhan, Mark R. Nance, Daquan Gao, Mei-Chuan Ko, Joanne Macdonald, Patricia Tamburi, Dan Yoon, Donald M. Landry, James H. Woods,
   Chang-Guo Zhan, John J. G. Tesmer, and Roger K. Sunahara. Structural analysis of thermostabilizing mutations of cocaine esterase. Protein
   Engineering Design and Selection, 23(7):537–547, July 2010. doi: 10.1093/protein/gzq025. URL http://dx.doi.org/10.1093/protein/gzq025.
Manish C. Saraf, Gregory L. Moore, Nina M. Goodey, Vania Y. Cao, Stephen J. Benkovic, and Costas D. Maranas. IPRO: an iterative computational protein
   library redesign and optimization procedure. Biophysical journal, 90(11):4167–4180, June 2006. ISSN 0006-3495. doi: 10.1529/biophysj.105.079277. URL
   http://dx.doi.org/10.1529/biophysj.105.079277.
Jiangning Song, Kazuhiro Takemoto, Hongbin Shen, Hao Tan, Michael M. Gromiha, and Tatsuya Akutsu. Prediction of Protein Folding Rates from Structural
    Topology and Complex Network Properties. IPSJ Transactions on Bioinformatics, 3:40–53, 2010. doi: 10.2197/ipsjtbio.3.40. URL
    http://dx.doi.org/10.2197/ipsjtbio.3.40.




Protein Design Techniques and Applications

  • 1. Computational Protein Design 2. Computational Protein Design Techniques Pablo Carbonell pablo.carbonell@issb.genopole.fr iSSB, Institute of Systems and Synthetic Biology Genopole, University d’Évry-Val d’Essonne, France mSSB: December 2010 Pablo Carbonell (iSSB) Computational Protein Design mSSB: December 2010 1 / 45
  • 2. Outline 1 Introduction 2 Computational Protein Descriptors 3 Sequence-based CPD 4 Structure-based CPD 5 Search Algorithms in CPD 6 De Novo Design 7 Challenges in Sequence and Structure-Based CPD Pablo Carbonell (iSSB) Computational Protein Design mSSB: December 2010 2 / 45
  • 3. Outline 1 Introduction 2 Computational Protein Descriptors 3 Sequence-based CPD 4 Structure-based CPD 5 Search Algorithms in CPD 6 De Novo Design 7 Challenges in Sequence and Structure-Based CPD Pablo Carbonell (iSSB) Computational Protein Design mSSB: December 2010 3 / 45
  • 4. Computational Protein Design
  • 5. A Blueprint of CPD Approaches (∗ RS: research studies)
  • 7. Molecular Signature Descriptors
      A 2D representation of the molecular graph as an undirected colored graph $G(V, E, C)$, with $V$: atoms, $E$: bonds, $C$: atom types.
      The signature descriptor of height $h$ of atom $x$ in the molecular graph $G$, or ${}^{h}\sigma(x)$, is a canonical representation of the subgraph of $G$ containing all atoms that are at distance $h$ from $x$.
      Atomic signature:
        ${}^{h}\sigma(G) = \sum_{x \in V} {}^{h}\sigma(x)$   (1)
      The signature is a systematic codification of the molecular graph [Faulon et al., 2004].
      σ(methylcyclopropane) =
        1 [C]([H][C]([H][H][C,0])[C,0]([H][H])[C]([H][H][H]))
        2 [C]([H][H][C]([H][C,0][C]([H][H][H]))[C,0]([H][H]))
        1 [C]([H][H][H][C]([H][C]([H][H][C,0])[C,0]([H][H])))
        1 [H]([C]([C]([H][H][C,0])[C,0]([H][H])[C]([H][H][H])))
        4 [H]([C]([H][C]([H][C,0][C]([H][H][H]))[C,0]([H][H])))
        3 [H]([C]([H][H][C]([H][C]([H][H][C,0])[C,0]([H][H]))))
  • 8. Molecular Signature of Reactions and Proteins
      Signature of a reaction. The signature of the reaction $R$
        $S_1 + S_2 + \ldots + S_n \rightarrow P_1 + P_2 + \ldots + P_m$   (2)
      that transforms $n$ substrates into $m$ products is given by the difference between the signature of the products and the signature of the substrates:
        ${}^{h}\sigma(R) = \sum_{p \in P} {}^{h}\sigma(p) - \sum_{s \in S} {}^{h}\sigma(s)$   (3)
      Signature of protein sequences. The protein $P$ is represented by the linear chain given by its collapsed graph at residue level, a reduced molecular graph representation $G(V, E, C)$ known as string signature, where $V$: residues $a \in A$, $E$: contiguous in sequence, $C$: amino acid type:
        ${}^{h}\sigma(P) = \sum_{a \in A} {}^{h}\sigma(a)$   (4)
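The string signature of equation (4) can be illustrated with a short Python sketch. It assumes that the height-h signature of a residue is simply the sequence window reaching h residues to each side (truncated at the termini), and that the protein signature is the count of each distinct window. The function name `string_signature` is hypothetical and is not taken from the slides or from any signature software.

```python
from collections import Counter


def string_signature(sequence: str, h: int) -> Counter:
    """Count the height-h residue neighborhoods along a linear chain.

    Assumption: the neighborhood of residue i is the substring covering
    up to h residues on each side of i, truncated at the chain ends.
    """
    signature = Counter()
    for i in range(len(sequence)):
        window = sequence[max(0, i - h): i + h + 1]
        signature[window] += 1
    return signature


if __name__ == "__main__":
    # Height 0 reduces to the amino acid composition.
    print(string_signature("ACDA", 0))
    # Height 1 counts windows of up to three residues.
    print(string_signature("ACDA", 1))
```

At height 0 the descriptor is the amino acid composition; larger heights capture more sequence context at the cost of a sparser descriptor.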
  • 9. Protein Contact Maps
      The protein contact map is a graph representation of the 3D interactions at residue level, $G(V, E, C)$, where $V$: residues, $E$: contacts, $C$: amino acid type.
      Two residues are considered to interact when atoms of the two residues are at a distance lower than a predetermined threshold (typically 4.5 to 5 Å).
      Contact maps can account for long-range interactions and conformational states.
      Figure from Song et al. [2010].
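As an illustration of the contact definition above, the following Python sketch builds the edge set of a contact map from atomic coordinates. The input layout (one list of (x, y, z) tuples per residue, in Å) and the function name `contact_map` are assumptions made for this example; a real workflow would read the coordinates from a PDB file.

```python
import math


def contact_map(residues, threshold=4.5):
    """Return the set of residue index pairs (i, j), i < j, in contact.

    residues: list where each item is the list of (x, y, z) atom
    coordinates of one residue (hypothetical input layout).
    Two residues are in contact when any pair of their atoms is closer
    than `threshold` (in the same units as the coordinates).
    """
    contacts = set()
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            if any(math.dist(a, b) < threshold
                   for a in residues[i] for b in residues[j]):
                contacts.add((i, j))
    return contacts


if __name__ == "__main__":
    toy = [[(0.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)], [(10.0, 0.0, 0.0)]]
    print(contact_map(toy))
```

The double loop is quadratic in the number of residues, which is fine for a sketch; spatial indexing would be the usual choice for large structures.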
  • 11. Sequence and Structure-Based CPD
      Sequence-based CPD methods are in some cases a good trade-off between the complexity of the model and the accuracy of the predictions.
  • 12. Sequence-based Knowledge-based Potentials
      The simplest way to score a protein and to identify active regions is through amino acid scales or indexes.
      AAindex is a database of:
        544 amino acid indexes
        94 amino acid matrices
        47 amino acid pair-wise contact potentials
      Examples: hydrophobicity, accessibility, van der Waals volume, secondary structure propensity, flexibility.
      This approach is widely used when analyzing conserved motifs and correlated mutations in protein fold families through multiple alignments.
  • 13. Quantitative Structure-Activity Relationship (QSAR) Techniques
      The goal is to model causal relationships between:
        structures of interacting molecules
        measurable properties of scientific or commercial interest, such as ADME/Tox (absorption, distribution, metabolism, excretion, and toxicity) of drugs
      QSAR is a statistical method used extensively by the chemical and pharmaceutical industries in small-molecule and peptide optimization.
  • 14. QSAR Model Evaluation
      Model predictability is generally evaluated through the leave-one-out (LOO) cross-validation correlation coefficient $q^2$.
      Partial least-squares (PLS) regression is commonly used.
      Additional nonlinear terms can be added through the use of nonlinear regression or machine learning techniques (kernel methods, random forests, etc.).
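The leave-one-out $q^2$ can be sketched in a few lines of Python. The example uses the common definition $q^2 = 1 - \mathrm{PRESS} / \sum_i (y_i - \bar{y})^2$, where PRESS is the sum of squared errors of the held-out predictions. To stay self-contained it swaps PLS for ordinary least squares on a single descriptor; the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept (one descriptor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_total."""
    press = 0.0
    for i in range(len(xs)):
        # Refit the model without sample i, then predict sample i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_total


if __name__ == "__main__":
    descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
    activity = [1.0, 3.0, 2.0, 5.0, 4.0]
    print(loo_q2(descriptor, activity))
```

Unlike the fitted $r^2$, $q^2$ can become negative when the model predicts held-out samples worse than the mean of the data would.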
  • 15. QSAR Modeling Workflow
  • 18. The ProSAR Algorithm
      An extension of SAR-based approaches to CPD.
      It formalizes the decision-making process about which mutations to include in combinatorial libraries:
        $y = \sum_{i=1}^{N} \sum_{j \in A} c_{ij} x_{ij}$   (5)
      $y$: the predicted function (activity) of the protein sequence
      $c_{ij}$: the regression coefficients corresponding to the mutational effect of having residue $j$, among the 20 amino acids $A$, at position $i$
      $x_{ij}$: binary variable indicating the presence or absence of residue $j$ at position $i$
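A minimal Python sketch of the additive model in equation (5) follows. The coefficient values are invented for illustration, positions are 0-based rather than 1-based, and the function names are hypothetical; in ProSAR the coefficients would be fitted by regression against measured activities.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def encode(sequence):
    """Binary indicators x_ij: 1 if residue j is present at position i."""
    return {(i, aa): int(sequence[i] == aa)
            for i in range(len(sequence)) for aa in AMINO_ACIDS}


def predict_activity(sequence, coefficients):
    """Evaluate y = sum_i sum_j c_ij * x_ij for one sequence.

    coefficients: dict mapping (position, residue) to c_ij; pairs that
    are absent from the dict are treated as having no effect (c_ij = 0).
    """
    x = encode(sequence)
    return sum(coefficients.get(key, 0.0) * x_ij for key, x_ij in x.items())


if __name__ == "__main__":
    # Illustrative (made-up) regression coefficients.
    c = {(0, "A"): 1.5, (1, "G"): -0.5, (1, "L"): 2.0}
    for variant in ("AL", "AG", "CG"):
        print(variant, predict_activity(variant, c))
```

Because the model is additive, mutations with positive coefficients can be ranked and combined independently, which is what makes it convenient for choosing the content of the next combinatorial library.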
  • 19. Improving Catalytic Function by ProSAR-driven Enzyme Evolution
      Statistical analysis of protein sequence-activity relationships.
      Bacterial biocatalysis of Atorvastatin (Lipitor), a cholesterol-lowering drug.
      Codexis Inc.
  • 21. Structure-based CPD
      Energy functions and molecular force fields
      Local conformational restrictions
      Predicting entropic factors
      Protein topological properties
      Figure from Narasimhan et al. [2010].
  • 22. Energy Functions and Molecular Force Fields
      In structure-based CPD, folds are usually represented by the spatial coordinates of the backbone atoms, or design scaffold.
      Protein design is done by placing amino acid side chains along the scaffold.
      Side chains are only permitted to assume a discrete set of statistically preferred conformations: rotamers.
      Rotamer/backbone and rotamer/rotamer interaction energies are tabulated.
      These potential energies can then be approximated by using any of the standard force fields: CHARMM, AMBER, GROMOS.
  • 23. Molecular Force Fields
      AMBER: a classical force field for energy and MD calculations:
        $V(r^N) = \sum_{bonds} \frac{1}{2} k_b (l - l_0)^2 + \sum_{angles} \frac{1}{2} k_a (\theta - \theta_0)^2 + \sum_{torsions} \frac{1}{2} V_n [1 + \cos(n\omega - \gamma)] + \sum_{j=1}^{N-1} \sum_{i=j+1}^{N} \left\{ \epsilon_{i,j} \left[ \left( \frac{r_{0ij}}{r_{ij}} \right)^{12} - 2 \left( \frac{r_{0ij}}{r_{ij}} \right)^{6} \right] + \frac{q_i q_j}{4 \pi \epsilon_0 r_{ij}} \right\}$   (6)
      1. $\sum_{bonds}(\cdot)$: energy between covalently bonded atoms.
      2. $\sum_{angles}(\cdot)$: energy due to the geometry of electron orbitals involved in covalent bonding.
      3. $\sum_{torsions}(\cdot)$: energy for twisting a bond due to bond order (e.g. double bonds) and neighboring bonds or lone pairs of electrons.
      4. $\sum_{j=1}^{N-1} \sum_{i=j+1}^{N}(\cdot)$: non-bonded energy between all atom pairs:
           van der Waals energies
           electrostatic energies
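The non-bonded double sum of equation (6) can be sketched as follows. This is a simplified illustration, not AMBER itself: it assumes a single well depth `epsilon` and a single minimum-energy distance `r0` for every atom pair (real force fields use per-pair parameters), and it folds $1/(4\pi\epsilon_0)$ into a constant `coulomb_k`, set here to roughly 332.06 kcal·Å/(mol·e²), an assumed unit convention.

```python
import math


def nonbonded_energy(coords, charges, epsilon, r0, coulomb_k=332.06):
    """Sum van der Waals and electrostatic terms over all atom pairs.

    coords:  list of (x, y, z) positions in Å
    charges: list of partial charges in units of e
    epsilon, r0: one shared Lennard-Jones well depth and minimum-energy
                 distance (a simplifying assumption for this sketch)
    coulomb_k: stands in for 1 / (4 * pi * epsilon_0) in the chosen units
    """
    energy = 0.0
    n = len(coords)
    for j in range(n - 1):
        for i in range(j + 1, n):
            r = math.dist(coords[i], coords[j])
            ratio6 = (r0 / r) ** 6
            vdw = epsilon * (ratio6 ** 2 - 2.0 * ratio6)
            coulomb = coulomb_k * charges[i] * charges[j] / r
            energy += vdw + coulomb
    return energy


if __name__ == "__main__":
    atoms = [(0.0, 0.0, 0.0), (3.5, 0.0, 0.0)]
    # Two neutral atoms at the minimum-energy distance: energy is -epsilon.
    print(nonbonded_energy(atoms, [0.0, 0.0], epsilon=0.2, r0=3.5))
```

With the 12-6 form written this way, the van der Waals term reaches its minimum value of $-\epsilon$ exactly at $r = r_0$, which makes the parameters easy to interpret.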
  • 24. Structure-based Knowledge-based Potentials
      They are built by performing a large-scale statistical study of structural databases such as the PDB (Protein Data Bank):
        Rotamer libraries (∼150 rotameric states)
        Binary patterning: only some types of amino acids are allowed, based on the hydrophobic environment
        An implicit solvation model
        Secondary structure propensity
        Frequency of small segments in the PDB
        Pairwise potentials: van der Waals interactions, hydrogen bonding, electrostatics
        Entropy-based penalties for flexible side chains
      Figure from Boas and Harbury [2007].
  • 25. Energy Functions
      Design along the backbone or scaffold.
      Rotamer/backbone and rotamer/rotamer interaction energies are tabulated.
      They are precomputed from molecular force fields: CHARMM, AMBER, GROMOS.
      Total energy of the protein:
        $E_{TOT} = \sum_{k} E_k(r_k) + \sum_{k \neq l} E_{kl}(r_k, r_l)$   (7)
      $N$: length of the protein
      $r_k$: the rotamer of the $k$th side chain
      $E_k(r_k)$: the self-energy of a particular rotamer $r_k$
      $E_{kl}(r_k, r_l)$: the pair energy of rotamers $r_k$, $r_l$
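Equation (7) amounts to a table lookup once the energies are tabulated. The sketch below assumes one particular storage layout (a list of self-energy lists and a dict of pair-energy matrices keyed by position pairs) and counts each unordered pair of positions once, which is an assumption about how the sum over $k \neq l$ is meant to be read. All names and numbers are hypothetical.

```python
def total_energy(assignment, self_energy, pair_energy):
    """Total energy of one rotamer assignment from tabulated energies.

    assignment:  list with the chosen rotamer index r_k at each position k
    self_energy: self_energy[k][r] is E_k(r)
    pair_energy: pair_energy[(k, l)][r_k][r_l] is E_kl(r_k, r_l) for k < l
                 (each unordered pair is stored and counted once)
    """
    n = len(assignment)
    energy = sum(self_energy[k][assignment[k]] for k in range(n))
    for k in range(n):
        for l in range(k + 1, n):
            energy += pair_energy[(k, l)][assignment[k]][assignment[l]]
    return energy


if __name__ == "__main__":
    # Two positions with two rotamers each (made-up energies).
    self_e = [[1.0, 2.0], [0.5, 0.0]]
    pair_e = {(0, 1): [[0.0, -1.0], [3.0, 0.5]]}
    for rotamers in ([0, 0], [0, 1], [1, 0], [1, 1]):
        print(rotamers, total_energy(rotamers, self_e, pair_e))
```

Because every term is precomputed, evaluating a candidate design costs only a number of lookups quadratic in the number of positions, which is what makes large-scale searches over rotamer assignments feasible.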
  • 26. The Role of Dynamics
      Besides protein structure, protein dynamics can play a direct role in molecular recognition.
      Flexible proteins recognize their targets through induced fit or conformational selection, likely showing promiscuity.
      Binding is commonly enthalpy-driven, but in some cases entropy is important, for instance:
        Proteins with multiple binding sites
        Small hydrophobic molecules
      Two sources of protein motions:
        Protein flexibility: intraconformational dynamics (fast time scale motions)
        Conformational heterogeneity: interconformational dynamics
      Gibbs free energy:
        $\Delta G = \Delta H - T \Delta S$   (8)
        $\Delta S = \Delta S_{solv} + \Delta S_{conf} + \Delta S_{rt}$   (9)
      $\Delta S_{conf}$: conformational entropy of protein and ligand
      $\Delta S_{rt}$: rotational and translational degrees of freedom
  • 27. Predicting Side-chain Dynamics from Structural Descriptors
      The Lipari-Szabo model-free approach quantifies motions from NMR experiments by computing the generalized order parameter $S^2$:
        Protein backbone dynamics: $^{15}$NH and $^{13}$C$^{\alpha}$H NMR relaxation methods
        Protein side-chain methyl dynamics: $^{13}$C$^{\alpha}$H NMR relaxation methods (side-chain motions in the picosecond-to-nanosecond time regime)
      From the BMRB we compiled $S^2$ data for 18 proteins, including 10 proteins in 2 or more different states: calmodulin, barnase, PDZ, MUP, DHFR, staphylococcal nuclease, Pin1, SH3 domain, MSG.
      This technique provides measurements only for the Cα of methyl groups in side chains: ALA, LEU, ILE, MET, THR, VAL.
  • 28. Structural Descriptors of Methyl Dynamics
      We consider the following parameters influencing side-chain dynamics:
        Packing density at the methyl site $i$ and its neighboring residues $j$ within a sphere of $r = 5$ Å:
          $P_i = \sum_{r_{ij} < 5\,Å} C_j e^{-r_{ij}} = \sum_{r_{ij} < 5\,Å} \left( \sum_{r_{jk} < 5\,Å} e^{-r_{jk}} \right) e^{-r_{ij}}$   (10)
        Side-chain stiffness: number of dihedral angles separating the backbone from the methyl carbon, weighted by the side-chain packing
        Rotameric state: angular distance $\Delta\chi = \chi - \chi_0$ to the closest rotameric state $\chi_0$ in the library
        Elongation: distance from the methyl site to the Cα
        Pairwise contact potential: a knowledge-based potential of the frequency of contacts between residues at several distances, computed from the PDB
        Solvation effect: DSSP accessibility and residue hydrophobicity
        Van der Waals contacts
        Hydrogen bonds (in the case of threonine)
  • 29. Predicting Methyl Side-chain Dynamics
      Algorithm: neural network
      Cross-validation: r = 0.71 ± 0.029 (p-value = 4.6 × 10^−87)
      Example: experimental and predicted changes in $\Delta S^2$ of barnase after binding barstar
        $\Delta S^2 > 0$: rigidification
        $\Delta S^2 < 0$: flexibilization

      Protein       MD method     r (MD)   r (nnet)
      ubiquitin     AMBER99SB     0.81     0.81
      TNfn3         CHARMM 22     0.62     0.79
      FNfn10        CHARMM 22     0.51     0.64
      barnase       OPLS-AA/L     0.55     0.64
      calmodulin    FDPB          0.60     0.72

      [Carbonell and del Sol, 2009]
  • 31. Search Algorithms in CPD
  • 32. Search Algorithms
      Objective: finding the best design within the space of all possible amino acid/rotameric states.
      A vast search space: $20^N$ or $p^N$
        $N$: number of positions to mutate
        $p$: number of rotameric states
      Strategies:
        Deterministic algorithms
          Dead-end elimination (DEE) algorithm: a pruning method.
          Some accelerations of the DEE algorithm: upper-bound estimation; the “magic bullet” metric; conformational splitting; background optimization
        Stochastic algorithms
          Monte Carlo
          Simulated annealing
          Genetic algorithms
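To give a feeling for how fast the search space grows, the small Python sketch below evaluates $20^N$ and $p^N$ for a few values of $N$. The choice of 150 states per position is only an illustrative assumption; the function name is hypothetical.

```python
def search_space_size(n_positions, n_states=20):
    """Number of candidate designs: n_states ** n_positions."""
    return n_states ** n_positions


if __name__ == "__main__":
    for n in (5, 10, 20):
        sequences = search_space_size(n)            # 20 amino acids per position
        rotamers = search_space_size(n, 150)        # assumed 150 states per position
        print(f"N={n:2d}  20^N={sequences:.3e}  150^N={rotamers:.3e}")
```

Even ten mutable positions already give about $10^{13}$ sequences, which is why exhaustive enumeration is replaced by pruning or stochastic sampling.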
  • 33. The DEE Algorithm
      It assumes that the energy of the protein can be written as
        $E_{TOT} = \sum_{k} E_k(r_k) + \sum_{k \neq l} E_{kl}(r_k, r_l)$   (11)
      $N$: length of the protein
      $r_k$: the rotamer of the $k$th side chain
      $E_k(r_k)$: the self-energy of a particular rotamer $r_k$
      $E_{kl}(r_k, r_l)$: the pair energy of the rotamers $r_k$, $r_l$
      Complexity:
        Single search scales quadratically with the total number of rotamers: $O((p \times N)^2)$
        Pair search scales cubically: $O((p \times N)^3)$
        Brute-force enumeration: $O(p^N)$
  • 34. The DEE Algorithm
      Single rotamers and rotamer pairs are eliminated during the computational cycles.
      Single elimination: eliminate rotamer $r_k^A$ if some other rotamer $r_k^B$ at the same position gives a better (lower) energy even in its worst case:
        $E_k(r_k^A) + \sum_{l=1, l \neq k}^{N} \min_X E_{kl}(r_k^A, r_l^X) > E_k(r_k^B) + \sum_{l=1, l \neq k}^{N} \max_X E_{kl}(r_k^B, r_l^X)$   (12)
      Pairs elimination: eliminate a pair of rotamers at two positions if there exists another pair that gives a better energy:
        $U_{kl}^{AB} \overset{def}{=} E_k(r_k^A) + E_l(r_l^B) + E_{kl}(r_k^A, r_l^B)$   (13)
        $U_{kl}^{AB} + \sum_{i=1, i \neq k,l}^{N} \min_X \left( E_{ki}(r_k^A, r_i^X) + E_{li}(r_l^B, r_i^X) \right) > U_{kl}^{CD} + \sum_{i=1, i \neq k,l}^{N} \max_X \left( E_{ki}(r_k^C, r_i^X) + E_{li}(r_l^D, r_i^X) \right)$   (14)
      Values are precomputed and stored in energy matrices.
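The single elimination criterion of equation (12) can be sketched as one pruning pass in Python. The storage layout (`pair_energy[k][l][a][x]` holding $E_{kl}(r_k^a, r_l^x)$ for $k \neq l$) and the function name are assumptions made for this example. A complete DEE implementation would repeat the pass, skipping rotamers that were already eliminated, until nothing more can be removed.

```python
def dee_single_pass(self_energy, pair_energy):
    """One pass of the DEE single-rotamer elimination criterion.

    self_energy[k][a]       : E_k of rotamer a at position k
    pair_energy[k][l][a][x] : E_kl between rotamer a at k and rotamer x at l
                              (hypothetical layout, defined for k != l)
    Returns the set of (position, rotamer) pairs that can be eliminated.
    """
    n = len(self_energy)
    eliminated = set()
    for k in range(n):
        n_rot = len(self_energy[k])
        for a in range(n_rot):
            # Best case for rotamer a: most favorable partner everywhere.
            best_a = self_energy[k][a] + sum(
                min(pair_energy[k][l][a]) for l in range(n) if l != k)
            for b in range(n_rot):
                if b == a:
                    continue
                # Worst case for competitor b: least favorable partners.
                worst_b = self_energy[k][b] + sum(
                    max(pair_energy[k][l][b]) for l in range(n) if l != k)
                if best_a > worst_b:
                    eliminated.add((k, a))
                    break
    return eliminated


if __name__ == "__main__":
    self_e = [[0.0, 10.0], [0.0, 0.0]]
    m01 = [[0.0, 1.0], [0.5, 0.0]]
    m10 = [[0.0, 0.5], [1.0, 0.0]]  # transpose of m01
    pair_e = [[None, m01], [m10, None]]
    print(dee_single_pass(self_e, pair_e))
```

A rotamer removed this way cannot be part of the global minimum-energy assignment, so the pruning keeps the search deterministic and exact.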
  • 35. Stochastic Algorithms
      Search in the space of feasible designs by making a series of random and directed moves.
        Monte Carlo Metropolis: a move consists of exchanging one rotamer for another at a randomly chosen position; a modification is accepted if it lowers the energy, and otherwise with a Boltzmann probability $e^{-\Delta E / kT}$
        Simulated annealing: the temperature is lowered gradually, so that higher-energy neighboring solutions can still be explored in the initial cycles of the search
        Genetic algorithms: a population of models is propagated (evolved) throughout the course of the run, and genetic operators, such as recombination, are used to create new models from existing parents
      They are fast and can be scaled up to problems of large complexity.
      They are not guaranteed to converge to the optimal solution.
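A minimal Python sketch of a Metropolis search with simulated annealing over rotamer assignments is given below. The geometric cooling schedule, the default temperatures and step count, and the toy energy function are all assumptions chosen for illustration; `energy` would normally be a lookup into precomputed self and pair energy tables.

```python
import math
import random


def simulated_annealing(energy, n_positions, n_rotamers,
                        steps=2000, t_start=5.0, t_end=0.01, seed=0):
    """Metropolis Monte Carlo with a geometric cooling schedule.

    energy: callable taking a list of rotamer indices, returning a number.
    Returns the best assignment visited and its energy.
    """
    rng = random.Random(seed)
    state = [rng.randrange(n_rotamers) for _ in range(n_positions)]
    current = energy(state)
    best_state, best = list(state), current
    for step in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (step / (steps - 1))
        # Move: swap the rotamer at one randomly chosen position.
        k = rng.randrange(n_positions)
        old = state[k]
        state[k] = rng.randrange(n_rotamers)
        candidate = energy(state)
        delta = candidate - current
        # Metropolis criterion: always accept downhill moves, accept
        # uphill moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = candidate
            if current < best:
                best_state, best = list(state), current
        else:
            state[k] = old  # reject: restore the previous rotamer
    return best_state, best


if __name__ == "__main__":
    # Toy energy with a single minimum at rotamer 2 for every position.
    toy_energy = lambda s: sum((r - 2) ** 2 for r in s)
    print(simulated_annealing(toy_energy, n_positions=4, n_rotamers=5))
```

Keeping track of the best state visited, rather than only the final one, is a common safeguard because the walk may leave a good solution while the temperature is still high.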
  • 36. The SCHEMA Algorithm
      Equivalent to an in silico directed evolution.
      Consists of scoring libraries of hybrid protein sequences against the parental sequences.
      Scoring:
        Calculate the number of interactions between residues (contacts within 4.5 Å) that are disrupted in the creation of hybrid proteins
        Hybrids are scored for stability by counting the number of disruptions
      The protein is partitioned into blocks that should not be interrupted by crossovers (analogous to genetic algorithms).
      Figure from [Meyer et al., 2006].
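The disruption count can be sketched in Python as follows. The sketch assumes that the parental sequences are already aligned and of equal length, that the contacts are given as index pairs (for instance from a contact map at 4.5 Å), and that a contact counts as disrupted when the residue pair found in the hybrid does not occur at those two positions in any parent. The function name and toy sequences are hypothetical.

```python
def schema_disruption(chimera, parents, contacts):
    """Count the contacts disrupted in a hybrid (chimeric) sequence.

    chimera:  hybrid sequence, aligned to the parents
    parents:  list of aligned parental sequences of equal length
    contacts: iterable of (i, j) residue index pairs in contact
    A contact is disrupted when the residue pair present in the chimera
    at (i, j) is not found at (i, j) in any of the parents.
    """
    disrupted = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        if not any((p[i], p[j]) == pair for p in parents):
            disrupted += 1
    return disrupted


if __name__ == "__main__":
    parents = ["ACDE", "FGHE"]
    contacts = {(0, 2), (1, 3)}
    # Crossover after position 1: first block from parent 1, second from parent 2.
    print(schema_disruption("ACHE", parents, contacts))
```

Crossover points can then be chosen to minimize the average disruption of the library, which yields the blocks mentioned in the slide.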
  • 37. The OPTCOM and IPRO Algorithms for Library Design
      The OPTCOM algorithm:
        Balances size and quality of the library
      The IPRO algorithm:
        Identifies point mutations in the parent sequences using energy-based scoring functions
        Residue and rotamer choices are driven by a mixed-integer linear programming (MILP) formulation
      Figure from [Saraf et al., 2006].
  • 38. Some Web Resources
      IPRO: Iterative Protein Redesign and Optimization. http://maranas.che.psu.edu/IPRO.htm
      EGAD: A Genetic Algorithm for protein Design. http://egad.ucsd.edu/software.php
      RosettaDesign: A software package. http://rosettadesign.med.unc.edu/
      SCHEMA: A pair-wise energy function for scoring protein chimeras made from homologous proteins. http://www.che.caltech.edu/groups/fha/schema-tools/schema-overview.html
      SHARPEN: Systematic Hierarchical Algorithms for Rotamers and Proteins on an Extended Network. http://koko.che.caltech.edu/sharpenabout.html
      WHAT IF: Software for protein modelling, design, validation, and visualisation. http://swift.cmbi.ru.nl/whatif/
      FoldX: A force field for energy calculations and protein design. http://foldx.crg.es/
  • 40. De Novo-Designed Proteins
      In de novo designs, some assumptions are needed in order to make the search space tractable.
      Usually we start from some basic motifs or domains as scaffolds for the design. Examples:
        βαβ motif resembling a zinc finger
        3- and 4-helix bundles
        Helical coiled-coils
      Helix bundle motifs can be parametrized using a few global variables that describe the global structure.
      Applications:
        New metal-binding sites
        Nonbiological cofactors for novel biomaterials and electromechanical devices
        Novel enzymatic activities
  • 41. Example: De Novo Design of a Metalloprotein
      Computational de novo design of a four-helix bundle (108 residues) containing the non-biological cofactor iron diphenyl porphyrin (DPP-Fe) [Bender et al., 2007].
      The initial helix bundle was selected as a low-energy structure computed with MCSA.
      STITCH: a program to select loops connecting helices from the PDB.
      CHARMM and PROCHECK for removing overlaps.
      Select 4 His and 4 Thr residues to support the 6-point coordination of the Fe(III) cations.
      SCADS: provides site-dependent amino acid probabilities in each round.
  • 43. Challenges in Sequence and Structure-Based CPD
      Modeling
        Greater availability of 3D protein structural information
        More accurate energy functions
        Improvement of rigid and flexible docking
      Design
        Improvement in search algorithms
        Parametrization for non-natural amino acids
      Prediction
        Beyond additive models: using machine-learning algorithms
        More complete environment descriptors
  • 45. Bibliography
      Gretchen M. Bender, Andreas Lehmann, Hongling Zou, Hong Cheng, H. Christopher Fry, Don Engel, Michael J. Therien, J. Kent Blasie, Heinrich Roder, Jeffrey G. Saven, and William F. DeGrado. De Novo Design of a Single-Chain Diphenylporphyrin Metalloprotein. Journal of the American Chemical Society, 129(35):10732–10740, September 2007. ISSN 0002-7863. doi: 10.1021/ja071199j. URL http://dx.doi.org/10.1021/ja071199j.
      F. Edward Boas and Pehr B. Harbury. Potential energy functions for protein design. Current Opinion in Structural Biology, 17(2):199–204, April 2007. ISSN 0959-440X. doi: 10.1016/j.sbi.2007.03.006. URL http://dx.doi.org/10.1016/j.sbi.2007.03.006.
      Pablo Carbonell and Antonio del Sol. Methyl side-chain dynamics prediction based on protein structure. Bioinformatics, pages btp463+, July 2009. doi: 10.1093/bioinformatics/btp463. URL http://dx.doi.org/10.1093/bioinformatics/btp463.
      Jean-Loup L. Faulon, Michael J. Collins, and Robert D. Carr. The signature molecular descriptor. 4. Canonizing molecules using extended valence sequences. Journal of Chemical Information and Computer Sciences, 44(2):427–436, 2004. ISSN 0095-2338. doi: 10.1021/ci0341823. URL http://dx.doi.org/10.1021/ci0341823.
      Michelle M. Meyer, Lisa Hochrein, and Frances H. Arnold. Structure-guided SCHEMA recombination of distantly related β-lactamases. Protein Engineering Design and Selection, 19(12):563–570, December 2006. ISSN 1741-0126. doi: 10.1093/protein/gzl045. URL http://dx.doi.org/10.1093/protein/gzl045.
      Diwahar Narasimhan, Mark R. Nance, Daquan Gao, Mei-Chuan Ko, Joanne Macdonald, Patricia Tamburi, Dan Yoon, Donald M. Landry, James H. Woods, Chang-Guo Zhan, John J. G. Tesmer, and Roger K. Sunahara. Structural analysis of thermostabilizing mutations of cocaine esterase. Protein Engineering Design and Selection, 23(7):537–547, July 2010. doi: 10.1093/protein/gzq025. URL http://dx.doi.org/10.1093/protein/gzq025.
      Manish C. Saraf, Gregory L. Moore, Nina M. Goodey, Vania Y. Cao, Stephen J. Benkovic, and Costas D. Maranas. IPRO: an iterative computational protein library redesign and optimization procedure. Biophysical Journal, 90(11):4167–4180, June 2006. ISSN 0006-3495. doi: 10.1529/biophysj.105.079277. URL http://dx.doi.org/10.1529/biophysj.105.079277.
      Jiangning Song, Kazuhiro Takemoto, Hongbin Shen, Hao Tan, Michael M. Gromiha, and Tatsuya Akutsu. Prediction of Protein Folding Rates from Structural Topology and Complex Network Properties. IPSJ Transactions on Bioinformatics, 3:40–53, 2010. doi: 10.2197/ipsjtbio.3.40. URL http://dx.doi.org/10.2197/ipsjtbio.3.40.