



                                                                                                                                                                                 Review

                                                                                                                                                                         pubs.acs.org/CR




Protein Contact Networks: An Emerging Paradigm in Chemistry

L. Di Paola,† M. De Ruvo,‡ P. Paci,‡ D. Santoni,§ and A. Giuliani*,∥

† Faculty of Engineering, Università CAMPUS BioMedico, Via A. del Portillo, 21, 00128 Roma, Italy
‡ BioMathLab, § CNR-Institute of Systems Analysis and Computer Science (IASI), viale Manzoni 30, 00185 Roma, Italy
∥ Environment and Health Department, Istituto Superiore di Sanità, Viale Regina Elena 299, 00161, Roma, Italy

Received: June 11, 2012

© XXXX American Chemical Society | dx.doi.org/10.1021/cr3002356 | Chem. Rev. XXXX, XXX, XXX−XXX

CONTENTS

1. Introduction
2. Graph Theory and Protein Contact Networks
    2.1. Elements of Graph Theory
    2.2. Protein Contact Networks (PCNs)
    2.3. Shortest Paths, Average Path Length, and Diameter
    2.4. Clustering on Graphs
        2.4.1. Spectral Clustering
        2.4.2. Intracluster and Extracluster Parameters
    2.5. Network Centralities
        2.5.1. Path-Based Centralities: Closeness and Between-ness
    2.6. Network Assortativity and Nodes Property Distribution
    2.7. Models of Graphs
        2.7.1. Random Graphs
        2.7.2. Scale-Free Graphs
3. Applications
    3.1. Networks and Interactions
    3.2. Protein Structure Classification
        3.2.1. Modularity in Allosteric Proteins
        3.2.2. Protein Folding
4. Conclusions
Author Information
    Corresponding Author
    Notes
    Biographies
References

1. INTRODUCTION

Topology is at the very heart of chemistry. This stems from the fact that chemical thought, since its prescientific alchemic origins, focused on the mutual relations between different entities expressed in terms of natural numbers instead of continuous quantities. This is the case in the concept of valence (e.g., atomic species A combines with atomic species B in the ratio 1:2 or 2:3) as well as of the periodic table, in which the discrete character of the atoms is implicit in the very structure of a two-entry (period and group) matrix.1 Chemical “primitives” are thus very often relational concepts that are naturally translated into the most widespread topological object of the whole of science: the structural formula, having atomic species as nodes and covalent bonds as edges connecting them. Structural formulas constitute an extremely efficient symbolic language carrying a very peculiar idea of what a structure is. While in physics structures are generally considered as consequences of a force field shaping a continuous space, so that the emerging structures are simply “energetically allowed” configurations in this mainly continuous space, chemistry assigns to a given structure an autonomous meaning by itself and not only as a consequence of an external force field.

The molecular graph (structural formula) relative to a given organic molecule is a condensate of the knowledge relative to that molecule: no other “scientific language” has an information storage and retrieval efficiency comparable to structural formulas. As a matter of fact, they can be used as the sole input for the computation of thousands of chemicophysical descriptors ranging from quantum chemistry to “bulk” properties, like melting point or partition coefficients,2 and the knowledge of the structural formula alone is, in many cases, sufficient to predict the interaction of the molecule with biological systems.3 Descriptors based on bidimensional molecular graphs were demonstrated to outperform on many occasions, as in the prediction of receptor binding, sophisticated three-dimensional models, thus giving another proof of the unique role played by pure topology in chemistry.4 Thus, chemical scholars could safely (and proudly) consider the recent surge of interest in graph-theoretical and, in general, network-based approaches in both physics and biology as nothing particularly novel for them.

Chemistry has already exploited graph theory methods: on the molecular scale, chemical graph theory5,6 has been reducing the topological sketch of molecules to nodes (atoms) and links (chemical bonds) to derive mathematical descriptors of molecular structures, trying to delineate an ontology of molecules and predict their properties on the sole basis of the molecular graph wiring. This method has been applied to derive the chemicophysical properties of alkanes, similarly to other methods that predict properties from group contributions (UNIFAC7 and UNIQUAC8).

Biological chemistry, additionally, poses intriguing issues regarding the analysis of complex kinetic schemes, made up of several chemical reactions with nonlinear kinetic expressions for the corresponding reaction rate (Michaelis−Menten kinetic rate for enzymatic reactions). In this framework, the classical analytical approach to derive the dynamics of reactive systems9 is unsatisfactory, due to the high computational and modeling
Chemical Reviews                                                                                                                                 Review




burden required for a complete kinetic representation. Mathematics and chemistry meet on the common ground of chemical reaction network theory (CRNT), which is explicitly aimed at analyzing complex biochemical reaction networks in terms of their emerging topological features.10−13

Nowadays, many different fields of investigation ranging from systems biology to electrical engineering, sociology, and statistical mechanics converge into the shared operational paradigm of complex network analysis.14 A massive advancement in the elucidation of the general behavior of network systems made possible the generation of brand new graph theoretical descriptors, at both single node and entire graph level, that could be useful in many fields of chemistry.

More specifically, in this review we will deal with protein 3D structures in terms of contact networks between amino acid residues. This case allows for a straightforward formalization in topological terms: the role of nodes (residues) and edges (contacts) is devoid of any ambiguity, and the introduction of van der Waals radii of amino acids allows us to assign a motivated threshold for assigning contacts and building the network.15−17

On the other hand, the Protein Data Bank (PDB) collects thousands of very reliable X-ray-resolved molecular structures, allowing scientists to perform sufficiently populated statistical enquiries to highlight relevant shared properties of protein structures or to go in-depth into specific themes (e.g., topological signatures of allostery), as well as to identify residues potentially crucial for the activity and stability of proteins.

From a purely theoretical point of view, the reduction of a protein structure (that in its full rank corresponds to the three-dimensional coordinates of all the atoms of the molecule) to a binary contact matrix between the α-carbons of the residues represents a dramatic collapse. How many relevant properties of protein 3D structures (and consequently of possible consequences in terms of protein physiological role) are kept alive (and, hopefully, exalted by the filtering out of irrelevant information) by the consideration of a protein as a contact network? How firmly based is the guess that adjacency matrices having amino acid residues as rows and columns (see Figure 1) could in the future play the same role the structural formula plays for organic chemistry? Relying on a single nonambiguous and physically motivated ordering of nodes (the primary structure) dramatically enlarges the realism of contact networks with respect to other kinds of networks (e.g., gene expression correlation networks) for which no such support is possible.

Figure 1. Recoverin 3D structure (left) and corresponding adjacency matrix.

The inter-residue contact network has already been largely explored in terms of inter-residue contact frequencies under the quasichemical approximation;18−20 as a matter of fact, in the seminal work of Miyazawa and Jernigan,18 amino acid hydrophobicity is assessed on the basis of the frequency of contacts of the corresponding residues as emerging from the analysis of a large number of structures.

In this way, residues involved more frequently in noncovalent interactions (mainly of hydrophobic nature, by hypothesis) are considered to be of similar hydrophobic character. Application and confirmation of this view emerge from more recent works,19,21 where the thermal stability of proteins belonging to thermophiles or psychrophiles has been inspected through the inter-residue interaction potential. The main result is that a characteristic distribution of inter-residue interactions is able to provide the protein structure with the required flexibility to adapt to the environment.

The two above referenced works19,21 are, in any case, only a statistical application over a huge number of proteins; what we really want to know is the character of the information about a single and specific molecule that we can derive from its residue contact graph.

A very immediate example of this single molecule information is the fact that protein secondary structure can be reproduced with no errors on the sole basis of an adjacency matrix.22 Similar considerations hold true for protein folding rate,23−26 while normal-mode analysis confirmed that the mean square displacement of highly contacted residues is substantially limited (nearly 20% of maximal movement range27). From another perspective, the presence of highly invariant patterns of graph descriptors shared by all proteins, irrespective of their general shape and size, points to still unknown mesoscopic invariants (formally an analogue to valence considerations) at the very basis of protein-like behavior, for both fibrous and globular structures.28,29 The scope of this review is, by briefly discussing some applications in this rapidly emerging field, to sketch an at least initial answer to the quest for a new “structural formula” language for proteins. This quest will be pursued in the following chapters by presenting side-by-side the different complex network invariants developed by graph theory and their protein counterparts.

2. GRAPH THEORY AND PROTEIN CONTACT NETWORKS

2.1. Elements of Graph Theory

The classic Königsberg bridge problem introduced graph theory in the 18th century. The problem had the following formulation: does there exist a walk crossing each of the seven bridges of Königsberg exactly once? The solution to this problem appeared in “Solutio Problematis ad geometriam situs pertinentis” in 1736 by Euler.30 This was the first time a problem was codified in terms of nodes and edges linking nodes. This structure was called a graph.

A graph G is a mathematical object used to model complex structures; it is made of a finite set of vertices (or nodes) V and a collection of edges E, each connecting two vertices.

A graph G = (V, E) can be represented as a plane figure by drawing a line between two nodes u and v for each edge e = (u, v) ∈ E (Figure 2).

Figure 2. Example of an undirected graph comprising two nodes and an edge.

A graph G = (V, E) can be represented by its adjacency matrix A; given an order of V = {v_1, v_2, ..., v_N}, we define the generic element of the matrix $A_{ij}$ as follows:

$$A_{ij} = \begin{cases} 1 & \text{if } (v_i, v_j) \in E \\ 0 & \text{otherwise} \end{cases}$$

The adjacency matrix of a graph is unique with respect to the chosen ordering of nodes. In the case of proteins, where the ordering of nodes (residues) corresponds to the residue sequence (primary structure), we can state that the corresponding network is unique. This is an extremely strong consequence that establishes a one-to-one correspondence between the molecule and its corresponding graph.

Let v ∈ V be a vertex of a graph G; the neighborhood of v is the set N(v) = {u ∈ V | (u, v) ∈ E}. Two vertices u and v are adjacent, or neighbors, when e = (u, v) ∈ E (u ∈ N(v) or v ∈ N(u)). The degree $k_i$ of the ith node is the number of its neighbors, defined on the basis of the adjacency matrix as

$$k_i = \sum_{j=1}^{N} A_{ij}$$

When $k_i = 0$, the ith node is said to be isolated in G, whereas if $k_i = 1$, it is said to be a leaf of the graph.

Information may be attached to edges; in this case we call the graph weighted and we refer to the weights as “costs”. A weighted graph is defined as G = (V, E, W), where W is a function assigning to each edge of the graph a weight:

$$W: E \to \mathbb{R}$$

The adjacency matrix A of a weighted graph is defined as follows:

$$A_{ij} = \begin{cases} w(v_i, v_j) & \text{if } (v_i, v_j) \in E \\ 0 & \text{otherwise} \end{cases}$$

The degree of a node in a weighted graph is defined as

$$k(v) = \sum_{i=1}^{N} w(u_i, v)$$

where $u_i \in N(v)$.

2.2. Protein Contact Networks (PCNs)

A protein structure is a complex three-dimensional object, formally defined by the coordinates in 3D space of its atoms.31,32 Since the first works on the subject in the early 1960s,33 a large number of protein molecular structures have been resolved and are now accessible in dedicated web databases.34 The wide availability of protein molecular structures has not yet solved many of the issues regarding the strict relationship between structure and function in the protein universe.

Thus, an emerging need in protein science is to define simple descriptors, able to describe each protein structure with few numerical variables, hopefully representative of the functionally relevant properties of the analyzed structure.

Protein structure and function rely on the complex network of inter-residue interactions that intervene in forming and keeping the molecular structure and in the protein biological activity.

Thus, the residue interactions are a good starting point to define the protein interaction network;20,27,35 in this framework, the molecular structure needs to be translated into a simpler picture, cutting out the redundant information embedded in the complete spatial position of all atoms.

The most immediate choice is collapsing each residue into its α-carbon location (hereinafter indicated as Cα): correspondingly, the position of the entire amino acid in the sequence is collapsed into the corresponding Cα.

The spatial position of Cα is still reminiscent of the protein backbone; thus residues that are immediately close in sequence are separated by a length of 3−4 Å, corresponding to the peptide bond length36 (see Figure 3); other α-carbons have a position that recalls the secondary domains and still reproduces, even in a very bare representation, the key features of the three-dimensional structure.

As soon as the complex protein structure architecture has been reduced to a simpler picture in terms of Cα position, the spatial topology can be further reduced to a contact topology that represents the network of inter-residue interactions, primarily responsible for the protein’s three-dimensional structure and activity. Thus, the interaction topology is derived from the spatial distribution of residues in the crystal three-dimensional structure and represents the overall intramolecular potential.

Specifically, starting from the Cα spatial distribution, the distance matrix $d = \{d_{ij}\}$ is computed, the generic element $d_{ij}$ being the Euclidean distance in 3D space between the ith and jth residues (holding the sequence order). The interaction topology is then computed on the basis of d: if the distance $d_{ij}$ falls into a given spatial interval (called the cutoff), a link exists between the ith and the jth residues. The type of graph (unweighted or weighted) is chosen in order to describe a given kind of interaction, in a more or less detailed fashion.
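The construction described in Section 2.2 (Cα coordinates, then the distance matrix d, then the contact network) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `contact_network`, the toy coordinates, and the lower bound of 4 Å are hypothetical choices made for the example (the text only states that a link exists when the distance falls within a given cutoff interval; the 8 Å upper bound follows the Figure 3 caption).

```python
import numpy as np

def contact_network(ca_coords, d_min=4.0, d_max=8.0):
    """Return (d, A, k): distance matrix, adjacency matrix, node degrees.

    ca_coords    : (N, 3) array of C-alpha coordinates in Å, in sequence order.
    d_min, d_max : assumed cutoff interval; a link i-j exists when
                   d_min <= d_ij <= d_max.
    """
    coords = np.asarray(ca_coords, dtype=float)
    # Pairwise Euclidean distances d_ij, computed by broadcasting.
    diff = coords[:, None, :] - coords[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Unweighted adjacency matrix: A_ij = 1 if d_ij lies in the cutoff interval.
    A = ((d >= d_min) & (d <= d_max)).astype(int)
    np.fill_diagonal(A, 0)  # a residue is never in contact with itself
    # Degree of each node: k_i = sum_j A_ij.
    k = A.sum(axis=1)
    return d, A, k

# Toy example: four hypothetical residues on a straight line, 3.8 Å apart.
toy = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [11.4, 0.0, 0.0]]
d, A, k = contact_network(toy)
```

With the assumed lower bound, residues adjacent in sequence (3.8 Å apart in this toy chain) are not counted as contacts, so only the pairs separated by 7.6 Å are linked. Setting `d_min=0.0` would include sequence neighbors as well.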

represented by all of its atoms. Then, the distance matrix is computed over all protein atoms, which are labeled according to the residue they belong to. The strength of the interaction between two residues is measured as the number of their atoms whose distance lies within the cutoff interval.15,39,46−49

Finally, a straightforward way to establish a weighted protein contact network is to take the inverse of the distance between two residues as a direct measurement of their mutual interaction: the closer they lie, the stronger their mutual interaction.50,51
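As a hedged sketch of the inverse-distance weighting, the snippet below turns a distance matrix into a weighted adjacency matrix and sums each row to obtain the weighted degree k(v) defined in Section 2.1. The function name `inverse_distance_weights` and the distance values are hypothetical. Weighting every residue pair (rather than only pairs inside a cutoff) is an assumption, since the text does not say whether a cutoff is also applied.

```python
import numpy as np

def inverse_distance_weights(d):
    """Weighted adjacency matrix with w_ij = 1 / d_ij for i != j and 0 on the diagonal.

    Assumes all off-diagonal distances are strictly positive.
    """
    d = np.asarray(d, dtype=float)
    W = np.zeros_like(d)
    off_diag = ~np.eye(d.shape[0], dtype=bool)
    W[off_diag] = 1.0 / d[off_diag]
    return W

# Hypothetical distance matrix (Å) for three residues.
d = np.array([[0.0, 4.0, 8.0],
              [4.0, 0.0, 5.0],
              [8.0, 5.0, 0.0]])
W = inverse_distance_weights(d)
strength = W.sum(axis=1)  # weighted degree: sum of the weights of each node's edges
```

In this toy case the closest pair (4 Å) receives the largest weight, 0.25, and the farthest pair (8 Å) the smallest, 0.125.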
                                                                                   Another kind of representation is based on the same criterion                    301
                                                                                but adopts as nodes the 20 different amino acids, which are                          302
                                                                                combined through the peptide bond backbone in the protein                           303
                                                                                primary structure. The link between two residues is represented                     304
                                                                                by the number of links the residues of those types establish in                     305
                                                                                the three-dimensional structure, according to the distance                          306
                                                                                matrix d and the cutoff interval, as a rule. This method can be                     307
                                                                                applied on an ensemble of protein structures,43,52 in order to                      308
                                                                                find a common rule of protein structure construction, in terms                       309
                                                                                of more probable contacts between residues.                                         310
                                                                                   This representation, while keeping track of the nature of the                    311
                                                                                interacting residues, destroys the one-to-one correspondence                        312
      Figure 3. Geometry of the peptide bond: the upper threshold of 8 Å,
                                                                                with the original 3D structure, given that different structures                      313
      commonly introduced in PCNs, roughly corresponds to two
      peptide bond lengths.36                                                   can give rise to the same representation, in a way analogous to                     314
                                                                                structural isomerism in organic chemistry. Figure 4 reports                         315 f4

273      The choice of the cutoff interval determines the kind of interactions     the two kinds of formulas.                                                          316
274   included in the analysis.17,37 Most authors15,16,38,39 consider only an      The first emerging property of the PCNs is the degree of the                        317
275   upper threshold (around 8 Å) to cut off negligible interactions;           nodes of the corresponding graph, i.e., the number of links each                    318
276   some others, conversely, also introduce a lower limit, around 4           node (residue) establishes with its neighbors. It is a direct measure               319
277   Å, which corresponds to the average value of the peptide bond             of the connectivity of residues within the interaction                              320
278   length, so as to eliminate the “noise” due to the “obliged” contacts      network and is strictly linked to the tendency of a residue to                      321
279   coming from sequence proximity. In this way, only significant              establish noncovalent interactions with other residues.28,38,40,46,49,54−56         322
280   noncovalent interactions are included in the analysis, with the           The average degree, on the other hand,                                              323
281   purpose of including only those interactions that may be                  is a measure of the overall protein connectivity, which is a rough                  324
282   modified upon slight environmental changes, such as in the case of        index of protein stability.                                                         325
283   biological response to environmental stimuli.                                The contact density of a protein decreases exponentially with                    326
284      Many authors use unweighted graphs to represent                        the number of residues; thus, bigger proteins are much less                         327
285   PCNs,16,35,40−46 in order to infer several properties while               compact than smaller ones, giving rise to bigger cavities and a                     328
286   keeping minimal information. On the other hand, some other                more fuzzy distinction between internal and external                                329
287   groups propose a description for the PCNs using weighted                  milieu.57,58                                                                        330
288   graphs that is based on a side chain level reduction of the whole            The degree distribution defines the graph model, allowing us                      331
289   protein structure. In this case, all the information regarding the        to classify the network into already established network classes                    332
290   spatial position of atoms is kept and the single residue is               endowed with specific features (e.g., random graphs, scale-free                      333




      Figure 4. Graph protein formulas: (a) contact map53 and (b) wheel diagram.27
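
The construction rules discussed in this section can be illustrated with a minimal Python sketch. It assumes one representative point per residue (for instance the α-carbon, which is an assumption of this example), uses the 4 Å and 8 Å thresholds quoted in the text as default values, and builds both the unweighted contact network and the inverse-distance weighted one. The function names and the toy coordinates are hypothetical and serve only as an illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def distance_matrix(coords: Sequence[Point]) -> List[List[float]]:
    """Euclidean distance between every pair of residue positions."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def contact_network(coords: Sequence[Point],
                    lower: float = 4.0,
                    upper: float = 8.0) -> List[List[int]]:
    """Unweighted adjacency matrix: 1 if lower <= d_ij <= upper, else 0."""
    d = distance_matrix(coords)
    n = len(coords)
    return [[1 if i != j and lower <= d[i][j] <= upper else 0
             for j in range(n)] for i in range(n)]


def inverse_distance_network(coords: Sequence[Point]) -> List[List[float]]:
    """Weighted network: the weight of a pair is the inverse of its distance."""
    d = distance_matrix(coords)
    n = len(coords)
    return [[1.0 / d[i][j] if i != j and d[i][j] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def degrees(adjacency: Sequence[Sequence[int]]) -> List[int]:
    """Degree of each node: the number of links it establishes."""
    return [sum(row) for row in adjacency]


# Hypothetical toy chain: four residues spaced 3.8 Å apart along a line.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
with_lower_limit = contact_network(toy_coords)             # 4 Å <= d <= 8 Å
upper_limit_only = contact_network(toy_coords, lower=0.0)  # d <= 8 Å
```

In this toy chain the lower limit removes the contacts between sequence neighbors (3.8 Å apart), so each residue keeps a single contact, whereas the upper threshold alone also retains the "obliged" contacts along the chain.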


334   networks, regular lattices) as we will show in the next
335   paragraphs.
                                                                                $$C = \frac{3 \times (\text{number of triangles in the graph})}{\text{number of connected triples of vertices}} \qquad (2.2)$$          391
      2.3. Shortest Paths, Average Path Length, and Diameter
336   In a graph G, the distance spv,u between any two vertices v,u ∈         where a “triangle” corresponds to three vertices that are each                         392
337   V is given by the length of the shortest path between the               connected to each other and a “connected triple” means a                               393
338   vertices, that is, the minimal number of edges that need to be          vertex that is connected to an (unordered) pair of other                               394
339   crossed to travel from vertex v to vertex u. The shortest path          vertices. The factor of 3 in the numerator accounts for the fact                       395
340   between two vertices is not necessarily unique, since different          that each triangle contributes to three connected triples of                           396
341   paths may exist with identical length. In a graph, if no path           vertices, one for each of its three vertices; thus, the value of C                     397
342   exists connecting two nodes v, u ∈ V, we say that those nodes           lies strictly in the range from zero to one.                                           398
343   belong to different connected components; in such a case, we                With regard to PCNs, the clustering coefficient referred to                           399
344   call the graph disconnected.                                            the ith residue measures the triangles number insisting on                             400
345      All PCNs are connected graphs at first glance. The so-called          it;15,28,29,38,40,45−47,49,56,75 thus, high clustering coefficient nodes                 401
346   “percolation threshold” of a PCN can be estimated as the                are central in communities with a large number of                                      402
347   number of edges to be destroyed in order for the PCN to lose            interconnecting links, corresponding to high local stability. In                       403
348   its connectivity.                                                       other words, we can infer that mutation producing depletion of                         404
349      This concept becomes relevant when we focus on long-range            such nodes may cause dramatic changes in the protein                                   405
350   contacts (i.e., contacts between residues far away on the               structure.29                                                                           406
351   sequence),23,59 which were demonstrated to be of crucial                    2.4.1. Spectral Clustering. The spectral analysis of a graph                       407
352   importance in protein folding rates,23,25 as we will see in             allows one to identify clusters in the network by minimizing the                       408
353   more detail below.                                                      value of parameter Z defined as45                                                       409
354      The diameter diam(G) = max{spv,u|v, u ∈ VG} of a graph is
355   defined as the maximal distance of any pair of vertices. The
356   average or characteristic length l(G) = ⟨spv,u⟩ is defined as the
357   average distance between all pairs of vertices; the average
                                                                                $$Z = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^2 A_{ij}$$
358   inverse path length (efficiency) is defined as eff(G) = ⟨1/spv,u⟩;          where xi and xj represent the position of nodes i and j in the                         410
359   this descriptor is particularly suitable when components are            network and Aij is the adjacency matrix. The minimum of Z                              411
360   disconnected (in this case, the contribution of infinite distances       corresponds to the second smallest eigenvalue of the Laplacian                         412
361   corresponds to zero efficiency).                                          matrix L of {Aij}, also known as the Kirchhoff matrix, defined as                        413
362      The shortest path spv,u between two residues of a PCN
                                                                                $$L = D - A$$
363   represents a molecular shortcut that connects the residues
364   through a mutual interaction pathway. In this sense, the smaller
365   the spu,v, the tighter the relationship between the two nodes,          where D is the degree matrix, which is a diagonal matrix in                            414
366   which are strictly correlated, regardless of their distance in a        which {Dii} = ki. Once L eigenvalues λ are computed, the                               415
367   sequence. These tight relations are thought to be responsible           second smallest eigenvalue λ2 corresponds to the minimum                               416
368   for the allosteric response in protein ligand binding42,60−65 and       value of Z (the first one provides a trivial solution45). The                           417
369   in the concerted motions of distinct protein regions in protein         components of the corresponding eigenvector v2, known as the                           418
370   dynamics.64,66−71                                                       Fiedler eigenvector, refer to single nodes and define two                               419
371      In general, still preliminary evidence from our group (work          clusters depending on the sign of each component. Nodes are                            420
372   in progress) points to the average shortest path as the most            parted into two clusters according to the sign of the                                  421
373   crucial network invariant to link topology to both molecules’           corresponding component in v2. This process can be iterated                            422
374   dynamics and the general thermodynamical properties of the              on both subnetworks until all the components of v2 show the                            423
375   protein molecules.                                                      same sign.                                                                             424
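
The path-based descriptors of section 2.3 (shortest path, diameter, characteristic length, and efficiency) can be sketched as follows for an unweighted graph stored as an adjacency list. Breadth-first search is used here because every edge counts as one step; the function names are hypothetical, and averages are taken over ordered pairs of distinct vertices, which is a convention assumed for this example. Unreachable pairs contribute zero to the efficiency, as described in the text.

```python
import math
from collections import deque
from typing import Dict, List

Graph = Dict[int, List[int]]


def bfs_distances(graph: Graph, source: int) -> Dict[int, int]:
    """Shortest path length (number of edges) from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for u in graph[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def path_descriptors(graph: Graph) -> Dict[str, float]:
    """Diameter, characteristic length, and efficiency of an unweighted graph."""
    nodes = list(graph)
    if len(nodes) < 2:
        raise ValueError("at least two nodes are required")
    n_pairs = len(nodes) * (len(nodes) - 1)  # ordered pairs of distinct nodes
    finite = []        # distances of the reachable pairs
    inverse_sum = 0.0  # unreachable pairs add 1/infinity = 0
    for v in nodes:
        dist = bfs_distances(graph, v)
        for u in nodes:
            if u != v and u in dist:
                finite.append(dist[u])
                inverse_sum += 1.0 / dist[u]
    connected = len(finite) == n_pairs
    return {
        "connected": connected,
        "diameter": max(finite) if connected else math.inf,
        "average_path_length": sum(finite) / n_pairs if connected else math.inf,
        "efficiency": inverse_sum / n_pairs,
    }


# Hypothetical examples: a four-node chain and a graph with an isolated node.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
split = {0: [1], 1: [0], 2: []}
chain_result = path_descriptors(chain)
split_result = path_descriptors(split)
```

For the disconnected example the diameter and the characteristic length are reported as infinite, while the efficiency stays finite, which is why the text recommends it for graphs with several components.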

      2.4. Clustering on Graphs                                                  Identification of clusters in PCNs has a strong impact on                            425
                                                                              detecting structural and functional domains in pro-                                    426
376   Identifying clusters on a network is a more complicated task            teins.50,54,76,77 The presence of folding clusters is a key point                      427
377   than computing the average shortest path. The clustering                in the molecular development of the funnel folding pathways                            428
378   coefficient measures the cliquishness of a typical neighborhood           theory, which provides the most reasonable molecular                                   429
379   (a local property). One possible definition is the follow-               mechanism for protein folding, out of the random approaches                            430
380   ing:29,30,40,72 let us define the clustering coefficient of the ith        of residues, in order to form the favorable inter-residue                              431
381   node Ci as                                                              interaction network, providing stability to the tertiary                               432
                                                                              structure.78,79                                                                        433
         $$C_i = \frac{\text{number of connected neighbor pairs}}{\frac{1}{2} k_i (k_i - 1)} \qquad (2.1)$$
                                                                                 The reliable identification of clusters in the PCNs allows for                       434
                                                                              the definition of descriptors at the single residue level, relying                      435
                                                                              on the PCNs partition structure.                                                       436
383   where ki is the degree of the ith vertex; the average clustering           2.4.2. Intracluster and Extracluster Parameters. Once                               437
384   coefficient C of the graph is the average of Ci values over all           the clustering process is performed, two parameters, zi and Pi,                        438
385   nodes.                                                                  representing the modularity rate for each node,80 can be                               439
386      For social networks, Ci and C have intuitive meanings: Ci            computed. These two parameters are defined as                                           440
387   reflects the extent to which friends of i are also friends of each
388   other; thus, C measures the cliquishness of a typical friendship
389   circle.
390      Another definition for C is73,74
                                                                                $$z_i = \frac{k_{is} - \bar{k}_{s_i}}{\sigma_{s_i}} \qquad (2.3)$$          441


              $$P_i = 1 - \sum_{s=1}^{n_M} \left( \frac{k_{is}}{k_i} \right)^2 \qquad (2.4)$$

     443   where kis is the number of links the ith node establishes with
     444   nodes belonging to its own cluster si; k̅si is the average degree
     445   for nodes in cluster si, σsi is the corresponding standard
     446   deviation, and nM is the number of clusters to which the ith
     447   node belongs. The spectral clustering performs a “crisp”
     448   partition, namely, clusters are disjoint sets of nodes; thus, eq 2.4
     449   becomes

              $$P_i = 1 - \left( \frac{k_{is}}{k_i} \right)^2 \qquad (2.5)$$
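
As an illustration, eqs 2.3 and 2.5 can be evaluated with a short Python sketch once a crisp partition is available. The sketch assumes that the cluster average and standard deviation in eq 2.3 refer to the within-cluster degree, that the population standard deviation is used, and that z is set to zero when all nodes of a cluster have the same within-cluster degree; these choices, the function name, and the toy graph are assumptions of this example.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple

Graph = Dict[int, List[int]]


def z_and_p(graph: Graph, labels: Dict[int, int]) -> Dict[int, Tuple[float, float]]:
    """Return (z_i, P_i) for every node, given a crisp partition (labels)."""
    # k_i: total degree; k_is: links to nodes of the node's own cluster.
    k = {i: len(neighbors) for i, neighbors in graph.items()}
    k_in = {i: sum(1 for j in neighbors if labels[j] == labels[i])
            for i, neighbors in graph.items()}
    result = {}
    for cluster in set(labels.values()):
        members = [i for i in graph if labels[i] == cluster]
        values = [k_in[i] for i in members]
        average = mean(values)
        spread = pstdev(values)  # population standard deviation (assumption)
        for i in members:
            z = (k_in[i] - average) / spread if spread > 0 else 0.0
            p = 1.0 - (k_in[i] / k[i]) ** 2 if k[i] > 0 else 0.0  # eq 2.5
            result[i] = (z, p)
    return result


# Hypothetical toy network: cluster 0 = {0, 1, 2, 3}, cluster 1 = {4, 5}.
toy_graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [3, 5], 5: [4]}
toy_labels = {0: 0, 1: 0, 2: 0, 3: 0, 4: 1, 5: 1}
zp = z_and_p(toy_graph, toy_labels)
```

In the toy network, node 0 has the highest within-cluster degree and therefore the largest z, while nodes 3 and 4, which share the only link between the two clusters, are the only ones with a nonzero P.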

     451      These parameters have been introduced to discriminate
     452   nodes according to their topological role in the so-called
     453   Guimerà and Amaral’s cartography,80 the aim of which is the
     454   classification of nodes in a modular network, relying on intra-
     455   and intermodule connectivities.81
     456      In their seminal work,80 Guimerà and Amaral demonstrated
     457   that the relative importance of each node in maintaining the
     458   global graph connectivity can be traced back to its location in
     459   the P, z plane.
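
The partition on which z and P depend is obtained in this review by spectral clustering (section 2.4.1). A minimal Python sketch of one bipartition step is given below. It approximates the Fiedler eigenvector of L = D − A by power iteration on a shifted Laplacian, a numerical shortcut chosen here only to keep the example free of external libraries; in practice a linear algebra routine would normally be used. The sketch assumes a connected graph with a nondegenerate second eigenvalue, and all function names are hypothetical.

```python
import math
from typing import Dict, List

Graph = Dict[int, List[int]]


def fiedler_vector(graph: Graph, iterations: int = 2000) -> Dict[int, float]:
    """Approximate the eigenvector of L = D - A for the second smallest eigenvalue."""
    nodes = sorted(graph)
    n = len(nodes)
    index = {v: i for i, v in enumerate(nodes)}
    degree = [len(graph[v]) for v in nodes]
    # All Laplacian eigenvalues are <= 2 * max degree, so (shift * I - L) is
    # positive definite and its dominant eigenvectors are those of L with the
    # smallest eigenvalues.
    shift = 2 * max(degree) + 1
    x = [float(i) for i in range(n)]  # arbitrary non-constant starting vector
    for _ in range(iterations):
        # Remove the trivial solution (constant vector, eigenvalue 0).
        avg = sum(x) / n
        x = [value - avg for value in x]
        # y = (shift * I - L) x, with (L x)_i = k_i x_i - sum of neighbor values.
        y = [(shift - degree[i]) * x[i] + sum(x[index[u]] for u in graph[nodes[i]])
             for i in range(n)]
        norm = math.sqrt(sum(value * value for value in y))
        x = [value / norm for value in y]
    avg = sum(x) / n
    return {v: x[index[v]] - avg for v in nodes}


def spectral_bipartition(graph: Graph) -> Dict[int, int]:
    """Assign each node to cluster 0 or 1 according to the sign of its component."""
    vector = fiedler_vector(graph)
    return {v: 0 if value >= 0 else 1 for v, value in vector.items()}


# Hypothetical toy network: two triangles joined by the single edge 2-3.
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
partition = spectral_bipartition(two_triangles)
```

Because the overall sign of an eigenvector is arbitrary, only the grouping of the nodes is meaningful, not which cluster receives the label 0. Iterating the same step on each subnetwork reproduces the recursive scheme described in section 2.4.1.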
     460      Once the network is partitioned into a set of meaningful
     461   communities, it is possible to compute statistics for how
     462   connected each hub (a hub is a node having an extremely high
     463   degree of connectivity) is both within its own community and
     464   to other communities: hubs endowed with strong connections
     465   within functional modules were assumed to be interacting with
     466   their partners at once (party hubs); conversely, those with a
     467   low correlation were assumed to link together multiple modules
     468   (date hubs), playing a global role in the network. It is worth
     469   stressing that although both hub types have similar essentiality
     470   in the network, deleting given hubs increases the characteristic          Figure 5. P vs z plot (dentist’s chair) (a) for a single protein, where
     471   path length and the network begins to disintegrate, since                 each point identifies a residue, and (b) for the superposition of the
     472   hubs provide the coordination between functional modules. To              structure analyses of 1420 proteins.58
     473   make a comparison, party hubs should correspond to Guimerà
     474   “provincial hubs”, which have many links within their module
                                                                                        The strong invariance of the P, z portraits of PCNs                                499
     475   but few outside, whereas date hubs could be “nonhub
                                                                                     irrespective of both protein general shape and size is extremely                      500
     476   connectors” or “connector hubs”, both of which have links to
                                                                                     intriguing, given it suggests the existence of still hidden                           501
     477   several different modules; they could also fall into the “kinless”
                                                                                     mesoscopic principles of protein structures analogous to                              502
     478   roles, since very few nodes are actually found in these
                                                                                     valence rules in general chemistry.                                                   503
     479   categories.82 Considering network motifs, it was observed
     480   that party hub network motifs control a local topological                 2.5. Network Centralities
     481   structure and stay together inside protein complexes, at a lower          The centrality of a node deals with its topological features in                       504
     482   level of the network. On the other hand, date hub network                 network wiring. The term “central” stems from the origin of                           505
     483   motifs control the global topological structure and act as the            this concept in the definition of key, central indeed, nodes of                        506
     484   connectors among signal pathways, at a high level of the                  social networks: people, in other words, that are responsible for                     507
     485   network. Network motifs should not be merely considered as a              the stability and activity of the network. This “social science”                      508
     486   connection pattern derived from topological structures but also           origin of the concept of centrality was found to have a                               509
     487   as functional elements organizing the modules for biological              correlation in PCNs by Csermely.83                                                    510
     488   processes.67                                                                 Centrality can be computed in different ways, using different                        511
     489      Spectral clustering of PCNs produces characteristic P−z                weights to evaluate and compare the importance of a node                              512
     490   diagrams, referred to as “dentist’s chair”, due to their                  (degree, clustering coefficient, for instance). They are almost                         513
     491   shape.28,29,58 This shape is strongly invariant with respect to           equivalent definitions that point to the same attitude of central                      514
f5   492   the protein molecule, as shown in Figure 5; panel a refers to a           nodes, to establish strong local interactions, in their own local                     515
     493   typical diagram derived from the analysis of a single protein             community, able to stabilize the whole network structure.                             516
     494   structure, while panel b shows the superposition of the structure            The role of central nodes in modifying the network structure                       517
     495   analyses of 1420 proteins. The fact that the general shape of the         according to their centrality values is the starting point to define                   518
     496   graph remains substantially invariant on going from one to                the property of centrality−lethality,81,84 which emerges as a key                     519
     497   1420 proteins is an impressive proof of the robustness of the P,          element in the analysis of biological networks, where central                         520
     498   z organization of PCNs.                                                   nodes represent a prerequisite for the organism survival: for                         521


522   instance, a shortage or a depletion of a central protein in a
523   protein−protein interaction network does lead to the death of
524   the organism.85
                                                                                $$\mathrm{betw}(s) = \sum_{v \in V,\, v \neq s} \; \sum_{u \in V,\, u \neq s} \frac{\sigma_{v,u}(s)}{\sigma_{v,u}}$$
525     Central nodes in PCNs correspond to residues crucial for
                                                                                  In biological networks (e.g., protein−protein interaction                           580
526   both the protein structure folding and stability. Thus, the
                                                                               network), the nodes with higher between-ness were demon-                               581
527   centrality of a node can be a measure of the biological                  strated to be the main regulators.84,88,89                                             582
528   consequences of its mutation; for instance, the highly                      In PCNs, since the between-ness centrality is based on                              583
529   detrimental mutation of hemoglobin that causes sickle cell               shortest paths, it is immediately clear that this index is                             584
530   anemia is due to just a substitution of one residue (glutamic            strongly linked to the centrality of nodes (residues) in terms of                      585
531   acid is replaced in position 6 by valine) that produces dramatic         their capability to transfer signals throughout the protein                            586
532   changes in the protein structure and function. This effect is             molecule.43,56,72,86 Thus, the depletion of residues having high                       587
533   widely reflected in the high centrality value of this specific             between-ness centrality values is supposed to interrupt the                            588
534   residue.29                                                               allosteric communication among regions of the proteins that lie                        589
535     The easiest and the most natural way to define centrality is            far apart.                                                                             590
536   provided by the so-called degree centrality that simply counts
                                                                               2.6. Network Assortativity and Nodes Property
537   the number of connections for each node, its degree, i.e., the
                                                                               Distribution                                                                           591
538   number of nodes it is directly connected with; in this case, the                                 90
539   degree centrality of a node vi corresponds to its degree. Hubs           Newman suggested that an important driving factor in the                               592
540   are, thus, the central nodes of a network, according to this             formation of communities was the preference of nodes to                                593
541   paradigm.                                                                connect to other nodes that possess similar characteristics; he                        594
542      2.5.1. Path-Based Centralities: Closeness and Between-ness.           defined this behavior as assortativity. The concept of                                  595
543   Closeness centrality, as well as between-ness,                           assortativity is a very general one, so in the case of protein                         596
544   belongs to the class of shortest path-based centrality measures.         structures, we could identify the “behavior” of different residues                      597

545   The closeness centrality provides information about how close            in terms of their hydrophobic/hydrophilic character so that an                         598

546   a node is to all other nodes. The closeness of a node vi is              assortative structure will correspond to a network in which                            599

547   defined as                                                                similar hydrophobicity residues will be preferentially in contact                      600
                                                                               with each other compared to what is expected by pure chance.                           601
         $$c(v_i) = \frac{1}{\sum_{j=1}^{n} sp_{i,j}}$$
                                                                               In this example, the “behavior” of the nodes corresponds to a                          602
                                                                               feature (hydrophobicity) independent of the pure network                               603
                                                                               wiring and can be equated to a “coloring” of the nodes, whose                          604
                                                                               relations with the underlying topological support constituted by                       605
548      The closeness centrality is connected to the aptitude of a            network wiring is investigated. Along similar lines, we could                          606
549   node to participate in the signal transmission throughout the            think of assortative social networks in which friends (nodes in                        607
550   protein structure. High closeness centrality nodes were                  direct contact) tend to share the same political ideas, income                         608
551   demonstrated to correspond to residues located in the active             classes, or professional activities. On a different heading, we can                     609
552   site of ligand-binding proteins or to evolutionary conserved             think of assortativity as an “internal” description of network                         610
553   residues.41,52,72,86,87                                                  wiring in which nodes are defined in terms of their connection                          611
554      It is worth noting that closeness centrality, unlike                  patterns. Actually, in some networks high-degree nodes                                 612
555   degree centrality, is not solely based on local features of the          preferentially connect to other high-degree nodes (assortative                         613
556   network but takes into account the location of the node in the           networks), whereas in other types of networks high-degree                              614
557   global context of the network it is embedded in. In this                 nodes connect to low-degree nodes (disassortative networks);                           615
558   respect, closeness, as well as between-ness, are genuine systemic        in particular, numerical evidence from experimental data has                           616
559   properties that are computed at the single node level, thus              shown that many biological networks exhibit a negative                                 617
560   establishing a “top-down” causative process. This is probably            assortativity coefficient and are therefore claimed to be                                618
561   the reason for the efficiency of this kind of network invariants           examples of disassortative mixing.40,91                                                619
562   to single out relevant general properties of the proteins.                  Assortativity r is defined as the Pearson correlation                                620
563      This is formally analogous to what happens in basic                   coefficient of degrees at either ends of an edge, and it varies                          621
564   chemistry, where the properties (e.g., acidity, electronegativity)       as −1 ≤ r ≤ 1;92 r is a very simple measure of the probability of                      622
565   of the hydrogen atom in the CH4 molecule are different from               a high-degree node to form edges with other high-degree                                623
566   those of the hydrogen atoms in H2O or H2 molecules, because              nodes. When the r value is close to 1, the network is referred                         624
567   of the general molecular context they are embedded into.                 to as assortative, whereas values of r close to −1 are                                 625
568      This is the same philosophy of single node (residue)                  characteristic of disassortative networks. Random graphs are                           626
569   descriptors, implicitly taking into account the whole context            purely nonassortative networks, since by definition, links                              627
570   and so overcoming a purely reductionist view.                            between nodes, in this case,                                                           628
571      Between-ness measures the ability of a vertex to monitor
572   communication between other vertices; every vertex that is part
573   of a shortest path between two other vertices can monitor and
                                                                                  $k(v) = \sum_{i=1}^{N} w(u_i, v)$
574   influence communication between them. In this view, a vertex
575   is central if many shortest paths connecting any two other               are placed at random.                                                                  629
576   nodes cross it. Let σv,u denote the number of shortest paths                In the case of external “coloring” assortativity, the index r,                      630
577   between two vertices v, u ∈ V and let σv,u(s), where s ∈ V, be           instead of being computed on the node degrees, can be                                  631
578   the number of shortest paths between v and u crossing s;                 computed over the feature of interest, the one used to “color”                         632
579   trivially σv,u ≥ σv,u(s).                                                the nodes.                                                                             633
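
As an illustration of the definition above, the assortativity index r can be sketched as the Pearson correlation between the values found at the two ends of every edge. The Python sketch below is not taken from the cited works: the function name `assortativity` and the toy graphs are hypothetical. When no node feature is supplied, node degrees are used (the "internal" description); passing a dictionary of node values corresponds to the external "coloring" case.

```python
def assortativity(edges, value=None):
    """Pearson correlation of a node value taken at both ends of each edge.

    edges: list of (u, v) pairs of an undirected graph.
    value: optional dict node -> number; defaults to the node degree.
    """
    if value is None:
        value = {}
        for u, v in edges:
            value[u] = value.get(u, 0) + 1
            value[v] = value.get(v, 0) + 1
    # Each undirected edge is read in both directions, so r is symmetric.
    xs, ys = [], []
    for u, v in edges:
        xs.extend((value[u], value[v]))
        ys.extend((value[v], value[u]))
    mean = sum(xs) / len(xs)  # xs and ys share the same mean and variance
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return cov / var if var else float("nan")  # undefined if all values are equal


# A star: one hub joined to five leaves (high degree meets low degree).
star = [(0, i) for i in range(1, 6)]
r_star = assortativity(star)

# A triangle plus an isolated pair: every edge joins nodes of equal degree.
triangle_and_pair = [(0, 1), (1, 2), (0, 2), (3, 4)]
r_equal = assortativity(triangle_and_pair)

# External "coloring": two triangles joined by one bridge, colored 1 and 0.
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
color = {0: 1.0, 1: 1.0, 2: 1.0, 3: 0.0, 4: 0.0, 5: 0.0}
r_color = assortativity(two_triangles, color)
```

In this toy setting the star is fully disassortative (r = −1), the second graph is fully assortative (r = 1), and the colored graph gives a positive r because most edges join nodes of the same color.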

                                                                           G                           dx.doi.org/10.1021/cr3002356 | Chem. Rev. XXXX, XXX, XXX−XXX
Chemical Reviews                                                                                                                             Review

634      Thus, in a recent work,57 Di Paola et al. demonstrated the             One is called Gn,m and is the set of all graphs consisting of n                   687
635   lack of any clearly defined “hydrophobic core” in proteins, for          vertices and m edges, and it is built by throwing down m edges                      688
636   which the arrangement of fractal structures was demonstrated            between vertex pairs chosen at random from n initially                              689
637   not to have a clear-cut separation between internal and external        unconnected vertices.                                                               690
638   milieu by means of network assortativity measures based on the            The other is called Gn,p and it is the set of all graphs                          691
639   hydrophobicity of nodes. Moreover, the presence of both                 consisting of n vertices, where each pair is connected together                     692
640   assortative and disassortative structuring (hydrophobic−hydro-          with independent probability p. In order to generate a graph                        693
641   phobic and hydrophilic−hydrophobic) in proteins highlighted             sampled uniformly at random from the set Gn,p, initially                            694
642   the presence of different “folding logic” simultaneously present         unconnected vertices are taken and each pair of them is joined                      695
643   in the protein world, probably as a consequence of the varying          with an edge with probability p (1 − p being the probability of                     696
644   relevance of hydrophobic and electronic forces in the folding           being unconnected). Thus, the presence or absence of an edge                        697
645   process.                                                                between two vertices is independent of the presence or absence                      698
646      Generally speaking, the distribution of a given feature of the       of any other edge, so that each edge may be considered to be                        699
647   nodes can be explored through the combined definition of                 present with independent probability p. The two models are                          700
648   diadicity and heterophilicity,93 measuring the tendency of              essentially equivalent in the limit of a large number of nodes n.                   701
649   nodes with similar properties to form links. Given a key                Since Gn,p is somewhat simpler to work with than Gn,m, it is                        702
650   physical property, if nodes tend to establish links                     usual to refer to it as a random graph Gn,p.                                        703
651   preferentially with similar nodes, the network is called                  A vertex in a random graph is connected with equal                                704
652   dyadic, otherwise it is said to be antidyadic or heterophilic.93        probability p to each of the N − 1 other vertices in the graph,                     705
653      Let n1 and n0 respectively denote the numbers of nodes                and hence, the probability pk that it has degree k is given by the                  706
654   possessing or not a specific property; e11 and e10 are the numbers       binomial distribution                                                               707
655   of edges connecting homologous and heterologous nodes,
656   respectively. The heterophilicity score H is then defined as                $p_k = \binom{N}{k} p^k (1 - p)^{N - k}$                                   (2.10)     708
657      $H = \dfrac{e_{10}}{e_{10,r}}$                                 (2.6)       Noting that the average degree of a vertex in the network is z =                    709
                                                                              (N − 1)p, we can also write this as                                                 710
658   where e10,r is the random value in case of uniform distribution
659   of the property among nodes that depends on the number of                   $p_k = \dfrac{(N-1)!}{k!\,(N-1-k)!}\,\dfrac{z^k}{(N-1)^k}\left(1 - \dfrac{z}{N-1}\right)^{N-k} \simeq \dfrac{z^k e^{-z}}{k!}$   (2.11)     711
660   possible edges E = N(N − 1)/2, N = n1 + n0 being the number
661   of nodes:
662      $e_{10,r} = E\,n_1 (N - n_1)$                                  (2.7)
663   Analogously, for the homologous contacts, the                            where the second equality becomes exact as N → ∞; in this case, pk                  712
664   dyadicity D is defined as                                               corresponds to the bell-shaped curve that peaks on the average                      713
665      $D = \dfrac{e_{11}}{e_{11,r}}$                                 (2.8)       value (Figure 6b).                                                                  714 f6
666   and the corresponding value for random homologous nodes is
667      $e_{11,r} = E\,\dfrac{n_1 (n_1 - 1)}{2}$                       (2.9)
668      Thus, dyadic networks have D values larger than 1 and, on
669   the other hand, H values lower than unity.
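
The dyadicity and heterophilicity scores can be illustrated on a toy graph. The sketch below is hypothetical and rests on one explicit assumption: the random expectations e11,r and e10,r are scaled by the observed edge density (number of edges divided by the number of possible edges E), so that D = H = 1 for a property spread at random. This normalisation is not spelled out in the text above, and the function name `dyadicity_heterophilicity` is invented for illustration.

```python
def dyadicity_heterophilicity(edges, has_property):
    """Return (D, H) for an undirected graph and a binary node property.

    edges: list of (u, v) pairs.
    has_property: dict node -> bool, covering every node of the graph.
    """
    n_nodes = len(has_property)
    n1 = sum(1 for flag in has_property.values() if flag)
    # Observed homologous (1-1) and heterologous (1-0) edges.
    e11 = sum(1 for u, v in edges if has_property[u] and has_property[v])
    e10 = sum(1 for u, v in edges if has_property[u] != has_property[v])
    possible_edges = n_nodes * (n_nodes - 1) / 2       # E in the text
    density = len(edges) / possible_edges              # assumed scaling factor
    e11_random = density * n1 * (n1 - 1) / 2
    e10_random = density * n1 * (n_nodes - n1)
    return e11 / e11_random, e10 / e10_random


# Two triangles joined by a single bridge; nodes 0-2 carry the property
# (for instance, hydrophobic residues), nodes 3-5 do not.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
hydrophobic = {0: True, 1: True, 2: True, 3: False, 4: False, 5: False}
D, H = dyadicity_heterophilicity(edges, hydrophobic)
```

Here the nodes carrying the property are wired mostly to each other, so the sketch yields D larger than 1 and H lower than 1, the signature of a dyadic network.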
670      The above-described network invariants provide a descrip-
671   tion that can be traced back to the single node of a network, but       Figure 6. Random graph: (a) a sample picture, where most nodes have
672   the effective values of the descriptors strongly depend on the           three or four links, and (b) the bell-shaped degree distribution.
673   general wiring architecture of the whole graph, again a systemic
674   top-down causation metric. The dyadic character of PCNs was
675   exploited by Alves and colleagues94 to define simple hydro-                Random graphs have been employed extensively as models                            715

676   phobicity scores to profile protein structure. Single residue            of real-world networks of various types, particularly in                            716

677   hydrophobicity was demonstrated to be strongly correlated               epidemiology,74 where the spreading of a disease through a                          717

678   with the corresponding network invariants:56 these systemic             community strongly depends on the pattern of contacts                               718

679   properties strictly depend upon the “general class” the specific         between infected subjects and those susceptible to it.                              719

680   graph pertains to. Below we will briefly present the main classes          However, as a model of a real-world network, a random                             720

681   of wiring architectures.                                                graph has some serious shortcomings. Perhaps the most serious                       721
                                                                              one is its degree distribution, which is quite unlike those seen in                 722
      2.7. Models of Graphs                                                   most real-world networks.92 On the other hand, the random                           723
682      2.7.1. Random Graphs. One of the simplest and oldest                 graph has many desirable properties; specifically, many of its                       724
683   network models is the random graph model,95 which was                   properties can be calculated exactly.92                                             725
684   introduced by Solomonoff and Rapoport96 and studied                        The random graph model has been applied to PCNs to test                           726
685   extensively by Erdős and Rényi;97−99 according to their               their connectivity (degree) distribution.48,75 Specifically, the                     727
686   works, there are two different random graph models.                      protein dynamic properties have been explored in terms of                           728


     729   random graphs, since the unbiased corresponding network                    all nodes, and for γ = 2, a hub network emerges, with the largest                   772
     730   dynamics can be put into the perspective of the random                     hubs being in contact with a large fraction of all nodes.107 In                     773
     731   evolution of the protein structure, due to random, Brownian                general, the unusual properties of scale-free networks are valid                    774
     732   motion of protein segments to get up to the final, stable                   only for γ < 3, such as a high degree of robustness against                         775
     733   conformation.100                                                           accidental node failures.85 For γ > 3, however, most unusual                        776
     734      Further, the generic random graph model is introduced as a              features are absent, and in many respects, the scale-free network                   777
     735   reference to test the property of the network as a specialized             behaves like a random one.85 As for the World Wide Web,                             778
     736   random graph (small-world network).15,35,38,40,41,43,47,49,72,101−103              Barabási105 found that the value of γ for incoming links was                       779
     737   This comparison has a tight                                                approximately 2; this means that a node is roughly                                  780
     738   link with the common assumption of the random coil structure               4 times more likely to have a given number of incoming                              781
     739   as being a reference state in folding thermodynamics: the                  links than twice that number.                                                       782
     740   random coil has the corresponding translation in terms of a                   Different from a Poisson degree distribution of random                            783
     741   “graph formula” into the random graph model that represents a              networks, a power law distribution does not have a peak, but it                     784
     742   random network of residue interactions, corresponding to a                 is described by a continuously decreasing function (Figure 8):                      785 f8
     743   random distribution of the inter-residue distance.
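
The contrast between the two degree distributions can be made concrete with a short numerical sketch. It evaluates the Poisson form of eq 2.11 and a power law with exponent γ on a finite range of degrees. The cutoff `k_max` used to fix the proportionality constant α, the value z = 4.5, and the function names are arbitrary assumptions chosen for illustration.

```python
from math import exp, factorial


def poisson_distribution(z, k_max):
    """p_k = z^k e^(-z) / k! for k = 0..k_max (random graph, large N)."""
    return [z ** k * exp(-z) / factorial(k) for k in range(k_max + 1)]


def power_law_distribution(gamma, k_max):
    """p_k = alpha * k^(-gamma) for k = 1..k_max.

    alpha is fixed so that the truncated distribution sums to one;
    the truncation at k_max is an assumption of this sketch.
    """
    weights = [k ** -gamma for k in range(1, k_max + 1)]
    alpha = 1.0 / sum(weights)
    return [alpha * w for w in weights]  # list index i corresponds to k = i + 1


poisson = poisson_distribution(z=4.5, k_max=30)
peak_degree = max(range(len(poisson)), key=poisson.__getitem__)

power = power_law_distribution(gamma=2, k_max=1000)
monotonic = all(a > b for a, b in zip(power, power[1:]))

# With gamma = 2, halving the degree multiplies the probability by 2^2 = 4.
ratio = power[10 - 1] / power[20 - 1]
```

The Poisson curve has a peak close to the average degree z, whereas the power law decreases continuously and has no typical degree; the last line reproduces the factor of about 4 between the probabilities of k and 2k links for γ = 2.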
     744      In their work,104 Bartoli and colleagues demonstrated that PCNs
     745   are very far from random graph behavior; this was particularly
     746   evident when they projected simulated networks together with
     747   real PCNs in the bidimensional space, spanned by the
     748   clustering coefficient and characteristic path length (see Figure
f7   749   7).
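
The two coordinates of this plane can be computed directly from an adjacency structure. The sketch below is a minimal, hypothetical illustration and does not reproduce the procedure of ref 104: it builds a regular ring lattice and a random graph Gn,p with the same average degree, then evaluates the clustering coefficient and the characteristic path length of each. All function names, the graph size, and the random seed are assumptions.

```python
import random
from collections import deque
from itertools import combinations


def ring_lattice(n, k):
    """Regular graph: each node is linked to its k nearest neighbours on a ring."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj


def random_graph(n, p, seed=0):
    """G(n, p): every pair of nodes is joined independently with probability p."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u, v in combinations(range(n), 2):
        if rng.random() < p:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def clustering_coefficient(adj):
    """Average fraction of linked neighbour pairs (0 for nodes of degree < 2)."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k >= 2:
            links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
            total += 2 * links / (k * (k - 1))
    return total / len(adj)


def characteristic_path_length(adj):
    """Mean shortest-path length over all connected ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("nan")


n, k = 20, 4
regular = ring_lattice(n, k)
rand = random_graph(n, p=k / (n - 1))  # same expected average degree
c_regular, l_regular = clustering_coefficient(regular), characteristic_path_length(regular)
c_random, l_random = clustering_coefficient(rand), characteristic_path_length(rand)
```

Plotting such (C, l) pairs for regular lattices, random graphs, and real contact networks gives the kind of projection described above.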




                                                                                      Figure 8. Scale-free networks: (a) a sample scale-free network, in
                                                                                      which few nodes have many links, and (b) the power-law degree
                                                                                      distribution of the scale-free graph.106


                                                                                      in this case, it is evident that a specific characteristic average                   786
                                                                                      degree does not exist; in other words, these networks do not                        787
                                                                                      converge toward a characteristic degree, at increasing number                       788
           Figure 7. Characteristic path length vs Clustering Coefficient (Figure       of nodes. On the contrary, in scale-free networks, the average                      789
           3 in ref 104): sample protein classes are labeled as CA#, the label        degree progressively increases with sampling dimension,                             790
           “random” refers to collection of random graphs, whereas “regular”
           points to periodic lattices.
                                                                                      because the (very rare) high-degree nodes are sampled with a                        791
                                                                                      higher probability. The lack of a characteristic degree is the                      792
                                                                                      basis of the name “scale free” for this kind of                                     793
     750      The authors demonstrated that the differences between random             architecture.                                                                       794
     751   graphs and contact maps derive from the existence of the                      This is in strong contrast to random networks, for which the                     795
     752   covalent backbone, which imposes very strict constraints on the            degree of all nodes is in the vicinity of the average degree,                       796
     753   contacts that can be established between residues. This feature            which could be considered typical. However, as Barabási and                         797
     754   makes PCNs more similar to the so-called scale-free graphs.                colleagues wrote,107 scale-free networks could easily be called                     798
     755      2.7.2. Scale-Free Graphs. For many years after the                      scale-rich as well, as their main feature is the coexistence of                     799
     756   seminal work of Erdős and Rényi,97 all complex networks were             nodes of widely different degrees (scales), ranging from nodes                       800
     757   commonly treated as random graphs. This paradigm was                       with one or two links to major hubs.                                                801
     758   overturned by the pioneering work of Barabási,105 in which the              In contrast to the democratic distribution of links typical of                   802
     759   topology of the World Wide Web was studied, formerly                       random networks, power laws describe systems in which few                           803
     760   thought to show a bell-shaped degree distribution, as in the case          hubs dominate:105 networks that are characterized by a power-                       804
     761   of random graphs.                                                          law degree distribution are highly nonuniform, most of the                          805
     762      Instead, by counting how many Web pages have exactly k                  nodes having only a few links. Only few nodes with a very large                     806
     763   links the authors showed that the distribution followed a so-              number of links, which are often called hubs, hold these nodes                      807
     764   called power law, namely, the probability that any node is                 together.                                                                           808
     765   connected to k other nodes is                                                 A key feature of many complex systems is their robustness,                       809

               $p_k = \alpha k^{-\gamma}$                                                   which refers to the system’s ability to respond to changes in the                   810
                                                                                      external conditions or internal organization while maintaining                      811
     766   where γ is the degree exponent and α is the proportionality                relatively normal behavior.107 In a random network, disabling a                     812
     767   constant. The value of γ determines many properties of the                 substantial number of nodes will result in an inevitable                            813
     768   system. The smaller the value of γ, the more important the role            functional disintegration of a network, breaking the network                        814
     769   of the hubs is in the network. Whereas for γ > 3 the hubs are              into isolated node clusters.107                                                     815
     770   not relevant, for 2 < γ < 3 there is a hierarchy of hubs, with the            Scale-free networks do not have a critical threshold for                         816
     771   most connected hubs being in contact with a small fraction of              disintegration (percolation threshold108): they are amazingly                       817


     818   robust against accidental failures: even if 80% of randomly                 follows modular topological organization; this assumption has                       861
     819   selected nodes fail, the remaining 20% still form a compact                 been applied to biological neural networks, showing that the                        862
     820   cluster with a path connecting any two nodes.107 This is                    dynamic behavior of neural networks might be coordinated                            863
     821   because random failure is likely to affect mainly the numerous              through different topological features,111 such as network                           864
     822   low-degree nodes, whose removal does not disrupt the                        modularity and the presence of central hub nodes. A similar                         865
     823   network's integrity.85 This reliance on hubs, on the other hand,            topology/dynamics relation seems to hold for contact                                866
     824   induces a so-called attack vulnerability: the removal of a few key          networks, too. As a matter of fact, allosteric “hot spots”,65 where                 867
     825   hubs splinters the system into small isolated node clusters.85              the motion is generalized from a local excitation to the entire                     868
     826      Scale-free architecture can exhibit the so-called “small-world           protein structure, correspond to central residue contacts, which                    869
     827   property”.38,104 The small-world model has its roots in the                 were demonstrated to be crucial for efficient allosteric                              870
     828   observation that many real-world networks show the following                communications.41,59,66,72,86                                                       871
     829   two properties: (i) the small-world effect (i.e., small average                 The field of relations between molecular dynamics                                 872
     830   shortest path length) and (ii) high clustering or transitivity,             trajectories and topological contact network description is a                       873
     831   meaning that there is a heightened probability that two vertices            very important avenue of research in protein science.62−64,66,70                    874
     832   will be connected directly to one another if they have another
     833   neighboring vertex in common.                                               3. APPLICATIONS
     834      The former property is quantified by the characteristic path              3.1. Networks and Interactions
     835   length (or average shortest path) l of the graph, while the
     836   second property is computed as the clustering coefficient C.                  It is well-known that proteins interact among themselves and                        875
     837   Thus, small-world effect means that the average shortest path in             with other molecules to perform their biological functions;69                       876
     838   the network scales logarithmically with graph size73,109,110                crucial factors in all interactions are the shape and chemical                      877
                                                                                       properties of the pockets located on protein surfaces, which                        878
               $l \propto \log(N)$                                                      show high affinity to binding sites. In a recent work,112 the                         879
                                                                                       analysis of topological properties of the pocket similarity                         880
     839   where N is the number of nodes.                                             network demonstrated that highly connected pockets (hubs)                           881
     840      PCNs were analyzed as for their scale-free properties, in                generate similar concavity patterns on different protein surfaces.                   882
     841   order to identify crucial binding sites.43,59 The small-world               These similarities go hand-in-hand with similar biological                          883
     842   behavior of protein structure networks was shown for the first               functions that imply similar pockets.112 In addition, they found                    884
     843   time by Vendruscolo et al.43 and later confirmed in several                  that maximum connected components in the pocket similarity                          885
     844   works.38,75 As we stressed before, it was shown that small-                 networks have a small-world and scale-free scaling. The analysis                    886
     845   world behavior of an inter-residue contact graph is conditioned             of the physicochemical features of hub pockets leads to the                         887
     846   by the backbone connectivity.104                                            investigation of more functional implications from the similarity                   888
     847      According to both refs 59 and 104, PCNs are not “pure small-world”       network model, which provided new insights into structural                          889
     848   networks, given that no explicit hub is present, so they must be            genomics and have great potential for applications in functional                    890
     849   considered as “a class of network in its own”, generated by the             genomics.113 The future purpose is to develop a classification                       891
     850   very peculiar constraint to maintain a continuous (covalent)                method to divide similar pockets into small groups and                              892
     851   backbone joining the nodes in a fixed sequence.59,104                        afterward to compile this evolutionary information into a library                   893
     852      Nevertheless, the most important feature of small-world                  of functional templates.                                                            894
     853   architecture, i.e., the presence of shortcuts allowing for an                  This work delineates a possible link between network wiring                      895
     854   efficient signal transmission at long distance, is present in PCNs            and common function of utmost interest for the development                          896
     855   and it is the very basis of their physiological role (allostery,            of contact-based meaningful formulas. By briefly describing                          897
f9   856   dynamical properties, folding rate, etc.) (Figure 9).                       direct translation of graph theoretical descriptors into mean-                      898
     857      In this respect, it is relevant to go more in-depth into the link        ingful protein functional properties, we gave a proof-of-concept                    899
     858   existing between a given topology and the dynamical behavior it             of the general relevance of the proposed formalism. Now we                          900
     859   can host. As a matter of fact, according to a pattern-based                 will go more in-depth into some of these topology−function                          901
     860   computational approach,111 modular dynamic organization                     relations, but a leading leitmotiv can be already stated: the                       902
                                                                                       structure−function link passes through a topological bottle-                        903
                                                                                       neck, the contact network, that allows for a consistent and very                    904
                                                                                       efficient formalism to be applied to the study of macro-                              905
                                                                                       molecules.                                                                          906
                                                                                       3.2. Protein Structure Classification
                                                                                       Proteins can be considered as modular geometric objects                             907
                                                                                       composed of blocks, so allowing for a peptide-fragment-based                        908
                                                                                       partition.114 For instance, it is well-known that globular                          909
                                                                                       proteins are made up of regular secondary structures (α-helices                     910
                                                                                       and β-strands) and nonregular secondary regions, called loops,                      911
                                                                                       that join regular secondary structures and lack the regularity of                   912
                                                                                       torsion angles for consecutive residues; actually, many families                    913
                                                                                       of proteins evolved to perform multiple functions, with                             914
                                                                                       variations in loop regions on a relatively conserved secondary                      915
           Figure 9. An example of a small-world network: most nodes are linked        structure framework. Considering this, Tendulkar et al.114                          916
           only to their immediate neighbors, while few edges generate shortcuts       developed an unconventional scheme of loops and secondary                           917
           between distant regions of the network.                                     structure classification: the clustering of the peptide fragments                    918

dx.doi.org/10.1021/cr3002356 | Chem. Rev. XXXX, XXX, XXX−XXX

PNEUMOTHORAX AND ITS MANAGEMENTS.pdf
 
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptx
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptxSYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptx
SYNDESMOTIC INJURY- ANATOMICAL REPAIR.pptx
 

Protein Contact Networks: An Emerging Paradigm in Chemistry

  • 1. Review | pubs.acs.org/CR

Protein Contact Networks: An Emerging Paradigm in Chemistry

L. Di Paola,† M. De Ruvo,‡ P. Paci,‡ D. Santoni,§ and A. Giuliani*,∥

† Faculty of Engineering, Università CAMPUS BioMedico, Via A. del Portillo, 21, 00128 Roma, Italy
‡ BioMathLab, § CNR-Institute of Systems Analysis and Computer Science (IASI), viale Manzoni 30, 00185 Roma, Italy
∥ Environment and Health Department, Istituto Superiore di Sanità, Viale Regina Elena 299, 00161, Roma, Italy

Received: June 11, 2012
© XXXX American Chemical Society | dx.doi.org/10.1021/cr3002356 | Chem. Rev. XXXX, XXX, XXX−XXX

CONTENTS
1. Introduction
2. Graph Theory and Protein Contact Networks
2.1. Elements of Graph Theory
2.2. Protein Contact Networks (PCNs)
2.3. Shortest Paths, Average Path Length, and Diameter
2.4. Clustering on Graphs
2.4.1. Spectral Clustering
2.4.2. Intracluster and Extracluster Parameters
2.5. Network Centralities
2.5.1. Path-Based Centralities: Closeness and Between-ness
2.6. Network Assortativity and Nodes Property Distribution
2.7. Models of Graphs
2.7.1. Random Graphs
2.7.2. Scale-Free Graphs
3. Applications
3.1. Networks and Interactions
3.2. Protein Structure Classification
3.2.1. Modularity in Allosteric Proteins
3.2.2. Protein Folding
4. Conclusions
Author Information
Corresponding Author
Notes
Biographies
References

1. INTRODUCTION

Topology is at the very heart of chemistry. This stems from the fact that chemical thought, since its prescientific alchemical origins, has focused on the mutual relations between different entities expressed in terms of natural numbers instead of continuous quantities. This is the case in the concept of valence (e.g., atomic species A combines with atomic species B in the ratio 1:2 or 2:3) as well as of the periodic table, in which the discrete character of the atoms is implicit in the very structure of a two-entry (period and group) matrix [1]. Chemical “primitives” are thus very often relational concepts that are naturally translated into the most widespread topological object of the whole of science: the structural formula, having atomic species as nodes and covalent bonds as edges connecting them. Structural formulas constitute an extremely efficient symbolic language carrying a very peculiar idea of what a structure is. While in physics structures are generally considered as consequences of a force field shaping a continuous space, so that the emerging structures are simply “energetically allowed” configurations in this mainly continuous space, chemistry assigns to a given structure an autonomous meaning by itself and not only as a consequence of an external force field.

The molecular graph (structural formula) of a given organic molecule is a condensate of the knowledge relative to that molecule: no other “scientific language” has an information storage and retrieval efficiency comparable to structural formulas. As a matter of fact, they can be used as the sole input for the computation of thousands of chemicophysical descriptors ranging from quantum chemistry to “bulk” properties, like melting point or partition coefficients [2], and the knowledge of the structural formula alone is, in many cases, sufficient to predict the interaction of the molecule with biological systems [3]. Descriptors based on two-dimensional molecular graphs were demonstrated to outperform sophisticated three-dimensional models on many occasions, as in the prediction of receptor binding, thus giving another proof of the unique role played by pure topology in chemistry [4]. Thus, chemical scholars could safely (and proudly) consider the recent surge of interest in graph-theoretical and, in general, network-based approaches in both physics and biology as nothing particularly novel for them.

Chemistry has already exploited graph theory methods: on the molecular scale, chemical graph theory [5,6] has been reducing the topological sketch of molecules to nodes (atoms) and links (chemical bonds) in order to derive mathematical descriptors of molecular structures, trying to delineate an ontology of molecules and predict their properties on the sole basis of the molecular graph wiring. This method has been applied to derive the chemicophysical properties of alkanes, similarly to other methods that predict properties from group contributions (UNIFAC [7] and UNIQUAC [8]).

Biological chemistry, additionally, poses intriguing issues regarding the analysis of complex kinetic schemes, made up of several chemical reactions with nonlinear kinetic expressions for the corresponding reaction rates (e.g., the Michaelis−Menten rate for enzymatic reactions). In this framework, the classical analytical approach to derive the dynamics of reactive systems [9] is unsatisfactory, due to the high computational and modeling
  • 2. burden required for a complete kinetic representation. Mathematics and chemistry meet on the common ground of chemical reaction network theory (CRNT), which is explicitly aimed at analyzing complex biochemical reaction networks in terms of their emerging topological features [10−13].

Figure 1. Recoverin 3D structure (left) and corresponding adjacency matrix.

Nowadays, many different fields of investigation, ranging from systems biology to electrical engineering, sociology, and statistical mechanics, converge into the shared operational paradigm of complex network analysis [14]. A massive advancement in the elucidation of the general behavior of network systems made possible the generation of brand new graph-theoretical descriptors, at both the single-node and entire-graph level, that could be useful in many fields of chemistry.

More specifically, in this review we will deal with protein 3D structures in terms of contact networks between amino acid residues. This case allows for a straightforward formalization in topological terms: the role of nodes (residues) and edges (contacts) is devoid of any ambiguity, and the introduction of van der Waals radii of amino acids allows us to assign a motivated threshold for assigning contacts and building the network [15−17].

On the other hand, the Protein Data Bank (PDB) collects thousands of very reliable X-ray-resolved molecular structures, allowing scientists to perform sufficiently populated statistical enquiries to highlight relevant shared properties of protein structures or to go in depth into specific themes (e.g., topological signatures of allostery), as well as to identify residues potentially crucial for the activity and stability of proteins.

From a purely theoretical point of view, the reduction of a protein structure (which in its full rank corresponds to the three-dimensional coordinates of all the atoms of the molecule) to a binary contact matrix between the α-carbons of the residues represents a dramatic collapse. How many relevant properties of protein 3D structures (and consequently of their possible consequences in terms of protein physiological role) are kept alive (and, hopefully, enhanced by the filtering out of irrelevant information) by the consideration of a protein as a contact network? How firmly based is the guess that adjacency matrices having amino acid residues as rows and columns (see Figure 1) could in the future play the same role the structural formula plays for organic chemistry? Relying on a single unambiguous and physically motivated ordering of nodes (the primary structure) dramatically enlarges the realism of contact networks with respect to other kinds of networks (e.g., gene expression correlation networks) for which no such support is possible.

The inter-residue contact network has already been largely explored in terms of inter-residue contact frequencies under the quasichemical approximation [18−20]; as a matter of fact, in the seminal work of Miyazawa and Jernigan [18], amino acid hydrophobicity is assessed on the basis of the frequency of contacts of the corresponding residues as emerging from the analysis of a large number of structures.

In this way, residues involved more frequently in noncovalent interactions (mainly of hydrophobic nature, by hypothesis) are taken to be of similar hydrophobic character. Application and confirmation of this view emerge from more recent works [19,21], where the thermal stability of proteins belonging to thermophiles or psychrophiles has been inspected through the inter-residue interaction potential. The main result is that a characteristic distribution of inter-residue contacts is able to provide the protein structure with the flexibility required to adapt to the environment.

The two works referenced above [19,21] are, in any case, only a statistical application over a huge number of proteins; what we really want to know is the kind of information about a single, specific molecule that we can derive from its residue contact graph.

A very immediate example of this single-molecule information is the fact that protein secondary structure can be reproduced with no errors on the sole basis of an adjacency matrix [22]. Similar considerations hold true for protein folding rate [23−26], while normal-mode analysis confirmed that the mean square displacement of highly contacted residues is substantially limited (nearly 20% of the maximal movement range [27]). From another perspective, the presence of highly invariant patterns of graph descriptors shared by all proteins, irrespective of their general shape and size, points to still unknown mesoscopic invariants (formally analogous to valence considerations) at the very basis of protein-like behavior, for both fibrous and globular structures [28,29]. The scope of this review is, by briefly discussing some applications in this rapidly emerging field, to sketch an at least initial answer to the quest for a new “structural formula” language for proteins. This quest will be pursued in the following sections by presenting side by side the different complex network invariants developed by graph theory and their protein counterparts.
  • 3. 2. GRAPH THEORY AND PROTEIN CONTACT NETWORKS

2.1. Elements of Graph Theory

The classic Königsberg bridge problem introduced graph theory in the 18th century. The problem had the following formulation: does there exist a walk crossing each of the seven bridges of Königsberg exactly once? The solution to this problem appeared in “Solutio problematis ad geometriam situs pertinentis” in 1736 by Euler [30]. This was the first time a problem was codified in terms of nodes and edges linking nodes. This structure was called a graph.

A graph G is a mathematical object used to model complex structures; it is made of a finite set of vertices (or nodes) V and a collection of edges E, each connecting two vertices.

A graph G = (V, E) can be represented as a plane figure by drawing a line between two nodes u and v whenever there is an edge e = (u, v) ∈ E (Figure 2).

Figure 2. Example of an undirected graph comprising two nodes and an edge.

A graph G = (V, E) can also be represented by its adjacency matrix A; given an ordering of V = {v_1, v_2, ..., v_n}, we define the generic element of the matrix as follows:

$$A_{ij} = \begin{cases} 1 & \text{if } (v_i, v_j) \in E \\ 0 & \text{otherwise} \end{cases}$$

The adjacency matrix of a graph is unique with respect to the chosen ordering of nodes. In the case of proteins, where the ordering of nodes (residues) corresponds to the residue sequence (primary structure), we can state that the corresponding network is unique. This is an extremely strong consequence that establishes a one-to-one correspondence between the molecule and its corresponding graph.

Let v ∈ V be a vertex of a graph G; the neighborhood of v is the set $N(v) = \{u \in V \mid (u, v) \in E\}$. Two vertices u and v are adjacent, or neighbors, when e = (u, v) ∈ E (u ∈ N(v) or v ∈ N(u)). The degree $k_i$ of the ith node is the number of its neighbors, defined on the basis of the adjacency matrix as

$$k_i = \sum_{j=1}^{N} A_{ij}$$

When $k_i = 0$, the ith node is said to be isolated in G, whereas if $k_i = 1$, it is said to be a leaf of the graph.

Information may be attached to edges; in this case we call the graph weighted and we refer to the weights as “costs”. A weighted graph is defined as G = (V, E, W), where W is a function assigning a weight to each edge of the graph:

$$W: E \rightarrow \mathbb{R}$$

The adjacency matrix A of a weighted graph is defined as follows:

$$A_{ij} = \begin{cases} w(v_i, v_j) & \text{if } (v_i, v_j) \in E \\ 0 & \text{otherwise} \end{cases}$$

The degree of a node in a weighted graph is defined as

$$k(v) = \sum_{i=1}^{N} w(u_i, v)$$

where $u_i \in N(v)$.

2.2. Protein Contact Networks (PCNs)

A protein structure is a complex three-dimensional object, formally defined by the coordinates in 3D space of its atoms [31,32]. Since the first works on the subject in the early 1960s [33], a large number of protein molecular structures have been resolved and are now accessible in dedicated web databases [34]. The wide availability of protein molecular structures has not yet solved many of the issues regarding the strict relationship between structure and function in the protein universe.

Thus, an emerging need in protein science is to define simple descriptors, able to describe each protein structure with a few numerical variables that are, hopefully, representative of the functionally relevant properties of the analyzed structure.

Protein structure and function rely on the complex network of inter-residue interactions that intervene in forming and keeping the molecular structure and in the protein's biological activity.

Thus, residue interactions are a good starting point to define the protein interaction network [20,27,35]; in this framework, the molecular structure needs to be translated into a simpler picture, cutting out the redundant information embedded in the complete spatial position of all atoms.

The most immediate choice is collapsing each residue into its α-carbon location (hereinafter indicated as Cα): correspondingly, the position of the entire amino acid in the sequence is collapsed into the corresponding Cα.

The spatial position of Cα is still reminiscent of the protein backbone; thus residues that are immediately adjacent in sequence are separated by a length of 3−4 Å, corresponding to the peptide bond length [36] (see Figure 3); other α-carbons have a position that recalls the secondary domains and still reproduces, even in a very bare representation, the key features of the three-dimensional structure.

Once the complex architecture of the protein structure has been reduced to a simpler picture in terms of Cα positions, the spatial topology can be further reduced to a contact topology that represents the network of inter-residue interactions, primarily responsible for the protein’s three-dimensional structure and activity. Thus, the interaction topology is derived from the spatial distribution of residues in the crystal three-dimensional structure and represents the overall intramolecular potential.

Specifically, starting from the Cα spatial distribution, the distance matrix $d = \{d_{ij}\}$ is computed, the generic element $d_{ij}$ being the Euclidean distance in 3D space between the ith and jth residues (kept in sequence order). The interaction topology is then computed on the basis of d: if the distance $d_{ij}$ falls into a given spatial interval (called the cutoff), a link exists between the ith and the jth residues. The type of graph (unweighted or weighted) is chosen in order to describe a given kind of interaction in a more or less detailed fashion.
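The construction just described (Cα coordinates, distance matrix, cutoff, adjacency matrix, degree) can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure of any cited work: the function names `contact_network` and `degrees` are hypothetical, the four coordinates are made up, and the 4 Å and 8 Å bounds are only illustrative values for the cutoff interval.

```python
import math


def contact_network(ca_coords, lower=4.0, upper=8.0):
    """Return the unweighted adjacency matrix of a contact network.

    ca_coords: list of (x, y, z) C-alpha coordinates in angstroms,
               kept in sequence order.
    lower, upper: bounds of the cutoff interval in angstroms (assumed
                  values); residues i and j are linked when
                  lower <= d_ij <= upper.
    """
    n = len(ca_coords)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # d_ij: Euclidean distance between the two C-alpha atoms
            d_ij = math.dist(ca_coords[i], ca_coords[j])
            if lower <= d_ij <= upper:
                adjacency[i][j] = adjacency[j][i] = 1
    return adjacency


def degrees(adjacency):
    """Degree k_i of every node: the row sums of the adjacency matrix."""
    return [sum(row) for row in adjacency]


# Toy example with four made-up C-alpha positions (not a real protein).
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (7.6, 5.0, 0.0)]
adjacency = contact_network(coords)
k = degrees(adjacency)
```

In this toy chain the sequence neighbors sit 3.8 Å apart and are therefore excluded by the lower bound, while the pairs at 5.0, about 6.3, and 7.6 Å become links. Setting `lower=0.0` gives the upper-threshold-only variant.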
  • 4. Figure 3. Geometry of the peptide bond: the upper threshold of 8 Å, commonly introduced in PCNs, roughly corresponds to two peptide bond lengths [36].

The choice of the cutoff interval determines the kind of interactions included in the analysis [17,37]. Most authors [15,16,38,39] consider only an upper threshold (around 8 Å) to cut off negligible interactions; some others, conversely, also introduce a lower limit, around 4 Å, that corresponds to the average value of the peptide bond length, so as to eliminate the “noise” due to the “obliged” contacts coming from sequence proximity. In this way, only significant noncovalent interactions are included in the analysis, with the purpose of including only those interactions that may be modified upon slight environmental changes, as in the case of the biological response to environmental stimuli.

Many authors use unweighted graphs to represent PCNs [16,35,40−46], in order to infer several properties while keeping minimal information. On the other hand, some other groups propose a description of PCNs using weighted graphs, based on a side-chain-level reduction of the whole protein structure. In this case, all the information regarding the spatial position of atoms is kept and the single residue is represented by all of its atoms. Then, the distance matrix is computed over all protein atoms, which are labeled according to the residue they belong to. The strength of the interaction between two residues is measured as the number of their atoms whose distance lies within the cutoff interval [15,39,46−49].

Finally, a straightforward way to establish a weighted protein contact network is to take the inverse of the distance between two residues as a direct measurement of their mutual interaction: the closer they lie, the stronger their mutual interaction [50,51].

Another kind of representation is based on the same criterion but adopts as nodes the 20 different amino acids, which are combined through the peptide bond backbone in the protein primary structure. The link between two residue types is weighted by the number of links that residues of those types establish in the three-dimensional structure, according to the distance matrix d and the cutoff interval. This method can be applied to an ensemble of protein structures [43,52], in order to find a common rule of protein structure construction in terms of the more probable contacts between residues.

This representation, while keeping track of the nature of the interacting residues, destroys the one-to-one correspondence with the original 3D structure, given that different structures can give rise to the same representation, in a way analogous to structural isomerism in organic chemistry. Figure 4 reports the two kinds of formulas.

Figure 4. Graph protein formulas: (a) contact map [53] and (b) wheel diagram [27].

The first emerging property of PCNs is the degree of the nodes of the corresponding graph, i.e., the number of links each node (residue) establishes with its neighbors. It is a direct measure of the connectivity of residues within the interaction network, and it is strictly linked to the tendency of a residue to establish noncovalent interactions with other residues [28,38,40,46,49,54−56]. The average degree, on the other hand, is a measure of the overall protein connectivity, which is a rough index of protein stability.

The contact density of a protein decreases exponentially with the number of residues; thus, bigger proteins are much less compact than smaller ones, giving rise to bigger cavities and a fuzzier distinction between internal and external milieu [57,58].

The degree distribution defines the graph model, allowing us to classify the network into already established network classes endowed with specific features (e.g., random graphs, scale-free
  • 5. networks, regular lattices), as we will show in the next paragraphs.

2.3. Shortest Paths, Average Path Length, and Diameter

In a graph G, the distance $sp_{v,u}$ between any two vertices v, u ∈ V is given by the length of the shortest path between the vertices, that is, the minimal number of edges that need to be crossed to travel from vertex v to vertex u. The shortest path between two vertices is not necessarily unique, since different paths may exist with identical length. If no path exists connecting two nodes v, u ∈ V, we say that those nodes belong to different connected components; in such a case, we call the graph disconnected.

All PCNs are connected graphs at first glance. The so-called “percolation threshold” of a PCN can be estimated as the number of edges to be destroyed in order for the PCN to lose its connectivity.

This concept becomes relevant when we focus on long-range contacts (i.e., contacts between residues far apart in the sequence) [23,59], which were demonstrated to be of crucial importance for protein folding rates [23,25], as we will see in more detail below.

The diameter $\mathrm{diam}(G) = \max\{sp_{v,u} \mid v, u \in V\}$ of a graph is defined as the maximal distance between any pair of vertices. The average or characteristic length $l(G) = \langle sp_{v,u} \rangle$ is defined as the average distance between all pairs of vertices; the average inverse path length (efficiency) is defined as $\mathrm{eff}(G) = \langle 1/sp_{v,u} \rangle$. The latter descriptor is particularly suitable when components are disconnected (in this case, infinite distances contribute zero efficiency).

The shortest path $sp_{v,u}$ between two residues of a PCN represents a molecular shortcut that connects the residues through a mutual interaction pathway. In this sense, the smaller $sp_{v,u}$, the tighter the relationship between the two nodes, which are strictly correlated regardless of their distance in the sequence. These tight relations are thought to be responsible for the allosteric response in protein ligand binding [42,60−65] and for the concerted motions of distinct protein regions in protein dynamics [64,66−71].

In general, still preliminary evidence from our group (work in progress) points to the average shortest path as the most crucial network invariant to link topology to both the dynamics and the general thermodynamic properties of protein molecules.

2.4. Clustering on Graphs

Identifying clusters on a network is a more complicated task than computing the average shortest path. The clustering coefficient measures the cliquishness of a typical neighborhood (a local property). One possible definition is the following [29,30,40,72]: let us define the clustering coefficient $C_i$ of the ith node as

$$C_i = \frac{\text{the number of connected neighbor pairs}}{\frac{1}{2} k_i (k_i - 1)} \tag{2.1}$$

where $k_i$ is the degree of the ith vertex; the average clustering coefficient C of the graph is the average of the $C_i$ values over all nodes.

For social networks, $C_i$ and C have intuitive meanings: $C_i$ reflects the extent to which friends of i are also friends of each other; thus, C measures the cliquishness of a typical friendship circle.

Another definition for C is [73,74]

$$C = \frac{3 \times (\text{the number of triangles on a graph})}{\text{the number of connected triples of vertices}} \tag{2.2}$$

where a “triangle” corresponds to three vertices that are each connected to each other and a “connected triple” means a vertex that is connected to an (unordered) pair of other vertices. The factor of 3 in the numerator accounts for the fact that each triangle contributes to three connected triples of vertices, one for each of its three vertices; thus, the value of C lies in the range from zero to one.

With regard to PCNs, the clustering coefficient of the ith residue measures the number of triangles that include it [15,28,29,38,40,45−47,49,56,75]; thus, nodes with a high clustering coefficient are central in communities with a large number of interconnecting links, corresponding to high local stability. In other words, we can infer that mutations depleting such nodes may cause dramatic changes in the protein structure [29].

2.4.1. Spectral Clustering. The spectral analysis of a graph allows one to identify clusters in the network by minimizing the value of the parameter Z, defined as [45]

$$Z = \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} (x_i - x_j)^2 A_{ij}$$

where $x_i$ and $x_j$ represent the positions of nodes i and j in the network and $A_{ij}$ is the adjacency matrix. The minimum of Z corresponds to the second smallest eigenvalue of the Laplacian matrix L of $\{A_{ij}\}$, also known as the Kirchhoff matrix, defined as

$$L = D - A$$

where D is the degree matrix, a diagonal matrix in which $D_{ii} = k_i$. Once the eigenvalues λ of L are computed, the second smallest eigenvalue $\lambda_2$ corresponds to the minimum value of Z (the first one provides a trivial solution [45]). The components of the corresponding eigenvector $v_2$, known as the Fiedler eigenvector, refer to single nodes: nodes are split into two clusters according to the sign of the corresponding component in $v_2$. This process can be iterated on both subnetworks until all the components of $v_2$ show the same sign.

Identification of clusters in PCNs has a strong impact on detecting structural and functional domains in proteins [50,54,76,77]. The presence of folding clusters is a key point in the molecular development of the funnel folding pathways theory, which provides the most reasonable molecular mechanism for protein folding: out of the random approaches of residues, the favorable inter-residue interaction network forms, providing stability to the tertiary structure [78,79].

The reliable identification of clusters in PCNs allows for the definition of descriptors at the single-residue level, relying on the partition structure of the PCN.

2.4.2. Intracluster and Extracluster Parameters. Once the clustering process is performed, two parameters, $z_i$ and $P_i$, representing the modularity rate for each node [80], can be computed. These two parameters are defined as

$$z_i = \frac{k_{is} - \bar{k}_{s_i}}{\sigma_{s_i}} \tag{2.3}$$
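Looking back at Sections 2.4 and 2.4.1, the local clustering coefficient of eq 2.1 and one step of the sign-based spectral bisection can be sketched as follows. This is a hedged illustration under stated assumptions, not the implementation used in the cited works: the function names are hypothetical, the graph is assumed to be connected and unweighted, a shifted power iteration stands in for a full eigensolver to approximate the Fiedler eigenvector, and nodes with fewer than two neighbors are assigned $C_i = 0$ by convention.

```python
import math


def clustering_coefficient(adjacency, i):
    """Eq 2.1: connected neighbor pairs of node i over k_i (k_i - 1) / 2."""
    neighbors = [j for j, a in enumerate(adjacency[i]) if a]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than two neighbors
    linked = sum(
        1
        for p in range(k)
        for q in range(p + 1, k)
        if adjacency[neighbors[p]][neighbors[q]]
    )
    return linked / (k * (k - 1) / 2)


def fiedler_bisection(adjacency, iterations=500):
    """Split nodes by the sign of an approximate Fiedler vector of L = D - A.

    Assumes a connected graph with at least one edge.
    """
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    shift = 2 * max(degree)  # upper bound on the largest eigenvalue of L
    x = [float(i) for i in range(n)]  # arbitrary deterministic start vector
    for _ in range(iterations):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # project out the trivial constant solution
        # y = (shift * I - L) x, using (L x)_i = k_i x_i - sum_j A_ij x_j
        y = [
            shift * x[i] - degree[i] * x[i]
            + sum(adjacency[i][j] * x[j] for j in range(n))
            for i in range(n)
        ]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    positive = {i for i in range(n) if x[i] >= 0}
    return positive, set(range(n)) - positive


# Toy graph (made up): two triangles, 0-1-2 and 3-4-5, joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A_demo = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A_demo[u][v] = A_demo[v][u] = 1

clusters = fiedler_bisection(A_demo)
c0 = clustering_coefficient(A_demo, 0)
c2 = clustering_coefficient(A_demo, 2)
```

On this toy graph the bisection separates the two triangles, and the bridging node 2 has a lower clustering coefficient than node 0 because only one of its three neighbor pairs is linked. For real contact networks a library eigensolver would be the more robust choice; the power iteration is used here only to keep the sketch dependency-free.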
  • 6.

$$P_i = 1 - \sum_{s=1}^{n_M} \left( \frac{k_{is}}{k_i} \right)^2 \tag{2.4}$$

where $k_{is}$ is the number of links the ith node establishes with nodes belonging to its own cluster $s_i$, $\bar{k}_{s_i}$ is the average degree for nodes in cluster $s_i$, $\sigma_{s_i}$ is the corresponding standard deviation, and $n_M$ is the number of clusters to which the ith node belongs. Spectral clustering performs a “crisp” partition, namely, clusters are disjoint sets of nodes; thus eq 2.4 becomes

$$P_i = 1 - \left( \frac{k_{is}}{k_i} \right)^2 \tag{2.5}$$

These parameters have been introduced to discriminate nodes according to their topological role in the so-called Guimerà and Amaral cartography [80], the aim of which is the classification of nodes in a modular network, relying on intra- and intermodule connectivities [81].

In their seminal work [80], Guimerà and Amaral demonstrated that the relative importance of each node in maintaining the global graph connectivity can be traced back to its location in the P, z plane.

Once the network is partitioned into a set of meaningful communities, it is possible to compute statistics for how connected each hub (a hub is a node having an extremely high degree of connectivity) is, both within its own community and to other communities: hubs endowed with strong connections within functional modules were assumed to be interacting with their partners at once (party hubs); conversely, those with a low correlation were assumed to link together multiple modules (date hubs), playing a global role in the network. It is worth stressing that, although both hub types are similarly essential to the network, deleting hubs increases the characteristic path length and the network begins to disintegrate, since hubs provide the coordination between functional modules. To make a comparison, party hubs should correspond to Guimerà's “provincial hubs”, which have many links within their module but few outside, whereas date hubs could be “nonhub connectors” or “connector hubs”, both of which have links to several different modules; they could also fall into the “kinless” roles, since very few nodes are actually found in these categories [82]. Considering network motifs, it was observed that party hub network motifs control a local topological structure and stay together inside protein complexes, at a lower level of the network. On the other hand, date hub network motifs control the global topological structure and act as the connectors among signal pathways, at a higher level of the network. Network motifs should not be considered merely as connection patterns derived from topological structures but also as functional elements organizing the modules for biological processes [67].

Spectral clustering of PCNs produces characteristic P−z diagrams, referred to as the “dentist’s chair” due to their shape [28,29,58]. This shape is strongly invariant with respect to the protein molecule, as shown in Figure 5; panel a refers to a typical diagram derived from the analysis of a single protein structure, while panel b shows the superposition of the structure analyses of 1420 proteins. The fact that the general shape of the graph remains substantially invariant on going from one to 1420 proteins is an impressive proof of the robustness of the P, z organization of PCNs.

Figure 5. P vs z plot (dentist’s chair) (a) for a single protein, where each point identifies a residue, and (b) for the superposition of the structure analyses of 1420 proteins [58].

The strong invariance of the P, z portraits of PCNs, irrespective of both general protein shape and size, is extremely intriguing, given that it suggests the existence of still hidden mesoscopic principles of protein structures analogous to valence rules in general chemistry.

2.5. Network Centralities

The centrality of a node deals with its topological features in the network wiring. The term “central” stems from the origin of this concept in the definition of the key (indeed, central) nodes of social networks: in other words, the people who are responsible for the stability and activity of the network. This “social science” origin of the concept of centrality was found to have a counterpart in PCNs by Csermely [83].

Centrality can be computed in different ways, using different weights to evaluate and compare the importance of a node (degree or clustering coefficient, for instance). These are almost equivalent definitions that point to the same tendency of central nodes to establish strong local interactions, within their own local community, that are able to stabilize the whole network structure.

The role of central nodes in modifying the network structure according to their centrality values is the starting point to define the property of centrality−lethality [81,84], which emerges as a key element in the analysis of biological networks, where central nodes represent a prerequisite for the survival of the organism: for
instance, a shortage or a depletion of a central protein in a protein−protein interaction network does lead to the death of the organism.85

Central nodes in PCNs correspond to residues crucial for both the protein structure folding and stability. Thus, the centrality of a node can be a measure of the biological consequences of its mutation; for instance, the highly detrimental mutation of hemoglobin that causes sickle cell anemia is due to the substitution of just one residue (glutamic acid is replaced in position 6 by valine) that produces dramatic changes in the protein structure and function. This effect is widely reflected in the high centrality value of this specific residue.29

The easiest and the most natural way to define centrality is provided by the so-called degree centrality, which simply counts the number of connections for each node, its degree, i.e., the number of nodes it is directly connected with; in this case, the degree centrality of a node $v_i$ corresponds to its degree. Hubs are, thus, the central nodes of a network, according to this paradigm.

2.5.1. Path-Based Centralities: Closeness and Between-ness. Closeness centrality, as well as between-ness, belongs to the class of shortest path-based centrality measures. The closeness centrality provides information about how close a node is to all other nodes. The closeness of a node $v_i$ is defined as

\[ c(v_i) = \frac{1}{\sum_{j=1}^{n} sp_{i,j}} \]

where $sp_{i,j}$ denotes the length of the shortest path between nodes $v_i$ and $v_j$.

The closeness centrality is connected to the aptitude of a node to participate in the signal transmission throughout the protein structure. High closeness centrality nodes were demonstrated to correspond to residues located in the active site of ligand-binding proteins or to evolutionarily conserved residues.41,52,72,86,87

It is worth noting that closeness centrality, at odds with degree centrality, is not solely based on local features of the network but takes into account the location of the node in the global context of the network it is embedded into. In this respect, closeness, as well as between-ness, are genuine systemic properties that are computed at the single node level, thus establishing a "top-down" causative process. This is probably the reason for the efficiency of this kind of network invariant in singling out relevant general properties of the proteins.

This is formally analogous to what happens in basic chemistry, where the properties (e.g., acidity, electronegativity) of the hydrogen atom in the CH4 molecule are different from those of the hydrogen atoms in H2O or H2 molecules, because of the general molecular context they are embedded into.

This is the same philosophy of single node (residue) descriptors, implicitly taking into account the whole context and so overcoming a purely reductionist view.

Between-ness measures the ability of a vertex to monitor communication between other vertices; every vertex that is part of a shortest path between two other vertices can monitor and influence communication between them. In this view, a vertex is central if lots of shortest paths connecting any two other nodes cross it. Let $\sigma_{v,u}$ denote the number of shortest paths between two vertices $v, u \in V$ and let $\sigma_{v,u}(s)$, where $s \in V$, be the number of shortest paths between $v$ and $u$ crossing $s$; trivially $\sigma_{v,u} \geq \sigma_{v,u}(s)$. The between-ness of $s$ is then

\[ \mathrm{betw}(s) = \sum_{v \in V,\, v \neq s} \; \sum_{u \in V,\, u \neq s} \frac{\sigma_{v,u}(s)}{\sigma_{v,u}} \]

In biological networks (e.g., protein−protein interaction networks), the nodes with higher between-ness were demonstrated to be the main regulators.84,88,89

In PCNs, since the between-ness centrality is based on shortest paths, it is immediately clear that this index is strongly linked to the centrality of nodes (residues) in terms of their capability to transfer signals throughout the protein molecule.43,56,72,86 Thus, the depletion of residues having high between-ness centrality values is expected to interrupt the allosteric communication among regions of the protein that lie far apart.

2.6. Network Assortativity and Nodes Property Distribution

Newman90 suggested that an important driving factor in the formation of communities was the preference of nodes to connect to other nodes that possess similar characteristics; he defined this behavior as assortativity. The concept of assortativity is a very general one, so in the case of protein structures, we could identify the "behavior" of different residues in terms of their hydrophobic/hydrophilic character so that an assortative structure will correspond to a network in which residues of similar hydrophobicity will be preferentially in contact with each other compared to what is expected by pure chance. In this example, the "behavior" of the nodes corresponds to a feature (hydrophobicity) independent of the pure network wiring and can be equated to a "coloring" of the nodes, whose relations with the underlying topological support constituted by network wiring are investigated. Along similar lines, we could think of assortative social networks in which friends (nodes in direct contact) tend to share the same political ideas, income classes, or professional activities. From a different perspective, we can think of assortativity as an "internal" description of network wiring in which nodes are defined in terms of their connection patterns. Actually, in some networks high-degree nodes preferentially connect to other high-degree nodes (assortative networks), whereas in other types of networks high-degree nodes connect to low-degree nodes (disassortative networks); in particular, numerical evidence from experimental data has shown that many biological networks exhibit a negative assortativity coefficient and are therefore claimed to be examples of disassortative mixing.40,91

Assortativity r is defined as the Pearson correlation coefficient of degrees at either end of an edge, and it varies as −1 ≤ r ≤ 1;92 r is a very simple measure of the probability of a high-degree node to form edges with other high-degree nodes. When the r value is close to 1, the network is referred to as assortative, whereas values of r close to −1 are characteristic of disassortative networks. Random graphs are purely nonassortative networks, since by definition, links between nodes, in this case,

\[ k(v) = \sum_{i=1}^{N} w(u_i, v) \]

are placed at random.

In the case of external "coloring" assortativity, the index r, instead of being computed on the node degree, can be computed over the feature of interest, the one used to "color" the nodes.
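
The closeness and assortativity definitions above can be illustrated with a minimal sketch in plain Python. It is an illustration under stated assumptions, not the procedure used in the cited works: the graph is taken as unweighted and undirected, shortest paths are found by breadth-first search, and r is computed as the Pearson correlation over edge ends with each edge counted in both directions. The function names (`build_adjacency`, `closeness`, `assortativity`) and the toy graphs are hypothetical, introduced only for this example.

```python
from collections import deque


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) contacts."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def closeness(adj, i):
    """c(v_i) = 1 / sum_j sp_ij, with sp_ij found by breadth-first search."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())  # nodes unreachable from i are ignored (assumption)
    return 1.0 / total if total else 0.0


def assortativity(edges, value):
    """Pearson correlation of value[.] at the two ends of each edge.

    Each undirected edge is counted in both directions, so the two
    samples share the same mean and variance.
    """
    xs, ys = [], []
    for u, v in edges:
        xs += [value[u], value[v]]
        ys += [value[v], value[u]]
    mean = sum(xs) / len(xs)
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    # r is undefined when all values are equal; returning 0.0 is a convention chosen here
    return cov / var if var else 0.0


# Toy example 1: a star (one hub, four leaves) is disassortative by degree.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
adj = build_adjacency(star)
degree = {v: len(nbrs) for v, nbrs in adj.items()}
r_degree = assortativity(star, degree)

# Toy example 2: "coloring" variant on a 4-node chain,
# with made-up labels 1 = hydrophobic, 0 = hydrophilic.
chain = [(0, 1), (1, 2), (2, 3)]
color = {0: 1, 1: 1, 2: 0, 3: 0}
r_color = assortativity(chain, color)
```

In the star, every edge joins the high-degree hub to a degree-1 leaf, so `r_degree` sits at the disassortative extreme; in the chain, two of the three contacts join nodes of the same color, so `r_color` is positive. Passing a hydrophobicity value per residue as `value` gives the "coloring" form of r described above.
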
Thus, in a recent work,57 Di Paola et al. demonstrated the lack of any clearly defined "hydrophobic core" in proteins, for which the arrangement of fractal structures was demonstrated not to have a clear-cut separation between internal and external milieu by means of network assortativity measures based on the hydrophobicity of nodes. Moreover, the presence of both assortative and disassortative structuring (hydrophobic−hydrophobic and hydrophilic−hydrophobic) in proteins highlighted the presence of different "folding logics" simultaneously present in the protein world, probably as a consequence of the varying relevance of hydrophobic and electronic forces in the folding process.

Generally speaking, the distribution of a given feature of the nodes can be explored through the combined definition of dyadicity and heterophilicity,93 measuring the tendency of nodes with similar properties to form links. Given a key physical property, if nodes show a tendency to preferentially establish links with similar nodes, the network is called dyadic; otherwise it is said to be antidyadic or heterophilic.93

Let $n_1$ and $n_0$ respectively denote the number of nodes possessing or not possessing a specific property; $e_{11}$ and $e_{10}$ are the numbers of edges connecting homologous and heterologous nodes, respectively. The heterophilicity score H is then defined as

\[ H = \frac{e_{10}}{e_{10,r}} \tag{2.6} \]

where $e_{10,r}$ is the random value in case of uniform distribution of the property among nodes, which depends on the number of possible edges $E = N(N - 1)/2$, $N = n_1 + n_0$ being the number of nodes:

\[ e_{10,r} = E\, n_1 (N - n_1) \tag{2.7} \]

Analogously, for the homologous contacts, the dyadicity D is defined as

\[ D = \frac{e_{11}}{e_{11,r}} \tag{2.8} \]

and the corresponding value for random homologous nodes is

\[ e_{11,r} = E\, \frac{n_1 (n_1 - 1)}{2} \tag{2.9} \]

Thus, dyadic networks have D values larger than 1 and, on the other hand, H values lower than unity.

The above-described network invariants provide a description that can be traced back to the single node of a network, but the effective values of the descriptors strongly depend on the general wiring architecture of the whole graph, again a systemic top-down causation metric. The dyadic character of PCNs was exploited by Alves and colleagues94 to define simple hydrophobicity scores to profile protein structure. Single residue hydrophobicity was demonstrated to be strongly correlated with the corresponding network invariants:56 these systemic properties strictly depend upon the "general class" the specific graph pertains to. Below we will briefly present the main classes of wiring architectures.

2.7. Models of Graphs

2.7.1. Random Graphs. One of the simplest and oldest network models is the random graph model,95 which was introduced by Solomonoff and Rapoport96 and studied extensively by Erdős and Rényi;97−99 according to their works, there are two different random graph models.

One is called $G_{n,m}$ and is the set of all graphs consisting of n vertices and m edges, and it is built by throwing down m edges between vertex pairs chosen at random from n initially unconnected vertices.

The other is called $G_{n,p}$ and it is the set of all graphs consisting of n vertices, where each pair is connected together with independent probability p. In order to generate a graph sampled uniformly at random from the set $G_{n,p}$, n initially unconnected vertices are taken and each pair of them is joined with an edge with probability p (1 − p being the probability of being unconnected). Thus, the presence or absence of an edge between two vertices is independent of the presence or absence of any other edge, so that each edge may be considered to be present with independent probability p. The two models are essentially equivalent in the limit of a large number of nodes n. Since $G_{n,p}$ is somewhat simpler to work with than $G_{n,m}$, it is usual to refer to it as the random graph $G_{n,p}$.

A vertex in a random graph is connected with equal probability p to each of the N − 1 other vertices in the graph, and hence, the probability $p_k$ that it has degree k is given by the binomial distribution

\[ p_k = \binom{N}{k} p^k (1 - p)^{N - k} \tag{2.10} \]

Noting that the average degree of a vertex in the network is z = (N − 1)p, we can also write this as

\[ p_k = \frac{(N - 1)!}{k!\,(N - 1 - k)!} \, \frac{z^k}{(N - 1)^k} \left( 1 - \frac{z}{N - 1} \right)^{N - k} \simeq \frac{z^k e^{-z}}{k!} \tag{2.11} \]

where the second equality becomes exact as N → ∞; in this case, $p_k$ corresponds to the bell-shaped curve that peaks on the average value (Figure 6b).

Figure 6. Random graph: (a) a sample picture, where most nodes have three or four links, and (b) the bell-shaped degree distribution.

Random graphs have been employed extensively as models of real-world networks of various types, particularly in epidemiology,74 where the spreading of a disease through a community strongly depends on the pattern of contacts between infected subjects and those susceptible to it.

However, as a model of a real-world network, a random graph has some serious shortcomings. Perhaps the most serious one is its degree distribution, which is quite unlike those seen in most real-world networks.92 On the other hand, the random graph has many desirable properties; specifically, many of its properties can be calculated exactly.92

The random graph model has been applied to PCNs to test their connectivity (degree) distribution.48,75 Specifically, the protein dynamic properties have been explored in terms of
random graphs, since the unbiased corresponding network dynamics can be put into the perspective of the random evolution of the protein structure, due to random, Brownian motion of protein segments to get up to the final, stable conformation.100

Further, the generic random graph model is introduced as a reference to test the property of the network as a specialized random graph (small-world network).15,35,38,40,41,43,47,49,72,101−103 This comparison has a tight link with the common assumption of the random coil structure as being a reference state in folding thermodynamics: the random coil has the corresponding translation in terms of a "graph formula" into the random graph model that represents a random network of residue interactions, corresponding to a random distribution of the inter-residue distance.

In their work,104 Bartoli and colleagues demonstrated that PCNs are very far from random graph behavior; this was particularly evident when they projected simulated networks together with real PCNs in the two-dimensional space spanned by the clustering coefficient and characteristic path length (see Figure 7).

Figure 7. Characteristic path length vs clustering coefficient (Figure 3 in ref 104): sample protein classes are labeled as CA#, the label "random" refers to a collection of random graphs, whereas "regular" points to periodic lattices.

The authors demonstrated that the difference between random graphs and contact maps derives from the existence of the covalent backbone, which imposes very strict constraints on the contacts that can be established between residues. This feature makes PCNs more similar to the so-called scale-free graphs.

2.7.2. Scale-Free Graphs. For many years after the seminal work of Erdős and Rényi,97 all complex networks were commonly treated as random graphs. This paradigm was overturned by the pioneering work of Barabási,105 in which the topology of the World Wide Web was studied, formerly thought to show a bell-shaped degree distribution, as in the case of random graphs.

Instead, by counting how many Web pages have exactly k links, the authors showed that the distribution followed a so-called power law, namely, the probability that any node is connected to k other nodes is

\[ p_k = \alpha k^{-\gamma} \]

where γ is the degree exponent and α is the proportionality constant. The value of γ determines many properties of the system. The smaller the value of γ, the more important the role of the hubs is in the network. Whereas for γ > 3 the hubs are not relevant, for 2 < γ < 3 there is a hierarchy of hubs, with the most connected hubs being in contact with a small fraction of all nodes, and for γ = 2, a hub network emerges, with the largest hubs being in contact with a large fraction of all nodes.107 In general, the unusual properties of scale-free networks, such as a high degree of robustness against accidental node failures, are valid only for γ < 3.85 For γ > 3, however, most unusual features are absent, and in many respects, the scale-free network behaves like a random one.85 As for the World Wide Web, Barabási105 found that the value of γ for incoming links was approximately 2; this means that a node is roughly 4 times more likely to have only half the number of incoming links of another node, since $p_{k/2}/p_k = 2^{\gamma} \approx 4$.

Different from a Poisson degree distribution of random networks, a power law distribution does not have a peak, but it is described by a continuously decreasing function (Figure 8): in this case, it is evident that a specific characteristic average degree does not exist; in other words, these networks do not converge toward a characteristic degree at increasing number of nodes. On the contrary, in scale-free networks, the average degree progressively increases with sampling dimension, because the (very rare) high-degree nodes are sampled with a higher probability. The lack of a characteristic degree is the basis of the denomination "scale free" for this kind of architecture.

Figure 8. Scale-free networks: (a) a sample scale-free network, in which few nodes have many links, and (b) the power law degree distribution of the scale-free graph.106

This is in strong contrast to random networks, for which the degree of all nodes is in the vicinity of the average degree, which could be considered typical. However, as Barabási and colleagues wrote in ref 107, scale-free networks could easily be called scale-rich as well, as their main feature is the coexistence of nodes of widely different degrees (scales), ranging from nodes with one or two links to major hubs.

In contrast to the democratic distribution of links typical of random networks, power laws describe systems in which few hubs dominate:105 networks that are characterized by a power-law degree distribution are highly nonuniform, most of the nodes having only a few links. Only a few nodes with a very large number of links, which are often called hubs, hold these nodes together.

A key feature of many complex systems is their robustness, which refers to the system's ability to respond to changes in the external conditions or internal organization while maintaining relatively normal behavior.107 In a random network, disabling a substantial number of nodes will result in an inevitable functional disintegration of the network, breaking the network into isolated node clusters.107

Scale-free networks do not have a critical threshold for disintegration (percolation threshold108): they are amazingly
robust against accidental failures: even if 80% of randomly selected nodes fail, the remaining 20% still form a compact cluster with a path connecting any two nodes.107 This is because random failure is likely to affect mainly the many small-degree nodes, whose removal does not disrupt the network's integrity.85 This reliance on hubs, on the other hand, induces a so-called attack vulnerability: the removal of a few key hubs splinters the system into small isolated node clusters.85

Scale-free architecture can exhibit the so-called "small-world property".38,104 The small-world model has its roots in the observation that many real-world networks show the following two properties: (i) the small-world effect (i.e., small average shortest path length) and (ii) high clustering or transitivity, meaning that there is a heightened probability that two vertices will be connected directly to one another if they have another neighboring vertex in common.

The former property is quantified by the characteristic path length (or average shortest path) l of the graph, while the second property is computed as the clustering coefficient C. Thus, the small-world effect means that the average shortest path in the network scales logarithmically with graph size73,109,110

\[ l \propto \log(N) \]

where N is the number of nodes.

PCNs were analyzed for their scale-free properties, in order to identify crucial binding sites.43,59 The small-world behavior of protein structure networks was shown for the first time by Vendruscolo et al.43 and later confirmed in several works.38,75 As we stressed before, it was shown that the small-world behavior of an inter-residue contact graph is conditioned by the backbone connectivity.104

According to both refs 59 and 104, PCNs are not "pure small-world" networks, given that no explicit hub is present, so they must be considered as "a class of network in its own", generated by the very peculiar constraint to maintain a continuous (covalent) backbone joining the nodes in a fixed sequence.59,104 Nevertheless, the most important feature of small-world architecture, i.e., the presence of shortcuts allowing for an efficient signal transmission at long distance, is present in PCNs and it is the very basis of their physiological role (allostery, dynamical properties, folding rate, etc.) (Figure 9).

Figure 9. An example of a small-world network: most nodes are linked only to their immediate neighbors, while few edges generate shortcuts between distant regions of the network.

In this respect, it is relevant to go more in-depth into the link existing between a given topology and the dynamical behavior it can host. As a matter of fact, according to a pattern-based computational approach,111 modular dynamic organization follows modular topological organization; this assumption has been applied to biological neural networks, showing that the dynamic behavior of neural networks might be coordinated through different topological features,111 such as network modularity and the presence of central hub nodes. A similar topology/dynamics relation seems to hold for contact networks, too. As a matter of fact, allosteric "hot spots",65 where the motion is generalized from a local excitation to the entire protein structure, correspond to central residue contacts, which were demonstrated to be crucial for efficient allosteric communications.41,59,66,72,86

The field of relations between molecular dynamics trajectories and topological contact network description is a very important avenue of research in protein science.62−64,66,70

3. APPLICATIONS

3.1. Networks and Interactions

It is well-known that proteins interact among themselves and with other molecules to perform their biological functions;69 crucial factors in all interactions are the shape and chemical properties of the pockets located on protein surfaces, which show high affinity to binding sites. In a recent work,112 the analysis of topological properties of the pocket similarity network demonstrated that highly connected pockets (hubs) generate similar concavity patterns on different protein surfaces. These similarities go hand-in-hand with similar biological functions that imply similar pockets.112 In addition, the authors found that maximum connected components in the pocket similarity networks have a small-world and scale-free scaling. The analysis of the physicochemical features of hub pockets leads to the investigation of more functional implications from the similarity network model, which provided new insights into structural genomics and has great potential for applications in functional genomics.113 The future purpose is to develop a classification method to divide similar pockets into small groups and afterward to compile this evolutionary information into a library of functional templates.

This work delineates a possible link between network wiring and common function that is of utmost interest for the development of contact-based meaningful formulas. By briefly describing the direct translation of graph theoretical descriptors into meaningful protein functional properties, we gave a proof-of-concept of the general relevance of the proposed formalism. Now we will go more in-depth into some of these topology−function relations, but a leading leitmotiv can already be stated: the structure−function link passes through a topological bottleneck, the contact network, that allows for a consistent and very efficient formalism to be applied to the study of macromolecules.

3.2. Protein Structure Classification

Proteins can be considered as modular geometric objects composed of blocks, so allowing for a peptide-fragment-based partition.114 For instance, it is well-known that globular proteins are made up of regular secondary structures (α-helices and β-strands) and nonregular secondary regions, called loops, that join regular secondary structures and lack the regularity of torsion angles for consecutive residues; actually, many families of proteins evolved to perform multiple functions, with variations in loop regions on a relatively conserved secondary structure framework. Considering this, Tendulkar et al.114 developed an unconventional scheme of loops and secondary structure classification: the clustering of the peptide fragments