This document presents a mathematical framework for analyzing systems of interacting networks. The key points are:
1) The framework allows calculating the percolation threshold and component size distributions for systems of l interacting networks, taking into account connectivity both within and between networks.
2) Exact expressions are derived for the percolation threshold and applied to different degree distributions for two interacting networks.
3) The framework is applied to real-world systems involving communications networks and software networks to better understand their structure and function.
Percolation on interacting networks
E. A. Leicht (1) and Raissa M. D’Souza (1, 2)
(1) Department of Mechanical and Aeronautical Engineering, University of California, Davis, CA 95616
(2) The Santa Fe Institute, Santa Fe, NM 87501
(Dated: July 6, 2009)
Most networks of interest do not live in isolation. Instead they form components of larger systems in which multiple networks with distinct topologies coexist and where elements distributed amongst different networks may interact directly. Here we develop a mathematical framework based on generating functions for analyzing a system of l interacting networks given the connectivity within and between networks. We derive exact expressions for the percolation threshold describing the onset of large-scale connectivity in the system of networks and each network individually. These general expressions apply to networks with arbitrary degree distributions and we explicitly evaluate them for l = 2 interacting networks with a few choices of degree distributions. We show that the percolation threshold in an individual network can be significantly lowered once “hidden” connections to other networks are considered. We show applications of the framework to two real-world systems involving communications networks and socio-technical congruence in software systems.
PACS numbers: 64.60.aq, 89.75.Fb
arXiv:0907.0894v1 [cond-mat.dis-nn] 6 Jul 2009
In the past decade there has been a significant advance in understanding the structure and function of networks. Mathematical models of networks are now widely used to describe a broad range of complex systems, from spread of disease on networks of human contacts to interactions amongst proteins [1, 2, 3]. However, current methods deal almost exclusively with individual networks treated as isolated systems. In reality an individual network is often just one component in a much larger complex system; a system that can bring together multiple networks with distinct topologies and functions. For instance, a pathogen spreads on a network of human contacts abetted by global and regional transportation networks. Likewise, email and e-commerce networks rely on the Internet which in turn relies on the electric grid. In biological systems, activated genes give rise to proteins some of which go back to the genetic level and activate or inhibit other genes. Results obtained in the context of a single isolated network can change dramatically once interactions with other networks are incorporated.
Consider a system formed by two interacting networks, α and β, Fig. 1(a). Network α could be a human contact network for one geographic region and network β that for a separated region. When viewed as individual systems, only small clusters of connected nodes exist, hence, a disease spreading in either network should stay contained within clusters. In reality, a disease can hop from α to β, for instance, by an infected person flying on an airplane, spread in the β network and eventually hop back to the α network into new clusters, causing an epidemic outbreak. Next consider interacting networks that contain completely different types of nodes. Network α can be a social network, such as an email communication network of software developers, while network β can be a technological network, such as the network of calls between functions in software code. Here, bi-partite edges connect developers on α to code they author on β.
FIG. 1: a) Two networks α and β. Nodes interact directly with other nodes in their immediate network, yet also with nodes in the second network. b) An illustration of the remaining edges incident to a node in a network µ reached by following a random edge between networks ν and µ.
An important step towards modeling interacting networks was introduced with the layered network framework of [4]. Yet, the networks in the distinct layers must be composed of the identical nodes (modeling essentially physical connectivity and logical connectivity or flow). Herein we consider systems of l ≥ 2 distinct interacting networks and calculate explicitly how the connectivity within and between networks determines the onset of large scale connectivity in the system and in each network individually. Our mathematical formulation has some overlap with recent works calculating connectivity properties in a single network accounting for a diversity of node attributes [5, 6] or interactions between modules within a network [7, 8]. Here we present our formalism and also applications to real-world systems of interacting networks coming from telecommunications and software.
The onset of large-scale connectivity (i.e., the percolation threshold) corresponding to the emergence of a giant connected component in an isolated network has been
studied extensively, first for random networks with Poisson degree distributions [9] and later for random networks with arbitrary degree distributions [10]. Similar results were then derived using generating functions [11, 12], the approach we employ herein. Generating functions, similar to the network configuration model [10, 13], evaluate the ensemble of all possible random networks consistent with a specified degree distribution, $\{p_k\}$, and are most accurate in the sparse regime where networks are approximately tree-like. Thus in the regime before the emergence of the giant component, generating functions can be used to calculate the distribution of component sizes. In the supercritical regime they can be used to calculate the distribution in sizes of components that are not part of the giant component.
For our purposes, a system with $l \geq 2$ interacting networks is described by a set of degree distributions. Each individual network µ is characterized by a multi-degree distribution, $\{p^{\mu}_{k_1 k_2 \cdots k_l}\}$, where $p^{\mu}_{k_1 k_2 \cdots k_l}$ is the fraction of all nodes in network µ that have $k_1$ edges to nodes in network 1, $k_2$ edges to nodes in network 2, etc. The multi-degree distribution for each network may be written in the form of a generating function:
$$G_\mu(x_1, \ldots, x_l) = \sum_{k_1,\ldots,k_l=0}^{\infty} p^{\mu}_{k_1 \cdots k_l}\, x_1^{k_1} \cdots x_l^{k_l}. \tag{1}$$
To simplify notation in what follows, we now define two l-tuples, $\mathbf{x} = (x_1, \ldots, x_l)$ and $\mathbf{1} = (1, \ldots, 1)$.
Our interest is in calculating the distribution of component sizes, where a component is a set of nodes connected to one another either directly or indirectly by traversing a path along edges. Clearly such components can be composed of nodes distributed among the l different networks, and our formulation allows us to calculate the distribution of such system-wide components, yet also to refine the focus and calculate the contribution coming from nodes contained in only one of the l networks.
We begin by deriving the distribution of connectivity for a node at the end of a randomly chosen edge. Consider selecting uniformly at random an edge falling between a node in network ν and a node in network µ (i.e., a ν-µ edge). The µ node attached to the edge is $k_\nu$ times more likely to have ν-degree $k_\nu$ than ν-degree 1. We can also account for the remaining local connectivity, to nodes in other networks as shown in Fig. 1(b). In single isolated networks remaining connectivity is called the excess degree of a node [11]. Let $q^{\mu\nu}_{k_1 \cdots k_\nu \cdots k_l}$ denote the probability of following a randomly chosen ν-µ edge to a µ-node with excess ν-degree $k_\nu$, as shown in Fig. 1(b) (such a node has total ν-degree $k_\nu + 1$). Then $q^{\mu\nu}_{k_1 \cdots k_\nu \cdots k_l} \propto (k_\nu + 1)\, p^{\mu}_{k_1 \cdots (k_\nu+1) \cdots k_l}$, and the generating function for the distribution $\{q^{\mu\nu}_{k_1 \cdots k_l}\}$ is,
$$\begin{aligned}
G_{\mu\nu}(\mathbf{x}) &= \sum_{k_1,\ldots,k_l=0}^{\infty} q^{\mu\nu}_{k_1 \cdots k_l}\, x_1^{k_1} \cdots x_l^{k_l} \\
&= \frac{\sum_{k_1,\ldots,k_l=0}^{\infty} (k_\nu+1)\, p^{\mu}_{k_1 \cdots (k_\nu+1) \cdots k_l}\, x_1^{k_1} \cdots x_l^{k_l}}{\sum_{j_1,\ldots,j_l=0}^{\infty} (j_\nu+1)\, p^{\mu}_{j_1 \cdots (j_\nu+1) \cdots j_l}} \\
&= \left[\sum_{j_1,\ldots,j_l=0}^{\infty} j_\nu\, p^{\mu}_{j_1 \cdots j_l}\right]^{-1} \frac{\partial}{\partial x_\nu} \sum_{k_1,\ldots,k_l=0}^{\infty} p^{\mu}_{k_1 \cdots k_l}\, x_1^{k_1} \cdots x_l^{k_l} \\
&= \frac{G^{\prime\nu}_{\mu}(\mathbf{x})}{G^{\prime\nu}_{\mu}(\mathbf{1})},
\end{aligned} \tag{2}$$
where $G^{\prime\nu}_{\mu}(\mathbf{x})$ denotes the first derivative of $G_\mu(\mathbf{x})$ with respect to $x_\nu$ and the denominator is a normalization constant so that $G_{\mu\nu}(\mathbf{1}) = 1$. Also note that $G^{\prime\nu}_{\mu}(\mathbf{1}) \equiv \bar{k}^{\nu}_{\mu}$ is the average ν-degree for a node in network µ.
The distribution of second nearest neighbors for that µ node via the ν layer is calculated by using Eq. 2 as the argument to Eq. 1, namely $G_\mu(1, 1, \ldots, G_{\nu\mu}(\mathbf{x})|_{x_\lambda = 1,\, \lambda \neq \mu}, \ldots, 1)$. Comparing this distribution calculated via generating functions to that found in real-world interacting networks can reveal interesting statistical features. Returning to the software example, we have a network of email communication between developers, a network of relations between code, and bipartite edges connecting developers to the code they edit. We would expect that the real system does not resemble a random network, but instead reflects a structure conducive to project development. For instance, if two developers edit the same code we would like for them to directly communicate via email and thus be first neighbors. In a sparse random network these developers would typically be second neighbors, connected indirectly via the code they both edit.
We analyze the evolution of the Apache 2.0 Open Source Software project from mid-2000 through 2004, with data aggregated over three month windows. From this we extracted the multi-degree distribution of the system for each time window, which we then plug into our generating functions to calculate the expected distribution of second neighbors found by following first a developer-to-code edge then a code-to-developer edge. We then compare this distribution to the real distribution of such developer second nearest neighbors using the Jensen-Shannon divergence [14], a symmetric measure based on Kullback-Leibler divergence. The results are shown in Fig. 2 with the JS-score of the real networks normalized by the JS-scores from the ensemble of random networks. Values greater or less than unity indicate networks more or less random than average. We indicate two vertical bars where significant difference between the random and real networks occurs. The first, in mid-2002, marks the first general availability release of Apache 2.0; the second, at the start of 2003, is a bug and security fix [15]. This latter point, moreover, marks when a substantial purging of developers from the communication network occurs. In any three-month window we observe that only about 25 developers edit code, yet prior to 2003 the number of developers in the email network is significantly larger. Thus this time seems to indicate when the Apache project becomes more efficiently organized, eliminating noise of spurious emails to inactive developers.
FIG. 2: Comparison over time of the distribution of the number of developers connected indirectly via co-editing code in the Apache project with the distribution expected in a random network with the same multi-degree distribution. Vertical lines mark the first generally available release in 2002, and a significant deviation from random in 2003, when the communication network shrinks and the project seems to become more efficiently organized. (Vertical axis: JSD (norm); horizontal axis: Time, 2001 to 2004; the vertical lines are labeled "First General Availability release" and "Bug and security fix".)
We are now in position to consider component sizes. Assume we follow a randomly chosen ν-µ edge to a µ node (Fig. 1(b)), and consider the distribution in sizes of the component found by following the additional outgoing edges. Let $H_{\mu\nu}(\mathbf{x})$ denote the associated generating function. Fig. 3 illustrates all the types of connectivity possible for the µ-node, and summing over all these possibilities leads to the self-consistency equation for $H_{\mu\nu}(\mathbf{x})$:
$$\begin{aligned}
H_{\mu\nu}(\mathbf{x}) = {} & x_\mu\, q^{\mu\nu}_{0 \cdots 0} \\
& + x_\mu \sum_{k_1,\ldots,k_l=0}^{1} \delta_{1,\sum_{\lambda=1}^{l} k_\lambda}\, q^{\mu\nu}_{k_1 \cdots k_l} \prod_{\gamma=1}^{l} H_{\gamma\mu}(\mathbf{x})^{k_\gamma} \\
& + x_\mu \sum_{k_1,\ldots,k_l=0}^{2} \delta_{2,\sum_{\lambda=1}^{l} k_\lambda}\, q^{\mu\nu}_{k_1 \cdots k_l} \prod_{\gamma=1}^{l} H_{\gamma\mu}(\mathbf{x})^{k_\gamma} + \cdots
\end{aligned} \tag{3}$$
Here $\delta_{ij}$ denotes the Kronecker delta, used to account for all combinations of flavors of edges connected to the µ-node leading to specified excess degree i. Reordering the terms, Eq. 3 becomes
$$H_{\mu\nu}(\mathbf{x}) = x_\mu \sum_{k_1,\ldots,k_l=0}^{\infty} q^{\mu\nu}_{k_1 \cdots k_l}\, H_{1\mu}(\mathbf{x})^{k_1} \cdots H_{l\mu}(\mathbf{x})^{k_l}. \tag{4}$$
We recognize the form of this equation from Eq. 2, thus
$$H_{\mu\nu}(\mathbf{x}) = x_\mu\, G_{\mu\nu}[H_{1\mu}(\mathbf{x}), \ldots, H_{l\mu}(\mathbf{x})]. \tag{5}$$
FIG. 3: A diagrammatic representation of the topological constraints placed on the generating function $H_{\mu\nu}(\mathbf{x})$ for the distribution of sizes of components reachable by following a randomly chosen ν-µ edge. The labels attached to each edge indicate type or flavor of the edge and summation notation indicates that we are summing over all possible flavors.
We now consider starting from a randomly chosen µ-node, rather than a random ν-µ edge. A topology such as one from Fig. 3 exists at the end of each edge incident to the µ-node. The generating function for the probability distribution of component sizes is,
$$H_\mu(\mathbf{x}) = x_\mu\, G_\mu[H_{1\mu}(\mathbf{x}), \ldots, H_{l\mu}(\mathbf{x})]. \tag{6}$$
While in theory it is possible to solve Eq. 5 for $H_{\mu\nu}(\mathbf{x})$ and use that solution in Eq. 6 to solve for $H_\mu(\mathbf{x})$, in practice, even for the case of a single isolated network, as noted in [11], the equations are typically quite difficult to solve. Yet, Eq. 6 allows calculation of average component size. A component may include multiple node flavors, but we can distinguish between the average number of each type. For example, the average number of ν-nodes in the component of a randomly chosen µ-node is
$$\begin{aligned}
\langle s_\mu \rangle_\nu &= \left.\frac{\partial}{\partial x_\nu} H_\mu(\mathbf{x})\right|_{\mathbf{x}=\mathbf{1}} \\
&= \delta_{\mu\nu}\, G_\mu[H_{1\mu}(\mathbf{1}), \ldots, H_{l\mu}(\mathbf{1})] + \sum_{\lambda=1}^{l} G^{\prime\lambda}_{\mu}[H_{1\mu}(\mathbf{1}), \ldots, H_{l\mu}(\mathbf{1})]\, H^{\prime\nu}_{\lambda\mu}(\mathbf{1}) \\
&= \delta_{\mu\nu} + \sum_{\lambda=1}^{l} G^{\prime\lambda}_{\mu}(\mathbf{1})\, H^{\prime\nu}_{\lambda\mu}(\mathbf{1}).
\end{aligned} \tag{7}$$
Intuitively Eq. 7 is reasonable because $H^{\prime\nu}_{\lambda\mu}(\mathbf{1})$ represents the average number of ν-nodes in the component found by following a µ-λ edge towards a λ-node, and the expected number of µ-λ edges incident to an initial µ-node is $G^{\prime\lambda}_{\mu}(\mathbf{1})$ (recall, $G^{\prime\lambda}_{\mu}(\mathbf{1}) = \bar{k}^{\lambda}_{\mu}$). The product of the two terms summed over all λ networks produces the number of ν-nodes in a component connected to a randomly chosen µ-node, $\langle s_\mu \rangle_\nu$.
The preceding results regarding components hold in the sub-critical regime where no giant connected component exists. Once a giant component emerges, generating functions allow us to calculate properties of components not belonging to it. The giant component will span multiple networks and calculating its size requires accounting for the contribution from each network. Let $S_\mu$ be the fraction of µ-nodes belonging to the giant component. The probability that a randomly chosen µ-node is not part of the giant component must then satisfy the following equation,
$$1 - S_\mu = \sum_{k_1,\ldots,k_l=0}^{\infty} p^{\mu}_{k_1 \cdots k_l}\, u_{1\mu}^{k_1} \cdots u_{l\mu}^{k_l} = G_\mu(u_{1\mu}, \ldots, u_{l\mu}), \tag{8}$$
where $u_{\nu\mu}$ is the probability that a µ-ν edge is not part of the giant component. In addition, for all $\mu, \nu \in \{1, \ldots, l\}$, $u_{\nu\mu}$ must satisfy,
$$u_{\nu\mu} = G_{\nu\mu}(u_{1\nu}, \ldots, u_{l\nu}), \tag{9}$$
derived using the same self-consistency arguments that resulted in Eq. 5.
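Equations 8 and 9 rarely admit closed-form solutions, but they can be solved numerically by fixed-point iteration. The Python sketch below is a minimal illustration under a simplifying assumption that is not part of the general formalism: every network has independent Poisson multi-degrees, so that $G_\mu(\mathbf{x}) = \exp[\sum_\nu \bar{k}^{\nu}_{\mu}(x_\nu - 1)]$ and the excess-degree generating function $G_{\nu\mu}$ coincides with $G_\nu$. The function names and the matrix `kbar` (with `kbar[mu][nu]` standing for $\bar{k}^{\nu}_{\mu}$) are hypothetical choices made for this example.

```python
import math

def poisson_G(kbar_row, x):
    """G_mu(x) for independent Poisson degrees with means kbar_row[nu] (assumed form)."""
    return math.exp(sum(k * (xi - 1.0) for k, xi in zip(kbar_row, x)))

def giant_component_fractions(kbar, tol=1e-12, max_iter=100000):
    """Iterate Eq. 9 to a fixed point, then evaluate Eq. 8.

    kbar[mu][nu] is the mean nu-degree of a node in network mu.
    Returns the list [S_1, ..., S_l].
    """
    l = len(kbar)
    # u[nu][mu] plays the role of u_{nu mu}. Starting from 0 makes the
    # iteration increase monotonically toward the smallest fixed point.
    u = [[0.0] * l for _ in range(l)]
    for _ in range(max_iter):
        # Poisson assumption: G_{nu mu} = G_nu, so the right-hand side of
        # Eq. 9 depends only on nu.
        new_u = [[poisson_G(kbar[nu], [u[lam][nu] for lam in range(l)])
                  for _mu in range(l)]
                 for nu in range(l)]
        change = max(abs(new_u[a][b] - u[a][b])
                     for a in range(l) for b in range(l))
        u = new_u
        if change < tol:
            break
    # Eq. 8: 1 - S_mu = G_mu(u_{1 mu}, ..., u_{l mu})
    return [1.0 - poisson_G(kbar[mu], [u[lam][mu] for lam in range(l)])
            for mu in range(l)]

if __name__ == "__main__":
    # Illustrative means only: row 0 is one network, row 1 the other.
    print(giant_component_fractions([[0.4, 0.5], [0.5, 2.0]]))
```

For degree distributions that are not Poisson, `poisson_G` would have to be replaced by separate implementations of $G_\mu$ and $G_{\nu\mu}$; the iteration itself is unchanged.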
Though all the equations above hold for a system of $l \geq 2$ interacting networks, we now give a concrete example for $l = 2$, with the networks indexed as α and β. Consider first the simplest of systems, where the internal connectivity of α and β each has a distinct Poisson degree distribution, and the inter-network connectivity is described by a third Poisson degree distribution, for instance, $p^{\alpha}_{k_\alpha k_\beta} = \left[(\bar{k}^{\alpha}_{\alpha})^{k_\alpha} e^{-\bar{k}^{\alpha}_{\alpha}} / k_\alpha!\right] \left[(\bar{k}^{\beta}_{\alpha})^{k_\beta} e^{-\bar{k}^{\beta}_{\alpha}} / k_\beta!\right]$. (Recall $\bar{k}^{\nu}_{\mu}$ denotes the average ν-degree for a node in network µ.) Then, from Eq. 1,
$$G_\alpha(x_\alpha, x_\beta) = e^{\bar{k}^{\alpha}_{\alpha}(x_\alpha - 1)}\, e^{\bar{k}^{\beta}_{\alpha}(x_\beta - 1)} \tag{10}$$
$$G_\beta(x_\alpha, x_\beta) = e^{\bar{k}^{\alpha}_{\beta}(x_\alpha - 1)}\, e^{\bar{k}^{\beta}_{\beta}(x_\beta - 1)}. \tag{11}$$
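For the Poisson system of Eqs. 10 and 11, the excess-degree generating functions satisfy $G_{\mu\nu} = G_\mu$, so $H_{\lambda\mu} = H_\lambda$ and Eq. 7 closes into the linear system $\langle s_\mu \rangle_\nu = \delta_{\mu\nu} + \sum_\lambda \bar{k}^{\lambda}_{\mu} \langle s_\lambda \rangle_\nu$, that is, the matrix inverse $(I - K)^{-1}$ with $K_{\mu\lambda} = \bar{k}^{\lambda}_{\mu}$. The Python sketch below evaluates this for $l = 2$ as an illustration; the function name and the numerical values are assumptions made for the example and are not taken from any data set.

```python
def mean_component_sizes(K):
    """Return s[mu][nu], the mean number of nu-nodes in the component of a
    random mu-node, for two Poisson networks in the sub-critical regime.

    K[mu][lam] is the mean lam-degree of a node in network mu
    (index 0 = alpha, 1 = beta). Solves s = I + K s, i.e. s = (I - K)^{-1},
    using the closed-form inverse of a 2x2 matrix.
    """
    (a, b), (c, d) = K
    det = (1.0 - a) * (1.0 - d) - b * c
    # A non-positive determinant (or a diagonal mean >= 1) signals that the
    # average component size is no longer finite.
    if a >= 1.0 or d >= 1.0 or det <= 0.0:
        raise ValueError("not in the sub-critical regime: sizes diverge")
    return [[(1.0 - d) / det, b / det],
            [c / det, (1.0 - a) / det]]

if __name__ == "__main__":
    # Hypothetical sub-critical means chosen only for illustration.
    s = mean_component_sizes([[0.4, 0.5], [0.5, 0.1]])
    print(s[0][0])  # mean number of alpha-nodes seen from a random alpha-node
```

The determinant `det` is the quantity whose vanishing marks the divergence of every entry of `s` at once.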
Using Eq. 7, the average number of α-nodes in a component reachable from a randomly chosen α-node is,
$$\langle s_\alpha \rangle_\alpha = 1 + \frac{\bar{k}^{\alpha}_{\alpha} + \bar{k}^{\beta}_{\alpha} \bar{k}^{\alpha}_{\beta} - \bar{k}^{\alpha}_{\alpha} \bar{k}^{\beta}_{\beta}}{(1 - \bar{k}^{\alpha}_{\alpha})(1 - \bar{k}^{\beta}_{\beta}) - \bar{k}^{\beta}_{\alpha} \bar{k}^{\alpha}_{\beta}}. \tag{12}$$
FIG. 4: Numerical simulations of connectivity in a system of two interacting Poisson degree distributed networks, α and β, with inter-network connectivity also Poisson distributed, as connectivity on β increases. Each network has 100,00 nodes, with $\bar{k}^{\alpha}_{\alpha} = 0.4$ and $\bar{k}^{\beta}_{\alpha} = \bar{k}^{\alpha}_{\beta} = 0.5$. Shown are the fraction of α nodes, $S_\alpha$ (circles), β nodes, $S_\beta$ (squares), and all nodes, $S$ (triangles) in the system-wide giant component, with the dashed curves giving the analytic results, Eqns. (13) and (14). The horizontal dashed line is the asymptotic value to which $S_\alpha$ approaches. (Inset) Analogous results when α has Poisson distribution with $\bar{k}^{\alpha}_{\alpha} = 0.5$, inter-network edges follow a Poisson distribution with $\bar{k}^{\beta}_{\alpha} = \bar{k}^{\alpha}_{\beta} = 0.4$, but β has a power-law distribution with exponent τ = 2.5 and an exponential cutoff that we vary between 1 ≤ κ ≤ 300. The solid curve is the result for network β when viewed in isolation.
The average component size diverges for $(1 - \bar{k}^{\alpha}_{\alpha})(1 - \bar{k}^{\beta}_{\beta}) = \bar{k}^{\beta}_{\alpha} \bar{k}^{\alpha}_{\beta}$, the point at which the giant component emerges. (Ref. [8] recently presented an alternate method for deriving similar percolation thresholds and connectivity properties, but in a single network with multiple interacting communities.) Note, following Eq. 7, we can show $\langle s_\beta \rangle_\alpha$, $\langle s_\alpha \rangle_\beta$, and $\langle s_\beta \rangle_\beta$ also all diverge at this point, marking when a giant component emerges in each network and throughout
the system. Further simplifying, by assuming the two interacting networks have the same degree distribution, $\bar{k}^{\alpha}_{\alpha} = \bar{k}^{\beta}_{\beta} = \bar{k}_{\rm intra}$ and $\bar{k}^{\beta}_{\alpha} = \bar{k}^{\alpha}_{\beta} = \bar{k}_{\rm inter}$, the giant component emerges when $\bar{k}_{\rm inter} + \bar{k}_{\rm intra} = 1$, recovering the standard result for a single network (which, by definition, has $\bar{k}_{\rm inter} = 0$) that emergence occurs for $\bar{k}_{\rm intra} = 1$.
Once the giant component emerges the $u_{\nu\mu}$ which satisfy Eq. 9 are $u_{\alpha\alpha} = u_{\alpha\beta} = 1 - S_\alpha$ and $u_{\beta\beta} = u_{\beta\alpha} = 1 - S_\beta$, while $S_\alpha$ and $S_\beta$, respectively the fractions of α-nodes and β-nodes in the giant component of the system, satisfy
$$S_\alpha = 1 - e^{-(\bar{k}^{\alpha}_{\alpha} S_\alpha + \bar{k}^{\beta}_{\alpha} S_\beta)} \tag{13}$$
$$S_\beta = 1 - e^{-(\bar{k}^{\alpha}_{\beta} S_\alpha + \bar{k}^{\beta}_{\beta} S_\beta)}. \tag{14}$$
To observe the change in connectivity of one network precipitated by an increase in connectivity of a second network attached to the first, we simulated a system of two interacting networks and fixed $\bar{k}^{\alpha}_{\alpha}$, $\bar{k}^{\beta}_{\alpha}$, and $\bar{k}^{\alpha}_{\beta}$ while varying $\bar{k}^{\beta}_{\beta}$ from 0 to 5 (Fig. 4). As $\bar{k}^{\beta}_{\beta}$ increases the β-network becomes a single connected component (the traditional behavior for a single network) and $S_\beta \to 1$. However, the connectivity of α remains limited. Setting $S_\beta = 1$ in Eq. 13, it can be shown that as $\bar{k}^{\beta}_{\beta}$ increases $S_\alpha \to 1 + \frac{1}{\bar{k}^{\alpha}_{\alpha}} W\!\left(-\bar{k}^{\alpha}_{\alpha}\, e^{-\bar{k}^{\alpha}_{\alpha} - \bar{k}^{\beta}_{\alpha}}\right)$ (dashed horizontal line in Fig. 4), where W is the Lambert W function, also known as the product log.
We next consider more complex degree distributions, where α is still described by a Poisson distribution, but the internal connectivity of β is described by a power-law distribution with an exponential cutoff. While power-law degree distributions have attracted considerable attention as a model for node degree distributions in many types of networks [16], a power-law with an exponential cutoff may be a better model for real-world networks [17]. Here $p^{\beta}_{k_\alpha k_\beta} = \left[(\bar{k}^{\alpha}_{\beta})^{k_\alpha} e^{-\bar{k}^{\alpha}_{\beta}} / k_\alpha!\right] \left[k_\beta^{-\tau} e^{-k_\beta/\kappa} / {\rm Li}_\tau(e^{-1/\kappa})\right]$, where ${\rm Li}_n(x)$ is the nth polylogarithm of x and serves as a normalizing factor for the distribution. Thus, we can write our basic generating function for network β,
$$G_\beta(x_\alpha, x_\beta) = e^{\bar{k}^{\alpha}_{\beta}(x_\alpha - 1)}\, \frac{{\rm Li}_\tau(x_\beta e^{-1/\kappa})}{{\rm Li}_\tau(e^{-1/\kappa})}. \tag{15}$$
The generating function for α is still given by Eq. 10. We simulate the impact on the connectivity of the α-network as the exponential cutoff and hence the average degree of network β increases, inset of Fig. 4. Again
the dashed curves are the analytic results obtained by solving Eqns. 8 and 9. The solid red line is the behavior for the β network considered in isolation, showing that even the percolation threshold for β is lowered through connectivity with network α.
Finally we consider an application of connectivity to communications networks, building on the increasing interest in using Bluetooth connectivity between individuals to transmit data [18]. For instance, rather than downloading a webpage (such as the CNN homepage) by connecting to the Internet, a copy could be obtained from a close-by individual already in possession of this data. We construct prototypical networks of local Bluetooth connectivity between individuals from raw data of Bluetooth sightings by 41 attendees at the 25th IEEE International Conference on Computer Communications (INFOCOM) [19]. We initially partition the raw data into discrete 20 minute windows and consider that a communication edge exists between any two devices so long as they are within contact for at least 120 seconds. Each network has approximately a Poisson degree distribution of connectivity. We choose two arbitrary 20 minute snapshots as proxies for two distinct networks, α and β, representing, for instance, two separate rooms at the conference. We calculate how adding long-range connections between α and β (for instance via text messages or email) enhances overall connectivity in the system. In other words, we calculate how many long-range connections would be needed between two isolated local Bluetooth networks to create the desired large scale connectivity, potentially allowing many users to share information. Figure 5 shows the size of the giant component obtained via numerical simulations using the real data (points) and the analytic calculations obtained via generating functions (dashed line). The analytic calculations slightly overestimate connectivity, yet there is remarkable agreement with empirical data even though the actual networks are quite small.
FIG. 5: Inset are two sample networks of Bluetooth connectivity. The main figure shows the increase in participation in the giant component as connectivity between α and β increases, starting from $\bar{k}^{\beta}_{\alpha} = \bar{k}^{\alpha}_{\beta} = 0.1$. Points are obtained by taking the empirical data and simulating inter-network edges with the appropriate $\bar{k}^{\beta}_{\alpha}$ and $\bar{k}^{\alpha}_{\beta}$, averaged over 100 realizations. The solid line is from analytic calculations. (Vertical axis: Fraction of nodes (S); horizontal axis: $\bar{k}^{\beta}_{\alpha}$ and $\bar{k}^{\alpha}_{\beta}$; curves labeled α and β.)
In summary, we have introduced a formalism for calculating connectivity properties in a system of l interacting networks. We demonstrate the extreme lowering of the percolation threshold possible once interactions with other networks are taken into account. This framework for calculating connectivity and statistics of interacting networks should be broadly applicable, and we show potential applications to software and communications systems.
Acknowledgements: We thank Christian Bird for providing data on the Apache project and for useful conversations.
[1] S. N. Dorogovtsev and J. F. F. Mendes, Advances in Physics 51, 1079-1187 (2002).
[2] M. E. J. Newman, SIAM Review 45, 167 (2003).
[3] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez and D.-U. Hwang, Physics Reports 424, 175-308 (2006).
[4] M. Kurant and P. Thiran, Phys. Rev. Lett. 96, 138701 (2006).
[5] B. Bollobás, S. Janson and O. Riordan, Random Structures and Algorithms 31, 3-122 (2007).
[6] A. Allard, P-A Noël, L. J. Dubé and B. Pourbohloul, Phys. Rev. E 79, (3) 036113 (2009).
[7] S. N. Dorogovtsev, J. F. F. Mendes, A. N. Samukhin, and A. Y. Zyuzin, Phys. Rev. E 78, 056106 (2008).
[8] M. Ostilli and J. F. F. Mendes, arXiv:0812.0608 (2008).
[9] P. Erdős and A. Rényi, Publicationes Mathematicae 6, 290 (1959).
[10] M. Molloy and B. Reed, Random Structures and Algorithms 6, 161 (1995).
[11] M. E. J. Newman, S. H. Strogatz, and D. J. Watts, Phys. Rev. E 64, 026118 (2001).
[12] D. S. Callaway, M. E. J. Newman, S. H. Strogatz, and D. J. Watts, Phys. Rev. Lett. 85, 5468 (2000).
[13] B. Bollobás, European Journal of Combinatorics 1, 311 (1980).
[14] J. Lin, IEEE Trans. Information Theory 37 (1), 145-151 (1991).
[15] http://www.apacheweek.com/features/ap2.
[16] A.-L. Barabási and R. Albert, Science 286, 510-512 (1999).
[17] A. Clauset, C. R. Shalizi, and M. E. J. Newman, SIAM Review, in press (2009), arXiv:0706.1062.
[18] S. Ioannidis, A. Chaintreau, and L. Massoulié, Proceedings of IEEE INFOCOM, 2009.
[19] J. Scott, R. Gass, J. Crowcroft, P. Hui, C. Diot and A. Chaintreau, http://crawdad.cs.dartmouth.edu/cambridge/haggle/imote/infocom (2006).