Yandex wg-talk

Random graph process models of large
networks
Colin Cooper
Department of Informatics
King’s College London

28th October 2013
Yandex

Random graph process
Graph process: at each step the existing graph is modified by
making a small number of structural changes, e.g.
Add a new vertex with edges incident to existing graph
Add edges within the existing graph
Delete some edges or vertices
Exchange some existing edges for others
If these changes are random, then some asymptotic structural
properties may emerge as the process evolves. For example,
the degree sequence may follow a power law with parameter γ

Outline
Introduction

Various web graph models

Degree distribution: Undirected model

Hub-Authority model: Directed

Web-graphs of increasing degree

Experimental studies
Large-scale dynamic networks such as the Internet and the
World Wide Web
Barabási and Albert, Emergence of scaling in random
networks, (1999).
Broder, Kumar, Maghoul, Raghavan, Rajagopalan, Stata,
Tomkins and Wiener, Graph Structure in the Web, (2000).

M. Faloutsos and P. Faloutsos and C. Faloutsos, On
Power-law Relationships of the Internet Topology, (1999)

Power law degree sequence
Proportion of vertices of a given degree k follows an
approximate inverse power law
$n_k \sim C k^{-\gamma}$
for some constants C, γ
Various explanatory models e.g.
Bollobás, Riordan, Spencer and Tusnády, The degree
sequence of a scale-free random graph process, (2001)
Aiello, Chung and Lu, A random graph model for massive
graphs, (2000)
Kumar, Raghavan, Rajagopalan, Sivakumar, Tomkins and
Upfal. Stochastic models for the web graph, (2000)
Dorogovtsev, Mendes and Samukhin, Structure of growing
networks with preferential linking (2000)
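Taking logarithms of $n_k \sim C k^{-\gamma}$ gives $\log n_k \approx \log C - \gamma \log k$, so γ appears as minus the slope of a log-log plot. The sketch below fits that slope by least squares and checks it on exact power-law data. The helper name `loglog_gamma` is hypothetical. On real, noisy degree counts this simple fit is known to be unreliable, and maximum-likelihood estimators are usually preferred.

```python
import math


def loglog_gamma(ks, counts):
    """Estimate gamma as minus the least-squares slope of log(count) vs log(k)."""
    xs = [math.log(k) for k in ks]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# Exact power law n_k = C k^(-gamma) with C = 0.5 and gamma = 3.
ks = list(range(1, 51))
counts = [0.5 * k ** -3.0 for k in ks]
print(round(loglog_gamma(ks, counts), 6))  # 3.0
```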

Preferential attachment
One approach: generate graphs via preferential attachment (PA)
PA: attach to a vertex with probability proportional to its degree
PA gives a power law with parameter γ = 3
The preferential attachment model dates back to Yule
G. Yule. A mathematical theory of evolution based on the
conclusions of Dr. J.C. Willis, Philosophical Transactions of the
Royal Society of London (Series B) (1924).
Yule model: random tree. Each point independently
generates children at rate 1, i.e. with probability about ∆t in a short time interval ∆t. Early points
have the most children
PA was proposed as a random graph model for the web by
Barabási and Albert. Emergence of scaling in random
networks, (1999)
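Here is a minimal sketch of preferential attachment. It assumes one edge per new vertex (m = 1) and a single starting edge. The function `pa_tree` and its seed are illustrative, not taken from the cited papers. Keeping a list of all edge endpoints turns the degree-proportional choice into a uniform pick from that list.

```python
import random
from collections import Counter


def pa_tree(n, seed=0):
    """Grow a preferential-attachment tree with one edge per new vertex (m = 1).

    `endpoints` lists every edge endpoint, so a uniform pick from it returns
    vertex w with probability d(w) / (2|E|): attachment proportional to degree.
    """
    rng = random.Random(seed)
    degree = [1, 1]      # starting graph: a single edge 0-1
    endpoints = [0, 1]
    for v in range(2, n):
        w = rng.choice(endpoints)   # preferential choice from the current graph
        degree.append(1)
        degree[w] += 1
        endpoints.extend((v, w))
    return degree


deg = pa_tree(20_000)
counts = Counter(deg)
print(counts[1] / len(deg), max(deg))
```

For this m = 1 tree the limiting degree distribution is $n_k = 4/(k(k+1)(k+2))$. The printed fraction of degree-1 vertices should therefore be near 2/3, and the maximum degree should be of order $n^{1/2}$.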

Publications relevant to this talk
Cooper and Frieze, A general model of web graphs, RSA
(2003)
An analysis of the recurrence for the expected number of
vertices of degree k , combined with concentration results and
bounds for maximum degree.
Uses Laplace’s method to solve recurrences with rational
coefficients
Cooper. The age specific degree distribution of web-graphs,
CPC (2006)
Derives degree distribution directly, and uses this to obtain
expected number of vertices of degree k
Cooper, Pralat. Scale-free graphs of increasing degree, RSA
(2011)
Adapts the degree distribution method to obtain results for a
growth model in which the number of edges added per step increases

Web-graph models
Simple undirected or directed process models where a mixture
of vertices and edges are added at each step either
preferentially or uniformly at random
For undirected web-graph processes, as the degree k tends to
infinity, the expected proportion of vertices of degree k tends to
$N_k \propto k^{-\gamma}$. The power law parameter is given by
$\gamma = 1 + 1/\eta$.
Here η is the limiting ratio of the expected number of edge
endpoints inserted in the process by preferential attachment to
the expected total degree
The maximum degree ∆ in this model is a.s.
$\Delta = O(n^{\eta})$
where n is the number of vertices
Surprisingly, these results seem to hold for other types of
process model and can be useful as a general heuristic
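The heuristic itself is a one-line calculation. This sketch takes η as the share of edge endpoints placed preferentially at each step. It returns η, which is also the exponent in $\Delta = O(n^{\eta})$, together with γ = 1 + 1/η. The helper `power_law_heuristic` is hypothetical, and it uses exact fractions to avoid rounding.

```python
from fractions import Fraction


def power_law_heuristic(pa_endpoints, total_endpoints):
    """Return (eta, gamma).

    eta is the share of edge endpoints placed by preferential attachment
    (also the exponent in Delta = O(n^eta)); gamma = 1 + 1/eta.
    """
    eta = Fraction(pa_endpoints, total_endpoints)
    return eta, 1 + 1 / eta


# Standard PA with m = 3: 3 of the 2m = 6 endpoints per step are preferential.
print(power_law_heuristic(3, 6))  # (Fraction(1, 2), Fraction(3, 1))
# One preferential endpoint out of four per step.
print(power_law_heuristic(1, 4))  # (Fraction(1, 4), Fraction(5, 1))
```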

Some examples of the power law heuristic
Standard preferential attachment: Make G(t) from G(t − 1) by
adding a new vertex $v_t$ with (an average of) m neighbours
chosen preferentially from G(t − 1)
$\eta = \frac{m}{2m} = \frac{1}{2}$
Power law: $\gamma = 1 + \frac{1}{\eta} = 1 + 2 = 3$
Maximum degree: $\Delta = O(n^{1/2})$

Experimental evidence PA model
Rapid convergence for PA graphs to γ = 3
20,000 vertices is enough (see light blue plot data)

Thanks to Yiannis Siantos for the figure

Non-standard triangle closing model
Make G(t) from G(t − 1) by adding a new vertex $v_t$
with one neighbour u chosen u.a.r. from G(t − 1)
and one edge from $v_t$ to a random neighbour w of u

Pr(w chosen) ∝ d(w)
One edge endpoint in 4 is chosen preferentially

Proportion of edge endpoints added preferentially is
$\eta = \frac{1}{4}$

So heuristically
Power law: $\gamma = 1 + \frac{1}{\eta} = 1 + 4 = 5$
Maximum degree: $\Delta = O(n^{1/4})$

Experimentally this seems to be true in the limit (see next slide)
The model seems difficult to analyze formally

The heuristic gives no information on the convergence rate
Slow convergence: even large experiments with up to $4 \times 10^8$ vertices
have still not quite arrived at γ = 5, $\Delta = O(n^{1/4})$

Thanks to Yiannis Siantos for the figure
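Here is a minimal simulation of the triangle-closing model. It assumes the process starts from a triangle, since the slides do not state the seed graph. The helper `triangle_closing` is hypothetical. Each step adds two edges, i.e. four endpoints, and only w's endpoint is (heuristically) preferential. That is where η = 1/4 comes from.

```python
import random


def triangle_closing(n, seed=0):
    """Grow the triangle-closing model to n vertices, starting from a triangle.

    Each new vertex v picks u uniformly at random from G(t - 1), then a random
    neighbour w of u, and joins to both u and w.
    """
    rng = random.Random(seed)
    adj = [[1, 2], [0, 2], [0, 1]]
    for v in range(3, n):
        u = rng.randrange(v)        # u.a.r. among the existing vertices 0..v-1
        w = rng.choice(adj[u])      # random neighbour of u, chosen before v is added
        adj.append([u, w])
        adj[u].append(v)
        adj[w].append(v)
    return adj


graph = triangle_closing(100_000)
degrees = [len(nbrs) for nbrs in graph]
print(max(degrees), len(graph) ** 0.25)
```

Convergence is slow, so comparing the maximum degree with $n^{1/4}$ on a run of this size is only indicative.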

Web-graph model generative choices

Web-graph model: Power law degree sequence
For the undirected web-graph process, as the vertex degree k
tends to infinity, the expected proportion of vertices of degree k
tends to $N_k \propto k^{-\gamma}$. The power law parameter is given by
$\gamma = 1 + 1/\eta$
where η is the limiting ratio of the expected number of edge
endpoints inserted by preferential attachment to the expected
total degree
Any γ > 2 can be obtained by suitable choices of parameters

Undirected Web-graph model parameters
At each step, either a NEW vertex (with edges) is added, with
probability α,
or extra edges are added between OLD vertices, with probability
β = 1 − α
For convenience, edges are regarded as "directed out" from the
new vertex
The number of edges is sampled from a distribution
depending on the choice made (NEW, OLD)
Each edge endpoint makes independent UAR or PA
choices:
A. New vertex v, choice for edges directed OUT from v
B. Old vertex v, choice for extra edge directed OUT from v
C. Old vertex v, choice for extra edge directed IN to v

Undirected model continued
NEW procedure.
All edges are "directed out" from new vertex.
Each edge of v chooses independently using the probability
mixture (parameter A)
$\Pr(w \text{ is selected}) = A_1 \frac{d(w,t)}{2|E(t)|} + A_2 \frac{1}{|V(t)|}$

where
$\sum_w \Pr(w \text{ is selected by } e_i) = A_1 + A_2 = 1$

In all cases Z = A, B, C we have
$p_Z(v,t) = Z_1 \frac{d(v,t-1)}{2|E(t-1)|} + Z_2 \frac{1}{|V(t-1)|}$
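Each mixture choice is cheap if the process keeps a list of all edge endpoints. A uniform endpoint gives the PA term $d(w)/2|E|$, and a uniform vertex index gives the UAR term $1/|V|$. In this sketch `select_vertex` is a hypothetical helper, z1 stands for whichever of $A_1, B_1, C_1$ applies, and $z_2 = 1 - z_1$ is assumed.

```python
import random


def select_vertex(endpoints, num_vertices, z1, rng):
    """One mixture choice.

    With probability z1 pick w with probability d(w) / (2|E|) (a uniform edge
    endpoint); otherwise (probability z2 = 1 - z1) pick w uniformly from the
    num_vertices vertices 0..num_vertices-1.
    """
    if rng.random() < z1:
        return rng.choice(endpoints)
    return rng.randrange(num_vertices)


rng = random.Random(1)
endpoints = [0, 1, 1, 2]   # path 0-1-2, so 2|E| = 4 and d(1) = 2
trials = 200_000
freq = sum(select_vertex(endpoints, 3, 0.5, rng) == 1 for _ in range(trials)) / trials
print(freq)  # should be near 0.5 * 2/4 + 0.5 * 1/3 = 5/12
```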

Result of these choices
At each step, with probability α a NEW vertex (with edges) is added;
with probability β = 1 − α extra edges are added between OLD
vertices
The numbers of edges added (NEW, OLD) are sampled from probability
distributions with expected values m, M
A. New vertex v, edges directed OUT from v
B. Old vertex v, edges directed OUT from v
C. Old vertex v, edges directed IN to v
Degree distribution depends on two parameters η, ν

PA: $\eta = \frac{\alpha m A_1 + \beta M (B_1 + C_1)}{2(\alpha m + \beta M)}$

UAR: $\nu = \frac{\alpha m A_2 + \beta M (B_2 + C_2)}{\alpha}$
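The two parameters are simple functions of the model inputs. The sketch below assumes each mixture sums to one, so $Z_2 = 1 - Z_1$ for Z = A, B, C, and that α > 0 so that ν is defined. The helper `eta_nu` is hypothetical. Setting α = 1 and $A_1 = 1$ recovers standard preferential attachment.

```python
def eta_nu(alpha, m, M, A1, B1, C1):
    """Compute (eta, nu) for the undirected model.

    Assumes Z2 = 1 - Z1 for Z = A, B, C and alpha > 0.
    """
    beta = 1 - alpha
    A2, B2, C2 = 1 - A1, 1 - B1, 1 - C1
    eta = (alpha * m * A1 + beta * M * (B1 + C1)) / (2 * (alpha * m + beta * M))
    nu = (alpha * m * A2 + beta * M * (B2 + C2)) / alpha
    return eta, nu


# Standard PA: always a NEW vertex (alpha = 1) with m = 3 purely preferential edges.
eta, nu = eta_nu(alpha=1.0, m=3, M=0, A1=1.0, B1=0.0, C1=0.0)
print(eta, nu, 1 + 1 / eta)  # 0.5 0.0 3.0
```

With η = 1/2, the heuristic γ = 1 + 1/η gives the familiar γ = 3.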

Degree distribution: Undirected model

PA: $\eta = \frac{\alpha m A_1 + \beta M (B_1 + C_1)}{2(\alpha m + \beta M)}$

UAR: $\nu = \frac{\alpha m A_2 + \beta M (B_2 + C_2)}{\alpha}$

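Vertex v of initial degree m added at step v
Distribution of the degree d(v, t) of v at step t:
$\Pr(d(v,t) = m + \ell \mid m) \sim \binom{\ell + m + \nu/\eta - 1}{\ell} \left(\frac{v}{t}\right)^{\eta m + \nu} \left(1 - \left(\frac{v}{t}\right)^{\eta}\right)^{\ell}$

Assumes t → ∞ and v is added after time $v_0$ → ∞, and
$\ell = o(t^{1/4})$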

Illustration: Pr(degree increases by 2)
Probability of change p, no change q at step t:
$p(j,t) \sim \frac{\eta(m+j)}{t} + \frac{\nu}{t}, \qquad q(j,t) = 1 - p(j,t)$
Change points $\tau_1, \tau_2$:

v |--------|τ1--------|τ2----------| t

Probability of exactly 2 changes, at $\tau_1$ and $\tau_2$:
$q(0,v+1)\cdots q(0,\tau_1-1)\,p(0,\tau_1)$   (first change at $\tau_1$)
$\times\, q(1,\tau_1+1)\cdots q(1,\tau_2-1)\,p(1,\tau_2)$   (second change at $\tau_2$)
$\times\, q(2,\tau_2+1)\cdots q(2,t)$   (no further changes)

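This evaluates to
$F(\tau_1,\tau_2) \sim \frac{(\eta m+\nu)(\eta(m+1)+\nu)}{\eta^2} \left(\frac{v}{t}\right)^{\eta m+\nu} \frac{\eta\, \tau_1^{\eta-1}}{t^{\eta}} \cdot \frac{\eta\, \tau_2^{\eta-1}}{t^{\eta}}$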


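Add over all possible $v < \tau_1 < \tau_2 \le t$:
$\sum_{\tau_1 < \tau_2} F(\tau_1,\tau_2) \sim \frac{(\eta m+\nu)(\eta(m+1)+\nu)}{2!\,\eta^2} \left(\frac{v}{t}\right)^{\eta m+\nu} \left(\int_v^t \frac{\eta\, \tau^{\eta-1}}{t^{\eta}}\, d\tau\right)^2$
$\sim \frac{(\eta m+\nu)(\eta(m+1)+\nu)}{2!\,\eta^2} \left(\frac{v}{t}\right)^{\eta m+\nu} \left(1 - \left(\frac{v}{t}\right)^{\eta}\right)^2$
The 1/2! accounts for the ordering $\tau_1 < \tau_2$. Since $\frac{(\eta m+\nu)(\eta(m+1)+\nu)}{2!\,\eta^2} = \binom{m+\nu/\eta+1}{2}$, this is the ℓ = 2 case of the degree distribution formula.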
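The asymptotic formula can be compared with the exact law of the same chain. This sketch treats $p(j,s) = (\eta(m+j)+\nu)/s$ as exact, which is an assumption, since the slide only gives it asymptotically. A short dynamic program over s from v + 1 to t then gives the distribution of ℓ = d(v,t) − m. The formula is a negative binomial law in ℓ with $r = m + \nu/\eta$ and success probability $x = (v/t)^{\eta}$, and it is evaluated here through `math.lgamma`. The helper names and parameter values are illustrative.

```python
from math import exp, lgamma, log


def degree_increment_dp(m, eta, nu, v, t):
    """Exact law of ell = d(v, t) - m when, at each step s = v+1..t, a vertex
    with ell increments so far gains one more with probability
    (eta*(m + ell) + nu) / s (the slide's p, taken as exact)."""
    dist = [1.0]                      # at step v: ell = 0 with probability 1
    for s in range(v + 1, t + 1):
        new = [0.0] * (len(dist) + 1)
        for ell, prob in enumerate(dist):
            p = min(1.0, (eta * (m + ell) + nu) / s)
            new[ell] += prob * (1.0 - p)    # no change at step s
            new[ell + 1] += prob * p        # degree increases by one
        dist = new
    return dist


def degree_increment_asymptotic(m, eta, nu, v, t, ell):
    """binom(ell + m + nu/eta - 1, ell) (v/t)^(eta*m + nu) (1 - (v/t)^eta)^ell."""
    r = m + nu / eta
    x = (v / t) ** eta
    log_binom = lgamma(ell + r) - lgamma(r) - lgamma(ell + 1)
    return exp(log_binom + r * log(x) + ell * log(1.0 - x))


m, eta, nu, v, t = 2, 0.5, 1.0, 100, 1600
exact = degree_increment_dp(m, eta, nu, v, t)
for ell in range(4):
    approx = degree_increment_asymptotic(m, eta, nu, v, t, ell)
    print(ell, round(exact[ell], 5), round(approx, 5))
```

With v = 100 and t = 1600, the ℓ = 0 entries are 9900/2558400 ≈ 0.00387 (exact) and 1/256 ≈ 0.00391 (asymptotic). The exact mean, about 11.94, is close to the negative binomial mean r(1 − x)/x = 12.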

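From the degree distribution we can obtain:
$n(\ell \mid m)$, the expected proportion of vertices of degree $m + \ell$, given initial degree m
$n(\ell \mid m) = \frac{((\ell+m-1)\eta+\nu)\cdots(m\eta+\nu)}{((\ell+m)\eta+\nu+1)\cdots(m\eta+\nu+1)}$

The proportion $N_t(\ell \mid m)$ of vertices of degree $m + \ell$ is
concentrated around $n(\ell \mid m)$, provided t → ∞ and ℓ is not
too large
As ℓ → ∞, $n(\ell \mid m) \sim K \ell^{-(1+1/\eta)}$
The range of η is 0 < η < 1, so the power law coefficient γ = 1 + 1/η satisfies γ ≥ 2
$\eta = \frac{\alpha m A_1 + \beta M (B_1 + C_1)}{2(\alpha m + \beta M)}$

As η → 0: geometric degree sequence (random graph)
$\lim_{\eta \to 0} n_\eta(\ell \mid m) = \frac{1}{\nu+1}\left(\frac{\nu}{\nu+1}\right)^{\ell}$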
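The product formula is easiest to evaluate through the ratio of consecutive terms, $n(\ell+1 \mid m)/n(\ell \mid m) = ((m+\ell)\eta+\nu)/((m+\ell+1)\eta+\nu+1)$, starting from $n(0 \mid m) = 1/(m\eta+\nu+1)$. The sketch below uses a hypothetical helper, `n_ell_table`, and checks three properties numerically. First, the values sum to 1 over ℓ. Second, the local log-log slope approaches −(1 + 1/η). Third, as η → 0 the values approach the geometric limit.

```python
import math


def n_ell_table(max_ell, m, eta, nu):
    """n(ell | m) for ell = 0..max_ell, built from the ratio
    n(ell+1 | m) / n(ell | m) = ((m + ell)eta + nu) / ((m + ell + 1)eta + nu + 1),
    starting from n(0 | m) = 1 / (m*eta + nu + 1)."""
    vals = [1.0 / (m * eta + nu + 1)]
    for ell in range(max_ell):
        vals.append(vals[-1] * ((m + ell) * eta + nu) / ((m + ell + 1) * eta + nu + 1))
    return vals


vals = n_ell_table(20_000, m=2, eta=0.5, nu=1.0)
print(sum(vals))                                              # close to 1
print(math.log(vals[20_000] / vals[10_000]) / math.log(2))    # close to -(1 + 1/eta) = -3
geo = n_ell_table(5, m=2, eta=1e-9, nu=1.0)
nu0 = 1.0
print(geo[3], (1 / (nu0 + 1)) * (nu0 / (nu0 + 1)) ** 3)       # eta -> 0: geometric limit
```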

Hub-Authority model: Directed
Hub: Vertex with a lot of edges directed out (opinionated page)
Authority: Vertex with a lot of edges directed in (popular page)
The initial in- and out-degrees are given by a distribution $(P^-, P^+)$
How does a new vertex v added at step t + 1 choose its
IN-neighbours?
$\Pr(w \text{ points to } v) = D_1 \frac{d^+(w,t)}{|E(t)|} + D_2 \frac{1}{|V(t)|}$

An edge into v most likely comes from a hub vertex
How does a new vertex v added at step t + 1 choose its
OUT-neighbours?
$\Pr(v \text{ points to } w) = A_1 \frac{d^-(w,t)}{|E(t)|} + A_2 \frac{1}{|V(t)|}$

so v most likely points to an authority vertex
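Both rules have the same shape: a PA term over |E(t)| plus a uniform term over |V(t)|. One hypothetical helper, `mixture_probs`, therefore covers both. It assumes $D_1 + D_2 = 1$ and $A_1 + A_2 = 1$, by analogy with the undirected model; the slide does not state this. It also relies on every edge contributing one to the total out-degree and one to the total in-degree, so each total equals |E|.

```python
def mixture_probs(degrees, pa_weight):
    """Probability of choosing each vertex w under
    pa_weight * degrees[w] / |E| + (1 - pa_weight) / |V|.

    For IN-neighbours pass out-degrees d+(w) with D1; for OUT-neighbours pass
    in-degrees d-(w) with A1. Either degree list sums to |E|.
    """
    num_edges = sum(degrees)
    num_vertices = len(degrees)
    return [pa_weight * d / num_edges + (1 - pa_weight) / num_vertices
            for d in degrees]


out_deg = [3, 0, 1]        # three vertices and |E| = 4 edges
probs = mixture_probs(out_deg, 0.5)
print(probs)               # vertex 0, the biggest hub, is the most likely source
```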

Results summary
Undirected model
(√) Age dependent degree distribution
(√) Number of vertices with given degree
(√) Asymptotic degree sequence $n(k) \sim k^{-x}$
Hub-Authority model
(√) Age dependent in- and out-degree distribution
(√, ×) Number of vertices with given in- & out-degree (as an
integral)
(√) Asymptotic degree sequence
$n(k,\ell) \sim k^{-x^-} \ell^{-x^+}$, $x = x(k,\ell)$

General Directed model
(×) The in- and out-degree distribution is not obtainable
explicitly
Sum of path dependent integrals (order of events matters)

Directed model. Definition only
In general, the choice type can be made on a mixture of IN and
OUT degree
E.g., how does a new vertex added at step t choose its
OUT-neighbours?

$\Pr(v \text{ points to } w) = A_{(1,+)} \frac{d^+(w,t-1)}{|E(t-1)|} + A_{(1,-)} \frac{d^-(w,t-1)}{|E(t-1)|} + A_2 \frac{1}{|V(t-1)|}$

where
$A_{(1,+)} + A_{(1,-)} + A_2 = 1$
An in-degree of 2 at w could be made up of various choices
(++), (+−), (−+), (−−) at w by subsequent vertices t > w

Results: Hub-Authority model
Degree distribution: Explicit distribution (similar to undirected)
Power law: the number $n(r,s)$ of vertices of in-degree r and
out-degree s is of the form
$n(r, s \mid m^-, m^+) = C_{r,s}\, r^{-x^-} s^{-x^+}$

The parameters $x^-, x^+$ depend on the relative sizes of r, s
They change as s increases from 1 to $s = \Theta(r^{\eta^+/\eta^-})$
Functional form: $x = f(\eta^+, \eta^-, \nu, m^+, m^-)$, a quotient
$\eta^+, \eta^-$ are the preferential attachment parameters
The parameter $\eta^-$ is the limiting ratio of the expected number of
edges whose terminal vertex was chosen by preferential
attachment to the expected number of edges of the process
$\eta^- = \frac{\alpha m^+ A_1 + \beta M C_1}{\alpha m^+ + \gamma m^- + \beta M}$

How does degree sequence differ from Undirected?
$\Pr(d^-(v,t) = r,\ d^+(v,t) = s) \sim \Pr(d^-(v,t) = r)\,\Pr(d^+(v,t) = s)$
Expected proportion of vertices of degree (r, s):
$n(r,s) = C\, r^{-(1-\xi^-)} s^{-(1-\xi^+)} J(r,s)$
where $\xi^+ = m^+ + \nu^+/\eta^+$ and
$J(r,s) = \int_0^1 x^a (1-x)^r (1-x^b)^s \, dx$
where $b = \eta^+/\eta^-$ and $a = \frac{\eta^+}{\eta^-}\xi^+ + \frac{1}{\eta^-} + \xi^- - 1$
Asymptotics for J(r , s) depend on relative sizes of r , s

Increasing degree model: Preferential Attachment
Can we escape from power law γ = 3 by increasing the
number of edges added at each step?
At each step t add a NEW vertex with f(t) edges, where
$f(t) = [t^c]$, $0 < c < 1$
For $k \gg t^c$ the power law we get is
$n_k = C\, \frac{t^{(1+c)/(1-c)}}{k^{(3-c)/(1-c)}}$

We need a constant c > 0 to escape the power law γ = 3 given by PA
models
When c = 1, all vertices have degree ∼ t, so there is no power law
any more
For 0 < c < 1 the power law is γ(c) = 1 + 2/(1 − c) > 3
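The exponent is a simple rational function of c. This sketch uses exact fractions, includes c = 0 as the constant-degree PA case, and checks that 1 + 2/(1 − c) equals (3 − c)/(1 − c). The helper `gamma_increasing_degree` is hypothetical.

```python
from fractions import Fraction


def gamma_increasing_degree(c):
    """Power-law exponent gamma(c) = 1 + 2/(1 - c) for f(t) = [t^c], 0 <= c < 1.

    c = 0 recovers the constant-degree PA value gamma = 3.
    """
    c = Fraction(c)
    if not 0 <= c < 1:
        raise ValueError("c must satisfy 0 <= c < 1")
    return 1 + 2 / (1 - c)


for c in (Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(9, 10)):
    g = gamma_increasing_degree(c)
    assert g == (3 - c) / (1 - c)    # equivalent form of the exponent
    print(c, g)
```

As c approaches 1 the exponent grows without bound, consistent with the power law disappearing at c = 1.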

Concluding remarks
Good points of web-graph model
Method works well for undirected models
Provides a heuristic for predicting degree sequence power
law and maximum degree in unrelated models
Generalizes to hypergraph models (not covered in this talk)
If $1 \le m(t) = t^{o(1)}$ edges are added at step t, the power law is γ = 3
Not so good points of web-graph model
Directed models less pleasing, as power law varies as a
function of relative sizes of in-degree and out-degree
General directed model: no closed form for degree
distribution?
Model does not explain/predict power laws with parameter
γ < 2 (As η ≤ 1 it must be that γ = 1 + 1/η ≥ 2)
