2.
• Brains: nodes are neurons; edges are synapses.
• Social networks: nodes are people; edges are friendships.
• Electrical grid: nodes are power plants; edges are transmission lines. [Tim Meko, Washington Post]
• Currency: nodes are accounts; edges are transactions.
Background. Networks are sets of nodes and edges (graphs) that model real-world systems.
3.
Background. Networks are globally sparse but locally dense.
Figures: co-author network; brain network [Sporns and Bullmore, Nature Rev. Neuro., 2012].
Networks of real-world systems have modules, clusters, and communities.
[Watts-Strogatz 1998; Flake 2000; Newman 2004, 2006; many others…]
4.
How do we measure how much a network clusters?
5.
Local clustering coefficient: C(u) = fraction of length-2 paths (wedges) centered at node u that close into a triangle.
Average clustering coefficient: C = average of C(u) over all nodes u.
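A minimal sketch of both definitions, assuming the graph is stored as a dict that maps each node to the set of its neighbors. The function names are hypothetical. Averaging only over nodes with degree ≥ 2 is one common convention; some tools instead count undefined nodes as 0.

```python
from itertools import combinations

def local_clustering(adj, u):
    """C(u): fraction of length-2 paths (wedges) centered at u that close into a triangle."""
    d = len(adj[u])
    if d < 2:
        return None  # no wedge is centered at u, so C(u) is undefined
    closed = sum(1 for v, w in combinations(sorted(adj[u]), 2) if w in adj[v])
    return closed / (d * (d - 1) / 2)

def average_clustering(adj):
    """C: mean of C(u) over the nodes where C(u) is defined (degree >= 2)."""
    values = [local_clustering(adj, u) for u in adj]
    values = [c for c in values if c is not None]
    return sum(values) / len(values)

# Triangle 0-1-2 plus a pendant node 3 attached to node 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(local_clustering(adj, 2))  # 1 closed wedge out of 3 -> 0.333...
print(average_clustering(adj))   # (1 + 1 + 1/3) / 3 -> 0.777...
```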
• In real-world networks, C is larger than we would expect (there is clustering).
[Watts-Strogatz 1998] > 34k citations!
• Attributed to triadic closure in sociology – a common friend provides an
opportunity for more friendships. [Rapoport 1953; Granovetter 1973]
• Key property for generative models.
[Newman 2009; Seshadhri-Kolda-Pinar 2012; Robles-Moreno-Neville 2016]
• Common feature in role discovery, anomaly detection, etc.
[Henderson+ 2012; La Fond-Neville-Gallagher 2014, 2016]
• Predictor of mental health. [Bearman-Moody 2004]
Background. The clustering coefficient is the fundamental
measurement of network science.
6.
The clustering coefficient measures the closure probability of just one simple structure: the triangle.
… but there is lots of evidence that dense “higher-order structures” among more than 3 nodes are also important for clustering.
• 4-cliques reveal community structure in word
association and PPI networks [Palla+ 2005]
• 4- and 5-cliques (+ other motifs/graphlets)
used to identify network type and dimension
[Yaveroğlu+ 2014, Bonato+ 2014]
• 4-node motifs identify community structure in
neural systems [Benson-Gleich-Leskovec 2016]
The clustering coefficient is inherently limited.
7.
Triangles tell just one part of the story.
How can we measure higher-order (clique) closure patterns?
8.
1. Find a 2-clique. 2. Attach an adjacent edge. 3. Check for a (2+1)-clique.
1. Find a 3-clique. 2. Attach an adjacent edge. 3. Check for a (3+1)-clique.
1. Find a 4-clique. 2. Attach an adjacent edge. 3. Check for a (4+1)-clique.
Increase the clique size by 1 to get a higher-order clustering coefficient.
C2 = avg. fraction of (2-clique, adjacent edge) pairs that induce a (2+1)-clique
C3 = avg. fraction of (3-clique, adjacent edge) pairs that induce a (3+1)-clique
C4 = avg. fraction of (4-clique, adjacent edge) pairs that induce a (4+1)-clique
Our higher-order view through clique expansion.
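The three steps above can be sketched directly as a brute-force count for one node u. Assumptions: adjacency is a dict of neighbor sets, "adjacent edge" means an edge (u, v) with v outside the clique, and the function names are hypothetical. This is meant to illustrate the definition, not to be an efficient method.

```python
from itertools import combinations

def is_clique(adj, nodes):
    """True if every pair of the given nodes is joined by an edge."""
    return all(b in adj[a] for a, b in combinations(nodes, 2))

def local_hocc_bruteforce(adj, u, ell):
    """C_ell(u) straight from the clique-expansion picture.

    Enumerate every (ell-clique containing u, adjacent edge (u, v)) pair with v
    outside the clique and return the fraction that induce an (ell+1)-clique.
    """
    nbrs = sorted(adj[u])
    pairs = closed = 0
    # An ell-clique containing u is u plus ell - 1 mutually adjacent neighbors of u.
    for rest in combinations(nbrs, ell - 1):
        if not is_clique(adj, rest):
            continue
        for v in nbrs:
            if v in rest:
                continue
            pairs += 1  # one (clique, adjacent edge) pair
            if all(v in adj[w] for w in rest):
                closed += 1  # v is adjacent to the whole clique: (ell+1)-clique
    return closed / pairs if pairs else None

# A 4-clique {0, 1, 2, 3} plus node 4 attached to nodes 0 and 1.
adj = {0: {1, 2, 3, 4}, 1: {0, 2, 3, 4}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0, 1}}
for ell in (2, 3):
    print(ell, local_hocc_bruteforce(adj, 0, ell))  # 2 -> 2/3, 3 -> 3/8
```

For ell = 2 this reduces to the classical C(u): each neighbor pair is counted twice, once per ordering, which cancels in the ratio.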
9.
Figure: Alice, Bob, Charlie, and Dave [images: rollingstone.com, oprah.com].
1. Start with a group of 3 friends.
2. One person in the group befriends someone new.
3. The group might increase in size.
Intuition for higher-order closure in social networks.
10.
We generalize clustering coefficients to account for clique closure.
This particular generalization has several advantages…
1. Theory. Analyze relationships between clustering at different orders.
• small-world and Gn,p random graph models
• combinatorics for general graphs
2. Data Insights. How do real-world networks cluster?
• Old idea: pretty much all real-world networks exhibit clustering.
• New idea: real-world networks may only cluster up to a certain order.
3. Applications. Finding “higher-order” communities.
• Large higher-order clustering coefficient → can find a good “higher-order community”
Higher-order clustering coefficients.
11.
Second-order (classical) local clustering coefficient at node u.
Second-order (classical) global clustering coefficient.
Second-order (classical) average clustering coefficient.
Background. Local, average, and global clustering coefficients.
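For reference, these three classical quantities are usually written in wedge-counting form. The notation is assumed here: d_u is the degree of u, T(u) is the number of triangles containing u, T is the number of triangles in the network, and \tilde{V} is the set of nodes with d_u ≥ 2. Averaging only over \tilde{V} is one convention; some tools count the other nodes as 0.

```latex
C_2(u) = \frac{T(u)}{\binom{d_u}{2}} = \frac{2\,T(u)}{d_u(d_u - 1)}, \qquad
C_2 = \frac{3\,T}{\sum_{v} \binom{d_v}{2}}, \qquad
\bar{C}_2 = \frac{1}{|\tilde{V}|} \sum_{u \in \tilde{V}} C_2(u)
```

The global version pools all wedges before taking the ratio, which makes it a degree-weighted average of C_2(u). High-degree nodes therefore count more in C_2 than in \bar{C}_2.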
12.
Third-order local clustering coefficient at node u.
Third-order global clustering coefficient.
Third-order average clustering coefficient.
Local, average, and global higher-order clustering coefficients.
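A sketch of the third-order versions, derived from the clique-expansion definition with a triangle as the starting clique. The notation is assumed: |K_3(u)| and |K_4(u)| count the 3- and 4-cliques containing u, |K_4| counts all 4-cliques, and \tilde{V}_3 is the set of nodes with at least one such wedge. A third-order wedge at u is a triangle containing u plus an edge (u, v) to one of the other d_u − 2 neighbors. Each 4-clique containing u closes 3 of these wedges, one for each choice of v. Each 4-clique closes 12 wedges in total: 4 choices of center times 3 choices of v.

```latex
C_3(u) = \frac{3\,|K_4(u)|}{(d_u - 2)\,|K_3(u)|}, \qquad
C_3 = \frac{12\,|K_4|}{\sum_{v} (d_v - 2)\,|K_3(v)|}, \qquad
\bar{C}_3 = \frac{1}{|\tilde{V}_3|} \sum_{u \in \tilde{V}_3} C_3(u)
```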
13.
• Small-world model [Watts-Strogatz 1998]
• Start with n nodes on a ring, each with edges to its 2k nearest neighbors, and then rewire each edge with probability p.
Example: n = 16, k = 3, p = 0 [Watts-Strogatz 1998]
[Yin-Benson-Leskovec 2017]
Small-world network analysis.
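A small sketch that checks the p = 0 starting configuration with n = 16 and k = 3. Assumptions: only the ring lattice is built (no rewiring), and the local coefficient uses the clique-count form C_ℓ(u) = ℓ·|K_{ℓ+1}(u)| / ((d_u − ℓ + 1)·|K_ℓ(u)|), obtained by counting (ℓ-clique, adjacent edge) pairs. All function names are hypothetical.

```python
from itertools import combinations

def ring_lattice(n, k):
    """Watts-Strogatz starting graph (p = 0): n nodes on a ring, each linked to
    its k nearest neighbors on either side (2k neighbors in total)."""
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for j in range(1, k + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    return adj

def cliques_containing(adj, u, size):
    """Number of size-cliques containing u (brute force over neighbor subsets)."""
    return sum(
        1
        for rest in combinations(sorted(adj[u]), size - 1)
        if all(b in adj[a] for a, b in combinations(rest, 2))
    )

def local_hocc(adj, u, ell):
    """C_ell(u) = ell * |K_{ell+1}(u)| / ((d_u - ell + 1) * |K_ell(u)|)."""
    wedges = (len(adj[u]) - ell + 1) * cliques_containing(adj, u, ell)
    return ell * cliques_containing(adj, u, ell + 1) / wedges if wedges else None

adj = ring_lattice(16, 3)
print(local_hocc(adj, 0, 2))  # 0.6, matching 3(k - 1) / (2(2k - 1)) for k = 3
print(local_hocc(adj, 0, 3))  # 1/3: 4 four-cliques and 9 triangles contain node 0
```

Every node of the unrewired lattice is equivalent, so all nodes give the same values. Rewiring with p > 0 would break this symmetry and lower both coefficients.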
14.
Proposition [Yin-Benson-Leskovec 2017]
Everything scales exponentially in the order of the clustering coefficient...
Even if a node’s neighborhood is dense, i.e., C2(u) is large, higher-order clustering still decays exponentially in Gn,p.
Gn,p random graph network analysis.
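A heuristic sketch of where the exponential decay comes from. It is a per-wedge argument, not the proposition's exact statement. An ℓ-wedge at u is an ℓ-clique K containing u together with an edge (u, v), where v is not in K. The wedge closes exactly when v is adjacent to the other ℓ − 1 members of K. In Gn,p, those ℓ − 1 node pairs are distinct from the pairs that define the wedge, so each is an edge independently with probability p:

```latex
\Pr\big[\text{an } \ell\text{-wedge } (K, (u,v)) \text{ closes}\big]
  = \Pr\big[(v,w) \in E \ \ \forall\, w \in K \setminus \{u\}\big]
  = p^{\ell - 1},
\qquad \ell = 2: p, \quad \ell = 3: p^2, \quad \ell = 4: p^3.
```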
15.
General network combinatorial analysis.
Extremal relationships between higher-order clustering coefficients (HOCCs) of different orders.
Proposition [Yin-Benson-Leskovec 2017]
For any node u in the network,
(tight upper and lower bounds)
16.
General network combinatorial analysis.
Clique density interpretation.
Proposition [Yin-Benson-Leskovec 2017]
The product C2(u) C3(u) ⋯ Cr(u) of the first r − 1 local higher-order clustering coefficients is the r-clique density among the neighbors of node u.
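A worked derivation of the identity. It assumes the local form C_ℓ(u) = ℓ·|K_{ℓ+1}(u)| / ((d_u − ℓ + 1)·|K_ℓ(u)|), obtained by counting (ℓ-clique, adjacent edge) pairs, and that every factor is defined. The clique counts telescope. The result uses |K_2(u)| = d_u and the fact that |K_{r+1}(u)| equals the number of r-cliques among u's neighbors:

```latex
\prod_{\ell=2}^{r} C_\ell(u)
  = \prod_{\ell=2}^{r} \frac{\ell\,|K_{\ell+1}(u)|}{(d_u - \ell + 1)\,|K_\ell(u)|}
  = \frac{r!}{\prod_{\ell=2}^{r} (d_u - \ell + 1)} \cdot \frac{|K_{r+1}(u)|}{d_u}
  = \frac{r!\,(d_u - r)!}{d_u!}\,|K_{r+1}(u)|
  = \frac{|K_{r+1}(u)|}{\binom{d_u}{r}}
```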
17.
General network combinatorial analysis.
Clique participation and computation.
Observation
We can compute the rth-order HOCCs by enumerating r- and (r + 1)-cliques:
$C_r(u) = \frac{r\,|K_{r+1}(u)|}{(d_u - r + 1)\,|K_r(u)|}$, where $|K_a(u)|$ is the number of a-cliques containing u and $d_u$ is the degree of u.
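A sketch of this observation, assuming sortable node labels and a dict-of-neighbor-sets adjacency. Cliques through u are grown inside u's neighborhood in increasing label order, so each one is counted once. The global value pools closed and total wedges over all nodes. The average is taken over nodes with at least one wedge, which is an assumed convention. Function names are hypothetical.

```python
def count_cliques_containing(adj, u, size):
    """Number of size-cliques that contain u.

    Grows cliques inside u's neighborhood, only ever adding a node larger than
    the previous one, so each clique is counted exactly once.
    """
    def extend(candidates, needed):
        if needed == 0:
            return 1
        total = 0
        for v in candidates:
            # Next candidates: later nodes that are also adjacent to v.
            total += extend([w for w in candidates if w > v and w in adj[v]], needed - 1)
        return total

    return extend(sorted(adj[u]), size - 1)

def hocc(adj, r):
    """Local, global, and average r-th order clustering coefficients from clique counts."""
    local = {}
    all_wedges = all_closed = 0
    for u in adj:
        # (r-clique, adjacent edge) pairs centered at u
        wedges = (len(adj[u]) - r + 1) * count_cliques_containing(adj, u, r)
        # each (r+1)-clique through u closes r of those wedges
        closed = r * count_cliques_containing(adj, u, r + 1)
        all_wedges += wedges
        all_closed += closed
        if wedges > 0:
            local[u] = closed / wedges
    global_c = all_closed / all_wedges if all_wedges else None
    average_c = sum(local.values()) / len(local) if local else None
    return local, global_c, average_c

# A 4-clique {0, 1, 2, 3} plus node 4 attached to nodes 0 and 1.
adj = {0: {1, 2, 3, 4}, 1: {0, 2, 3, 4}, 2: {0, 1, 3}, 3: {0, 1, 2}, 4: {0, 1}}
local3, global3, average3 = hocc(adj, 3)
print(local3)    # {0: 0.375, 1: 0.375, 2: 1.0, 3: 1.0}; node 4 has no third-order wedge
print(global3)   # 12 closed / 22 wedges
print(average3)  # 0.6875
```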
20.
Network                 C2    C3    C4    Trend
Neural connections      0.18  0.08  0.06  decreases with order
Facebook friendships    0.16  0.11  0.12  decreases, then increases
Co-authorships          0.32  0.33  0.36  increases with order
Is this just due to cliques in co-authorships?
No. High-degree nodes in co-authorships exhibit clique + star structure where C3(u) > C2(u).
Global higher-order clustering coefficients.
21.
Network                              C2    C3    C4
Neural connections                   0.31  0.14  0.06
  Random configurations              0.15  0.04  0.01
  Random configurations (C2 fixed)   0.31  0.17  0.09
Facebook friendships                 0.25  0.18  0.16
  Random configurations              0.03  0.00  0.00
  Random configurations (C2 fixed)   0.25  0.14  0.09
Co-authorships                       0.68  0.61  0.56
  Random configurations              0.01  0.00  0.00
  Random configurations (C2 fixed)   0.68  0.60  0.52
Average higher-order clustering coefficients.
23.
Neural connections: the findings are not just due to cliques.
               Original network   Null model
# 4-cliques    2,010              440 ± 68
C3             0.14               0.17 ± 0.004
The 4-clique count decreases in the null model, but the higher-order clustering coefficient increases.
Key reason. Clustering coefficients are normalized by opportunities to cluster.
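24.
Figure: local HOCCs for neural connections, Facebook friendships, and co-authorships, for the real network and for a random configuration with C2 fixed, compared with a Gn,p baseline and an upper bound. Neural connections: dense but nearly random regions. Facebook friendships and co-authorships: dense and structured regions.
Local HOCCs.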
26.
If a network has a large higher-order clustering coefficient, then it has communities.
More precisely: then there exists at least one community by one particular measure of “higher-order community structure”, and we can find that community efficiently.
27.
Conductance is one of the most important cluster quality scores [Schaeffer 2007], used in Markov chain theory, spectral clustering, bioinformatics, vision, etc.
The conductance of a set of vertices S is the ratio of the number of edges leaving S to the number of edge endpoints in S (taking S as the side with fewer edge endpoints):
$\phi(S) = \frac{\text{edges leaving } S}{\text{edge endpoints in } S}$
Small conductance → good cluster.
Background. Graph clustering and conductance.
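A minimal sketch of the conductance formula, assuming an unweighted graph stored as a dict of neighbor sets. Taking the minimum over S and its complement is the standard convention. The function name is hypothetical.

```python
def conductance(adj, S):
    """phi(S) = (edges leaving S) / min(vol(S), vol(V minus S)), where vol sums
    degrees, i.e., counts edge endpoints on that side."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_s = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj) - vol_s
    denom = min(vol_s, vol_rest)
    return cut / denom if denom else None

# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)
print(conductance(adj, {0, 1, 2}))  # 1 cut edge / 7 endpoints
print(conductance(adj, {0, 1}))     # 2 cut edges / 4 endpoints
```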
28.
Background. Motif conductance generalizes conductance to higher-order structures like cliques [Benson-Gleich-Leskovec 2016].
It uses higher-order notions of cut (motif instances with nodes on both sides of S) and volume (motif endpoints in S).
Example: M = triangle motif.
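A sketch of motif conductance for M = triangle, assuming sortable node labels and a dict of neighbor sets. A triangle is cut when its nodes lie on both sides of S. Motif volume counts node-triangle incidences on each side. The names are hypothetical.

```python
from itertools import combinations

def triangles(adj):
    """Every triangle once, as a sorted tuple (u < v < w)."""
    found = []
    for u in adj:
        higher = sorted(x for x in adj[u] if x > u)
        for v, w in combinations(higher, 2):
            if w in adj[v]:
                found.append((u, v, w))
    return found

def triangle_conductance(adj, S):
    """Motif conductance with M = triangle:
    cut_M(S) = triangles with nodes on both sides of S,
    vol_M(S) = node-triangle incidences inside S (triangle 'endpoints')."""
    S = set(S)
    tris = triangles(adj)
    inside = [sum(x in S for x in t) for t in tris]
    cut = sum(1 for k in inside if 0 < k < 3)
    vol_s = sum(inside)
    denom = min(vol_s, 3 * len(tris) - vol_s)
    return cut / denom if denom else None

# Triangles {0,1,2}, {3,4,5}, and {2,3,4}; only the last one crosses S = {0, 1, 2}.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (2, 4)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)
print(triangle_conductance(adj, {0, 1, 2}))  # 1 cut triangle / 4 endpoints
```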
29.
It is easy to see that if Cr = 1, then the network is a union of disjoint cliques… and any of these cliques has optimal motif conductance = 0.
Theorem [Yin-Benson-Leskovec, in preparation]
There is some node u whose 1-hop neighborhood N1(u) satisfies
where M is the r-clique motif.
This generalizes and improves a similar r = 2 (edge) result [Gleich-Seshadhri 2012].
Higher-order clustering → higher-order communities.
30.
Figure (neural connections, Facebook friendships, co-authorships): neighborhoods, the neighborhood with the smallest conductance, and the Fiedler cut with the motif normalized Laplacian [Benson-Gleich-Leskovec 2016].
Large C3 and several neighborhoods with small triangle conductance.
Higher-order clustering → higher-order communities.
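A sketch of a scan that finds the "neighborhood with the smallest conductance" under the triangle motif. Assumptions: N1(u) includes u itself (following vertex neighborhoods in the Gleich-Seshadhri style), labels are sortable, and ties go to the smallest node label. This is an illustration, not the authors' code, and all names are hypothetical.

```python
from itertools import combinations

def triangles(adj):
    """Every triangle once, as a sorted tuple (u < v < w)."""
    found = []
    for u in adj:
        higher = sorted(x for x in adj[u] if x > u)
        for v, w in combinations(higher, 2):
            if w in adj[v]:
                found.append((u, v, w))
    return found

def triangle_conductance(tris, S):
    """Triangle-motif conductance of S, given a precomputed triangle list."""
    inside = [sum(x in S for x in t) for t in tris]
    cut = sum(1 for k in inside if 0 < k < 3)
    vol_s = sum(inside)
    denom = min(vol_s, 3 * len(tris) - vol_s)
    return cut / denom if denom else None

def best_neighborhood(adj):
    """Scan every 1-hop neighborhood N1(u) = {u} plus u's neighbors and return
    (u, phi) for the one with the smallest triangle conductance."""
    tris = triangles(adj)
    best_u, best_phi = None, float("inf")
    for u in sorted(adj):
        phi = triangle_conductance(tris, adj[u] | {u})
        if phi is not None and phi < best_phi:
            best_u, best_phi = u, phi
    return best_u, best_phi

edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (2, 4)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)
print(best_neighborhood(adj))  # (0, 0.25): N1(0) = {0, 1, 2}
```

Triangles are enumerated once and reused for every candidate set. This keeps the scan at one pass over the triangle list per node.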
31.
Higher-order clustering → higher-order communities.
Theory. (pessimistic in practice)
Practice. If the higher-order clustering coefficient is non-trivial, then there should be good local clusters.
32.
Local higher-order graph clustering
Yin, Benson, Leskovec, & Gleich, KDD, 2017.
• Studies the general problem of finding local clusters based on motifs (cliques).
• Our method is a generalization of the Andersen-Chung-Lang personalized PageRank algorithm that expands clusters around a seed node.
• Theoretical guarantees on cluster quality and performance (in practice, < 2 sec / seed on a 2B-edge graph).
Figure: a seed node and the local cluster grown around it.
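For orientation, here is a sketch of the classical edge-based Andersen-Chung-Lang procedure that the method generalizes. It uses push-based approximate personalized PageRank on a lazy random walk, followed by a sweep cut on p[u] / d_u. The motif-based generalization from the KDD paper is not shown. The parameters alpha = 0.15 and eps = 1e-4 and all function names are illustrative assumptions.

```python
from collections import deque
from itertools import combinations

def acl_push(adj, seed, alpha=0.15, eps=1e-4):
    """Approximate personalized PageRank for `seed` via the push procedure (lazy walk).
    Assumes the seed has degree >= 1."""
    p, r = {}, {seed: 1.0}
    queue = deque([seed])
    while queue:
        u = queue.popleft()
        du, ru = len(adj[u]), r.get(u, 0.0)
        if ru < eps * du:
            continue  # residual already below the push threshold
        p[u] = p.get(u, 0.0) + alpha * ru    # settle an alpha fraction at u
        r[u] = (1 - alpha) * ru / 2          # lazy step: half of the rest stays at u
        share = (1 - alpha) * ru / (2 * du)  # the other half is split among neighbors
        for v in adj[u]:
            old = r.get(v, 0.0)
            r[v] = old + share
            if old < eps * len(adj[v]) <= r[v]:
                queue.append(v)  # v just crossed the threshold
        if r[u] >= eps * du:
            queue.append(u)
    return p

def sweep_cut(adj, p):
    """Sort nodes by p[u] / d_u and return the prefix set with the smallest conductance."""
    order = sorted(p, key=lambda u: p[u] / len(adj[u]), reverse=True)
    total_vol = sum(len(nbrs) for nbrs in adj.values())
    in_set, vol, cut = set(), 0, 0
    best_phi, best_k = float("inf"), 0
    for k, u in enumerate(order, start=1):
        in_set.add(u)
        vol += len(adj[u])
        inside = sum(1 for v in adj[u] if v in in_set)
        cut += len(adj[u]) - 2 * inside  # u's edges into the set stop being cut
        denom = min(vol, total_vol - vol)
        if denom > 0 and cut / denom < best_phi:
            best_phi, best_k = cut / denom, k
    return set(order[:best_k]), best_phi

# Two 4-cliques {0, 1, 2, 3} and {4, 5, 6, 7} joined by the single edge 3-4.
adj = {u: set() for u in range(8)}
for block in ([0, 1, 2, 3], [4, 5, 6, 7]):
    for a, b in combinations(block, 2):
        adj[a].add(b)
        adj[b].add(a)
adj[3].add(4)
adj[4].add(3)

cluster, phi = sweep_cut(adj, acl_push(adj, seed=0))
print(sorted(cluster), phi)  # the seed's 4-clique [0, 1, 2, 3], conductance 1/13
```

The push loop only touches nodes near the seed, which is what makes this family of methods local. Its running time depends on 1/eps rather than on the size of the graph.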
33.
Local higher-order graph clustering
Yin, Benson, Leskovec, & Gleich, KDD, 2017.
• Clusters based on triangles yield better recovery results on common synthetic graph models.
Average F1 0.40 0.50
• Clusters based on triangles can better recover a person’s departmental affiliation in an academic email network.
34.
Related work
Gleich and Seshadhri, “Vertex neighborhoods, low conductance cuts, and good seeds for local community methods,” KDD, 2012.
Motivation for relating higher-order clustering coefficients to 1-hop neighborhood communities. We are intellectually indebted to their proof techniques!
Benson, Gleich, and Leskovec, “Higher-order organization of complex networks,” Science, 2016.
Introduced higher-order conductance and a spectral method for optimizing it.
Fronczak et al., “Higher order clustering coefficients in Barabási–Albert networks,” Physica A, 2002.
Higher-order clustering by looking at shortest path lengths.
Jiang and Claramunt, “Topological analysis of urban street networks,” Environ. and Planning B, 2004.
Higher-order clustering by looking for triangles in k-hop neighborhoods.
Lambiotte et al., “Structural Transitions in Densifying Networks,” PRL, 2016.
Bhat et al., “Densification and structural transitions in networks that grow by node copying,” PRE, 2016.
Generative models with similar clique closure ideas.
35.
Papers
• “Higher-order clustering in networks.” Yin, Benson, and Leskovec. arXiv, 2017.
• “Local higher-order graph clustering.” Yin, Benson, Leskovec, and Gleich. KDD, 2017.
• “Higher-order organization of complex networks.” Benson, Gleich, and Leskovec. Science, 2016.
1. A generalization of the fundamental measurement of network science through a “clique expansion” interpretation.
2. Amenable to analysis, both in general graphs and in common random graph models (small-world and Gn,p).
3. Old idea: all real-world graphs cluster. New idea: they cluster only up to a certain order.
4. In data, helps distinguish between dense and random regions (neural connections) and dense and structured regions (FB friendships, co-authorships).
5. Higher-order clustering implies local (1-hop neighborhood) higher-order communities.
Open questions / future work
• Is there a generative model that reproduces the observed higher-order clustering coefficients (e.g., forest fire)?
• Tighter analysis for 1-hop neighborhood conductance?
• Higher-order clustering coefficients for other motifs (i.e., not just cliques)?
http://cs.cornell.edu/~arb
@austinbenson
arb@cs.cornell.edu
Thanks!
Austin Benson
36.
Panels: neural connections, Facebook friendships, co-authorships.
Decrease in average clustering with order is independent of degree.
For large degrees,
Changes in higher-order clustering coefficients tend to be
independent of degree.