On the Separability of Structural Classes of Communities

Bruno Abrahao
Sucheta Soundarajan
John Hopcroft
Robert Kleinberg
Cornell University

The idea of community structure: groups of nodes with distinctive relationships

[Newman-Girvan, 2004]

Which community is real?

[Figure panels: Metis, “Real” Community, Random Walk, Infomap, Newman-Modularity, Louvain]

How do their structures differ?

The definition of community structure

• Community structure is not well defined
– different people have different notions of community structure

• Traditional strategy
1. start with an expectation of what a community should look like
• e.g., a set of nodes that interact more within the set than with the outside
2. define an optimization problem
3. design a heuristic
4. the solution gives the desired communities
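As an illustration of step 2, the sketch below evaluates Newman-Girvan modularity, the objective that greedy heuristics such as Louvain and Newman-Clauset-Moore try to maximize, for one given partition. It assumes an undirected, unweighted edge list; the function name `modularity` is hypothetical, and a heuristic (step 3) would search over partitions rather than score a single one.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community id.
    """
    m = len(edges)
    degree = defaultdict(int)      # degree of each node
    internal = defaultdict(int)    # number of edges inside each community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    degree_sum = defaultdict(int)  # total degree of the nodes in each community
    for node, k in degree.items():
        degree_sum[community_of[node]] += k
    # Q = sum over communities c of [ l_c / m - (d_c / 2m)^2 ]
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, split))  # 5/14, about 0.357
```

Putting every node in one community gives Q = 0, so the split into the two triangles scores strictly better under this objective.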

Two research questions that we address here

• A multitude of different algorithms
– different objective functions
– different heuristics
How dissimilar are their outputs?

• Communities may differ from the proposed mathematical constructs
– e.g., a preponderance of links to the outside, as in the community pictured on this slide, contrary to widely accepted notions
Which algorithms extract communities that most closely resemble the structure of real communities?

Obstacles to answering the preceding questions

• We don't know what properties communities possess

• We can't characterize communities in the absence of negative examples
– look at real communities and determine their structure
– do other sets that are not communities have these properties?
– every other connected set could be a negative example, which is intractable
– sets that are not annotated could also be communities

• We don't know what metrics we should use
– modularity, conductance, clustering coefficient...

Our plan to address the preceding questions

• Propose a methodology to analyze community structure by
using different notions of communities as references

– key idea: analyze community structure without requiring negative examples of
communities

• Scalable and comprehensive, simultaneously considering
– multiple notions of communities
– diverse domains of application
– a broad spectrum of community metrics

• Assess the structural dissimilarities between
– the output of different community detection algorithms
– the output of algorithms and real communities

Building structural classes by extracting examples

[Figure: apply an algorithm to a network, then extract community examples from its output]

[Figure: Algorithm 1, Algorithm 2, Algorithm 3, Algorithm 4, ..., Algorithm k produce Class 1, Class 2, Class 3, Class 4, ..., Class k, respectively]

Building a feature space by characterizing examples

[Figure: each labeled example is mapped to a feature vector; the feature vectors of all examples form the feature space]
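The slides characterize each labeled example by 36 structural properties that are not listed here. A minimal sketch of the idea, assuming a hypothetical `feature_vector` function and just four common properties (size, internal edge density, conductance, and average local clustering coefficient), shows how one community becomes one point in the feature space:

```python
def feature_vector(adj, community):
    """Map one community (a set of nodes) to a small feature vector:
    [size, internal edge density, conductance, average clustering coefficient].

    adj: dict mapping each node to the set of its neighbors (undirected graph);
    node ids are assumed to be mutually comparable (e.g., integers).
    """
    nodes = set(community)
    n = len(nodes)
    internal_ends = sum(len(adj[u] & nodes) for u in nodes)  # each internal edge counted twice
    boundary = sum(len(adj[u] - nodes) for u in nodes)       # edges leaving the set
    density = (internal_ends / 2) / (n * (n - 1) / 2) if n > 1 else 0.0
    # conductance = cut / min(vol(S), vol(rest of graph))
    volume = internal_ends + boundary
    rest_volume = sum(len(nbrs) for nbrs in adj.values()) - volume
    denom = min(volume, rest_volume)
    conductance = boundary / denom if denom else 0.0
    # local clustering coefficient of each member, computed in the whole graph
    coefficients = []
    for u in nodes:
        k = len(adj[u])
        links = sum(1 for v in adj[u] for w in adj[u] if v < w and w in adj[v])
        coefficients.append(links / (k * (k - 1) / 2) if k > 1 else 0.0)
    return [n, density, conductance, sum(coefficients) / n]


# Toy graph: two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(feature_vector(adj, {0, 1, 2}))  # [3, 1.0, 0.1428..., 0.7777...], i.e. 1/7 and 7/9
```

Stacking such vectors for every example of every structural class yields the labeled feature space on which separability is measured.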

Measure inter-class separability from feature space

[Figure: a class separability measure is applied to the feature space]

Are the classes separable? Separability = distinct structures

We test our methods on large-scale network datasets

• Social (Facebook + Rice University)

• Commercial

• Biological

Facebook + Rice University data used with permission of Mislove et al. Other datasets are publicly available.

We furnish our framework with 10 community detection algorithms

• BFS (random connected subgraphs)
• Random-Walk-based (with and without restart)
• (α,β)-communities
• InfoMap
• Markov Clustering
• Metis
• Louvain
• Newman-Clauset-Moore
• Link Communities
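The two unstructured baselines at the top of the list can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names, the stopping rule (stop once `size` nodes are collected), and the `restart` parameter are hypothetical; later figures label two random-walk classes RW0 and RW15, but their exact settings are not given on these slides.

```python
import random


def bfs_community(adj, seed, size, rng):
    """Random connected subgraph: breadth-first search from `seed`, visiting
    neighbors in random order, until `size` nodes have been collected."""
    seen = {seed}
    frontier = [seed]
    while frontier and len(seen) < size:
        next_frontier = []
        for u in frontier:
            neighbors = list(adj[u])
            rng.shuffle(neighbors)
            for v in neighbors:
                if v not in seen and len(seen) < size:
                    seen.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return seen


def random_walk_community(adj, seed, size, rng, restart=0.0, max_steps=10_000):
    """Nodes visited by a random walk from `seed`; at each step the walk
    jumps back to `seed` with probability `restart`."""
    current, visited = seed, {seed}
    for _ in range(max_steps):
        if len(visited) >= size:
            break
        if rng.random() < restart:
            current = seed
        else:
            current = rng.choice(list(adj[current]))  # assumes no isolated nodes
            visited.add(current)
    return visited


# Toy graph: two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
rng = random.Random(1)
print(sorted(bfs_community(adj, 0, 4, rng)))  # [0, 1, 2, 3]
print(sorted(random_walk_community(adj, 0, 5, rng, restart=0.15)))
```

Both procedures return connected node sets without optimizing any objective, which is what makes them useful reference classes.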

Annotated communities serve as exemplar communities
– annotations come from metadata included in the datasets, e.g., the Rice University data

To what extent are the classes separable?

First separability measure: scatter matrices

• Traditional methods for measuring class separability give a single score, e.g., scatter matrices

[Figure: scatter-matrix separability for each network]
Reference: the same data with shuffled labels

• This is a global measure. We need more fine-grained separability information for each class!
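A minimal sketch of the scatter-matrix idea, assuming the trace-ratio criterion tr(S_b)/tr(S_w) (one of several standard scatter-matrix scores; the exact criterion used in the paper may differ) and a hypothetical `shuffled_reference` helper that averages the score over random label permutations:

```python
import random


def scatter_traces(X, y):
    """Traces of the within-class (S_w) and between-class (S_b) scatter matrices."""
    d = len(X[0])
    overall = [sum(x[j] for x in X) / len(X) for j in range(d)]
    classes = {}
    for x, label in zip(X, y):
        classes.setdefault(label, []).append(x)
    tr_w = tr_b = 0.0
    for members in classes.values():
        mu = [sum(x[j] for x in members) / len(members) for j in range(d)]
        # tr(S_w): squared distances of the members to their class mean
        tr_w += sum((x[j] - mu[j]) ** 2 for x in members for j in range(d))
        # tr(S_b): class size times squared distance of the class mean to the overall mean
        tr_b += len(members) * sum((mu[j] - overall[j]) ** 2 for j in range(d))
    return tr_w, tr_b


def separability(X, y):
    """Trace-ratio criterion tr(S_b) / tr(S_w); larger means more separable."""
    tr_w, tr_b = scatter_traces(X, y)
    return tr_b / tr_w


def shuffled_reference(X, y, trials=20, seed=0):
    """Average score of the same data after randomly permuting the labels."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        permuted = list(y)
        rng.shuffle(permuted)
        scores.append(separability(X, permuted))
    return sum(scores) / trials


# Toy feature vectors for two classes.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 5.0], [5.1, 5.2]]
y = ["A", "A", "A", "B", "B", "B"]
print(separability(X, y), shuffled_reference(X, y))
```

A score well above the shuffled-label reference says the classes occupy distinct regions of the feature space, but, being one number, it does not say which classes are separable.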

Idea: use the performance of probabilistic multi-class classifiers

[Figure: train a probabilistic k-way classifier (SVM, k-NN) on examples of Algorithm 1, Algorithm 2, ..., and annotated communities]

[Figure: classify held-out examples (cross-validation); the classifier outputs a probability for each class, e.g., Pr(Algorithm 1) = 0.05, ..., Pr(Annotated) = 0.48]
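The slides use probabilistic SVM and k-NN classifiers. The sketch below shows only the k-NN variant, assuming Euclidean distance, class probabilities equal to neighbor vote fractions, and leave-one-out cross-validation; the function names and the toy data are hypothetical. A library classifier such as scikit-learn's `SVC(probability=True)` could play the same role.

```python
import math
from collections import Counter


def knn_proba(train_X, train_y, x, k, classes):
    """Probability of each class for x: the fraction of its k nearest
    training examples (Euclidean distance) that carry that label."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return {c: votes[c] / k for c in classes}


def leave_one_out_proba(X, y, k=3):
    """Leave-one-out cross-validation: each example is classified by a
    model built from all the other examples."""
    classes = sorted(set(y))
    rows = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        rows.append(knn_proba(train_X, train_y, X[i], k, classes))
    return rows


# Toy data: two well-separated classes of four examples each.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10], [10, 11], [11, 10], [11, 11]]
y = ["alg1"] * 4 + ["alg2"] * 4
for label, row in zip(y, leave_one_out_proba(X, y)):
    print(label, row)
```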

Cross-validation performance indicates class separability

[Figure: probabilistic-SVM cross-validation outcome with 11 structural classes: BFS, RW0, RW15, AB, IM, LC, Louv., Newm., MCL, Metis, Ann.; axis: 0.0 to 1.0. Data: DBLP network.]
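One way to summarize a cross-validation run per class (our reading of the per-class outcome plot, not a detail stated on the slide) is to average, for each true class, the predicted probability distributions of its examples. The `per_class_outcome` helper and the toy numbers below are hypothetical; the input rows have the shape of the class-probability dictionaries a probabilistic classifier returns.

```python
from collections import defaultdict


def per_class_outcome(true_labels, proba_rows):
    """Average predicted distribution for each true class.

    Entry [c][c'] is the mean probability that examples of class c received
    for class c'; a large [c][c] means class c is well separated."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for label, row in zip(true_labels, proba_rows):
        counts[label] += 1
        for c, p in row.items():
            sums[label][c] += p
    return {label: {c: s / counts[label] for c, s in sums[label].items()}
            for label in counts}


true_labels = ["alg1", "alg1", "alg2"]
rows = [{"alg1": 1.0, "alg2": 0.0}, {"alg1": 0.5, "alg2": 0.5}, {"alg1": 0.25, "alg2": 0.75}]
print(per_class_outcome(true_labels, rows))
# {'alg1': {'alg1': 0.75, 'alg2': 0.25}, 'alg2': {'alg1': 0.25, 'alg2': 0.75}}
```

Mass that leaks from one true class onto another indicates that the classifier confuses the two classes.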

Matching annotated communities

Which algorithms extract communities that most
closely resemble the structure of annotated
communities?

Repeat the preceding experiment, leaving out the class of annotated communities

[Figure: learn a probabilistic k-way classifier from examples of Algorithm 1, Algorithm 2, ..., Algorithm N]

Introduce the class of annotated communities in the test set

[Figure: classify annotated communities with the probabilistic k-way classifier, which outputs a probability for each algorithm class, e.g., ..., Pr(Algorithm k) = 0.12]
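The leave-out experiment can be sketched in the same k-NN style, assuming a hypothetical `classify_annotated` helper and toy data: train only on algorithm classes, classify every annotated community, and average the resulting probabilities. The algorithm class that receives the most mass is the one whose output most resembles the annotated communities.

```python
import math
from collections import Counter


def classify_annotated(alg_X, alg_y, annotated_X, k=3):
    """Average class-probability vector of the annotated examples under a
    k-NN model trained only on algorithm-generated examples."""
    classes = sorted(set(alg_y))
    totals = {c: 0.0 for c in classes}
    for x in annotated_X:
        # k nearest algorithm-generated examples (Euclidean distance)
        nearest = sorted(range(len(alg_X)), key=lambda i: math.dist(alg_X[i], x))[:k]
        votes = Counter(alg_y[i] for i in nearest)
        for c in classes:
            totals[c] += votes[c] / k
    return {c: t / len(annotated_X) for c, t in totals.items()}


alg_X = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
alg_y = ["alg1", "alg1", "alg1", "alg2", "alg2", "alg2"]
annotated_X = [[0.5, 0.5], [1, 1]]
print(classify_annotated(alg_X, alg_y, annotated_X))  # {'alg1': 1.0, 'alg2': 0.0}
```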

Classification reveals that annotated communities resemble the output of unstructured methods

[Figure: probabilistic-SVM classification of annotated communities into the structural classes BFS, RW0, RW15, AB, IM, LC, Louv., Newm., MCL, and Metis for 9 different networks (grad, Ugrad, SC, HS, Fly, Amazon, DBLP, LJ1, LJ2); axis: 0.0 to 1.0]

Improving the quality of the space

What classes should we consider?

The classifier confuses the two types of RW communities

[Figure: cross-validation outcome for the structural classes BFS, RW0, RW15, AB, IM, LC, Louv., Newm., MCL, Metis, Ann.; axis: 0.0 to 1.0. Data: DBLP network.]

Fisher’s discriminant ratio

A Separability Framework for Analyzing Community Structure. Bruno Abrahao, Sucheta Soundarajan, John Hopcroft, Robert Kleinberg. To appear in ACM Transactions on Knowledge Discovery from Data (TKDD), 2013.
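The slide names Fisher's discriminant ratio without giving a formula. For two classes and a single feature, the standard form is (mu_1 - mu_2)^2 / (sigma_1^2 + sigma_2^2). A minimal sketch, assuming this two-class, per-feature form (the paper may use a multi-class variant) and hypothetical function names; ratios near zero on every feature would explain why a classifier cannot tell two classes apart.

```python
import math
from statistics import fmean, pvariance


def fisher_ratio(a, b):
    """Two-class Fisher's discriminant ratio for one feature:
    (mean_a - mean_b)^2 / (var_a + var_b)."""
    gap = (fmean(a) - fmean(b)) ** 2
    spread = pvariance(a) + pvariance(b)
    if spread == 0:
        # both classes are constant on this feature
        return 0.0 if gap == 0 else math.inf
    return gap / spread


def fisher_ratios(X_a, X_b):
    """Per-feature ratios for two classes given as lists of feature vectors."""
    return [fisher_ratio([x[j] for x in X_a], [x[j] for x in X_b])
            for j in range(len(X_a[0]))]


X_a = [[1.0, 0.0], [3.0, 2.0]]
X_b = [[1.0, 4.0], [3.0, 6.0]]
print(fisher_ratios(X_a, X_b))  # [0.0, 8.0]
```

Here the first feature cannot distinguish the two classes at all, while the second separates them well.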


[Figure: cross-validation outcome for the structural classes BFS, RW0, RW15, AB, IM, LC, Louv., Newm., MCL, Metis, Ann.; axis: 0.0 to 1.0. Data: DBLP network.]



[Figure: probabilistic-SVM classification of annotated communities into structural classes for 9 different networks (grad, Ugrad, SC, HS, Fly, Amazon, DBLP, LJ1, LJ2); axis: 0.0 to 1.0]


Can we reveal latent similarities among
community detection algorithms?

Our framework enables one to cluster algorithms that behave similarly

Step 1: identifying the most important features

7 features out of 36 retain the discriminative power of the full set
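The slide does not say how the 7 features were chosen. Purely as an illustration, assume features are ranked by a multi-class Fisher score (between-class over within-class variance) and the top m are kept; `fisher_score` and `top_features` are hypothetical names, and a wrapper method that re-runs the classifier on candidate subsets would be an equally plausible choice.

```python
from statistics import fmean, pvariance


def fisher_score(values, labels):
    """Multi-class Fisher score of one feature: between-class variance
    divided by within-class variance (both weighted by class size)."""
    overall = fmean(values)
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    between = sum(len(g) * (fmean(g) - overall) ** 2 for g in groups.values())
    within = sum(len(g) * pvariance(g) for g in groups.values())
    return between / within if within else float("inf")


def top_features(X, labels, m):
    """Indices of the m features with the highest Fisher score."""
    scores = [fisher_score([x[j] for x in X], labels) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:m]


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 1.0], [0.2, 5.0], [5.0, 1.2], [5.2, 4.8]]
labels = ["A", "A", "B", "B"]
print(top_features(X, labels, 1))  # [0]
```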

Grouping algorithms by their tendencies with respect to the most discriminative features

[Figure: algorithms grouped into High, Medium, and Low tendencies for each feature]
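The slide shows High, Medium, and Low tendencies but not the rule that assigns them. A minimal sketch, assuming a hypothetical tertile rule (rank the algorithms by their mean value on each selected feature and cut the ranking into thirds) and grouping algorithms whose profiles coincide; algorithm names and values below are toy placeholders, not results.

```python
def tendency_profiles(mean_features):
    """Group algorithms whose High/Medium/Low profiles coincide.

    mean_features: dict mapping an algorithm to its mean feature vector.
    For each feature, algorithms are ranked by their mean value and the
    ranking is cut into thirds: Low, Medium, High.
    """
    algorithms = sorted(mean_features)
    n = len(algorithms)
    d = len(mean_features[algorithms[0]])
    profile = {a: [] for a in algorithms}
    for j in range(d):
        ranked = sorted(algorithms, key=lambda a: mean_features[a][j])
        for rank, a in enumerate(ranked):
            if rank < n / 3:
                level = "Low"
            elif rank < 2 * n / 3:
                level = "Medium"
            else:
                level = "High"
            profile[a].append(level)
    groups = {}
    for a in algorithms:
        groups.setdefault(tuple(profile[a]), []).append(a)
    return groups


means = {"A": [1.0, 10.0], "B": [1.1, 10.5], "C": [5.0, 5.0],
         "D": [5.1, 5.5], "E": [9.0, 1.0], "F": [9.2, 1.5]}
print(tendency_profiles(means))
```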

Conclusion of methodology

• We present a methodology to address the complexity of analyzing community structure, which simultaneously considers
– a large number of algorithms
– multiple domains of application
– a broad spectrum of metrics to characterize community structure

• A scalable framework that enables
– researchers to compare and understand the biases of new and existing community detection algorithms
– practitioners to decide on the most suitable algorithm for a particular purpose and network

Conclusion of experimental analysis

• Our experimental analysis, which includes 10 community detection algorithms and 9 different networks analyzed with 36 properties, reveals that
– there is high variability among the outputs of community detection methods
– annotated communities have a structure different from what we expect
• their structure is closer to the output of baseline procedures than to that of popular algorithms
– a small set of features explains the biases produced by different algorithms
– we can organize the tapestry of available community detection algorithms by grouping them with respect to similarities in behavior

Final remarks on future directions

• Traditional methods are unsupervised
– they find a particular type of community
– little sensitivity to different purposes, structures of interest and domains of
application

• Our approach suggests a supervised approach to community detection
– the user specifies what they intend to find through examples (real or synthetic)
– the algorithm learns from those examples and retrieves similar structures in the network
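A supervised variant could, for instance, score candidate node sets by how close their feature vectors lie to the user's examples. The sketch below assumes nearest-centroid ranking, precomputed feature vectors, and a hypothetical `rank_candidates` helper; generating the candidates (e.g., with BFS or random-walk sampling) and learning a better similarity are left open.

```python
import math


def rank_candidates(example_vectors, candidates):
    """Order candidate communities by Euclidean distance between their feature
    vector and the centroid of the user-provided examples (closest first).

    example_vectors: list of feature vectors of the user's example communities.
    candidates: dict mapping a candidate's name to its feature vector.
    """
    d = len(example_vectors[0])
    centroid = [sum(v[j] for v in example_vectors) / len(example_vectors) for j in range(d)]
    return sorted(candidates, key=lambda name: math.dist(candidates[name], centroid))


examples = [[1.0, 1.0], [3.0, 3.0]]
candidates = {"far": [10.0, 10.0], "near": [2.0, 2.5], "mid": [4.0, 4.0]}
print(rank_candidates(examples, candidates))  # ['near', 'mid', 'far']
```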

Thank you!
