Community Detection
Social networks can be represented as graphs.
Nodes correspond to individuals.
Edges represent interactions among them.
A community can be defined as a group of entities that share similar properties.
Walktrap
Compute the distances between all vertices and communities.
Choose two communities based on their similarity.
Merge these two communities into a new community.
Update the distances between communities.
Modularity is based on the idea that a random graph is not expected to have a community structure.
$$
Q = \frac{1}{2m} \sum_{ij} \left( A_{ij} - \frac{d_i d_j}{2m} \right) \delta(C_i, C_j)
$$
$A$: adjacency matrix
$m$: the total number of edges in the network
$d_i$: degree of node $i$
$\delta(C_i, C_j) = 1$ if $C_i = C_j$, and $0$ if $C_i \neq C_j$
The choice of null model is in principle arbitrary, and several possibilities exist.
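As a minimal sketch, the modularity definition can be evaluated directly from an adjacency matrix. The function name `modularity`, the list-of-lists adjacency format, and the six-node example graph (two triangles joined by one edge) are illustrative assumptions and do not come from the slides.

```python
def modularity(adj, labels):
    """Q = (1/2m) * sum_ij (A_ij - d_i*d_j/2m) * delta(C_i, C_j), undirected graph."""
    n = len(adj)
    degrees = [sum(row) for row in adj]   # d_i
    two_m = sum(degrees)                  # 2m: every edge is counted twice
    total = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:    # delta(C_i, C_j) = 1
                total += adj[i][j] - degrees[i] * degrees[j] / two_m
    return total / two_m


# Hypothetical example: triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(adj, [0] * 6)             # all nodes in one community
```

With one community per triangle the sketch gives Q = 5/14 (about 0.357); putting every node in a single community gives Q = 0, which matches the idea that a partition is only rewarded for having more internal edges than the null model expects.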
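$$
\begin{aligned}
Q_{\max} &= \max_{p} \sum_{c=1}^{n_c} \left[ \frac{I(c)}{m} - \left( \frac{d_c}{2m} \right)^2 \right] \\
&= \frac{1}{m} \max_{p} \sum_{c=1}^{n_c} \left[ I(c) - \mathrm{Ex}(I(c)) \right] \\
&= -\frac{1}{m} \min_{p} \left( -\sum_{c=1}^{n_c} \left[ I(c) - \mathrm{Ex}(I(c)) \right] \right) \\
&= -\frac{1}{m} \min_{p} \left[ \left( m - \sum_{c=1}^{n_c} I(c) \right) - \left( m - \sum_{c=1}^{n_c} \mathrm{Ex}(I(c)) \right) \right] \\
&= -\frac{1}{m} \min_{p} \left( \mathrm{Cut}_p - \mathrm{ExCut}_p \right)
\end{aligned}
$$
$I(c)$: intra-community edges of community $c$, with expected value $\mathrm{Ex}(I(c)) = d_c^2 / 4m$
$\mathrm{Cut}_p = m - \sum_c I(c)$: inter-community edges of partition $p$, with expected value $\mathrm{ExCut}_p$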
FastQ
Each node is assigned to its own community.
The algorithm repeatedly merges pairs of communities together.
Choose the merger for which the resulting modularity is the largest.
Repeat the procedure until only one community remains.
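The FastQ steps can be sketched as a plain greedy agglomeration. This is an illustrative, unoptimized version and not the authors' implementation: the name `greedy_modularity`, the edge-list input, and the choice to return the best partition seen along the way are assumptions, and the graph is assumed to be simple (no self-loops). The gain of merging communities a and b joined by x edges is x/m - D(a)D(b)/(2m^2).

```python
def greedy_modularity(edges):
    """Greedy agglomeration on a simple undirected graph given as (u, v) pairs."""
    m = len(edges)
    nodes = sorted({v for edge in edges for v in edge})
    members = {v: {v} for v in nodes}   # community id -> member nodes
    degree = {v: 0 for v in nodes}      # community id -> D(c)
    between = {}                        # frozenset({a, b}) -> edges between a and b
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        key = frozenset((u, v))
        between[key] = between.get(key, 0) + 1

    def gain(key):
        a, b = tuple(key)
        return between[key] / m - degree[a] * degree[b] / (2 * m * m)

    q = -sum((d / (2 * m)) ** 2 for d in degree.values())  # every node alone
    best_q, best = q, [set(s) for s in members.values()]
    while between:                      # stops when no connected pair is left
        key = max(between, key=gain)    # merger with the largest resulting Q
        a, b = tuple(key)
        q += gain(key)
        members[a] |= members.pop(b)
        degree[a] += degree.pop(b)
        del between[key]
        for old in [k for k in between if b in k]:
            (c,) = old - {b}            # the other community touching b
            new = frozenset((a, c))
            between[new] = between.get(new, 0) + between.pop(old)
        if q > best_q:
            best_q, best = q, [set(s) for s in members.values()]
    return best_q, best


# Hypothetical example: triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
best_q, best = greedy_modularity(edges)
```

On this toy graph the merges continue until a single community remains, and the highest modularity along the way (5/14) is reached when each triangle forms its own community.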
Proposed Method
Most community detection methods try to optimize a global metric.
Several methods need initial parameters in order to solve the problem.
Most algorithms rely on a centralized decision maker.
A distributed framework is proposed to detect communities in social networks.
Each community acts as a selfish agent that maximizes its utility function.
We use local utility maximization.
Modularity is chosen as the community utility function.
Each community uses only local information to maximize its utility function.
Each community has a set of pre-defined actions.
Each community chooses the action that yields the maximum utility.
Our distributed framework can perform as well as the existing centralized approaches.
Local information is used to identify communities.
Every community only utilizes the knowledge obtained from its neighbors.
Nodes belonging to a community fall into two types:
1-Core Set (C): no node in C is linked to the outside of the community
2-Boundary Set (B): every node in B has at least one connection to the outside of the community
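The core/boundary distinction can be illustrated with a short sketch. The function name `core_and_boundary`, the neighbor-dictionary format, and the example graph are assumptions made for illustration only.

```python
def core_and_boundary(neighbors, community):
    """Split a community into its core set and boundary set."""
    members = set(community)
    # Boundary: nodes with at least one neighbor outside the community.
    boundary = {v for v in members
                if any(u not in members for u in neighbors[v])}
    # Core: nodes whose neighbors all lie inside the community.
    core = members - boundary
    return core, boundary


# Hypothetical example: triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
             3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
core, boundary = core_and_boundary(neighbors, {0, 1, 2})
```

For the community {0, 1, 2}, node 2 is the only boundary node because it is the only member with a neighbor (node 3) outside the community.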
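$$
Q = \sum_{C=1}^{n_C} \left[ \frac{I(C)}{m} - \left( \frac{D(C)}{2m} \right)^2 \right]
$$
$$
D(C)^2 = (d_1 + d_2 + \cdots + d_n)^2 = d_1^2 + d_2^2 + \cdots + d_n^2 + 2 d_1 d_2 + \cdots
$$
$$
2 I(C) = \sum_{i,j \in C} A_{ij}
$$
[Figure: a community $C$ annotated with $I(C)$, $D(C)$, and $m$.]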
[Figure: example network with labeled communities C1, C2, and C.]
There are two possible merges for C1.
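$$
U_1 = \frac{I(C_1)}{m} - \left( \frac{D(C_1)}{2m} \right)^2 = \frac{3}{11} - \left( \frac{10}{22} \right)^2 = 0.066
$$
$$
U_3 = \frac{I(C_3)}{m} - \left( \frac{D(C_3)}{2m} \right)^2 = \frac{1}{11} - \left( \frac{5}{22} \right)^2 = 0.039
$$
$$
U = \frac{I(C_1) + I(C_3) + x}{m} - \left( \frac{D(C_1) + D(C_3)}{2m} \right)^2 = \frac{7}{11} - \left( \frac{15}{22} \right)^2 = 0.171
$$
Here $x$ is the number of edges between C1 and C3 (in this example $x = 3$).
Suppose C1 is a player. Merging C1 and C3 occurs if and only if
$$
U > U_1 + U_3 \iff x > \frac{D(C_1)\, D(C_3)}{2m}, \qquad \text{here } 3 > \frac{50}{22} = 2.27
$$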
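The worked example can be checked numerically with a small sketch. The function names `utility` and `should_merge` are illustrative assumptions; the numbers (m = 11, I(C1) = 3, D(C1) = 10, I(C3) = 1, D(C3) = 5, x = 3) are the ones used in the example.

```python
def utility(internal_edges, degree_sum, m):
    """U = I(C)/m - (D(C)/(2m))^2 for one community."""
    return internal_edges / m - (degree_sum / (2 * m)) ** 2


def should_merge(x, degree_a, degree_b, m):
    """Local test: merging raises the summed utility iff x > D(a)*D(b)/(2m)."""
    return x > degree_a * degree_b / (2 * m)


m = 11
u1 = utility(3, 10, m)                    # community C1
u3 = utility(1, 5, m)                     # community C3
u_merged = utility(3 + 1 + 3, 10 + 5, m)  # C1 and C3 merged, x = 3
merge = should_merge(3, 10, 5, m)
```

The point of the local test is that a community only needs x, the two degree sums, and m to decide; it never has to recompute the modularity of the whole network.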
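Our goal is to find a division of C into C1 and C2 for which modularity is maximized.
$s_i = +1$ if node $i$ is placed in one group and $s_i = -1$ if it is placed in the other.
$$
Q = \frac{1}{4m} \sum_{ij} \left( A_{ij} - \frac{d_i d_j}{2m} \right) s_i s_j = \frac{1}{4m}\, s^T B s
$$
The factor $1/4m$ follows from writing $\delta(C_i, C_j) = (s_i s_j + 1)/2$ in the definition of $Q$ and using the fact that the entries of $B$ sum to zero.
$s$ is the vector whose elements are $s_i$. It can be expanded as
$$
s = \sum_{i=1}^{n} \alpha_i u_i
$$
where $u_i$ is the $i$-th eigenvector of $B$.
$B$ is the modularity matrix, whose elements are
$$
B_{ij} = A_{ij} - \frac{d_i d_j}{2m}
$$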
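A hedged sketch of how a two-way division can be obtained from the modularity matrix B: it takes the signs of the leading eigenvector of B, which is the usual leading-eigenvector heuristic and an assumption here, since the slides only state the eigenvector expansion. The eigenvector is approximated by shifted power iteration in plain Python; the function name, the iteration count, and the example graph are illustrative. Q is computed as s^T B s / (4m), the normalization that agrees with the delta-based definition of modularity.

```python
def spectral_bisection(adj, iterations=500):
    """Split nodes by the sign of the leading eigenvector of B (power iteration)."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    two_m = sum(deg)
    # Modularity matrix B_ij = A_ij - d_i*d_j / 2m.
    B = [[adj[i][j] - deg[i] * deg[j] / two_m for j in range(n)]
         for i in range(n)]
    # Shifting by the largest absolute row sum makes all eigenvalues of
    # B + shift*I non-negative, so power iteration finds B's largest eigenvalue.
    shift = max(sum(abs(x) for x in row) for row in B)
    v = [float(i + 1) for i in range(n)]   # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(B[i][j] * v[j] for j in range(n)) + shift * v[i]
             for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    s = [1 if x >= 0 else -1 for x in v]
    q = sum(B[i][j] * s[i] * s[j]
            for i in range(n) for j in range(n)) / (2 * two_m)
    return s, q


# Hypothetical example: triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1
s, q = spectral_bisection(adj)
```

On this toy graph the sign pattern separates the two triangles and the resulting Q is 5/14. Which triangle receives +1 depends on the start vector, so only the grouping is meaningful.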
Merge C1 and C2 if $x > \frac{D(C_1)\, D(C_2)}{2m}$.
Do not merge C1 and C2 if $x < \frac{D(C_1)\, D(C_2)}{2m}$.
The proposed method may get stuck at a local maximum of modularity.
It is possible that no community can improve its own utility even though modularity is not maximized.
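[Figure: communities C1 and C2 with utilities $U_1$ and $U_2$, their merger C with utility $U$, and an alternative pair with utilities $U_1'$ and $U_2'$.]
$U < U_1 + U_2$: merging C1 and C2 is irrational.
$U_1 + U_2 < U_1' + U_2'$: but splitting C is rational.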
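[Figure: C1 and C2 (utilities $U_1$, $U_2$) joined by $x$ edges form C; C is split into $C_1'$ and $C_2'$ (utilities $U_1'$, $U_2'$) joined by $x'$ edges.]
Irrational merge, split condition:
$$
U_1 + U_2 < U_1' + U_2'
$$
Both pairs cover the same nodes and edges of C, so
$$
D(C_1) + D(C_2) = D(C_1') + D(C_2')
$$
$$
I(C_1) + I(C_2) + x = I(C_1') + I(C_2') + x'
$$
Substituting these into the split condition gives
$$
\frac{x - x'}{m} > \frac{2 D(C_1) D(C_2) - 2 D(C_1') D(C_2')}{4m^2}
$$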
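The split condition can be sketched as a local test. Substituting the two constraints D(C1) + D(C2) = D(C1') + D(C2') and I(C1) + I(C2) + x = I(C1') + I(C2') + x' into U1 + U2 < U1' + U2' gives (x - x')/m > (2 D(C1) D(C2) - 2 D(C1') D(C2')) / (4m^2), which is the form used below. The function names and the numeric example (two triangles joined by one edge, m = 7) are illustrative assumptions.

```python
def pair_utility(i1, i2, d1, d2, m):
    """U_1 + U_2 for two communities with internal edges i1, i2 and degree sums d1, d2."""
    return (i1 + i2) / m - (d1 ** 2 + d2 ** 2) / (4 * m * m)


def resplit_improves(x, d1, d2, x_new, d1_new, d2_new, m):
    """True if re-dividing the same nodes into the primed pair raises U_1 + U_2."""
    return (x - x_new) / m > (2 * d1 * d2 - 2 * d1_new * d2_new) / (4 * m * m)


# Hypothetical example on triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
m = 7
# Current pair: {0,1,2,3} with I=4, D=10 and {4,5} with I=1, D=4; x = 2 edges between.
# Alternative pair: the two triangles, each with I=3, D=7; x' = 1 edge between.
improves = resplit_improves(2, 10, 4, 1, 7, 7, m)
reverse = resplit_improves(1, 7, 7, 2, 10, 4, m)
u_current = pair_utility(4, 1, 10, 4, m)
u_alternative = pair_utility(3, 3, 7, 7, m)
```

As with the merge test, only the edge counts between the parts, the degree sums, and m are needed, so the decision stays local to the community.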
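Merge C1 and C2 if $x > \frac{D(C_1)\, D(C_2)}{2m}$.
Do not merge C1 and C2 if $x < \frac{D(C_1)\, D(C_2)}{2m}$.
Split C into $C_1'$ and $C_2'$ if $\frac{x - x'}{m} > \frac{2 D(C_1) D(C_2) - 2 D(C_1') D(C_2')}{4m^2}$.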
Experimental Results
Dataset     Number of nodes   Number of edges
Karate      34                77
Risk        42                83
Dolphin     62                159
Politics    105               441
AdjNoun     112               425
Football    115               613
Jazz        198               2742
USAir97     332               2126
Email       1133              5452
Power       4941              6594
Internet    22960             48436
Figure: The ground-truth communities and the communities detected by the 1st and 2nd proposed methods on Zachary's karate club network.
Figure: The ground-truth communities and the communities detected by the 1st and 2nd proposed methods on the Dolphin network.
Figure: The ground-truth communities and the communities detected by the 1st and 2nd proposed methods on the NCAA Football network.
Figure: The communities detected by the 1st and 2nd proposed methods on the Risk network.
Figure: The communities detected by the 1st and 2nd proposed methods on the Politics network.
[Figure: Rank of Modularity per Dataset; x-axis: dataset (Karate to Internet), y-axis: rank.]
Dataset     FastQ     Walktrap  Laplacian  SLAP      1st proposed method  2nd proposed method
Karate      0.252     0.36      0.255      0.399     0.4197               0.4197
Risk        0.624     0.624     0.624      0.626     0.631                0.637
Dolphin     0.341     0.517     0.365      0.511     0.509                0.529
Politics    0.447     0.524     0.527      0.494     0.52                 0.527
AdjNoun     0.1845    0.229     0.259      0.286     0.272                0.306
Football    0.577     0.604     0.604      0.6045    0.6043               0.6045
Jazz        0.403     0.437     0.441      0.428     0.425                0.444
USAir97     0.29      0.315     0.363      0.351     0.356                0.366
Email       0.506     0.534     0.543      0.47      0.548                0.566
Power       0.447     0.886     0.932      0.64      0.933                0.939
Internet    0.472     0.647     0.646      0.574     0.588                0.6489
Modularity obtained by several popular approaches and our proposed methods on real-world networks
[Figure: Rank of Execution Time per Dataset; x-axis: dataset (Karate to Power), y-axis: rank; series: FastQ, Walktrap, SLAP, 1st proposed method, 2nd proposed method.]
Dataset     FastQ     Walktrap  SLAP      1st proposed method  2nd proposed method
Karate      77        77        45        31                   39
Risk        93        84        38        18                   90
Dolphin     211       117       63        54                   141
Politics    414       197       88        107                  314
AdjNoun     426       194       82        126                  379
Football    350       190       100       152                  380
Jazz        740       314       295       625                  1100
USAir97     3600      497       211       1020                 4200
Email       5452      1833      458       2042                 8201
Power       31458     7153      762       9472                 39763
[Figure: Rank of Modularity per Dataset (mu = 0.3); x-axis: dataset (100 to 1000), y-axis: rank; series: FastQ, Walktrap, Laplacian, SLAP, 1st proposed method, 2nd proposed method.]
Dataset     FastQ     Walktrap  Laplacian  SLAP      1st proposed method  2nd proposed method
100         0.35      0.365     0.365      0.365     0.324                0.365
200         0.500     0.549     0.549      0.549     0.523                0.549
300         0.541     0.593     0.593      0.563     0.549                0.593
400         0.562     0.606     0.606      0.6058    0.58                 0.606
500         0.574     0.613     0.613      0.613     0.602                0.613
600         0.597     0.608     0.608      0.587     0.589                0.608
700         0.591     0.612     0.612      0.612     0.604                0.612
800         0.59      0.613     0.613      0.613     0.596                0.613
900         0.595     0.611     0.611      0.610     0.579                0.613
1000        0.59      0.609     0.609      0.609     0.586                0.609
Modularity obtained by several popular approaches and our proposed methods on synthetic networks (mu = 0.3)
Dataset     FastQ     Walktrap  SLAP      1st proposed method  2nd proposed method
100         332       196       123       116                  240
200         649       364       212       485                  731
300         1162      480       390       780                  1340
400         1284      682       486       992                  2210
500         1555      873       685       1240                 3406
600         2060      1041      1047      1570                 4210
700         2776      1289      1358      1743                 5378
800         2745      1580      1637      2020                 6421
900         3980      1860      2438      2320                 8745
1000        3565      2179      2599      2610                 10255
Execution time of several popular approaches and our proposed methods on synthetic networks (mu = 0.3)
[Figure: Rank of Modularity per Dataset (mu = 0.5); x-axis: dataset (100 to 1000), y-axis: rank; series: FastQ, Walktrap, Laplacian, SLAP, 1st proposed method, 2nd proposed method.]
Dataset     FastQ     Walktrap  Laplacian  SLAP      1st proposed method  2nd proposed method
100         0.233     0.202     0.238      0.229     0.231                0.253
200         0.27      0.356     0.356      0.332     0.288                0.355
300         0.344     0.407     0.402      0.395     0.352                0.407
400         0.363     0.431     0.425      0.406     0.391                0.431
500         0.372     0.433     0.433      0.433     0.406                0.434
600         0.367     0.439     0.426      0.406     0.403                0.44
700         0.377     0.435     0.427      0.425     0.400                0.436
800         0.374     0.428     0.429      0.416     0.396                0.432
900         0.365     0.429     0.43       0.424     0.408                0.43
1000        0.375     0.436     0.431      0.435     0.415                0.436
Modularity obtained by several popular approaches and our proposed methods on synthetic networks (mu = 0.5)
[Figure: Rank of Execution Time per Dataset (mu = 0.5); x-axis: dataset (100 to 1000), y-axis: rank; series: FastQ, Walktrap, SLAP, 1st proposed method, 2nd proposed method.]
Dataset     FastQ     Walktrap  SLAP      1st proposed method  2nd proposed method
100         354       191       102       91                   221
200         723       377       199       463                  621
300         886       472       392       720                  1420
400         1288      677       544       1009                 2451
500         1654      879       734       1120                 3231
600         2180      1049      1061      1680                 4621
700         2864      1295      1738      1920                 5145
800         2848      1583      1896      2007                 6352
900         3305      1867      2366      2247                 8745
1000        3859      2172      2700      2670                 11471
Execution time of several popular approaches and our proposed methods on synthetic networks (mu = 0.5)
[Figure: Rank of Modularity per Dataset (mu = 0.7); x-axis: dataset (100 to 1000), y-axis: rank; series: FastQ, Walktrap, Laplacian, SLAP, 1st proposed method, 2nd proposed method.]
Dataset     FastQ     Walktrap  Laplacian  SLAP      1st proposed method  2nd proposed method
100         0.234     0.196     0.244      0.242     0.23                 0.254
200         0.168     0.144     0.178      0.154     0.159                0.179
300         0.155     0.174     0.189      0.166     0.141                0.19
400         0.169     0.239     0.236      0.231     0.177                0.232
500         0.180     0.247     0.245      0.238     0.204                0.247
600         0.181     0.257     0.255      0.202     0.206                0.257
700         0.184     0.26      0.254      0.236     0.229                0.259
800         0.182     0.259     0.255      0.231     0.233                0.259
900         0.185     0.262     0.258      0.252     0.23                 0.262
1000        0.180     0.26      0.257      0.23      0.231                0.26
Modularity obtained by several popular approaches and our proposed methods on synthetic networks (mu = 0.7)
[Figure: Rank of Execution Time per Dataset (mu = 0.7); x-axis: dataset (100 to 1000), y-axis: rank; series: FastQ, Walktrap, SLAP, 1st proposed method, 2nd proposed method.]
Dataset     FastQ     Walktrap  SLAP      1st proposed method  2nd proposed method
100         330       197       111       105                  320
200         608       382       259       370                  591
300         951       486       383       690                  1345
400         1321      693       687       997                  2684
500         1569      886       1045      1140                 3354
600         2011      1085      1041      1620                 4574
700         2198      1374      1492      1749                 5354
800         2813      1541      1968      1984                 6478
900         2836      1841      2408      2146                 8894
1000        3823      2200      2618      2541                 12577
Execution time of several popular approaches and our proposed methods on synthetic networks (mu = 0.7)