The 17th International Conference on Database Systems for Advanced Applications, Busan, South Korea.
   The 3rd International Workshop on Social Networks and Social Web Mining*




      Collaborative Similarity Measure
        for Intra-Graph Clustering*
                 Waqas Nawaz, Young-Koo Lee, Sungyoung Lee
         Department of Computer Engineering, Kyung Hee University, Korea




                                      Thursday, April 19, 2012


                                             Presenter
                                       Waqas Nawaz

                Data and Knowledge Engineering (DKE) Lab, Kyung Hee University, Korea
Agenda


            Motivation


                   Related Work


                     Proposed Method (CSM-IGC)


                    Experiments


              Conclusion & Future Directions




Data & Knowledge Engineering Lab                 2
Graphs with Multiple Attributes




                                                          Attribute of Authors



        Coauthor Network of Top 200 Authors on TEL from DBLP
                                      from manyeyes.alphaworks.ibm.com

Related Work

 Structure based clustering
    Normalized cuts [Shi and Malik, TPAMI 2000]
    Modularity [Newman and Girvan, Phys. Rev. 2004]
    Scan [Xu et al., KDD'07]
   The clusters generated have a rather random distribution of
   vertex properties within clusters


 OLAP-style graph aggregation
    K-SNAP [Tian et al., SIGMOD’08]
    Attribute-compatible grouping
   The clusters generated have a rather loose intra-cluster
   structure


Example: A Coauthor Network
 [Figure: coauthor graph of 11 authors, each labeled with research topics]
    r1. XML          r2. XML          r3. XML, Skyline    r4. XML
    r5. XML          r6. XML          r7. XML             r8. XML
    r9. Skyline      r10. Skyline     r11. Skyline

 Figure views:
    Attribute-based Cluster
    Structure-based Cluster
    Traditional Coauthor graph
    Structural/Attribute Cluster

 *https://wiki.engr.illinois.edu/download/attachments/186384385/VLDB09_notes.ppt

Related Work (cont…)

 Structure/Attribute based clustering
    SA-Cluster [Yang Zhou et al., VLDB'09]
       • Modifies the structure of the original graph
             – adds a dummy vertex for each attribute instance
             – the resulting matrix is sparse and space-inefficient
       • Neighborhood random walk: matrix multiplication is performed
         iteratively




       • Edge weights are fixed; attribute weights are updated automatically

   Scalability issue for medium & large graphs (time complexity)



Two-Fold Objective

 A desired clustering of attributed graph should achieve
  a good balance between the following:

    Structural cohesiveness: Vertices within one cluster are
     close to each other in terms of structure, while vertices
     between clusters are distant from each other

    Attribute homogeneity: Vertices within one cluster have
     similar attribute values, while vertices between clusters have
     quite different attribute values

 It should also be scalable to medium-scale graphs

Different Graph Clustering Approaches

 Structure-based Clustering
    Vertices with heterogeneous values in a cluster

 Attribute-based Clustering
    Lose much structure information

 Structural/Attribute Cluster
    Homogeneous vertices along with structure information, at the
     expense of time complexity

 Intra-Graph Clustering
    Scalable while considering both aspects
Proposed Solution

 System Architecture Diagram




 INPUT                                 Processing Phase   OUTPUT

Phase 1

         Similarity Estimation (Inspired by Jaccard Index¹)
                 Interaction of vertices (topology or structure)
                        • Weighted fraction of shared neighbors




                        • It will be zero for disconnected vertices
                        • Example: structural similarity between vertex pairs
                                –   SIM(V1, V2) = (1/3)*5 = 1.667
                                –   SIM(V1, V3) = (1/4)*4 = 1.0
                                –   SIM(V2, V3) = (1/4)*3 = 0.75
                                –   V1 & V4 (no direct edge) = (1/4)*0 = 0.0
                        • Transitive property for indirectly connected vertices:
                                – SIM(V1, V4) = SIM(V1,V3) * SIM(V3,V4)
¹ P. Jaccard, Étude comparative de la distribution florale dans une portion des Alpes et des Jura, Société Vaudoise des Sciences Naturelles, Vol. 37 (1901)
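
The structural example can be sketched in code. The formula image did not survive on this slide, so the sketch below assumes one reading of the arithmetic: the Jaccard fraction of the two neighbor sets, multiplied by the weight of the connecting edge, and zero when there is no direct edge. The function name `structural_sim`, the toy graph, and its edge weights are hypothetical; the weights were picked so that the three connected pairs give the same values as the slide.

```python
def structural_sim(adj, weights, a, b):
    """Jaccard fraction of shared neighbors, scaled by the weight of edge (a, b).

    Assumption: this is one reading of the slide's arithmetic, not a confirmed formula.
    """
    w = weights.get(frozenset((a, b)), 0.0)  # 0.0 when a and b are not adjacent
    union = adj[a] | adj[b]
    if w == 0.0 or not union:
        return 0.0
    return len(adj[a] & adj[b]) / len(union) * w


# Hypothetical toy graph (not the graph drawn on the slide).
weights = {
    frozenset(("V1", "V2")): 5.0,
    frozenset(("V1", "V3")): 4.0,
    frozenset(("V2", "V3")): 3.0,
    frozenset(("V3", "V4")): 2.0,
}

# Build neighbor sets from the weighted edge list.
adj = {}
for edge in weights:
    u, v = tuple(edge)
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(round(structural_sim(adj, weights, "V1", "V2"), 3))  # 1.667
print(structural_sim(adj, weights, "V1", "V3"))            # 1.0
print(structural_sim(adj, weights, "V2", "V3"))            # 0.75
print(structural_sim(adj, weights, "V1", "V4"))            # 0.0 (no direct edge)
```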

Transitive Property
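
A minimal sketch of the transitive rule SIM(V1, V4) = SIM(V1,V3) * SIM(V3,V4): multiply the direct similarities along a path between two vertices that are not directly connected. The choice of a fewest-hop path found by breadth-first search is an assumption made here for illustration, and `shortest_path`, `transitive_sim`, and the similarity values are hypothetical.

```python
from collections import deque


def shortest_path(adj, src, dst):
    """Breadth-first search; returns the vertices on a fewest-hop path, or None."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in sorted(adj.get(u, ())):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return None


def transitive_sim(adj, direct_sim, src, dst):
    """Product of direct similarities along the path; 0.0 when no path exists."""
    path = shortest_path(adj, src, dst)
    if path is None:
        return 0.0
    result = 1.0
    for u, v in zip(path, path[1:]):
        result *= direct_sim[frozenset((u, v))]
    return result


# Hypothetical direct similarities on a chain V1 - V3 - V4, plus an isolated V5.
direct_sim = {frozenset(("V1", "V3")): 1.0, frozenset(("V3", "V4")): 0.5}
adj = {"V1": {"V3"}, "V3": {"V1", "V4"}, "V4": {"V3"}, "V5": set()}

print(transitive_sim(adj, direct_sim, "V1", "V4"))  # 0.5
print(transitive_sim(adj, direct_sim, "V1", "V5"))  # 0.0
```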






Phase 1 (cont…)

         Similarity Estimation (Inspired by Jaccard Index¹)
                 Context of vertices (attributes regularity)
                        • Weighted fraction of shared attribute instances




                        • It will be zero for contextually disjoint vertices
                        • Example: contextual similarity between vertex pairs
                                –   Let Wa1 = 1 and Wa2 = 2; then
                                –   SIM(V1, V3) = (2/2) = 1.0
                                –   SIM(V3, V4) = (1/2) = 0.5
                                –   V1 & V4 = 0.0
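
The contextual example can be sketched the same way. The formula image is missing here too, so this assumes one reading that is consistent with the numbers above: sum the weights of the attributes on which two vertices hold the same value, then divide by the number of attributes. The function name `contextual_sim` and the attribute values assigned to V1, V3, and V4 are hypothetical; only Wa1 = 1 and Wa2 = 2 come from the slide.

```python
def contextual_sim(attrs, attr_weights, a, b):
    """Weights of attributes with equal values on a and b, divided by the attribute count.

    Assumption: the denominator (number of attributes) is inferred from the
    slide's arithmetic, not taken from a stated formula.
    """
    shared = sum(
        w
        for name, w in attr_weights.items()
        if attrs[a].get(name) is not None and attrs[a].get(name) == attrs[b].get(name)
    )
    return shared / len(attr_weights)


attr_weights = {"a1": 1.0, "a2": 2.0}  # Wa1 = 1, Wa2 = 2 as on the slide
attrs = {                              # hypothetical attribute values
    "V1": {"a1": "x", "a2": "p"},
    "V3": {"a1": "y", "a2": "p"},
    "V4": {"a1": "y", "a2": "q"},
}

print(contextual_sim(attrs, attr_weights, "V1", "V3"))  # 1.0
print(contextual_sim(attrs, attr_weights, "V3", "V4"))  # 0.5
print(contextual_sim(attrs, attr_weights, "V1", "V4"))  # 0.0
```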



¹ P. Jaccard, Étude comparative de la distribution florale dans une portion des Alpes et des Jura, Société Vaudoise des Sciences Naturelles, Vol. 37 (1901)

Collaborative Similarity Measure

 Structural


 Contextual


 Collaborative Measure




Phase 2

 Clustering (K-Medoid Approach)
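
A minimal sketch of k-medoid clustering driven by a similarity matrix rather than distances, assuming a plain assign-then-update loop (the deck does not spell out its exact variant). The matrix is the collaborative similarity table, Table 2(a), from the Example slide; the function name `k_medoids` and the initial medoids (V1 and V4) are hypothetical choices. Each medoid is pinned to its own cluster because an off-diagonal similarity can exceed the diagonal value of 1.

```python
def k_medoids(sim, k, init, max_iter=100):
    """Assign each vertex to its most similar medoid, then re-pick medoids.

    `init` must hold k distinct vertex indices.
    """
    n = len(sim)
    medoids = list(init)
    labels = []
    for _ in range(max_iter):
        # Assignment step: a medoid always stays in its own cluster.
        labels = []
        for v in range(n):
            if v in medoids:
                labels.append(medoids.index(v))
            else:
                labels.append(max(range(k), key=lambda c: sim[v][medoids[c]]))
        # Update step: the new medoid maximizes total similarity to its cluster.
        new_medoids = []
        for c in range(k):
            members = [v for v in range(n) if labels[v] == c]
            best = max(members, key=lambda m: sum(sim[m][u] for u in members if u != m))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return labels, medoids


# Collaborative similarity values from Table 2(a), rows and columns V1..V6.
sim = [
    [1.00, 2.67, 1.17, 0.20, 0.18, 0.18],
    [2.67, 1.00, 0.92, 0.15, 0.14, 0.14],
    [1.17, 0.92, 1.00, 0.17, 0.15, 0.15],
    [0.20, 0.15, 0.17, 1.00, 0.92, 0.92],
    [0.18, 0.14, 0.15, 0.92, 1.00, 2.50],
    [0.18, 0.14, 0.15, 0.92, 2.50, 1.00],
]

labels, medoids = k_medoids(sim, k=2, init=[0, 3])
print(labels)   # [0, 0, 0, 1, 1, 1]
print(medoids)  # [0, 4]
```

With these starting medoids the sketch groups the vertices as {V1,V2,V3} and {V4,V5,V6}, the same partition listed for K = 2 in Table 2(b).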




Algorithm Details




Single Pass: Similarity Calculation

Iterative: Node Clustering




Example

Fig. 3. Scenarios for similarity between source (green) and destination (red)
nodes following some intermediate nodes (yellow): (a) no direct path exists,
(b) directly connected, (c) indirectly connected, shortest path

Table 2. (a) Collaborative similarity among vertices given in Fig. 3-c using the
Collaborative Similarity Measure CSim(v_a, v_b), (b) clustering results by
varying number of clusters (K); quality of each measure is calculated using
Density and Entropy

(a)
vertex   V1     V2     V3     V4     V5     V6
V1       1      2.67   1.17   0.20   0.18   0.18
V2       2.67   1      0.92   0.15   0.14   0.14
V3       1.17   0.92   1      0.17   0.15   0.15
V4       0.20   0.15   0.17   1      0.92   0.92
V5       0.18   0.14   0.15   0.92   1      2.5
V6       0.18   0.14   0.15   0.92   2.5    1

(b)
K   Clustered Vertices            Density   Entropy
2   {V1,V2,V3},{V4,V5,V6}         0.42      0.133
3   {V1,V3},{V2},{V4,V5,V6}       0.28      0.084
4   {V5},{V6},{V4},{V1,V2,V3}     0.21      0.084


Experiments

 Real Dataset
   Political Blogs Dataset: 1490 vertices, 19090 edges, one
    attribute (political leaning)
       • Liberal
       • Conservative


 Methods
   K-SNAP: Attributes only
   S-Cluster: Structure-based clustering
   W-Cluster: Weighted random walk strategy
   SA-Cluster: Consider both factors (matrix manipulation)
   IGC-CSM: Our proposed method


Evaluation Metrics

  Density*: intra-cluster structural cohesiveness




  Entropy*: intra-cluster attribute homogeneity
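
The formula images for both metrics are missing from this slide. The sketch below assumes the usual definitions from the cited Zhou et al. paper: density as the fraction of all edges that fall inside a cluster, and entropy as the cluster-size-weighted Shannon entropy of an attribute, here restricted to a single attribute. The function names and the six-vertex example are hypothetical.

```python
import math
from collections import Counter


def density(edges, clusters):
    """Fraction of all edges whose two endpoints fall in the same cluster."""
    label = {v: i for i, members in enumerate(clusters) for v in members}
    inside = sum(1 for u, v in edges if label[u] == label[v])
    return inside / len(edges)


def entropy(attribute, clusters):
    """Size-weighted average of the base-2 Shannon entropy of one attribute."""
    n = sum(len(members) for members in clusters)
    total = 0.0
    for members in clusters:
        counts = Counter(attribute[v] for v in members)
        h = -sum((c / len(members)) * math.log2(c / len(members)) for c in counts.values())
        total += len(members) / n * h
    return total


# Hypothetical graph: two triangles joined by one edge, one mislabeled vertex.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
leaning = {0: "Liberal", 1: "Liberal", 2: "Liberal",
           3: "Conservative", 4: "Conservative", 5: "Liberal"}
clusters = [[0, 1, 2], [3, 4, 5]]

print(round(density(edges, clusters), 3))    # 0.857 (6 of 7 edges are intra-cluster)
print(round(entropy(leaning, clusters), 3))  # 0.459
```

Higher density indicates tighter structure within clusters, and lower entropy indicates more homogeneous attribute values within clusters.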




*Yang Zhou et al., Graph Clustering Based on Structural/Attribute Similarities, Proceedings of VLDB Endowment, France (2009)

Evaluation Metrics (cont…)

  F-Measure*: has the ability to evaluate the collective
   qualitative nature of the formed cluster




*Tijn Witsenburg et al., Improving the Accuracy of Similarity Measures by Using Link Information, International Symposium on
                                 Methodologies for Intelligent Systems Edition 9, Poland (2011)

Results (Time Complexity)

 Synthetic Dataset
    Varying number of nodes
    Graph size vs. time

 Real Dataset
    Political Blog*
    Number of clusters vs. time

*http://www-personal.umich.edu/~mejn/netdata



Results (Quality)

 Density Evaluation
    Clusters vs. Density Value




 Entropy Evaluation
    Clusters vs. Entropy Value




Results (Quality)

 F-Measure Estimation
   Clusters vs. F-measure Value




Conclusion

 We study the problem of graph node clustering based
  on homogeneous characteristics in terms of context
  and topology
    A collaborative similarity measure reflects the relational
     model between pairs of vertices
    A k-Medoid clustering framework is adopted for grouping
     similar nodes
 The resulting solution is evaluated using state-of-the-art
  evaluation measures:
    Density, Entropy, and F-measure
 Comparatively scalable to medium-scale graphs
  without compromising the quality of results
Thanks
                    Any Question…?
                              wicky786@khu.ac.kr
                                yklee@khu.ac.kr
                             sylee@oslab.khu.ac.kr




ICDE-2015 Shortest Path Traversal Optimization and Analysis for Large Graph C...
 
Andrewng webinar moocs
Andrewng webinar moocsAndrewng webinar moocs
Andrewng webinar moocs
 
Oritentation session at Kyung Hee University for new students 2014
Oritentation session at Kyung Hee University for new students 2014Oritentation session at Kyung Hee University for new students 2014
Oritentation session at Kyung Hee University for new students 2014
 
Fast directional weighted median filter for removal of random valued impulse ...
Fast directional weighted median filter for removal of random valued impulse ...Fast directional weighted median filter for removal of random valued impulse ...
Fast directional weighted median filter for removal of random valued impulse ...
 
Social Media and We
Social Media and WeSocial Media and We
Social Media and We
 
Social Media vs. Social Relationships
Social Media vs. Social RelationshipsSocial Media vs. Social Relationships
Social Media vs. Social Relationships
 
Fourteen steps to a clearly written technical paper
Fourteen steps to a clearly written technical paperFourteen steps to a clearly written technical paper
Fourteen steps to a clearly written technical paper
 
Big data
Big dataBig data
Big data
 
강의(영어) 한국의Smu(이재창)-2012
강의(영어) 한국의Smu(이재창)-2012강의(영어) 한국의Smu(이재창)-2012
강의(영어) 한국의Smu(이재창)-2012
 

Último

Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the ClassroomPooky Knightsmith
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxDenish Jangid
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxEsquimalt MFRC
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024Elizabeth Walsh
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfPoh-Sun Goh
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfNirmal Dwivedi
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxVishalSingh1417
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.MaryamAhmad92
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsKarakKing
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsMebane Rash
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...Nguyen Thanh Tu Collection
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseAnaAcapella
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentationcamerronhm
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxRamakrishna Reddy Bijjam
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - Englishneillewis46
 

Último (20)

Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Fostering Friendships - Enhancing Social Bonds in the Classroom
Fostering Friendships - Enhancing Social Bonds  in the ClassroomFostering Friendships - Enhancing Social Bonds  in the Classroom
Fostering Friendships - Enhancing Social Bonds in the Classroom
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptxHMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
HMCS Max Bernays Pre-Deployment Brief (May 2024).pptx
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024FSB Advising Checklist - Orientation 2024
FSB Advising Checklist - Orientation 2024
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdfUGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
UGC NET Paper 1 Mathematical Reasoning & Aptitude.pdf
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Salient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functionsSalient Features of India constitution especially power and functions
Salient Features of India constitution especially power and functions
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 

Collaborative Similarity Measure for Intra-Graph Clustering

  • 1. Collaborative Similarity Measure for Intra-Graph Clustering*
      Waqas Nawaz, Young-Koo Lee, Sungyoung Lee, Department of Computer Engineering, Kyung Hee University, Korea
      The 17th International Conference on Database Systems for Advanced Applications, Busan, South Korea
      * The 3rd International Workshop on Social Networks and Social Web Mining
      Thursday, April 19, 2012
      Presenter: Waqas Nawaz, Data and Knowledge Engineering (DKE) Lab, Kyung Hee University, Korea
  • 2. Agenda: Motivation; Related Work; Proposed Method (CSM-IGC); Experiments; Conclusion & Future Directions
  • 3. Graphs with Multiple Attributes
      [Figure: Coauthor Network of Top 200 Authors on TEL from DBLP, with a callout labeled "Attribute of Authors"; from manyeyes.alphaworks.ibm.com]
  • 4. Related Work
      • Structure-based clustering: Normalized cuts [Shi and Malik, TPAMI 2000]; Modularity [Newman and Girvan, Phys. Rev. 2004]; Scan [Xu et al., KDD'07]. The clusters generated have a rather random distribution of vertex properties within clusters.
      • OLAP-style graph aggregation: K-SNAP [Tian et al., SIGMOD'08]; attribute-compatible grouping. The clusters generated have a rather loose intra-cluster structure.
  • 5. Example: A Coauthor Network*
      [Figure: coauthor graph with authors labeled by research topic (r1, r2, r4, r5, r6, r7, r8: XML; r3: XML, Skyline; r9, r10, r11: Skyline). Panels: Traditional Coauthor graph; Structure-based Cluster; Attribute-based Cluster; Structural/Attribute Cluster]
      * https://wiki.engr.illinois.edu/download/attachments/186384385/VLDB09_notes.ppt
  • 6. Related Work (cont…)
      • Structure/attribute-based clustering: SA-Cluster [Yang Zhou et al., VLDB 2009]
          – Modifies the structure of the original graph: adds a dummy vertex for each attribute instance, which yields a sparse matrix and is space inefficient
          – Neighborhood random walk: matrix multiplication is performed iteratively
          – Fixed edge weights; attribute weights are updated automatically
      • Scalability issue for medium and large graphs (time complexity)
  • 7. Two-Fold Objective
      • A desired clustering of an attributed graph should achieve a good balance between the following:
          – Structural cohesiveness: vertices within one cluster are close to each other in terms of structure, while vertices between clusters are distant from each other
          – Attribute homogeneity: vertices within one cluster have similar attribute values, while vertices between clusters have quite different attribute values
      • It should also be scalable to medium-scale graphs
  • 8. Different Graph Clustering Approaches
      • Structure-based clustering: vertices with heterogeneous attribute values in a cluster
      • Attribute-based clustering: loses much structure information
      • Structural/attribute clustering: homogeneous vertices along with structure information, at the expense of time complexity
      • Intra-graph clustering: scalable while considering both aspects
  • 9. Proposed Solution
      [Figure: System Architecture Diagram with stages labeled INPUT, Processing Phase, OUTPUT]
  • 10. Phase 1
      • Similarity estimation (inspired by the Jaccard index [1])
      • Interaction of vertices (topology or structure)
          – Weighted fraction of shared neighbors
          – It is zero for disconnected vertices
          – Example: structural similarity among vertices
              SIM(V1, V2) = (1/3)*5 = 1.667
              SIM(V1, V3) = (1/4)*4 = 1.0
              SIM(V2, V3) = (1/4)*3 = 0.75
              V1 & V4 = (1/4)*0 = 0.0
          – Transitive property: SIM(V1, V4) = SIM(V1, V3) * SIM(V3, V4)
      [1] P. Jaccard, Etude Comparative de la Distribution Florale dans une Portion des Alpes et des Jura, Société Vaudoise des Sciences Naturelles, Vol. 37 (1901)
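The slide's formula was not captured in the transcript, so the following is only a sketch of one reading that is consistent with the arithmetic shown: the weight of the edge between two vertices divided by the size of the union of their neighbor sets. The function name `structural_similarity`, the adjacency layout, and the example graph (including the V3-V4 weight) are assumptions made for illustration; the graph was chosen so that the first three slide values come out, not taken from the original figure.

```python
from typing import Dict

# Hypothetical weighted, undirected graph stored as an adjacency map.
# Edge weights 5, 4, 3 mirror the slide's numerators; V3-V4 is made up.
GRAPH: Dict[str, Dict[str, float]] = {
    "V1": {"V2": 5.0, "V3": 4.0},
    "V2": {"V1": 5.0, "V3": 3.0},
    "V3": {"V1": 4.0, "V2": 3.0, "V4": 2.0},
    "V4": {"V3": 2.0},
}


def structural_similarity(graph: Dict[str, Dict[str, float]], a: str, b: str) -> float:
    """Edge weight between a and b divided by |N(a) union N(b)| (assumed form).

    Returns 0.0 when the two vertices are not directly connected.
    """
    weight = graph.get(a, {}).get(b, 0.0)
    if weight == 0.0:
        return 0.0
    union = set(graph[a]) | set(graph[b])
    return weight / len(union)


if __name__ == "__main__":
    for pair in [("V1", "V2"), ("V1", "V3"), ("V2", "V3"), ("V1", "V4")]:
        print(pair, round(structural_similarity(GRAPH, *pair), 3))
```

Under this reading a directly connected pair can score above 1, which matches the value 1.667 on the slide; whether the authors normalize differently cannot be told from the transcript.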
  • 11. Transitive Property
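The transitive rule stated on the previous slide, SIM(V1, V4) = SIM(V1, V3) * SIM(V3, V4), can be sketched as a product of direct similarities along a path. The sketch below assumes the path is the shortest one by hop count, found with breadth-first search; that choice, the function name `path_similarity`, and the example values are assumptions, not details confirmed by the slides.

```python
from collections import deque
from typing import Dict, Optional

# Hypothetical direct similarities between adjacent vertices.
DIRECT: Dict[str, Dict[str, float]] = {
    "V1": {"V3": 1.0},
    "V3": {"V1": 1.0, "V4": 0.5},
    "V4": {"V3": 0.5},
    "V5": {},  # isolated vertex: no path to any other vertex
}


def path_similarity(direct: Dict[str, Dict[str, float]], source: str, target: str) -> float:
    """Multiply direct similarities along a fewest-hops path (assumed rule).

    Returns 1.0 for source == target and 0.0 when no path exists.
    """
    if source == target:
        return 1.0
    parent: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for neighbor in direct.get(node, {}):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    if target not in parent:
        return 0.0
    # Walk back from target to source, multiplying edge similarities.
    similarity = 1.0
    node = target
    while parent[node] is not None:
        similarity *= direct[parent[node]][node]
        node = parent[node]
    return similarity


if __name__ == "__main__":
    print(path_similarity(DIRECT, "V1", "V4"))  # product over V1-V3-V4
    print(path_similarity(DIRECT, "V1", "V5"))  # no path
```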
  • 12. Phase 1 (cont…)
      • Similarity estimation (inspired by the Jaccard index [1])
      • Context of vertices (attribute regularity)
          – Weighted fraction of shared attribute instances
          – It is zero for contextually disjoint vertices
          – Example: contextual similarity among vertices. Let Wa1 = 1 and Wa2 = 2; then
              SIM(V1, V3) = (2/2) = 1.0
              SIM(V3, V4) = (1/2) = 0.5
              V1 & V4 = 0.0
      [1] P. Jaccard, Etude Comparative de la Distribution Florale dans une Portion des Alpes et des Jura, Société Vaudoise des Sciences Naturelles, Vol. 37 (1901)
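As with the structural case, the exact formula is not in the transcript. One reading that reproduces the three values above is: sum the weights of the attributes on which two vertices hold the same value, then divide by the number of attributes. The sketch below implements that reading; the function name `contextual_similarity` and the attribute values assigned to V1, V3, and V4 are hypothetical and were picked only so the slide's numbers come out.

```python
from typing import Dict

# Attribute weights from the slide: Wa1 = 1, Wa2 = 2.
WEIGHTS: Dict[str, float] = {"a1": 1.0, "a2": 2.0}

# Hypothetical attribute values: V1 and V3 agree on a2, V3 and V4 agree on a1,
# V1 and V4 agree on nothing.
VERTICES: Dict[str, Dict[str, str]] = {
    "V1": {"a1": "x", "a2": "p"},
    "V3": {"a1": "y", "a2": "p"},
    "V4": {"a1": "y", "a2": "q"},
}


def contextual_similarity(attrs_a: Dict[str, str],
                          attrs_b: Dict[str, str],
                          weights: Dict[str, float]) -> float:
    """Weight of shared attribute values over the attribute count (assumed form)."""
    shared = sum(
        weight
        for name, weight in weights.items()
        if name in attrs_a and attrs_a[name] == attrs_b.get(name)
    )
    return shared / len(weights)


if __name__ == "__main__":
    print(contextual_similarity(VERTICES["V1"], VERTICES["V3"], WEIGHTS))
    print(contextual_similarity(VERTICES["V3"], VERTICES["V4"], WEIGHTS))
    print(contextual_similarity(VERTICES["V1"], VERTICES["V4"], WEIGHTS))
```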
  • 13. Collaborative Similarity Measure: Structural; Contextual; Collaborative Measure
  • 14. Phase 2: Clustering (K-Medoid Approach)
  • 15. Algorithm Details: Single-Pass Similarity Calculation; Iterative Node Clustering
  • 16. Example
      Fig. 3. Scenarios for similarity between source (green) and destination (red) nodes following some intermediate nodes (yellow): (a) no direct path exists; (b) directly connected; (c) indirectly connected, shortest path.
      Table 2. (a) Collaborative similarity among the vertices given in Fig. 3-c using the Collaborative Similarity Measure CSim(v_a, v_b); (b) clustering results for a varying number of clusters (K), with the quality of each result calculated using Density and Entropy.

      (a)
            V1     V2     V3     V4     V5     V6
      V1    1      2.67   1.17   0.20   0.18   0.18
      V2    2.67   1      0.92   0.15   0.14   0.14
      V3    1.17   0.92   1      0.17   0.15   0.15
      V4    0.2    0.15   0.17   1      0.92   0.92
      V5    0.18   0.14   0.15   0.92   1      2.5
      V6    0.18   0.14   0.15   0.92   2.5    1

      (b)
      K    Clustered Vertices            Density   Entropy
      2    {V1,V2,V3},{V4,V5,V6}         0.42      0.133
      3    {V1,V3},{V2},{V4,V5,V6}       0.28      0.084
      4    {V5},{V6},{V4},{V1,V2,V3}     0.21      0.084
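To make the clustering phase concrete, here is a minimal k-medoids sketch that runs directly on the similarity values of Table 2(a). It is a generic k-medoids loop, not the authors' exact procedure: the function name `k_medoids`, the choice of V1 and V4 as initial medoids, and the medoid update rule (the member with the largest total similarity to the rest of its cluster) are assumptions. With those choices it returns the K = 2 grouping listed in Table 2(b).

```python
from typing import List

# Similarity values copied from Table 2(a); index 0 is V1, index 5 is V6.
CSIM: List[List[float]] = [
    [1.00, 2.67, 1.17, 0.20, 0.18, 0.18],
    [2.67, 1.00, 0.92, 0.15, 0.14, 0.14],
    [1.17, 0.92, 1.00, 0.17, 0.15, 0.15],
    [0.20, 0.15, 0.17, 1.00, 0.92, 0.92],
    [0.18, 0.14, 0.15, 0.92, 1.00, 2.50],
    [0.18, 0.14, 0.15, 0.92, 2.50, 1.00],
]


def k_medoids(sim: List[List[float]], initial_medoids: List[int],
              max_iter: int = 100) -> List[List[int]]:
    """Cluster vertices around medoids using a precomputed similarity matrix."""
    medoids = list(initial_medoids)
    clusters: List[List[int]] = []
    for _ in range(max_iter):
        # Assignment step: each medoid seeds its own cluster; every other
        # vertex joins the medoid it is most similar to.
        clusters = [[m] for m in medoids]
        for v in range(len(sim)):
            if v in medoids:
                continue
            best = max(range(len(medoids)), key=lambda c: sim[v][medoids[c]])
            clusters[best].append(v)
        # Update step: the new medoid is the member with the highest total
        # similarity to the other members of its cluster.
        new_medoids = [
            max(members, key=lambda u: sum(sim[u][w] for w in members if w != u))
            for members in clusters
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return [sorted(members) for members in clusters]


if __name__ == "__main__":
    result = k_medoids(CSIM, initial_medoids=[0, 3])
    print([[f"V{i + 1}" for i in members] for members in result])
```

Because the table contains off-diagonal values larger than the diagonal, the sketch pins each medoid to its own cluster instead of comparing a medoid's self-similarity against its similarity to other medoids.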
  • 17. Experiments
      • Real dataset: Political Blogs Dataset, 1490 vertices, 19090 edges, one attribute (political leaning: Liberal or Conservative)
      • Methods
          – K-SNAP: attributes only
          – S-Cluster: structure-based clustering
          – W-Cluster: weighted random walk strategy
          – SA-Cluster: considers both factors (matrix manipulation)
          – IGC-CSM: our proposed method
  • 18. Evaluation Metrics
      • Density*: intra-cluster structural cohesiveness
      • Entropy*: intra-cluster attribute homogeneity
      * Yang Zhou et al., Graph Clustering Based on Structural/Attribute Similarities, Proceedings of the VLDB Endowment, France (2009)
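The formulas for these two metrics did not survive in the transcript. A minimal sketch, assuming the usual definitions (density as the fraction of all edges whose two endpoints fall in the same cluster, and entropy as the cluster-size-weighted Shannon entropy of a single categorical attribute), might look as follows. The function names and the toy graph are hypothetical, and the multi-attribute weighting used in the cited paper is left out.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def density(edges: List[Tuple[int, int]], clusters: List[List[int]]) -> float:
    """Fraction of edges whose endpoints lie in the same cluster."""
    label = {v: i for i, members in enumerate(clusters) for v in members}
    inside = sum(1 for u, v in edges if label[u] == label[v])
    return inside / len(edges)


def entropy(attribute: Dict[int, str], clusters: List[List[int]]) -> float:
    """Cluster-size-weighted Shannon entropy of one categorical attribute."""
    total_vertices = sum(len(members) for members in clusters)
    result = 0.0
    for members in clusters:
        counts = Counter(attribute[v] for v in members)
        cluster_entropy = -sum(
            (count / len(members)) * math.log2(count / len(members))
            for count in counts.values()
        )
        result += len(members) / total_vertices * cluster_entropy
    return result


if __name__ == "__main__":
    # Toy graph: a triangle 0-1-2 linked to the pair 3-4 through edge 2-3.
    edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
    clusters = [[0, 1, 2], [3, 4]]
    leaning = {0: "Liberal", 1: "Liberal", 2: "Liberal",
               3: "Conservative", 4: "Liberal"}
    print(density(edges, clusters))   # higher means more cohesive clusters
    print(entropy(leaning, clusters)) # lower means more homogeneous clusters
```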
  • 19. Evaluation Metrics (cont…)
      • F-Measure*: evaluates the collective qualitative nature of the formed clusters
      * Tijn Witsenburg et al., Improving the Accuracy of Similarity Measures by Using Link Information, International Symposium on Methodologies for Intelligent Systems, Edition 9, Poland (2011)
  • 20. Results (Time Complexity)
      • Synthetic dataset: varying number of nodes [plot: graph size vs. time]
      • Real dataset: Political Blog* [plot: number of clusters vs. time]
      * http://www-personal.umich.edu/mejn/netdata
  • 21. Results (Quality)
      • Density evaluation [plot: clusters vs. density value]
      • Entropy evaluation [plot: clusters vs. entropy value]
  • 22. Results (Quality)
      • F-measure estimation [plot: clusters vs. F-measure value]
  • 23. Conclusion
      • We study the problem of graph node clustering based on homogeneous characteristics in terms of context and topology
      • A collaborative similarity measure reflects the relational model among pairs of vertices
      • A k-Medoid clustering framework is adopted for grouping similar nodes
      • The resulting solution is evaluated using state-of-the-art evaluation measures: Density, Entropy, and F-measure
      • Comparatively scalable to medium-scale graphs without compromising the quality of results
  • 24. Thanks. Any questions? Contact: wicky786@khu.ac.kr, yklee@khu.ac.kr, sylee@oslab.khu.ac.kr

Editor's notes

  1. Many graphs have vertex attributes, including social networks, the World Wide Web, and sensor networks. Let's look at an example: a coauthor network of the top 200 authors on technology-enhanced learning from DBLP, where a vertex represents an author and an edge represents the coauthor relationship between two authors. Each author has multiple attributes: ID, name, affiliation, research interests, number of coauthors, number of publications, and so on.
  2. There are mainly three approaches: structure-based clustering, OLAP-style graph aggregation, and structural/attribute clustering. Structure-based clustering includes, for example, normalized cuts by Shi and Malik, modularity by Newman and Girvan, and Scan by Xu et al. It considers only structural similarity and ignores the vertex attributes. Therefore, the clusters generated have a rather random distribution of vertex properties within clusters. For the second approach, there is a recent study, K-SNAP by Tian et al. It follows attribute-compatible grouping. As a result, the clusters generated have a rather loose intra-cluster structure.
  3. There are mainly three approaches: structure-based clustering, OLAP-style graph aggregation, and structural/attribute clustering. Structure-based clustering includes, for example, normalized cuts by Shi and Malik, modularity by Newman and Girvan, and Scan by Xu et al. It considers only structural similarity and ignores the vertex attributes. Therefore, the clusters generated have a rather random distribution of vertex properties within clusters. For the second approach, there is a recent study, K-SNAP by Tian et al. It follows attribute-compatible grouping. As a result, the clusters generated have a rather loose intra-cluster structure.
  4. In this paper, we study the problem of "An Intra-Graph Clustering Based on Collaborative Similarity Measure". The objective is two-fold. A desired clustering should achieve a good balance between two properties. The first is structural cohesiveness, which means vertices within one cluster are close to each other in terms of structure, while vertices between clusters are distant from each other. The second is attribute homogeneity, which means vertices within one cluster have similar attribute values, while vertices between clusters have quite different attribute values. The method should also be scalable to medium (and large) scale graphs in terms of time complexity, without compromising the quality of the results.
  5. For structure-based clustering, although vertices within clusters are closely connected, they could have quite different attribute values. For attribute-based clustering, although vertices within clusters have the same attribute values, much structure information may be lost. For structural/attribute clustering, vertices within clusters are both homogeneous and closely connected, and the graph keeps most of its structure information. Intra-graph clustering, on the other hand, considers both factors (structure and homogeneity) even for medium-scale graphs, and performs comparatively better in running time than the state-of-the-art method.