SlideShare una empresa de Scribd logo
1 de 25
Knowledge Discovery in Databases Group




Link Prediction in Social Networks
                           Friday, 07 May 2009


                Svitlana Volkova
      Fulbright Master Student in Computer Science
     Computing and Information Sciences Department
                 Kansas State University

             234 Nichols Hall room 218, Manhattan, KS 66506-2302
           E-mail: svitlana[AT]k-state.edu or svitlana.volkova[AT]gmail.com
             Phones: mob. +1(785) 320 0113 | work +1(785) 532 7853
Agenda

 Introduction
 Related studies
 Methodology
   Mathematical representation for link prediction task
   Similarity measures
 Experiment
   Crawling Facebook
   Facebook Database
   Visualization Tools
 Conclusions
Why am I interested in social networks?


Visible Reasons
   Young
   Curious
   Like challenging tasks




Invisible Reasons
   Links from the social network reflect social behaviors of individuals




   Quantitative and Qualitative assessment of human relationships
Phenomenon of social networks

 The person who built the modern social network theory was
  the Stanley Milgram.




       [Social network] is a map of the individuals,
     and the ways how they are related to each other.
www.trustmesecurity.com/.../case/socialnetwork
Why is it difficult to predict links in social
networks?




 Collective structure


 Highly dynamic


 Sparse
Supervised vs. unsupervised methods in link
                       prediction task
                                Unsupervised                        Single Relational Table
                                 methods use
                                various similarity                    Data representation is
                                    measures                            “propositional” = “feature
                                                                        vector” or “attribute value”
              Supervised
           methods extract
         structural features to                                     Relational Data Mining
            learn a mapping                                           Inductive Logic Programming
                function
                                                                       (ILP)
Learning a binary classifier that will predict whether a
      link exists between a given pair of nodes
J48          OneR         IB1          Logistic      NaiveBayes
                               OR
AdaBoost                   Bagging                    RandForest
                               OR
Support Vector Machines (SVM)           Genetic Programming (GP)
                               OR
Bayesian networks(BN) and Probabilistic Relational Models (PRMs)
Classification based on features of entities

 Dr. William Hsu considered the problems of predicting,
 classifying annotating friends relations in social networks by
 application feature constructing approach




 Tim Weninger proposed genetic programming-based symbolic
 regression approach to the construction of the relational
 features for link analysis task in social networks

         Entity Attributes             Graph-Based Features
      (user/pair dependent)                (relational)

  Number of neighbors             Length of shortest path
  Interests                       Neighborhood overlap
  Topic model                     Relative importance
  Geographical location


  Interest popularity             Node’s Indegree/Outdegree
  Friends/Friend’s age            Forward/Backward deleted distance
Related Investigations in link prediction area


  exploring relational structure, clustering
                                                  [Jensen 2003, Getoor 2001]


  using links to predict classes/attributes of entities
                                       [Getoor,Taskar, Koller, Provost, Jensen]


  predicting link types based on known entity classes
                                                         [Taskar, Koller 2003]
  predicting links based on location in high-dimensional space
                                                            [Hoff et al., 2003]
  ranking potential links using a single graph-based feature
                                                             [Kleinberg 2004]
Mining tasks in network-structured data
                                                                  The identity of all objects is known
Node-related Tasks                                                + some link structure is known =>
                                                                  predict unobserved links
• Node-ranking
• Node-classification
• Node-clustering                                                 New objects arrive with information
                                                                  about some of their links + info
Structure-related Tasks                                           about some attributes => predict
                                                                  links among new objects

• Link prediction
• Structured pattern mining




                                               Link
                                            Prediction
                                              Tasks


                           Link                                               Link
                                     Link Type      Link Weight
                         Existence                                         Cardinality
Mathematic representation for unobserved link
     prediction task in social networks




                                        Time
Classification of measures for link prediction
                 approaches
                                       Link Prediction
                                         Approaches




       Node-wise Similarity           Topological Pattern         Probabalistic Model
        based Approaches               based Approaches            based Approaches




            Similarity measure in                                     Probabilistic relational
                                            Node base patterns
              binary classifiers                                             models




               Pairwise kernel                                          Bayesian relational
                                            Path based patterns
                  matrices                                                   models




             Statictical relational                                    Stochastic relational
                                           Graph based patterns
                   learning                                                  models
Node-wise Similarity based Approaches
Node-wise Similarity based Approaches (cont.)
Topological pattern based Approaches
Topological pattern based Approaches (cont.)
Comparison of similarity measures
                 Common      Jaccard’s    Adamic/Adar   Preferential     Kartz
                Neighbors    Similarity     Measure      Measure        Measure
  Common
 Neighbors          1           0.92          0.94         0.31          0.61
 Jaccard’s
 Similarity        0.92          1            0.97         0.53          0.75
Adamic/Adar
  Measure          0.94         0.97           1           0.49          0.70
Preferential
 Measure           0.31         0.53          0.49              1        0.84

   Katz
  Measure          0.61         0.75          0.70         0.84            1


http://www.cs.cornell.edu/home/kleinber/link-pred.pdf                    Correlation among differemt similarity measures
                                                          1.2
                                                                                            y = 0.0736x + 0.5443                     Common
                                                                                                 R² = 0.9102                         Neighbors
                                                            1
                                                                                                              y = 0.1173x + 0.2586   Jaccard’s
                                                                                                                   R² = 0.6507       Similarity
                                                          0.8
                                                                                                            y = -0.0604x + 1.0273    Adamic/Adar
                                                                                                                 R² = 0.3531         Measure
                                                          0.6                                                                        Preferential
                                                                                                                                     Measure
                                                                                                             y = -0.073x + 1.0535
                                                          0.4                                                                        Kartz Measure
                                                                                                                  R² = 0.4092
                                                                                              y = -0.1038x + 1.0881
                                                          0.2                                      R² = 0.4681


                                                            0
                                                                    0          2              4             6               8
                                                                                   Level of similarity, x
Crawling
  Facebook Social
     Network
 Crawler is automatic program which
   explores the WWW, following the
  links and searching for information
         or building a database.
It is used to build automated indexes
   for the Web, allowing users to do
       keyword searches for Web
              documents.




                                        www2007.org/posters/poster1057.pdf
http://www.flickr.com/photos/ikhnaton2/533233247/sizes/o/
Why we are more interested in Facebook?


     Betweenness                                       Degree
                          200 millions users
        Bridge                                    Flow betweenness
      Centrality                                      centrality
    Centralization        Doubling in size     Eigenvector centrality
                           once every six           Local Bridge
       Closeness
                               months                 Prestige
     Path Length
                          by 100,000 users           Radiality
 Clustering coefficient
                               per day                 Reach
       Cohesion
        Degree                                  Structural cohesion
  (Individual-level)                           Structural equivalence
         Density
Conclusions
Prediction task for previously unobserved links in social networks


 Concept of social network + social graph representation + mining tasks in network-
   structured data
 Related studies + existed approaches
    supervised vs. unsupervised methods
    single table data representation as feature vector vs. relational data mining
    link prediction task as classification with range of induces: J48, OneR, IB1, Logistic,
     NaiveBayes etc.
    other available approaches for resolving given task e.g. SVM, GP, BN, PRMs etc.
 Mathematic representation + classification and description of similarity measures
 The experiment was planned based on crawling technique with application of free open-
   source SQL full-text search engine Sphinx on Facebook corpus
 The visualization tools for social networks graph representation


                     http://www.youtube.com/watch?v=neAAzVquaRU
Thank you for attention!!!

Más contenido relacionado

Destacado

[20150829, PyCon2015] NetworkX를 이용한 네트워크 링크 예측
[20150829, PyCon2015] NetworkX를 이용한 네트워크 링크 예측[20150829, PyCon2015] NetworkX를 이용한 네트워크 링크 예측
[20150829, PyCon2015] NetworkX를 이용한 네트워크 링크 예측
Kyunghoon Kim
 

Destacado (6)

What Is the Added Value of Negative Links in Online Social Networks?
What Is the Added Value of Negative Links in Online Social Networks?What Is the Added Value of Negative Links in Online Social Networks?
What Is the Added Value of Negative Links in Online Social Networks?
 
Content-based link prediction
Content-based link predictionContent-based link prediction
Content-based link prediction
 
Arab Blogs Test
Arab Blogs TestArab Blogs Test
Arab Blogs Test
 
[20150829, PyCon2015] NetworkX를 이용한 네트워크 링크 예측
[20150829, PyCon2015] NetworkX를 이용한 네트워크 링크 예측[20150829, PyCon2015] NetworkX를 이용한 네트워크 링크 예측
[20150829, PyCon2015] NetworkX를 이용한 네트워크 링크 예측
 
Ppt
PptPpt
Ppt
 
Who to follow and why: link prediction with explanations
Who to follow and why: link prediction with explanationsWho to follow and why: link prediction with explanations
Who to follow and why: link prediction with explanations
 

Similar a Social Networks

Abstract
AbstractAbstract
Abstract
butest
 
Metaphors as design points for collaboration 2012
Metaphors as design points for collaboration 2012Metaphors as design points for collaboration 2012
Metaphors as design points for collaboration 2012
KM Chicago
 
Advanced Strategies for Analysis of Metabolomic Data
Advanced Strategies for Analysis of Metabolomic DataAdvanced Strategies for Analysis of Metabolomic Data
Advanced Strategies for Analysis of Metabolomic Data
Dmitry Grapov
 

Similar a Social Networks (20)

Selectivity Estimation for Hybrid Queries over Text-Rich Data Graphs
Selectivity Estimation for Hybrid Queries over Text-Rich Data GraphsSelectivity Estimation for Hybrid Queries over Text-Rich Data Graphs
Selectivity Estimation for Hybrid Queries over Text-Rich Data Graphs
 
Advanced strategies for Metabolomics Data Analysis
Advanced strategies for Metabolomics Data AnalysisAdvanced strategies for Metabolomics Data Analysis
Advanced strategies for Metabolomics Data Analysis
 
UTS workshop talk
UTS workshop talkUTS workshop talk
UTS workshop talk
 
A Survey On Link Prediction In Social Networks
A Survey On Link Prediction In Social NetworksA Survey On Link Prediction In Social Networks
A Survey On Link Prediction In Social Networks
 
JOSA TechTalks - Machine Learning in Practice
JOSA TechTalks - Machine Learning in PracticeJOSA TechTalks - Machine Learning in Practice
JOSA TechTalks - Machine Learning in Practice
 
Mining Social Graph Data
Mining Social Graph DataMining Social Graph Data
Mining Social Graph Data
 
How Graph Algorithms Answer your Business Questions in Banking and Beyond
How Graph Algorithms Answer your Business Questions in Banking and BeyondHow Graph Algorithms Answer your Business Questions in Banking and Beyond
How Graph Algorithms Answer your Business Questions in Banking and Beyond
 
about data mining and Exp about data mining and Exp.
about data mining and Exp about data mining and Exp.about data mining and Exp about data mining and Exp.
about data mining and Exp about data mining and Exp.
 
Chapter 10 link prediction
Chapter 10 link predictionChapter 10 link prediction
Chapter 10 link prediction
 
Attentive Relational Networks for Mapping Images to Scene Graphs
Attentive Relational Networks for Mapping Images to Scene GraphsAttentive Relational Networks for Mapping Images to Scene Graphs
Attentive Relational Networks for Mapping Images to Scene Graphs
 
Declarative analysis of noisy information networks
Declarative analysis of noisy information networksDeclarative analysis of noisy information networks
Declarative analysis of noisy information networks
 
Public profile
Public profilePublic profile
Public profile
 
Abstract
AbstractAbstract
Abstract
 
Metaphors as design points for collaboration 2012
Metaphors as design points for collaboration 2012Metaphors as design points for collaboration 2012
Metaphors as design points for collaboration 2012
 
A comprehensive survey of link mining and anomalies detection
A comprehensive survey of link mining and anomalies detectionA comprehensive survey of link mining and anomalies detection
A comprehensive survey of link mining and anomalies detection
 
Introduction to Dimension Reduction with PCA
Introduction to Dimension Reduction with PCAIntroduction to Dimension Reduction with PCA
Introduction to Dimension Reduction with PCA
 
NS-CUK Seminar: H.B.Kim, Review on "Deep Gaussian Embedding of Graphs: Unsup...
NS-CUK Seminar: H.B.Kim,  Review on "Deep Gaussian Embedding of Graphs: Unsup...NS-CUK Seminar: H.B.Kim,  Review on "Deep Gaussian Embedding of Graphs: Unsup...
NS-CUK Seminar: H.B.Kim, Review on "Deep Gaussian Embedding of Graphs: Unsup...
 
Community detection in social networks[1]
Community detection in social networks[1]Community detection in social networks[1]
Community detection in social networks[1]
 
Advanced Strategies for Analysis of Metabolomic Data
Advanced Strategies for Analysis of Metabolomic DataAdvanced Strategies for Analysis of Metabolomic Data
Advanced Strategies for Analysis of Metabolomic Data
 
240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx
240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx
240325_JW_labseminar[node2vec: Scalable Feature Learning for Networks].pptx
 

Más de Svitlana volkova (18)

EACL'12 Poster
EACL'12 PosterEACL'12 Poster
EACL'12 Poster
 
Grace Hopper Celebration 2010
Grace Hopper Celebration 2010Grace Hopper Celebration 2010
Grace Hopper Celebration 2010
 
Multimodal Information Extraction: Disease, Date and Location Retrieval
Multimodal Information Extraction: Disease, Date and Location RetrievalMultimodal Information Extraction: Disease, Date and Location Retrieval
Multimodal Information Extraction: Disease, Date and Location Retrieval
 
Web Intelligence 2010
Web Intelligence 2010Web Intelligence 2010
Web Intelligence 2010
 
Master Thesis
Master ThesisMaster Thesis
Master Thesis
 
MS Thesis Short
MS Thesis ShortMS Thesis Short
MS Thesis Short
 
IEEE ISI'10
IEEE ISI'10IEEE ISI'10
IEEE ISI'10
 
MedEx'10
MedEx'10MedEx'10
MedEx'10
 
Multilingual Ner Using Wiki
Multilingual Ner Using WikiMultilingual Ner Using Wiki
Multilingual Ner Using Wiki
 
WiML Poster
WiML PosterWiML Poster
WiML Poster
 
Topics Modeling
Topics ModelingTopics Modeling
Topics Modeling
 
Project Proposal Topics Modeling (Ir)
Project Proposal    Topics Modeling (Ir)Project Proposal    Topics Modeling (Ir)
Project Proposal Topics Modeling (Ir)
 
Methods Of Reliability Analysis
Methods Of Reliability AnalysisMethods Of Reliability Analysis
Methods Of Reliability Analysis
 
Ohio Project
Ohio ProjectOhio Project
Ohio Project
 
Ukraine Presentation
Ukraine PresentationUkraine Presentation
Ukraine Presentation
 
Ukraine Presentation at Kansas State University
Ukraine Presentation at Kansas State UniversityUkraine Presentation at Kansas State University
Ukraine Presentation at Kansas State University
 
Communicatons Fulbright
Communicatons FulbrightCommunicatons Fulbright
Communicatons Fulbright
 
Communications Ternopil
Communications TernopilCommunications Ternopil
Communications Ternopil
 

Último

Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
ciinovamais
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 

Último (20)

Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)Accessible Digital Futures project (20/03/2024)
Accessible Digital Futures project (20/03/2024)
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
SOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning PresentationSOC 101 Demonstration of Learning Presentation
SOC 101 Demonstration of Learning Presentation
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Third Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptxThird Battle of Panipat detailed notes.pptx
Third Battle of Panipat detailed notes.pptx
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 
How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17How to Create and Manage Wizard in Odoo 17
How to Create and Manage Wizard in Odoo 17
 

Social Networks

  • 1. Knowledge Discovery in Databases Group Link Prediction in Social Networks Friday, 07 May 2009 Svitlana Volkova Fulbright Master Student in Computer Science Computing and Information Sciences Department Kansas State University 234 Nichols Hall room 218, Manhattan, KS 66506-2302 E-mail: svitlana[AT]k-state.edu or svitlana.volkova[AT]gmail.com Phones: mob. +1(785) 320 0113 | work +1(785) 532 7853
  • 2. Agenda  Introduction  Related studies  Methodology  Mathematical representation for link prediction task  Similarity measures  Experiment  Crawling Facebook  Facebook Database  Visualization Tools  Conclusions
  • 3. Why am I interested in social networks? Visible Reasons  Young  Curious  Like challenging tasks Invisible Reasons  Links from the social network reflect social behaviors of individuals  Quantitative and Qualitative assessment of human relationships
  • 4. Phenomenon of social networks  The person who built the modern social network theory was the Stanley Milgram. [Social network] is a map of the individuals, and the ways how they are related to each other.
  • 6. Why is it difficult to predict links in social networks?  Collective structure  Highly dynamic  Sparse
  • 7. Supervised vs. unsupervised methods in link prediction task Unsupervised  Single Relational Table methods use various similarity  Data representation is measures “propositional” = “feature vector” or “attribute value” Supervised methods extract structural features to  Relational Data Mining learn a mapping  Inductive Logic Programming function (ILP) Learning a binary classifier that will predict whether a link exists between a given pair of nodes J48 OneR IB1 Logistic NaiveBayes OR AdaBoost Bagging RandForest OR Support Vector Machines (SVM) Genetic Programming (GP) OR Bayesian networks(BN) and Probabilistic Relational Models (PRMs)
  • 8. Classification based on features of entities Dr. William Hsu considered the problems of predicting, classifying annotating friends relations in social networks by application feature constructing approach Tim Weninger proposed genetic programming-based symbolic regression approach to the construction of the relational features for link analysis task in social networks Entity Attributes Graph-Based Features (user/pair dependent) (relational)  Number of neighbors  Length of shortest path  Interests  Neighborhood overlap  Topic model  Relative importance  Geographical location  Interest popularity  Node’s Indegree/Outdegree  Friends/Friend’s age  Forward/Backward deleted distance
  • 9. Related Investigations in link prediction area  exploring relational structure, clustering [Jensen 2003, Getoor 2001]  using links to predict classes/attributes of entities [Getoor,Taskar, Koller, Provost, Jensen]  predicting link types based on known entity classes [Taskar, Koller 2003]  predicting links based on location in high-dimensional space [Hoff et al., 2003]  ranking potential links using a single graph-based feature [Kleinberg 2004]
  • 10. Mining tasks in network-structured data The identity of all objects is known Node-related Tasks + some link structure is known => predict unobserved links • Node-ranking • Node-classification • Node-clustering New objects arrive with information about some of their links + info Structure-related Tasks about some attributes => predict links among new objects • Link prediction • Structured pattern mining Link Prediction Tasks Link Link Link Type Link Weight Existence Cardinality
  • 11. Mathematic representation for unobserved link prediction task in social networks Time
  • 12. Classification of measures for link prediction approaches Link Prediction Approaches Node-wise Similarity Topological Pattern Probabalistic Model based Approaches based Approaches based Approaches Similarity measure in Probabilistic relational Node base patterns binary classifiers models Pairwise kernel Bayesian relational Path based patterns matrices models Statictical relational Stochastic relational Graph based patterns learning models
  • 14. Node-wise Similarity based Approaches (cont.)
  • 16. Topological pattern based Approaches (cont.)
  • 17. Comparison of similarity measures Common Jaccard’s Adamic/Adar Preferential Kartz Neighbors Similarity Measure Measure Measure Common Neighbors 1 0.92 0.94 0.31 0.61 Jaccard’s Similarity 0.92 1 0.97 0.53 0.75 Adamic/Adar Measure 0.94 0.97 1 0.49 0.70 Preferential Measure 0.31 0.53 0.49 1 0.84 Katz Measure 0.61 0.75 0.70 0.84 1 http://www.cs.cornell.edu/home/kleinber/link-pred.pdf Correlation among differemt similarity measures 1.2 y = 0.0736x + 0.5443 Common R² = 0.9102 Neighbors 1 y = 0.1173x + 0.2586 Jaccard’s R² = 0.6507 Similarity 0.8 y = -0.0604x + 1.0273 Adamic/Adar R² = 0.3531 Measure 0.6 Preferential Measure y = -0.073x + 1.0535 0.4 Kartz Measure R² = 0.4092 y = -0.1038x + 1.0881 0.2 R² = 0.4681 0 0 2 4 6 8 Level of similarity, x
  • 18. Crawling Facebook Social Network Crawler is automatic program which explores the WWW, following the links and searching for information or building a database. It is used to build automated indexes for the Web, allowing users to do keyword searches for Web documents. www2007.org/posters/poster1057.pdf
  • 20. Why we are more interested in Facebook? Betweenness Degree 200 millions users Bridge Flow betweenness Centrality centrality Centralization Doubling in size Eigenvector centrality once every six Local Bridge Closeness months Prestige Path Length by 100,000 users Radiality Clustering coefficient per day Reach Cohesion Degree Structural cohesion (Individual-level) Structural equivalence Density
  • 21.
  • 22.
  • 23.
  • 24. Conclusions Prediction task for previously unobserved links in social networks  Concept of social network + social graph representation + mining tasks in network- structured data  Related studies + existed approaches  supervised vs. unsupervised methods  single table data representation as feature vector vs. relational data mining  link prediction task as classification with range of induces: J48, OneR, IB1, Logistic, NaiveBayes etc.  other available approaches for resolving given task e.g. SVM, GP, BN, PRMs etc.  Mathematic representation + classification and description of similarity measures  The experiment was planned based on crawling technique with application of free open- source SQL full-text search engine Sphinx on Facebook corpus  The visualization tools for social networks graph representation http://www.youtube.com/watch?v=neAAzVquaRU
  • 25. Thank you for attention!!!