SlideShare una empresa de Scribd logo
1 de 31
Inference and Validation of Networks
ECML/PKDD 2009
Ilias N. Flaounas1, Marco Turchi2,
Tijl De Bie2 and Nello Cristianini1,2
Department of Computer Science, Bristol University1
Department of Engineering Mathematics, Bristol University2
September 10, 2009
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 1 / 31
Some Network Inference Examples
Karate Club (Zachary, 1972) Emails (Adamic et al, 2003)
Dating (Berman et al, 2004) Internet Nodes (Alvarez et al, 2007)
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 2 / 31
Some more difficult examples
Gene pathways Protein interactions
(Conti et al, 2007) (Jeong et al, 2001)
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 3 / 31
Our problem: The News Media Network
But, is this network valid? How to validate a network inferred from data?
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 4 / 31
Pattern Validation
In general, patterns found in data can be validated in two different ways:
by assessing their significance
by measuring their predictive power
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 5 / 31
Network Validation
We argue that inferred network need to satisfy some key properties:
It should to be stable.
It should be related to any available ground truth.
The topology should be predictable.
The first two properties can be verified by testing if the inferred network is
similar to a reference network - Hypothesis Testing. We verify the third
property by using Generalised Linear Models.
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 6 / 31
Hypothesis Testing
The key idea is to quantify the probability that a the value of a test
statistic evaluated on observed data could have been found also in
random data (p-value).
The p-value is calculated by:
p = PG∼H0 (tGR
(G) ≤ tGR
(GI )) (1)
p ≈
#{G : tGR
(G) ≤ tGR
(GI )} + 1
K + 1
(2)
A significance threshold α is chosen, and the null hypothesis is rejected if
p < α.
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 7 / 31
Test Statistic
As a test statistic we will use the similarity or distance of the inferred
network to a reference network. In literature several approaches have been
proposed for the measurement of similarity between networks.
In the case of comparing two networks GA and GB that have the same set
of nodes, we can use Jaccard Distance.
Jaccard Distance
JD(EA, EB) =
|EA ∪ EB| − |EA ∩ EB|
|EA ∪ EB|
(3)
A value of zero indicates identical networks, and a value of one indicates
no shared edges.
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 8 / 31
Null Models
In the case of validating network patterns, we need to specify a model of
random network generation.
Erd¨os-R´enyi Model
A random graph G(n, p) is comprised of n nodes
Two nodes have a probability p to be connected
Switching Randomisation
Starts from a given graph and randomises it by switching edges:
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 9 / 31
Prediction of Topology
If one or more ground truth components are available, they are expected
to be able to ‘explain’ the existence of some edges of the network.
The combination of ground truth elements can be made using
Generalized Linear Models (GLMs).
To assess the ability of GLM to predict the network structure we use
a methodology similar to this found in supervised classification.
Area Under Curve (AUC) was used as the performance measure.
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 10 / 31
The News Outlets case
We analysed a dataset of
1,017,348 news articles
over a period of 12 consecutive weeks (from October 1st, 2008)
543 news outlets
32 different locations
7 different media types (e.g., newspapers, blogs, etc)
22 different languages
81,816 stories (clusters of articles)
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 11 / 31
Comparison of three Inference methods
Method A: Connection of two outlets if they share a minimum
number of stories.
Method B: Based on TF-IDF scheme where outlets correspond to
documents and clusters to terms.
Method C: Use a weighting scheme for each cluster, based on how
common it is among outlets.
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 12 / 31
Validation of Outlets Network
First property states that the network should be stable.
The reference network would be a network inferred based on
independent data
We use 12 different datasets, one for each of the 12 weeks
K = 1000 random networks per null model
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 13 / 31
Stability
We measure the distance of the network inferred from data of Week 1, to
the network inferred from data of the previous week (Week 0).
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 14 / 31
Stability
We measure the distance of the network inferred from data of Week 1, to
a random network constructed based on the Erd¨os Model.
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 15 / 31
Stability
We measure the distance of the network inferred from data of Week 1, to
a random network constructed based on the Switching Model.
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 16 / 31
Stability
We measure the distance of the network inferred from data of Week 2, to
the network inferred from data of the previous week (Week 1).
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 17 / 31
Stability
We measure the distance of the network inferred from data of Week 2, to
the networks inferred from the Null models.
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 18 / 31
Stability
Each week has three comparisons:
to the network of the previous week.
to an Erd¨os Random Graph
to a Switching Randomized Graph
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 19 / 31
Stability
Each week has three comparisons:
to the network of the previous week.
to an Erd¨os Random Graph
to a Switching Randomized Graph
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 20 / 31
Stability
We compare the inferred networks of 12 sequential weeks.
Method A Method B Method C
The networks inferred by each method are significantly similar (p < 0.001)
to those inferred the previous week with the same method, under both null
models.
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 21 / 31
Comparison to ground truth components
We compare to the corresponding networks of some available ground
truth networks. For the case of news outlets we compared to the networks
of:
Location
Language
Type
Example
The ‘Location’ network is comprised of cliques:
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 22 / 31
Location
Method A Method B Method C
Each week has three comparisons:
to the real Location-Network
to the corresponding Erd¨os Random Graph
to the corresponding Switching Randomized Graph
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 23 / 31
Language
Method A Method B Method C
Each week has three comparisons:
to the real Language-Network
to the corresponding Erd¨os Random Graph
to the corresponding Switching Randomized Graph
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 24 / 31
Media Type
Method A Method B Method C
Each week has three comparisons:
to the real Media Type-Network
to the corresponding Erd¨os Random Graph
to the corresponding Switching Randomized Graph
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 25 / 31
Selecting Inference Method
The selection of the inference method will be made based on their ability
to create significant results.
Table: The number of weeks that each Method presented significant results with
p < 0.001.
Previous week Location Language Media-Type
ER SR ER SR ER SR ER SR
Method A 11 11 1 11 0 9 11 11
Method B 11 11 11 11 11 11 2 11
Method C 11 11 11 11 11 11 11 11
Only Method C passes all tests successfully
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 26 / 31
Prediction power
How well can we predict the existence of an edge given some ground truth
network?
We measure AUC under a 100-folds cross validation scheme.
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 27 / 31
Prediction power
Using all ground truth networks together and each one seperately, we get:
Method A Method B Method C
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 28 / 31
Prediction power
Using pairs of ground truth networks we get:
Method A Method B Method C
Overall, Method C gives the best results of 77.1% using all three ground
truth components.
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 29 / 31
Conclusions
We showed how:
to select the inference algorithm
to validate the network in terms of stability, relation to ground truth
components and topology prediction
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 30 / 31
Thank you!
I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 31 / 31

Más contenido relacionado

Similar a Inference and validation of networks

Report-de Bruijn Graph
Report-de Bruijn GraphReport-de Bruijn Graph
Report-de Bruijn Graph
Ashwani kumar
 
nmat4448 (1) (1)
nmat4448 (1) (1)nmat4448 (1) (1)
nmat4448 (1) (1)
Petra Scudo
 
modularity Modularity Modularity Modularity Modularity.ppt
modularity Modularity Modularity Modularity Modularity.pptmodularity Modularity Modularity Modularity Modularity.ppt
modularity Modularity Modularity Modularity Modularity.ppt
SanjarMadraximov
 

Similar a Inference and validation of networks (20)

Calculation of Torsion Capacity of the Reinforced Concrete Beams Using Artifi...
Calculation of Torsion Capacity of the Reinforced Concrete Beams Using Artifi...Calculation of Torsion Capacity of the Reinforced Concrete Beams Using Artifi...
Calculation of Torsion Capacity of the Reinforced Concrete Beams Using Artifi...
 
Further Analysis Of A Framework To Analyze Network Performance Based On Infor...
Further Analysis Of A Framework To Analyze Network Performance Based On Infor...Further Analysis Of A Framework To Analyze Network Performance Based On Infor...
Further Analysis Of A Framework To Analyze Network Performance Based On Infor...
 
Node similarity
Node similarityNode similarity
Node similarity
 
Redundancy elimination of_big_sensor_data_using_bayesian_networks (1)
Redundancy elimination of_big_sensor_data_using_bayesian_networks (1)Redundancy elimination of_big_sensor_data_using_bayesian_networks (1)
Redundancy elimination of_big_sensor_data_using_bayesian_networks (1)
 
Report-de Bruijn Graph
Report-de Bruijn GraphReport-de Bruijn Graph
Report-de Bruijn Graph
 
Clustered Network MIMO and Fractional Frequency Reuse for the Downlink in LTE...
Clustered Network MIMO and Fractional Frequency Reuse for the Downlink in LTE...Clustered Network MIMO and Fractional Frequency Reuse for the Downlink in LTE...
Clustered Network MIMO and Fractional Frequency Reuse for the Downlink in LTE...
 
Brema tarigan 09030581721015
Brema tarigan 09030581721015Brema tarigan 09030581721015
Brema tarigan 09030581721015
 
nmat4448 (1) (1)
nmat4448 (1) (1)nmat4448 (1) (1)
nmat4448 (1) (1)
 
modularity Modularity Modularity Modularity Modularity.ppt
modularity Modularity Modularity Modularity Modularity.pptmodularity Modularity Modularity Modularity Modularity.ppt
modularity Modularity Modularity Modularity Modularity.ppt
 
On the identifiability of phylogenetic networks under a pseudolikelihood model
On the identifiability of phylogenetic networks under a pseudolikelihood modelOn the identifiability of phylogenetic networks under a pseudolikelihood model
On the identifiability of phylogenetic networks under a pseudolikelihood model
 
VBCA: A Virtual Forces Clustering Algorithm for Autonomous Aerial Drone Systems
VBCA: A Virtual Forces Clustering Algorithm for Autonomous Aerial Drone SystemsVBCA: A Virtual Forces Clustering Algorithm for Autonomous Aerial Drone Systems
VBCA: A Virtual Forces Clustering Algorithm for Autonomous Aerial Drone Systems
 
D05422835
D05422835D05422835
D05422835
 
2013 03-20 zhang
2013 03-20 zhang2013 03-20 zhang
2013 03-20 zhang
 
Satellite imageclassification
Satellite imageclassificationSatellite imageclassification
Satellite imageclassification
 
Network Models.pptx
Network  Models.pptxNetwork  Models.pptx
Network Models.pptx
 
UNet-VGG16 with transfer learning for MRI-based brain tumor segmentation
UNet-VGG16 with transfer learning for MRI-based brain tumor segmentationUNet-VGG16 with transfer learning for MRI-based brain tumor segmentation
UNet-VGG16 with transfer learning for MRI-based brain tumor segmentation
 
Influence of priors over multityped object in evolutionary clustering
Influence of priors over multityped object in evolutionary clusteringInfluence of priors over multityped object in evolutionary clustering
Influence of priors over multityped object in evolutionary clustering
 
INFLUENCE OF PRIORS OVER MULTITYPED OBJECT IN EVOLUTIONARY CLUSTERING
INFLUENCE OF PRIORS OVER MULTITYPED OBJECT IN EVOLUTIONARY CLUSTERINGINFLUENCE OF PRIORS OVER MULTITYPED OBJECT IN EVOLUTIONARY CLUSTERING
INFLUENCE OF PRIORS OVER MULTITYPED OBJECT IN EVOLUTIONARY CLUSTERING
 
Apt thomas kelly
Apt thomas kellyApt thomas kelly
Apt thomas kelly
 
Radio propagation models in wireless
Radio propagation models in wirelessRadio propagation models in wireless
Radio propagation models in wireless
 

Más de Ilias Flaounas

Más de Ilias Flaounas (8)

The story of the Product Growth team at Atlassian
The story of the Product Growth team at AtlassianThe story of the Product Growth team at Atlassian
The story of the Product Growth team at Atlassian
 
On Storing Big Data
On Storing Big DataOn Storing Big Data
On Storing Big Data
 
Readability and Linguistic Subjectivity of News
Readability and Linguistic Subjectivity of NewsReadability and Linguistic Subjectivity of News
Readability and Linguistic Subjectivity of News
 
Detecting macro-patterns in the EU media sphere
Detecting macro-patterns in the EU media sphereDetecting macro-patterns in the EU media sphere
Detecting macro-patterns in the EU media sphere
 
Detecting Patterns in News Media Content
Detecting Patterns in News Media ContentDetecting Patterns in News Media Content
Detecting Patterns in News Media Content
 
Celebrity Watch: Browsing News Content by Exploiting Social Intelligence
Celebrity Watch: Browsing News Content by Exploiting Social IntelligenceCelebrity Watch: Browsing News Content by Exploiting Social Intelligence
Celebrity Watch: Browsing News Content by Exploiting Social Intelligence
 
ECML/PKDD 2009: Found in translation
ECML/PKDD 2009: Found in translationECML/PKDD 2009: Found in translation
ECML/PKDD 2009: Found in translation
 
Data Science at Atlassian: 
The transition towards a data-driven organisation
Data Science at Atlassian: 
The transition towards a data-driven organisationData Science at Atlassian: 
The transition towards a data-driven organisation
Data Science at Atlassian: 
The transition towards a data-driven organisation
 

Último

Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
Sérgio Sacani
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Sérgio Sacani
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Lokesh Kothari
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
ssuser79fe74
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
gindu3009
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
PirithiRaju
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
RohitNehra6
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
Areesha Ahmad
 

Último (20)

Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 

Inference and validation of networks

  • 1. Inference and Validation of Networks ECML/PKDD 2009 Ilias N. Flaounas1, Marco Turchi2, Tijl De Bie2 and Nello Cristianini1,2 Department of Computer Science, Bristol University1 Department of Engineering Mathematics, Bristol University2 September 10, 2009 I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 1 / 31
  • 2. Some Network Inference Examples Karate Club (Zachary, 1972) Emails (Adamic et al, 2003) Dating (Berman et al, 2004) Internet Nodes (Alvarez et al, 2007) I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 2 / 31
  • 3. Some more difficult examples Gene pathways Protein interactions (Conti et al, 2007) (Jeong et al, 2001) I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 3 / 31
  • 4. Our problem: The News Media Network But, is this network valid? How to validate a network inferred from data? I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 4 / 31
  • 5. Pattern Validation In general, patterns found in data can be validated in two different ways: by assessing their significance by measuring their predictive power I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 5 / 31
  • 6. Network Validation We argue that inferred network need to satisfy some key properties: It should to be stable. It should be related to any available ground truth. The topology should be predictable. The first two properties can be verified by testing if the inferred network is similar to a reference network - Hypothesis Testing. We verify the third property by using Generalised Linear Models. I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 6 / 31
  • 7. Hypothesis Testing The key idea is to quantify the probability that a the value of a test statistic evaluated on observed data could have been found also in random data (p-value). The p-value is calculated by: p = PG∼H0 (tGR (G) ≤ tGR (GI )) (1) p ≈ #{G : tGR (G) ≤ tGR (GI )} + 1 K + 1 (2) A significance threshold α is chosen, and the null hypothesis is rejected if p < α. I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 7 / 31
  • 8. Test Statistic As a test statistic we will use the similarity or distance of the inferred network to a reference network. In literature several approaches have been proposed for the measurement of similarity between networks. In the case of comparing two networks GA and GB that have the same set of nodes, we can use Jaccard Distance. Jaccard Distance JD(EA, EB) = |EA ∪ EB| − |EA ∩ EB| |EA ∪ EB| (3) A value of zero indicates identical networks, and a value of one indicates no shared edges. I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 8 / 31
  • 9. Null Models In the case of validating network patterns, we need to specify a model of random network generation. Erd¨os-R´enyi Model A random graph G(n, p) is comprised of n nodes Two nodes have a probability p to be connected Switching Randomisation Starts from a given graph and randomises it by switching edges: I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 9 / 31
  • 10. Prediction of Topology If one or more ground truth components are available, they are expected to be able to ‘explain’ the existence of some edges of the network. The combination of ground truth elements can be made using Generalized Linear Models (GLMs). To assess the ability of GLM to predict the network structure we use a methodology similar to this found in supervised classification. Area Under Curve (AUC) was used as the performance measure. I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 10 / 31
  • 11. The News Outlets case We analysed a dataset of 1,017,348 news articles over a period of 12 consecutive weeks (from October 1st, 2008) 543 news outlets 32 different locations 7 different media types (e.g., newspapers, blogs, etc) 22 different languages 81,816 stories (clusters of articles) I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 11 / 31
  • 12. Comparison of three Inference methods Method A: Connection of two outlets if they share a minimum number of stories. Method B: Based on TF-IDF scheme where outlets correspond to documents and clusters to terms. Method C: Use a weighting scheme for each cluster, based on how common it is among outlets. I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 12 / 31
  • 13. Validation of Outlets Network First property states that the network should be stable. The reference network would be a network inferred based on independent data We use 12 different datasets, one for each of the 12 weeks K = 1000 random networks per null model I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 13 / 31
  • 14. Stability We measure the distance of the network inferred from data of Week 1, to the network inferred from data of the previous week (Week 0). I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 14 / 31
  • 15. Stability We measure the distance of the network inferred from data of Week 1, to a random network constructed based on the Erd¨os Model. I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 15 / 31
  • 16. Stability We measure the distance of the network inferred from data of Week 1, to a random network constructed based on the Switching Model. I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 16 / 31
  • 17. Stability We measure the distance of the network inferred from data of Week 2, to the network inferred from data of the previous week (Week 1). I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 17 / 31
  • 18. Stability We measure the distance of the network inferred from data of Week 2, to the networks inferred from the Null models. I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 18 / 31
  • 19. Stability Each week has three comparisons: to the network of the previous week. to an Erd¨os Random Graph to a Switching Randomized Graph I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 19 / 31
  • 20. Stability Each week has three comparisons: to the network of the previous week. to an Erd¨os Random Graph to a Switching Randomized Graph I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 20 / 31
  • 21. Stability We compare the inferred networks of 12 sequential weeks. Method A Method B Method C The networks inferred by each method are significantly similar (p < 0.001) to those inferred the previous week with the same method, under both null models. I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 21 / 31
  • 22. Comparison to ground truth components We compare to the corresponding networks of some available ground truth networks. For the case of news outlets we compared to the networks of: Location Language Type Example The ‘Location’ network is comprised of cliques: I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 22 / 31
  • 23. Location Method A Method B Method C Each week has three comparisons: to the real Location-Network to the corresponding Erd¨os Random Graph to the corresponding Switching Randomized Graph I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 23 / 31
  • 24. Language Method A Method B Method C Each week has three comparisons: to the real Language-Network to the corresponding Erd¨os Random Graph to the corresponding Switching Randomized Graph I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 24 / 31
  • 25. Media Type Method A Method B Method C Each week has three comparisons: to the real Media Type-Network to the corresponding Erd¨os Random Graph to the corresponding Switching Randomized Graph I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 25 / 31
  • 26. Selecting Inference Method The selection of the inference method will be made based on their ability to create significant results. Table: The number of weeks that each Method presented significant results with p < 0.001. Previous week Location Language Media-Type ER SR ER SR ER SR ER SR Method A 11 11 1 11 0 9 11 11 Method B 11 11 11 11 11 11 2 11 Method C 11 11 11 11 11 11 11 11 Only Method C passes all tests successfully I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 26 / 31
  • 27. Prediction power How well can we predict the existence of an edge given some ground truth network? We measure AUC under a 100-folds cross validation scheme. I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 27 / 31
  • 28. Prediction power Using all ground truth networks together and each one seperately, we get: Method A Method B Method C I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 28 / 31
  • 29. Prediction power Using pairs of ground truth networks we get: Method A Method B Method C Overall, Method C gives the best results of 77.1% using all three ground truth components. I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 29 / 31
  • 30. Conclusions We showed how: to select the inference algorithm to validate the network in terms of stability, relation to ground truth components and topology prediction I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 30 / 31
  • 31. Thank you! I. Flaounas et al. (Bristol University) Inference and Validation of Networks September 10, 2009 31 / 31