Local Ranking Problem on the BrowseGraph
Michele Trevisiol, Luca Maria Aiello, Paolo Boldi, Roi Blanco

Local Ranking Problem
“when the centrality-like ranks computed on a local graph differ from the ones on the global graph”
- Bressan et al. in WWW 2013, “The Power of Local Information in PageRank”
- Bar-Yossef and Mashiach in CIKM 2008, “Local Approximation of PageRank and Reverse PageRank”
- Chen et al. in CIKM 2004, “Local Methods for Estimating PageRank Values”

The BrowseGraph
“a graph where nodes are webpages and edges are browsing transitions”
[Diagram: user navigation (e.g. Flickr), user session, BrowseGraph construction]

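As an illustration of the definition above, here is a minimal sketch of how a BrowseGraph could be assembled from session logs. It assumes each session is an ordered list of page identifiers; the function name `build_browse_graph`, the sample paths, and the choice to skip page reloads are hypothetical and not taken from the slides.

```python
from collections import Counter

def build_browse_graph(sessions):
    """Return {(src, dst): count} for consecutive page views in each session."""
    edges = Counter()
    for pages in sessions:
        for src, dst in zip(pages, pages[1:]):
            if src != dst:  # assumption: a reload is not a browsing transition
                edges[(src, dst)] += 1
    return edges

# Hypothetical sessions, each an ordered list of visited pages.
sessions = [
    ["/home", "/news/a", "/news/b"],
    ["/home", "/news/a"],
    ["/news/b", "/news/a"],
]
graph = build_browse_graph(sessions)
```

The edge weights are transition counts, which is the information that weighted centrality metrics on the BrowseGraph would use.
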
Centrality Metrics applied to the BrowseGraph
Increasing popularity in recent years
- Chiarandini et al. in ICWSM 2013, “Leveraging browsing patterns for topic discovery and photostream recommendation”
- Trevisiol et al. in SIGIR 2012, “Image ranking based on user browsing behavior”
- Liu et al. in CIKM 2011, “User browsing behavior-driven web crawling”
Provide higher-quality rankings compared to standard hyperlink graphs
- Y. Liu et al. in SIGIR 2008, “BrowseRank: letting web users vote for page importance”

Local Ranking Problem on the BrowseGraph: why?
Image ranking in Flickr (SIGIR 2012): we compared different ranking approaches on the BrowseGraph (PageRank and BrowseRank, among others)
How much could our ranking vary if we had more information (i.e. more nodes)?

BrowseGraph and ReferrerGraphs
ReferrerGraphs: domain-dependent BrowseGraphs
Construct different BrowseGraphs based on the referrer domain
Recommend news articles following the ReferrerGraphs
[Diagram: BrowseGraph, Twitter ReferrerGraph, Facebook ReferrerGraph]
Can we rely on centrality-based algorithms to infer news importance?

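A minimal sketch of how ReferrerGraphs could be split out of the session data. It assumes each session carries the external referrer URL of its first page view and that the whole session is attributed to that domain; this attribution rule, the function names, and the sample URLs are assumptions rather than details from the slides.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

def referrer_domain(url):
    """Lower-cased host name of the referrer, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def build_referrer_graphs(sessions):
    """sessions: iterable of (referrer_url, [pages]); returns {domain: edge counts}."""
    graphs = defaultdict(Counter)
    for referrer_url, pages in sessions:
        domain = referrer_domain(referrer_url)
        for src, dst in zip(pages, pages[1:]):
            graphs[domain][(src, dst)] += 1
    return graphs

# Hypothetical sessions tagged with the URL the user arrived from.
sessions = [
    ("https://www.facebook.com/feed", ["/news/a", "/news/b"]),
    ("https://twitter.com/some_user", ["/news/a", "/news/c"]),
    ("https://www.facebook.com/feed", ["/news/a", "/news/b", "/news/c"]),
]
referrer_graphs = build_referrer_graphs(sessions)
```

Each resulting graph is a local view of the same site, containing only the transitions made by users who arrived from one domain.
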
Local Ranking Problem on the BrowseGraph
- Study the LRP on the BrowseGraph by incrementally expanding the local graph (“Growing Rings” experiment)
- Estimate the “distance” between the local and global PageRank by exploiting the structural properties of the local graph
- Discover the referrer domain when it is not available (not discussed in the presentation; please see the paper)

Local Ranking Problem on the BrowseGraph
Dataset: Yahoo News BrowseGraph, ~500M pageviews
Referrer sources: social networks, search engines, news homepage
1. Construct the BrowseGraph (our “global graph”)
2. Construct the ReferrerGraphs (our “local graphs”)

Subgraph Comparison
Very different dimensions
Very well connected (including Reddit, the smallest one)

Cross-distance: Kendall-tau among common nodes (minimum overlap of 1k nodes)
In general the similarities are very low (<0.3), suggesting different content or different user interests
Search engines are the most similar (>0.5)

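The comparison above can be sketched as follows: restrict two score maps to their common nodes, skip the pair if the overlap is too small, and compute Kendall tau on the rest. This uses the simple tau-a variant, which ignores ties, in pure Python; which variant the authors used is not stated in the slides, and the function names and toy scores are hypothetical. For real data, `scipy.stats.kendalltau` would be a faster choice.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    score = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        score += (product > 0) - (product < 0)
    pairs = len(xs) * (len(xs) - 1) / 2
    return score / pairs if pairs else 0.0

def tau_on_common_nodes(scores_a, scores_b, min_overlap=1000):
    """Kendall tau over shared nodes, or None if the overlap is too small."""
    common = sorted(set(scores_a) & set(scores_b))
    if len(common) < min_overlap:
        return None
    return kendall_tau([scores_a[n] for n in common],
                       [scores_b[n] for n in common])

# Toy PageRank-like scores for two hypothetical ReferrerGraphs.
scores_a = {"p1": 0.5, "p2": 0.3, "p3": 0.2}
scores_b = {"p1": 0.1, "p2": 0.6, "p3": 0.3, "p4": 0.0}
tau = tau_on_common_nodes(scores_a, scores_b, min_overlap=3)
too_small = tau_on_common_nodes(scores_a, scores_b)  # default overlap of 1000
```
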
Growing Rings Experiment
Study of the LRP on the BrowseGraph by incrementally expanding the local graph
1. For each ReferrerGraph
2. Compare the PageRank values with the global ones (Kendall-tau)
3. Expand with the next neighborhood of nodes
4. Iterate until the Kendall-tau value approaches 1
K(local+0, global) ~0.307
K(local+1, global) ~0.524
K(local+2, global) ~0.740
K(local+3, global) ~0.912

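The four steps above can be sketched in pure Python. This is a simplified illustration, not the authors' implementation: it assumes the local graph is the subgraph of the global graph induced by a node set, that one ring adds every node one hop away in either direction, and it uses a basic power-iteration PageRank and Kendall tau-a. All function names and the toy graph are hypothetical.

```python
from itertools import combinations

def pagerank(edges, nodes, damping=0.85, iters=50):
    """Power-iteration PageRank on the subgraph of `edges` induced by `nodes`."""
    nodes = sorted(nodes)
    out = {u: {} for u in nodes}
    for (u, v), w in edges.items():
        if u in out and v in out:
            out[u][v] = w
    rank = {u: 1.0 / len(nodes) for u in nodes}
    for _ in range(iters):
        # Mass held by nodes without out-links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out[u])
        nxt = {u: (1 - damping + damping * dangling) / len(nodes) for u in nodes}
        for u in nodes:
            total = sum(out[u].values())
            for v, w in out[u].items():
                nxt[v] += damping * rank[u] * w / total
        rank = nxt
    return rank

def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    score = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        score += (product > 0) - (product < 0)
    pairs = len(xs) * (len(xs) - 1) / 2
    return score / pairs if pairs else 0.0

def grow_ring(edges, nodes):
    """Add every node that shares an edge with the current node set."""
    ring = set(nodes)
    for u, v in edges:
        if u in nodes or v in nodes:
            ring.update((u, v))
    return ring

def growing_rings(edges, local_nodes, max_steps=3):
    """Kendall tau between local and global PageRank after each expansion."""
    all_nodes = {n for edge in edges for n in edge}
    global_rank = pagerank(edges, all_nodes)
    nodes, taus = set(local_nodes), []
    for _ in range(max_steps + 1):
        local_rank = pagerank(edges, nodes)
        order = sorted(nodes)
        taus.append(kendall_tau([local_rank[n] for n in order],
                                [global_rank[n] for n in order]))
        if nodes == all_nodes:
            break
        nodes = grow_ring(edges, nodes)
    return taus

# Toy weighted global graph: {(source, target): transition count}.
edges = {("a", "b"): 3, ("b", "c"): 1, ("c", "a"): 2,
         ("c", "d"): 1, ("d", "a"): 1, ("e", "c"): 1}
taus = growing_rings(edges, {"a", "b"})
```

On this toy graph the local node set reaches the full graph after two expansions, at which point the local and global rankings coincide.
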
Three sets of local graphs:
- Referrer-based (RB): the 7 ReferrerGraphs (Facebook, Twitter, Reddit, Homepage, Yahoo, Google, Bing)
- Same-size referrer-based (SRB): to measure the impact of the graph size
- Random (R): 7 random graphs reflecting the sizes of the original RB graphs

[Figure: Growing Rings Experiment results; panels: ReferrerGraphs, same-size RGs, Random]

Growing Rings Experiment with Selection of Nodes
Hypothesis 1: adding all the nodes means adding more information, so it should lead to faster convergence (Boldi et al. [6] in the paper)
Hypothesis 2: the most representative nodes bring less noise and therefore quicker convergence (Cho et al. [13] in the paper)
How does the expansion influence convergence if only a few of the most representative nodes are selected?

[Figure: convergence curves for selection sizes 5, 10, 30, 50, and 100]
Fewer, more representative nodes lead to a better estimate of the PageRank values in the first iteration
In the long run, expansions with the highest number of nodes show the best convergence

Growing Rings Expansion with Selected Nodes
~1 or 2 steps can be enough to estimate the PageRank scores of the global graph

Predicting Kendall-tau Distance
Can we estimate the “distance” between the local and global PageRank considering only information available in the local graph?

Hypothesis: some structural properties of the graph could be good proxies for the Kendall-tau difference between local and global ranks.

Training Set Construction
[Diagram: ReferrerGraph (homepage), Jackknife resampling (1%, 5%, 10%, 20%), reduced subgraphs]
Kendall-tau distance between the ReferrerGraph and the reduced subgraphs

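A minimal sketch of the resampling step, assuming the jackknife drops a random fraction of nodes (the slides do not say whether nodes or edges are removed). The function name `jackknife_subsets`, the `repeats` and `seed` parameters, and the sample node names are hypothetical. Each reduced node set would then be ranked and compared against the full ReferrerGraph with Kendall tau to produce one training target.

```python
import random

def jackknife_subsets(nodes, fractions=(0.01, 0.05, 0.10, 0.20), repeats=3, seed=0):
    """Yield (fraction, kept_nodes) after dropping a random fraction of nodes."""
    rng = random.Random(seed)  # fixed seed so the sampling is reproducible
    nodes = sorted(nodes)
    for fraction in fractions:
        n_drop = max(1, round(fraction * len(nodes)))
        for _ in range(repeats):
            dropped = set(rng.sample(nodes, n_drop))
            yield fraction, [n for n in nodes if n not in dropped]

# Hypothetical ReferrerGraph with 200 nodes, one reduced graph per fraction.
nodes = [f"page{i}" for i in range(200)]
sizes = [(fraction, len(kept))
         for fraction, kept in jackknife_subsets(nodes, repeats=1)]
```
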
Extract Structural Properties of each Graph
We compute 62 structural graph metrics for each training instance:
- Size and Connectivity (S): basic statistics
- Assortativity (A): tendency of nodes with a certain degree to be linked with nodes of similar degree
- Degree (D): statistics on the degree distribution
- Weighted degree (W): same as degree, but considering the weight on edges (transitions)
- Local PageRank (P): statistics on the PageRank values
- Closeness centralization (C): statistics on the distance (number of hops)
References:
- A. Barrat et al. in Cambridge Univ. Press 2008, “Dynamical Processes on Complex Networks”
- S. Wasserman and K. Faust in Cambridge Univ. Press 1994, “Social Network Analysis: Methods and Applications”

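To make the degree-based feature groups concrete, here is a small sketch that computes a handful of size, degree, and weighted-degree statistics from a weighted edge map. It covers only a few illustrative features, not the 62 metrics mentioned above; the feature names and the choice of mean, maximum, and standard deviation are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev

def degree_features(edges):
    """Summary statistics of (weighted) in- and out-degree for {(u, v): weight}."""
    in_deg, out_deg = defaultdict(int), defaultdict(int)
    in_w, out_w = defaultdict(float), defaultdict(float)
    nodes = set()
    for (u, v), w in edges.items():
        nodes.update((u, v))
        out_deg[u] += 1
        in_deg[v] += 1
        out_w[u] += w
        in_w[v] += w
    features = {"n_nodes": len(nodes), "n_edges": len(edges)}
    tables = {"in_degree": in_deg, "out_degree": out_deg,
              "weighted_in_degree": in_w, "weighted_out_degree": out_w}
    for name, table in tables.items():
        values = [table.get(n, 0) for n in nodes]  # nodes with no edges count as 0
        features[name + "_mean"] = mean(values)
        features[name + "_max"] = max(values)
        features[name + "_std"] = pstdev(values)
    return features

# Toy local graph with transition counts as edge weights.
edges = {("a", "b"): 3, ("b", "c"): 1, ("c", "a"): 2, ("a", "c"): 4}
features = degree_features(edges)
```

All of these values depend only on the local graph, so they can be computed without access to the global BrowseGraph.
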
Regression analysis (RF) in a five-fold CV over 10 iterations
Weighted degree: the most predictive features, better than using all the features
Assortativity: less predictive power (too many features and too little training data?)

Most important features in the weighted degree group: features based on the distribution of in- and out-degree
- very straightforward to compute
- information always available in the local graph

Can we estimate the distance between the local and global PageRank considering only information available in the local graph?
Yes, with just a few structural features of the local graph.

Summary
- How the LRP behaves on the BrowseGraph when expanding the local graph with whole neighborhoods (“Growing Rings” experiment) or with the most representative nodes (“Growing Rings with Selection of Nodes”)
- It is possible to estimate the “distance” between the local and global PageRank by exploiting the structural properties of the local graph

Thanks.

Presentation @SIGIR2015
