The “Local Ranking Problem” (LRP) concerns the computation of a centrality-like rank on a local graph, where node scores can differ significantly from those computed on the global graph. Previous work has studied the LRP on the hyperlink graph but never on the BrowseGraph, i.e., a graph whose nodes are webpages and whose edges are browsing transitions. This graph has recently received increasing attention in tasks such as ranking, prediction, and recommendation. However, a webserver observes only the browsing traffic on its own pages (the local BrowseGraph); as a consequence, local computation can lead to estimation errors, which hinders the growing number of applications proposed in the state of the art. Moreover, although the divergence between local and global ranks has been measured, the possibility of estimating this divergence using only local knowledge has been largely overlooked. These aspects are of great interest to online service providers who want to: (i) gauge their ability to correctly assess the importance of their resources based only on local knowledge, and (ii) take into account real user browsing fluxes, which capture actual user interest better than the static hyperlink network. We study the LRP on a BrowseGraph from a large news provider, considering as subgraphs the aggregations of browsing traces of users coming from different domains. We show that the distance between rankings can be accurately predicted based only on structural information of the local graph, achieving an average rank correlation as high as 0.8.
2. “When the centrality-like rank computed on a local graph differs from the one computed on the global graph”
Local Ranking Problem
- Bressan et al. in WWW 2013, “The Power of Local Information in PageRank”
- Bar-Yossef and Mashiach in CIKM 2008, “Local Approximation of PageRank and Reverse PageRank”
- Chen et al. in CIKM 2004, “Local Methods for Estimating PageRank Values”
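A toy example may help make the LRP concrete. The sketch below computes weighted PageRank (power iteration, damping 0.85) on a small global graph and on the local subgraph induced by the pages one webserver owns. The graph, node names, and helpers (`pagerank`, `induced_subgraph`) are hypothetical and chosen only for illustration; weights are assumed to be positive transition counts.

```python
"""Toy illustration of the Local Ranking Problem (hypothetical graph)."""


def pagerank(graph, damping=0.85, tol=1e-12, max_iter=500):
    """Weighted PageRank by power iteration on {src: {dst: weight}}.

    Assumes all edge weights are positive.
    """
    nodes = set(graph)
    for nbrs in graph.values():
        nodes.update(nbrs)
    n = len(nodes)
    out_w = {v: sum(graph.get(v, {}).values()) for v in nodes}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Dangling nodes (no outgoing transitions) spread their mass uniformly.
        dangling = sum(rank[v] for v in nodes if out_w[v] == 0)
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u, nbrs in graph.items():
            for v, w in nbrs.items():
                new[v] += damping * rank[u] * w / out_w[u]
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def induced_subgraph(graph, keep):
    """Keep only the edges whose endpoints are both in `keep`."""
    return {u: {v: w for v, w in nbrs.items() if v in keep}
            for u, nbrs in graph.items() if u in keep}


# Hypothetical global BrowseGraph; x1..x3 are pages the local server never sees.
GLOBAL = {
    "a": {"b": 1}, "c": {"b": 1}, "b": {"a": 1, "c": 1},
    "x1": {"c": 1}, "x2": {"c": 1}, "x3": {"c": 1},
}
LOCAL = induced_subgraph(GLOBAL, {"a", "b", "c"})

global_pr = pagerank(GLOBAL)
local_pr = pagerank(LOCAL)
for node in sorted(LOCAL):
    print(node, round(local_pr[node], 3), round(global_pr[node], 3))
```

Pages a and c are indistinguishable in the local graph, yet c ranks above a globally because of the traffic from x1, x2, and x3 that the local server cannot observe.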
4. Centrality Metrics Applied to the BrowseGraph
Increasing popularity in recent years
- Chiarandini et al. in ICWSM 2013, “Leveraging browsing patterns for topic discovery and photostream recommendation”
- Trevisiol et al. in SIGIR 2012, “Image ranking based on user browsing behavior”
- Liu et al. in CIKM 2011, “User browsing behavior-driven web crawling”
Provide higher-quality rankings than standard hyperlink graphs
- Y. Liu et al. in SIGIR 2008, “BrowseRank: letting web users vote for page importance”
6. Local Ranking Problem
on the BrowseGraph
WHY?
Image Ranking in Flickr in SIGIR 2012
We compared different ranking approaches on the BrowseGraph
(PageRank and BrowseRank among others)
How much could our ranking vary if we had more information (i.e., more nodes)?
7. BrowseGraph and ReferrerGraphs
ReferrerGraphs: domain-dependent BrowseGraphs
Construct different BrowseGraphs based on the referrer domain
Recommend news articles by following the ReferrerGraphs
[Figure: the BrowseGraph alongside the Twitter ReferrerGraph and the Facebook ReferrerGraph]
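A minimal sketch of how ReferrerGraphs could be derived from browsing sessions, assuming each session is a (referrer domain, ordered list of pageviews) pair and that every transition in a session is attributed to the session's referrer. The input format, the function name `build_graphs`, and the toy sessions are hypothetical; the slides do not describe the actual pipeline.

```python
from collections import defaultdict


def build_graphs(sessions):
    """Return (browse_graph, referrer_graphs) as {src: {dst: count}} dicts.

    `sessions` is an iterable of (referrer_domain, [page, page, ...]);
    each transition between consecutive pages is counted once in the global
    BrowseGraph and once in the ReferrerGraph of the session's referrer.
    """
    browse = defaultdict(lambda: defaultdict(int))
    by_ref = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for referrer, pages in sessions:
        for src, dst in zip(pages, pages[1:]):
            browse[src][dst] += 1
            by_ref[referrer][src][dst] += 1
    # Convert nested defaultdicts to plain dicts.
    browse = {u: dict(nbrs) for u, nbrs in browse.items()}
    by_ref = {r: {u: dict(nbrs) for u, nbrs in g.items()}
              for r, g in by_ref.items()}
    return browse, by_ref


SESSIONS = [
    ("twitter.com", ["/a", "/b", "/c"]),
    ("facebook.com", ["/a", "/c"]),
    ("twitter.com", ["/b", "/c"]),
]
browse, referrer_graphs = build_graphs(SESSIONS)
print(browse)
print(referrer_graphs["twitter.com"])
```

Every ReferrerGraph is, by construction, a subgraph of the BrowseGraph, which is what makes the local-versus-global comparison meaningful.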
Can we rely on centrality-based algorithms to infer news importance?
8. Local Ranking Problem on the BrowseGraph
Study of the LRP on the BrowseGraph by incrementally expanding the local graph (“Growing Rings” experiment)
How to estimate the “distance” between the local and global PageRank by exploiting the structural properties of the local graph
Discover the referrer domain when it is not available
(not discussed in the presentation; please see the paper)
9. [Figure: Yahoo News BrowseGraph (~500M pageviews), with traffic sources: Social Networks, Search Engines, News, Homepage]
Local Ranking Problem on the BrowseGraph
1. Construct the BrowseGraph (our “global graph”)
2. Construct the ReferrerGraphs (our “local graphs”)
11. Cross-distance Kendall-tau on common nodes (minimum overlap: 1k nodes)
In general, the similarities are very low (<0.3)
~ different content or different user interests
Search engines are the most similar (>0.5)
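A sketch of this pairwise comparison, assuming each ReferrerGraph's PageRank scores are available as a dict and that Kendall tau-b is computed only on the nodes two graphs share; pairs with fewer than `min_overlap` common nodes are skipped, mirroring the 1k threshold. The names `kendall_tau_b`, `cross_tau`, and the toy scores are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall tau-b between two equal-length score lists (O(n^2))."""
    conc = disc = untied_x = untied_y = 0
    for i, j in combinations(range(len(x)), 2):
        sx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the difference in x
        sy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the difference in y
        untied_x += sx != 0
        untied_y += sy != 0
        conc += sx * sy > 0
        disc += sx * sy < 0
    denom = sqrt(untied_x * untied_y)
    return (conc - disc) / denom if denom else float("nan")


def cross_tau(scores, min_overlap=1000):
    """Pairwise tau between ranking dicts, restricted to their common nodes."""
    result = {}
    for g1, g2 in combinations(sorted(scores), 2):
        common = sorted(scores[g1].keys() & scores[g2].keys())
        if len(common) < min_overlap:
            continue  # too few shared nodes for a meaningful comparison
        result[(g1, g2)] = kendall_tau_b([scores[g1][v] for v in common],
                                         [scores[g2][v] for v in common])
    return result


# Hypothetical PageRank scores per ReferrerGraph.
SCORES = {
    "google": {"p1": 0.50, "p2": 0.30, "p3": 0.20},
    "bing": {"p1": 0.40, "p2": 0.35, "p3": 0.25, "p4": 0.10},
    "yahoo": {"p1": 0.20, "p2": 0.50, "p3": 0.30},
    "reddit": {"p1": 0.10, "p9": 0.90},
}
print(cross_tau(SCORES, min_overlap=3))
```

The quadratic loop is only for clarity; with thousands of common nodes, `scipy.stats.kendalltau` (which computes tau-b by default) is a faster alternative.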
Subgraph Comparison
12.
1. For each ReferrerGraph:
2. Compare its PageRank values with the global ones (Kendall-tau)
3. Expand it with the next neighborhood of nodes
4. Iterate until the Kendall-tau converges toward 1
Growing Rings Experiment
Study of the LRP on the BrowseGraph by incrementally expanding the local graph
K(local+0, global) ~0.307
K(local+1, global) ~0.524
K(local+2, global) ~0.740
K(local+3, global) ~0.912
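The Growing Rings loop can be sketched as follows. This is an illustrative reconstruction under our own assumptions, not the authors' code: ring 0 is the ReferrerGraph as observed, each later ring is the global graph induced on the current nodes plus their one-hop neighbors (following edges in either direction), and Kendall tau-b is measured on the original local nodes only. The helper names, the `stop_at` threshold, and the toy graphs are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pagerank(graph, damping=0.85, tol=1e-12, max_iter=500):
    """Weighted PageRank by power iteration on {src: {dst: weight}}."""
    nodes = set(graph)
    for nbrs in graph.values():
        nodes.update(nbrs)
    n = len(nodes)
    out_w = {v: sum(graph.get(v, {}).values()) for v in nodes}
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if out_w[v] == 0)
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u, nbrs in graph.items():
            for v, w in nbrs.items():
                new[v] += damping * rank[u] * w / out_w[u]
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def kendall_tau_b(x, y):
    """Kendall tau-b between two equal-length score lists (O(n^2))."""
    conc = disc = untied_x = untied_y = 0
    for i, j in combinations(range(len(x)), 2):
        sx = (x[i] > x[j]) - (x[i] < x[j])
        sy = (y[i] > y[j]) - (y[i] < y[j])
        untied_x += sx != 0
        untied_y += sy != 0
        conc += sx * sy > 0
        disc += sx * sy < 0
    denom = sqrt(untied_x * untied_y)
    return (conc - disc) / denom if denom else float("nan")


def induced(graph, keep):
    """Subgraph of `graph` on the node set `keep`."""
    return {u: {v: w for v, w in nbrs.items() if v in keep}
            for u, nbrs in graph.items() if u in keep}


def neighbours(graph, nodes):
    """Nodes one hop away from `nodes`, following edges in either direction."""
    found = set()
    for u, nbrs in graph.items():
        if u in nodes:
            found.update(nbrs)
        elif not nodes.isdisjoint(nbrs):
            found.add(u)
    return found - nodes


def growing_rings(global_graph, local_graph, max_rings=5, stop_at=0.99):
    """Return [tau(local+0, global), tau(local+1, global), ...]."""
    global_pr = pagerank(global_graph)
    local_nodes = set(local_graph)
    for nbrs in local_graph.values():
        local_nodes.update(nbrs)
    ranked = sorted(local_nodes)  # tau is measured on the original local nodes
    current, graph, taus = set(local_nodes), local_graph, []
    for _ in range(max_rings + 1):
        pr = pagerank(graph)
        taus.append(kendall_tau_b([pr[v] for v in ranked],
                                  [global_pr[v] for v in ranked]))
        if taus[-1] >= stop_at:
            break
        frontier = neighbours(global_graph, current)
        if not frontier:
            break  # nothing left to add: the component is fully absorbed
        current |= frontier
        graph = induced(global_graph, current)
    return taus


GLOBAL = {"a": {"b": 1}, "b": {"c": 1}, "c": {"a": 1, "d": 1},
          "d": {"c": 1}, "e": {"a": 1}}
LOCAL = {"a": {"b": 1}, "b": {"c": 1}}  # transitions seen by one referrer
taus = growing_rings(GLOBAL, LOCAL)
print(taus)
```

On this toy input the local ranking is only partly right (tau = 1/3), and a single expansion, which here absorbs the whole graph, recovers the global order.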
13. Referrer-based (RB): the 7 ReferrerGraphs
(Facebook, Twitter, Reddit, Homepage, Yahoo, Google, Bing)
Growing Rings Experiment
Same-size referrer-based (SRB): to measure the impact of the graph size
Random (R): 7 random graphs matching the sizes of the original RB graphs
16. Hypothesis 1: adding all the nodes means adding more information, so it should lead to faster convergence (Boldi et al. [6] in the paper)
Hypothesis 2: the most representative nodes bring less noise and therefore quicker convergence (Cho et al. [13] in the paper)
How does the expansion influence convergence if only a few of the most representative nodes are selected?
Growing Rings Experiment with Selection of Nodes
17. Growing Rings Experiment with Selection of Nodes
[Figure: convergence of expansions with 5, 10, 30, 50, and 100 selected nodes]
fewer, more representative nodes lead to a better estimate of PageRank values in the first iteration
in the long run, expansions with the highest number of nodes show the best convergence
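One way to restrict an expansion to a few representative nodes is sketched below. The slides do not define “representative”; this sketch assumes it means the frontier nodes that exchange the most transitions with the current graph. The function name, the scoring rule, and the tie-breaking are hypothetical.

```python
import heapq


def select_frontier(global_graph, current, k):
    """Pick the k frontier nodes with the largest transition weight to/from `current`.

    Scoring representativeness by connection weight is an assumption.
    """
    weight = {}
    for u, nbrs in global_graph.items():
        for v, w in nbrs.items():
            if u in current and v not in current:
                weight[v] = weight.get(v, 0) + w  # transition leaving the graph
            elif v in current and u not in current:
                weight[u] = weight.get(u, 0) + w  # transition entering the graph
    # Ties on weight are broken by node id so the choice is deterministic.
    return set(heapq.nlargest(k, weight, key=lambda node: (weight[node], node)))


GLOBAL = {"a": {"x": 5, "y": 1}, "z": {"a": 3}, "b": {"a": 1}, "w": {"q": 9}}
print(select_frontier(GLOBAL, {"a", "b"}, k=2))
```

Using this selection in place of the full neighborhood at each expansion step gives the 5/10/30/50/100-node variants; edges between two nodes already in the graph (b to a) and edges unrelated to it (w to q) do not contribute to any score.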
18. Growing Rings Expansion with Selected Nodes
one or two expansion steps can be enough to estimate the PageRank scores of the global graph
Predicting Kendall-tau Distance
Can we estimate the “distance”
between the local and global PageRank
only considering information available
in the local graph ?
19. Hypothesis: some structural properties of the graph could be good proxies for the Kendall-tau distance between local and global ranks.
Predicting Kendall-tau Distance
Can we estimate the distance
between the local and global PageRank
only considering information available
in the local graph ?
20. Training Set Construction
Predicting Kendall-tau Distance
ReferrerGraph (e.g., homepage) → jackknife resampling (1%, 5%, 10%, 20%) → reduced subgraphs
Kendall-tau distance between the ReferrerGraph and its reduced subgraphs
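A sketch of the jackknife step, assuming the percentages are the share of nodes removed from the ReferrerGraph and that several random replicates are drawn per percentage. The function name, `reps`, and `seed` are hypothetical. Each reduced subgraph would then become one training instance, labeled with the Kendall-tau distance between its PageRank and that of the full ReferrerGraph.

```python
import random


def jackknife_subgraphs(graph, fractions=(0.01, 0.05, 0.10, 0.20), reps=5, seed=0):
    """Yield (fraction, subgraph) pairs with a random share of nodes removed."""
    rng = random.Random(seed)
    nodes = sorted(set(graph) | {v for nbrs in graph.values() for v in nbrs})
    for frac in fractions:
        n_drop = max(1, round(frac * len(nodes)))  # drop at least one node
        for _ in range(reps):
            dropped = set(rng.sample(nodes, n_drop))
            # Remove the dropped nodes together with all their edges.
            yield frac, {u: {v: w for v, w in nbrs.items() if v not in dropped}
                         for u, nbrs in graph.items() if u not in dropped}


CHAIN = {i: {i + 1: 1} for i in range(99)}  # toy 100-node ReferrerGraph
for frac, sub in jackknife_subgraphs(CHAIN, reps=1):
    print(frac, len(sub))
```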
21. Size and Connectivity (S): basic statistics
Assortativity (A): tendency of nodes with a certain degree to link to nodes with a similar degree
Degree (D): statistics on the degree distribution
Weighted degree (W): same as degree, but considering the edge weights (transitions)
Local PageRank (P): statistics on the PageRank values
Closeness centralization (C): statistics on distances (number of hops)
• A. Barrat et al. in Cambridge Univ. Press 2008, “Dynamical Processes on Complex Networks”
• S. Wasserman and K. Faust in Cambridge Univ. Press 1994, “Social Network Analysis: Methods and Applications”
Predicting Kendall-tau Distance
We compute 62 structural graph metrics for each training instance
Extract Structural Properties of each Graph
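The sketch below computes a few of the feature families listed above (size, degree, weighted degree, and a directed degree assortativity) for a single graph. It covers only a handful of the 62 metrics; the feature names, the out-degree/in-degree variant of assortativity, and returning 0.0 when a correlation is undefined are our assumptions.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either side is constant (our convention)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def structural_features(graph):
    """A few S/D/W/A features for a non-empty weighted graph {src: {dst: weight}}."""
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    edges = [(u, v, w) for u, nbrs in graph.items() for v, w in nbrs.items()]
    indeg, outdeg = dict.fromkeys(nodes, 0), dict.fromkeys(nodes, 0)
    inw, outw = dict.fromkeys(nodes, 0), dict.fromkeys(nodes, 0)
    for u, v, w in edges:
        outdeg[u] += 1
        indeg[v] += 1
        outw[u] += w
        inw[v] += w
    n = len(nodes)
    feats = {
        "S_nodes": n,
        "S_edges": len(edges),
        "S_density": len(edges) / (n * (n - 1)) if n > 1 else 0.0,
    }
    for name, values in (("D_in", indeg), ("D_out", outdeg),
                         ("W_in", inw), ("W_out", outw)):
        vals = list(values.values())
        feats[name + "_mean"] = mean(vals)
        feats[name + "_std"] = pstdev(vals)
        feats[name + "_max"] = max(vals)
    # Directed degree assortativity: source out-degree vs. target in-degree.
    feats["A_out_in"] = pearson([outdeg[u] for u, _, _ in edges],
                                [indeg[v] for _, v, _ in edges]) if edges else 0.0
    return feats


print(structural_features({"a": {"b": 2, "c": 1}, "b": {"c": 3}}))
```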
22. Regression analysis with Random Forests (RF), five-fold CV repeated over 10 iterations
weighted degree: the most predictive features
~ better than using all the features
assortativity: less predictive power
~ too many features and too little training data?
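The evaluation protocol (five-fold cross-validation repeated 10 times) can be sketched without external dependencies. In this sketch the regressor is a plug-in function, a trivial mean predictor stands in for the Random Forest, and mean absolute error is an assumed metric; none of these names come from the paper.

```python
import random
from statistics import mean


def repeated_kfold(n, k=5, repeats=10, seed=0):
    """Yield (train_idx, test_idx) for `repeats` independently shuffled k-fold splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for f in range(k):
            test = idx[f::k]  # every k-th shuffled index forms one fold
            test_set = set(test)
            train = [i for i in idx if i not in test_set]
            yield train, test


def cross_validate(X, y, fit_predict, k=5, repeats=10):
    """Average mean absolute error of `fit_predict` over repeated k-fold CV."""
    errors = []
    for train, test in repeated_kfold(len(y), k, repeats):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        errors.append(mean(abs(p - y[i]) for p, i in zip(preds, test)))
    return mean(errors)


def mean_baseline(X_train, y_train, X_test):
    """Stand-in regressor: always predicts the training mean."""
    m = mean(y_train)
    return [m] * len(X_test)


X = [[i] for i in range(20)]           # hypothetical feature vectors
y = [0.3 + 0.01 * i for i in range(20)]  # hypothetical Kendall-tau targets
print(cross_validate(X, y, mean_baseline))
```

With scikit-learn installed, `RepeatedKFold(n_splits=5, n_repeats=10)` together with `RandomForestRegressor` and `cross_val_score` would play the same roles.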
Predicting Kendall-tau Distance
23. Predicting Kendall-tau Distance
Most important features in the weighted degree group:
features based on the distribution of in- and out-degree:
very straightforward to compute
information always available in the local graph
24. YES.
With just a few structural features of the local graph.
Predicting Kendall-tau Distance
Can we estimate the distance
between the local and global PageRank
only considering information available
in the local graph ?
25. Summary
How the LRP behaves on the BrowseGraph when expanding the local graph with whole neighborhoods (“Growing Rings” experiment) or with the most representative nodes (“Growing Rings with Selection of Nodes”)
It is possible to estimate the “distance” between the local and global PageRank by exploiting the structural properties of the local graph