A review of how to quickly compute the distance between any two points on a graph. Applications span many fields, such as telecom, internet routing, and social network analysis.
Distance oracle - Fast distance queries between any two points on a graph
1. A Geometric Distance Oracle for Large Real-World Graphs
Hong, Ong Xuan
Data Science School
November 16, 2017
Hong, Ong Xuan (Data Science School) A Geometric Distance Oracle for Large Real-World Graphs November 16, 2017 1 / 30
2. Contents
1 Introduction
2 Background
3 Related works
4 Proposed method
5 Evaluation
6 Results
7 Discussion
3. Introduction
Explosion of available information → mining information about interactions between subscribers, groups, people, objects, etc.
A fundamental graph computation is finding the shortest-path distance between arbitrary nodes, but:
Computing and querying distances is slow.
Memory for storing the graph is limited.
How can this analysis be done effectively?
5. Background
Graph theory.
Distance oracle.
Approximate distance.
Metric space: Euclidean, Hyperbolic.
δ-hyperbolic metric space.
6. Graph theory
Let $G(V, E)$ be an undirected, weighted graph with $n = |V|$ nodes and $m = |E|$ edges. What is the distance between nodes $s$ and $t$?
Dijkstra's algorithm: $O(m + n \log n)$ per query with a Fibonacci heap; requires no precomputed storage.
Distance matrix (precomputed lookup table): $O(1)$ query time; requires $O(n^2)$ extra space.
Floyd-Warshall algorithm: returns all-pairs shortest paths; initialization takes $O(n^3)$ time.
How can we use less than $O(n^2)$ space and answer queries in less than $O(m + n \log n)$ time?
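To make the baseline concrete, here is a minimal sketch of a single-pair Dijkstra query. It uses Python's binary heap (`heapq`) rather than a Fibonacci heap, so its running time is $O((m + n) \log n)$, not the bound quoted above. The function name `dijkstra_distance` and the adjacency-list layout are assumptions made for illustration.

```python
import heapq


def dijkstra_distance(adj, s, t):
    """Shortest-path distance from s to t.

    adj maps each node to a list of (neighbor, weight) pairs with
    non-negative weights (an assumed input layout).
    """
    dist = {s: 0.0}
    heap = [(0.0, s)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == t:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry, a shorter path was already found
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")  # t is unreachable from s


# Toy undirected graph: every edge is listed in both directions.
toy_graph = {
    "a": [("b", 1.0), ("c", 4.0)],
    "b": [("a", 1.0), ("c", 2.0), ("d", 5.0)],
    "c": [("a", 4.0), ("b", 2.0), ("d", 1.0)],
    "d": [("b", 5.0), ("c", 1.0)],
}
print(dijkstra_distance(toy_graph, "a", "d"))
```

Every query repeats the search from scratch, which is the cost a distance oracle tries to avoid.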
7. Distance oracle
A distance oracle is a data structure that is cheap to compute and fast to query (ideally in constant time). It should satisfy four properties:
Preprocessing time: $O(n)$ or $O(n \log n)$.
Storage: less than $O(n^2)$.
Query time: less than $O(m + n \log n)$.
Fidelity: approximate distances as close as possible to the actual distances.
8. Approximate distance oracles
Using spanning trees and distance labeling to approximate distances (Thorup and Zwick):
Preprocessing time: $O(k m n^{1/k})$.
Storage: $O(k n^{1+1/k})$.
Query time: $O(k)$.
Fidelity: the ratio of estimated distance to actual distance lies in $[1, 2k - 1]$.
Note: $k = 1, 2, \ldots, \log n$; higher values of $k$ do not improve the space or preprocessing time.
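As a worked instance of these bounds (a sketch that takes logarithms base 2 and ignores constant factors), plugging in two values of $k$ shows the trade-off between space and accuracy:

```latex
\begin{align*}
k = 2:&\quad \text{preprocessing } O(m\,n^{1/2}),\quad
        \text{storage } O(n^{3/2}),\quad
        \text{stretch} \le 2\cdot 2 - 1 = 3,\\
k = \log_2 n:&\quad n^{1/k} = 2^{\log_2 n / \log_2 n} = 2
        \;\Rightarrow\;
        \text{preprocessing } O(m \log n),\quad
        \text{storage } O(n \log n),\quad
        \text{stretch} \le 2\log_2 n - 1.
\end{align*}
```

For $k > \log_2 n$ the factor $n^{1/k}$ is already below 2, so the leading factor $k$ only makes space and preprocessing grow while the stretch bound gets worse.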
9. Metric space
Ordered pair (M, d) where M is a set and d is a metric
d : M × M → R
∀x, y, z ∈ M, the following holds:
d(x, y) ≥ 0
d(x, y) = 0 ⇐⇒ x = y
d(x, y) = d(y, x)
d(x, z) ≤ d(x, y) + d(y, z)
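The four axioms above can be checked by brute force on a finite point set. The sketch below does so. The helper name `is_metric` and the tolerance parameter `tol` are assumptions made for illustration.

```python
import math
from itertools import product


def is_metric(points, d, tol=1e-12):
    """Brute-force check of the four metric axioms on a finite point set."""
    for x, y in product(points, repeat=2):
        if d(x, y) < -tol:                       # non-negativity
            return False
        if (abs(d(x, y)) <= tol) != (x == y):    # d(x, y) = 0 iff x = y
            return False
        if abs(d(x, y) - d(y, x)) > tol:         # symmetry
            return False
    for x, y, z in product(points, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:    # triangle inequality
            return False
    return True


plane_points = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
print(is_metric(plane_points, math.dist))                   # Euclidean distance
print(is_metric([0, 1, 2], lambda a, b: (a - b) ** 2))      # squared difference
```

The squared difference fails the triangle inequality on the points 0, 1, 2 (4 > 1 + 1), so it is not a metric even though it satisfies the other three axioms.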
10. Euclidean distance
$$d(p, q) = d(q, p) = \sqrt{(q_1 - p_1)^2 + (q_2 - p_2)^2 + \cdots + (q_n - p_n)^2} = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$$
11. Hyperbolic distance
$$d((x_1, y_1), (x_2, y_2)) = \operatorname{arcosh}\left(\cosh y_1 \cosh(x_2 - x_1) \cosh y_2 - \sinh y_1 \sinh y_2\right)$$
Where:
$\sinh x = \frac{e^{x} - e^{-x}}{2}$ (hyperbolic sine).
$\cosh x = \frac{e^{x} + e^{-x}}{2}$ (hyperbolic cosine).
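The formula above translates directly into code. The following is a minimal sketch. The function name `hyperbolic_distance` is an assumption, and the clamp to 1.0 only guards against floating-point rounding pushing the argument of `acosh` slightly below its domain.

```python
import math


def hyperbolic_distance(p, q):
    """Distance between p = (x1, y1) and q = (x2, y2) using the formula above."""
    x1, y1 = p
    x2, y2 = q
    arg = (math.cosh(y1) * math.cosh(x2 - x1) * math.cosh(y2)
           - math.sinh(y1) * math.sinh(y2))
    return math.acosh(max(arg, 1.0))  # clamp: acosh is defined only for arg >= 1


# Two sanity checks that follow from the identity
# cosh(a)cosh(b) - sinh(a)sinh(b) = cosh(a - b):
print(hyperbolic_distance((0.0, 0.5), (0.0, 2.0)))  # same x: equals |y1 - y2|
print(hyperbolic_distance((0.0, 0.0), (1.2, 0.0)))  # y = 0: equals |x2 - x1|
```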
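12. δ-hyperbolic metric space
A metric space $(V, d)$ embeds into a tree metric iff the four-point condition holds.
For all $w, x, y, z \in V$, relabeled so that $S \le M \le L$:
$S := S(w, x, y, z) = d(w, x) + d(y, z)$
$M := M(w, x, y, z) = d(x, y) + d(w, z)$
$L := L(w, x, y, z) = d(x, z) + d(w, y)$
Four-point condition (tree metric): $L = M$ for every quadruple.
δ-hyperbolic relaxation: for a fixed $\delta \ge 0$, $(L - M)/2 \le \delta$ for every quadruple.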
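For a small unweighted graph, the smallest δ satisfying the condition can be found by brute force over all quadruples. The sketch below assumes a connected graph given as a dict of neighbor lists. The names `bfs_distances` and `four_point_delta` are illustrative, and the $O(n^4)$ loop is only practical for tiny inputs.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def four_point_delta(adj):
    """Smallest delta with (L - M) / 2 <= delta over all quadruples."""
    d = {u: bfs_distances(adj, u) for u in adj}
    delta = 0.0
    for w, x, y, z in combinations(adj, 4):
        # The three pairwise sums, sorted so that S <= M <= L.
        s, m, l = sorted([d[w][x] + d[y][z],
                          d[x][y] + d[w][z],
                          d[x][z] + d[w][y]])
        delta = max(delta, (l - m) / 2)
    return delta


path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cycle_graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
print(four_point_delta(path_graph))   # a tree satisfies L = M exactly
print(four_point_delta(cycle_graph))  # a 4-cycle does not
```

On the path (a tree) the two largest sums coincide, giving δ = 0; on the 4-cycle the sums are 2, 2, 4, giving δ = 1.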
14. Related works
Theoretical results provide guaranteed approximation bounds for specific graph classes:
Distance labeling in hyperbolic graphs
A Note on Distance Approximating Trees in Graphs
Additive spanners and distance and routing labeling schemes for hyperbolic graphs
A compact routing scheme and approximate distance oracle for power-law graphs
Reconstructing approximate tree metrics
Essays in Group Theory
Diameters, centers, and approximating trees of δ-hyperbolic geodesic spaces and graphs
However, these results have not been empirically evaluated on real-world graphs.
15. Related works
Spanning trees
Fast queries: $O(n \log n)$.
Reduced storage space.
16. Related works
Approximate distance oracles developed on empirical graphs: small-world graphs, hypergrid graphs, Facebook, telecom, Google News graph, web graph, etc.
Efficient Shortest Paths on Massive Social Graphs
Fast fully dynamic landmark-based estimation of shortest path distances in very large graphs
Querying Shortest Path Distance with Bounded Errors in Large Graphs
Orion: shortest path estimation for large social graphs
Approximating Shortest Paths in Social Graphs
Fast exact shortest-path distance queries on large networks by pruned landmark labeling
Toward a distance oracle for billion-node graphs
However, these heuristics lack a theoretical foundation.
19. Proposed method
Hyperbolicity-based Breadth-First Search (HyperBFS). It uses the notion of graph hyperbolicity in real-world networks to build spanning trees:
Height: $O(\log n)$.
Distance queries: $O(\log n)$.
Storage: $O(n)$ words of space for an $n$-node graph.
20. Algorithm
Hyperbolicity-based tree oracle: constructing the geometric oracle.
Choose a highly central vertex (centrality measured via shortest paths) as the root. In practice, out-degree is used instead, because the two are correlated in power-law networks.
Build 1-10 BFS trees with distinct roots, chosen in order of decreasing degree, for the approximation → distance labeling can be computed in parallel.
The distance between x and y is the minimum of their distances across the constructed trees.
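The steps above can be sketched as follows. This is an illustrative reading of the slide, not the authors' implementation. It assumes a connected, unweighted, undirected graph stored as a dict of neighbor lists, and the names `bfs_tree`, `tree_distance`, `build_oracle`, and `query` are hypothetical.

```python
from collections import deque


def bfs_tree(adj, root):
    """BFS spanning tree stored as parent pointers plus depths."""
    parent, depth = {root: None}, {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                depth[v] = depth[u] + 1
                queue.append(v)
    return parent, depth


def tree_distance(tree, x, y):
    """Length of the tree path between x and y (walk up to their common ancestor)."""
    parent, depth = tree
    hops = 0
    while x != y:
        if depth[x] < depth[y]:
            x, y = y, x          # always lift the deeper endpoint
        x = parent[x]
        hops += 1
    return hops


def build_oracle(adj, num_trees=3):
    """One BFS tree per root; roots are the highest-degree vertices."""
    roots = sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:num_trees]
    return [bfs_tree(adj, r) for r in roots]


def query(oracle, x, y):
    """Estimated distance: minimum tree distance over all stored trees."""
    return min(tree_distance(tree, x, y) for tree in oracle)


five_cycle = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
print(query(build_oracle(five_cycle, num_trees=1), 2, 3))
print(query(build_oracle(five_cycle, num_trees=3), 2, 3))
```

On this 5-cycle a single tree rooted at vertex 0 routes the pair (2, 3) through the root and reports 4, while adding trees rooted elsewhere recovers the true distance 1. A tree path is also a graph path, so the estimate never undershoots the true distance. Each query walks at most the tree height per tree, which is where the $O(\log n)$ height matters.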
21. Algorithm
Set 1: Embedding the graph into a multi-dimensional geometric space.
Map the nodes of the graph to points in hyperbolic space.
The distance between two d-dimensional points $x = (x_1, x_2, \ldots, x_d)$ and $y = (y_1, y_2, \ldots, y_d)$ is defined as follows:
$$\operatorname{arcosh}\left(\sqrt{\left(1 + \sum_{i=1}^{d} x_i^2\right)\left(1 + \sum_{i=1}^{d} y_i^2\right)} - \sum_{i=1}^{d} x_i y_i\right) \cdot |c|$$
Note: there are no guarantees on the distance estimation error.
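Once the coordinates are known, a distance query is a single evaluation of this formula. The sketch below assumes the term under the arcosh is the square root of the product of the two sums, which is what keeps the argument at or above 1. The function name `embedding_distance` is illustrative, and `c` is treated simply as the scale factor appearing in the formula.

```python
import math


def embedding_distance(x, y, c=1.0):
    """Distance between two embedded points x and y (equal-length coordinate sequences)."""
    sx = 1.0 + sum(a * a for a in x)
    sy = 1.0 + sum(b * b for b in y)
    dot = sum(a * b for a, b in zip(x, y))
    arg = math.sqrt(sx * sy) - dot
    return math.acosh(max(arg, 1.0)) * abs(c)  # clamp guards against rounding below 1


# From the origin to (sinh r, 0) the argument reduces to cosh r, so the
# distance is r itself (with c = 1).
r = 1.0
print(embedding_distance((0.0, 0.0), (math.sinh(r), 0.0)))
```

The query cost is $O(d)$ per pair and does not depend on the size of the graph. The expensive part is computing the embedding, which this sketch does not cover.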
22. Algorithm
Set 2: Gromov-type tree contraction, which improves the accuracy of distance estimates.
Partition the tree into i-level connected components (coalescing multiple edges into a single edge).
The additive error is guaranteed not to exceed $2\delta \log n$, where $\delta$ is the hyperbolicity constant of the graph.
24. Evaluation
Four methods benchmarked:
Gromov-type contraction-based tree.
Steiner trees with proven multiplicative bound.
Rigel: landmark-based approach.
HyperBFS: centrality-based spanning tree oracle.
25. Setup
2.4 GHz Intel(R) Xeon(R) processor with 190GB of RAM.
Calculating distortion: let x, y be vertices of a graph G, let $d_G$ be the true graph distance between them, and let $d_A$ be the distance approximated by a distance oracle:
Additive distortion: $d_G - d_A$.
Absolute distortion: $|d_G - d_A|$.
Multiplicative distortion: $\frac{|d_G - d_A|}{d_G}$.
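These three measures are straightforward to compute per vertex pair and then average. The following is a minimal sketch. The function names and the `(d_true, d_approx)` pair layout are assumptions, and it requires $d_G > 0$, i.e. distinct vertices.

```python
def distortions(d_true, d_approx):
    """Return (additive, absolute, multiplicative) distortion for one vertex pair."""
    additive = d_true - d_approx
    absolute = abs(additive)
    multiplicative = absolute / d_true  # assumes d_true > 0
    return additive, absolute, multiplicative


def average_distortions(pairs):
    """Average each distortion over an iterable of (d_true, d_approx) pairs."""
    rows = [distortions(dt, da) for dt, da in pairs]
    return tuple(sum(col) / len(rows) for col in zip(*rows))


print(distortions(4, 5))
print(average_distortions([(4, 5), (2, 2)]))
```

With this sign convention, an oracle that overestimates has negative additive distortion, so averaging the additive values can hide errors that averaging the absolute values would reveal.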
Figure: Computational Time of Hyper BFS on Call Graph II.
27. Average absolute error
Figure: Average absolute error on various real-world graphs.
28. Average additive and multiplicative error
Figure: Average additive and multiplicative error on SantaBarbara Facebook graph.
30. Discussion
Exact and approximate algorithms for computing the hyperbolicity of large-scale graphs (N. Cohen, D. Coudert, A. Lancin)
Indexing and space: $O(nm)$ vs $O(n)$.
Query: $O(n)$ vs $O(\log n)$.
Exact distance vs error bound $2\delta \log n$.
Extending metrics:
Local clustering coefficient: $$C_i = \frac{2\,|\{e_{jk} : v_j, v_k \in N_i,\ e_{jk} \in E\}|}{k_i (k_i - 1)}$$
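As an illustration of this formula, the sketch below computes the coefficient for one vertex of an undirected graph stored as a dict of neighbor sets. The function name `local_clustering` and the convention of returning 0 for vertices with fewer than two neighbors are assumptions.

```python
from itertools import combinations


def local_clustering(adj, i):
    """Fraction of pairs of i's neighbors that are themselves connected."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # assumed convention: undefined cases map to 0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


# Triangle 0-1-2 with a pendant vertex 3 attached to 0.
graph = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(local_clustering(graph, 0))  # one linked pair out of three
print(local_clustering(graph, 1))  # both neighbors are linked
```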