

Two unrelated talks

Marco Bressan

January 30, 2012

Outline

1 Local computation of PageRank: the ranking side
     Introduction
     Motivations
     Local ranking in theory
     Local ranking in practice
     Conclusions

2 psort, yet another fast stable external sorting software
     Introduction
     Making sorting a complicated task
     Inside psort
     Conclusions

3 Conclusions

Local computation of PageRank: the ranking side

Ranking robustly

Rank a graph’s nodes

  1. the graph
  2. external factors
       • (varying) parameters
       • graph availability
       • ...

Is ranking robust?
How is ranking influenced by external factors?

PageRank

PageRank of node v:

  $P(v) = \alpha \sum_{u \to v} \frac{P(u)}{o(u)} + \frac{1-\alpha}{n}$

  n = |G|      α = damping factor

Applications
web search, web crawling, web spam detection, personalized web search, social network mining, ranking in databases, structural re-ranking, opinion mining, word sense disambiguation, credit and reputation systems, bibliometrics, gene ranking, ...

Among top data mining algorithms
Wu et al. Top 10 algorithms in data mining. Knowl. and Inform. Systems, 2007.
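
The PageRank formula above can be evaluated by repeated substitution (power iteration). The sketch below is only an illustration, not the method used in the talk: it reads o(u) as the out-degree of u, and it assumes every node has at least one out-link. The function name `pagerank`, the iteration count and the three-node graph are hypothetical.

```python
def pagerank(out_links, alpha=0.85, iterations=100):
    """Power iteration for P(v) = alpha * sum_{u->v} P(u)/o(u) + (1 - alpha)/n.

    out_links maps every node to the list of nodes it links to.
    Assumption of this sketch: every node has at least one out-link
    (no dangling nodes), so o(u) > 0 and the scores keep summing to 1.
    """
    nodes = list(out_links)
    n = len(nodes)
    scores = {v: 1.0 / n for v in nodes}  # start from the uniform distribution
    for _ in range(iterations):
        # every node receives the (1 - alpha)/n term
        new_scores = {v: (1.0 - alpha) / n for v in nodes}
        for u in nodes:
            # u splits alpha * P(u) evenly among the nodes it points to
            share = alpha * scores[u] / len(out_links[u])
            for v in out_links[u]:
                new_scores[v] += share
        scores = new_scores
    return scores


# Hypothetical 3-node graph: a -> b, a -> c, b -> c, c -> a
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
scores = pagerank(graph)
# Ranking = nodes sorted by decreasing score
ranking = sorted(scores, key=scores.get, reverse=True)
```

Note that this computation touches the whole graph at every iteration, which is exactly what a local algorithm tries to avoid.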
Choose the damping, choose the ranking?

Is PageRank’s ranking robust to small variations in α?

  $P(v) = \alpha \sum_{u \to v} \frac{P(u)}{o(u)} + \frac{1-\alpha}{n}$

Results
  1. not robust in theory (permutation theorem, reversal theorem)
  2. novel tools for checking robustness (lineage analysis)
  3. somewhat robust in real-world graphs (experiments)

Marco Bressan, Enoch Peserico. Choose the damping, choose the ranking? J. Discrete Algorithms 8(2): 199-213 (2010)
Marco Bressan, Enoch Peserico. Choose the damping, choose the ranking? Proc. of WAW 2009: 76-89

Is it possible to compute the rank locally?

[Figure, left panel "Local computation": a graph with two highlighted nodes u and v. Right panel "Ranking": five nodes with scores 0.3 (1st), 0.25 (2nd), 0.2 (3rd), 0.15 (4th), 0.1 (5th).]

In many applications only the rank matters!

Is it possible to compute the rank locally?
  • stated by Chen et al. (CIKM 2004)
  • restated by Bar-Yossef and Mashiach (CIKM 2008)

Motivating examples (I): crawling

The visited graph expands starting from seed nodes.
Which red nodes should be visited now? And in what order?

Order the nodes with PageRank!
Cho et al. Efficient crawling through URL ordering. Computer Networks, 1998.

Is it possible to rank the red frontier at low cost, without visiting the whole crawled graph?

Motivating examples (II): ranking with competitors

Retrieve graph structure using e.g. Google’s link:
Bar-Yossef and Mashiach. Local approximation of PageRank and reverse PageRank. Proc. ACM CIKM, 2008.

Is it possible to compute this rank efficiently, using few queries?

Motivating examples (III): social network mining

Rank key users in social networks
Heidemann et al. Identifying key users in online social networks: A PageRank based approach. Proc. ICIS, 2010.

Full graph not available (privacy settings).
Is it still possible to guarantee the correctness of the output ranking?

Formal definition of the problem

Input
  • graph G of size n
  • target nodes $v_1, \ldots, v_k$
  • score separation $\epsilon > 0$

Output
  • ranking of $\{v_1, v_2, \ldots, v_k\}$
  If $(1 - \epsilon) < \frac{P(v_i)}{P(v_j)} < (1 + \epsilon)$, any ranking of $\{v_i, v_j\}$ is valid

Cost Model
  • computation for free
  • but visiting G costs (query to link server)

cost of ranking = |queries| = |nodes visited|
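
The output condition can be made concrete with a small checker. This is a minimal sketch under stated assumptions: the separation test compares the score ratio of each pair against the open interval (1 − ε, 1 + ε), with the earlier-ranked node in the numerator. The function name `is_valid_ranking` and the node labels are hypothetical, and the scores are illustrative.

```python
def is_valid_ranking(ranking, scores, eps):
    """Check a ranking (best node first) against exact PageRank scores.

    A pair may appear in either order when its score ratio lies strictly
    inside (1 - eps, 1 + eps); otherwise the higher-scored node must
    come first.
    """
    for i, u in enumerate(ranking):
        for v in ranking[i + 1:]:
            ratio = scores[u] / scores[v]
            if (1 - eps) < ratio < (1 + eps):
                continue  # scores too close: any order is accepted
            if scores[u] < scores[v]:
                return False  # well-separated pair in the wrong order
    return True


# Illustrative scores with hypothetical node labels
scores = {"a": 0.3, "b": 0.25, "c": 0.2, "d": 0.15, "e": 0.1}
exact = is_valid_ranking(["a", "b", "c", "d", "e"], scores, eps=0.1)
swapped_strict = is_valid_ranking(["b", "a", "c", "d", "e"], scores, eps=0.1)
swapped_loose = is_valid_ranking(["b", "a", "c", "d", "e"], scores, eps=0.2)
```

With ε = 0.1 the swap of a and b is rejected, because 0.25/0.3 ≈ 0.83 falls outside (0.9, 1.1); with ε = 0.2 the same swap is accepted.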
Is it possible to compute the rank locally?
Our contribution: NO!

NO in theory: lower bounds
  1. Every deterministic local ranking algorithm has an adversarial graph forcing Ω(n) queries (and can be tightened)
  2. Every randomized local ranking algorithm has an adversarial graph forcing Ω(n) queries
  even to rank the top k nodes, even if their scores are highly separated!
  =⇒ a general low-cost local ranking algorithm does not exist

NO in practice: experimental results
  1. real web/social graphs behave like worst-case input instances for local ranking
  2. approximating is not trivial: state-of-the-art local score approximation algorithms do not turn into low-cost local rank approximation algorithms

Lower bounds (I): deterministic algorithms

Every deterministic algorithm has an adversarial graph forcing cost Ω(n), tightened to n(1 − O( k)).

Theorem 1 (paper Thm. 4)
Choose integers $k > 1$ and $n_0 \ge k^2$, a damping factor $\alpha \in (0, 1)$, and $\epsilon \le \frac{\alpha^2}{20k}$. For any deterministic local algorithm A there exists a graph of size $n \in \Theta(n_0)$ where the top k nodes $v_0, \ldots, v_{k-1}$ are $\epsilon$-separated and, to compute their relative ranking according to $P_\alpha(\cdot)$, algorithm A performs Ω(n) queries.

Lower bounds (II): randomized algorithms

Every randomized (Las Vegas or Monte Carlo) algorithm has an adversarial graph forcing cost $\Omega\left(\alpha \sqrt{n/\epsilon}\right)$, up to Ω(n).

[Figure: algorithm A_RANDOM queries the link server of a graph G ($10^9$ nodes) to rank the nodes v1, v2, ..., v20 and outputs a ranking [v3 v10 ... v7]. Cost shown: ~$10^{4.5}$ queries, then $10^8$ queries.]

Theorem 2 (paper Thm. 3)
Choose $k > 1$, $n_0 \ge 6k^3$, a damping factor $\alpha \in (0, 1)$, and $\epsilon \in \left[\frac{\alpha^2 k^2}{4 n_0}, \frac{\alpha^2}{24k}\right]$. Then
  1. for any Las Vegas local algorithm A
  2. for any Monte Carlo local algorithm A with constant confidence
there exists a graph of size $n \in \Theta(n_0)$ where the top k nodes $v_0, \ldots, v_{k-1}$ are $\epsilon$-separated and, to compute their relative ranking, A performs in expectation $\Omega\left(\alpha \sqrt{n/\epsilon}\right)$ queries.
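
To see how the randomized bound moves with the separation, the sketch below evaluates it at the two ends of the admissible range. It rests on assumptions that the slide text does not state cleanly: the bound is read as α·sqrt(n/ε), the constant hidden by Ω is ignored, and n0 is taken equal to n. The values α = 0.85, k = 20 and n = 10^9 are illustrative (k and n echo the figure), and the function name is hypothetical.

```python
import math


def randomized_lower_bound(alpha, eps, n):
    """Evaluate alpha * sqrt(n / eps), ignoring the constant hidden by Omega."""
    return alpha * math.sqrt(n / eps)


# Illustrative values; n0 = n is an assumption of this sketch.
alpha, k, n = 0.85, 20, 10**9
eps_min = alpha**2 * k**2 / (4 * n)  # smallest separation in the range
eps_max = alpha**2 / (24 * k)        # largest separation in the range

at_min = randomized_lower_bound(alpha, eps_min, n)  # simplifies to 2n/k
at_max = randomized_lower_bound(alpha, eps_max, n)  # simplifies to sqrt(24kn)
```

Under these assumptions the smallest admissible separation gives 2n/k = 10^8 queries, linear in n, while the largest gives sqrt(24kn), which grows only as the square root of n.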
What happens in practice?

Two experiments

1. Hardness of real-world graphs
Compute the minimal number of nodes that an algorithm must visit to always guarantee a correct ranking.

2. Performance of approximation algorithms
Evaluate cost and accuracy of local ranking algorithms derived from state-of-the-art local score approximation algorithms.

Datasets (publicly available from LAW - Univ. Milan, http://law.dsi.unimi.it)

               nodes    arcs     crawled
  .it          40M      1150M    2004
  LiveJournal  5M       79M      2008

Exp. 1: hardness of real-world graphs (1/2)

Breakdown of a local ranking algorithm

1. Visit ancestors
   Thm.: must visit at least |minset(G, u, v)| ancestors

2. Compute ranking
   Thm.: must agree with natural PageRank score approximation

|minset(G, u, v)| ≤ cost of ranking u, v in graph G
Exp. 1: hardness of real-world graphs (2/2)

[Plot: average number of visited nodes (log scale, 10^3 to 10^7) versus ε (2.56, 1.28, .64, .32, .16, .08, .04, .02, .01), one curve for the .it web graph and one for the LiveJournal graph]
Exp. 2: performance of approximation algorithms

Improved variant of the pruned bruteforce algorithm: limit PageRank computation to ancestors giving a high contribution.

[Figure: ancestors of node v with pruning threshold = 10%. The ancestors nearest to v are labeled 10%, 17%, 24% and 35%; the ancestors farther out are labeled <10%, below the threshold.]
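
The pruning idea can be sketched as a backward walk from the target node that gives up on a path once its contribution becomes too small. The code below is an illustration only, not the algorithm evaluated in the talk: the `Graph` layout, the function name `pruned_contributions`, and the choice of an absolute threshold on path weight (with the target itself at weight 1) are all assumptions made for this sketch.

```cpp
#include <queue>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical graph layout, chosen for this sketch only.
struct Graph {
    std::vector<std::vector<int>> in_neighbors;  // in_neighbors[w] = all u with an arc u -> w
    std::vector<int> out_degree;                 // out_degree[u] = number of arcs leaving u
};

// Walk backwards from `target`. A path u -> ... -> target of length t carries
// weight alpha^t divided by the product of the out-degrees of the nodes it
// leaves. A path is extended only while its weight stays at or above
// `threshold`. Assumes 0 < alpha < 1 and threshold > 0, so the walk terminates
// even on graphs with cycles.
// Returns, for every node reached, the total weight of its surviving paths.
std::unordered_map<int, double> pruned_contributions(const Graph& g, int target,
                                                     double alpha, double threshold) {
    std::unordered_map<int, double> contribution;
    std::queue<std::pair<int, double>> frontier;
    frontier.push({target, 1.0});
    while (!frontier.empty()) {
        auto [node, weight] = frontier.front();
        frontier.pop();
        contribution[node] += weight;
        for (int u : g.in_neighbors[node]) {
            // u has at least one outgoing arc (the one into `node`).
            double w = weight * alpha / g.out_degree[u];
            if (w >= threshold) frontier.push({u, w});  // otherwise: pruned
        }
    }
    return contribution;
}
```

Lowering the threshold makes the walk visit more ancestors, which is the cost/accuracy trade-off measured in this experiment.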
Exp. 2: performance of approximation algorithms

[Plot, .it web graph: average cost (log scale, 10^3 to 10^6) versus pruning threshold (10^-1 down to 10^-7), one curve per label: (2.56,5.12), (1.28,2.56), (0.64,1.28), (0.32,0.64), (0.16,0.32), (0.08,0.16), (0.04,0.08), (0.02,0.04), (0.01,0.02)]
Exp. 2: performance of approximation algorithms

[Plot, LiveJournal graph: average cost (log scale, 10^3 to 10^6) versus pruning threshold (10^-1 down to 10^-7), one curve per label: (2.56,5.12), (1.28,2.56), (0.64,1.28), (0.32,0.64), (0.16,0.32), (0.08,0.16), (0.04,0.08), (0.02,0.04), (0.01,0.02)]
Exp. 2: performance of approximation algorithms

[Plot, .it web graph: fraction of correctly ranked node pairs (axis from -0.2 to 1) versus pruning threshold (10^-1 down to 10^-7), one curve per label: (2.56,5.12), (1.28,2.56), (0.64,1.28), (0.32,0.64), (0.16,0.32), (0.08,0.16), (0.04,0.08), (0.02,0.04), (0.01,0.02)]
Exp. 2: performance of approximation algorithms

[Plot, LiveJournal graph: fraction of correctly ranked node pairs (axis from -0.2 to 1) versus pruning threshold (10^-1 down to 10^-7), one curve per label: (2.56,5.12), (1.28,2.56), (0.64,1.28), (0.32,0.64), (0.16,0.32), (0.08,0.16), (0.04,0.08), (0.02,0.04), (0.01,0.02)]
Conclusions

1. Local computation of PageRank ranking is infeasible
2. Cost of exact local ranking algorithms is bounded from below by minsets
3. Tested real web/social graphs are near worst-case
4. And approximation is not trivial

Marco Bressan, Luca Pretto. Local computation of PageRank: the ranking side. Proc. of CIKM 2011: 631-640

psort, yet another fast stable external sorting software

In a nutshell

the psort sorting library
  • written in C++
  • handles large datasets (> TB)
  • stable sorting
  • fast
  • designed for PC-class machines

ideal applications of psort
  • sorting large databases
  • sorting large log files
  • sorting on commodity machines
  • ...
psort and the Sort Benchmark (1/2)

The PennySort Benchmark
Sort what you can in $0.01 of computing time.

[Plot: yearly record (Sort Benchmark), 0 GB to 400 GB, for the years 1998, 1999, 2000, 2002, 2003, 2007, 2008, 2009 and 2011, with the psort results labeled]
Source: http://sortbenchmark.org

Paolo Bertasi, Marco Bressan, Enoch Peserico. psort, yet another fast stable sorting software. ACM Journal of Experimental Algorithmics 16: (2011)
psort and the Sort Benchmark (2/2)

The Datamation Benchmark
Sort 100MB disk-to-disk as fast as you can.

[Figure: record times over the years: 980 s, thunder (1987); 440 ms, NOW-sort (2001); psort (2011)]

Paolo Bertasi, Michele Bonazza, Marco Bressan, Enoch Peserico: Datamation. A Quarter of a Century and Four Orders of Magnitude Later. CLUSTER 2011: 605-609
psort and the STXXL library

[Plot: sort speed (in MB/s, 0 to 200) versus sort size (in MB, log scale, 10^1 to 10^4), with curves for stxxl on disks, stxxl on RAID and psort on RAID, each with the settings (8,8), (8,32) and (8,128)]
Machine budget for Sort Benchmark 2011

Component            Cost
RAM                  47 EUR
Motherboard          60 EUR
CPU                  38 EUR
Case                 22 EUR
Power Supply Unit    15 EUR
Assembly fee         35 EUR
Hard Disks           215 EUR
The big picture

psort execution diagram

  CPU/cache          1MB, 10GB/s
  main memory        1GB, 3GB/s
  external memory    1TB, 0.7GB/s

[Diagram, horizontal axis: time. Between CPU/cache and main memory: mergesort, heap merge, heap merge. Between main memory and external memory: 1st disk pass, 2nd disk pass.]
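
The two disk passes in the diagram follow the usual shape of an external merge sort: a first pass that produces sorted runs, and a second pass that merges them through a heap. The sketch below imitates that shape entirely in memory, with vectors standing in for disk files. It is not psort's code; `Record`, `make_runs`, `merge_runs` and the `run_size` parameter are names assumed for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <utility>
#include <vector>

struct Record {
    int key;
    int payload;  // carried along, never compared
};

// Pass 1: cut the input into runs of at most `run_size` records and sort each
// run stably, as if each run were the chunk that fits in main memory.
std::vector<std::vector<Record>> make_runs(const std::vector<Record>& input,
                                           std::size_t run_size) {
    if (run_size == 0) run_size = 1;  // guard against an endless loop
    std::vector<std::vector<Record>> runs;
    for (std::size_t start = 0; start < input.size(); start += run_size) {
        std::size_t end = std::min(input.size(), start + run_size);
        std::vector<Record> run(input.begin() + static_cast<std::ptrdiff_t>(start),
                                input.begin() + static_cast<std::ptrdiff_t>(end));
        std::stable_sort(run.begin(), run.end(),
                         [](const Record& a, const Record& b) { return a.key < b.key; });
        runs.push_back(std::move(run));
    }
    return runs;
}

// Pass 2: k-way merge of the sorted runs with a min-heap. Ties on the key are
// broken by run index, which keeps the overall sort stable because earlier
// runs hold records that came earlier in the input.
std::vector<Record> merge_runs(const std::vector<std::vector<Record>>& runs) {
    using Item = std::tuple<int, std::size_t, std::size_t>;  // key, run, position
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    for (std::size_t r = 0; r < runs.size(); ++r)
        if (!runs[r].empty()) heap.emplace(runs[r][0].key, r, std::size_t{0});
    std::vector<Record> out;
    while (!heap.empty()) {
        auto [key, r, pos] = heap.top();
        heap.pop();
        out.push_back(runs[r][pos]);
        if (pos + 1 < runs[r].size())
            heap.emplace(runs[r][pos + 1].key, r, pos + 1);
    }
    return out;
}
```

Calling `merge_runs(make_runs(input, run_size))` sorts `input` by key while keeping records with equal keys in their original order. In a real external sort each run would be written to disk after pass 1 and streamed back in blocks during pass 2.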
The big picture - now complicated

Hardware/software details you must deal with:

I/O      • hdd quality    • buffer size
         • file system    • direct transfer
         • scheduling     • data placement

memory   • size           • page size
         • bandwidth      • access pattern
         • latency        • conflicts

cache    • size           • line size
         • speed          • associativity
Hard disks

The speed curve of 13 “identical” WD1600JS disks

[Plot: bandwidth (MB/s, 0 to 150) versus distance from the outer rim (in GB, 0 to 150)]
Memory

Why main memory is not really a RAM

[Plot: bandwidth (GB/s, 0.5 to 4.5) versus struct size (bytes, 2^0 to 2^18), with curves for sequential read, random read, sequential write and random write; the L2 cache line size is marked on the plot]
CPU

Is a dual-core always worth its price?

[Plot: bandwidth (axis labeled MB/s, ticks from 0 to 3e+10) versus log2(bytes visited) (16 to 30), with curves for Intel dual core read, Intel dual core write, AMD single core read and AMD single core write]
A list of psort’s tricks

general       • fast polling         • key pre/post processing
              • payload detachment   • ...

disk access   • O_DIRECT             • uniform fetching
              • independent disks    • ...

mergesort     • smart merging        • special base case
              • quasi-in-place       • ...

heapsort      • key caching          • payload interleaving
              • key offsetting       • ...
Smart merging (1/3)

    36/43

Naive merging

    void merge(T *s1, T *s2, T *out, int size) {
      int i = 0, j = 0, k = 0;
      bool bit;
      while ((i < size) & (j < size)) {
        if (s1[i] > s2[j]) { // READ + READ
          out[k] = s2[j];    // READ
          j++;
        } else {
          out[k] = s1[i];    // (READ)
          i++;
        }
        k++;
        ...

total mem READs per iteration: 3
Smart merging (2/3)

    37/43

Smart merging

    void merge(T* s1, T* s2, T* out, int size) {
      int i = 0, j = 0, k = 0;
      bool bit;
      T cache[2];          // current head of each input sequence
      cache[0] = s1[0];
      cache[1] = s2[0];
      while ((i < size) & (j < size)) {
        if (cache[0] > cache[1]) {
          out[k] = cache[1];
          j++;             // advance first, then load the new head
          if (j < size) cache[1] = s2[j]; // READ
        } else {
          out[k] = cache[0];
          i++;
          if (i < size) cache[0] = s1[i]; // (READ)
        }
        k++;
        ...

total mem READs per iteration: 1
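
The two merges above can be sketched as self-contained C++ for readers who want to run them side by side. This is an illustration under assumptions, not psort's actual code: the names `merge_naive` and `merge_cached` are hypothetical, both inputs are assumed to hold `size` sorted elements, and the tail handling hidden behind `...` on the slides is filled in with plain copy loops.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive merge: each iteration reads s1[i] and s2[j] for the comparison,
// then reads one of them again for the copy.
template <typename T>
void merge_naive(const T* s1, const T* s2, T* out, std::size_t size) {
  std::size_t i = 0, j = 0, k = 0;
  while (i < size && j < size) {
    if (s1[i] > s2[j]) out[k++] = s2[j++];
    else               out[k++] = s1[i++];  // ties take s1, so the merge is stable
  }
  while (i < size) out[k++] = s1[i++];      // copy whatever is left
  while (j < size) out[k++] = s2[j++];
}

// Cached merge: the two current heads live in local variables, so each
// iteration loads only the element that replaces the one just written.
template <typename T>
void merge_cached(const T* s1, const T* s2, T* out, std::size_t size) {
  if (size == 0) return;
  std::size_t i = 0, j = 0, k = 0;
  T head1 = s1[0], head2 = s2[0];
  while (true) {
    if (head1 > head2) {
      out[k++] = head2;
      if (++j == size) break;  // s2 exhausted; head1 is still pending
      head2 = s2[j];           // the single read of this iteration
    } else {
      out[k++] = head1;
      if (++i == size) break;  // s1 exhausted; head2 is still pending
      head1 = s1[i];           // the single read of this iteration
    }
  }
  while (i < size) out[k++] = s1[i++];      // pending head plus the rest
  while (j < size) out[k++] = s2[j++];
}
```

Both functions produce the same stable output. An optimizing compiler may already keep the heads in registers in the naive version, so any measured gap depends on compiler and hardware.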
Smart merging (3/3)

    38/43

[Figure: Time required to merge two sequences. Series: smart merge, naive merge. x-axis: log2( merge size ), from 10 to 24; y-axis: time in microseconds, from 0 to 800000.]
Quasi-in-place mergesort (1/3)

    39/43

traditional mergesort

    void mergesort(T* input, T* output, int size) {
      // bottom-up; size is assumed to be a power of two
      for (int i = 0; i < log2(size); i++) {
        int subsize = 1 << i; // length of each sorted run
        for (int j = 0; j < size / (2 * subsize); j++) {
          merge(&input[2 * j * subsize],
                &input[(2 * j + 1) * subsize],
                &output[2 * j * subsize],
                subsize);
        }
        T* tmp = input; // swap input and output after each pass
        input = output;
        output = tmp;
      }
      // the sorted data is in the buffer written by the last pass
    }

extra space = N
Quasi-in-place mergesort (2/3)

    40/43

“quasi-in-place” mergesort

    void mergesort(T* input, T* output, int size) {
      // size is assumed to be a power of two, with size/2 free
      // slots available just before input[0]
      for (int i = 0; i < log2(size/2); i++) {
        int subsize = 1 << i; // length of each sorted run
        for (int j = 0; j < size / (2 * subsize); j++) {
          /* merge, overwriting the input vector */
          merge(&input[2 * j * subsize],
                &input[(2 * j + 1) * subsize],
                &input[(2 * j - 1) * subsize],
                subsize);
        }
        input = &input[-subsize]; // shift input left
      }
      // finally merge into the output vector
      merge(input, &input[size/2], output, size/2);
    }

extra space = N/2
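
A runnable C++ sketch of the shifting idea follows. It is one reading of the slide, not psort's actual code. The names `merge_runs` and `quasi_in_place_mergesort` are hypothetical, N is assumed to be a power of two, and the buffer layout (N/2 free slots followed by the N input elements) is an assumption. Each pass with run length r writes its merged runs r slots to the left of where it reads them, so the passes with r = 1, 2, ..., N/4 shift the data by N/2 - 1 slots in total, which fits in the N/2 free slots.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stable merge of two sorted runs of length `run`. `out` may start up to
// `run` slots before `s1`: the write position never overtakes an element
// that has not been read yet.
template <typename T>
void merge_runs(const T* s1, const T* s2, T* out, std::size_t run) {
  std::size_t i = 0, j = 0, k = 0;
  while (i < run && j < run) {
    if (s1[i] > s2[j]) out[k++] = s2[j++];
    else               out[k++] = s1[i++];
  }
  while (i < run) out[k++] = s1[i++];
  while (j < run) out[k++] = s2[j++];
}

// Sorts the n elements stored in buf[n/2 .. n/2 + n) into `output`.
// Assumed layout: buf holds n/2 free slots followed by the data.
template <typename T>
void quasi_in_place_mergesort(std::vector<T>& buf, T* output, std::size_t n) {
  assert(n >= 2 && (n & (n - 1)) == 0);  // power of two, at least 2
  assert(buf.size() == n + n / 2);
  T* input = buf.data() + n / 2;
  for (std::size_t run = 1; run < n / 2; run *= 2) {
    for (std::size_t j = 0; j < n / (2 * run); j++) {
      // merge two adjacent runs, writing `run` slots to their left
      merge_runs(input + 2 * j * run, input + (2 * j + 1) * run,
                 input + 2 * j * run - run, run);
    }
    input -= run;  // the whole array now starts `run` slots further left
  }
  // final pass: merge the two sorted halves into the output vector
  merge_runs(input, input + n / 2, output, n / 2);
}
```

A caller would allocate a buffer of 3N/2 elements, copy the input into its last N slots, and pass a separate output vector of N elements.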
Quasi-in-place mergesort (3/3)

    41/43

[Figure: Average time required to compare two keys. Series: quasi-in-place. x-axis: log2( input size in bytes ), from 10 to 24; y-axis: relative units, from 0 to 4.]
Conclusions

    42/43

1. Solving old problems really fast is still tricky

2. To do it, you must match today’s hardware

3. Solution: software engineering and tuning

Paolo Bertasi, Marco Bressan, Enoch Peserico. psort, yet another fast stable sorting software. ACM Journal of Experimental Algorithmics 16: (2011)
Conclusions

    43/43

Ranking

1. Local computation of PageRank ranking infeasible in theory

2. On tested web/social graphs, infeasible also in practice

3. Rank analysis requires novel tools!

Sorting

1. Solving old problems really fast is still tricky

2. To do it, you must match today’s hardware

3. Software engineering and tuning are the way

And of course now you should pay me twice! :-)

Two Unrelated Talks

  • 1. 1/43 Local computation of PageRank: the ranking side Introduction Motivations Local ranking in theory Two unrelated talks Local ranking in practice Conclusions psort, yet another M ARCO B RESSAN fast stable external sorting software Introduction Making sorting a complicate task Inside psort January 30, 2012 Conclusions Conclusions
  • 2. Outline 2/43 1 Local computation of PageRank: the ranking side Local computation of Introduction PageRank: the ranking side Motivations Introduction Motivations Local ranking in theory Local ranking in theory Local ranking in practice Local ranking in practice Conclusions Conclusions psort, yet another fast stable 2 psort, yet another fast stable external sorting software external sorting software Introduction Introduction Making sorting a Making sorting a complicate task complicate task Inside psort Inside psort Conclusions Conclusions Conclusions 3 Conclusions
  • 3. 3/43 Local computation of PageRank: the ranking side Introduction Motivations Local ranking in theory Local ranking in practice Conclusions Local computation of PageRank: psort, yet another fast stable the ranking side external sorting software Introduction Making sorting a complicate task Inside psort Conclusions Conclusions
  • 5. Ranking robustly. Ranking a graph's nodes depends on: 1. the graph; 2. external factors: (varying) parameters, graph availability, ... Is ranking robust? How is ranking influenced by external factors?
  • 8. PageRank. PageRank of node v: $P(v) = \alpha \sum_{u \to v} \frac{P(u)}{o(u)} + \frac{1-\alpha}{n}$, where $n = |G|$ and $\alpha$ = damping factor. Applications: web search, web crawling, web spam detection, personalized web search, social network mining, ranking in databases, structural re-ranking, opinion mining, word sense disambiguation, credit and reputation systems, bibliometrics, gene ranking, ... Among top data mining algorithms: Wu et al. Top 10 algorithms in data mining. Knowl. and Inform. Systems, 2007.
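
The PageRank definition on this slide can be turned into a short fixed-point iteration. The following is a minimal sketch, not code from the talk: it assumes that o(u) denotes the out-degree of u, and the function name `pagerank`, the dict-of-lists graph format and the fixed iteration count are illustrative choices. Nodes without out-links simply lose their mass, as happens when the formula is applied literally.

```python
def pagerank(out_links, alpha=0.85, iters=100):
    """Iterate P(v) = alpha * sum_{u->v} P(u)/o(u) + (1 - alpha)/n.

    out_links maps every node to the list of nodes it links to
    (hypothetical input format; every link target must also be a key).
    """
    nodes = list(out_links)
    n = len(nodes)
    p = {v: 1.0 / n for v in nodes}  # uniform starting scores
    for _ in range(iters):
        # Every node receives the teleportation term (1 - alpha)/n.
        nxt = {v: (1.0 - alpha) / n for v in nodes}
        # Every node u passes alpha * P(u)/o(u) to each of its successors.
        for u, outs in out_links.items():
            for v in outs:
                nxt[v] += alpha * p[u] / len(outs)
        p = nxt
    return p


if __name__ == "__main__":
    # Tiny example graph: c -> a -> v and b -> v.
    graph = {"c": ["a"], "a": ["v"], "b": ["v"], "v": []}
    scores = pagerank(graph)
    # Sorting the nodes by score gives the ranking discussed in the talk.
    print(sorted(scores, key=scores.get, reverse=True))
```

Note that this computation touches the whole graph on every iteration, which is exactly the cost that a local algorithm would try to avoid.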
  • 10. Choose the damping, choose the ranking? Is PageRank's ranking, with $P(v) = \alpha \sum_{u \to v} \frac{P(u)}{o(u)} + \frac{1-\alpha}{n}$, robust to small variations in $\alpha$? Results: 1. not robust in theory (permutation theorem, reversal theorem); 2. novel tools for checking robustness (lineage analysis); 3. somewhat robust in real-world graphs (experiments). References: Marco Bressan, Enoch Peserico. Choose the damping, choose the ranking? J. Discrete Algorithms 8(2): 199-213 (2010). Marco Bressan, Enoch Peserico. Choose the damping, choose the ranking? Proc. of WAW 2009: 76-89.
  • 13. Is it possible to compute the rank locally? [Figure, panels "Local computation" and "Ranking": an example graph containing nodes u and v, with node scores 0.3, 0.25, 0.2, 0.15, 0.1 and the corresponding ranks 1st to 5th.] In many applications only the rank matters! Is it possible to compute the rank locally? The question was stated by Chen et al. (CIKM 2004) and restated by Bar-Yossef and Mashiach (CIKM 2008).
  • 17. Motivating examples (I): crawling. The visited graph expands starting from seed nodes. Which red nodes should be visited now? And in what order? Order the nodes with PageRank! (Cho et al. Efficient crawling through URL ordering. Computer Networks, 1998.) Is it possible to rank the red frontier for a low cost, without visiting the whole crawled graph?
  • 22. Motivating examples (II): ranking with competitors. Retrieve graph structure using e.g. Google's link: (Bar-Yossef and Mashiach. Local approximation of PageRank and reverse PageRank. Proc. ACM CIKM, 2008.) Is it possible to compute this rank efficiently, using few queries?
  • 27. Motivating examples (III): social network mining. Rank key users in social networks (Heidemann et al. Identifying key users in online social networks: A PageRank based approach. Proc. ICIS, 2010). Full graph not available (privacy settings). Is it still possible to guarantee the correctness of the output ranking?
  • 28. Formal definition of the problem. Input: graph G of size n; target nodes $v_1, \dots, v_k$; score separation $\epsilon > 0$. Output: ranking of $\{v_1, v_2, \dots, v_k\}$. If $(1-\epsilon) < \frac{P(v_j)}{P(v_i)} < (1+\epsilon)$, any ranking of $\{v_i, v_j\}$ is valid. Cost model: computation is free, but visiting G costs (query to link server). Cost of ranking = |queries| = |nodes visited|.
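
The validity condition in this definition can be made concrete with a small checker. This is a sketch under stated assumptions: the function names `separated` and `is_valid_ranking`, the best-first list representation of a ranking and the dict of scores are all hypothetical, and the ratio test follows the slide's inequality as reconstructed here, $(1-\epsilon) < P(v_j)/P(v_i) < (1+\epsilon)$.

```python
def separated(p_i, p_j, eps):
    """True when P(v_j)/P(v_i) falls outside the open interval (1 - eps, 1 + eps)."""
    ratio = p_j / p_i
    return not (1.0 - eps < ratio < 1.0 + eps)


def is_valid_ranking(ranking, scores, eps):
    """Check a best-first ranking against the true scores.

    Pairs whose scores are not eps-separated may appear in any order;
    every eps-separated pair must be ordered by decreasing score.
    """
    for a in range(len(ranking)):
        for b in range(a + 1, len(ranking)):
            first, second = ranking[a], ranking[b]
            if separated(scores[first], scores[second], eps) and scores[first] < scores[second]:
                return False
    return True


if __name__ == "__main__":
    scores = {"u": 0.2, "v": 0.25, "w": 0.201}
    # u and w are within 10% of each other, so either order of that pair is accepted.
    print(is_valid_ranking(["v", "w", "u"], scores, eps=0.1))
    print(is_valid_ranking(["v", "u", "w"], scores, eps=0.1))
    # Placing u ahead of v violates the ordering of a separated pair.
    print(is_valid_ranking(["u", "v", "w"], scores, eps=0.1))
```

The checker needs the true scores, so it is only useful for evaluating an algorithm offline; the cost model on the slide charges for graph queries, not for this kind of computation.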
  • 30. Is it possible to compute the rank locally? Our contribution: NO! NO in theory: lower bounds. 1. Every deterministic local ranking algorithm has an adversarial graph forcing Ω(n) queries (and can be tightened). 2. Every randomized local ranking algorithm has an adversarial graph forcing Ω(n) queries, even to rank the top k nodes, even if their scores are highly separated! ⟹ A general low-cost local ranking algorithm does not exist.
  • 31. Is it possible to compute the rank locally? Our contribution: NO! NO in practice: experimental results. 1. Real web/social graphs behave like worst-case input instances for local ranking. 2. Approximating is not trivial: state-of-the-art local score approximation algorithms do not turn into low-cost local rank approximation algorithms.
  • 36. Lower bounds (I): deterministic algorithms. Every deterministic algorithm has an adversarial graph forcing cost Ω(n), tightened on the final build of the slide to $n(1 - O(\epsilon k))$. Theorem 1 (paper Thm. 4): Choose integers $k > 1$ and $n_0 \ge k^2$, a damping factor $\alpha \in (0, 1)$, and $\epsilon \le \frac{\alpha^2}{20k}$. For any deterministic local algorithm A there exists a graph of size $n \in \Theta(n_0)$ where the top k nodes $v_0, \dots, v_{k-1}$ are $\epsilon$-separated and, to compute their relative ranking according to $P_\alpha(\cdot)$, algorithm A performs $\Omega(n)$ queries, tightened to $n(1 - O(\epsilon k))$ queries.
  • 38. Lower bounds (II): randomized algorithms. Every randomized (Las Vegas or Monte Carlo) algorithm has an adversarial graph forcing cost $\Omega\left(\alpha \sqrt{n/\epsilon}\right)$, updated on the final build of the slide to Ω(n). [Figure: algorithm A_RANDOM queries a link server for a graph G of 10^9 nodes with target nodes v1, v2, ..., v20 and outputs a ranking [v3 v10 ... v7]; the annotated number of queries, ~10^4.5, is updated to 10^8.] Theorem 2 (paper Thm. 3): Choose $k > 1$, $n_0 \ge 6k^3$, a damping factor $\alpha \in (0, 1)$, and $\epsilon \in \left[\frac{\alpha^2 k^2}{4 n_0}, \frac{\alpha^2}{24k}\right]$. Then 1. for any Las Vegas local algorithm A, 2. for any Monte Carlo local algorithm A with constant confidence, there exists a graph of size $n \in \Theta(n_0)$ where the top k nodes $v_0, \dots, v_{k-1}$ are $\epsilon$-separated and, to compute their relative ranking, A performs in expectation $\Omega\left(\alpha \sqrt{n/\epsilon}\right)$ queries.
  • 39. What happens in practice? Two experiments. 1. Hardness of real-world graphs: compute the minimal number of nodes that an algorithm must visit to always guarantee a correct ranking. 2. Performance of approximation algorithms: evaluate cost and accuracy of local ranking algorithms derived from state-of-the-art local score approximation algorithms. Datasets (publicly available from LAW, Univ. Milan, http://law.dsi.unimi.it): .it web graph, 40M nodes, 1150M arcs, crawled 2004; LiveJournal, 5M nodes, 79M arcs, crawled 2008.
  • 41. Exp. 1: hardness of real-world graphs (1/2). Breakdown of a local ranking algorithm: 1. Visit ancestors (Thm.: must visit at least |minset(G, u, v)| ancestors). 2. Compute ranking (Thm.: must agree with natural PageRank score approximation). Hence |minset(G, u, v)| ≤ cost of ranking u, v in graph G.
  • 42. Exp. 1: hardness of real-world graphs (2/2). [Figure: average number of visited nodes (log scale, 10^3 to 10^7) versus ε (2.56, 1.28, .64, .32, .16, .08, .04, .02, .01) for the .it web graph and the LiveJournal graph.]
  • 45. Exp. 2: performance of approximation algorithms. Improved variant of the pruned bruteforce algorithm: limit PageRank computation to ancestors giving a high contribution. [Figure: ancestors of the target node v labeled with their contributions (35%, 24%, 17%, 10%); ancestors contributing <10% are pruned. Pruning threshold = 10%.]
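
The idea of restricting the computation to high-contribution ancestors can be sketched as a backward, layer-by-layer exploration from the target node. This is a simplified illustration and not the algorithm evaluated in the talk: the function name `pruned_pagerank`, the input dicts, the use of an absolute influence threshold (the slide's 10% may well be a relative measure) and the `max_depth` cap are all assumptions. It relies on the fact that unrolling the PageRank formula gives $P(v) = \frac{1-\alpha}{n} \sum_u \sum_{t \ge 0} \mathrm{inf}_t(u, v)$, where $\mathrm{inf}_t(u, v)$ sums, over all length-t paths from u to v, the product of $\alpha / o(w)$ along the path.

```python
def pruned_pagerank(in_nbrs, out_deg, target, n, alpha=0.85,
                    threshold=1e-3, max_depth=50):
    """Approximate P(target) from its high-influence ancestors only.

    in_nbrs[w] lists the nodes linking to w; out_deg[u] is the out-degree of u.
    Returns (approximate score, number of distinct nodes kept), the second
    value serving as a stand-in for the query cost.
    """
    layer = {target: 1.0}  # influence of the target on itself at distance 0
    total = 0.0
    visited = set()
    for _ in range(max_depth + 1):
        if not layer:
            break
        visited.update(layer)
        total += sum(layer.values())
        nxt = {}
        for w, inf_w in layer.items():
            for u in in_nbrs.get(w, ()):
                # A step u -> w scales the influence by alpha / o(u).
                nxt[u] = nxt.get(u, 0.0) + alpha * inf_w / out_deg[u]
        # Pruning: drop ancestors whose influence falls below the threshold.
        layer = {u: x for u, x in nxt.items() if x >= threshold}
    return (1.0 - alpha) / n * total, len(visited)


if __name__ == "__main__":
    # Example graph: c -> a -> v and b -> v, with n = 4 nodes overall.
    in_nbrs = {"v": ["a", "b"], "a": ["c"]}
    out_deg = {"a": 1, "b": 1, "c": 1}
    print(pruned_pagerank(in_nbrs, out_deg, "v", n=4, threshold=0.0))
    print(pruned_pagerank(in_nbrs, out_deg, "v", n=4, threshold=0.8))
```

Raising the threshold lowers the number of nodes kept but also lowers the estimate, which is the cost versus accuracy trade-off that this experiment measures.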
  • 46. Exp. 2: performance of approximation algorithms. [Figure, .it web graph: average cost (log scale, 10^3 to 10^6) versus pruning threshold (10^-1 to 10^-7), one curve per interval label from (0.01,0.02) to (2.56,5.12).]
  • 47. Exp. 2: performance of approximation algorithms. [Figure, LiveJournal graph: average cost (log scale, 10^3 to 10^6) versus pruning threshold (10^-1 to 10^-7), one curve per interval label from (0.01,0.02) to (2.56,5.12).]
  • 48. Exp. 2: performance of approximation algorithms. [Figure, .it web graph: fraction of correctly ranked node pairs (axis from -0.2 to 1) versus pruning threshold (10^-1 to 10^-7), one curve per interval label from (0.01,0.02) to (2.56,5.12).]
  • 49. Exp. 2: performance of approximation algorithms. [Figure, LiveJournal graph: fraction of correctly ranked node pairs (axis from -0.2 to 1) versus pruning threshold (10^-1 to 10^-7), one curve per interval label from (0.01,0.02) to (2.56,5.12).]
  • 50. Conclusions. 1. Local computation of PageRank ranking is infeasible. 2. Cost of exact local ranking algorithms bounded by minsets. 3. Tested real web/social graphs are near worst-case. 4. And approximation is not trivial. Reference: Marco Bressan, Luca Pretto. Local computation of PageRank: the ranking side. Proc. of CIKM 2011: 631-640.
  • 51. psort, yet another fast stable external sorting software
  • 53. In a nutshell. The psort sorting library: written in C++; handles large datasets (> TB); stable sorting; fast; designed for PC-class machines. Ideal applications of psort: sorting large databases; sorting large log files; sorting on commodity machines; ...
  • 54. psort and the Sort Benchmark (1/2). The PennySort Benchmark: sort what you can in 0.01$ of computing time. [Figure: yearly record (Sort Benchmark), 0 GB to 400 GB, for the years 1998, 1999, 2000, 2002, 2003, 2007, 2008, 2009, 2011, with psort marked. Source: http://sortbenchmark.org] Reference: Paolo Bertasi, Marco Bressan, Enoch Peserico. psort, yet another fast stable sorting software. ACM Journal of Experimental Algorithmics 16: (2011).
  • 55. psort and the Sort Benchmark (2/2). The Datamation Benchmark: sort 100MB disk-to-disk as fast as you can. [Figure: Datamation results over time; labels in transcript order: 980 s, thunder (1987), 440 ms, NOW-sort (2001), psort (2011).] Reference: Paolo Bertasi, Michele Bonazza, Marco Bressan, Enoch Peserico: Datamation. A Quarter of a Century and Four Orders of Magnitude Later. CLUSTER 2011: 605-609.
  • 56. psort and the STXXL library. [Figure: sort speed (in MB/s, 0 to 200) versus sort size (in MB, 10^1 to 10^4) for stxxl on disks, stxxl on RAID and psort on RAID, each with parameters (8,8), (8,32) and (8,128).]
  • 57. Machine budget for Sort Benchmark 2011. [Figure: breakdown of component costs; labels and values in transcript order: RAM, Motherboard, 47 EUR, 60 EUR, CPU 38 EUR, Case 22 EUR, Power Supply Unit 15 EUR, Assembly fee 35 EUR, Hard Disks 215 EUR.]
  • 58. The big picture. psort execution diagram. [Figure: activity over time across the memory hierarchy: CPU/cache (1MB, 10GB/s), main memory (1GB, 3GB/s), external memory (1TB, 0.7GB/s); stages labeled mergesort, heap merge, heap merge, spanning a 1st disk pass and a 2nd disk pass.]
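
The "heap merge" stage named in this diagram can be illustrated with a generic stable k-way merge. The sketch below is not psort's code (psort is written in C++ and works on disk-resident runs); it is an in-memory Python illustration using the standard `heapq` module, and the function name `kway_merge` and its `key` parameter are hypothetical. Ties between equal keys are broken by run index, so records keep their input order as long as the runs are given in input order.

```python
import heapq


def kway_merge(runs, key=lambda rec: rec):
    """Merge already-sorted runs into one sorted list, preserving stability."""
    iters = [iter(run) for run in runs]
    heap = []
    # Seed the heap with the first record of every non-empty run.
    for i, it in enumerate(iters):
        for rec in it:
            heap.append((key(rec), i, rec))
            break
    heapq.heapify(heap)
    out = []
    while heap:
        # The smallest (key, run index) pair comes out first; at most one
        # record per run is in the heap, so records are never compared.
        _, i, rec = heapq.heappop(heap)
        out.append(rec)
        # Refill from the run that just supplied a record.
        for nxt in iters[i]:
            heapq.heappush(heap, (key(nxt), i, nxt))
            break
    return out


if __name__ == "__main__":
    run_a = [(1, "a"), (3, "b")]
    run_b = [(1, "c"), (2, "d")]
    # Records with equal key 1 keep their input order: run_a before run_b.
    print(kway_merge([run_a, run_b], key=lambda rec: rec[0]))
```

With k runs the heap holds at most k entries, so each output record costs O(log k) comparisons; in an external sort the runs would be read from disk through buffers rather than held in lists.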
  • 59. The big picture - now complicated. Hardware/software details you must deal with: I/O: hdd quality, file system, scheduling, buffer size, direct transfer, data placement. Memory: size, bandwidth, latency, page size, access pattern, conflicts. Cache: size, speed, line size, associativity.
  • 60. Hard disks. The speed curve of 13 "identical" WD1600JS disks. [Figure: bandwidth (MB/s, 0 to 150) versus distance from the outer rim (in GB, 0 to 150).]
  • 61. Memory. Why main memory is not really a RAM. [Figure: bandwidth (GB/s, 0.5 to 4.5) versus struct size (bytes, 2^0 to 2^18) for sequential read, random read, sequential write and random write; the L2 cache line size is marked.]
  • 62. CPU: is a dual-core always worth its price?
      [Figure: bandwidth (MB/s) versus log2(bytes visited), from 16 to 30, for Intel dual core read, Intel dual core write, AMD single core read and AMD single core write]
  • 67. A list of psort’s tricks
      • general: fast polling, payload detachment, key pre/post processing, ...
      • disk access: O_DIRECT, independent disks, uniform fetching, ...
      • mergesort: smart merging, quasi-in-place, special base case, ...
      • heapsort: key caching, key offsetting, payload interleaving, ...
  • 70. Smart merging (1/3): naive merging
        void merge(T *s1, T *s2, T *out, int size) {
          int i = 0, j = 0, k = 0;
          bool bit;
          while ((i < size) & (j < size)) {
            if (s1[i] > s2[j]) {  // READ + READ
              out[k] = s2[j];     // READ
              j++;
            } else {
              out[k] = s1[i];     // (READ)
              i++;
            }
            k++;
            ...
      total mem READs per iteration: 3
  • 72. Smart merging (2/3): smart merging
        void merge(T* s1, T* s2, T* out, int size) {
          int i = 0, j = 0, k = 0;
          bool bit;
          T cache[ 2 ];
          cache[0] = s1[0];
          cache[1] = s2[0];
          while ((i < size) & (j < size)) {
            if (cache[0] > cache[1]) {
              out[k] = cache[1];
              cache[1] = s2[j];  // READ
              j++;
            } else {
              out[k] = cache[0];
              cache[0] = s1[i];  // (READ)
              i++;
            }
            k++;
            ...
      total mem READs per iteration: 1
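The slide code is a fragment that stops at the elided loop tail. A complete merge built on the same idea might look like the sketch below: the two current heads are kept in local variables, so each iteration loads only one new element from memory. The name `merge_cached`, the separate run lengths `n1` and `n2`, and the explicit end-of-run handling are assumptions added for this example. Unlike the fragment, it advances the index before refilling the cached head.

```cpp
#include <cassert>
#include <cstddef>

// Merge sorted runs s1[0..n1) and s2[0..n2) into out, keeping the two current
// heads in local variables so each loop iteration loads one new element.
template <typename T>
void merge_cached(const T* s1, std::size_t n1,
                  const T* s2, std::size_t n2, T* out) {
    std::size_t i = 0, j = 0, k = 0;
    if (n1 > 0 && n2 > 0) {
        T head1 = s1[0];
        T head2 = s2[0];
        for (;;) {
            if (head2 < head1) {        // strict test: ties go to s1 (stable)
                out[k++] = head2;
                if (++j == n2) break;   // s2 exhausted, head1 still pending
                head2 = s2[j];          // the one READ of this iteration
            } else {
                out[k++] = head1;
                if (++i == n1) break;   // s1 exhausted, head2 still pending
                head1 = s1[i];          // the one READ of this iteration
            }
        }
    }
    // Copy whatever remains in the run that was not exhausted.
    while (i < n1) out[k++] = s1[i++];
    while (j < n2) out[k++] = s2[j++];
    assert(k == n1 + n2);
}
```

Whether the compiler already keeps the heads in registers for the naive version depends on the element type and optimization level, so any speed-up should be measured rather than assumed.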
  • 73. Smart merging (3/3): time required to merge two sequences
      [Figure: time in microseconds (0 to 800000) versus log2(merge size), from 10 to 24, for smart merge and naive merge]
  • 75. Quasi-in-place mergesort (1/3): traditional mergesort
        void mergesort(T* input, T* output, int size) {
          for (int i = 1; i < log2(size); i++) {
            int subsize = 1 << (i + 1);
            for (int j = 0; j < size/subsize; j++) {
              merge(&input[j * subsize],
                    &input[(j + 1) * subsize],
                    &output[j * subsize * 2],
                    subsize);
              T* tmp = input;  // swap input and output
              input = output;
              output = tmp;
            }
          }
        }
      extra space = N
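The slide code reads as pseudocode. A runnable bottom-up mergesort with the same memory profile (a scratch buffer as large as the input, with the two buffers swapping roles) could be sketched as follows. The name `mergesort_pingpong`, the use of `std::vector` and `std::merge`, and the swap once per pass are choices made for this example rather than psort's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Bottom-up mergesort that ping-pongs between the data and a scratch buffer
// of the same length, i.e. extra space = N.
template <typename T>
void mergesort_pingpong(std::vector<T>& data) {
    const std::size_t n = data.size();
    std::vector<T> scratch(n);  // the N extra elements
    T* src = data.data();
    T* dst = scratch.data();
    for (std::size_t width = 1; width < n; width *= 2) {
        for (std::size_t lo = 0; lo < n; lo += 2 * width) {
            const std::size_t mid = std::min(lo + width, n);
            const std::size_t hi = std::min(lo + 2 * width, n);
            // std::merge is stable: on ties it takes from the first range.
            std::merge(src + lo, src + mid, src + mid, src + hi, dst + lo);
        }
        std::swap(src, dst);  // swap input and output once per pass
    }
    if (src != data.data()) {
        std::copy(src, src + n, data.begin());  // result ended up in scratch
    }
    assert(std::is_sorted(data.begin(), data.end()));
}
```

Every pass reads one buffer and writes the other, which is why the scratch space must hold all N elements.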
  • 77. Quasi-in-place mergesort (2/3): “quasi-in-place” mergesort
        void mergesort(T* input, T* output, int size) {
          for (int i = 1; i < log2(size/2); i++) {
            int subsize = 1 << (i + 1);
            for (int j = 0; j < size/subsize; j++) {
              /* merge, overwriting the input vector */
              merge(&input[j * subsize],
                    &input[(j + 1) * subsize],
                    &input[(j - 1) * subsize],
                    subsize);
            }
            input = &input[-subsize];  // shift input left
          }
          // finally merge into the output vector
          merge(input, &input[size/2], output, size/2);
        }
      extra space = N/2
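The shifting scheme on the slide depends on details the deck does not spell out, so the sketch below does not reproduce it. It shows a different, widely used way to sort with N/2 extra elements: copy the left run into a half-size buffer, then merge the buffer and the right run back into the array. All names (`merge_with_half_buffer`, `half_buffer_sort_rec`, `mergesort_half_buffer`) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Merge a[0..mid) and a[mid..n) in place, using buf (capacity >= mid) to hold
// a copy of the left run. The write index k equals i + (j - mid) <= j, so no
// right-run element is overwritten before it has been read.
template <typename T>
void merge_with_half_buffer(T* a, std::size_t mid, std::size_t n, T* buf) {
    std::copy(a, a + mid, buf);
    std::size_t i = 0, j = mid, k = 0;
    while (i < mid && j < n) {
        if (a[j] < buf[i]) a[k++] = a[j++];
        else               a[k++] = buf[i++];  // ties go left: stable
    }
    while (i < mid) a[k++] = buf[i++];
    // Leftover right-run elements are already in their final positions.
}

// Top-down mergesort; every level reuses the same half-size buffer.
template <typename T>
void half_buffer_sort_rec(T* a, std::size_t n, T* buf) {
    if (n < 2) return;
    const std::size_t mid = n / 2;
    half_buffer_sort_rec(a, mid, buf);
    half_buffer_sort_rec(a + mid, n - mid, buf);
    merge_with_half_buffer(a, mid, n, buf);
}

template <typename T>
void mergesort_half_buffer(std::vector<T>& data) {
    std::vector<T> buf(data.size() / 2);  // the N/2 extra elements
    half_buffer_sort_rec(data.data(), data.size(), buf.data());
    assert(std::is_sorted(data.begin(), data.end()));
}
```

This variant pays for the smaller buffer with an extra copy of the left run at every merge, a trade-off that may differ from the one psort makes.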
  • 78. Quasi-in-place mergesort (3/3): average time required to compare two keys
      [Figure: time in relative units (0 to 4) versus log2(input size in bytes), from 10 to 24; the only legend entry recovered is quasi-in-place]
  • 79. Conclusions
      1. Solving old problems really fast is still tricky
      2. To do it, you must match today’s hardware
      3. Solution: software engineering and tuning
      Paolo Bertasi, Marco Bressan, Enoch Peserico. psort, yet another fast stable sorting software. ACM Journal of Experimental Algorithmics 16 (2011)
  • 83. Conclusions
      Ranking
      1. Local computation of PageRank ranking is infeasible in theory
      2. On tested web/social graphs, it is also infeasible in practice
      3. Rank analysis requires novel tools!
      Sorting
      1. Solving old problems really fast is still tricky
      2. To do it, you must match today’s hardware
      3. Software engineering and tuning are the way
      And of course now you should pay me twice! :-)