Filtering: A Method for Solving
                           Graph Problems in
                               MapReduce
               Benjamin Moseley   Silvio Lattanzi   Siddharth Suri    Sergei Vassilvitskii
                    UIUC              Google           Yahoo!              Yahoo!




                                         Large Scale Distributed Computation, Jan 2012
Tuesday, January 17, 2012
Overview

 • Introduction to the MapReduce model

 • Our settings

 • Our results

 • Open questions
Introduction to the MapReduce model
XXL Data

 • Huge amounts of data

 • The main problem is to analyze the information quickly

 • New tools

 • Suitable, efficient algorithms
MapReduce

 • MapReduce is the platform of choice for processing massive data

 • Data are represented as < key, value > tuples

 • The Mapper decides how data is distributed

 • The Reducer performs non-trivial computation locally

   [Diagram: INPUT → MAP PHASE → REDUCE PHASE]
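The tuple flow above can be sketched in a few lines. This is a minimal single-process simulation, not any real MapReduce API; the function and variable names are ours:

```python
from collections import defaultdict

def run_round(records, mapper, reducer):
    """Single-process sketch of one MapReduce round: the mapper emits
    <key, value> tuples, the tuples are grouped by key (this grouping is
    what 'decides how data is distributed'), and each reducer then
    performs its computation locally on one key's values."""
    groups = defaultdict(list)
    for record in records:                 # MAP PHASE
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():     # REDUCE PHASE
        output.extend(reducer(key, values))
    return output

# Classic word-count instantiation.
word_mapper = lambda line: [(word, 1) for word in line.split()]
count_reducer = lambda word, ones: [(word, sum(ones))]
counts = dict(run_round(["a b a", "b c"], word_mapper, count_reducer))
# counts == {'a': 2, 'b': 2, 'c': 1}
```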
How can we model MapReduce?

   PRAM

             •No limit on the number of processors

             •Memory is uniformly accessible from any
              processor

             •No limit on the memory available




How can we model MapReduce?

   STREAMING

             •Just one processor

             •There is a limited amount of memory

             •No parallelization




MapReduce model

   [Karloff, Suri and Vassilvitskii]

             • N is the input size and ε > 0 is some fixed constant

             • Less than N^{1-ε} machines

             • Less than N^{1-ε} memory on each machine

     MRC^i : problems that can be solved in O(log^i N) rounds
Combining map and reduce phases

 • Mapper and Reducer work only on a subgraph

 • Keep the time in each round polynomial

 • Time is constrained by the number of rounds
Algorithmic challenges

 • No machine can see the entire input

 • No communication between machines during each phase

 • Total memory is N^{2-2ε}
MapReduce vs MUD algorithms

 • In the MUD framework, each reducer operates on a stream of data.

 • In MUD, each reducer is restricted to using only polylogarithmic space.
Our settings




Our settings


   • We study the Karloff, Suri and Vassilvitskii model


   • We focus on the class MRC^0
Our settings


   • We assume we work with dense graphs: m = n^{1+c} for some constant c > 0


   • There is empirical evidence that social networks are dense graphs
     [Leskovec, Kleinberg and Faloutsos]
Dense graph motivation

   [Leskovec, Kleinberg and Faloutsos]

         • They study 9 different social networks

         • They show that several graphs from different domains have n^{1+c} edges

         • The lowest value of c found is 0.08, and four graphs have c ≥ 0.5
Our results




Results

      Constant-round algorithms for:

          • Maximal matching

          • Minimum cut

          • 8-approximation for maximum weighted matching

          • 2-approximation for vertex cover

          • 3/2-approximation for edge cover
Notation


    • G = (V, E) : input graph

    • n : number of nodes
    • m : number of edges
    • η : memory available on each machine

    • N : input size
Filtering

  • Part of the input is dropped, or filtered, in the first stage, in parallel


  • Next, some computation is performed on the filtered input


  • Finally, some patchwork is done to ensure a proper solution
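As a toy, runnable illustration of this filter / compute / patch pattern, here is a sketch that builds a spanning forest; the concrete problem and all names are our own example, not one from the talk:

```python
import random

class DSU:
    """Small union-find, used to track connected components."""
    def __init__(self, n):
        self.parent = list(range(n))
    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x
    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True

def spanning_forest_by_filtering(n, edges, p=0.5, seed=0):
    rng = random.Random(seed)
    kept = [e for e in edges if rng.random() < p]  # FILTER: drop part of the input
    dsu, forest = DSU(n), []
    for u, v in kept:                              # COMPUTE SOLUTION on the filtered input
        if dsu.union(u, v):
            forest.append((u, v))
    for u, v in edges:                             # PATCHWORK: repair using the rest
        if dsu.union(u, v):
            forest.append((u, v))
    return forest
```

Whatever the filter drops, the patch pass restores, so the output is always a spanning forest of the whole graph.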
Filtering

   [Diagram: FILTER → COMPUTE SOLUTION → PATCHWORK ON REST OF THE GRAPH]
Warm-up: Compute the minimum spanning tree

        m = n^{1+c}

          PARTITION EDGES across n^{c/2} machines

          n^{1+c/2} edges per machine
Warm-up: Compute the minimum spanning tree

          COMPUTE MST on each of the n^{c/2} machines

          SECOND ROUND: the n^{1+c/2} surviving edges are collected on a single machine
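The two-round warm-up can be sketched as follows. This is a sequential simulation (the helper names are ours), where each "machine" keeps only a spanning forest of its part:

```python
def kruskal(n, weighted_edges):
    """Minimum spanning forest on one machine (Kruskal's algorithm)."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    forest = []
    for w, u, v in sorted(weighted_edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((w, u, v))
    return forest

def two_round_mst(n, weighted_edges, machines):
    # ROUND 1: partition the edges across the machines (n^{c/2} of them in
    # the slides); each machine discards everything but its local forest.
    parts = [weighted_edges[i::machines] for i in range(machines)]
    survivors = [e for part in parts for e in kruskal(n, part)]
    # ROUND 2: the surviving edges fit on one machine; compute the MST there.
    return kruskal(n, survivors)
```

Discarding an edge in round 1 is safe: a discarded edge is the heaviest on some cycle within its part, so it cannot belong to the minimum spanning tree.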
Show that the algorithm is in MRC^0


 • The algorithm is correct

 • No edge in the final solution is discarded in a partial solution

 • The algorithm runs in two rounds

 • No more than O(n^{c/2}) machines are used

 • No machine uses memory greater than O(n^{1+c/2})
Maximal matching



    • The algorithmic difficulty is that each machine can only see the edges assigned to it


    • Is a partitioning-based algorithm feasible?
Partitioning algorithm

   [Diagram: PARTITIONING → COMBINE MATCHINGS]

   It is impossible to build a maximal matching using only the locally matched (red) edges!
Algorithmic insight


  • Consider any subset E' of the edges


  • Let M' be a maximal matching on G[E']


  • The vertices left unmatched by M' form an independent set in G[E']
Algorithmic insight

 • We pick each edge with probability p

 • Find a matching on the sample, and then find a matching on the unmatched vertices

 • For dense portions of the graph, many edges should be sampled

 • Sparse portions of the graph are small and can be placed on a single machine
Algorithm

 • Sample each edge independently with probability p = 10 log n / n^{c/2}

 • Find a matching on the sample

 • Consider the induced subgraph on the unmatched vertices

 • Find a matching on this graph and output the union
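These four steps can be sketched as a sequential simulation; the helper names and the greedy matching subroutine are our choices, not the paper's:

```python
import math
import random

def greedy_maximal_matching(edges):
    """Maximal matching on one machine: take each edge whose endpoints are free."""
    matched, matching = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            matched.update((u, v))
            matching.append((u, v))
    return matching

def filtered_maximal_matching(n, edges, c, seed=0):
    p = min(1.0, 10 * math.log(n) / n ** (c / 2))         # sampling probability
    rng = random.Random(seed)
    sample = [e for e in edges if rng.random() < p]       # sample each edge
    matching = greedy_maximal_matching(sample)            # match on the sample
    matched = {v for e in matching for v in e}
    induced = [(u, v) for u, v in edges
               if u not in matched and v not in matched]  # unmatched vertices
    return matching + greedy_maximal_matching(induced)    # output the union
```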
Algorithm

   [Diagram: SAMPLE → FIRST MATCHING → MATCHING ON UNMATCHED NODES → UNION]
Correctness

  • Consider the last step of the algorithm


  • All unmatched vertices are placed on a machine


  • Every unmatched vertex is either matched at the last step or has only matched neighbors
Bounding the rounds

 Three rounds:

 • Sampling the edges

 • Finding a matching on a single machine for the sampled edges

 • Finding a matching for the unmatched vertices
Bounding the memory

 • Each edge was sampled with probability p = 10 log n / n^{c/2}

 • By a Chernoff bound, the sampled graph has size Õ(n^{1+c/2}) with high probability

 • Can we bound the size of the induced subgraph on the unmatched vertices?
Bounding the memory

 • For a fixed induced subgraph with at least n^{1+c/2} edges, the probability
   that none of its edges is sampled is:

       (1 − 10 log n / n^{c/2})^{n^{1+c/2}} ≤ exp(−10 n log n)

 • Union bounding over all 2^n induced subgraphs shows that at least one edge
   is sampled from every dense subgraph with probability

       1 − exp(−10 log n) ≥ 1 − 1/n^{10}
Using less memory


   • Can we run the algorithm with less than Õ(n^{1+c/2}) memory?


   • We can amplify our sampling technique!
Using less memory

 • Sample as many edges as fit in memory

 • Find a matching

 • If the edges between unmatched vertices fit onto a single machine, find a matching on those vertices

 • Otherwise, recurse on the unmatched nodes

 • With n^{1+ε} memory, each iteration removes a factor of n^ε of the edges, resulting in O(c/ε) rounds
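The recursion can be sketched like this (a sequential simulation; `memory_limit` plays the role of the per-machine memory η, and all names are ours):

```python
import random

def greedy_maximal_matching(edges):
    """Maximal matching on one machine."""
    matched, matching = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            matched.update((u, v))
            matching.append((u, v))
    return matching

def low_memory_matching(edges, memory_limit, seed=0):
    rng = random.Random(seed)
    matching, leftover = [], list(edges)
    while len(leftover) > memory_limit:
        sample = rng.sample(leftover, memory_limit)   # as many edges as fit in memory
        matching += greedy_maximal_matching(sample)   # find a matching
        matched = {v for e in matching for v in e}
        leftover = [(u, v) for u, v in leftover       # recurse on unmatched nodes
                    if u not in matched and v not in matched]
    return matching + greedy_maximal_matching(leftover)  # final single-machine pass
```

Each iteration terminates progress-making: the sample is non-empty, so at least one edge is matched and the leftover set strictly shrinks.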
Parallel computation power

  • The maximal matching algorithm does not use parallelization


  • We use a single machine in every step


  • When is parallelization used?
Maximum weighted matching
                                            8

                            2                       2
                                            7
                                    1

                                4
                                                3
                            4           2
                                                     2




                                    Large Scale Distributed Computation, Jan 2012
Tuesday, January 17, 2012
Maximum weighted matching
                                                8

                                2                       2
                                                7
                                        1

                                    4
                                                    3
                                4           2
                                                         2


                                                                         SPLIT THE GRAPH




                                                                                8

       2                    2
                                                                               7
             1
                                        4           3
                    2               4
                            2




                                        Large Scale Distributed Computation, Jan 2012
Tuesday, January 17, 2012
Maximum weighted matching
                                                                           8

         2                  2
                                                                          7
                1
                                    4        3
                      2         4
                            2




                                    Large Scale Distributed Computation, Jan 2012
Tuesday, January 17, 2012
Maximum weighted matching
                                                                                   8

         2                  2
                                                                                  7
                1
                                            4        3
                      2                 4
                            2




                            MATCHINGS




         2
                                                                                  7

                                                     3
                      2                 4
                            2




                                            Large Scale Distributed Computation, Jan 2012
Tuesday, January 17, 2012
Maximum weighted matching

         2
                                                                         7

                                            3
                      2         4
                            2




                                    Large Scale Distributed Computation, Jan 2012
Tuesday, January 17, 2012
Maximum weighted matching

[Figure: the surviving candidate edges are scanned sequentially to extend the matching]
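The sequential scan can be sketched as a small greedy routine. Here it is assumed (the slide does not specify the scan order) that edges are processed in decreasing weight, keeping an edge whenever both of its endpoints are still free; the graph and names are illustrative, not taken from the paper:

```python
def greedy_matching(edges):
    """Scan edges in order of decreasing weight, keeping an edge
    whenever both of its endpoints are still unmatched."""
    matched = set()
    matching = []
    for u, v, w in sorted(edges, key=lambda e: -e[2]):
        if u not in matched and v not in matched:
            matching.append((u, v, w))
            matched.update((u, v))
    return matching

# Toy example: the heavy edge (weight 8) is taken first and blocks its neighbors.
edges = [('a', 'b', 8), ('b', 'c', 7), ('a', 'd', 2), ('c', 'd', 3)]
print(greedy_matching(edges))  # [('a', 'b', 8), ('c', 'd', 3)]
```

The scan is linear once the edges are sorted, which is why a single machine can run it over the small set of edges that survive filtering.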
Bounding the rounds and memory


  Four rounds:

  •Splitting the graph and running the maximal
   matching:
   3 rounds and Õ(n^{1+c/2}) memory


  •Computing the final solution:
   1 round and O(n log n) memory
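The round structure above can be simulated sequentially. This is a toy sketch of the filtering idea only, not the paper's exact algorithm: it assumes edges are partitioned uniformly at random across machines, each machine computes a maximal matching on its subgraph, and a final round matches over the union of the survivors:

```python
import random

def maximal_matching(edges):
    """Greedy maximal matching over the given edge list."""
    matched, matching = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching

def filtered_matching(edges, machines=4, seed=0):
    """One filtering step: partition edges randomly across machines,
    match locally, then match again on the union of the survivors."""
    rng = random.Random(seed)
    parts = [[] for _ in range(machines)]
    for e in edges:                      # map phase: random partition
        parts[rng.randrange(machines)].append(e)
    # reduce phase: each machine keeps only its local maximal matching
    survivors = [e for p in parts for e in maximal_matching(p)]
    # final round: the survivor set is small enough for one machine
    return maximal_matching(survivors)
```

In the real algorithm the analysis guarantees, w.h.p., that the survivors fit in a single machine's Õ(n^{1+c/2}) memory; this sketch simply trusts that the union is small.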
Approximation guarantee (sketch)

 •An edge in the solution can block at most 2 edges in
  each subgraph of smaller weight
[Figure: the heaviest edge (weight 8) can block at most two lighter edges in each subgraph]

 •We lose a factor of 2 because we do not consider the
  weights
Other algorithms

 Based on the same intuition:


 •2-approx for vertex cover


 •3/2-approx for edge cover




Minimum cut

•Partition does not work, because we lose structural
 information

•Sampling does not seem to work either

•We can use the first steps of Karger's algorithm as a
 filtering technique

•The random choices made in the early rounds
 succeed with high probability, whereas the later rounds
 have a much lower probability of success
Minimum cut

[Figure: assign random edge weights, then compress the edges with small weight to obtain smaller multigraphs]

         From Karger's analysis, at least one graph will contain a min cut w.h.p.
Minimum cut

[Figure: compute the minimum cut on each compressed graph]

           We will find the minimum cut w.h.p.
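The Karger-style contraction used here can be sketched as follows (a plain sequential sketch with illustrative names, not the MapReduce formulation): repeatedly merge the endpoints of a randomly chosen edge until only a few super-vertices remain; the edges crossing the surviving super-vertices form a candidate cut.

```python
import random

def contract(edges, n, target=2, seed=0):
    """Karger-style contraction: merge the endpoints of random edges
    until only `target` super-vertices remain; return the crossing edges."""
    rng = random.Random(seed)
    parent = list(range(n))

    def find(x):                      # union-find with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    alive = n
    while alive > target:
        u, v = rng.choice(edges)
        ru, rv = find(u), find(v)
        if ru != rv:                  # contract the edge (merge components)
            parent[ru] = rv
            alive -= 1
    return [(u, v) for u, v in edges if find(u) != find(v)]
```

Repeating the contraction many times and keeping the smallest crossing set recovers the minimum cut w.h.p.; the filtering idea stops the contraction early, while the random choices still succeed with high probability, and finishes each compressed graph on a single machine.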
Empirical result (matching)

[Figure: empirical running-time results for the matching algorithm]
Open problems




Open problems 1

  •Maximum matching


  •Shortest path


  •Dynamic programming




Open problems 2


  •Algorithms for sparse graphs


  •Does computing connected components require more
   than 2 rounds?


  •Lower bounds
Thank you!





Más contenido relacionado

Último

Best VIP Call Girls Noida Sector 47 Call Me: 8448380779
Best VIP Call Girls Noida Sector 47 Call Me: 8448380779Best VIP Call Girls Noida Sector 47 Call Me: 8448380779
Best VIP Call Girls Noida Sector 47 Call Me: 8448380779Delhi Call girls
 
VIP Call Girls Service Kukatpally Hyderabad Call +91-8250192130
VIP Call Girls Service Kukatpally Hyderabad Call +91-8250192130VIP Call Girls Service Kukatpally Hyderabad Call +91-8250192130
VIP Call Girls Service Kukatpally Hyderabad Call +91-8250192130Suhani Kapoor
 
Booking open Available Pune Call Girls Nanded City 6297143586 Call Hot India...
Booking open Available Pune Call Girls Nanded City  6297143586 Call Hot India...Booking open Available Pune Call Girls Nanded City  6297143586 Call Hot India...
Booking open Available Pune Call Girls Nanded City 6297143586 Call Hot India...Call Girls in Nagpur High Profile
 
Recommendable # 971589162217 # philippine Young Call Girls in Dubai By Marina...
Recommendable # 971589162217 # philippine Young Call Girls in Dubai By Marina...Recommendable # 971589162217 # philippine Young Call Girls in Dubai By Marina...
Recommendable # 971589162217 # philippine Young Call Girls in Dubai By Marina...home
 
Kindergarten Assessment Questions Via LessonUp
Kindergarten Assessment Questions Via LessonUpKindergarten Assessment Questions Via LessonUp
Kindergarten Assessment Questions Via LessonUpmainac1
 
VIP Kolkata Call Girl Gariahat 👉 8250192130 Available With Room
VIP Kolkata Call Girl Gariahat 👉 8250192130  Available With RoomVIP Kolkata Call Girl Gariahat 👉 8250192130  Available With Room
VIP Kolkata Call Girl Gariahat 👉 8250192130 Available With Roomdivyansh0kumar0
 
The history of music videos a level presentation
The history of music videos a level presentationThe history of music videos a level presentation
The history of music videos a level presentationamedia6
 
Top Rated Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...
Top Rated  Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...Top Rated  Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...
Top Rated Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...Call Girls in Nagpur High Profile
 
Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...
Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...
Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...babafaisel
 
Cosumer Willingness to Pay for Sustainable Bricks
Cosumer Willingness to Pay for Sustainable BricksCosumer Willingness to Pay for Sustainable Bricks
Cosumer Willingness to Pay for Sustainable Bricksabhishekparmar618
 
Cheap Rate Call girls Malviya Nagar 9205541914 shot 1500 night
Cheap Rate Call girls Malviya Nagar 9205541914 shot 1500 nightCheap Rate Call girls Malviya Nagar 9205541914 shot 1500 night
Cheap Rate Call girls Malviya Nagar 9205541914 shot 1500 nightDelhi Call girls
 
The_Canvas_of_Creative_Mastery_Newsletter_April_2024_Version.pdf
The_Canvas_of_Creative_Mastery_Newsletter_April_2024_Version.pdfThe_Canvas_of_Creative_Mastery_Newsletter_April_2024_Version.pdf
The_Canvas_of_Creative_Mastery_Newsletter_April_2024_Version.pdfAmirYakdi
 
AMBER GRAIN EMBROIDERY | Growing folklore elements | Root-based materials, w...
AMBER GRAIN EMBROIDERY | Growing folklore elements |  Root-based materials, w...AMBER GRAIN EMBROIDERY | Growing folklore elements |  Root-based materials, w...
AMBER GRAIN EMBROIDERY | Growing folklore elements | Root-based materials, w...BarusRa
 
Cheap Rate ➥8448380779 ▻Call Girls In Huda City Centre Gurgaon
Cheap Rate ➥8448380779 ▻Call Girls In Huda City Centre GurgaonCheap Rate ➥8448380779 ▻Call Girls In Huda City Centre Gurgaon
Cheap Rate ➥8448380779 ▻Call Girls In Huda City Centre GurgaonDelhi Call girls
 
Kurla Call Girls Pooja Nehwal📞 9892124323 ✅ Vashi Call Service Available Nea...
Kurla Call Girls Pooja Nehwal📞 9892124323 ✅  Vashi Call Service Available Nea...Kurla Call Girls Pooja Nehwal📞 9892124323 ✅  Vashi Call Service Available Nea...
Kurla Call Girls Pooja Nehwal📞 9892124323 ✅ Vashi Call Service Available Nea...Pooja Nehwal
 
Cheap Rate Call girls Kalkaji 9205541914 shot 1500 night
Cheap Rate Call girls Kalkaji 9205541914 shot 1500 nightCheap Rate Call girls Kalkaji 9205541914 shot 1500 night
Cheap Rate Call girls Kalkaji 9205541914 shot 1500 nightDelhi Call girls
 
VIP Call Girl Amravati Aashi 8250192130 Independent Escort Service Amravati
VIP Call Girl Amravati Aashi 8250192130 Independent Escort Service AmravatiVIP Call Girl Amravati Aashi 8250192130 Independent Escort Service Amravati
VIP Call Girl Amravati Aashi 8250192130 Independent Escort Service AmravatiSuhani Kapoor
 

Último (20)

Best VIP Call Girls Noida Sector 47 Call Me: 8448380779
Best VIP Call Girls Noida Sector 47 Call Me: 8448380779Best VIP Call Girls Noida Sector 47 Call Me: 8448380779
Best VIP Call Girls Noida Sector 47 Call Me: 8448380779
 
VIP Call Girls Service Kukatpally Hyderabad Call +91-8250192130
VIP Call Girls Service Kukatpally Hyderabad Call +91-8250192130VIP Call Girls Service Kukatpally Hyderabad Call +91-8250192130
VIP Call Girls Service Kukatpally Hyderabad Call +91-8250192130
 
young call girls in Vivek Vihar🔝 9953056974 🔝 Delhi escort Service
young call girls in Vivek Vihar🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Vivek Vihar🔝 9953056974 🔝 Delhi escort Service
young call girls in Vivek Vihar🔝 9953056974 🔝 Delhi escort Service
 
Booking open Available Pune Call Girls Nanded City 6297143586 Call Hot India...
Booking open Available Pune Call Girls Nanded City  6297143586 Call Hot India...Booking open Available Pune Call Girls Nanded City  6297143586 Call Hot India...
Booking open Available Pune Call Girls Nanded City 6297143586 Call Hot India...
 
Recommendable # 971589162217 # philippine Young Call Girls in Dubai By Marina...
Recommendable # 971589162217 # philippine Young Call Girls in Dubai By Marina...Recommendable # 971589162217 # philippine Young Call Girls in Dubai By Marina...
Recommendable # 971589162217 # philippine Young Call Girls in Dubai By Marina...
 
Kindergarten Assessment Questions Via LessonUp
Kindergarten Assessment Questions Via LessonUpKindergarten Assessment Questions Via LessonUp
Kindergarten Assessment Questions Via LessonUp
 
VIP Kolkata Call Girl Gariahat 👉 8250192130 Available With Room
VIP Kolkata Call Girl Gariahat 👉 8250192130  Available With RoomVIP Kolkata Call Girl Gariahat 👉 8250192130  Available With Room
VIP Kolkata Call Girl Gariahat 👉 8250192130 Available With Room
 
The history of music videos a level presentation
The history of music videos a level presentationThe history of music videos a level presentation
The history of music videos a level presentation
 
Top Rated Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...
Top Rated  Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...Top Rated  Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...
Top Rated Pune Call Girls Koregaon Park ⟟ 6297143586 ⟟ Call Me For Genuine S...
 
Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...
Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...
Kala jadu for love marriage | Real amil baba | Famous amil baba | kala jadu n...
 
Cosumer Willingness to Pay for Sustainable Bricks
Cosumer Willingness to Pay for Sustainable BricksCosumer Willingness to Pay for Sustainable Bricks
Cosumer Willingness to Pay for Sustainable Bricks
 
Call Girls Service Mukherjee Nagar @9999965857 Delhi 🫦 No Advance VVIP 🍎 SER...
Call Girls Service Mukherjee Nagar @9999965857 Delhi 🫦 No Advance  VVIP 🍎 SER...Call Girls Service Mukherjee Nagar @9999965857 Delhi 🫦 No Advance  VVIP 🍎 SER...
Call Girls Service Mukherjee Nagar @9999965857 Delhi 🫦 No Advance VVIP 🍎 SER...
 
Cheap Rate Call girls Malviya Nagar 9205541914 shot 1500 night
Cheap Rate Call girls Malviya Nagar 9205541914 shot 1500 nightCheap Rate Call girls Malviya Nagar 9205541914 shot 1500 night
Cheap Rate Call girls Malviya Nagar 9205541914 shot 1500 night
 
young call girls in Pandav nagar 🔝 9953056974 🔝 Delhi escort Service
young call girls in Pandav nagar 🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Pandav nagar 🔝 9953056974 🔝 Delhi escort Service
young call girls in Pandav nagar 🔝 9953056974 🔝 Delhi escort Service
 
The_Canvas_of_Creative_Mastery_Newsletter_April_2024_Version.pdf
The_Canvas_of_Creative_Mastery_Newsletter_April_2024_Version.pdfThe_Canvas_of_Creative_Mastery_Newsletter_April_2024_Version.pdf
The_Canvas_of_Creative_Mastery_Newsletter_April_2024_Version.pdf
 
AMBER GRAIN EMBROIDERY | Growing folklore elements | Root-based materials, w...
AMBER GRAIN EMBROIDERY | Growing folklore elements |  Root-based materials, w...AMBER GRAIN EMBROIDERY | Growing folklore elements |  Root-based materials, w...
AMBER GRAIN EMBROIDERY | Growing folklore elements | Root-based materials, w...
 
Cheap Rate ➥8448380779 ▻Call Girls In Huda City Centre Gurgaon
Cheap Rate ➥8448380779 ▻Call Girls In Huda City Centre GurgaonCheap Rate ➥8448380779 ▻Call Girls In Huda City Centre Gurgaon
Cheap Rate ➥8448380779 ▻Call Girls In Huda City Centre Gurgaon
 
Kurla Call Girls Pooja Nehwal📞 9892124323 ✅ Vashi Call Service Available Nea...
Kurla Call Girls Pooja Nehwal📞 9892124323 ✅  Vashi Call Service Available Nea...Kurla Call Girls Pooja Nehwal📞 9892124323 ✅  Vashi Call Service Available Nea...
Kurla Call Girls Pooja Nehwal📞 9892124323 ✅ Vashi Call Service Available Nea...
 
Cheap Rate Call girls Kalkaji 9205541914 shot 1500 night
Cheap Rate Call girls Kalkaji 9205541914 shot 1500 nightCheap Rate Call girls Kalkaji 9205541914 shot 1500 night
Cheap Rate Call girls Kalkaji 9205541914 shot 1500 night
 
VIP Call Girl Amravati Aashi 8250192130 Independent Escort Service Amravati
VIP Call Girl Amravati Aashi 8250192130 Independent Escort Service AmravatiVIP Call Girl Amravati Aashi 8250192130 Independent Escort Service Amravati
VIP Call Girl Amravati Aashi 8250192130 Independent Escort Service Amravati
 

Destacado

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by HubspotMarius Sescu
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTExpeed Software
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsPixeldarts
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthThinkNow
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfmarketingartwork
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024Neil Kimberley
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)contently
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024Albert Qian
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsKurio // The Social Media Age(ncy)
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Search Engine Journal
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summarySpeakerHub
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next Tessa Mero
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentLily Ray
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best PracticesVit Horky
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project managementMindGenius
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...RachelPearson36
 

Destacado (20)

2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot2024 State of Marketing Report – by Hubspot
2024 State of Marketing Report – by Hubspot
 
Everything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPTEverything You Need To Know About ChatGPT
Everything You Need To Know About ChatGPT
 
Product Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage EngineeringsProduct Design Trends in 2024 | Teenage Engineerings
Product Design Trends in 2024 | Teenage Engineerings
 
How Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental HealthHow Race, Age and Gender Shape Attitudes Towards Mental Health
How Race, Age and Gender Shape Attitudes Towards Mental Health
 
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdfAI Trends in Creative Operations 2024 by Artwork Flow.pdf
AI Trends in Creative Operations 2024 by Artwork Flow.pdf
 
Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Filtering: a method for solving graph problems in map reduce

  • 1. Filtering: A Method for Solving Graph Problems in MapReduce Benjamin Moseley Silvio Lattanzi Siddharth Suri Sergei Vassilvitski UIUC Google Yahoo! Yahoo! Large Scale Distributed Computation, Jan 2012 Tuesday, January 17, 2012
  • 2. Overview • Introduction to MapReduce model • Our settings • Our results • Open questions Large Scale Distributed Computation, Jan 2012 Tuesday, January 17, 2012
  • 3. Introduction to the MapReduce model Large Scale Distributed Computation, Jan 2012 Tuesday, January 17, 2012
  • 4. XXL Data • Huge amount of data • Main problem is to analyze information quickly Large Scale Distributed Computation, Jan 2012 Tuesday, January 17, 2012
  • 5. XXL Data • Huge amount of data • Main problem is to analyze information quickly • New tools • Suitable efficient algorithms Large Scale Distributed Computation, Jan 2012 Tuesday, January 17, 2012
  • 6. MapReduce • MapReduce is the platform of choice for processing massive data INPUT MAP PHASE REDUCE PHASE Large Scale Distributed Computation, Jan 2012 Tuesday, January 17, 2012
  • 7. MapReduce • MapReduce is the platform of choice for processing massive data INPUT • Data are represented as tuple < key, value > MAP PHASE REDUCE PHASE Large Scale Distributed Computation, Jan 2012 Tuesday, January 17, 2012
  • 8. MapReduce • MapReduce is the platform of choice for processing massive data INPUT • Data are represented as tuple < key, value > MAP PHASE • Mapper decides how data is distributed REDUCE PHASE Large Scale Distributed Computation, Jan 2012 Tuesday, January 17, 2012
  • 9. MapReduce • MapReduce is the platform of choice for processing massive data INPUT • Data are represented as tuple < key, value > MAP PHASE • Mapper decides how data is distributed REDUCE • Reducer performs non-trivial PHASE computation locally Large Scale Distributed Computation, Jan 2012 Tuesday, January 17, 2012
  • 10. How can we model MapReduce? PRAM •No limit on the number of processors •Memory is uniformly accessible from any processor •No limit on the memory available Large Scale Distributed Computation, Jan 2012 Tuesday, January 17, 2012
  • 11. How can we model MapReduce? STREAMING •Just one processor •There is a limited amount of memory •No parallelization Large Scale Distributed Computation, Jan 2012 Tuesday, January 17, 2012
  • 12. MapReduce model [Karloff, Suri and Vassilvitskii] • N is the input size and 0 is some fixed constant Large Scale Distributed Computation, Jan 2012 Tuesday, January 17, 2012
  • 13. MapReduce model [Karloff, Suri and Vassilvitskii] • N is the input size and 0 is some fixed constant 1− •Less than N machines Large Scale Distributed Computation, Jan 2012 Tuesday, January 17, 2012
  • 14. MapReduce model [Karloff, Suri and Vassilvitskii] • N is the input size and 0 is some fixed constant 1− •Less than N machines 1− •Less than N memory on each machine Large Scale Distributed Computation, Jan 2012 Tuesday, January 17, 2012
  • 15. MapReduce model [Karloff, Suri and Vassilvitskii] • N is the input size and 0 is some fixed constant 1− •Less than N machines 1− •Less than N memory on each machine i i MRC : problem that can be solved in O(log N ) rounds Large Scale Distributed Computation, Jan 2012 Tuesday, January 17, 2012
• 18. Combining map and reduce phase
  • Mapper and reducer work only on a subgraph
  • Keep time in each round polynomial
  • Time is constrained by the number of rounds
• 21. Algorithmic challenges
  • No machine can see the entire input
  • No communication between machines during each phase
  • Total memory is N^{2-2ε}
• 22. MapReduce vs MUD algorithms
  • In the MUD framework each reducer operates on a stream of data
  • In MUD, each reducer is restricted to using only polylogarithmic space
• 25. Our settings
  • We study the Karloff, Suri and Vassilvitskii model
  • We focus on the class MRC^0
• 27. Our settings
  • We assume we work with dense graphs: m = n^{1+c} for some constant c > 0
  • Empirical evidence that social networks are dense graphs [Leskovec, Kleinberg and Faloutsos]
• 30. Dense graph motivation [Leskovec, Kleinberg and Faloutsos]
  • They study 9 different social networks
  • They show that several graphs from different domains have n^{1+c} edges
  • The lowest value of c found is .08, and four graphs have c ≥ .5
• 31. Our results
• 32. Results
  Constant-round algorithms for:
  • Maximal matching
  • Minimum cut
  • 8-approx for maximum weighted matching
  • 2-approx for vertex cover
  • 3/2-approx for edge cover
• 33. Notation
  • G = (V, E): input graph
  • n: number of nodes
  • m: number of edges
  • η: memory available on each machine
  • N: input size
• 36. Filtering
  • Part of the input is dropped, or filtered, in the first stage in parallel
  • Next, some computation is performed on the filtered input
  • Finally, some patchwork is done to ensure a proper solution
• 41. Filtering
  [diagram: FILTER → COMPUTE SOLUTION → PATCHWORK ON REST OF THE GRAPH]
• 44. Warm-up: Compute the minimum spanning tree
  • m = n^{1+c}
  [diagram: PARTITION EDGES across n^{c/2} machines, n^{1+c/2} edges each]
• 47. Warm-up: Compute the minimum spanning tree
  [diagram: each of the n^{c/2} machines computes the MST of its n^{1+c/2} edges;
   in the SECOND ROUND the surviving n^{1+c/2} edges are gathered on one machine]
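A minimal single-process sketch of this partition-and-merge MST (helper names are mine; the per-machine MSTs run here sequentially rather than in parallel):

```python
import random

def kruskal(n, edges):
    """Return an MST (as a (weight, u, v) edge list) of a graph on n nodes."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    mst = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                       # edge joins two components: keep it
            parent[ru] = rv
            mst.append((w, u, v))
    return mst

def two_round_mst(n, edges, machines):
    """Filtering MST: partition edges, keep only local-MST edges, then merge."""
    parts = [[] for _ in range(machines)]
    for e in edges:                        # round 1: partition across machines
        parts[random.randrange(machines)].append(e)
    survivors = []
    for p in parts:                        # each machine drops non-MST edges
        survivors.extend(kruskal(n, p))
    return kruskal(n, survivors)           # round 2: MST of the filtered graph
```

The correctness argument is the cycle property: an edge discarded by a local Kruskal run is the heaviest edge on some cycle, so it belongs to no MST of the full graph.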
• 52. Show that the algorithm is in MRC^0
  • The algorithm is correct: no edge in the final solution is discarded in a partial solution
  • The algorithm runs in two rounds
  • No more than O(n^{c/2}) machines are used
  • No machine uses memory greater than O(n^{1+c/2})
• 54. Maximal matching
  • The algorithmic difficulty is that each machine can only see the edges assigned to it
  • Is a partitioning-based algorithm feasible?
• 58. Partitioning algorithm
  [diagram: PARTITIONING → COMBINE MATCHINGS]
  It is impossible to build a maximal matching using only red edges!
• 61. Algorithmic insight
  • Consider any subset E′ of the edges
  • Let M be a maximal matching on G[E′]
  • The unmatched vertices form an independent set
• 65. Algorithmic insight
  • We pick each edge with probability p
  • Find a matching on the sample, then find a matching on the unmatched vertices
  • For dense portions of the graph, many edges will be sampled
  • Sparse portions of the graph are small and can be placed on a single machine
• 69. Algorithm
  • Sample each edge independently with probability p = 10 log n / n^{c/2}
  • Find a matching on the sample
  • Consider the induced subgraph on the unmatched vertices
  • Find a matching on this graph and output the union
• 74. Algorithm
  [diagram: SAMPLE → FIRST MATCHING → MATCHING ON UNMATCHED NODES → UNION]
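The sample/match/patch pipeline can be sketched as follows (greedy maximal matchings stand in for the single-machine matching step; function names are mine):

```python
import random

def maximal_matching(edges):
    """Greedy maximal matching on one 'machine' (a plain edge scan)."""
    matched, matching = set(), []
    for u, v in edges:
        if u not in matched and v not in matched:
            matched.update((u, v))
            matching.append((u, v))
    return matching

def filtered_matching(edges, p):
    """Filtering maximal matching: match a sample, then patch the leftovers."""
    sample = [e for e in edges if random.random() < p]   # round 1: sample
    m1 = maximal_matching(sample)                        # round 2: match sample
    matched = {x for e in m1 for x in e}
    leftover = [(u, v) for u, v in edges                 # induced subgraph on
                if u not in matched and v not in matched]  # unmatched vertices
    m2 = maximal_matching(leftover)                      # round 3: patch
    return m1 + m2
```

The output is maximal for any sampling probability: every edge either touches a vertex matched in round 2, or survives to the leftover graph, where the greedy pass covers it.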
• 77. Correctness
  • Consider the last step of the algorithm
  • All unmatched vertices are placed on a single machine
  • Every unmatched vertex is either matched at the last step or has only matched neighbors
• 80. Bounding the rounds
  Three rounds:
  • Sample the edges
  • Find a matching on a single machine for the sampled edges
  • Find a matching for the unmatched vertices
• 83. Bounding the memory
  • Each edge is sampled with probability p = 10 log n / n^{c/2}
  • By a Chernoff bound, the sampled graph has size Õ(n^{1+c/2}) with high probability
  • Can we bound the size of the induced subgraph on the unmatched vertices?
• 84. Bounding the memory
  • For a fixed induced subgraph with at least n^{1+c/2} edges, the probability that none of its edges is sampled is:
    (1 − 10 log n / n^{c/2})^{n^{1+c/2}} ≤ exp(−10 n log n)
• 85. Bounding the memory
  • Union bounding over all 2^n induced subgraphs shows that at least one edge is sampled from every dense subgraph with probability at least 1 − 2^n exp(−10 n log n) ≥ 1 − 1/n^{10}
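Written out, the chain of inequalities behind these two slides is:

```latex
\Pr[\text{no edge of a fixed dense subgraph } S \text{ is sampled}]
  = \left(1 - \frac{10\log n}{n^{c/2}}\right)^{|E(S)|}
  \le \exp\!\left(-\frac{10\log n}{n^{c/2}} \cdot n^{1+c/2}\right)
  = e^{-10 n \log n},
```

using 1 − x ≤ e^{−x} and |E(S)| ≥ n^{1+c/2}. The union bound over the at most 2^n induced subgraphs then gives failure probability 2^n e^{−10 n log n} ≤ e^{n − 10 n log n} ≤ n^{−10} for n ≥ 2, so w.h.p. every induced subgraph on unmatched vertices has fewer than n^{1+c/2} edges and fits on one machine.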
• 87. Using less memory
  • Can we run the algorithm with less than Õ(n^{1+c/2}) memory?
  • We can amplify our sampling technique!
• 92. Using less memory
  • Sample as many edges as fit in memory
  • Find a matching
  • If the edges between unmatched vertices fit on a single machine, find a matching on those vertices
  • Otherwise, recurse on the unmatched nodes
  • With n^{1+ε} memory, each iteration removes a factor of n^{ε} edges, resulting in O(c/ε) rounds
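A sequential sketch of this recursing variant (the memory budget and helper names are illustrative; sampling and matching steps that would be separate rounds run here in one loop):

```python
import random

def greedy_matching(edges, matched):
    """Greedily extend a matching, skipping already-matched vertices."""
    out = []
    for u, v in edges:
        if u not in matched and v not in matched:
            matched.update((u, v))
            out.append((u, v))
    return out

def iterated_matching(edges, memory):
    """Sample what fits in memory, match it, drop covered edges, repeat."""
    matched, matching = set(), []
    live = list(edges)
    while live:
        if len(live) <= memory:            # base case: everything fits
            matching += greedy_matching(live, matched)
            break
        p = memory / len(live)             # sample roughly `memory` edges
        sample = [e for e in live if random.random() < p]
        matching += greedy_matching(sample, matched)
        live = [(u, v) for u, v in live    # recurse on still-unmatched nodes
                if u not in matched and v not in matched]
    return matching
```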
• 95. Parallel computation power
  • The maximal matching algorithm does not use parallelization
  • We use a single machine in every step
  • When is parallelization used?
• 97. Maximum weighted matching
  [diagram: SPLIT THE GRAPH — a weighted graph is split by edge weight into subgraphs]
• 99. Maximum weighted matching
  [diagram: MATCHINGS — a maximal matching is computed in each weight-class subgraph]
• 101. Maximum weighted matching
  [diagram: SCAN THE EDGES SEQUENTIALLY — the per-class matchings are merged greedily]
• 102. Bounding the rounds and memory
  Four rounds:
  • Splitting the graph and running the maximal matching: 3 rounds and Õ(n^{1+c/2}) memory
  • Computing the final solution: 1 round and O(n log n) memory
• 104. Approximation guarantee (sketch)
  • An edge in the solution can block at most 2 edges of smaller weight in each subgraph
  • We lose a factor of 2 because we do not consider the weights
• 105. Other algorithms
  Based on the same intuition:
  • 2-approx for vertex cover
  • 3/2-approx for edge cover
• 109. Minimum cut
  • Partitioning does not work, because we lose structural information
  • Sampling does not seem to work either
  • We can use the first steps of Karger's algorithm as a filtering technique
  • The random choices made in the early rounds succeed with high probability, whereas the later rounds have a much lower probability of success
• 113. Minimum cut
  [diagram: RANDOM WEIGHT — each edge receives a random weight;
   COMPRESS EDGES WITH SMALL WEIGHT — low-weight edges are contracted]
  From Karger, at least one resulting graph will contain a min cut w.h.p.
• 115. Minimum cut
  [diagram: COMPUTE MINIMUM CUT on each contracted graph]
  We will find the minimum cut w.h.p.
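A single-machine caricature of the contraction idea (plain repeated Karger contraction; the helper names are mine, and this sequential sketch omits the parallel random-weight filtering structure):

```python
import random

def contract_until(nodes, edges, target):
    """Karger contraction: merge endpoints of random edges until only
    `target` super-nodes remain, then return the crossing edges."""
    label = {v: v for v in nodes}
    def find(v):
        while label[v] != v:
            label[v] = label[label[v]]
            v = label[v]
        return v
    live, pool = len(nodes), list(edges)
    while live > target and pool:
        u, v = pool.pop(random.randrange(len(pool)))
        ru, rv = find(u), find(v)
        if ru != rv:                       # skip self-loops of contracted nodes
            label[ru] = rv
            live -= 1
    return [(find(u), find(v)) for u, v in edges if find(u) != find(v)]

def min_cut(nodes, edges, trials=200):
    """Repeat contraction down to 2 super-nodes; keep the smallest cut seen."""
    best = len(edges)
    for _ in range(trials):
        best = min(best, len(contract_until(nodes, edges, 2)))
    return best
```

Each single trial succeeds with probability at least 2/(n(n−1)), so repetition drives the failure probability down, matching the slide's point that early contraction steps are the reliable ones.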
• 116. Empirical result (matching)
  [chart: axis labels garbled in extraction; the figure appears to plot the parallel matching algorithm's running time against the edge sampling probability]
• 117. Open problems
• 118. Open problems 1
  • Maximum matching
  • Shortest path
  • Dynamic programming
• 119. Open problems 2
  • Algorithms for sparse graphs
  • Does computing connected components require more than 2 rounds?
  • Lower bounds
• 120. Thank you!