Making Static Pivoting Scalable and Dependable
                          Ph.D. Dissertation Talk


                               E. Jason Riedy
                               jason@acm.org

                               EECS Department
                        University of California, Berkeley
  Committee: Dr. James Demmel (chair), Dr. Katherine Yelick, Dr. Sanjay Govindjee


                            17 December, 2010
Outline


1   Introduction

2   Solving Ax = b dependably

3   Extending dependability to static pivoting

4   Distributed matching for static pivoting

5   Summary




     Jason Riedy (UCB)          Static Pivoting   17 Dec, 2010   2 / 59
Motivation: Ever Larger Ax = b

Systems Ax = b are growing larger, more difficult
   Omega3P: n = 7.5 million with τ = 300 million entries
   Quantum Mechanics: precondition with blocks of dimension
   200-350 thousand
   Large barrier-based optimization problems: Many solves, similar
   structure, increasing condition number

   Huge systems are generated, solved, and analyzed automatically.
   Large, highly unsymmetric systems need scalable parallel solvers.
   Low-level routines: No expert in the loop!



Motivation: Solving Ax = b better


             Many people work to solve Ax = b faster.
            Today we start with how to solve it better.
                     Better enables faster.

   Use extra floating-point precision within iterative refinement to
   obtain a dependable solution, adding $O(n^2)$ work after an $O(n^3)$
   factorization.
   Accelerate sparse factorization through static pivoting,
   decoupling symbolic, numeric phases.
   Refine the perturbed solution without needing extra triangular
   solves for condition estimation.


Contributions
Iterative refinement
    Extend iterative refinement to provide small forward errors
    dependably (to be defined)
    Set and use a methodology to demonstrate dependability
    Show that condition estimation (expensive for sparse systems) is
    not necessary for obtaining a dependable solution

Static pivoting
    Improve static pivoting heuristics
    Demonstrate that an approximate maximum weight bipartite
    matching is faster and just as accurate
    Develop a memory-scalable distributed memory auction
    algorithm for static pivoting
Defining “dependable”

A dependable solver for Ax = b returns a result x with small error
often enough that you expect success with a small error, and clearly
signals results that likely contain large errors.

   True error            Difficulty   Alg. reports   Likelihood
   O(mach. precision)    not bad      success        Very likely
   O(mach. precision)    not bad      failure        Somewhat rare
   larger                not bad      success        (not yet seen)
   larger                not bad      failure        Practically certain
   O(mach. precision)    difficult    success        Whenever feasible
   O(mach. precision)    difficult    failure        Practically certain
   larger                difficult    success        (not yet seen)
   larger                difficult    failure        Very likely


Introducing the errors and targets
[Figure: diagram relating the system (A, b), the computed solution $y_1$, and the true solution $x = A^{-1} b$. Two density plots of Error versus Difficulty, shaded by percent of systems: "LU: Small backward error" and "LU: Error in y ∝ difficulty".]
 Refined: Accepted with small errors in y, or flagged with unknown error.

[Figure: diagram relating the system (A, b), the refined iterates $y_k$, and the true solution $x = A^{-1} b$. Density plots of Error versus Difficulty in two panels, Successful and Flagged, shaded by % of systems.]
Iterative refinement

                        Newton’s method applied to Ax = b.

Repeat until done:
  1  Compute the residual $r_i = b - A y_i$ using extra precision $\varepsilon_r$.
  2  Solve $A\,dy_i = r_i$ for the correction using working precision $\varepsilon_w$.
  3  Increment $y_{i+1} = y_i + dy_i$, maintaining $y$ to extra precision $\varepsilon_x$.
Precisions:
Working precision $\varepsilon_w$: The precision used for storing (and factoring)
             A: IEEE 754 single ($\varepsilon_w = 2^{-24}$), double ($\varepsilon_w = 2^{-53}$), etc.
Residual precision $\varepsilon_r$: At least double working precision, $\varepsilon_r \le \varepsilon_w^2$
Solution precision $\varepsilon_x$: At least double working precision, $\varepsilon_x \le \varepsilon_w^2$
The latter two may be implemented in software.
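
The three steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the talk's implementation: the working precision is Python's `float` (IEEE 754 double), the extra precisions are stood in for by exact rationals from `fractions.Fraction` (which trivially satisfy $\varepsilon_r, \varepsilon_x \le \varepsilon_w^2$), and the function names `lu_factor`, `lu_solve`, `refine` and the fixed cap `max_iter` are hypothetical.

```python
from fractions import Fraction


def lu_factor(A):
    """LU factorization with partial pivoting in working (float) precision."""
    n = len(A)
    LU = [list(map(float, row)) for row in A]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        if LU[p][k] == 0.0:
            raise ZeroDivisionError("matrix is singular to working precision")
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, perm


def lu_solve(LU, perm, rhs):
    """Solve A z = rhs in working precision using the stored factors."""
    n = len(LU)
    z = [float(rhs[p]) for p in perm]
    for i in range(n):                    # forward substitution, unit lower L
        for j in range(i):
            z[i] -= LU[i][j] * z[j]
    for i in reversed(range(n)):          # back substitution with U
        for j in range(i + 1, n):
            z[i] -= LU[i][j] * z[j]
        z[i] /= LU[i][i]
    return z


def refine(A, b, max_iter=10):
    """Iterative refinement; returns y as a list of exact Fractions."""
    LU, perm = lu_factor(A)
    A_x = [[Fraction(a) for a in row] for row in A]
    b_x = [Fraction(v) for v in b]
    y = [Fraction(v) for v in lu_solve(LU, perm, b)]      # initial solution y_1
    for _ in range(max_iter):             # assumed stopping rule: fixed cap
        # Step 1: residual r_i = b - A y_i in extra precision (exact here).
        r = [bi - sum(a * yj for a, yj in zip(row, y))
             for row, bi in zip(A_x, b_x)]
        if not any(r):
            break
        # Step 2: correction dy_i from the working-precision factors.
        dy = lu_solve(LU, perm, r)
        # Step 3: update y_{i+1} = y_i + dy_i, kept in extra precision.
        y = [yi + Fraction(d) for yi, d in zip(y, dy)]
    return y
```

The factorization is computed once and reused for every correction, so each pass costs only a residual and two triangular solves.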

Definitions



   Errors:
          Backward (relative) error
          Forward (relative) error
   Difficulty:
          Condition numbers: sensitivity to perturbations
          Element growth: error from factorization




Error measures: Backward error
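         How close is the nearest system $(\tilde A, \tilde b)$ satisfying $\tilde A y_1 = \tilde b$?

[Figure: diagram relating the system (A, b), the computed solution $y_1$, and the true solution $x = A^{-1} b$.]

Three ways, given $r_1 = b - A y_1$:
  Normwise:       $\dfrac{\|r_1\|_\infty}{\|A\|_\infty \|y_1\|_\infty + \|b\|_\infty}$
  Columnwise:     $\dfrac{\|r_1\|_\infty}{(\max |A|)\,|y_1| + \|b\|_\infty}$
  Componentwise:  $\left\| \dfrac{|r_1|}{|A|\,|y_1| + |b|} \right\|_\infty$
Note: Elementwise division, 0/0 = 0, and max produces a row vector (of column maxima).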
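
As a rough sketch of the three measures, the function below evaluates them for a small dense system stored as nested lists. The name `backward_errors` is hypothetical, and the residual here is formed in ordinary working precision, which is an assumption made only to keep the example short.

```python
def backward_errors(A, y, b):
    """Return (normwise, columnwise, componentwise) backward errors of y."""
    n = len(A)
    r = [b[i] - sum(A[i][j] * y[j] for j in range(n)) for i in range(n)]

    def norm_inf(v):
        return max(abs(t) for t in v)

    def div(p, q):
        # Elementwise division with the convention 0/0 = 0.
        return 0.0 if p == 0 and q == 0 else p / q

    norm_A = max(sum(abs(a) for a in row) for row in A)   # infinity norm of A
    normwise = norm_inf(r) / (norm_A * norm_inf(y) + norm_inf(b))

    # (max |A|) is the row vector of column maxima; times |y| gives a scalar.
    col_max = [max(abs(A[i][j]) for i in range(n)) for j in range(n)]
    columnwise = norm_inf(r) / (
        sum(c * abs(t) for c, t in zip(col_max, y)) + norm_inf(b)
    )

    denom = [sum(abs(A[i][j]) * abs(y[j]) for j in range(n)) + abs(b[i])
             for i in range(n)]
    componentwise = max(div(abs(r[i]), denom[i]) for i in range(n))
    return normwise, columnwise, componentwise
```

For example, with A = [[2, 0], [0, 4]], y = [1, 1], and b = [2, 5], the residual is [0, 1], so the normwise and componentwise errors are both 1/9 and the columnwise error is 1/11.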
Error measures: Forward error
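                              How close is $y_1$ to $x$?

Two ways and two measuring sticks:

                   Relative to $x$                                       Relative to $y_1$
  Normwise:        $\dfrac{\|y_1 - x\|_\infty}{\|x\|_\infty}$            $\dfrac{\|y_1 - x\|_\infty}{\|y_1\|_\infty}$
  Componentwise:   $\left\| \dfrac{y_1 - x}{x} \right\|_\infty$          $\left\| \dfrac{y_1 - x}{y_1} \right\|_\infty$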
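
A small sketch makes the difference between the four measures concrete. The function name `forward_errors` is hypothetical, and reusing the 0/0 = 0 convention for the componentwise quotients is an assumption carried over from the backward-error definitions.

```python
def forward_errors(y, x):
    """Return (normwise rel. x, normwise rel. y,
               componentwise rel. x, componentwise rel. y)."""
    d = [yi - xi for yi, xi in zip(y, x)]

    def norm_inf(v):
        return max(abs(t) for t in v)

    def div(p, q):
        # Assumed convention: 0/0 = 0; a nonzero error over zero is infinite.
        if p == 0 and q == 0:
            return 0.0
        return abs(p / q) if q != 0 else float("inf")

    normwise_x = div(norm_inf(d), norm_inf(x))
    normwise_y = div(norm_inf(d), norm_inf(y))
    componentwise_x = max(div(di, xi) for di, xi in zip(d, x))
    componentwise_y = max(div(di, yi) for di, yi in zip(d, y))
    return normwise_x, normwise_y, componentwise_x, componentwise_y
```

With x = [1, 100] and y = [1.5, 100], the normwise errors are both 0.005 because the large component dominates the norm, while the componentwise error relative to x is 0.5: the small component is wrong by half its size.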



Error sensitivity: Conditioning
             How sensitive is y1 to perturbations in A and b?


          forward error ≤ condition number × backward error

Each combination has a condition number. We choose two for use in
our difficulty measure.
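
One way to see the normwise instance of this bound, using only $Ax = b$ and the residual $r_1 = b - A y_1$ (labelling the first factor as a condition number is this sketch's reading, not a definition taken from the slide):

```latex
\begin{aligned}
A x = b,\quad r_1 = b - A y_1
  &\;\Longrightarrow\; y_1 - x = -A^{-1} r_1, \\
\|y_1 - x\|_\infty
  &\le \|A^{-1}\|_\infty \, \|r_1\|_\infty \\
  &= \underbrace{\|A^{-1}\|_\infty \left( \|A\|_\infty \|y_1\|_\infty + \|b\|_\infty \right)}_{\text{condition-number factor}}
     \times
     \underbrace{\frac{\|r_1\|_\infty}{\|A\|_\infty \|y_1\|_\infty + \|b\|_\infty}}_{\text{normwise backward error}} .
\end{aligned}
```

Dividing both sides by $\|x\|_\infty$ or $\|y_1\|_\infty$ turns the left-hand side into one of the relative normwise forward errors.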
Difficulty: condition number × element growth

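   Condition number:
   Backward error:
               $\kappa(A^{-1}) = \kappa(A) = \|A^{-1}\|_\infty \|A\|_\infty$
   Normwise forw. err.:
               $\kappa(A, x, b) = \|A^{-1}\|_\infty \left( \|A\|_\infty \|x\|_\infty + \|b\|_\infty \right)$
   Componentwise forw. err.:
               $\operatorname{ccond}(A, x, b) = \left\| \, |A^{-1}| \left( |A|\,|x| + |b| \right) \right\|_\infty$
   Element growth, est. $\delta A_i$ in $(A + \delta A_i) y = b$:
               $|\delta A_i| \le 3 n_d\, |L|\,|U| \le p(n_d)\, g\, 1_r \max |A|$
   We use a col.-scaling-indep. expression allowing $|L| > 1$,
               $g_c = \max_j \dfrac{\left( \max_{1 \le k \le j} \max_i |L|(i,k) \right) \cdot \left( \max_i |U|(i,j) \right)}{\max_i |A|(i,j)}$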
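
The quantities above can be evaluated directly for a tiny example. This is a sketch only: the helper names `condition_numbers` and `growth_gc` are hypothetical, the explicit inverse `A_inv` is passed in for illustration (forming it is not something one would do for a large sparse system), and every column of A is assumed to contain a nonzero entry.

```python
def inf_norm_vec(v):
    return max(abs(t) for t in v)


def inf_norm_mat(M):
    return max(sum(abs(t) for t in row) for row in M)


def condition_numbers(A, A_inv, x, b):
    """Return (kappa(A), kappa(A, x, b), ccond(A, x, b)) in the infinity norm."""
    n = len(A)
    kappa = inf_norm_mat(A_inv) * inf_norm_mat(A)
    kappa_axb = inf_norm_mat(A_inv) * (
        inf_norm_mat(A) * inf_norm_vec(x) + inf_norm_vec(b)
    )
    # inner = |A| |x| + |b|, then ccond = || |A^{-1}| inner ||_inf
    inner = [sum(abs(A[i][j]) * abs(x[j]) for j in range(n)) + abs(b[i])
             for i in range(n)]
    ccond = max(sum(abs(A_inv[i][j]) * inner[j] for j in range(n))
                for i in range(n))
    return kappa, kappa_axb, ccond


def growth_gc(L, U, A):
    """Column-scaling-independent element growth g_c from factors L and U."""
    n = len(A)
    g = 0.0
    l_max = 0.0   # running max over columns k <= j of max_i |L(i,k)|
    for j in range(n):
        l_max = max(l_max, max(abs(L[i][j]) for i in range(n)))
        u_max = max(abs(U[i][j]) for i in range(n))
        a_max = max(abs(A[i][j]) for i in range(n))   # assumed nonzero
        g = max(g, l_max * u_max / a_max)
    return g
```

For A = [[0.5, 1], [1, 1]] factored without pivoting, L = [[1, 0], [2, 1]] and U = [[0.5, 1], [0, -1]], which gives $g_c = 2$. With x = [1, 1] and b = [1.5, 2], the three condition numbers are 8, 16, and 14.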




Dense test systems
   30 × 30 single, double, complex, and double complex:
            250k matrices × 4 right-hand sides = 1M test systems
   Size chosen to sample ill-conditioned region well
   Generated as in Demmel, et al., plus b → x
                   $\kappa_\infty(A) = \|A^{-1}\|_\infty \|A\|_\infty$

[Figure: histograms of percent of population versus Difficulty, one panel each for Single, Double, Complex, and Double Complex.]
         $\kappa(A, x, b) = \|A^{-1}\|_\infty \left( \|A\|_\infty \|x\|_\infty + \|b\|_\infty \right)$

[Figure: histograms of percent of population versus Difficulty, one panel each for Single, Double, Complex, and Double Complex.]
          $\operatorname{ccond}(A, x, b) = \left\| \, |A^{-1}| \left( |A|\,|x| + |b| \right) \right\|_\infty$

[Figure: histograms of percent of population versus Difficulty, one panel each for Single, Double, Complex, and Double Complex.]
Results: Dependable errors
[Figure: grid of density plots of Error versus Difficulty, shaded by % of systems. Columns: nberr, colberr, cberr, nferr, nferrx, cferr, cferrx. Rows: Converged, No Progress, Unstable, Iteration Limit.]
How?

[Figure: density plots of Error versus Difficulty for cberr and cferr, shaded by % of systems.]




         Carry the intermediate soln. $y_i$ to twice the working precision.
         Refine the backward error down to nearly $\varepsilon_w^2$.
         By “forward error ≤ conditioning × backward error”, the
         forward error for well-enough conditioned problems is nearly $\varepsilon_w$.
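
The arithmetic behind these bullets, written out as a sketch. The single-precision value $\varepsilon_w = 2^{-24}$ comes from the precision definitions in the talk; the conditioning value $2^{20}$ is a hypothetical example.

```latex
\begin{aligned}
\text{backward error} &\approx \varepsilon_w^2, \\
\text{forward error} &\le \text{conditioning} \times \text{backward error}
   \approx \text{conditioning} \times \varepsilon_w^2, \\
\text{conditioning} \le \frac{1}{\varepsilon_w}
   &\;\Longrightarrow\; \text{forward error} \lesssim \varepsilon_w. \\[4pt]
\varepsilon_w = 2^{-24},\ \text{conditioning} = 2^{20}
   &\;\Longrightarrow\; \text{forward error} \lesssim 2^{20} \cdot 2^{-48} = 2^{-28} < 2^{-24}.
\end{aligned}
```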
Results: Comparison with xGESVXX

               Precision         Accepted                 Rejected
                                 well-cond.  ill-cond.    well-cond.  ill-cond.
               Single            79%         15%          1%          5%
               Single complex    76%         19%          1%          4%
               Double            87%          9%          1%          5%
               Double complex    85%         11%          1%          3%


   Accepted, ill-conditioned systems are those gained by our routine
   that xGESVXX rejects.
   Rejected, well-conditioned systems are those lost by our routine
   but accepted by xGESVXX.


Results: Iteration counts, single precision
[Figure: grid of density plots of # Iterations versus Difficulty, shaded by % of systems. Columns: nberr, colberr, cberr, ndx, cdx. Rows: Converged, No Progress, Unstable, Iteration Limit.]

                                                                  Set limit at five.
Results: Iteration counts, single complex precision
[Figure: grid of density plots of # Iterations versus Difficulty, shaded by % of systems. Columns: nberr, colberr, cberr, ndx, cdx. Rows: Converged, No Progress, Unstable, Iteration Limit.]

                                                                   Set limit at seven.
Results: Iteration counts, double precision
[Figure: grid of density plots of # Iterations versus Difficulty, shaded by % of systems. Columns: nberr, colberr, cberr, ndx, cdx. Rows: Converged, No Progress, Unstable, Iteration Limit.]

                                                                 Set limit at ten.
Results: Iteration counts, double complex precision
[Figure: histograms of iteration counts. Columns: nberr, colberr, cberr, ndx, cdx. Rows: Converged, No Progress, Unstable, Iteration Limit. x-axis: Difficulty (2^0 to 2^60); y-axis: # Iterations (5 to 30); shading: % of systems (0.5% to 3.5%).]

Set limit at 15.
Static pivoting

    If a pivot $|A(j,j)| < T$, perturb it up to $T$ by adding

        $\operatorname{sign}(A(j,j)) \cdot (T - |A(j,j)|)$.

    Forcibly increases backward error, decreases element growth.
    In sparse systems, few updates should occur to an entry.
    Large diagonal entries should remain large...

Thresholding heuristics
    SuperLU:            $T = \gamma \cdot \|A\|_1$
    column-relative:    $T = \gamma \cdot \max |A(:,j)|$
    diagonal-relative:  $T = \gamma \cdot |A(j,j)|$
    with $\gamma = 2^{-26} \approx \sqrt{\varepsilon_w}$, $2^{-38}$, or $2^{-43} = 2^{10} \varepsilon_w$
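The perturbation rule and the three thresholds can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the working precision is taken to be IEEE double, an exactly zero pivot is pushed in the direction of its sign bit, and `diag_j` in the diagonal-relative rule is assumed to be the original diagonal entry.

```python
import math

EPS_W = 2.0 ** -53  # assumption: working precision is IEEE double


def perturb_pivot(pivot, threshold):
    """Return (new_pivot, amount_added) for one diagonal pivot."""
    if abs(pivot) >= threshold:
        return pivot, 0.0
    # copysign uses the sign bit, so +0.0 is pushed up to +threshold.
    delta = math.copysign(threshold - abs(pivot), pivot)
    return pivot + delta, delta


def thresholds(col_j, diag_j, norm1_A, gamma):
    """The three candidate thresholds T for column j (names are illustrative)."""
    return {
        "superlu": gamma * norm1_A,
        "column-relative": gamma * max(abs(v) for v in col_j),
        "diagonal-relative": gamma * abs(diag_j),
    }


# The smallest perturbation level from the slides: gamma = 2^-43 = 2^10 * eps_w.
gamma = 2.0 ** 10 * EPS_W
print(thresholds([1.0, -4.0, 2.0], 2.0, 8.0, gamma))
print(perturb_pivot(-0.25, 1.0))
```

Returning the amount added keeps the maximum perturbation available, which is the quantity the "by max. perturbation amount" plots are organized around.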
Sparse test systems

     Matrices are from the UF Collection, chosen from existing
     comparisons between SuperLU, MUMPS, and UMFPACK.
           Wide range of conditioning and numerical scaling
     Compute “True” solutions using a doubled-double-extended
     factorization and quad-double-extended refinement with a
     modified TAUCS.
     Refinement uses LAPACK-style numerical scaling throughout,
     but the test systems are generated in the matrix’s given scaling.
     Also tested on singular systems; no solutions accepted.
At some point, plan on feeding the “true” solutions into the UF
Collection...


Sparse normwise conditioning


[Figure: histogram of the sparse test systems by Difficulty. x-axis: Difficulty (2^10 to 2^50); y-axis: Percent of population (0% to 8%).]
Sparse componentwise conditioning
[Figure: histogram of the sparse test systems by Difficulty. x-axis: Difficulty (2^20 to 2^60); y-axis: Percent of population (0% to 8%).]
Results: SuperLU perturbation heuristic
       Before refinement, by max. perturbation amount
[Figure: error histograms. Columns: nberr, colberr, cberr, nferr, nferrx, cferr, cferrx. Rows (max. perturbation): 2^10 * eps, 2^-12 * sqrt(eps), sqrt(eps). x-axis: Difficulty (2^0 to 2^60); y-axis: Error / sqrt(max row deg.) (2^0 down to 2^-60); shading: % of systems (0.1% to 3.2%).]
Results: Column-relative perturbation heuristic
       Before refinement, by max. perturbation amount
[Figure: error histograms. Columns: nberr, colberr, cberr, nferr, nferrx, cferr, cferrx. Rows (max. perturbation): 2^10 * eps, 2^-12 * sqrt(eps), sqrt(eps). x-axis: Difficulty (2^0 to 2^60); y-axis: Error / sqrt(max row deg.) (2^0 down to 2^-60); shading: % of systems (0.1% to 3.2%).]
Results: Diagonal-relative perturbation heuristic
       Before refinement, by max. perturbation amount
[Figure: error histograms. Columns: nberr, colberr, cberr, nferr, nferrx, cferr, cferrx. Rows (max. perturbation): 2^10 * eps, 2^-12 * sqrt(eps), sqrt(eps). x-axis: Difficulty (2^0 to 2^60); y-axis: Error / sqrt(max row deg.) (2^0 down to 2^-60); shading: % of systems (0.1% to 3.2%).]
Results: SuperLU perturbation heuristic
               After refinement, with $\gamma = 2^{-43} = 2^{10} \varepsilon_w$
Results: Column-relative perturbation heuristic
               After refinement, with $\gamma = 2^{-43} = 2^{10} \varepsilon_w$
Results: Diagonal-relative perturbation heuristic
               After refinement, with $\gamma = 2^{-43} = 2^{10} \varepsilon_w$
Results
        Level and heuristic                                Result
                                         Trust both   Trust nwise   Reject
        $2^{-43} = 2^{10} \cdot \varepsilon_f$
          SuperLU                             42.9%          8.0%    49.0%
          Column-relative                     55.7%          5.7%    38.6%
          Diagonal-relative                   55.8%          5.9%    38.3%
        $2^{-38} \approx 2^{-12} \cdot \sqrt{\varepsilon_f}$
          SuperLU                             36.6%          6.7%    56.6%
          Column-relative                     52.4%          6.5%    41.2%
          Diagonal-relative                   53.7%          7.2%    39.1%
        $2^{-26} \approx \sqrt{\varepsilon_f}$
          SuperLU                             32.4%          4.0%    63.6%
          Column-relative                     42.2%          4.2%    53.6%
          Diagonal-relative                   47.4%          4.7%    47.9%
Sparse Matrix to Bipartite Graph to Pivots
[Figure: a 4 × 4 sparse matrix, its bipartite graph on Rows 1-4 and Cols 1-4, and the resulting pivot choice. The matching pairs Col 1 with Row 2, Col 2 with Row 3, Col 3 with Row 1, and Col 4 with Row 4.]

Bipartite model
        Each row and column is a vertex.
        Each explicit entry is an edge.
        Want to choose “largest” entries for pivots.
        Maximum weight complete bipartite matching:
                           linear assignment problem
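As a small illustration of the bipartite model, the sketch below turns explicit matrix entries into log2 benefits and finds the maximum-weight perfect matching by exhaustive search. Exhaustive search is only workable for tiny examples and is not the method discussed in the talk. The 4 × 4 entry values are made up so that the optimum reproduces the column-to-row pairing listed above (0-based in the code), and explicit entries are assumed to be nonzero.

```python
import math
from itertools import permutations


def benefits(entries, n):
    """Dense n x n benefit table: log2|a| for explicit entries, -inf elsewhere."""
    B = [[-math.inf] * n for _ in range(n)]
    for (i, j), a in entries.items():
        B[i][j] = math.log2(abs(a))  # assumes a != 0
    return B


def best_matching(B):
    """Exhaustive search; row_of[j] is the row matched to column j."""
    n = len(B)
    best, best_val = None, -math.inf
    for row_of in permutations(range(n)):
        val = sum(B[row_of[j]][j] for j in range(n))
        if val > best_val:
            best, best_val = row_of, val
    return best, best_val


# Hypothetical matrix: small diagonal entries, larger off-diagonal ones.
entries = {(0, 0): 1.0, (1, 1): 1.0, (2, 2): 1.0, (3, 3): 2.0,
           (1, 0): 8.0, (2, 1): 4.0, (0, 2): 16.0}
row_of, value = best_matching(benefits(entries, 4))
print(row_of, value)  # column j is matched to row row_of[j]
```

Using log2 |A| as the benefit makes the sum over a matching equal to the log of the product of the chosen pivots, so maximizing the sum favors a large product on the diagonal.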
Mathematical Form
“Just” a linear optimization problem:
    $B$: $n \times n$ matrix of benefits in $\mathbb{R} \cup \{-\infty\}$, often $c + \log_2 |A|$
    $X$: $n \times n$ permutation matrix: the matching
    $p_r, \pi_c$: dual variables, will be price and profit
    $1_r, 1_c$: unit entry vectors corresponding to rows, cols

Lin. assignment prob.:
    $\max_{X \in \mathbb{R}^{n \times n}} \operatorname{Tr} B^T X$
    subject to $X 1_c = 1_r$, $X^T 1_r = 1_c$, and $X \ge 0$.

Dual problem:
    $\min_{p_r, \pi_c} 1_r^T p_r + 1_c^T \pi_c$
    subject to $p_r 1_c^T + 1_r \pi_c^T \ge B$.
Mathematical Form (continued)
Dual problem, implicit form (the profits $\pi_c$ are eliminated):
    $\min_{p_r} 1_r^T p_r + \sum_{j \in C} \max_{i \in R} \bigl( B(i,j) - p_r(i) \bigr)$.
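A quick numerical check of the implicit dual may help. The sketch below evaluates the dual objective for given row prices, with each column's profit taken as its best value B(i, j) − p_r(i), and compares it with the primal value of a matching. The 2 × 2 benefit matrix and the price vectors are made-up examples, and the function names are hypothetical.

```python
def primal_value(B, row_of):
    """Tr(B^T X) for the matching that assigns row row_of[j] to column j."""
    return sum(B[row_of[j]][j] for j in range(len(B)))


def implicit_dual(B, prices):
    """Dual objective with profits eliminated; returns (value, profits)."""
    n = len(B)
    profits = [max(B[i][j] - prices[i] for i in range(n)) for j in range(n)]
    return sum(prices) + sum(profits), profits


B = [[4, 3],
     [3, 1]]                          # B[i][j]: row i, column j
print(primal_value(B, (1, 0)))        # the better of the two matchings
print(implicit_dual(B, [0, 0]))       # zero prices: an upper bound
print(implicit_dual(B, [1, 0]))       # prices that close the gap
```

Any price vector gives an upper bound on every matching's value (weak duality); the second price vector in the example attains the optimum.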
Do We Need a Special Method?
 The LAP:
    $\max_{X \in \mathbb{R}^{n \times n}} \operatorname{Tr} B^T X$
    subject to $X 1_c = 1_r$, $X^T 1_r = 1_c$, and $X \ge 0$.

 Standard form:
    $\min_x c^T x$
    subject to $Ax = 1_{r+c}$ and $x \ge 0$.
    $A$: $2n \times \tau$ vertex-edge matrix

   Network optimization kills simplex methods.
          (“Smoothed analysis” does not apply.)
   Interior point algs need to round the solution.
          (And need to solve Ax = b for a much larger A, although
          theoretically great in NC.)
   Combinatorial methods should be faster.
          (But unpredictable!)
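To make the standard form concrete, the following sketch builds the 2n × τ vertex-edge matrix from a list of explicit entries and tests whether an edge vector x satisfies Ax = 1 and x ≥ 0. The dense list-of-lists storage and the function names are assumptions chosen for readability; a real code would keep A sparse or implicit.

```python
def vertex_edge_matrix(n, edges):
    """edges: list of (row, col) explicit entries. Returns a 2n x tau 0/1 matrix."""
    A = [[0] * len(edges) for _ in range(2 * n)]
    for k, (i, j) in enumerate(edges):
        A[i][k] = 1        # row vertex i touches edge k
        A[n + j][k] = 1    # column vertex j touches edge k
    return A


def is_feasible(A, x):
    """True when x >= 0 and every vertex is covered exactly once (Ax = 1)."""
    if any(xk < 0 for xk in x):
        return False
    return all(sum(a * xk for a, xk in zip(row, x)) == 1 for row in A)


edges = [(0, 0), (0, 1), (1, 0), (1, 1)]   # a full 2 x 2 pattern, tau = 4
A = vertex_edge_matrix(2, edges)
print(is_feasible(A, [1, 0, 0, 1]))        # the diagonal matching
print(is_feasible(A, [1, 1, 0, 0]))        # row 0 used twice
```

Each column of A has exactly two ones, one for the edge's row vertex and one for its column vertex, which is the network structure the combinatorial methods exploit.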
Properties from Optimization


Complementary slackness
    $X \odot (p_r 1_c^T + 1_r \pi_c^T - B) = 0$   (elementwise product).

   If $(i,j)$ is in the matching ($X(i,j) \ne 0$), then
   $p_r(i) + \pi_c(j) = B(i,j)$.
   Used to choose matching edges and modify dual variables in
   combinatorial algorithms.
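Complementary slackness is easy to test directly. This sketch lists the matched pairs whose dual constraint is not tight and, separately, checks dual feasibility. The tolerance argument, the function names, and the 2 × 2 example values are illustrative assumptions.

```python
def slackness_violations(B, row_of, prices, profits, tol=0.0):
    """Matched (row, col) pairs with |p_r(i) + pi_c(j) - B(i, j)| > tol."""
    bad = []
    for j, i in enumerate(row_of):
        if abs(prices[i] + profits[j] - B[i][j]) > tol:
            bad.append((i, j))
    return bad


def dual_feasible(B, prices, profits):
    """True when p_r(i) + pi_c(j) >= B(i, j) for every entry."""
    n = len(B)
    return all(prices[i] + profits[j] >= B[i][j]
               for i in range(n) for j in range(n))


B = [[4, 3],
     [3, 1]]
prices, profits = [1, 0], [3, 2]
print(dual_feasible(B, prices, profits))
print(slackness_violations(B, (1, 0), prices, profits))   # optimal matching
print(slackness_violations(B, (0, 1), prices, profits))   # suboptimal matching
```

A feasible dual together with a matching that has no violations certifies that both are optimal.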
Properties from Optimization
Relaxed problem
Introduce a parameter µ, two interpretations:
     from a barrier function related to X ≥ 0, or
     from the auction algorithm (later).
Then
    $\operatorname{Tr} B^T X_* \le 1_r^T p_r + 1_c^T \pi_c \le \operatorname{Tr} B^T X_* + (n-1)\mu$,

or the computed dual value (and hence computed primal matching) is
within $(n-1)\mu$ of the optimal primal.
     Very useful for finding approximately optimal matchings.

Feasibility bound
Starting from zero prices:
    $p_r(i) \le (n-1)(\mu + \text{finite range of } B)$
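One way to read the (n − 1)µ bound is as a recipe for choosing µ. The short derivation below is a sketch under two labeled assumptions that are not stated on the slide: X denotes the computed matching, and in the first part every finite entry of B is an integer.

```latex
% Assumption: every finite entry of B is an integer, so every matching
% value Tr(B^T X) is an integer. X is the computed matching, X_* an optimal one.
\[
  0 \;\le\; \operatorname{Tr} B^T X_* - \operatorname{Tr} B^T X \;\le\; (n-1)\mu \;<\; 1
  \quad\text{whenever}\quad \mu < \frac{1}{n-1}.
\]
% Both traces are integers, so the gap must be 0 and X is itself optimal.
% Without integrality, a target gap \delta > 0 is met by choosing
\[
  \mu \;\le\; \frac{\delta}{n-1}.
\]
```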
Algorithms for Solving the LAP

   Goal: A parallel algorithm that justifies buying big machines.
   Acceptable: A distributed algorithm; matrix is on many nodes.
   Choices:
          Simplex or continuous / interior-point
                  Plain simplex blows up, network simplex difficult to parallelize.
                  Rounding for interior point often falls back on matching.
                  (Optimal IP algorithm: Goldberg, Plotkin, Shmoys, Tardos.
                  Needs factorization.)
          Augmenting-path based (MC64: Duff and Koster)
                  Based on depth- or breadth-first search.
                  Both are P-complete, inherently sequential (Greenlaw, Reif).
          Auctions (Bertsekas, et al.)
                  Only length-1 or -2 alternating paths; global sync for duals.


Auction Algorithms
   Discussion will be column-major.
   General structure:
      1   Each unmatched column finds the “best” row, places a bid.
                  The dual variable pr holds the prices.
                  The profit πc is implicit. (No significant FP errors!)
                  Each entry’s value: benefit B(i, j) − price p(i).
                  A bid maximally increases the price of the most valuable row.
      2   Bids are reconciled.
                  Highest proposed price wins, forms a match.
                  Loser needs to re-bid.
                  Some versions need tie-breaking; here least column.
      3   Repeat.
                  Eventually everyone will be matched, or
                  some price will be too high.
   Seq. implementation in ∼40–50 lines, can compete with MC64
   Some corner cases to handle...
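The structure above fits in a short sequential sketch. This is not the talk's implementation: the dict-per-column storage, the function name, and the error handling are assumptions made for illustration. Bids are reconciled immediately because only one column bids at a time, so no tie-breaking is needed. The give-up threshold follows the feasibility bound quoted in the talk, loosened from n − 1 to n as a guard against rounding at the boundary; treat that threshold as an assumption rather than a verified constant.

```python
import math
from collections import deque


def auction(cols, n, mu):
    """Match every column to a distinct row.

    cols[j] maps row index -> finite benefit B(i, j) for column j's explicit
    entries. Returns (row_of, prices); row_of[j] is the row won by column j.
    """
    prices = [0.0] * n
    row_of = [None] * n          # row currently held by each column
    col_of = [None] * n          # column currently holding each row
    vals = [b for col in cols for b in col.values()]
    # Assumed give-up threshold: n * (mu + finite range of B).
    limit = n * (mu + max(vals) - min(vals)) if vals else 0.0
    unmatched = deque(range(n))
    while unmatched:
        j = unmatched.popleft()
        best_i, best, second = None, -math.inf, -math.inf
        for i, b in cols[j].items():
            value = b - prices[i]            # benefit minus price
            if value > best:
                best_i, best, second = i, value, best
            elif value > second:
                second = value
        if best_i is None:                   # no entries, or all rows at inf
            raise ValueError("no perfect matching: column %d cannot bid" % j)
        if second == -math.inf:
            prices[best_i] = math.inf        # single usable row
        else:
            prices[best_i] += best - second + mu
            if prices[best_i] > limit:
                raise ValueError("no perfect matching: price bound exceeded")
        loser = col_of[best_i]
        if loser is not None:                # previous holder must re-bid
            row_of[loser] = None
            unmatched.append(loser)
        col_of[best_i], row_of[j] = j, best_i
    return row_of, prices


cols = [{0: 4.0, 1: 3.0}, {0: 3.0, 1: 1.0}]  # made-up 2 x 2 benefits, by column
print(auction(cols, 2, 0.25))
```

The profits never appear: each column's profit is implied by its best value at the current prices, which is why only the price vector is stored.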
The Bid-Finding Loop
For each unmatched column:
    [Figure: one column's row indices and row entries, each lined up against that row's price.]
        value = entry − price
        Save largest and second-largest.
        Bid price incr.: diff. in values

Differences from sparse matrix-vector products
    Not all columns, rows used every iteration (sparse matrix, sparse vector).
    Hence output price updates are scattered.
    More local work per entry
The Bid-Finding Loop
Little points
    Increase bid price by µ to avoid loops
           Needs care in floating-point for small µ.
    Single adjacent row → ∞ price
           Affects feasibility test, computing dual
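The bid-finding loop for a single column can be sketched over compressed-column arrays. The array names (`colptr`, `rowind`, `entries`) and the function name are hypothetical. The floating-point guard here simply raises when µ is too small to move the price; that is one possible response to the floating-point caveat, not necessarily what the talk's code does.

```python
import math


def find_bid(colptr, rowind, entries, prices, j, mu):
    """Scan column j; return (row, new_price), or None if no row is biddable."""
    best_row, best, second = None, -math.inf, -math.inf
    for k in range(colptr[j], colptr[j + 1]):
        value = entries[k] - prices[rowind[k]]   # value = entry - price
        if value > best:
            best_row, best, second = rowind[k], value, best
        elif value > second:
            second = value
    if best_row is None:
        return None
    if second == -math.inf:
        return best_row, math.inf                # single adjacent row
    new_price = prices[best_row] + (best - second) + mu
    if new_price <= prices[best_row]:
        # mu was absorbed by rounding; the auction could loop forever.
        raise ArithmeticError("mu is below the resolution of the price")
    return best_row, new_price


colptr, rowind = [0, 2, 4], [0, 1, 0, 1]         # made-up 2 x 2 pattern
entries = [4.0, 3.0, 3.0, 1.0]
print(find_bid(colptr, rowind, entries, [0.0, 0.0], 0, 0.25))
```

Unlike a sparse matrix-vector product, the loop reads prices only for the rows in this column and writes at most one price, so the updates are scattered.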
Termination

   Once a row is matched, it stays matched.
          A new bid may swap it to another column.
          The matching (primal) increases monotonically.
   Prices only increase.
          The dual does not change when a row is newly matched.
          But the dual may decrease when a row is taken.
          The dual decreases monotonically.
   Subtle part: If the dual doesn’t decrease. . .
          It’s ok. Can show the new edge begins an augmenting path that
          increases the matching or an alternating path that decreases the
          dual.



Successive Approximation (µ-scaling)


   Simple auctions aren’t really competitive with MC64.
   Start with a rough approximation (large µ) and refine.
   Called ε-scaling in the literature, but µ-scaling is better.
   Preserve the prices pr at each step, but clear the matching.
   Note: Do not clear matches associated with ∞ prices!
   Equivalent to finding diagonal scaling Dr ADc and matching
   again on the new B.
   Problem: Performance strongly depends on initial scaling.
   Also depends strongly on hidden parameters.
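The outer µ-scaling loop can be sketched independently of the auction itself. In the sketch below the single-µ solver is passed in as a callable with the assumed signature `solve(mu, prices, kept) -> (row_of, prices)`. The shrink factor of 4 is an arbitrary placeholder for one of the hidden parameters, and all names are hypothetical.

```python
import math


def mu_schedule(mu_start, mu_final, shrink=4.0):
    """Decreasing mu values, ending exactly at mu_final."""
    mus, mu = [], mu_start
    while mu > mu_final:
        mus.append(mu)
        mu /= shrink
    mus.append(mu_final)
    return mus


def scaled_auction(solve, n, mu_start, mu_final, shrink=4.0):
    """Run solve once per mu, carrying prices forward between rounds."""
    prices = [0.0] * n
    kept = {}                    # column -> row matches that must survive
    row_of = None
    for mu in mu_schedule(mu_start, mu_final, shrink):
        row_of, prices = solve(mu, prices, kept)
        # Clear the matching, except matches tied to infinite prices.
        kept = {j: i for j, i in enumerate(row_of) if prices[i] == math.inf}
    return row_of, prices


print(mu_schedule(1.0, 0.1))
```

Each round starts from the previous prices, which is what makes the later, small-µ rounds cheap when the earlier rounds were a good approximation.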
Sequential performance: Auction v. MC64
        Group             Name            Auction (s)      MC64 (s)     MC64 / Auction

        Bai               af23560                  0.025      0.017        0.68
     FEMLAB             poisson3Db                 0.014      0.040        2.74
      FIDAP                 ex11                   0.060      0.015        0.26
     GHS indef           cont-300                  0.007      0.019        2.89
     GHS indef            ncvxqp5                  0.338      0.794        2.35
      Hamm                scircuit                 0.048      0.024        0.50
     Hollinger           g7jac200                  0.355      0.817        2.30
      Mallya               lhr14                   0.044      0.026        0.60
  Schenk IBMSDS        3D 51448 3D                 0.031      0.010        0.33
  Schenk IBMSDS          matrix 9                  0.074      0.024        0.33
    Schenk ISEI          barrier2-4                0.291      0.044        0.15
      Vavasis             av41092                  5.462      3.595        0.66
       Zhao                Zhao2                   1.041      3.237        3.11

Sequential performance: Highly variable
          Group           Name             By col (s)    By row (s)   Row / Col

         Bai              af23560                0.025       0.028      1.13
      FEMLAB            poisson3Db               0.014       0.016      1.11
       FIDAP                ex11                 0.060       0.060      1.00
      GHS indef          cont-300                0.007       0.006      0.84
      GHS indef           ncvxqp5                0.338       0.318      0.94
       Hamm               scircuit               0.048       0.047      0.99
      Hollinger          g7jac200                0.355       0.339      0.95
       Mallya              lhr14                 0.044       0.065      1.47
   Schenk IBMSDS       3D 51448 3D               0.031       0.282      9.22
   Schenk IBMSDS         matrix 9                0.074       0.613      8.29
     Schenk ISEI         barrier2-4              0.291       0.193      0.66
       Vavasis            av41092                5.462       4.083      0.75
        Zhao               Zhao2                 1.041       0.609      0.58

Sequential performance: Highly variable
             Group        Name                Float (s)   Int (s)   Int / Float

            Bai           af23560                0.025    0.040         1.61
         FEMLAB         poisson3Db               0.015    0.016         1.08
          FIDAP             ex11                 0.060    0.029         0.49
         GHS indef       cont-300                0.007    0.006         0.91
         GHS indef        ncvxqp5                0.338    0.425         1.26
          Hamm            scircuit               0.048    0.016         0.34
         Hollinger       g7jac200                0.355    1.004         2.83
          Mallya           lhr14                 0.044    0.050         1.12
      Schenk IBMSDS    3D 51448 3D               0.031    0.020         0.66
      Schenk IBMSDS      matrix 9                0.074    0.066         0.89
        Schenk ISEI      barrier2-4              0.291    0.261         0.91
          Vavasis         av41092                5.462    5.401         0.99
           Zhao            Zhao2                 1.041    2.269         2.18

Approximately maximum matchings
                                         Terminal µ value
   Name                         0      5.96e-08 2.44e-04         5.00e-01
  af23560        Primal    1342850      1342850        1342850   1342670
                 Time(s)      0.14         0.05           0.03         0
                  ratio                    0.37           0.21      0.02
 poisson3Db      Primal    2483070      2483070        2483070   2483070
                 Time(s)      0.02         0.02           0.02      0.02
                  ratio                    1.01           1.04      1.07
  g7jac200       Primal    3533980      3533980        3533980   3533340
                 Time(s)      2.98         1.07           0.28      0.18
                  ratio                    0.36           0.09      0.06
  av41092        Primal    3156210      3156210        3156210   3155920
                 Time(s)     24.51         8.09           2.48      0.11
                  ratio                    0.33           0.10      0.00
   Zhao2         Primal     333891       333891         333891    333487
                 Time(s)      7.69         2.37           3.65      0.02
                  ratio                    0.31           0.47      0.00
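
The table reports results only, so here is a rough sketch of how µ can trade accuracy for time. It assumes the textbook forward auction (Bertsekas) on a small dense benefit matrix, with a single fixed µ rather than a µ that is lowered to a terminal value. The function name `auction` and the 3x3 `benefit` matrix are hypothetical, and this is not the sparse implementation timed above. Each unmatched row bids for its best column and raises that column's price by its bidding margin plus µ, so the final matching is within n·µ of the maximum weight.

```python
def auction(benefit, mu):
    """Forward auction on a dense square benefit matrix (needs mu > 0).

    Returns (match, total) where match[i] is the column assigned to row i.
    The total benefit is within n * mu of the optimum.
    """
    n = len(benefit)
    price = [0.0] * n            # current price of each column
    owner = [None] * n           # owner[j]: row holding column j
    match = [None] * n           # match[i]: column held by row i
    unmatched = list(range(n))
    while unmatched:
        i = unmatched.pop()
        # Net value of every column to row i at the current prices.
        values = [benefit[i][j] - price[j] for j in range(n)]
        best_j = max(range(n), key=values.__getitem__)
        second = max((v for j, v in enumerate(values) if j != best_j),
                     default=values[best_j])
        # Raise the price by the bidding margin plus mu.
        price[best_j] += values[best_j] - second + mu
        if owner[best_j] is not None:          # evict the previous holder
            match[owner[best_j]] = None
            unmatched.append(owner[best_j])
        owner[best_j], match[i] = i, best_j
    return match, sum(benefit[i][match[i]] for i in range(n))


# Hypothetical example data with integer weights.
benefit = [[10, 5, 1],
           [9, 8, 2],
           [1, 7, 6]]
# mu < 1/n gives an optimal matching when the weights are integers.
tight_match, tight_total = auction(benefit, mu=0.25)
# A large mu stops sooner but is only guaranteed to be within n * mu.
loose_match, loose_total = auction(benefit, mu=5.0)
```

A larger µ means fewer, coarser price updates, which is consistent with the shorter times and slightly smaller primal values in the rightmost column.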

Setting / Lowering Parallel Expectations

Performance scalability?
   Originally proposed (early 1990s) when
      cpu speed ≈ memory speed ≈ network speed ≈ slow.
   Now:
         cpu speed ≫ memory latency > network latency.
   The number of communication phases dominates matching
   algorithms (auction and others).
   Communication patterns are very irregular.
   Latency and software overhead are not improving...

Scaled back goal
       It suffices to not slow down much on distributed data.

Basic Idea: Run Local Auctions, Treat as Bids

   [Figure: matrix B sliced into pieces held by processors P1, P2, and P3.]

   Slice the matrix into pieces, run local auctions.
   The winning local bids are the slices’ bids.
   Merge... (“And then a miracle occurs...”)
   Need to keep some data in sync for termination.


   Practically memory scalable: Compact the local pieces.
   Have not experimented with simple SMP version.
          Sequential performance is limited by the memory system.
   Note: Could be useful for multicore w/local memory.
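
The slicing idea can be sketched as a sequential simulation. The slides leave the merge step open, so the rule used here is an assumption: each slice runs a single bidding round per communication phase (a simplification of a full local auction), keeps only its best bid per column, and the merge accepts the highest bid for each column. That amounts to the Jacobi variant of the auction algorithm. The names `local_bids` and `distributed_auction`, and the example data, are hypothetical.

```python
def local_bids(rows, benefit, price, match, mu):
    """One slice: every unmatched local row bids; keep the best bid per column."""
    n = len(price)
    best = {}                                    # column -> (bid price, row)
    for i in rows:
        if match[i] is not None:
            continue
        values = [benefit[i][j] - price[j] for j in range(n)]
        j1 = max(range(n), key=values.__getitem__)
        second = max((v for j, v in enumerate(values) if j != j1),
                     default=values[j1])
        bid = price[j1] + values[j1] - second + mu
        if j1 not in best or bid > best[j1][0]:
            best[j1] = (bid, i)
    return best


def distributed_auction(benefit, slices, mu):
    """Simulate slices bidding against shared prices; returns (match, phases)."""
    n = len(benefit)
    price, owner, match = [0.0] * n, [None] * n, [None] * n
    phases = 0
    while any(m is None for m in match):
        phases += 1                              # one communication phase
        merged = {}
        for rows in slices:                      # would run concurrently
            for j, bid in local_bids(rows, benefit, price, match, mu).items():
                if j not in merged or bid > merged[j]:
                    merged[j] = bid              # highest (price, row) wins
        for j, (bid, i) in merged.items():
            if owner[j] is not None:             # evict the previous holder
                match[owner[j]] = None
            price[j], owner[j], match[i] = bid, i, j
    return match, phases


# Hypothetical data: rows 0 and 1 live on one slice, row 2 on another.
benefit = [[10, 5, 1],
           [9, 8, 2],
           [1, 7, 6]]
match, phases = distributed_auction(benefit, slices=[[0, 1], [2]], mu=0.25)
```

The `phases` counter is the quantity of interest: every pass through the outer loop stands for one round of communication, and it is the number of such rounds, not their individual cost, that limits scalability.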


Speed-up?

   [Figure: speed-up (log scale, 10^-3 to 10^4) versus number of processors (5 to 20).]



Speed-up: A bit better measuring appropriately

   [Figure: speed-up relative to reducing to the root node (log scale, 10^-3 to 10^4) versus number of processors (5 to 20).]



Comparing distributed with reduce-to-root

   [Figure: speed-up (log scale, 10^-3 to 10^4) versus number of processors (2, 3, 4, 8, 12, 16, 24); series: To root, Dist.]



Iteration order still matters
   [Figure: time (s, log scale) versus number of processors for matrices av41092 and shyy161; series by direction: Row-major, Col-major.]



Many different speed-up profiles
   [Figure: time (s, log scale) versus number of processors (5 to 20) for matrices af23560, bmwcra_1, garon2, and stomach.]



So what happens in some cases?

   Matrix av41092 has one large strongly connected component.
          (The square blocks in a Dulmage-Mendelsohn decomposition.)
   The SCC spans all the processors.
   Every edge in an SCC is a part of some complete matching.
   Horrible performance from:
          starting along a non-max-weight matching,
          making it almost complete,
          then an edge-by-edge search for nearby matchings,
          requiring a communication phase almost per edge.
   Conjecture: This type of performance land-mine will affect any
   0-1 combinatorial algorithm.
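
For readers unfamiliar with the square Dulmage-Mendelsohn blocks, the following is a minimal sketch of one standard way to find them, assuming a perfect matching is already known. Build a directed graph on the rows with an edge from row i to the row matched to column j whenever entry (i, j) is a nonzero outside the matching. Its strongly connected components are the blocks. The function name `scc_blocks`, its arguments, and the 4x4 pattern are hypothetical, and the recursive Tarjan search is only suitable for small examples.

```python
def scc_blocks(pattern, match):
    """Return the strongly connected blocks of a matched square pattern.

    pattern[i] is the set of columns holding nonzeros in row i and
    match[i] is the column matched to row i (a perfect matching).
    """
    n = len(pattern)
    row_of = {match[i]: i for i in range(n)}
    # Edge i -> k when row i has an unmatched nonzero in row k's column.
    succ = [[row_of[j] for j in pattern[i] if j != match[i]]
            for i in range(n)]
    index, low, on_stack, stack, blocks = {}, {}, set(), [], []

    def visit(v):                       # Tarjan's algorithm, recursive form
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:          # v is the root of a component
            block = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                block.append(w)
                if w == v:
                    break
            blocks.append(sorted(block))

    for v in range(n):
        if v not in index:
            visit(v)
    return blocks


# Hypothetical pattern: rows 0 and 1 form a full 2x2 block, row 2 also
# touches column 0, and row 3 is isolated. The diagonal is the matching.
pattern = [{0, 1}, {0, 1}, {0, 2}, {3}]
blocks = scc_blocks(pattern, match=[0, 1, 2, 3])
```

In this toy pattern rows 0 and 1 form one block, so either of that block's two perfect matchings is feasible. A block that spans every processor gives the auction that same freedom at much larger scale.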


Improvements?
   Approximate matchings: Speeds up the sequential case,
   eliminating any “speed-up.”
   Rearranging deck chairs: few-to-few communication
          Build a directory of which nodes share rows: collapsed $BB^T$.
          Send only to/from those neighbors.
          Minor improvement over MPI_Allgatherv for a huge effort.
          Latency not a major factor...
   Improving communication may not be worth it...
          The real problem is the number of comm. phases.
          If the diagonal is the matching, everything is overhead.
          Or if there’s a large SCC...
   Another alternative: Multiple algorithms at once.
          Run Bora Uçar’s alg. on one set of nodes, auction on another,
          transposed auction on another, ...
          Requires some painful software engineering.
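
The few-to-few directory can be illustrated with a short sketch. It reads the "collapsed" product as follows, which is an interpretation rather than something the slides spell out: B is the processor-by-row incidence matrix, and two processors are neighbors when the corresponding off-diagonal entry of $BB^T$ is nonzero, that is, when they share at least one row. The name `neighbor_directory` and the example layout are hypothetical.

```python
def neighbor_directory(rows_touched):
    """Map each processor to the other processors that share a row with it.

    rows_touched[p] is the set of global row indices in which processor p
    holds entries. The result is the off-diagonal nonzero pattern of
    B B^T, where B is the processor-by-row incidence matrix.
    """
    holders = {}                                  # row -> processors on it
    for p, rows in enumerate(rows_touched):
        for r in rows:
            holders.setdefault(r, set()).add(p)
    neighbors = [set() for _ in rows_touched]
    for procs in holders.values():
        for p in procs:
            neighbors[p] |= procs - {p}
    return [sorted(s) for s in neighbors]


# Hypothetical layout with four processors and six rows.
directory = neighbor_directory([{0, 1, 2}, {2, 3}, {4, 5}, {5, 0}])
```

Each processor then exchanges prices only with the entries in its own list instead of taking part in a collective over all processors.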
Latency not a dominating factor


   [Figure: speed-up relative to reducing to the root node (log scale, 10^-1 to 10^3) for layouts 1x3, 3x1, 1x8, and 2x4 (number of nodes x number of procs. per node).]



So, Could This Ever Be Parallel?

   For a given matrix-processor layout, constructing a matrix
   requiring O(n) communication is pretty easy for combinatorial
   algorithms.
          Force almost every local action to be undone at every step.
          Non-fractional combinatorial algorithms are too restricted.
   Using less-restricted optimization methods is promising, but far
   slower sequentially.
          Existing algs (Goldberg, et al.) are PRAM with $n^3$ processors.
          General purpose methods: Cutting planes, successive SDPs
          Someone clever might find a parallel rounding algorithm.
          Solving the fractional LAP quickly would become a matter of
          finding a magic preconditioner...
          Maybe not a good thing for a direct method?


Review of contributions
Iterative refinement
    Successfully deliver dependable solutions with a little extra
    precision.
    Removed need for condition estimation.
    Built methodology for evaluating Ax = b solution methods’
    accuracy and dependability.

Static pivoting
    Tuned static pivoting heuristics to provide dependability.
    Demonstrated that an approximate maximum weight bipartite
    matching is faster and just as dependable.
    Developed a memory-scalable (although not
    performance-scalable) distributed memory auction algorithm for
    static pivoting.
Future directions
Iterative refinement
   Least-squares refinement demonstrated (Demmel, Hida, Li, &
   Riedy), but needs... refinement.
   Perhaps refinement could render an iterative method
   dependable. Could improve accuracy of $A \, dy_i = r_i$ with extra
   iterations as i increases.
   Could help build trust in new methods (e.g. CALU).

Distributed matching
   Interesting software problem: Run multiple algorithms on
   portions of a parallel allotment. How do you signal the others to
   terminate?
   Interesting algorithm problem: Is there an efficient rounding
   method for fractional / interior point algorithms?
Thank you!




Bounds

Backward error
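    $\| D_i^{-1} r_i \|_\infty \le (c - \bar{\rho})^{-1} \left( 3 (n_d + 1) \varepsilon_r + \varepsilon_x \right)$
Here $n_d$ is an expression of size, $c$ is the upper bound on per-iteration
decrease, and $\bar{\rho}$ is a safety factor for the region around $1/\varepsilon_w$.

Forward error
    $\| D_i^{-1} e_i \|_\infty \lesssim 2 \left( 4 + \bar{\rho} (n_d + 1) \right) \varepsilon_w \cdot (c - \bar{\rho})^{-1}$
Assuming $\varepsilon_r \le \varepsilon_w^2$, $\varepsilon_x \le \varepsilon_w^2$. Using only one precision, $\varepsilon_r = \varepsilon_x = \varepsilon_w$,
    $(c - \bar{\rho}) \, \| D_i^{-1} e_i \|_\infty \lesssim 2 \left( 5 + 2 (n_d + 1) \operatorname{ccond}(A, y_i) \right) \varepsilon_d$.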
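
To show where the three precisions enter, here is a small sketch of the refinement loop, not the implementation behind these bounds. Several things are assumptions made for illustration: the factorization and the correction solves run in Python floats (standing in for the working precision εw), while the residual and the running solution are kept in exact `Fraction` arithmetic (standing in for the extra precisions εr and εx). The diagonal scaling D_i and the stopping tests are omitted. The names `lu_solve_factory` and `refine` and the 2x2 system are hypothetical.

```python
from fractions import Fraction


def lu_solve_factory(A):
    """Factor A once in floats (partial pivoting); return a solver for A x = b."""
    n = len(A)
    LU = [list(map(float, row)) for row in A]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]                 # multiplier stored in place
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]

    def solve(b):
        y = [float(b[perm[i]]) for i in range(n)]
        for i in range(n):                       # forward substitution, unit L
            y[i] -= sum(LU[i][j] * y[j] for j in range(i))
        for i in reversed(range(n)):             # back substitution with U
            y[i] = (y[i] - sum(LU[i][j] * y[j]
                               for j in range(i + 1, n))) / LU[i][i]
        return y

    return solve


def refine(A, b, steps=5):
    """Iterative refinement: float solves, exact residual and accumulation."""
    n = len(A)
    solve = lu_solve_factory(A)
    A_exact = [[Fraction(a) for a in row] for row in A]
    x = [Fraction(v) for v in solve(b)]          # initial working-precision solve
    for _ in range(steps):
        # Residual r = b - A x in the higher precision.
        r = [Fraction(b[i]) - sum(A_exact[i][j] * x[j] for j in range(n))
             for i in range(n)]
        dy = solve(r)                            # correction in working precision
        x = [x[i] + Fraction(dy[i]) for i in range(n)]
    return x


# Hypothetical system whose exact solution is (1/11, 7/11).
x = refine([[4, 1], [1, 3]], [1, 2])
```

Only the residual and the update use the higher precision. The factorization is reused unchanged, which is why the extra cost stays small relative to the factorization.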


Making Static Pivoting Scalable and Dependable

  • 1. Making Static Pivoting Scalable and Dependable
       Ph.D. Dissertation Talk
       E. Jason Riedy, jason@acm.org
       EECS Department, University of California, Berkeley
       Committee: Dr. James Demmel (chair), Dr. Katherine Yelick, Dr. Sanjay Govindjee
       17 December, 2010
  • 2. Outline
       1. Introduction
       2. Solving Ax = b dependably
       3. Extending dependability to static pivoting
       4. Distributed matching for static pivoting
       5. Summary
  • 3. Motivation: Ever Larger Ax = b
       Systems Ax = b are growing larger and more difficult:
         Omega3P: n = 7.5 million with τ = 300 million entries
         Quantum Mechanics: precondition with blocks of dimension 200-350 thousand
         Large barrier-based optimization problems: many solves, similar structure, increasing condition number
       Huge systems are generated, solved, and analyzed automatically.
       Large, highly unsymmetric systems need scalable parallel solvers.
       Low-level routines: No expert in the loop!
  • 4. Motivation: Solving Ax = b better
       Many people work to solve Ax = b faster. Today we start with how to solve it better. Better enables faster.
         Use extra floating-point precision within iterative refinement to obtain a dependable solution, adding $O(n^2)$ work after an $O(n^3)$ factorization.
         Accelerate sparse factorization through static pivoting, decoupling the symbolic and numeric phases.
         Refine the perturbed solution without needing extra triangular solves for condition estimation.
  • 5. Contributions
       Iterative refinement
         Extend iterative refinement to provide small forward errors dependably (to be defined)
         Set and use a methodology to demonstrate dependability
         Show that condition estimation (expensive for sparse systems) is not necessary for obtaining a dependable solution
       Static pivoting
         Improve static pivoting heuristics
         Demonstrate that an approximate maximum weight bipartite matching is faster and just as accurate
         Develop a memory-scalable distributed memory auction algorithm for static pivoting
  • 6. Defining “dependable”
       A dependable solver for Ax = b returns a result x with small error often enough that you expect success with a small error, and clearly signals results that likely contain large errors.

       True error           Difficulty   Alg. reports   w/ likeliness
       O(mach. precision)   not bad      success        Very likely
                                         failure        Somewhat rare
       larger               not bad      success        (not yet seen)
                                         failure        Practically certain
       O(mach. precision)   difficult    success        Whenever feasible
                                         failure        Practically certain
       larger               difficult    success        (not yet seen)
                                         failure        Very likely
  • 7. Introducing the errors and targets
       [Diagram: the system (A, b), the true solution $x = A^{-1} b$, and the computed solution $y_1$.]
       [Plots: "LU: Small backward error" and "LU: Error in y ∝ difficulty"; Error against Difficulty, shaded by percent of systems.]
  • 8. Introducing the errors and targets
       [Diagram: the system (A, b), the true solution $x = A^{-1} b$, and the computed solutions $y_1$ and $y_k$.]
       Refined: Accepted with small errors in y, or flagged with unknown error.
       [Plots: panels "Successful" and "Flagged"; Error against Difficulty, shaded by % of systems.]
  • 9. Iterative refinement. Newton’s method applied to Ax = b. Repeat until done: (1) Compute the residual $r_i = b - A y_i$ using extra precision $\varepsilon_r$. (2) Solve $A\,dy_i = r_i$ for the correction using working precision $\varepsilon_w$. (3) Increment $y_{i+1} = y_i + dy_i$, maintaining y to extra precision $\varepsilon_x$. Precisions: Working precision $\varepsilon_w$: the precision used for storing (and factoring) A: IEEE 754 single ($\varepsilon_w = 2^{-24}$), double ($\varepsilon_w = 2^{-53}$), etc. Residual precision $\varepsilon_r$: at least double the working precision, $\varepsilon_r \le \varepsilon_w^2$. Solution precision $\varepsilon_x$: at least double the working precision, $\varepsilon_x \le \varepsilon_w^2$. The latter two may be implemented in software.
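The three refinement steps can be sketched in pure Python. This is a minimal illustration under stated assumptions, not the routine from the talk: the working precision is IEEE double, the "extra" residual and solution precisions are emulated with exact rationals from `fractions.Fraction`, and a fixed iteration cap stands in for the convergence, no-progress, and instability tests discussed later. The names `lu_factor`, `lu_solve`, and `refine` are hypothetical.

```python
from fractions import Fraction


def lu_factor(A):
    """LU factorization with partial pivoting in working (double) precision."""
    n = len(A)
    LU = [list(map(float, row)) for row in A]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        if LU[p][k] == 0.0:
            raise ValueError("matrix is singular to working precision")
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, perm


def lu_solve(LU, perm, b):
    """Solve with the stored factors: permute b, then forward and back substitute."""
    n = len(LU)
    y = [float(b[perm[i]]) for i in range(n)]
    for i in range(n):
        for j in range(i):
            y[i] -= LU[i][j] * y[j]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] -= LU[i][j] * y[j]
        y[i] /= LU[i][i]
    return y


def refine(A, b, max_iter=10):
    """Iterative refinement: exact residual, working-precision correction."""
    n = len(A)
    LU, perm = lu_factor(A)
    # The solution is carried in extra precision (here: exact rationals).
    y = [Fraction(v) for v in lu_solve(LU, perm, b)]
    for _ in range(max_iter):
        # Step 1: residual r = b - A y, computed in extra precision.
        r = [Fraction(b[i]) - sum(Fraction(A[i][j]) * y[j] for j in range(n))
             for i in range(n)]
        if all(ri == 0 for ri in r):
            break
        # Step 2: correction solved in working precision with the same factors.
        dy = lu_solve(LU, perm, [float(ri) for ri in r])
        # Step 3: update, keeping y in extra precision.
        y = [yi + Fraction(d) for yi, d in zip(y, dy)]
    return [float(v) for v in y]
```

Each pass reuses the existing factorization, so the added cost per iteration is a residual computation and one pair of triangular solves.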
  • 10. Definitions. Errors: backward (relative) error; forward (relative) error. Difficulty: condition numbers (sensitivity to perturbations); element growth (error from factorization).
  • 11. Error measures: Backward error. How close is the nearest system satisfying $A y_1 = b$? Three ways, given $r_1 = b - A y_1$: Normwise: $\dfrac{\|r_1\|_\infty}{\|A\|_\infty \|y_1\|_\infty + \|b\|_\infty}$. Componentwise: $\left\| \dfrac{|r_1|}{|A|\,|y_1| + |b|} \right\|_\infty$. Columnwise: $\dfrac{\|r_1\|_\infty}{(\max |A|)\,|y_1| + \|b\|_\infty}$. Note: elementwise division, 0/0 = 0, and max produces a row vector. [Diagram relating the computed y1, the true solution x = A^{-1} b, and the systems (A, b).]
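The three backward-error measures can be evaluated directly from their definitions. The sketch below is illustrative only: the function name `backward_errors` is hypothetical, the residual is computed in ordinary double precision rather than extra precision, and the 0/0 = 0 convention from the slide is handled by a small helper.

```python
def backward_errors(A, y, b):
    """Return (normwise, componentwise, columnwise) backward errors of y."""
    n = len(A)
    r = [b[i] - sum(A[i][j] * y[j] for j in range(n)) for i in range(n)]

    def div(num, den):
        # Elementwise division with the convention 0/0 = 0.
        if den == 0:
            return 0.0 if num == 0 else float("inf")
        return num / den

    r_inf = max(abs(v) for v in r)
    A_inf = max(sum(abs(a) for a in row) for row in A)  # max row sum
    y_inf = max(abs(v) for v in y)
    b_inf = max(abs(v) for v in b)
    # max|A| as a row vector: the largest magnitude in each column.
    col_max = [max(abs(A[i][j]) for i in range(n)) for j in range(n)]

    normwise = div(r_inf, A_inf * y_inf + b_inf)
    componentwise = max(
        div(abs(r[i]), sum(abs(A[i][j]) * abs(y[j]) for j in range(n)) + abs(b[i]))
        for i in range(n)
    )
    columnwise = div(r_inf, sum(col_max[j] * abs(y[j]) for j in range(n)) + b_inf)
    return normwise, componentwise, columnwise
```

For A = diag(2, 4), y = (1, 1), and b = (2, 5), the residual is (0, 1), so the normwise and componentwise errors are both 1/9 and the columnwise error is 1/11.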
  • 12. Error measures: Forward error. How close is $y_1$ to $x$? Two ways and two measuring sticks: Normwise: $\dfrac{\|y_1 - x\|_\infty}{\|x\|_\infty}$ or $\dfrac{\|y_1 - x\|_\infty}{\|y_1\|_\infty}$. Componentwise: $\left\|\dfrac{y_1 - x}{x}\right\|_\infty$ or $\left\|\dfrac{y_1 - x}{y_1}\right\|_\infty$. [Diagram relating the computed y1, the true solution x = A^{-1} b, and the systems (A, b).]
  • 13. Error sensitivity: Conditioning. How sensitive is $y_1$ to perturbations in A and b? forward error ≤ condition number × backward error. Each combination has a condition number. We choose two for use in our difficulty measure. [Diagram relating the computed y1, the true solution x = A^{-1} b, and the systems (A, b).]
  • 14. Difficulty: condition number × element growth. Condition number: Backward error: $\kappa(A^{-1}) = \kappa(A) = \|A^{-1}\|_\infty \|A\|_\infty$. Normwise forward error: $\kappa(A, x, b) = \|A^{-1}\|_\infty \left(\|A\|_\infty \|x\|_\infty + \|b\|_\infty\right)$. Componentwise forward error: $\mathrm{ccond}(A, x, b) = \left\|\, |A^{-1}| \left(|A|\,|x| + |b|\right) \right\|_\infty$. Element growth, estimating $\delta A_i$ in $(A + \delta A_i) y = b$: $|\delta A_i| \le 3 n_d |L|\,|U| \le p(n_d)\, g\, 1_r \max |A|$. We use a column-scaling-independent expression allowing $|L| > 1$: $g_c = \max_j \dfrac{\left(\max_{1 \le k \le j} \max_i |L(i,k)|\right) \cdot \left(\max_i |U(i,j)|\right)}{\max_i |A(i,j)|}$.
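For small dense systems the three condition numbers above can be computed by forming the inverse explicitly. The following is a sketch for illustration only: explicit inversion is an assumption made for brevity (it is exactly the expensive step that the talk aims to avoid for sparse systems), and the names `inverse` and `condition_numbers` are hypothetical.

```python
def inverse(A):
    """Gauss-Jordan inverse with partial pivoting; for small dense examples only."""
    n = len(A)
    M = [list(map(float, A[i])) + [1.0 if i == j else 0.0 for j in range(n)]
         for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        if M[p][k] == 0.0:
            raise ValueError("matrix is singular")
        M[k], M[p] = M[p], M[k]
        pivot = M[k][k]
        M[k] = [v / pivot for v in M[k]]
        for i in range(n):
            if i != k and M[i][k] != 0.0:
                f = M[i][k]
                M[i] = [vi - f * vk for vi, vk in zip(M[i], M[k])]
    return [row[n:] for row in M]


def condition_numbers(A, x, b):
    """Return (kappa(A), kappa(A, x, b), ccond(A, x, b)) in the infinity norm."""
    n = len(A)
    Ainv = inverse(A)

    def norm_inf(M):
        return max(sum(abs(v) for v in row) for row in M)

    x_inf = max(abs(v) for v in x)
    b_inf = max(abs(v) for v in b)
    kappa = norm_inf(Ainv) * norm_inf(A)
    kappa_axb = norm_inf(Ainv) * (norm_inf(A) * x_inf + b_inf)
    # w = |A| |x| + |b|, then ccond = || |A^{-1}| w ||_inf.
    w = [sum(abs(A[i][j]) * abs(x[j]) for j in range(n)) + abs(b[i])
         for i in range(n)]
    ccond = max(sum(abs(Ainv[i][j]) * w[j] for j in range(n)) for i in range(n))
    return kappa, kappa_axb, ccond
```

For A = diag(2, 0.5) with x = (1, 1) and b = (2, 0.5), this gives κ(A) = 4, κ(A, x, b) = 8, and ccond(A, x, b) = 2: the componentwise measure ignores the diagonal scaling that inflates the normwise ones.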
  • 15. Dense test systems. 30 × 30 single, double, complex, and double complex: 250k matrices, 4 right-hand sides, 1M test systems. Size chosen to sample the ill-conditioned region well. Generated as in Demmel, et al., plus b → x. Difficulty measure: $\kappa_\infty(A) = \|A^{-1}\|_\infty \|A\|_\infty$. [Figure: histograms of percent of population versus difficulty for Single, Double, Complex, and Double Complex.]
  • 16. Dense test systems. Same test set. Difficulty measure: $\kappa(A, x, b) = \|A^{-1}\|_\infty \left(\|A\|_\infty \|x\|_\infty + \|b\|_\infty\right)$. [Figure: histograms of percent of population versus difficulty for Single, Double, Complex, and Double Complex.]
  • 17. Dense test systems. Same test set. Difficulty measure: $\mathrm{ccond}(A, x, b) = \left\|\, |A^{-1}| \left(|A|\,|x| + |b|\right) \right\|_\infty$. [Figure: histograms of percent of population versus difficulty for Single, Double, Complex, and Double Complex.]
  • 18. Results: Dependable errors. [Figure: error versus difficulty for nberr, colberr, cberr, nferr, nferrx, cferr, and cferrx, in rows by termination state (Converged, No Progress, Unstable, Iteration Limit), shaded by % of systems.]
  • 19. How? Carry the intermediate solution $y_i$ to twice the working precision. Refine the backward error down to nearly $\varepsilon_w^2$. By “forward error ≤ conditioning × backward error”, the forward error for well-enough conditioned problems is nearly $\varepsilon_w$. [Figure: cberr and cferr versus difficulty, shaded by % of systems.]
  • 21. Results: Comparison with xGESVXX.
      Precision        Accepted, well   Accepted, ill   Rejected, well   Rejected, ill
      Single           79%              15%             1%               5%
      Single complex   76%              19%             1%               4%
      Double           87%              9%              1%               5%
      Double complex   85%              11%             1%               3%
      Accepted, ill-conditioned systems are those gained by our routine that xGESVXX rejects. Rejected, well-conditioned systems are those lost by our routine but accepted by xGESVXX.
  • 22. Results: Iteration counts, single precision. [Figure: number of iterations versus difficulty for nberr, colberr, cberr, ndx, and cdx, in rows by termination state (Converged, No Progress, Unstable, Iteration Limit), shaded by % of systems.] Set limit at five.
  • 23. Results: Iteration counts, single complex precision. [Figure: same layout as for single precision.] Set limit at seven.
  • 24. Results: Iteration counts, double precision. [Figure: same layout as for single precision.] Set limit at ten.
  • 25. Results: Iteration counts, double complex precision. [Figure: same layout as for single precision.] Set limit at 15.
  • 26. Static pivoting. If a pivot $|A(j,j)| < T$, perturb up to T by adding $\mathrm{sign}(A(j,j)) \cdot (T - |A(j,j)|)$. Forcibly increases backward error, decreases element growth. In sparse systems, few updates should occur to an entry. Large diagonal entries should remain large... Thresholding heuristics: SuperLU: $T = \gamma \cdot \|A\|_1$. Column-relative: $T = \gamma \cdot \max |A(:,j)|$. Diagonal-relative: $T = \gamma \cdot |A(j,j)|$. Levels: $\gamma = 2^{-26} \approx \sqrt{\varepsilon_w}$, $2^{-38}$, or $2^{-43} = 2^{10} \varepsilon_w$.
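The pivot perturbation rule can be illustrated with a dense, unpivoted LU factorization. This is a sketch under assumptions, not SuperLU's implementation: it uses the SuperLU-style threshold T = γ·‖A‖₁ on a dense list-of-lists matrix, treats a zero pivot as positive, and the name `lu_static_pivot` is hypothetical.

```python
import math


def lu_static_pivot(A, gamma=2.0 ** -26):
    """LU without row exchanges; tiny pivots are pushed out to magnitude T.

    Returns the packed factors and the list of perturbed pivot indices.
    """
    n = len(A)
    LU = [list(map(float, row)) for row in A]
    # One-norm of A: the largest column sum of magnitudes.
    norm1 = max(sum(abs(LU[i][j]) for i in range(n)) for j in range(n))
    T = gamma * norm1
    perturbed = []
    for k in range(n):
        if abs(LU[k][k]) < T:
            # Adding sign(pivot) * (T - |pivot|) leaves a pivot of magnitude T.
            LU[k][k] = math.copysign(T, LU[k][k])
            perturbed.append(k)
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, perturbed
```

The factors are then those of a nearby matrix A + δA rather than of A, which is why the refinement step afterwards has to recover the accuracy lost to the perturbation.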
  • 27. Sparse test systems. Matrices are from the UF Collection, chosen from existing comparisons between SuperLU, MUMPS, and UMFPACK. Wide range of conditioning and numerical scaling. Compute “true” solutions using a doubled-double-extended factorization and quad-double-extended refinement with a modified TAUCS. Refinement uses LAPACK-style numerical scaling throughout, but the test systems are generated in the matrix’s given scaling. Also tested on singular systems; no solutions accepted. At some point, plan on feeding the “true” solutions into the UF Collection...
  • 28. Sparse normwise conditioning. [Figure: histogram of percent of population versus difficulty.]
  • 29. Sparse componentwise conditioning. [Figure: histogram of percent of population versus difficulty.]
  • 30. Results: SuperLU perturbation heuristic. Before refinement, by max. perturbation amount. [Figure: error / sqrt(max row deg.) versus difficulty for nberr, colberr, cberr, nferr, nferrx, cferr, and cferrx, in rows by perturbation level (2^10 * eps, 2^-12 * sqrt(eps), sqrt(eps)), shaded by % of systems.]
  • 31. Results: Column-relative perturbation heuristic. Before refinement, by max. perturbation amount. [Figure: same layout as for the SuperLU heuristic.]
  • 32. Results: Diagonal-relative perturbation heuristic. Before refinement, by max. perturbation amount. [Figure: same layout as for the SuperLU heuristic.]
  • 33. Results: SuperLU perturbation heuristic. After refinement, with $\gamma = 2^{-43} = 2^{10} \varepsilon_w$. [Figure not captured in the transcript.]
  • 34. Results: Column-relative perturbation heuristic. After refinement, with $\gamma = 2^{-43} = 2^{10} \varepsilon_w$. [Figure not captured in the transcript.]
  • 35. Results: Diagonal-relative perturbation heuristic. After refinement, with $\gamma = 2^{-43} = 2^{10} \varepsilon_w$. [Figure not captured in the transcript.]
  • 36. Results.
      Level                      Heuristic           Trust both   Trust nwise   Reject
      2^-43 = 2^10 · ε_f         SuperLU             42.9%        8.0%          49.0%
                                 Column-relative     55.7%        5.7%          38.6%
                                 Diagonal-relative   55.8%        5.9%          38.3%
      2^-38 ≈ 2^-12 · sqrt(ε_f)  SuperLU             36.6%        6.7%          56.6%
                                 Column-relative     52.4%        6.5%          41.2%
                                 Diagonal-relative   53.7%        7.2%          39.1%
      2^-26 ≈ sqrt(ε_f)          SuperLU             32.4%        4.0%          63.6%
                                 Column-relative     42.2%        4.2%          53.6%
                                 Diagonal-relative   47.4%        4.7%          47.9%
  • 37. Sparse Matrix to Bipartite Graph to Pivots. [Diagram: a 4 × 4 sparse matrix, its bipartite graph on rows and columns, and the row permutation chosen by the matching.] Bipartite model: Each row and column is a vertex. Each explicit entry is an edge. Want to choose the “largest” entries for pivots. Maximum weight complete bipartite matching: the linear assignment problem.
  • 38. Mathematical Form. “Just” a linear optimization problem. $B$: n × n matrix of benefits in $\mathbb{R} \cup \{-\infty\}$, often $c + \log_2 |A|$. $X$: n × n permutation matrix, the matching. $p_r$, $\pi_c$: dual variables, will be price and profit. $1_r$, $1_c$: unit entry vectors corresponding to rows, cols. Linear assignment problem: maximize $\mathrm{Tr}\, B^T X$ over $X \in \mathbb{R}^{n \times n}$ subject to $X 1_c = 1_r$, $X^T 1_r = 1_c$, and $X \ge 0$. Dual problem: minimize $1_r^T p_r + 1_c^T \pi_c$ over $p_r, \pi_c$ subject to $p_r 1_c^T + 1_r \pi_c^T \ge B$.
  • 39. Mathematical Form. “Just” a linear optimization problem, with $B$, $X$, $p_r$, $\pi_c$, $1_r$, and $1_c$ as before. Linear assignment problem: maximize $\mathrm{Tr}\, B^T X$ over $X \in \mathbb{R}^{n \times n}$ subject to $X 1_c = 1_r$, $X^T 1_r = 1_c$, and $X \ge 0$. Dual problem, implicit form: minimize over $p_r$ the value $1_r^T p_r + \sum_{j \in C} \max_{i \in R} \left(B(i,j) - p_r(i)\right)$, that is, the profit of each column is implied by the prices: $\pi_c(j) = \max_{i} \left(B(i,j) - p_r(i)\right)$.
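A benefit matrix of the form c + log2|A| can be built directly from the explicit entries of a sparse matrix. The sketch below is an illustration with assumed conventions: the matrix is given as a dictionary keyed by (row, column), the result is stored column by column, structural and explicit zeros are simply omitted (benefit −∞, no edge), and the name `benefits_from_entries` is hypothetical.

```python
import math


def benefits_from_entries(entries, n, c=0.0):
    """Return cols, where cols[j] lists (row, c + log2|a_ij|) for column j."""
    cols = [[] for _ in range(n)]
    for (i, j), a in entries.items():
        if a != 0:
            # A zero entry has benefit -infinity, so it contributes no edge.
            cols[j].append((i, c + math.log2(abs(a))))
    return cols
```

With logarithmic benefits, maximizing the sum of matched benefits is the same as maximizing the product of the magnitudes of the chosen pivots.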
  • 40. Do We Need a Special Method? The LAP: maximize $\mathrm{Tr}\, B^T X$ over $X \in \mathbb{R}^{n \times n}$ subject to $X 1_c = 1_r$, $X^T 1_r = 1_c$, and $X \ge 0$. Standard form: $\min_x c^T x$ subject to $A x = 1_{r+c}$ and $x \ge 0$, where A is the 2n × τ vertex-edge matrix. Network optimization kills simplex methods. (“Smoothed analysis” does not apply.) Interior point algorithms need to round the solution. (And need to solve Ax = b for a much larger A, although theoretically great in NC.) Combinatorial methods should be faster. (But unpredictable!)
  • 41. Properties from Optimization. Complementary slackness: $X \odot \left(p_r 1_c^T + 1_r \pi_c^T - B\right) = 0$, with the product taken elementwise. If (i, j) is in the matching ($X(i,j) \ne 0$), then $p_r(i) + \pi_c(j) = B(i,j)$. Used to choose matching edges and modify dual variables in combinatorial algorithms.
  • 42. Properties from Optimization. Relaxed problem: Introduce a parameter µ, with two interpretations: from a barrier function related to $X \ge 0$, or from the auction algorithm (later). Then $\mathrm{Tr}\, B^T X_* \le 1_r^T p_r + 1_c^T \pi_c \le \mathrm{Tr}\, B^T X_* + (n-1)\mu$, or the computed dual value (and hence the computed primal matching) is within $(n-1)\mu$ of the optimal primal. Very useful for finding approximately optimal matchings. Feasibility bound: starting from zero prices, $p_r(i) \le (n-1)(\mu + \text{finite range of } B)$.
  • 43. Algorithms for Solving the LAP. Goal: A parallel algorithm that justifies buying big machines. Acceptable: A distributed algorithm; the matrix is on many nodes. Choices: Simplex or continuous / interior-point: Plain simplex blows up, network simplex is difficult to parallelize. Rounding for interior point often falls back on matching. (Optimal IP algorithm: Goldberg, Plotkin, Shmoys, Tardos. Needs factorization.) Augmenting-path based (MC64: Duff and Koster): Based on depth- or breadth-first search. Both are P-complete, inherently sequential (Greenlaw, Reif). Auctions (Bertsekas, et al.): Only length-1 or -2 alternating paths; global sync for duals.
  • 44. Auction Algorithms. Discussion will be column-major. General structure: (1) Each unmatched column finds the “best” row, places a bid. The dual variable $p_r$ holds the prices. The profit $\pi_c$ is implicit. (No significant FP errors!) Each entry’s value: benefit B(i, j) − price p(i). A bid maximally increases the price of the most valuable row. (2) Bids are reconciled. Highest proposed price wins, forms a match. Loser needs to re-bid. Some versions need tie-breaking; here least column. (3) Repeat. Eventually everyone will be matched, or some price will be too high. Sequential implementation in ∼40–50 lines, can compete with MC64. Some corner cases to handle...
  • 45. The Bid-Finding Loop. For each unmatched column, walk its row indices and entries together with the row prices: value = entry − price; save the largest and second-largest; bid price increment: difference in values. Differences from sparse matrix-vector products: Not all columns, rows used every iteration (sparse matrix, sparse vector). Hence output price updates are scattered. More local work per entry.
  • 46. The Bid-Finding Loop. For each unmatched column, walk its row indices and entries together with the row prices: value = entry − price; save the largest and second-largest; bid price increment: difference in values. Little points: Increase the bid price by µ to avoid loops. Needs care in floating-point for small µ. Single adjacent row → ∞ price. Affects the feasibility test and computing the dual.
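The bid-finding loop and the µ increment can be put together into a small sequential auction. This is a sketch under stated assumptions rather than the implementation from the talk: columns bid one at a time (so no separate reconciliation step is needed), benefits are finite floats stored per column as (row, benefit) pairs with row indices in 0..n−1, and infeasibility is detected with a deliberately loose price limit of 2n(µ + range of B), chosen here for illustration. The name `auction` is hypothetical.

```python
import math


def auction(cols, mu):
    """Match every column to a distinct row; return row_of[j] for each column j."""
    n = len(cols)
    finite = [b for col in cols for _, b in col]
    spread = (max(finite) - min(finite)) if finite else 0.0
    limit = 2 * n * (mu + spread)  # loose "price too high" threshold (assumed)
    price = [0.0] * n        # dual variable p_r, one price per row
    owner = [None] * n       # owner[i]: column currently holding row i
    row_of = [None] * n      # row_of[j]: row currently matched to column j
    unmatched = list(range(n))
    while unmatched:
        j = unmatched.pop()
        best_i, best_v, second_v = None, -math.inf, -math.inf
        for i, b in cols[j]:
            v = b - price[i]                      # value = entry - price
            if v > best_v:
                best_i, best_v, second_v = i, v, best_v
            elif v > second_v:
                second_v = v
        if best_i is None:
            raise ValueError("no complete matching exists")
        # Bid: raise the price by the difference in values plus mu.
        # A column with a single usable row bids an infinite price.
        price[best_i] += (best_v - second_v) + mu
        if limit < price[best_i] < math.inf:
            raise ValueError("price too high: no complete matching exists")
        previous = owner[best_i]
        owner[best_i], row_of[j] = j, best_i
        if previous is not None:                  # the outbid column must re-bid
            row_of[previous] = None
            unmatched.append(previous)
    return row_of
```

For integer benefits, choosing µ below 1/n makes the final matching optimal, which matches the (n − 1)µ gap stated for the relaxed problem; larger µ trades optimality for fewer bids.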
  • 47. Termination. Once a row is matched, it stays matched. A new bid may swap it to another column. The matching (primal) increases monotonically. Prices only increase. The dual does not change when a row is newly matched. But the dual may decrease when a row is taken. The dual decreases monotonically. Subtle part: If the dual doesn’t decrease... It’s ok. Can show the new edge begins an augmenting path that increases the matching or an alternating path that decreases the dual.
  • 48. Successive Approximation (µ-scaling). Simple auctions aren’t really competitive with MC64. Start with a rough approximation (large µ) and refine. Called ε-scaling in the literature, but µ-scaling is better. Preserve the prices $p_r$ at each step, but clear the matching. Note: Do not clear matches associated with ∞ prices! Equivalent to finding a diagonal scaling $D_r A D_c$ and matching again on the new B. Problem: Performance strongly depends on the initial scaling. Also depends strongly on hidden parameters.
  • 49. Sequential performance: Auction v. MC64.
      Group           Name          Auction (s)   MC64 (s)   MC64/Auction
      Bai             af23560       0.025         0.017      0.68
      FEMLAB          poisson3Db    0.014         0.040      2.74
      FIDAP           ex11          0.060         0.015      0.26
      GHS_indef       cont-300      0.007         0.019      2.89
      GHS_indef       ncvxqp5       0.338         0.794      2.35
      Hamm            scircuit      0.048         0.024      0.50
      Hollinger       g7jac200      0.355         0.817      2.30
      Mallya          lhr14         0.044         0.026      0.60
      Schenk_IBMSDS   3D_51448_3D   0.031         0.010      0.33
      Schenk_IBMSDS   matrix_9      0.074         0.024      0.33
      Schenk_ISEI     barrier2-4    0.291         0.044      0.15
      Vavasis         av41092       5.462         3.595      0.66
      Zhao            Zhao2         1.041         3.237      3.11
  • 50. Sequential performance: Highly variable.
      Group           Name          By col (s)   By row (s)   Row/Col
      Bai             af23560       0.025        0.028        1.13
      FEMLAB          poisson3Db    0.014        0.016        1.11
      FIDAP           ex11          0.060        0.060        1.00
      GHS_indef       cont-300      0.007        0.006        0.84
      GHS_indef       ncvxqp5       0.338        0.318        0.94
      Hamm            scircuit      0.048        0.047        0.99
      Hollinger       g7jac200      0.355        0.339        0.95
      Mallya          lhr14         0.044        0.065        1.47
      Schenk_IBMSDS   3D_51448_3D   0.031        0.282        9.22
      Schenk_IBMSDS   matrix_9      0.074        0.613        8.29
      Schenk_ISEI     barrier2-4    0.291        0.193        0.66
      Vavasis         av41092       5.462        4.083        0.75
      Zhao            Zhao2         1.041        0.609        0.58
  • 51. Sequential performance: Highly variable.
      Group           Name          Float (s)   Int (s)   Int/Float
      Bai             af23560       0.025       0.040     1.61
      FEMLAB          poisson3Db    0.015       0.016     1.08
      FIDAP           ex11          0.060       0.029     0.49
      GHS_indef       cont-300      0.007       0.006     0.91
      GHS_indef       ncvxqp5       0.338       0.425     1.26
      Hamm            scircuit      0.048       0.016     0.34
      Hollinger       g7jac200      0.355       1.004     2.83
      Mallya          lhr14         0.044       0.050     1.12
      Schenk_IBMSDS   3D_51448_3D   0.031       0.020     0.66
      Schenk_IBMSDS   matrix_9      0.074       0.066     0.89
      Schenk_ISEI     barrier2-4    0.291       0.261     0.91
      Vavasis         av41092       5.462       5.401     0.99
      Zhao            Zhao2         1.041       2.269     2.18
  • 52. Approximately maximum matchings. Columns give the terminal µ value; each ratio is the time relative to µ = 0.
      Name         Quantity   µ = 0     5.96e-08   2.44e-04   5.00e-01
      af23560      Primal     1342850   1342850    1342850    1342670
                   Time (s)   0.14      0.05       0.03       0
                   ratio                0.37       0.21       0.02
      poisson3Db   Primal     2483070   2483070    2483070    2483070
                   Time (s)   0.02      0.02       0.02       0.02
                   ratio                1.01       1.04       1.07
      g7jac200     Primal     3533980   3533980    3533980    3533340
                   Time (s)   2.98      1.07       0.28       0.18
                   ratio                0.36       0.09       0.06
      av41092      Primal     3156210   3156210    3156210    3155920
                   Time (s)   24.51     8.09       2.48       0.11
                   ratio                0.33       0.10       0.00
      Zhao2        Primal     333891    333891     333891     333487
                   Time (s)   7.69      2.37       3.65       0.02
                   ratio                0.31       0.47       0.00
  • 53. Setting / Lowering Parallel Expectations. Performance scalability? Originally proposed (early 1990s) when cpu speed ≈ memory speed ≈ network speed ≈ slow. Now: cpu speed ≫ memory latency > network latency. The number of communication phases dominates matching algorithms (auction and others). Communication patterns are very irregular. Latency and software overhead are not improving... Scaled back goal: It suffices to not slow down much on distributed data.
  • 54. Basic Idea: Run Local Auctions, Treat as Bids. [Diagram: the benefit matrix B sliced into pieces held by processors P1, P2, and P3.] Slice the matrix into pieces, run local auctions. The winning local bids are the slices’ bids. Merge... (“And then a miracle occurs...”) Need to keep some data in sync for termination.
  • 55. Basic Idea: Run Local Auctions, Treat as Bids. [Diagram: the benefit matrix B sliced into pieces held by processors P1, P2, and P3.] Practically memory scalable: Compact the local pieces. Have not experimented with a simple SMP version. Sequential performance is limited by the memory system. Note: Could be useful for multicore w/local memory.
  • 56. Speed-up? [Figure: speed-up versus number of processors, log scale.]
  • 57. Speed-up: A bit better measuring appropriately. [Figure: speed-up relative to reducing to the root node versus number of processors, log scale.]
  • 58. Comparing distributed with reduce-to-root. [Figure: speed-up versus number of processors (2, 3, 4, 8, 12, 16, 24) for the To root and Dist. variants.]
  • 59. Iteration order still matters. [Figure: time (s) versus number of processors for av41092 and shyy161, comparing Row-major and Col-major directions.]
  • 60. Many different speed-up profiles. [Figure: time (s) versus number of processors for af23560, bmwcra_1, garon2, and stomach.]
  • 61. So what happens in some cases? Matrix av41092 has one large strongly connected component. (The square blocks in a Dulmage-Mendelsohn decomposition.) The SCC spans all the processors. Every edge in an SCC is a part of some complete matching. Horrible performance from: starting along a non-max-weight matching, making it almost complete, then an edge-by-edge search for nearby matchings, requiring a communication phase almost per edge. Conjecture: This type of performance land-mine will affect any 0-1 combinatorial algorithm.
  • 62. Improvements? Approximate matchings: Speeds up the sequential case, eliminating any “speed-up.” Rearranging deck chairs: few-to-few communication. Build a directory of which nodes share rows: collapsed $BB^T$. Send only to/from those neighbors. Minor improvement over MPI_Allgatherv for a huge effort. Latency is not a major factor... Improving communication may not be worth it... The real problem is the number of comm. phases. If the diagonal is the matching, everything is overhead. Or if there’s a large SCC... Another alternative: Multiple algorithms at once. Run Bora Uçar’s alg. on one set of nodes, auction on another, transposed auction on another, ... Requires some painful software engineering.
  • 63. Latency not a dominating factor. [Figure: speed-up relative to reducing to the root node versus number of nodes × number of procs. per node (1x3, 3x1, 1x8, 2x4).]
  • 64. So, Could This Ever Be Parallel? For a given matrix-processor layout, constructing a matrix requiring O(n) communication is pretty easy for combinatorial algorithms. Force almost every local action to be undone at every step. Non-fractional combinatorial algorithms are too restricted. Using less-restricted optimization methods is promising, but far slower sequentially. Existing algs (Goldberg, et al.) are PRAM with $n^3$ processors. General purpose methods: Cutting planes, successive SDPs. Someone clever might find a parallel rounding algorithm. Solving the fractional LAP quickly would become a matter of finding a magic preconditioner... Maybe not a good thing for a direct method?
  • 65. Review of contributions. Iterative refinement: Successfully deliver dependable solutions with a little extra precision. Removed the need for condition estimation. Built a methodology for evaluating Ax = b solution methods’ accuracy and dependability. Static pivoting: Tuned static pivoting heuristics to provide dependability. Demonstrated that an approximate maximum weight bipartite matching is faster and just as dependable. Developed a memory-scalable (although not performance-scalable) distributed-memory auction algorithm for static pivoting.
  • 66. Future directions. Iterative refinement: Least-squares refinement demonstrated (Demmel, Hida, Li, & Riedy), but needs... refinement. Perhaps refinement could render an iterative method dependable. Could improve accuracy of $A\,dy_i = r_i$ with extra iterations as i increases. Could help build trust in new methods (e.g. CALU). Distributed matching: Interesting software problem: Run multiple algorithms on portions of a parallel allotment. How do you signal the others to terminate? Interesting algorithm problem: Is there an efficient rounding method for fractional / interior point algorithms?
  • 67. Thank you!
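  • 68. Bounds. Backward error: $\|D_i^{-1} r_i\|_\infty \le (c - \bar\rho)^{-1} \left(3(n_d + 1)\varepsilon_r + \varepsilon_x\right)$. Here $n_d$ is an expression of size, $c$ is the upper bound on per-iteration decrease, and $\bar\rho$ is a safety factor for the region around $1/\varepsilon_w$. Forward error: $\|D_i^{-1} e_i\|_\infty \lesssim 2\left(4 + \bar\rho (n_d + 1)\right) \varepsilon_w \cdot (c - \bar\rho)^{-1}$, assuming $\varepsilon_r \le \varepsilon_w^2$ and $\varepsilon_x \le \varepsilon_w^2$. Using only one precision, $\varepsilon_r = \varepsilon_x = \varepsilon_w$: $(c - \bar\rho)\, \|D_i^{-1} e_i\|_\infty \lesssim 2\left(5 + 2(n_d + 1)\, \mathrm{ccond}(A, y_i)\right) \varepsilon_d$.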