SlideShare una empresa de Scribd logo
1 de 16
Descargar para leer sin conexión
Rank aggregation via nuclear norm minimization




                                                                                                     David F. Gleich
                                                                                        Sandia National Laboratories

                                                                                                              Lek-Heng Lim
                                                                                                        University of Chicago

                                                                                          Householder Symposium 18
                                                                                              Tahoe City, (Barely) CA


Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin
                       Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
David F. Gleich (Sandia)                                       Householder 18                                                              1/15
Which is a better list of good DVDs?
Lord of the Rings 3: The Return of …       Lord of the Rings 3: The Return of …
Lord of the Rings 1: The Fellowship        Lord of the Rings 1: The Fellowship
Lord of the Rings 2: The Two Towers        Lord of the Rings 2: The Two Towers
Lost: Season 1                             Star Wars V: Empire Strikes Back
Battlestar Galactica: Season 1             Raiders of the Lost Ark
Fullmetal Alchemist                        Star Wars IV: A New Hope
Trailer Park Boys: Season 4                Shawshank Redemption
Trailer Park Boys: Season 3                Star Wars VI: Return of the Jedi
Tenchi Muyo!                               Lord of the Rings 3: Bonus DVD
Shawshank Redemption                       The Godfather


                   Standard
                                                        Nuclear Norm
               rank aggregation
                                                    based rank aggregation
               (the mean rating)

David F. Gleich (Sandia)           Householder 18                             2/15
Ranking is really hard
                            John Kemeny               Dwork, Kumar, Naor,
      Ken Arrow                                                 Sivikumar




All rank aggregations
involve some measure       A good ranking is the
of compromise              “average” ranking under   NP hard to compute
                           a permutation distance    Kemeny’s ranking


David F. Gleich (Sandia)         Householder 18                        3/15
Embody chair
                                            John Cantrell (flickr)


     Given a hard problem,
     what do you do?

     Numerically relax!

     It’ll probably be easier.



David F. Gleich (Sandia)   Householder 18                            4 of 15
Suppose we had scores
Let   be the score of the ith movie/song/paper/team to rank

Suppose we can compare the ith to jth:

                                 

Then                       is skew-symmetric, rank 2.

Also works for                  with an extra log.

               Numerical ranking is intimately intertwined
               with skew-symmetric matrices
                                         Kemeny and Snell, Mathematical Models in Social Sciences (1978)
David F. Gleich (Sandia)            Householder 18                                                  5/15
Using ratings instead of comparisons




Ratings induce various skew-symmetric matrices.

                                                                  Arithmetic Mean
                            

                                                                  Log-odds
                                
                                                    David 1988 – The Method of Paired Comparisons
David F. Gleich (Sandia)           Householder 18                                            6/15
Extracting the scores
Given   with all entries, then                               107

            is the Borda




                                               Movie Pairs
                                                             105
  count, the least-squares
  solution to  

How many                   do we have?                    101
  Most.
                                                                     101           105
                                                                   Number of Comparisons
Do we trust all              ?
  Not really.                                                      Netflix data 17k movies,
                                                                   500k users, 100M ratings–
                                                                   99.17% filled



David F. Gleich (Sandia)             Householder 18                                            7/15
Only partial info? Complete it!
Let              be known for                     We trust these scores.

Goal Find the simplest skew-symmetric matrix that matches
  the data  




                            
               NP hard




                                                                                  Heuristic
                            
                Convex




                               best convex underestimator of rank on unit ball.
David F. Gleich (Sandia)                 Householder 18                              8/15
Solving the nuclear norm problem
Use a LASSO formulation                1.  
                                       2. REPEAT
 
                                       3.           = rank-k SVD of
                                                

                                       4.
                                       5.  
                                                

                                       6. UNTIL  
Jain et al. propose SVP for
   this problem without
    



                                                          Jain et al. NIPS 2010
David F. Gleich (Sandia)      Householder 18                              9/15
Skew-symmetric SVDs
Let         be an                skew-symmetric matrix with
  eigenvalues                                      ,
  where                          and       . Then the SVD of   is
  given by




                            
for   and   given in the proof.
Proof Use the Murnaghan-Wintner form and the SVD of a
   2x2 skew-symmetric block

                 This means that SVP will give us the skew-
                 symmetric constraint “for free”
David F. Gleich (Sandia)           Householder 18              10/15
Exact recovery results
David Gross showed how to recover Hermitian matrices.
  i.e. the conditions under which we get the exact  

Note that                  is Hermitian. Thus our new result!




                                       
                                                                Gross arXiv 2010.
David F. Gleich (Sandia)                  Householder 18                    11/15
Recovery Discussion and Experiments
If                  , then just look at differences from a
       connected set. Constants? Not very good.
                               Intuition for the truth.
                                                  




David F. Gleich (Sandia)        Householder 18               12/15
The Ranking Algorithm
0. INPUT   (ratings data) and c
   (for trust on comparisons)
1. Compute   from  
2. Discard entries with fewer than
   c comparisons
3. Set      to be indices and
   values of what’s left
4.                         = SVP(    )
5. OUTPUT  




David F. Gleich (Sandia)                 Householder 18   13/15
Synthetic Results
Construct an Item Response Theory model. Vary number of
  ratings per user and a noise/error level
                           Our Algorithm!                    The Average Rating




David F. Gleich (Sandia)                    Householder 18                        14/15
Conclusions and Future Work

                                    1. Compare against others
Rank aggregation with                  van Dooren (fitting)
the nuclear norm is:                   Massey (direct least
  principled                           squares for   )
  easy to compute
The results are much better         2. Noisy recovery! More
than simple approaches.                realistic sampling.

                                    3. Skew-symmetric Lanczos
                                       based SVD?



David F. Gleich (Sandia)   Householder 18                     15/15
Google nuclear ranking gleich


          To appear KDD2011

Más contenido relacionado

Más de David Gleich

Spectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structuresSpectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structuresDavid Gleich
 
Higher-order organization of complex networks
Higher-order organization of complex networksHigher-order organization of complex networks
Higher-order organization of complex networksDavid Gleich
 
Spacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysisSpacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysisDavid Gleich
 
Non-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansNon-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansDavid Gleich
 
Using Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningUsing Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningDavid Gleich
 
Spacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsSpacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsDavid Gleich
 
Localized methods in graph mining
Localized methods in graph miningLocalized methods in graph mining
Localized methods in graph miningDavid Gleich
 
PageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structuresPageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structuresDavid Gleich
 
Iterative methods with special structures
Iterative methods with special structuresIterative methods with special structures
Iterative methods with special structuresDavid Gleich
 
Big data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsBig data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsDavid Gleich
 
Localized methods for diffusions in large graphs
Localized methods for diffusions in large graphsLocalized methods for diffusions in large graphs
Localized methods for diffusions in large graphsDavid Gleich
 
Anti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCutAnti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCutDavid Gleich
 
Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential David Gleich
 
Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...David Gleich
 
MapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsMapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsDavid Gleich
 
Recommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLRecommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLDavid Gleich
 
Relaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksRelaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksDavid Gleich
 
Personalized PageRank based community detection
Personalized PageRank based community detectionPersonalized PageRank based community detection
Personalized PageRank based community detectionDavid Gleich
 
Tall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceTall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceDavid Gleich
 
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...Vertex neighborhoods, low conductance cuts, and good seeds for local communit...
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...David Gleich
 

Más de David Gleich (20)

Spectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structuresSpectral clustering with motifs and higher-order structures
Spectral clustering with motifs and higher-order structures
 
Higher-order organization of complex networks
Higher-order organization of complex networksHigher-order organization of complex networks
Higher-order organization of complex networks
 
Spacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysisSpacey random walks and higher-order data analysis
Spacey random walks and higher-order data analysis
 
Non-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-meansNon-exhaustive, Overlapping K-means
Non-exhaustive, Overlapping K-means
 
Using Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based LearningUsing Local Spectral Methods to Robustify Graph-Based Learning
Using Local Spectral Methods to Robustify Graph-Based Learning
 
Spacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chainsSpacey random walks and higher order Markov chains
Spacey random walks and higher order Markov chains
 
Localized methods in graph mining
Localized methods in graph miningLocalized methods in graph mining
Localized methods in graph mining
 
PageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structuresPageRank Centrality of dynamic graph structures
PageRank Centrality of dynamic graph structures
 
Iterative methods with special structures
Iterative methods with special structuresIterative methods with special structures
Iterative methods with special structures
 
Big data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphsBig data matrix factorizations and Overlapping community detection in graphs
Big data matrix factorizations and Overlapping community detection in graphs
 
Localized methods for diffusions in large graphs
Localized methods for diffusions in large graphsLocalized methods for diffusions in large graphs
Localized methods for diffusions in large graphs
 
Anti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCutAnti-differentiating Approximation Algorithms: PageRank and MinCut
Anti-differentiating Approximation Algorithms: PageRank and MinCut
 
Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential Fast relaxation methods for the matrix exponential
Fast relaxation methods for the matrix exponential
 
Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...Gaps between the theory and practice of large-scale matrix-based network comp...
Gaps between the theory and practice of large-scale matrix-based network comp...
 
MapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applicationsMapReduce Tall-and-skinny QR and applications
MapReduce Tall-and-skinny QR and applications
 
Recommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQLRecommendation and graph algorithms in Hadoop and SQL
Recommendation and graph algorithms in Hadoop and SQL
 
Relaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networksRelaxation methods for the matrix exponential on large networks
Relaxation methods for the matrix exponential on large networks
 
Personalized PageRank based community detection
Personalized PageRank based community detectionPersonalized PageRank based community detection
Personalized PageRank based community detection
 
Tall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduceTall and Skinny QRs in MapReduce
Tall and Skinny QRs in MapReduce
 
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...Vertex neighborhoods, low conductance cuts, and good seeds for local communit...
Vertex neighborhoods, low conductance cuts, and good seeds for local communit...
 

Último

Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...apidays
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 

Último (20)

Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 

Rank aggregation via skew-symmetric matrix completion

  • 1. Rank aggregation via nuclear norm minimization David F. Gleich Sandia National Laboratories Lek-Heng Lim University of Chicago Householder Symposium 18 Tahoe City, (Barely) CA Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. David F. Gleich (Sandia) Householder 18 1/15
  • 2. Which is a better list of good DVDs? Lord of the Rings 3: The Return of … Lord of the Rings 3: The Return of … Lord of the Rings 1: The Fellowship Lord of the Rings 1: The Fellowship Lord of the Rings 2: The Two Towers Lord of the Rings 2: The Two Towers Lost: Season 1 Star Wars V: Empire Strikes Back Battlestar Galactica: Season 1 Raiders of the Lost Ark Fullmetal Alchemist Star Wars IV: A New Hope Trailer Park Boys: Season 4 Shawshank Redemption Trailer Park Boys: Season 3 Star Wars VI: Return of the Jedi Tenchi Muyo! Lord of the Rings 3: Bonus DVD Shawshank Redemption The Godfather Standard Nuclear Norm rank aggregation based rank aggregation (the mean rating) David F. Gleich (Sandia) Householder 18 2/15
  • 3. Ranking is really hard John Kemeny Dwork, Kumar, Naor, Ken Arrow Sivikumar All rank aggregations involve some measure A good ranking is the of compromise “average” ranking under NP hard to compute a permutation distance Kemeny’s ranking David F. Gleich (Sandia) Householder 18 3/15
  • 4. Embody chair John Cantrell (flickr) Given a hard problem, what do you do? Numerically relax! It’ll probably be easier. David F. Gleich (Sandia) Householder 18 4 of 15
  • 5. Suppose we had scores Let   be the score of the ith movie/song/paper/team to rank Suppose we can compare the ith to jth:   Then   is skew-symmetric, rank 2. Also works for   with an extra log. Numerical ranking is intimately intertwined with skew-symmetric matrices Kemeny and Snell, Mathematical Models in Social Sciences (1978) David F. Gleich (Sandia) Householder 18 5/15
  • 6. Using ratings instead of comparisons Ratings induce various skew-symmetric matrices. Arithmetic Mean   Log-odds   David 1988 – The Method of Paired Comparisons David F. Gleich (Sandia) Householder 18 6/15
  • 7. Extracting the scores Given   with all entries, then 107   is the Borda Movie Pairs 105 count, the least-squares solution to   How many   do we have? 101 Most. 101 105 Number of Comparisons Do we trust all   ? Not really. Netflix data 17k movies, 500k users, 100M ratings– 99.17% filled David F. Gleich (Sandia) Householder 18 7/15
  • 8. Only partial info? Complete it! Let   be known for   We trust these scores. Goal Find the simplest skew-symmetric matrix that matches the data     NP hard Heuristic   Convex   best convex underestimator of rank on unit ball. David F. Gleich (Sandia) Householder 18 8/15
  • 9. Solving the nuclear norm problem Use a LASSO formulation 1.   2. REPEAT   3.   = rank-k SVD of     4. 5.     6. UNTIL   Jain et al. propose SVP for this problem without   Jain et al. NIPS 2010 David F. Gleich (Sandia) Householder 18 9/15
  • 10. Skew-symmetric SVDs Let   be an   skew-symmetric matrix with eigenvalues   , where   and   . Then the SVD of   is given by   for   and   given in the proof. Proof Use the Murnaghan-Wintner form and the SVD of a 2x2 skew-symmetric block This means that SVP will give us the skew- symmetric constraint “for free” David F. Gleich (Sandia) Householder 18 10/15
  • 11. Exact recovery results David Gross showed how to recover Hermitian matrices. i.e. the conditions under which we get the exact   Note that   is Hermitian. Thus our new result!   Gross arXiv 2010. David F. Gleich (Sandia) Householder 18 11/15
  • 12. Recovery Discussion and Experiments If   , then just look at differences from a connected set. Constants? Not very good.   Intuition for the truth.     David F. Gleich (Sandia) Householder 18 12/15
  • 13. The Ranking Algorithm 0. INPUT   (ratings data) and c (for trust on comparisons) 1. Compute   from   2. Discard entries with fewer than c comparisons 3. Set   to be indices and values of what’s left 4.   = SVP(  ) 5. OUTPUT   David F. Gleich (Sandia) Householder 18 13/15
  • 14. Synthetic Results Construct an Item Response Theory model. Vary number of ratings per user and a noise/error level Our Algorithm! The Average Rating David F. Gleich (Sandia) Householder 18 14/15
  • 15. Conclusions and Future Work 1. Compare against others Rank aggregation with van Dooren (fitting) the nuclear norm is: Massey (direct least principled squares for   ) easy to compute The results are much better 2. Noisy recovery! More than simple approaches. realistic sampling. 3. Skew-symmetric Lanczos based SVD? David F. Gleich (Sandia) Householder 18 15/15
  • 16. Google nuclear ranking gleich To appear KDD2011