Raiders of the Lost Star

Introduction to recommender systems research at Telefónica, just when it was all starting (November 2007).

Published in: Technology, Travel

  1. Raiders of the Lost Star: Coping with Information Overflow through Recommendations
      • TELEFÓNICA I+D, November 27, 2007
      • Xavier Amatriain, Researcher
      • Give us the content
      • © 2007 Telefónica Investigación y Desarrollo, S.A. Unipersonal
  2. Index
      • 01 Introduction
      • 02 Recommender Systems
      • 03 The Netflix Prize
      • 04 The Sparsity Problem
      • 05 Working with the Data
      • 06 So what does Telefonica get out of all this?
      • 07 Conclusions
  3. A little icebreaker
      • What movie do you like?
  4. Information overload
      • “People read around 10 MB worth of material a day, hear 400 MB a day, and see one MB of information every second” (The Economist, November 2006)
  5. Tell me what you like...
      • Tell me what you like and I will tell you who you are
      • Tell me who you know and I will tell you what you like
      • Tell me what you have and I will tell you what you need
  6. The value of recommendations
      • Netflix: 2/3 of the movies rented were recommended
      • Google News: recommendations generate 38% more clickthrough
      • Amazon: 35% of sales come from recommendations
      • Choicestream: 28% of people would buy more music if they found what they liked
  7. 02 Recommender Systems
  8. The “Recommender problem”
      • Estimate a utility function that automatically predicts how much a user will like an item she has not yet seen, based on:
          • Past behavior
          • Relations to other users
          • Item similarity
          • ...
  9. Approaches to Recommendation
      • Collaborative Filtering
          • Recommend items based only on how other users have previously rated those items
          • User-based
              • Find users similar to me and recommend what those users liked
          • Item-based
              • Find items similar to those that I have previously liked
      • Content-based
          • Recommend based on features inherent to the items
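The item-based variant can be sketched in a few lines of Python. This is a minimal illustration rather than the method used in the talk: the toy `ratings` dictionary, the user and movie names, and the choice of plain cosine similarity with a similarity-weighted average are all assumptions made for the example.

```python
from math import sqrt

# Toy data (user -> {movie: stars}); every name and value here is made up.
ratings = {
    "ana":   {"Alien": 5, "Aliens": 5, "Amelie": 1},
    "ben":   {"Alien": 4, "Aliens": 5, "Amelie": 2},
    "carla": {"Alien": 1, "Aliens": 2, "Amelie": 5},
    "dani":  {"Alien": 5, "Amelie": 1},
}

def item_vector(ratings, item):
    """Collect the ratings one item received, keyed by user."""
    return {user: stars[item] for user, stars in ratings.items() if item in stars}

def cosine(a, b):
    """Cosine similarity over the users who rated both items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(a[u] ** 2 for u in common))
    norm_b = sqrt(sum(b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b)

def predict(ratings, user, target):
    """Similarity-weighted average of the user's own ratings."""
    target_vector = item_vector(ratings, target)
    weighted = total = 0.0
    for item, stars in ratings[user].items():
        if item == target:
            continue
        sim = cosine(target_vector, item_vector(ratings, item))
        weighted += sim * stars
        total += abs(sim)
    return weighted / total if total else None

# "dani" has not rated "Aliens"; estimate it from the movies dani did rate.
estimate = predict(ratings, "dani", "Aliens")
print(f"estimated rating: {estimate:.2f}")
```

Because "Aliens" is rated much like "Alien" in this toy data, dani's high rating for "Alien" carries more weight in the estimate than the low rating for "Amelie". Practical systems usually mean-center the ratings first, since raw cosine on all-positive star ratings is high for almost any pair of items.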
  10. What works?
      • What works clearly depends on the domain of the recommender: domain-specific modeling
      • However, in the general case it has been demonstrated that the best single approach is (currently) item-based collaborative filtering.
          • Other approaches can be hybridized to improve results in specific cases (cold-start problem...)
  11. 03 The Netflix Prize
  12. The Netflix Prize
      • 500,000 users × 17,000 movie titles = 100M ratings = $1M (if you “only” improve the existing system by 10%, from 0.95 to 0.85 RMSE)
          • This is what Netflix thinks a 10% improvement is worth for their business
          • 29K contestants on 23K teams from 165 countries
          • 19K valid submissions from 2,700 teams; 59 submissions in the “last 24 hours”
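For reference, RMSE is the square root of the mean squared difference between predicted and actual ratings. The sketch below computes it on made-up numbers and works out what a 10% improvement on the slide's rounded 0.95 baseline amounts to; the sample ratings are hypothetical.

```python
from math import sqrt

def rmse(predicted, actual):
    """Root mean squared error between two equally long rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return sqrt(sum(squared) / len(squared))

# Hypothetical predictions vs. true star ratings.
predicted = [4.0, 3.0, 5.0]
actual = [5.0, 3.0, 3.0]
error = rmse(predicted, actual)  # sqrt((1 + 0 + 4) / 3)

# A 10% improvement on the slide's rounded baseline of 0.95.
baseline = 0.95
target = baseline * (1 - 0.10)  # 0.855, which the slide rounds to 0.85
print(f"example RMSE: {error:.3f}, prize target: {target:.3f}")
```

Squaring the errors means that one prediction that is off by two stars costs as much as four predictions that are off by one.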
  13. The Netflix Prize
      • First conclusion: it is extremely simple to reach “reasonable” recommendations and extremely difficult to improve on them.
  14. The Netflix Prize
      • (Apart from the extremely unlikely possibility of getting the $1M) it is a great source of data and measurable improvement.
          • 100M ratings from 1 to 5
          • Measure of success: RMSE
      • Most successful teams are using item-based collaborative filtering and some sort of matrix factorization (such as SVD) and...
  15. The Netflix Prize
      • The current leader is at an 8.5% improvement (blending 107 individual predictors using all sorts of techniques)
      • Many teams are merging
  16. 04 The Sparsity Problem
  17. The Sparsity Problem
      • If you represent the Netflix rating data in a User/Movie matrix you get...
          • 500,000 x 17,000 = 8,500 M positions
          • Of which only 100M (about 1.2%) are not 0's!
      • Methods of dimensionality reduction
          • Matrix Factorization
          • Clustering
          • Projection (PCA ...)
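The arithmetic behind the sparsity figures is easy to check. The sketch below uses the slide's rounded counts; the dictionary-based storage and the `get_rating` helper are hypothetical, added only to show why a sparse layout is the natural fit.

```python
# Rounded figures from the slide.
users = 500_000
movies = 17_000
known_ratings = 100_000_000

cells = users * movies           # size of the full User/Movie matrix
density = known_ratings / cells  # fraction of cells that hold a rating
print(f"{cells:,} cells, {density:.2%} filled")

# A sparse layout stores only the observed ratings, keyed by (user, movie).
sparse = {}
sparse[(42, 7)] = 5  # hypothetical: user 42 gave movie 7 five stars

def get_rating(user, movie):
    """Return the stored rating, or None when the pair was never rated."""
    return sparse.get((user, movie))
```

Returning `None` rather than 0 for an unrated pair is a deliberate choice in this sketch: an empty cell means the rating is unknown, not that the user gave the lowest possible score.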
  18. Dimensionality Reduction
      • Matrix Factorization
          • This is so far the “winning horse”
          • In particular the Singular Value Decomposition method (Simon Funk's modified SVD)
      • Clustering
          • Similar results can be obtained, but at a higher computational cost (so far many “traditional” algorithms such as k-NN have been tried with varying results).
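A Funk-style factorization learns a short vector of latent factors per user and per movie by stochastic gradient descent over the known ratings only. The sketch below is a simplified illustration under several assumptions: the toy ratings, the number of factors, the learning rate, the regularization weight and the epoch count are all made up, and all factors are updated together, whereas Funk's original description trains one factor at a time.

```python
import random
from math import sqrt

# Made-up (user, movie, stars) triples; indices and values are illustrative only.
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 4), (1, 1, 5),
    (2, 0, 1), (2, 2, 5),
    (3, 1, 2), (3, 2, 4),
]
n_users, n_movies, n_factors = 4, 3, 2
learning_rate, regularization, epochs = 0.02, 0.02, 1000  # assumed values

rng = random.Random(0)  # fixed seed so the run is repeatable
mean = sum(r for _, _, r in ratings) / len(ratings)
P = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_users)]
Q = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for _ in range(n_movies)]

def predict(u, m):
    """Global mean plus the dot product of user and movie factors."""
    return mean + sum(pf * qf for pf, qf in zip(P[u], Q[m]))

def training_rmse():
    return sqrt(sum((r - predict(u, m)) ** 2 for u, m, r in ratings) / len(ratings))

rmse_before = training_rmse()
for _ in range(epochs):
    for u, m, r in ratings:
        err = r - predict(u, m)
        for f in range(n_factors):
            pu, qm = P[u][f], Q[m][f]
            # Gradient step on squared error with L2 regularization.
            P[u][f] += learning_rate * (err * qm - regularization * pu)
            Q[m][f] += learning_rate * (err * pu - regularization * qm)
rmse_after = training_rmse()

print(f"training RMSE: {rmse_before:.3f} -> {rmse_after:.3f}")
print(f"user 1, movie 2 (unrated): {predict(1, 2):.2f}")
```

The loop never touches the empty cells of the matrix, which is what makes this approach workable on data as sparse as the Netflix set. Training error alone says little about quality; a real evaluation would measure RMSE on held-out ratings.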
  19. Our approach to Dimensionality Reduction
      • We are experimenting with message-passing clustering algorithms
          • Affinity Propagation (Frey & Dueck, Science, February 2007)
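Affinity Propagation picks cluster "exemplars" by passing two kinds of messages between data points: responsibilities (how well suited point k is to represent point i) and availabilities (how appropriate it is for i to choose k). The sketch below follows the standard update rules from the Frey & Dueck paper on a toy one-dimensional data set; it is not the Telefónica implementation, and the points, the negative squared distance similarity, the damping factor and the hand-picked preference of -20 are all assumptions.

```python
def affinity_propagation(S, damping=0.5, iterations=200):
    """Return, for each point, the index of the exemplar it selects.

    S[i][k] is the similarity of point i to point k; the diagonal S[k][k]
    holds the "preference", which controls how many exemplars emerge.
    """
    n = len(S)
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        for i in range(n):
            scores = [A[i][k] + S[i][k] for k in range(n)]
            best = max(range(n), key=lambda k: scores[k])
            first = scores[best]
            second = max(scores[k] for k in range(n) if k != best)
            for k in range(n):
                rival = second if k == best else first
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - rival)
        # a(i,k) = min(0, r(k,k) + sum of positive r(i',k) for i' not in {i,k})
        # a(k,k) = sum of positive r(i',k) for i' != k
        for k in range(n):
            positives = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(positives) - positives[k]
            for i in range(n):
                if i == k:
                    new = total
                else:
                    new = min(0.0, R[k][k] + total - positives[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    return [max(range(n), key=lambda k: A[i][k] + R[i][k]) for i in range(n)]

# Toy data: two well-separated groups on a line (made up for illustration).
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
preference = -20.0  # assumed; lower values yield fewer exemplars
S = [[preference if i == k else -(a - b) ** 2 for k, b in enumerate(points)]
     for i, a in enumerate(points)]

labels = affinity_propagation(S)
print(labels)
```

Unlike k-means, the number of clusters is not fixed in advance; it follows from the preference values on the diagonal of `S`. This dense version keeps three n × n matrices, so it only illustrates the message updates and would not scale to the full rating data as written.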
  20. 05 Working with the Data
      • But wait... Is this all about tweaking algorithms?
  21. What about the data?
      • Data massaging
          • Denoising: can we remove outliers and/or estimate noise?
              • We are working on estimating the noise inherent to the absolute quantized rating system.
          • Remove global effects
              • User tendencies (e.g. to rate higher than others)
              • Movie tendencies
              • Cross tendencies (movie vs. time...)
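Removing global effects can be sketched as subtracting, in turn, the overall mean, a per-movie offset and a per-user offset, and handing only the residuals to the recommender. The example below is a simplified illustration with made-up ratings; it estimates each effect as a plain average with no shrinkage toward zero, and it leaves out the cross tendencies (such as movie vs. time).

```python
from collections import defaultdict

# Made-up (user, movie, stars) triples for illustration.
ratings = [
    ("ana", "Alien", 5), ("ana", "Amelie", 4), ("ana", "Brazil", 5),
    ("ben", "Alien", 3), ("ben", "Amelie", 1),
    ("carla", "Amelie", 2), ("carla", "Brazil", 4),
]

def group_mean(pairs):
    """Average the values for each key in an iterable of (key, value) pairs."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, value in pairs:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

# 1. Overall mean rating.
global_mean = sum(r for _, _, r in ratings) / len(ratings)

# 2. Movie tendency: how far each movie sits from the overall mean.
movie_effect = group_mean((m, r - global_mean) for _, m, r in ratings)

# 3. User tendency, estimated on what is left after the movie effect.
user_effect = group_mean(
    (u, r - global_mean - movie_effect[m]) for u, m, r in ratings
)

# 4. Residuals: the part of each rating the global effects do not explain.
residuals = [
    (u, m, r - global_mean - movie_effect[m] - user_effect[u])
    for u, m, r in ratings
]

for u, m, res in residuals:
    print(f"{u:6s} {m:7s} residual {res:+.2f}")
```

In this toy data "ana" rates everything high, so her offset is positive and her residuals show which movies she likes more or less than her own habit would suggest. With few ratings per user or movie, plain averages are noisy, which is why practical systems usually damp these estimates.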
  22. Approaching the sparsity problem
      • A different (although complementary) approach to reducing data sparsity is to improve the data set itself.
      • Two possibilities
          • Content-based approach
              • “Group” similar items because they share important features (such as genre or director in films) to reduce dimensions
              • Add editorial data from external sources
          • User-based approach
              • Are there users “out there” who can provide the missing data?
  23. User-oriented data approach
      • Adding “expert” users might help in clustering the data set
      • We are crawling the web to find complementary information for users such as critics or others coming from services similar to Netflix
  24. Algorithms + data + all those other things
      • Serendipity
      • User Interface
      • Architecture
      • ...
  25. 06 What does Telefonica get out of all this?
  26. Some TEF projects using RS
      • Online picture repository
      • IPTV program recommendation (Imagenio)
      • Personalized Advertisement Placement (hyper-targeting)
      • Music recommendation on the cell phone
      • Product recommendation for online stores
      • [Diagram labels: Multimedia Entertainment, E-commerce, Social Networking, News/Blogs/Portals, Communities; Platform, Products and Services, Commercialization; Content Packaging and Design, Devices, Access, Commercialization, Customers; Recommendation Systems]
  27. 07 Conclusions
      • Key technology in future years
      • Many areas to improve and a large unexplored research field
          • Area related to many traditional disciplines: Computer Science, Statistics, Economics, Sociology...
      • Research results are immediately applicable
          • And generate revenue
      • Core approach is reusable in many cases