Thesis presentation:

Online recommendations
at scale with matrix factorisation.

Marcus Ljungblad
marcus@ljungblad.nu
22 June 2012

Royal Institute of Technology, Stockholm, Sweden
Instituto Superior Técnico, Lisbon, Portugal
Universitat Politècnica de Catalunya, Barcelona, Spain

"75% of the 30 million daily movie starts are sourced from recommendations."

"a key differentiating factor"
3 challenges
How do you serve recommendations from millions of items
to millions of users online?
Video ratings (rows: users, columns: videos, ? = no rating yet)

        2   4   4   ?   1
        3   5   ?   ?   1
        ?   4   2   1   ?
        1   ?   1   3   3
f( )
Video ratings (rows: users, columns: videos)

        2.05   3.97   3.96   2.12   1.01
        2.93   5.02   3.21   1.61   0.98
        2.15   3.95   2.01   1.05   1.10
        1.00   4.29   1.01   2.96   2.98
Video ratings (rows: users, columns: videos)

        2.05   3.97   3.96   2.12   1.01           2   4   4   ?   1
        2.93   5.02   3.21   1.61   0.98           3   5   ?   ?   1
        2.15   3.95   2.01   1.05   1.10           ?   4   2   1   ?
        1.00   4.29   1.01   2.96   2.98           1   ?   1   3   3
2.05   3.97   3.96   2.12   1.01
2.93   5.02   3.21   1.61   0.98
2.15   3.95   2.01   1.05   1.10
1.00   4.29   1.01   2.96   2.98
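The step from the sparse rating matrix to the filled-in one can be sketched in a few lines. This is only an illustration: the slides do not say how f is trained, so stochastic gradient descent, the rank of 2, the learning rate and the regularisation weight are all assumptions, and the function names are hypothetical. The numbers it produces will not match the slide exactly.

```python
import random

# Observed ratings from the slide; None marks an unknown rating ("?").
RATINGS = [
    [2, 4, 4, None, 1],
    [3, 5, None, None, 1],
    [None, 4, 2, 1, None],
    [1, None, 1, 3, 3],
]


def predict(user_vec, item_vec):
    """Predicted rating = dot product of a user factor and an item factor."""
    return sum(u * q for u, q in zip(user_vec, item_vec))


def rmse(ratings, users, items):
    """Root mean squared error over the known ratings only."""
    errs = [
        (r - predict(users[u], items[i])) ** 2
        for u, row in enumerate(ratings)
        for i, r in enumerate(row)
        if r is not None
    ]
    return (sum(errs) / len(errs)) ** 0.5


def factorise(ratings, rank=2, epochs=2000, lr=0.01, reg=0.02, seed=0):
    """Fit user and item factors with plain SGD (assumed, not from the slides)."""
    rng = random.Random(seed)
    n_users, n_items = len(ratings), len(ratings[0])
    users = [[rng.random() * 0.1 for _ in range(rank)] for _ in range(n_users)]
    items = [[rng.random() * 0.1 for _ in range(rank)] for _ in range(n_items)]
    known = [
        (u, i, r)
        for u, row in enumerate(ratings)
        for i, r in enumerate(row)
        if r is not None
    ]
    for _ in range(epochs):
        rng.shuffle(known)
        for u, i, r in known:
            err = r - predict(users[u], items[i])
            for k in range(rank):
                pu, qi = users[u][k], items[i][k]
                users[u][k] += lr * (err * qi - reg * pu)
                items[i][k] += lr * (err * pu - reg * qi)
    return users, items


if __name__ == "__main__":
    users, items = factorise(RATINGS)
    # Every cell, including the former "?" cells, now has a predicted rating.
    for user_vec in users:
        print("  ".join(f"{predict(user_vec, item_vec):.2f}" for item_vec in items))
```

Only the known ratings drive the fit; the "?" cells are filled in afterwards by multiplying the learned factors.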
13x40 MILLION RATINGS
Interface          Delegate          Router          Worker

request    ->      start     ->      route    ->     compute
reply      <-      to json   <-      merge    <-     top-N
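The request flow in the diagram (route, compute, top-N, merge, to json) can be sketched as follows. This is a single-process illustration under stated assumptions: the slides do not show how items are divided among workers, so the per-worker item shards, the dot-product scoring and all function names here are hypothetical, and the real system presumably ran the workers on separate machines.

```python
import heapq
import json


def worker_top_n(user_vec, shard, n):
    """Worker: score every item in its shard and keep the local top-N."""
    scored = (
        (sum(u * q for u, q in zip(user_vec, item_vec)), item_id)
        for item_id, item_vec in shard.items()
    )
    return heapq.nlargest(n, scored)


def route(user_vec, shards, n):
    """Router: fan the request out to each worker, then merge the partial lists."""
    partials = [worker_top_n(user_vec, shard, n) for shard in shards]
    return heapq.nlargest(n, (pair for part in partials for pair in part))


def handle_request(user_vec, shards, n):
    """Delegate: start the computation and serialise the merged result to JSON."""
    top = route(user_vec, shards, n)
    return json.dumps(
        [{"item": item_id, "score": round(score, 2)} for score, item_id in top]
    )


if __name__ == "__main__":
    # Two hypothetical workers, each holding the factor vectors of two items.
    shards = [
        {"a": [1.0, 0.0], "b": [0.0, 1.0]},
        {"c": [1.0, 1.0], "d": [0.5, 0.5]},
    ]
    print(handle_request([2.0, 1.0], shards, 2))
```

Because each worker returns only its own top-N, the router merges at most N entries per worker, regardless of how many items each worker holds.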
Did it work?
Setup:
 • 1-3 machines
 • 1 million items
 • same rack = high-speed
 • 1 test machine
Performance!
Huh?!
Did it work?
Well...
74% (offline) = 74% (online)
Summary:
... clustering depends on data ...

... need balanced clusters ...

... memory bound ...

... scales ok ...
Thank you!
Photos and pictures borrowed from the Internetz:

Iron Maiden cover: http://en.wikipedia.org/wiki/File:Iron_Maiden_(album)_cover.jpg
Cat picture: http://www.lastfm.es/group/Cats
Coins: http://www.sxc.hu/photo/1235540
iPhones: http://blog.bayuamus.com/2011/08/user-experience-comparison-between-htc-salsa-and-samsung-galaxy-mini/
Amazon recommendations: http://mashable.com/2010/08/06/online-retail-facebook-data/
TV remote: http://www.flickr.com/photos/62337512@N00/2749561795/sizes/z/in/photostream/
Headphones: http://www.flickr.com/photos/markusschoepke/82957375/sizes/m/in/photostream/
Function: http://en.wikipedia.org/wiki/File:Graph_of_example_function.svg
Home servers: http://www.flickr.com/photos/fabrico/477844434/sizes/z/in/photostream/
Extra material...
AXYDBLSZQ   (1/2) / 1


AXYDBLSZQ   (1/1) / 1


AXYDBLSZQ   (1/1 + 2/3) / 2
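
The three expressions above have the shape of an average-precision calculation over the ranked list AXYDBLSZQ: the precision at each relevant position, averaged over the number of relevant items. Which letters were marked as relevant on the slide is not recoverable from the text, so the relevant sets below ({X}, {A}, and {A, Y}) are assumptions chosen only because they reproduce the same arithmetic, and the function name is hypothetical.

```python
from fractions import Fraction


def average_precision(ranked, relevant):
    """Mean of precision@k taken at each position k that holds a relevant item."""
    if not relevant:
        return Fraction(0)
    hits = 0
    total = Fraction(0)
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += Fraction(hits, position)
    return total / len(relevant)


if __name__ == "__main__":
    ranked = "AXYDBLSZQ"
    print(average_precision(ranked, {"X"}))       # (1/2) / 1
    print(average_precision(ranked, {"A"}))       # (1/1) / 1
    print(average_precision(ranked, {"A", "Y"}))  # (1/1 + 2/3) / 2
```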
