Matrix Factorizations for Recommender Systems
Dmitriy Selivanov
selivanov.dmitriy@gmail.com
2017-11-16
Recommender systems are everywhere
Goals
Propose “relevant” items to customers
Retention
Exploration
Up-sale
Personalized offers
recommended items for a customer given their history of activities (transactions, browsing history, favourites)
Similar items
substitutions
bundles - frequently bought together
. . .
Live demo
Dataset - LastFM-360K:
360k users
160k artists
17M observations
sparsity ~ 0.9997
Explicit feedback
Ratings, likes/dislikes, purchases:
cleaner data
smaller
hard to collect
$$RMSE^2 = \frac{1}{|D|} \sum_{u,i \in D} (r_{ui} - \hat{r}_{ui})^2$$
Netflix prize
~ 480k users, 18k movies, 100m ratings
sparsity ~ 99%
goal is to reduce RMSE by 10% - from 0.9514 to 0.8563
Implicit feedback
noisy feedback (click, likes, purchases, search, . . . )
much easier to collect
wider user/item coverage
usually sparsity > 99.9%
One-Class Collaborative Filtering
observed entries are positive preferences
should have high confidence
missing entries in the matrix are a mix of negative and positive preferences
treat them as negative with low confidence
we cannot tell whether a user skipped a banner out of lack of interest or lack of awareness
Evaluation
Recap: we only care about producing a small set of highly relevant items.
RMSE is a poor metric - it has a very weak connection to business goals.
We are only interested in the precision of retrieved items:
space on the screen is limited
only order matters - the most relevant items should be at the top
Ranking - Mean average precision
$$AveragePrecision = \frac{\sum_{k=1}^{n} P(k) \times rel(k)}{\text{number of relevant documents}}$$
## index relevant precision_at_k
## 1: 1 0 0.0000000
## 2: 2 0 0.0000000
## 3: 3 1 0.3333333
## 4: 4 0 0.2500000
## 5: 5 0 0.2000000
map@5 = 0.1566667
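The table above can be reproduced with a short Python sketch (function names are mine; note that the slide's map@5 is the mean of precision@k over all five positions, a common variant):

```python
def precision_at_k(relevant):
    """Running precision: (# relevant items in the top k) / k, for each k."""
    hits, out = 0, []
    for k, rel in enumerate(relevant, start=1):
        hits += rel
        out.append(hits / k)
    return out

def map_at_k(relevant):
    """Mean of precision@k over the whole top-k list (the variant on this slide)."""
    p = precision_at_k(relevant)
    return sum(p) / len(p)
```

For the relevance vector `[0, 0, 1, 0, 0]` from the table this reproduces 0.1566667.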
Ranking - Normalized Discounted Cumulative Gain
Intuition is the same as for MAP@K, but also takes into account value of relevance:
$$DCG_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i + 1)}$$

$$nDCG_p = \frac{DCG_p}{IDCG_p}$$

$$IDCG_p = \sum_{i=1}^{|REL|} \frac{2^{rel_i} - 1}{\log_2(i + 1)}$$
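As a sketch (function names are mine; input is the list of graded relevances in ranked order):

```python
import math

def dcg(rels):
    """DCG with the (2^rel - 1) gain and log2(i + 1) discount from the formulas above."""
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(rels))

def ndcg(rels):
    """Normalize by the DCG of the ideal (descending-relevance) ordering."""
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal > 0 else 0.0
```

A perfectly ordered list scores 1.0; any misordering scores strictly below it.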
Approaches
Content based
good for cold start
not personalized
Collaborative filtering
vanilla collaborative filtering
matrix factorizations
. . .
Hybrid and context aware recommender systems
best of two worlds
Focus today
WRMF (Weighted Regularized Matrix Factorization) - Collaborative Filtering for Implicit Feedback Datasets (2008)
efficient learning with accelerated approximate Alternating Least Squares
inference time
Linear-Flow - Practical Linear Models for Large-Scale One-Class Collaborative Filtering (2016)
efficient truncated SVD
cheap cross-validation with full path regularization
Matrix Factorizations
Users can be described by small number of latent factors puk
Items can be described by small number of latent factors qki
Sparse data
(diagram: a sparse users × items interaction matrix)
Low rank matrix factorization
$$R = P \times Q$$
where $P$ is a users × factors matrix and $Q$ is a factors × items matrix
Reconstruction
(diagram: the users × items matrix reconstructed as $P \times Q$)
Truncated SVD
Take the k largest singular values:

$$X \approx U_k D_k V_k^T$$

- $X_k \in R^{m \times n}$
- $U_k$, $V_k$ - columns are orthonormal bases (dot product of any 2 columns is zero, unit norm)
- $D_k$ - diagonal matrix with singular values on the diagonal

Truncated SVD is the best rank-k approximation of the matrix $X$ in terms of the Frobenius norm $||X - U_k D_k V_k^T||_F$:

$$P = U_k \sqrt{D_k}, \quad Q = \sqrt{D_k} V_k^T$$
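A minimal sketch with SciPy (an assumption - the talk's own demos use R's reco; `svds` returns the k largest singular triplets in ascending order, and splitting $D_k$ symmetrically via square roots is one common convention):

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 50))     # stand-in for the ratings matrix

k = 10
U, s, Vt = svds(X, k=k)                # k largest singular triplets
order = np.argsort(-s)                 # svds returns singular values ascending
U, s, Vt = U[:, order], s[order], Vt[order]

P = U * np.sqrt(s)                     # user factors (users x k)
Q = np.sqrt(s)[:, None] * Vt           # item factors (k x items)
X_k = P @ Q                            # best rank-k approximation of X
```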
Issue with truncated SVD for “explicit” feedback
Optimal in terms of the Frobenius norm - it takes into account the zeros in the rating matrix:

$$RMSE^2 = \frac{1}{|users| \times |items|} \sum_{u \in users,\, i \in items} (r_{ui} - \hat{r}_{ui})^2$$

This overfits the data. A better objective measures the error only on "observed" ratings:

$$RMSE^2 = \frac{1}{|Observed|} \sum_{u,i \in Observed} (r_{ui} - \hat{r}_{ui})^2$$
SVD-like matrix factorization with ALS
$$J = \sum_{u,i \in Observed} (r_{ui} - p_u \cdot q_i)^2 + \lambda(||P||^2 + ||Q||^2)$$

Given $Q$ fixed, solve for each $p_u$:

$$\min_{p_u} \sum_{i \in Observed(u)} (r_{ui} - p_u \cdot q_i)^2 + \lambda \sum_{j=1}^{k} p_{uj}^2$$

Given $P$ fixed, solve for each $q_i$:

$$\min_{q_i} \sum_{u \in Observed(i)} (r_{ui} - p_u \cdot q_i)^2 + \lambda \sum_{j=1}^{k} q_{ij}^2$$

Ridge regression: $p_u = (Q^T Q + \lambda I)^{-1} Q^T r_u$, $q_i = (P^T P + \lambda I)^{-1} P^T r_i$
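A sketch of one half-iteration (updating all user vectors with item factors fixed); the function name and the dense mask representation are mine:

```python
import numpy as np

def update_user_factors(R, mask, Q, lam):
    """Ridge update p_u = (Qo^T Qo + lam I)^-1 Qo^T r_u, where Qo keeps
    only the rows of Q for items user u has actually rated."""
    k = Q.shape[1]
    P = np.zeros((R.shape[0], k))
    for u in range(R.shape[0]):
        obs = mask[u].astype(bool)          # observed items for user u
        Qo, ru = Q[obs], R[u, obs]
        P[u] = np.linalg.solve(Qo.T @ Qo + lam * np.eye(k), Qo.T @ ru)
    return P
```

Alternating this with the symmetric item-factor update until convergence is the ALS loop.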
“Collaborative Filtering for Implicit Feedback Datasets”
WRMF - Weighted Regularized Matrix Factorization
“Default” approach
Proposed in 2008, but still widely used in industry (even at YouTube)
several high-quality open-source implementations
$$J = \sum_{u,i} C_{ui} (P_{ui} - X_u Y_i)^2 + \lambda(||X||_F + ||Y||_F)$$

Preferences - binary:

$$P_{ui} = \begin{cases} 1 & \text{if } R_{ui} > 0 \\ 0 & \text{otherwise} \end{cases}$$

Confidence: $C_{ui} = 1 + f(R_{ui})$
Alternating Least Squares for implicit feedback
For fixed Y :
$$\frac{dL}{dx_u} = -2 \sum_{i} c_{ui}(p_{ui} - x_u^T y_i) y_i + 2\lambda x_u = -2 Y^T C^u p(u) + 2 Y^T C^u Y x_u + 2\lambda x_u$$

Setting $dL/dx_u = 0$ for the optimal solution gives $(Y^T C^u Y + \lambda I) x_u = Y^T C^u p(u)$.

$x_u$ can be obtained by solving a system of linear equations:

$$x_u = solve(Y^T C^u Y + \lambda I,\; Y^T C^u p(u))$$
Alternating Least Squares for implicit feedback
Similarly for fixed X:
$$\frac{dL}{dy_i} = -2 X^T C^i p(i) + 2 X^T C^i X y_i + 2\lambda y_i$$

$$y_i = solve(X^T C^i X + \lambda I,\; X^T C^i p(i))$$

Another optimization:

$$X^T C^i X = X^T X + X^T (C^i - I) X$$
$$Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y$$

$X^T X$ and $Y^T Y$ can be precomputed
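A sketch of one user solve exploiting the precomputed $Y^T Y$ (names are mine; $c_{ui} = 1 + r_{ui}$ is one simple choice of $f$):

```python
import numpy as np

def solve_user(r_u, Y, YtY, lam):
    """x_u = solve(Y^T C^u Y + lam I, Y^T C^u p(u)) for a single user.
    r_u: dense row of raw interactions (0 = unobserved, confidence 1)."""
    k = Y.shape[1]
    obs = np.nonzero(r_u)[0]
    Yo = Y[obs]
    c = 1.0 + r_u[obs]                     # c_ui = 1 + f(r_ui), here f = identity
    # Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y: only observed items enter the correction
    A = YtY + Yo.T @ ((c - 1.0)[:, None] * Yo) + lam * np.eye(k)
    b = Yo.T @ c                           # p(u) = 1 on observed items, 0 elsewhere
    return np.linalg.solve(A, b)
```

The cost per user is proportional to the number of that user's interactions, not to the total number of items.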
Accelerated Approximate Alternating Least Squares
$$y_i = solve(X^T C^i X + \lambda I,\; X^T C^i p(i))$$

Instead of an exact solve, run a fixed number of steps (usually 3-4 is enough) of an iterative method:

Conjugate Gradient
Coordinate Descent
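For intuition, a bare-bones conjugate gradient sketch (names are mine); in WRMF, $A = Y^T C^u Y + \lambda I$ and $b = Y^T C^u p(u)$, and a few steps replace the exact solve:

```python
import numpy as np

def cg_steps(A, b, x0, n_steps=3):
    """A fixed number of conjugate-gradient steps toward A x = b (A symmetric PD)."""
    x = x0.copy()
    r = b - A @ x            # residual
    p = r.copy()             # search direction
    rs = r @ r
    for _ in range(n_steps):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```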
Inference time
How to make recommendations for new users?
There are no user embeddings since these users are not in the original matrix!
Inference time
Make one step of ALS with the item embedding matrix fixed => get new user embeddings:
given fixed $Y$ and $C^{u_{new}}$ - the new user's interaction confidences:

$$x_{u_{new}} = solve(Y^T C^{u_{new}} Y + \lambda I,\; Y^T C^{u_{new}} p(u_{new}))$$
$$scores = X_{new} Y^T$$
WRMF Implementations
python implicit - implements Conjugate Gradient, with GPU support recently!
R reco - implements Conjugate Gradient
Spark ALS
Quora qmf
Google tensorflow
Linear-Flow
Idea is to learn item-item similarity matrix W from the data.
First, minimize:

$$J = ||X - X W_k||_F + \lambda ||W_k||_F$$

with the constraint:

$$rank(W) \le k$$
Linear-Flow observations
1. Without L2 regularization the optimal solution is $W_k = Q_k Q_k^T$, where $SVD_k(X) = P_k \Sigma_k Q_k^T$.
2. Without the $rank(W) \le k$ constraint the optimal solution is just the ridge regression solution: $W = (X^T X + \lambda I)^{-1} X^T X$ - infeasible at this scale.
Linear-Flow reparametrization
$$SVD_k(X) = P_k \Sigma_k Q_k^T$$

Let $W = Q_k Y$:

$$\arg\min_Y ||X - X Q_k Y||_F + \lambda ||Q_k Y||_F$$

Motivation

With $\lambda = 0$, $W = Q_k Q_k^T$, and the solution of the reparametrized problem is $Y = Q_k^T$.
Linear-Flow closed-form solution
Notice that if $Q_k$ is orthogonal then $||Q_k Y||_F = ||Y||_F$.

Solve $||X - X Q_k Y||_F + \lambda ||Y||_F$ - a simple ridge regression with a closed-form solution:

$$Y = (Q_k^T X^T X Q_k + \lambda I)^{-1} Q_k^T X^T X$$

Very cheap inversion of a $k \times k$ matrix!
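End to end, the closed form is just a few matrix products (a sketch; SciPy's `svds` stands in for the truncated SVD, variable names are mine):

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
X = (rng.random((200, 50)) < 0.05).astype(float)   # binary user-item interactions

k, lam = 10, 1.0
_, _, Qt = svds(X, k=k)
Qk = Qt.T                                # items x k, orthonormal columns

G = X.T @ X                              # item-item Gram matrix
Y = np.linalg.solve(Qk.T @ G @ Qk + lam * np.eye(k), Qk.T @ G)
W = Qk @ Y                               # item-item weights, rank <= k
scores = X @ W                           # scores for all users at once
```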
Linear-Flow hassle-free cross-validation
$$Y = (Q_k^T X^T X Q_k + \lambda I)^{-1} Q_k^T X^T X$$

How to find $\lambda$ with cross-validation?

pre-compute $Z = Q_k^T X^T X$, so $Y = (Z Q_k + \lambda I)^{-1} Z$
pre-compute $Z Q_k$
notice that the value of $\lambda$ affects only the diagonal of $Z Q_k$
generate a sequence of $\lambda$ values (say of length 50) based on the min/max diagonal values
solving 50 ridge regressions of small size $k$ is super-fast
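The recipe above, sketched (names are mine; the diagonal of $Z Q_k$ gives a reasonable $\lambda$ range since $Z Q_k$ is positive semi-definite):

```python
import numpy as np

def lambda_path(Z, Qk, n_lambda=50):
    """Solve the small ridge system for a whole sequence of lambdas,
    reusing Z = Qk^T X^T X and Z Qk, which are computed once."""
    ZQ = Z @ Qk                                        # k x k, computed once
    d = np.diag(ZQ)
    lambdas = np.geomspace(max(d.min(), 1e-3 * d.max()), d.max(), n_lambda)
    k = Qk.shape[1]
    return lambdas, [np.linalg.solve(ZQ + lam * np.eye(k), Z) for lam in lambdas]
```

Each candidate $\lambda$ then costs only one $k \times k$ solve; evaluate MAP@K on the validation split for each and keep the best.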
Suggestions
start simple - SVD, WRMF
design proper cross-validation - both objective and data split
think about how to incorporate business logic (for example, how to exclude certain items)
use single machine implementations
think about inference time
don’t waste time on libraries/articles/blog posts which demonstrate MF with dense matrices
Questions?
http://dsnotes.com/tags/recommender-systems/
https://github.com/dselivanov/reco
Contacts:
selivanov.dmitriy@gmail.com
https://github.com/dselivanov
https://www.linkedin.com/in/dselivanov1

What To Do For World Nature Conservation Day by Slidesgo.pptx
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 

Matrix Factorizations for Recommender Systems Explained

  • 1. Matrix Factorizations for Recommender Systems. Dmitriy Selivanov, selivanov.dmitriy@gmail.com, 2017-11-16
  • 2.–5. Recommender systems are everywhere (Figures 1–4)
  • 6. Goals. Propose “relevant” items to customers: retention, exploration, up-sale. Personalized offers: recommended items for a customer given a history of activities (transactions, browsing history, favourites). Similar items: substitutions, bundles (frequently bought together), . . .
  • 7. Live demo. Dataset - LastFM-360K: 360k users, 160k artists, 17M observations, sparsity ≈ 0.9997
  • 8. Explicit feedback. Ratings, likes/dislikes, purchases: cleaner data, smaller, hard to collect. $\mathrm{RMSE}^2 = \frac{1}{|D|}\sum_{u,i \in D}(r_{ui} - \hat{r}_{ui})^2$
  • 9. Netflix prize: ~480k users, 18k movies, 100M ratings, sparsity ~ 99%; the goal was to reduce RMSE by 10% - from 0.9514 to 0.8563
  • 10. Implicit feedback: noisy signals (clicks, likes, purchases, searches, . . . ); much easier to collect; wider user/item coverage; usually sparsity > 99.9%. One-Class Collaborative Filtering: observed entries are positive preferences and should have high confidence; missing entries in the matrix are a mix of negative and positive preferences, so consider them negative with low confidence. We cannot really distinguish whether a user did not click a banner because of a lack of interest or a lack of awareness.
  • 11. Evaluation. Recap: we only care about producing a small set of highly relevant items. RMSE is a bad metric - it has a very weak connection to business goals. Only the relevance (precision) of retrieved items matters: space on the screen is limited, and only the order matters - the most relevant items should be at the top.
  • 12. Ranking - Mean average precision. $\mathrm{AveragePrecision} = \frac{\sum_{k=1}^{n} P(k) \times rel(k)}{\text{number of relevant documents}}$. Example with a single relevant item at position 3: relevant = (0, 0, 1, 0, 0), precision@k = (0, 0, 0.3333333, 0.2500000, 0.2000000); map@5 = 0.1566667
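The AP@k computation above can be sketched in a few lines of NumPy (a sketch; the function name is illustrative, and normalizing by the number of relevant items found in the top-k is an assumed convention, since definitions vary):

```python
import numpy as np

def average_precision_at_k(relevant, k=5):
    """AP@k for one user: average of precision@i over the relevant positions.

    `relevant` is a 0/1 vector ordered by predicted score (best first)."""
    rel = np.asarray(relevant, dtype=float)[:k]
    n_rel = rel.sum()
    if n_rel == 0:
        return 0.0
    # precision@i = (number of relevant items in top i) / i
    prec = np.cumsum(rel) / np.arange(1, len(rel) + 1)
    return float((prec * rel).sum() / n_rel)

# the example above: single relevant item at position 3 -> AP@5 = 1/3
ap = average_precision_at_k([0, 0, 1, 0, 0], k=5)
```

MAP@k is then simply the mean of this value over all users.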
  • 13. Ranking - Normalized Discounted Cumulative Gain. The intuition is the same as for MAP@K, but it also takes into account the value of relevance: $DCG_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log_2(i + 1)}$, $nDCG_p = \frac{DCG_p}{IDCG_p}$, $IDCG_p = \sum_{i=1}^{|REL|} \frac{2^{rel_i} - 1}{\log_2(i + 1)}$
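A minimal sketch of the DCG/nDCG formulas above, assuming graded relevance scores and taking the ideal ordering to be relevances sorted in decreasing order:

```python
import numpy as np

def dcg(rel):
    """Discounted cumulative gain with the 2^rel - 1 gain from the slide."""
    rel = np.asarray(rel, dtype=float)
    discounts = np.log2(np.arange(2, len(rel) + 2))  # log2(i + 1) for i = 1..p
    return float(((2.0 ** rel - 1.0) / discounts).sum())

def ndcg_at_k(rel, k):
    rel = np.asarray(rel, dtype=float)[:k]
    idcg = dcg(np.sort(rel)[::-1])  # DCG of the ideal (sorted) ordering
    return dcg(rel) / idcg if idcg > 0 else 0.0
```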
  • 14. Approaches. Content based: good for cold start, but not personalized. Collaborative filtering: vanilla collaborative filtering, matrix factorizations, . . . Hybrid and context-aware recommender systems: best of both worlds.
  • 15. Focus today. WRMF (Weighted Regularized Matrix Factorization) - Collaborative Filtering for Implicit Feedback Datasets (2008): efficient learning with accelerated approximate Alternating Least Squares; inference time. Linear-Flow - Practical Linear Models for Large-Scale One-Class Collaborative Filtering (2016): efficient truncated SVD; cheap cross-validation with the full regularization path.
  • 16. Matrix Factorizations. Users can be described by a small number of latent factors $p_{uk}$; items can be described by a small number of latent factors $q_{ki}$.
  • 18. Low rank matrix factorization: $R = P \times Q$, where $P$ is a users × factors matrix and $Q$ is a factors × items matrix.
  • 20. Truncated SVD. Take the k largest singular values: $X \approx U_k D_k V_k^T$, $X_k \in \mathbb{R}^{m \times n}$; the columns of $U_k$, $V_k$ are orthonormal bases (the dot product of any 2 columns is zero, unit norm); $D_k$ is a matrix with the singular values on its diagonal. Truncated SVD is the best rank-k approximation of the matrix X in terms of the Frobenius norm $||X - U_k D_k V_k^T||_F$: $P = U_k \sqrt{D_k}$, $Q = \sqrt{D_k} V_k^T$
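Splitting the truncated SVD into P and Q can be sketched with SciPy; this is a toy sketch on a random sparse matrix, and distributing the square root of the singular values between the two factors is one common convention:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# toy sparse "ratings" matrix: 100 users x 80 items
X = sparse_random(100, 80, density=0.05, random_state=42, format="csr")

k = 10
U, s, Vt = svds(X, k=k)       # k largest singular triplets (s is ascending)

P = U * np.sqrt(s)            # user embeddings, 100 x k
Q = np.sqrt(s)[:, None] * Vt  # item embeddings, k x 80
X_k = P @ Q                   # best rank-k approximation in Frobenius norm
```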
  • 21. Issue with truncated SVD for “explicit” feedback. It is optimal in terms of the Frobenius norm - it takes into account zeros in ratings: $\mathrm{RMSE}^2 = \frac{1}{|users| \times |items|}\sum_{u \in users,\ i \in items}(r_{ui} - \hat{r}_{ui})^2$. This overfits the data. A better objective = error only on “observed” ratings: $\mathrm{RMSE}^2 = \frac{1}{|Observed|}\sum_{u,i \in Observed}(r_{ui} - \hat{r}_{ui})^2$
  • 22. SVD-like matrix factorization with ALS. $J = \sum_{u,i \in Observed}(r_{ui} - p_u \times q_i)^2 + \lambda(||Q||^2 + ||P||^2)$. Given Q fixed, solve for p: $\min \sum_{i \in Observed}(r_i - q_i \times P)^2 + \lambda \sum_{j=1}^{u} p_j^2$. Given P fixed, solve for q: $\min \sum_{u \in Observed}(r_u - p_u \times Q)^2 + \lambda \sum_{j=1}^{i} q_j^2$. Ridge regression: $P = (Q^T Q + \lambda I)^{-1} Q^T r$, $Q = (P^T P + \lambda I)^{-1} P^T r$
  • 23. “Collaborative Filtering for Implicit Feedback Datasets”. WRMF - Weighted Regularized Matrix Factorization, the “default” approach. Proposed in 2008, but still widely used in industry (even at YouTube); several high-quality open-source implementations. $J = \sum_{u,i} C_{ui}(P_{ui} - X_u Y_i)^2 + \lambda(||X||_F + ||Y||_F)$. Preferences are binary: $P_{ij} = 1$ if $R_{ij} > 0$, $0$ otherwise. Confidence: $C_{ui} = 1 + f(R_{ui})$
  • 24. Alternating Least Squares for implicit feedback. For fixed Y: $dL/dx_u = -2\sum_{i \in items} c_{ui}(p_{ui} - x_u^T y_i)y_i + 2\lambda x_u = -2 Y^T C^u p(u) + 2 Y^T C^u Y x_u + 2\lambda x_u$. Setting $dL/dx_u = 0$ for the optimal solution gives us $(Y^T C^u Y + \lambda I)x_u = Y^T C^u p(u)$, so $x_u$ can be obtained by solving a system of linear equations: $x_u = solve(Y^T C^u Y + \lambda I,\ Y^T C^u p(u))$
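One exact $x_u$ update from the slide can be sketched in NumPy. This is a sketch for a single user: the linear confidence $c = 1 + 40r$ is an assumed choice of $f$, and the sizes are toy values. It also uses the $Y^T C^u Y = Y^T Y + Y^T(C^u - I)Y$ trick from the next slide, so only observed items are touched:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, rank, lam = 50, 8, 0.1

Y = rng.normal(size=(n_items, rank))    # item factors, held fixed
r_u = np.zeros(n_items)                 # one user's raw implicit feedback
r_u[[3, 10, 20]] = [2.0, 1.0, 5.0]

p_u = (r_u > 0).astype(float)           # binary preferences P_ui
c_u = 1.0 + 40.0 * r_u                  # confidence C_ui = 1 + f(R_ui), f(r) = 40r assumed

YtY = Y.T @ Y                           # precomputed once per ALS sweep
obs = np.flatnonzero(r_u)               # C^u - I is zero outside observed items
Y_o = Y[obs]
A = YtY + Y_o.T @ ((c_u[obs] - 1.0)[:, None] * Y_o) + lam * np.eye(rank)
b = Y_o.T @ (c_u[obs] * p_u[obs])       # Y^T C^u p(u): zero preferences drop out
x_u = np.linalg.solve(A, b)             # user embedding
```

The same solve with Y held fixed is exactly how a new user is folded in at inference time.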
  • 25. Alternating Least Squares for implicit feedback. Similarly, for fixed X: $dL/dy_i = -2 X^T C^i p(i) + 2 X^T C^i X y_i + 2\lambda y_i$, so $y_i = solve(X^T C^i X + \lambda I,\ X^T C^i p(i))$. Another optimization: $X^T C^i X = X^T X + X^T(C^i - I)X$ and $Y^T C^u Y = Y^T Y + Y^T(C^u - I)Y$, where $X^T X$ and $Y^T Y$ can be precomputed.
  • 26. Accelerated Approximate Alternating Least Squares. $y_i = solve(X^T C^i X + \lambda I,\ X^T C^i p(i))$ can be solved approximately with iterative methods (Conjugate Gradient, Coordinate Descent) using a fixed number of steps (usually 3-4 is enough).
  • 27. Inference time. How do we make recommendations for new users? There are no user embeddings, since these users are not in the original matrix!
  • 28. Inference time. Make one ALS step with the item embeddings matrix fixed => get new user embeddings. Given Y fixed and $C^{u_{new}}$ the new user-item interactions confidence: $x_{u_{new}} = solve(Y^T C^{u_{new}} Y + \lambda I,\ Y^T C^{u_{new}} p(u_{new}))$, $scores = X_{new} Y^T$
  • 29. WRMF Implementations. python: implicit - implements Conjugate Gradient, with GPU support recently! R: reco - implements Conjugate Gradient. Spark ALS. Quora qmf. Google tensorflow.
  • 30. Linear-Flow. The idea is to learn an item-item similarity matrix W from the data. First: $\min J = ||X - XW_k||_F + \lambda||W_k||_F$ with the constraint $rank(W) \le k$
  • 31. Linear-Flow observations. 1. Without L2 regularization the optimal solution is $W_k = Q_k Q_k^T$ where $SVD_k(X) = P_k \Sigma_k Q_k^T$. 2. Without the $rank(W) \le k$ constraint the optimal solution is just the ridge regression solution $W = (X^T X + \lambda I)^{-1} X^T X$ - infeasible.
  • 32. Linear-Flow reparametrization. $SVD_k(X) = P_k \Sigma_k Q_k^T$. Let $W = Q_k Y$: $\mathrm{argmin}_Y\ ||X - X Q_k Y||_F + \lambda||Q_k Y||_F$. Motivation: for $\lambda = 0$ => $W = Q_k Q_k^T$, and $Y = Q_k^T$ is also the solution of the current problem.
  • 33. Linear-Flow closed-form solution. Notice that if $Q_k$ is orthogonal then $||Q_k Y||_F = ||Y||_F$, so we solve $||X - X Q_k Y||_F + \lambda||Y||_F$ - a simple ridge regression with the closed-form solution $Y = (Q_k^T X^T X Q_k + \lambda I)^{-1} Q_k^T X^T X$. Very cheap inversion of a matrix of rank k!
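The closed-form solution can be sketched end to end with SciPy. This is a toy sketch on a random sparse matrix; the variable names are illustrative, and densifying $X^T X$ is only reasonable at this small size:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

X = sparse_random(200, 100, density=0.05, random_state=1, format="csr")
k, lam = 20, 1.0

_, _, Vt = svds(X, k=k)
Qk = Vt.T                            # n_items x k, orthonormal columns

XtX = (X.T @ X).toarray()            # item-item Gram matrix (small here)
Z = Qk.T @ XtX                       # k x n_items
# Y = (Qk^T X^T X Qk + lam I)^{-1} Qk^T X^T X -- only a k x k system to solve
Y = np.linalg.solve(Z @ Qk + lam * np.eye(k), Z)

W = Qk @ Y                           # rank-k item-item similarity matrix
scores = X @ W                       # score all items for all users
```

With lam = 0 this recovers Y = Qk.T, matching the motivation on the previous slide.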
  • 34. Linear-Flow hassle-free cross-validation. $Y = (Q_k^T X^T X Q_k + \lambda I)^{-1} Q_k^T X^T X$. How to find lambda with cross-validation? Pre-compute $Z = Q_k^T X^T X$, so $Y = (Z Q_k + \lambda I)^{-1} Z$; pre-compute $Z Q_k$; notice that the value of lambda affects only the diagonal of $Z Q_k$; generate a sequence of lambdas (say of length 50) based on the min/max diagonal values; solving 50 ridge regressions of a small rank is super-fast.
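The lambda-path recipe above can be sketched as follows. The matrices here are random stand-ins for the precomputed $Z = Q_k^T X^T X$ and $Z Q_k$; only the diagonal shift changes per lambda:

```python
import numpy as np

rng = np.random.default_rng(7)
k, n_items = 10, 60

Qk, _ = np.linalg.qr(rng.normal(size=(n_items, k)))  # stand-in orthonormal Qk
A = rng.normal(size=(80, n_items))
XtX = A.T @ A                                        # stand-in X^T X

Z = Qk.T @ XtX            # precomputed once
ZQ = Z @ Qk               # precomputed once; symmetric positive semi-definite

d = np.diag(ZQ)
lambdas = np.geomspace(d.min(), d.max(), num=50)  # grid spans the diagonal range

# each candidate lambda only shifts the diagonal: one cheap k x k solve each
Ys = [np.linalg.solve(ZQ + lam * np.eye(k), Z) for lam in lambdas]
```

Each candidate Y would then be scored on a validation split (e.g. with MAP@k) to pick lambda.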
  • 36. Suggestions. Start simple - SVD, WRMF. Design proper cross-validation - both the objective and the data split. Think about how to incorporate business logic (for example, how to exclude something). Use single-machine implementations. Think about inference time. Don't waste time on libraries/articles/blog posts which demonstrate MF with dense matrices.