Rate It Again
Increasing Recommendation Accuracy by User Re-Rating
Xavier Amatriain (with J.M. Pujol, N. Tintarev, N. Oliver)
Telefonica Research
RecSys '09
The Recommender Problem
● Two ways to address it:
  1. Improve the algorithm
  2. Improve the input data
● Time for data cleaning!
User Feedback is Noisy
● See our UMAP '09 publication: "I like it... I like it not" (Amatriain et al. '09)
Natural Noise Limits our User Model... and Our Prediction Accuracy
[Cartoon: "Did you hear what I like??!!"]
Experimental Setup
● 118 participants rated movies in 3 trials:
  T1 (random order) <-- 24 h --> T2 (by popularity) <-- 15 days --> T3 (random order)
● 100 movies from the Netflix dataset, stratified random sampling on popularity
● Ratings on a 1-to-5 star scale, with a special "not seen" symbol
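As an aside, a minimal sketch of how such a popularity-stratified sample could be drawn; the bucket count, bucketing scheme, and helper names are assumptions for illustration, not the authors' exact procedure.

```python
import random
from collections import defaultdict

def stratified_sample_by_popularity(movie_counts, n_items=100, n_buckets=5, seed=42):
    """Draw a popularity-stratified random sample of movies.

    movie_counts: dict mapping movie_id -> number of ratings (popularity proxy).
    Movies are ranked by popularity, split into equal-size buckets, and an
    equal share of the sample is drawn uniformly at random from each bucket.
    """
    rng = random.Random(seed)
    ranked = sorted(movie_counts, key=movie_counts.get, reverse=True)
    buckets = defaultdict(list)
    for rank, movie in enumerate(ranked):
        buckets[rank * n_buckets // len(ranked)].append(movie)
    per_bucket = n_items // n_buckets
    sample = []
    for b in range(n_buckets):
        sample.extend(rng.sample(buckets[b], min(per_bucket, len(buckets[b]))))
    return sample
```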
Users are Inconsistent
● What is the probability (percentage) of making an inconsistency given an original rating?
[Plot: percentage of inconsistencies per original rating value]
● Mild ratings are noisier
● Negative ratings are noisier
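A minimal sketch of how this per-rating inconsistency rate can be computed from two trials; it assumes each trial is stored as a dict mapping (user, item) to a 1-5 rating.

```python
from collections import Counter

def inconsistency_rate_by_rating(trial_a, trial_b):
    """Fraction of re-rated items whose rating changed, grouped by the original rating.

    trial_a, trial_b: dicts mapping (user, item) -> rating (1-5).
    Only pairs rated in both trials are considered.
    """
    totals, changed = Counter(), Counter()
    for key, r1 in trial_a.items():
        if key in trial_b:
            totals[r1] += 1
            if trial_b[key] != r1:
                changed[r1] += 1
    return {r: changed[r] / totals[r] for r in sorted(totals)}
```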
Prediction Accuracy
● Pairwise RMSE between trials, considering the intersection (∩) and union (∪) of both rating sets

  Trials   #Ti    #Tj    #∩     #∪     RMSE(∩)  RMSE(∪)
  T1, T2   2185   1961   1838   2308   0.573    0.707
  T1, T3   2185   1909   1774   2320   0.637    0.765
  T2, T3   1969   1909   1730   2140   0.557    0.694

● Maximum error between the trials that are most distant in time (T1, T3)
● Significantly less error when the second trial is involved
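To make the comparison concrete, here is a minimal sketch of computing these per-pair statistics; it assumes each trial is a dict keyed by (user, item), and it only reports RMSE over the intersection, since the slide does not spell out how non-overlapping ratings enter the union RMSE.

```python
import math

def pairwise_trial_stats(trial_i, trial_j):
    """Compare two rating trials: set sizes and RMSE over their intersection.

    trial_i, trial_j: dicts mapping (user, item) -> rating.
    """
    common = trial_i.keys() & trial_j.keys()
    sq_err = sum((trial_i[k] - trial_j[k]) ** 2 for k in common)
    rmse_inter = math.sqrt(sq_err / len(common)) if common else float("nan")
    return {
        "#Ti": len(trial_i),
        "#Tj": len(trial_j),
        "#∩": len(common),
        "#∪": len(trial_i.keys() | trial_j.keys()),
        "RMSE(∩)": rmse_inter,
    }
```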
Algorithm Robustness to Natural Noise
● RMSE for different recommendation algorithms when predicting each of the trials

  Alg./Trial       T1       T2       T3       T_worst/T_best
  User Average     1.2011   1.1469   1.1945   4.7%
  Item Average     1.0555   1.0361   1.0776   4%
  User-based kNN   0.9990   0.9640   1.0171   5.5%
  Item-based kNN   1.0429   1.0031   1.0417   4%
  SVD              1.0244   0.9861   1.0285   4.3%

● Trial 2 is consistently the least noisy
Algorithm Robustness to Natural Noise (2)
● RMSE for different recommendation algorithms when predicting ratings in one trial (testing) from ratings in another (training)

  Training-Testing   T1-T2    T1-T3    T2-T3
  User Average       1.1585   1.2095   1.2036
  Movie Average      1.0305   1.0648   1.0637
  User-based kNN     0.9693   1.0143   1.0184
  Item-based kNN     1.0009   1.0406   1.0590
  SVD                0.9741   1.0491   1.0118

● Noise is minimized when we predict Trial 2
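A minimal sketch of this train-on-one-trial, test-on-another protocol, using a simple item-average predictor as a stand-in for the algorithms in the table; the data layout and the global-mean fallback for unseen items are assumptions.

```python
import math
from collections import defaultdict

def item_average_model(train):
    """Fit per-item mean ratings on the training trial; fall back to the global mean."""
    sums, counts = defaultdict(float), defaultdict(int)
    for (_, item), r in train.items():
        sums[item] += r
        counts[item] += 1
    global_mean = sum(train.values()) / len(train)
    return {i: sums[i] / counts[i] for i in sums}, global_mean

def cross_trial_rmse(train, test):
    """Train on one trial, predict the ratings observed in the other, report RMSE."""
    item_means, global_mean = item_average_model(train)
    sq_err = sum((item_means.get(item, global_mean) - r) ** 2
                 for (_, item), r in test.items())
    return math.sqrt(sq_err / len(test))
```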
Let's Recap
● Users are inconsistent
● Inconsistencies can depend on many things, including how the items are presented
● Inconsistencies produce natural noise
● Natural noise reduces our prediction accuracy, independently of the algorithm
Hypothesis
● If we can somehow reduce natural noise due to user inconsistencies, we could greatly improve recommendation accuracy.
● We can reduce natural noise by taking advantage of user inconsistencies when re-rating items.
Algorithm
● Given a rating dataset where (some) items have been re-rated
● Two fairness conditions:
  1. The algorithm should remove as few ratings as possible (i.e., only when there is some certainty that the rating is only adding noise)
  2. The algorithm should not make up new ratings, but decide which of the existing ones are valid
Algorithm
● One-source re-rating case
● Given the following milding function:
  [milding function shown as an equation on the original slide]
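A minimal sketch of what a one-source denoising pass could look like under the two fairness conditions above. The concrete rule below (keep agreements, prefer the milder of two nearby ratings, drop strongly contradictory pairs) is an illustrative assumption standing in for the slide's milding function, whose exact definition is not reproduced here.

```python
def milder(r1, r2, scale_mid=3.0):
    """Illustrative 'milding': of two ratings, return the one closer to the scale midpoint."""
    return r1 if abs(r1 - scale_mid) <= abs(r2 - scale_mid) else r2

def denoise_one_source(original, rerated, max_gap=1):
    """Denoise `original` using a single re-rating pass.

    original, rerated: dicts mapping (user, item) -> rating.
    - Keep ratings that were not re-rated or that agree across passes.
    - If the two passes differ by at most `max_gap`, keep the milder rating.
    - If they contradict each other more strongly, drop the rating entirely
      (no new ratings are invented, per the fairness conditions).
    """
    cleaned = {}
    for key, r1 in original.items():
        r2 = rerated.get(key)
        if r2 is None or r1 == r2:
            cleaned[key] = r1
        elif abs(r1 - r2) <= max_gap:
            cleaned[key] = milder(r1, r2)
        # else: contradictory pair, remove the rating
    return cleaned
```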
Results
● One-source re-rating (denoised ⊚ denoising)

  Algorithm        T1⊚T2    ΔT1     T1⊚T3    ΔT1     T2⊚T3    ΔT2
  User-based kNN   0.8861   11.3%   0.8960   10.3%   0.8984   6.8%
  SVD              0.9121   11.0%   0.9274   9.5%    0.9159   7.1%

● Two-source re-rating (denoising T1 with the other two)

  Algorithm        T1 ⊚ (T2, T3)   ΔT1
  User-based kNN   0.8647          13.4%
  SVD              0.8800          14.1%

● Best results (above 10%!) when denoising a noisy trial with a less noisy one
● Smaller (yet important) improvement when denoising the less noisy set
● Improvements of up to 14% with two re-ratings!
But...
● We can't expect all users to re-rate all items once or twice to improve accuracy!
● We need methods to selectively choose which ratings to denoise (see the sketch below):
  – Random selection
  – Data-dependent (select ratings based on their values)
  – User-dependent (select ratings based on how "noisy" the user is)
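A minimal sketch of the data-dependent and user-dependent selection strategies; the rating thresholds, the 20% cut-off, and the idea of estimating user noisiness from an earlier re-rating sample are illustrative assumptions.

```python
from collections import Counter

def select_extreme_ratings(ratings, low=2, high=4):
    """Data-dependent selection: re-rate only (user, item) pairs with extreme ratings.

    `low` and `high` are illustrative thresholds on a 1-5 scale.
    """
    return [key for key, r in ratings.items() if r <= low or r >= high]

def select_noisy_users(original, rerated, top_fraction=0.2):
    """User-dependent selection: target the users whose re-ratings disagree most often."""
    total, changed = Counter(), Counter()
    for (user, item), r1 in original.items():
        r2 = rerated.get((user, item))
        if r2 is not None:
            total[user] += 1
            if r2 != r1:
                changed[user] += 1
    noise = {u: changed[u] / total[u] for u in total}
    ranked = sorted(noise, key=noise.get, reverse=True)
    return ranked[: max(1, int(len(ranked) * top_fraction))]
```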
Random Re-Rating
● Improvement in RMSE when doing one-source (left) and two-source (right) re-rating, as a function of the percentage of randomly selected denoised ratings (T1⊚T3)
Denoise Extreme Ratings
● Improvement in RMSE when doing one-source (left) and two-source (right) re-rating, as a function of the percentage of denoised ratings, selecting only extreme ratings
Denoise Outliers
● Improvement in RMSE when doing one-source (left) and two-source (right) re-rating, as a function of the percentage of denoised ratings and users, selecting only noisy users and extreme ratings
Value of a Rating
● Is it worth adding new ratings, or re-rating existing items? RMSE improvement as a function of the number of new ratings added in each case.
● An extreme re-rating improves RMSE 10 times more than adding a new rating!
Conclusions
● Improving the data can be more beneficial than improving the algorithm
● Natural noise limits the accuracy of recommender systems
● We can reduce natural noise by asking users to re-rate items
● There are strategies to minimize the impact of the re-rating process
● The value of a re-rating may be higher than that of a new rating
Rate It Again
Increasing Recommendation Accuracy by User Re-Rating
Thanks!