6. But first...
About Telefonica and Telefonica R&D
7. Telefónica is a fast-growing Telecom

                 1989                 2000                    2008
Clients          About 12 million     About 68 million        About 260 million
                 subscribers          customers               customers
Services         Basic telephone      Wireline and mobile     Integrated ICT
                 and data services    voice, data and         solutions for all
                                      Internet services       customers
Geographies      Operations in        Operations in           Operations in
                 Spain                16 countries            25 countries
Staff            About 71,000         About 149,000           About 257,000
                 professionals        professionals           professionals
Finances         Rev: 4,273 M€        Rev: 28,485 M€          Rev: 57,946 M€
                 EPS(1): 0.45 €       EPS(1): 0.67 €          EPS: 1.63 €

(1) EPS: Earnings per share
8. Currently among the largest in the world
Telco sector worldwide ranking by market cap (US$ bn)
Source: Bloomberg, 06/12/09
9. Telefónica R&D (TID) is the Research and Development Unit of the Telefónica Group

MISSION: "To contribute to the improvement of the Telefónica Group's competitiveness through technological innovation"

- Founded in 1988
- Largest private R&D center in Spain
- More than 1,100 professionals
- Five centers in Spain and two in Latin America

In 2008 Telefónica was the first Spanish company by R&D investment and the third in the EU:
Applied research: 61 M€, within R&D: 594 M€, within Technological Innovation: 4,384 M€
10. Internet Scientific Areas

Content Distribution and P2P:
- Next generation P2P-TV
- Future Internet: Content Networking
- Delay Tolerant Bulk Distribution

Wireless and Mobile Systems:
- Managed Wireless bundling
- Device2Device Content Distribution
- Infrastructure for large-scale mobile-data-based cloud computing
- Network Transparency

Social Networks:
- Information Propagation
- Social Search Engines
- Large-scale social analysis
11. Multimedia Scientific Areas

Multimedia Core:
- Multimedia Data Analysis, Search & Retrieval
- Video, Audio, Image, Music, Text, Sensor Data
- Understanding, Summarization, Visualization

Mobile and Ubicomp:
- Context Awareness
- Urban Computing
- Mobile Multimedia & Search
- Wearable Systems
- Physiological Monitoring

HCC:
- Multimodal User Interfaces
- Expression, Gesture, Emotion Recognition
- Personalization & Recommendation
- Super Telepresence
12. Data Mining & User Modeling Areas

SOCIAL NETWORK ANALYSIS & BUSINESS INTELLIGENCE
- Analytical CRM
- Trend-spotting, service propagation & churn
- Social Graph Analysis (construction, dynamics)

USER MODELING
- Application to new services (technology for development)
- Cognitive, socio-cultural, and contextual modeling
- Behavioral user modeling (service-use patterns)

DATA MINING
- Integration of statistical & knowledge-based techniques
- Stream mining
- Large-scale & distributed machine learning
13. Index
Now seriously,
this is where the index should go!
15. The Age of Search has come to an end
... long live the Age of Recommendation!

Chris Anderson in "The Long Tail":
"We are leaving the age of information and entering the age of recommendation"

CNN Money, "The race to create a 'smart' Google":
"The Web, they say, is leaving the era of search and entering one of discovery. What's the difference? Search is what you do when you're looking for something. Discovery is when something wonderful that you didn't know existed, or didn't know how to ask for, finds you."
17. The value of recommendations

- Netflix: 2/3 of the movies rented are recommended
- Google News: recommendations generate 38% more clickthrough
- Amazon: 35% of sales come from recommendations
- Choicestream: 28% of people would buy more music if they found what they liked
18. The "Recommender problem"

Estimate a utility function that can automatically predict how much a user will like an item that is unknown to her, based on:
- Past behavior
- Relations to other users
- Item similarity
- Context
- ...
19. The "Recommender problem"

Let C be the set of all users and let S be the set of all possible items that can be recommended (e.g. books, movies, or restaurants).

Let u be a utility function that measures the usefulness of item s to user c, i.e., u : C × S → R, where R is a totally ordered set. Then, for each user c ∈ C, we want to choose the item s' ∈ S that maximizes u.

The utility of an item is usually represented by a rating, but it can also be an arbitrary function, including a profit function.
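In code, this formulation reduces to an argmax of u over S for each user. A toy sketch (the names `best_item` and the sample ratings are illustrative, not from the talk):

```python
# Toy utility values u(c, s) for known (user, item) pairs.
ratings = {
    ("alice", "book"): 4.0,
    ("alice", "movie"): 2.0,
    ("bob", "movie"): 5.0,
}

def best_item(user, items, u):
    """For user c, return the item s' in S that maximizes u(c, s)."""
    return max(items, key=lambda s: u(user, s))

# A trivial utility that falls back to 0 for unknown pairs:
u = lambda c, s: ratings.get((c, s), 0.0)
print(best_item("alice", ["book", "movie"], u))  # -> book
```

The hard part, of course, is estimating u for the unknown (c, s) pairs; that is what the approaches on the next slides do.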
20. Approaches to Recommendation

Collaborative Filtering: recommend items based only on the users' past behavior
- User-based: find users similar to me and recommend what they liked
- Item-based: find items similar to those that I have previously liked

Content-based: recommend based on features inherent to the items

Social recommendations (trust-based)
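A minimal sketch of the user-based variant (toy data and function names are mine, not the talk's): predict a user's rating for an item as a similarity-weighted average of other users' ratings.

```python
import math

# Toy user-item ratings (dict-of-dicts for clarity).
ratings = {
    "ana":  {"m1": 5, "m2": 3, "m3": 4},
    "ben":  {"m1": 4, "m2": 2, "m3": 5},
    "cris": {"m1": 1, "m2": 5, "m3": 2},
    "dan":  {"m2": 4, "m3": 5},   # has not rated m1 yet
}

def cosine(a, b):
    """Cosine similarity between two sparse rating vectors."""
    common = set(a) & set(b)
    num = sum(a[i] * b[i] for i in common)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def predict_user_based(user, item):
    """Similarity-weighted average of neighbors' ratings for `item`."""
    pairs = [(cosine(ratings[user], r), r[item])
             for u, r in ratings.items() if u != user and item in r]
    norm = sum(abs(s) for s, _ in pairs)
    return sum(s * v for s, v in pairs) / norm if norm else 0.0
```

Item-based CF transposes the same idea: similarities are computed between item columns instead of user rows, which tends to be more stable because each item usually has more ratings than each user.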
22. The Netflix Prize

- 500K users x 17K movie titles = 100M ratings = $1M (if you "only" improve the existing system by 10%: from 0.95 to 0.85 RMSE)
- 49K contestants on 40K teams from 184 countries
- 41K valid submissions from 5K teams; 64 submissions per day
- Winning approach uses hundreds of predictors from several teams
- Is this general?
- Why did it take so long?
23. What works

It depends on the domain and the particular problem. However, in the general case it has been demonstrated that (currently) the best isolated approach is CF.
- Item-based CF is in general more efficient and more accurate, but mixing CF approaches can improve results
- Other approaches can be hybridized to improve results in specific cases (cold-start problem...)

What matters:
- Data preprocessing: outlier removal, denoising, removal of global effects (e.g. each user's average)
- "Smart" dimensionality reduction using matrix factorization such as SVD
- Combining classifiers
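One way to picture the dimensionality-reduction step: a rank-k truncated SVD of the rating matrix. This is only a sketch with invented numbers; real Netflix Prize solutions factorized over observed entries only, rather than mean-filling as done here.

```python
import numpy as np

# Toy user-item matrix; 0 marks an unknown rating (mean-filled below,
# an assumption for this sketch).
R = np.array([[5., 3., 0., 1.],
              [4., 0., 0., 1.],
              [1., 1., 0., 5.],
              [1., 0., 0., 4.]])
R_filled = np.where(R == 0, R[R > 0].mean(), R)

k = 2  # number of latent factors to keep
U, s, Vt = np.linalg.svd(R_filled, full_matrices=False)
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k approximation
```

The entries of R_hat serve as denoised rating predictions: the low-rank constraint discards per-user idiosyncrasies while keeping the dominant taste factors.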
24. I like it... I like it not
Evaluating User Ratings Noise in
Recommender Systems
Xavier Amatriain (@xamat), Josep M. Pujol, Nuria Oliver
Telefonica Research
28. Natural Noise Limits our User Model
DID YOU HEAR WHAT I LIKE??!!
...and Our Prediction Accuracy
29. The Magic Barrier

- Magic Barrier = limit on prediction accuracy due to noise in the original data
- Natural Noise = involuntary noise introduced by users when giving feedback
- Due to (a) mistakes, and (b) lack of resolution in the personal rating scale (e.g. on a 1-to-5 scale, a 2 may mean the same as a 3 for some users and some items)
- Magic Barrier >= Natural Noise Threshold
- We cannot predict with less error than the resolution of the original data
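A quick simulation of the argument (all numbers invented): if observed ratings are true opinions plus natural noise, even an oracle that knows each true opinion cannot score a lower RMSE than the noise level.

```python
import math
import random

random.seed(0)
true_opinions = [random.uniform(1, 5) for _ in range(10000)]
noise_std = 0.5  # assumed natural-noise level
observed = [t + random.gauss(0, noise_std) for t in true_opinions]

# Oracle predictor: predicts the true opinion exactly...
rmse = math.sqrt(sum((o - t) ** 2
                     for o, t in zip(observed, true_opinions)) / len(observed))
# ...yet its RMSE against the observed ratings stays near noise_std.
```

No algorithm, however clever, can do better than this oracle, so the noise level is a hard floor on measured accuracy.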
30. Our related research questions

- Q1. Are users inconsistent when providing explicit feedback to Recommender Systems via the common rating procedure?
- Q2. How large is the prediction error due to these inconsistencies?
- Q3. What factors affect user inconsistencies?
31. Experimental Setup (I)

- Test-retest procedure: you need at least 3 trials to separate reliability from stability
- Reliability: how much you can trust the instrument you are using (i.e. ratings)
  r = r12 * r23 / r13
- Stability: drift in user opinion
  s12 = r13 / r23;  s23 = r13 / r12;  s13 = r13² / (r12 * r23)
- Users rated movies in 3 trials:
  Trial 1 <-> 24 h <-> Trial 2 <-> 15 days <-> Trial 3
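The formulas above can be wrapped in small helpers (hypothetical names; r12, r23, r13 are the pairwise correlations between the three rating trials):

```python
def reliability(r12, r23, r13):
    """Test-retest reliability: r = r12 * r23 / r13."""
    return r12 * r23 / r13

def stabilities(r12, r23, r13):
    """Per-interval stability estimates from the three trial correlations."""
    return {"s12": r13 / r23,
            "s23": r13 / r12,
            "s13": r13 ** 2 / (r12 * r23)}
```

For example, reliability(0.8, 0.9, 0.72) returns 1.0: all the disagreement between trials is attributed to opinion drift rather than an unreliable rating instrument.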
32. Experimental Setup (II)

- 100 movies selected from the Netflix dataset using stratified random sampling on popularity
- Ratings on a 1-to-5 star scale
- Special "not seen" symbol
- Trials 1 and 3 in random order; trial 2 ordered by popularity
- 118 participants
34. Comparison to Netflix Data

- The distribution of the number of ratings per movie is very similar to Netflix, but the average rating is lower (users are not voluntarily choosing what to rate)
35. Test-retest Reliability and Stability

- Overall reliability = 0.924 (good reliabilities are expected to be > 0.9)
- Removing mild ratings yields higher reliabilities, while removing extreme ratings yields lower ones
- Stabilities: s12 = 0.973, s23 = 0.977, and s13 = 0.951
- Stabilities might also be accounting for a "learning effect" (note s12 < s23)
36. Users are Inconsistent

- What is the probability of making an inconsistency given an original rating?

37. Users are Inconsistent

Mild ratings are noisier
- What is the percentage of inconsistencies given an original rating?

38. Users are Inconsistent

Negative ratings are noisier
- What is the percentage of inconsistencies given an original rating?
46. Let's recap

- Users are inconsistent
- Inconsistencies can depend on many things, including how the items are presented
- Inconsistencies produce natural noise
- Natural noise reduces our prediction accuracy, independently of the algorithm
47. Item order effect

- R1 is the trial with the most inconsistencies
- R3 has fewer, but not when excluding "not seen" (the learning effect improves "not seen" discrimination)
- R2 minimizes inconsistencies because of its ordering (reducing the "contrast effect")
48. User Rating Speed Effect

- Evaluation time decreases as the survey progresses in R1 and R3 (users losing attention, but also learning)
- In R2, evaluation time decreases until users reach the segment of "popular" movies
- Rating speed is not correlated with inconsistencies
50. Different proposals

In order to deal with noise in user feedback, we have so far proposed 3 different approaches:
1. Denoise user feedback by using a re-rating approach (Recsys09)
2. Instead of regular users, take feedback from experts, whom we expect to be less noisy (SIGIR09)
3. Combine ensembles of datasets to identify which works better for each user (IJCAI09)
51. Rate it Again
Increasing Recommendation Accuracy by User re-Rating
Xavier Amatriain (with J.M. Pujol, N. Tintarev, N. Oliver)
Telefonica Research
52. Rate it again

- By asking users to rate items again we can remove noise in the dataset
- Improvements of up to 14% in accuracy!
- Because we don't want all users to re-rate all items, we design ways to do partial denoising:
  - Data-dependent: only denoise extreme ratings
  - User-dependent: detect "noisy" users
53. Algorithm

Given a rating dataset where (some) items have been re-rated, two fairness conditions apply:
1. The algorithm should remove as few ratings as possible (i.e. only when there is some certainty that the rating is only adding noise)
2. The algorithm should not make up new ratings, but decide which of the existing ones are valid
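A minimal rule consistent with the two fairness conditions (my sketch, not the paper's actual algorithm): keep a rating only when its re-rating confirms it, and drop it otherwise rather than inventing a new value.

```python
def denoise(rating, re_rating, max_gap=1):
    """Keep `rating` if confirmed by `re_rating` (within max_gap stars);
    drop it (None) on contradiction -- never fabricate a new rating."""
    if re_rating is None:
        return rating        # no re-rating available: keep as-is
    if abs(rating - re_rating) <= max_gap:
        return rating        # confirmed: the rating stands
    return None              # contradictory: treat as noise and remove
```

Both conditions hold by construction: ratings are only removed on clear contradiction, and the output is always one of the user's own existing ratings.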
54. Algorithm

- One-source re-rating case
- Given the following milding function:
57. The Wisdom of the Few
A Collaborative Filtering Approach Based on Expert Opinions from the Web
Xavier Amatriain (@xamat), Josep M. Pujol, Nuria Oliver (Telefonica Research, Barcelona)
Neal Lathia (UCL, London)
58. Crowds are not always wise

- Collaborative filtering is the preferred approach for Recommender Systems
- Recommendations are drawn from your past behavior and that of similar users in the system
- Standard CF approach:
  - Find your neighbors from the set of other users
  - Recommend things that your neighbors liked and you have not "seen"
- Problem: predictions are based on a large dataset that is sparse and noisy
59. Overview of the Approach

- Expert = individual that we can trust to have produced thoughtful, consistent and reliable evaluations (ratings) of items in a given domain
- Expert-based Collaborative Filtering: find neighbors from a reduced set of experts instead of regular users
  1. Identify domain experts with reliable ratings
  2. For each user, compute "expert neighbors"
  3. Compute recommendations similar to standard kNN CF
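Step 2 might look like this in code (the toy similarity and sample data are assumptions for illustration):

```python
def expert_neighbors(user_ratings, experts, similarity, k=10):
    """Rank the (small) expert set by similarity to the user; keep top k."""
    ranked = sorted(experts.items(),
                    key=lambda kv: similarity(user_ratings, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:k]]

def overlap(a, b):
    """Toy similarity: number of co-rated items with identical rating."""
    return sum(1 for i in a if i in b and a[i] == b[i])

experts = {"critic1": {"m1": 5, "m2": 2},
           "critic2": {"m1": 1, "m2": 2}}
user = {"m1": 5, "m2": 2}
# expert_neighbors(user, experts, overlap, k=1) -> ["critic1"]
```

The key difference from standard kNN CF is the candidate pool: a few hundred experts instead of millions of users, which is what buys the scalability and privacy advantages on the next slide.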
60. Advantages of the Approach

- Noise: experts introduce less natural noise
- Malicious ratings: the dataset can be monitored to avoid shilling
- Data sparsity: a reduced set of domain experts can be motivated to rate items
- Cold-start problem: experts rate items as soon as they are available
- Scalability: the dataset is several orders of magnitude smaller
- Privacy: recommendations can be computed locally
61. Mining the Web for Expert Ratings

- Collections of expert ratings can be obtained almost directly on the web: we crawled the Rotten Tomatoes movie-critics mash-up
- Only those (169) with more than 250 ratings in the Netflix dataset were used
62. Dataset Analysis: Summary

Experts...
- are much less sparse
- rate movies all over the rating scale instead of being biased towards rating only "good" movies (different incentives)
- but they seem to consistently agree on the good movies
- have a lower overall standard deviation per movie: they tend to agree more than regular users
- tend to deviate less from their personal average rating
63. Evaluation Procedure

- Use the 169 experts to predict ratings for 10,000 users sampled from the Netflix dataset
- Prediction MAE using an 80-20 holdout procedure (5-fold cross-validation)
- Top-N precision by classifying items as "recommendable" given a threshold
- Results show Expert CF behaves similarly to standard CF
- But... we have a user study backing up the approach
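For reference, MAE is simply the average absolute gap between predicted and true ratings:

```python
def mae(predicted, actual):
    """Mean Absolute Error over paired (prediction, truth) ratings."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# mae([4, 3, 5], [5, 3, 3]) -> (1 + 0 + 2) / 3 = 1.0
```

Unlike the RMSE used in the Netflix Prize, MAE does not square the errors, so it penalizes large mistakes less heavily.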
64. User Study

- 57 participants, only 14.5 ratings/participant
- 50% of the users consider Expert-based CF to be good or very good
- Expert-based CF: the only algorithm with an average rating over 3 (on a 0-4 scale)
65. Current Work

- Music recommendations (using metacritics.com), mobile geo-located recommendations...
66. Adaptive Data Sources
Collaborative Filtering With Adaptive Information Sources
(ITWP @ IJCAI)
With Neal Lathia (UCL, London)
67. Adaptive data sources

[Diagram: user modeling matches each user to an information source: similarity (like-minded?), trust (friends?), reputation (experts?)]
68. Adaptive Data Sources

- Given: a simple, un-tuned kNN predictor and multiple information sources
- A problem: users are subjective; accuracy varies with the source
- A promise: optimal classification of users to their best source produces incredibly accurate predictions
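The "promise" amounts to a per-user argmin over sources (the source names and error values below are illustrative): route each user to whichever information source scored the lowest error on their held-out ratings.

```python
def best_source(errors_by_source):
    """Pick the information source with the lowest held-out error for
    one user, e.g. {"similarity": mae, "trust": mae, "reputation": mae}."""
    return min(errors_by_source, key=errors_by_source.get)

# best_source({"similarity": 0.81, "trust": 0.74, "reputation": 0.90})
# -> "trust"
```

The open problem is doing this classification without an oracle, i.e. predicting each user's best source from observable features rather than from held-out error.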
70. Conclusions

- For many applications such as Recommender Systems (but also Search, Advertising, and even Networks), understanding data and users is vital
- Algorithms can only be as good as the data they use as input
- The importance of User/Data Mining is going to be a growing trend in many areas in the coming years
71. Thanks!
Questions?

Xavier Amatriain
xar@tid.es
xavier.amatriain.net
technocalifornia.blogspot.com
twitter.com/xamat