3. Outline
Motivation
Systems in Action
A Conceptual Framework
User-User Methods
Item-Item Methods
Recent Advances and Open Problems
4. Motivation
User Perspective
Lots of online products, books, movies, etc.
Reduce my choices…please…
Manager Perspective
“If I have 3 million customers on the web, I should have 3 million stores on the web.”
CEO of Amazon.com [SCH01]
5. Example: Recommendation
Customers who bought this book also bought:
•Data Preparation for Data Mining: by Dorian Pyle (Author)
•The Elements of Statistical Learning: by T. Hastie et al.
•Data Mining: Introductory and Advanced Topics: by Margaret H. Dunham
•Mining the Web: Analysis of Hypertext and Semi Structured Data
7. Other Examples
Movielens: movies
Moviecritic: movies again
My launch: music
Gustos starrater: web pages
Jester: jokes
TV Recommender: TV shows
Suggest 1.0: different products
And much more…
8. How Does It Work?
Each user has a profile
Users rate items
Explicitly: a score from 1 to 5
Implicitly: web usage mining
Time spent viewing the item
Navigation path
Etc.
The system does the rest. How?
That is what we will show today
9. Basic Approaches
Collaborative Filtering (CF)
Look at users' collective behavior
Look at the active user's history
Combine!
Content-based Filtering
Recommend items based on keywords
More appropriate for information retrieval
10. Collaborative Filtering: A Framework
Items: I = {i1, i2, ..., ij, ..., in}
Users: U = {u1, u2, ..., ui, ..., um}
Ratings form a sparse m x n matrix R; entry rij is user ui's rating of item ij, and most entries are unknown (rij = ?)
(Figure: a partially filled user-item rating matrix with a few known entries such as 3, 1.5, 2)
The task: learn the unknown function f: U x I → R
Q1: Find the unknown ratings
Q2: Which items should we recommend to this user?
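To make the framework concrete, here is a minimal sketch of such a rating matrix in Python with NumPy (the matrix size and values are invented for illustration); the later sketches in this deck reuse this NaN-for-unknown convention:

```python
import numpy as np

# Rows are users u1..u4, columns are items i1..i5; np.nan marks rij = ?
R = np.array([
    [3.0,    1.5,    np.nan, np.nan, 2.0],
    [np.nan, 2.0,    4.0,    np.nan, np.nan],
    [5.0,    np.nan, 4.5,    1.0,    np.nan],
    [np.nan, np.nan, np.nan, 3.0,    4.0],
])

known = ~np.isnan(R)  # mask of observed ratings
print(f"{known.sum()} of {R.size} ratings are known")
```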
12. User-User Similarity: Intuition
(Figure: a target customer surrounded by other customers with overlapping tastes)
Q1: How to measure similarity?
Q2: How to select neighbors?
Q3: How to combine?
13. How to Measure Similarity?
Pearson correlation coefficient, computed over the items j that users a and i have commonly rated:

$$w_p(a,i) = \frac{\sum_{j \in \text{commonly rated}} (r_{aj} - \bar{r}_a)(r_{ij} - \bar{r}_i)}{\sqrt{\sum_{j \in \text{commonly rated}} (r_{aj} - \bar{r}_a)^2}\,\sqrt{\sum_{j \in \text{commonly rated}} (r_{ij} - \bar{r}_i)^2}}$$

Cosine measure: users are vectors in product-dimension space:

$$w_c(a,i) = \frac{\vec{r}_a \cdot \vec{r}_i}{\|\vec{r}_a\|_2 \, \|\vec{r}_i\|_2}$$
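A minimal sketch of both measures, assuming NumPy and the NaN-for-unknown matrix convention from the framework slide (the function names are mine):

```python
import numpy as np

def pearson_sim(ra, ri):
    """Pearson correlation between two users' rating rows (np.nan = unrated),
    computed only over the commonly rated items."""
    common = ~np.isnan(ra) & ~np.isnan(ri)
    if common.sum() < 2:
        return 0.0  # too few common items to correlate
    x, y = ra[common], ri[common]
    dx, dy = x - x.mean(), y - y.mean()
    denom = np.sqrt((dx ** 2).sum() * (dy ** 2).sum())
    return float(dx @ dy / denom) if denom > 0 else 0.0

def cosine_sim(ra, ri):
    """Cosine of the angle between two rating vectors, treating
    unrated items as zeros."""
    x, y = np.nan_to_num(ra), np.nan_to_num(ri)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else 0.0
```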
14. Nearest Neighbor Approaches [SAR00a]
Offline phase:
Do nothing... just store transactions
Online phase:
Identify users highly similar to the active one
The best k ones
All with a similarity greater than a threshold
Prediction:

$$r_{aj} = \bar{r}_a + \frac{\sum_i w(a,i)\,(r_{ij} - \bar{r}_i)}{\sum_i |w(a,i)|}$$

where $\bar{r}_a$ is user a's neutral (average) rating, $r_{ij} - \bar{r}_i$ is neighbor i's deviation from their own average, and the weighted sum is user a's estimated deviation.
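A sketch of the online phase, reusing the pearson_sim helper from the previous slide's sketch; keeping the best k neighbors by absolute similarity is one of the two selection rules above:

```python
import numpy as np

def predict_knn(R, a, j, k=3):
    """Predict user a's rating of item j as a's mean plus the
    similarity-weighted average of the neighbors' deviations."""
    ra_mean = np.nanmean(R[a])
    scored = []
    for i in range(R.shape[0]):
        if i == a or np.isnan(R[i, j]):
            continue  # a neighbor must have rated item j
        w = pearson_sim(R[a], R[i])
        scored.append((w, R[i, j] - np.nanmean(R[i])))
    scored.sort(key=lambda t: abs(t[0]), reverse=True)  # keep the best k
    top = scored[:k]
    denom = sum(abs(w) for w, _ in top)
    if denom == 0:
        return ra_mean  # no informative neighbors: fall back to a's mean
    return ra_mean + sum(w * dev for w, dev in top) / denom
```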
15. Horting Method [AGG99]
k-NN is not transitive
Horting takes advantage of transitivity
Uses a new similarity measure: predictability
User i predicts user a if:
They have rated sufficiently many common items
There is an error-bounded linear transformation from user i's ratings to user a's
16. How Does Horting Work?
Offline phase: build the neighborhood graph
Online phase: compute raj for the active user ua
1. Identify users who predict ua
2. Identify users who rated item j
3. Find shortest paths from group 1 to group 2
4. Propagate ratings backward along the paths and average
Better for sparse environments
Not well evaluated
17. Clustering [BRE98]
Offline phase:
Build clusters: k-means, k-medoids, etc.
Online phase:
Identify the cluster nearest to the active user
Prediction, two options:
Use the center of the cluster (faster)
Weighted average over cluster members, with weights that depend on the active user (slower but a little more accurate)
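A sketch of the faster variant, assuming the centroids were produced offline by any clustering routine over mean-filled rating rows (the mean-filling convention is an assumption of mine, not something the slide prescribes):

```python
import numpy as np

def cluster_predict(R, a, j, centroids):
    """Online phase of the faster variant: snap the active user onto the
    nearest cluster center and read the prediction off that center."""
    row = np.where(np.isnan(R[a]), np.nanmean(R[a]), R[a])  # fill unknowns
    nearest = int(np.argmin(((centroids - row) ** 2).sum(axis=1)))
    return centroids[nearest, j]
```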
18. Clustering vs. k-NN Approaches
k-NN using the Pearson measure is slower but more accurate
Clustering is more scalable
(Figure: an active user near a cluster boundary gets bad recommendations)
We can use soft clustering, but then we lose the computational edge
19. Did We Answer the Questions?
(Figure: the target customer and the three questions again)
Q1: How to measure similarity?
Q2: How to select neighbors?
Q3: How to combine?
20. Are We Done?
Q1: How to measure similarity? Done... Really??
What about sparsity? Not enough common items implies spurious neighbors and hence bad recommendations
Sparsity results from the poor representation! Example from [SAR00P]:
U1 rates recycled letter pads high
U2 rates recycled memo pads high
Both of them like recycled office products
They are similar, but the math won't capture that: the Pearson sum runs only over commonly rated items, and here there are none
By working at the right level of abstraction we can eliminate sparsity
21. The Power of Representation [UNG98]
(Figure: movies grouped into categories such as Action, Foreign, and Classic)
Q1-B: How can we formalize this intuition?
22. How to Abstract?
Semi-manual Methods
Use product features
Cluster products first, then cluster users
Works only if we have descriptive features
Automatic Methods
Adjusted Product Taxonomy
Latent Semantic Indexing
23. Adjusted Product Taxonomy [CHO04]
Input: a product taxonomy
Output: a modified taxonomy with an even distribution of transactions across categories
24. Adjusted Product Taxonomy (2)
(Figure: number of transactions per category, using the original taxonomy vs. the adjusted taxonomy; the adjusted taxonomy evens out the distribution)
25. Latent Semantic Indexing [SAR00b]
Truncated singular value decomposition of the m x n rating matrix:

$$R \approx R_k = U_k\, S_k\, I_k' \qquad (m \times k)\,(k \times k)\,(k \times n)$$

The reconstructed matrix $R_k = U_k S_k I_k'$ is the closest rank-k matrix to the original matrix R
Captures latent associations
The reduced space is less noisy
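A minimal sketch with NumPy. SVD needs a complete matrix, so this assumes the unknown entries were already imputed somehow (e.g., with item or user means):

```python
import numpy as np

def rank_k_reconstruction(R_filled, k):
    """Rk = Uk Sk Ik': the closest rank-k matrix to R_filled in the
    Frobenius-norm sense; its entries serve as smoothed ratings."""
    U, s, Vt = np.linalg.svd(R_filled, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```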
26. Are We Done? (2)
Q2: How to select neighbors? Not adequately answered
We don't expect to use the same neighbors for all products
Neighbors should be product-category specific
Q2-B: How can we determine whether or not a user is relevant to a given product?
27. Selecting Relevant Instances [YU01]
(Figure: a small rating table; the task is to predict the active user's Batman rating)
Superman and Batman are correlated
Titanic and Batman are negatively correlated
"Dances with Wolves" has nothing to do with Batman's rating
Karen is not a good instance to consider
How can we formalize this? Mutual information:
MI(X;Y) = H(X) − H(X|Y)
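A sketch of the MI computation from a joint contingency table, using the equivalent form MI(X;Y) = Σ p(x,y) log(p(x,y)/(p(x)p(y))); the 2x2 co-rating counts are invented, standing in for two items' (like, dislike) rating variables:

```python
import numpy as np

def mutual_information(joint_counts):
    """MI(X;Y) = H(X) - H(X|Y), computed from a table of joint counts."""
    p = joint_counts / joint_counts.sum()
    px = p.sum(axis=1, keepdims=True)   # marginal of X (rows)
    py = p.sum(axis=0, keepdims=True)   # marginal of Y (columns)
    nz = p > 0                          # 0 * log 0 = 0 by convention
    return float((p[nz] * np.log2(p[nz] / (px @ py)[nz])).sum())

# Hypothetical co-rating counts over (like, dislike) x (like, dislike)
counts = np.array([[30.0, 5.0],
                   [4.0, 21.0]])
print(mutual_information(counts))       # high MI: one item predicts the other
```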
28. Selecting Relevant Instances (2)
Offline phase:
Estimate mutual information between items
For each item:
Find users who rated it
Compute their strength (how many relevant items
they also rated)
Retain a subset of them (10% works fine)
Online phase:
To predict the target item’s rating, run k-NN on
its reduced instance space
Better results with less data... quality, not quantity, is what matters
29. Are We Done? (3)
Q3: How to combine?
Weighted average
Discover association rules in neighbors’ transactions
[LEE01, WAN04]
For every x in this group:
like(x, Item1) ∧ like(x, Item2) → like(x, Item3)
Use confidence and support to judge the quality of the
prediction
Prediction is done on the binary level (like, dislike)
Costly to run online
30. User-User Methods Evaluation
Achieve good quality in practice
The more processing we push offline, the better the method scales
However:
User preferences are dynamic
Offline-calculated information needs frequent updates
No recommendations for new users
We don't know much about them yet
32. Item-Item Similarity: The Intuition
Search for similarities among items
All computations can be done offline
Item-item similarity is more stable than user-user similarity
No need for frequent updates
First Order Models
Correlation Analysis
Linear Regression
Higher Order Models
Belief Network
Association Rule Mining
33. Correlation-based Methods [SAR01]
Same as in user-user similarity, but on item vectors: look at the users who rated both items
Pearson correlation coefficient between items i and j:

$$s_{ij} = \frac{\sum_{u \in \text{rated both}} (r_{ui} - \bar{r}_i)(r_{uj} - \bar{r}_j)}{\sqrt{\sum_{u \in \text{rated both}} (r_{ui} - \bar{r}_i)^2}\,\sqrt{\sum_{u \in \text{rated both}} (r_{uj} - \bar{r}_j)^2}}$$

(Figure: two item columns i and j of the user-item matrix, compared over the users who rated both)
34. Correlation-based Methods (2)
Offline phase:
Calculate n(n-1) similarity measures
For each item:
Determine its k most similar items
Online phase:
Predict the rating for a given user-item pair as a weighted sum over the similar items the user has rated:

$$r_{aj} = \frac{\sum_{i \in \text{similar items}} s_{ij}\, r_{ai}}{\sum_{i \in \text{similar items}} |s_{ij}|}$$

(Figure: user ua's row with known ratings 2, 3, 4 and the unknown rating ? of target item j)
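A sketch of the online phase, assuming the item-item similarity matrix S was computed offline as on the previous slide:

```python
import numpy as np

def predict_item_item(R, S, a, j, k=3):
    """Weighted sum over the k items most similar to j that user a rated."""
    rated = [i for i in range(R.shape[1]) if i != j and not np.isnan(R[a, i])]
    rated.sort(key=lambda i: abs(S[i, j]), reverse=True)
    top = rated[:k]                       # the k most similar rated items
    denom = sum(abs(S[i, j]) for i in top)
    if denom == 0:
        return np.nanmean(R[a])           # fall back to the user's mean
    return sum(S[i, j] * R[a, i] for i in top) / denom
```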
35. Regression Based Methods [VUC00]
Offline phase:
Fit n(n-1) linear regressions
f_ij(x) is a linear transformation of a user's rating on item i to their rating on item j
Online phase:
Same as the previous method
The weights w_ij are inversely proportional to the regression error rates:

$$r_{aj} = \frac{\sum_{i \,\in\, \text{items rated by } a} w_{ij}\, f_{ij}(r_{ai})}{\sum_{i \,\in\, \text{items rated by } a} w_{ij}}$$
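A sketch of the offline fit for one item pair (it needs at least two users who rated both items; the epsilon and the exact weight-from-error rule are my choices, since the slide only says "inversely proportional"):

```python
import numpy as np

def fit_item_regression(R, i, j):
    """Offline: fit r_uj ~ slope * r_ui + intercept over the users who rated
    both items; return the transformation f_ij and its weight w_ij."""
    both = ~np.isnan(R[:, i]) & ~np.isnan(R[:, j])
    x, y = R[both, i], R[both, j]
    slope, intercept = np.polyfit(x, y, 1)      # least-squares line
    mse = float(((slope * x + intercept - y) ** 2).mean())
    w = 1.0 / (mse + 1e-6)                      # epsilon avoids division by 0
    return (lambda r: slope * r + intercept), w
```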
36. Higher Order Models
Previous approaches rely on a naïve-Bayes-style independence assumption:
The effects of individual items on a given one are independent
Not always true
Higher order models can do better
Belief Network
Association Rule Mining
37. Bayesian Belief Network: Introduction
Bayesian belief network allows a subset of the variables to
be conditionally independent
A graphical model of causal relationships
Represents dependency among the variables
Gives a specification of joint probability distribution
Nodes: random variables
Links: dependency
(Figure: X and Y are the parents of Z, and Y is the parent of P; there is no direct dependency between Z and P)
The graph has no loops or cycles
38. Bayesian Belief Network: An Example
(Figure: a network over FamilyHistory (FH), Smoker (S), LungCancer (LC), Emphysema, PositiveXRay, and Dyspnea)
The conditional probability table for the variable LungCancer shows the conditional probability for each possible combination of its parents:

        (FH, S)   (FH, ~S)   (~FH, S)   (~FH, ~S)
LC        0.8       0.5        0.7        0.1
~LC       0.2       0.5        0.3        0.9

The network encodes the joint distribution as

$$P(z_1, \ldots, z_n) = \prod_{i=1}^{n} P(z_i \mid \mathrm{Parents}(Z_i))$$
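A minimal sketch of this factorization for the FH/S/LC fragment. The LC table is taken from the slide; the priors for FH and S are invented purely for illustration:

```python
P_FH = {True: 0.1, False: 0.9}   # hypothetical prior, not from the slide
P_S = {True: 0.3, False: 0.7}    # hypothetical prior, not from the slide
P_LC = {  # P(LC = True | FH, S), from the slide's CPT
    (True, True): 0.8, (True, False): 0.5,
    (False, True): 0.7, (False, False): 0.1,
}

def joint(fh, s, lc):
    """P(FH, S, LC) = P(FH) * P(S) * P(LC | FH, S)."""
    p_lc_true = P_LC[(fh, s)]
    return P_FH[fh] * P_S[s] * (p_lc_true if lc else 1.0 - p_lc_true)

print(joint(fh=True, s=True, lc=True))   # 0.1 * 0.3 * 0.8 = 0.024
```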
39. Belief Network for CF [BRE98]
Every item is a node
Binary rating (like, dislike)
Learn a belief network offline over the training data
The CPT at each node is represented as a decision tree
Use greedy algorithms to determine the best network
structure
Use probabilistic inference for online prediction
40. Belief Network for CF: An Example
(Figure: the CPT for the random variable "Melrose Place" (M.P.) in the movie domain, represented as a probability decision tree with parent variables such as "Friends" and "B.H.")
41. Association Rule Mining
Offline processing
Work on the binary level (like, dislike)
View each user as a market basket containing the items liked by that user
Discover association rules between items
Online processing:
Match the items the active user likes against the rules' left-hand sides
Recommend the rules' consequents, judged by support and confidence
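A sketch of the online matching step, assuming the rules were mined offline and stored as (left-hand side, consequent, support, confidence) tuples; all item names here are invented:

```python
# Hypothetical offline output: (antecedent, consequent, support, confidence)
rules = [
    (frozenset({"item1", "item2"}), "item3", 0.12, 0.85),
    (frozenset({"item2"}), "item4", 0.20, 0.70),
]

def recommend(liked, rules):
    """Fire every rule whose left-hand side is contained in the user's liked
    set, then rank consequents by confidence, breaking ties by support."""
    fired = [(cons, sup, conf) for lhs, cons, sup, conf in rules
             if lhs <= liked and cons not in liked]
    fired.sort(key=lambda t: (t[2], t[1]), reverse=True)
    return [cons for cons, _, _ in fired]

print(recommend({"item1", "item2"}, rules))   # ['item3', 'item4']
```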
42. Association Rule Mining: Problems
A high support threshold leads to low coverage and may eliminate important but infrequent items from consideration
A low support threshold results in a very large model, a computationally expensive offline pattern-discovery phase, and a slower online matching phase
Solution:
Adaptive Association Rule Mining
43. Adaptive Association Rule Mining [LIN01]
Given:
a transaction dataset
a target item
a desired range for the number of rules
a specified minimum confidence (minConfidence)
Find: a set S of association rules for the target item such that
the number of rules in S is in the given range
the rules in S satisfy the minimum confidence constraint
the rules in S have higher support than rules not in S that satisfy the above constraints (minSupport is thus determined adaptively)
44. Adaptive Association Rule Mining (2)
Discover rules with a single item in the head:
like(x, item1) ∧ like(x, item2) → like(x, target)
The miner discovers association rules iteratively (for each target item) until the desired number of rules is extracted
Support is adjusted per-item
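A sketch of the adaptive idea; the tiny single-antecedent miner and the halving/doubling schedule are simplifications of mine ([LIN01] searches the support threshold more carefully):

```python
def mine_rules(transactions, target, min_sup, min_conf):
    """Toy miner for single-antecedent rules {i} -> target."""
    n = len(transactions)
    items = {i for t in transactions for i in t if i != target}
    rules = []
    for i in items:
        sup_i = sum(1 for t in transactions if i in t) / n
        sup_both = sum(1 for t in transactions if i in t and target in t) / n
        if sup_i > 0 and sup_both >= min_sup and sup_both / sup_i >= min_conf:
            rules.append((i, target, sup_both, sup_both / sup_i))
    return rules

def adaptive_mine(transactions, target, lo, hi, min_conf,
                  support=0.5, max_iter=20):
    """Re-mine with an adjusted support threshold until the number of rules
    for the target item falls inside the desired range [lo, hi]."""
    rules = []
    for _ in range(max_iter):
        rules = mine_rules(transactions, target, support, min_conf)
        if lo <= len(rules) <= hi:
            break
        support *= 0.5 if len(rules) < lo else 2.0  # relax or tighten
    return rules
```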
45. Item-Item Methods: Why Do They Work?
(Figure: the rule like(x, Book1) ∧ like(x, Book2) → like(x, Book3) is supported by a "book gang" of users, while like(x, Movie1) → like(x, Movie2) is supported by a "movie gang")
We use the right neighbors for each item, without discovering the groups themselves, thus eliminating costly online matching
In general, better quality and better response time than user-user methods [LIN03]
46. Recent Work and Open Problems
Order-based methods
Ordering items is more informative than rating them
[KAM03] developed k-o'means to work on orders
Preference-based methods
Total ordering of items is not feasible
Work on partial orders (preferences) [COH99]
Integrating background knowledge
User demographic information, item features, etc.
Modeling time
Sequential patterns
47. References (1)
Charu C. Aggarwal, Joel L. Wolf, Kun-Lung Wu, Philip S. Yu: Horting Hatches
an Egg: A New Graph-Theoretic Approach to Collaborative Filtering. KDD
1999: 201-212
J. Breese, D. Heckerman, C. Kadie: Empirical Analysis of Predictive Algorithms for Collaborative Filtering. In Proc. 14th Conf. on Uncertainty in Artificial Intelligence, Madison, WI, July 1998
Yoon Ho Cho and Jae Kyeong Kim: Application of Web usage mining and product taxonomy to collaborative recommendations in e-commerce. Expert Systems with Applications, 26(2), 2004
William W. Cohen, Robert E. Schapire, and Yoram Singer: Learning to order things. In Advances in Neural Information Processing Systems 10, Denver, CO, 1997
Jiawei Han, Fall 2003 online course notes available at:
http://www-courses.cs.uiuc.edu/~cs397han/slides/05.ppt
Toshihiro Kamishima: Nantonac collaborative filtering: recommendation
based on order responses. KDD 2003: 583-588
Lee, C.-H., Kim, Y.-H., Rhee, P.-K.: Web personalization expert with combining collaborative filtering and association rule mining technique. Expert Systems with Applications, 21(3), pp. 131-137, October 2001
48. References (2)
W. Lin, 2001P, online presentation available at: http://www.wiwi.hu-berlin.de/~myra/WEBKDD2000/WEBKDD2000_ARCHIVE/LinAlvarezRuiz_WebKDD2000.ppt
Weiyang Lin, Sergio A. Alvarez, and Carolina Ruiz. Efficient adaptive-support
association rule mining for recommender systems. Data Mining and
Knowledge Discovery, 6:83--105, 2002
G. Linden, B. Smith, and J. York: "Amazon.com Recommendations: Item-to-Item Collaborative Filtering". IEEE Internet Computing, Vol. 7, No. 1, pp. 76-80, Jan. 2003
Badrul M. Sarwar, George Karypis, Joseph A. Konstan, John Riedl: Analysis
of recommendation algorithms for e-commerce. ACM Conf. Electronic
Commerce 2000: 158-167
B. Sarwar, G. Karypis, J. Konstan, and J. Riedl: Application of dimensionality
reduction in recommender systems--a case study. In ACM WebKDD 2000
Web Mining for E-Commerce Workshop, 2000.
B. M. Sarwar, G. Karypis, J. A. Konstan, and J. Riedl. Item-based
collaborative filtering recommendation algorithms. WWW’01
49. References (3)
B. Sarwar, 2000P, online presentation available at: http://www.wiwi.hu-berlin.de/~myra/WEBKDD2000/WEBKDD2000_ARCHIVE/badrul.ppt
J. Ben Schafer, Joseph A. Konstan, John Riedl: E-Commerce
Recommendation Applications. Data Mining and Knowledge Discovery 5(1/2):
115-153, 2001
L.H. Ungar and D.P. Foster: Clustering Methods for Collaborative Filtering,
AAAI Workshop on Recommendation Systems, 1998.
Yi-Fan Wang, Yu-Liang Chuang, Mei-Hua Hsu and Huan-Chao Keh: A personalized recommender system for the cosmetic business. Expert Systems with Applications, 26(3), pp. 427-434, April 2004
S. Vucetic and Z. Obradovic. A regression-based approach for scaling-up
personalized recommender systems in e-commerce. In ACM WebKDD 2000
Web Mining for E-Commerce Workshop, 2000.
Kai Yu, Xiaowei Xu, Martin Ester, and Hans-Peter Kriegel: Selecting relevant
instances for efficient accurate collaborative filtering. In Proceedings of the
10th CIKM, pages 239--246. ACM Press, 2001.
ChengXiang Zhai, Spring 2003 online course notes available at:
http://sifaka.cs.uiuc.edu/course/2003-497CXZ/loc/cf.ppt