The content of this presentation is based on:
Chapters 1, 2 and 4 of the following book: Owen, Anil, Dunning, Friedman. Mahout in Action. Shelter Island, NY: Manning Publications Co., 2012.
The chapter “Discussion of Similarity Metrics” of the following publication: Shanley, Philip. Data Mining Portfolio.
What are product recommendations, and how do they work?
1. CC 2.0 by Horia Varlan | http://flic.kr/p/7vjmof
2. September 1, 2012
• What are Product Recommenders
• Introducing Recommenders
• A Simple Example
• Recommender Evaluation
• How do they work?
• Machine learning tool – Apache Mahout
Namics Conference 2012
Agenda
3. • Spin-off of MeMo News AG, the leading provider for Social Media Monitoring & Analytics in Switzerland
• Big Data expert, focused on Hadoop, HBase and Solr
• Objective: transforming data into insights
Intro
About Sentric
5. • Each day we form opinions about things we like, don’t like, and don’t even care about.
• People tend to like things …
• that similar people like
• that are similar to other things they like
• These patterns can be used to predict such likes and dislikes.
Introducing Recommenders
The Patterns
6. user-based – Look to what people with similar tastes seem to like
Example:
Introducing Recommenders
Strategies for Discovering New Things
7. item-based – Figure out what items are like the ones you already like (again by looking to others’ apparent preferences)
Example:
Introducing Recommenders
Strategies for Discovering New Things
8. content-based – Suggest items based on a particular attribute (again by looking to others’ apparent preferences)
Example:
Introducing Recommenders
Strategies for Discovering New Things
9. Collaborative Filtering – producing recommendations based on, and only based on, knowledge of users’ relationships to items.
[Diagram: Recommenders, split into collaborative filtering (user-based, item-based) and content-based]
Recommendation is all about predicting patterns of taste, and using them to discover new and desirable things you didn’t already know about.
Introducing Recommenders
The Definition of Recommendation
10. CC 2.0 by Will Scullin | http://flic.kr/p/6K9jb8
11. • Let’s start with a simple example
Workflow: Create input data → Create a recommender → Analyse the output
A Simple user-based Example
The Workflow
12. • Recommendations are based on the input data
• Data takes the form of preferences – associations from users to items
Example (user,item,preference):
1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0
User 1 has a preference of 3.0 for item 102.
These values might be ratings on a scale of 1 to 5, where 1 indicates items the user can’t stand, and 5 indicates favorites.
A Simple user-based Example
Input Data
13. • Trend visualization of users’ positive preferences (in petrol)
• All other preferences are recognized as negative – the user doesn’t seem to like the item that much (red, dotted)
[Figure: preference trends of users 1–5 across items 101–107, shown next to the input data from the previous slide]
A Simple user-based Example
Trend Visualization
14. Users 1 and 5 seem to have similar tastes. Both like 101, like 102 a little less, and like 103 less still.
Users 1 and 4 seem to have similar tastes. Both seem to like 101 and 103 identically.
Users 1 and 2 have tastes that seem to run counter to each other.
[Figure: preference trends of users 1–5 across items 101–107]
A Simple user-based Example
Trend Visualization
15. So what product might be recommended to user 1?
[Figure: preference trends of users 1–5 across items 101–107]
Obviously not 101, 102 or 103. User 1 already knows about these.
A Simple user-based Example
Analyzing the Output
16. The output could be: [item:104, value:4.257081]
The recommender engine did so because it estimated user 1’s preference for 104 to be about 4.3, and that was the highest among all the items eligible for recommendation.
Questions:
• Is this the best recommendation for user 1?
• What exactly is a good recommendation?
A Simple user-based Example
Analyzing the Output
18. Goal:
Evaluate how closely the estimated preferences match the actual preferences.
How?
Prepare data set → Split (70 % for training, 30 % for test) → Run with training data → Produce estimated preferences → Compare estimates with test data → Calculate a score
Reasonable score? If not, experiment with other recommenders.
A Simple user-based Example
Evaluating a Recommender
19. Example evaluation output for a particular recommender engine:

             Item 1   Item 2   Item 3
Actual         3.0      5.0      4.0
Estimate       3.5      2.0      5.0
Difference     0.5      3.0      1.0

Average distance = (0.5 + 3.0 + 1.0) / 3 = 1.5
Root-mean-square = √((0.5² + 3.0² + 1.0²) / 3) = 1.8484
Note: a score of 0.0 would mean perfect estimation
A Simple user-based Example
Evaluating a Recommender
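The two scores on this slide can be reproduced in a few lines of plain Java. This is a sketch of the arithmetic only, not of Mahout’s own evaluator classes; the class and method names are made up for illustration.

```java
public class EvaluationScores {

    // Mean absolute difference between actual and estimated preferences.
    public static double averageDistance(double[] actual, double[] estimate) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - estimate[i]);
        }
        return sum / actual.length;
    }

    // Root-mean-square of the differences; penalizes large errors more heavily.
    public static double rootMeanSquare(double[] actual, double[] estimate) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double diff = actual[i] - estimate[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / actual.length);
    }

    public static void main(String[] args) {
        // The three items from the evaluation table above.
        double[] actual = {3.0, 5.0, 4.0};
        double[] estimate = {3.5, 2.0, 5.0};
        System.out.printf("average distance = %.4f%n", averageDistance(actual, estimate));  // 1.5000
        System.out.printf("root-mean-square = %.4f%n", rootMeanSquare(actual, estimate));   // 1.8484
    }
}
```

The root-mean-square score is usually preferred when a few badly wrong estimates should count more than many slightly wrong ones.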
20. CC 2.0 by amtrak_russ | http://flic.kr/p/6fAPej
21. • Mahout …
• Open-source machine learning library from Apache (Java)
• Can be used for large data collections – it’s scalable, built upon Apache Hadoop
• Implements algorithms such as classification, recommenders, clustering
• Incubates a number of techniques and algorithms
• ML is hyped! But …
In a Nutshell
Apache Mahout
22. A Simple Recommender

// imports from org.apache.mahout.cf.taste.* omitted for brevity
class RecommenderExample {
  public static void main(String[] args) throws Exception {
    DataModel model = new FileDataModel(new File("example.csv"));
    UserSimilarity similarity =
        new PearsonCorrelationSimilarity(model);
    UserNeighborhood neighborhood =
        new NearestNUserNeighborhood(2, similarity, model);
    Recommender recommender =
        new GenericUserBasedRecommender(model, neighborhood, similarity);
    List<RecommendedItem> recommendations = recommender.recommend(1, 1);
    for (RecommendedItem recommendation : recommendations) {
      System.out.println(recommendation);
    }
  }
}
A Simple user-based Example
Create a Recommender
23. [Diagram: component interaction – the application talks to the <<interface>> Recommender, which in turn uses the <<interface>> DataModel, <<interface>> UserSimilarity and <<interface>> UserNeighborhood]
A user-based Recommender
Component Interaction
24. NearestNUserNeighborhood vs. ThresholdUserNeighborhood
[Figure: two neighborhood diagrams over users 1–5]
• NearestNUserNeighborhood: a neighborhood around user 1 is chosen to consist of the three most similar users: 5, 4, and 2
• ThresholdUserNeighborhood: defining a neighborhood of most-similar users with a similarity threshold
Algorithms
UserNeighborhood
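The two strategies can be sketched in plain Java, independently of Mahout’s actual implementations. The class and method names here are made up, and the similarity values for users 2–5 are hypothetical, chosen so that the three nearest neighbors of user 1 come out as 5, 4 and 2, matching the diagram.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class NeighborhoodSketch {

    // Nearest-N: the N users most similar to the target, however similar they are.
    public static List<Long> nearestN(Map<Long, Double> similarities, int n) {
        List<Long> users = new ArrayList<>(similarities.keySet());
        users.sort(Comparator.comparingDouble((Long u) -> similarities.get(u)).reversed());
        return users.subList(0, Math.min(n, users.size()));
    }

    // Threshold: every user whose similarity reaches a fixed minimum.
    public static List<Long> threshold(Map<Long, Double> similarities, double min) {
        List<Long> result = new ArrayList<>();
        for (Map.Entry<Long, Double> e : similarities.entrySet()) {
            if (e.getValue() >= min) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical similarities of users 2-5 to user 1.
        Map<Long, Double> sims = new TreeMap<>();
        sims.put(2L, 0.1);
        sims.put(3L, -0.5);
        sims.put(4L, 0.8);
        sims.put(5L, 0.9);
        System.out.println(nearestN(sims, 3));    // [5, 4, 2]
        System.out.println(threshold(sims, 0.7)); // [4, 5]
    }
}
```

The trade-off: a nearest-N neighborhood always has candidates but may include barely-similar users; a threshold neighborhood only admits genuinely similar users but may be empty.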
25. Implementations of this interface define a notion of similarity between two users. Implementations should return values in the range -1.0 to 1.0, with 1.0 representing perfect similarity.
<<interface>> UserSimilarity
Implementations: EuclideanDistanceSimilarity, PearsonCorrelationSimilarity, UncenteredCosineSimilarity, LogLikelihoodSimilarity, TanimotoCoefficientSimilarity, …
Algorithms
User Similarity
26. Similarity between data objects can be represented in a variety of ways:
• Distance between data objects is the sum of the distances of each attribute of the data objects (e.g. Euclidean distance)
• Measuring how the attributes of both data objects change with respect to the variation of the mean value for the attributes (Pearson correlation coefficient)
• Using the word frequencies for each document, the normalized dot product of the frequencies can be used as a measure of similarity (cosine similarity)
• And a few more …
Algorithms
User Similarity
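The third measure, the normalized dot product, can be sketched in a few lines of plain Java. This is the generic cosine formula, not Mahout’s UncenteredCosineSimilarity class; the class name below is made up for illustration.

```java
public class CosineSketch {

    // Normalized dot product of two vectors (preferences or word frequencies):
    // cos(a, b) = (a . b) / (|a| * |b|), which lies in [-1, 1].
    public static double cosine(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Vectors pointing the same way score 1.0; orthogonal vectors score 0.0.
        System.out.println(cosine(new double[]{1, 2, 3}, new double[]{2, 4, 6})); // 1.0
        System.out.println(cosine(new double[]{1, 0}, new double[]{0, 1}));       // 0.0
    }
}
```

Because only the direction of the vectors matters, cosine similarity ignores how strongly a user rates overall and compares only the shape of the preferences.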
27. Similarity between two data objects:
[Plot: users 1–5 placed by their preferences for item 101 (x-axis) and item 102 (y-axis)]
Mathematically & Plot
Euclidean Distance
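The Euclidean distance between two users can be computed directly from their preference vectors; 1 / (1 + d) is then one common way to turn the distance into a similarity score (treat that mapping as an assumption of this sketch, and the class name as made up). The example uses users 1 and 5 with their preferences for items 101 and 102 from the input data.

```java
public class EuclideanSketch {

    // Straight-line distance between two preference vectors.
    public static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // One common mapping from a distance to a similarity in (0, 1]:
    // identical users (d = 0) get 1.0, far-apart users approach 0.
    public static double similarity(double[] a, double[] b) {
        return 1.0 / (1.0 + distance(a, b));
    }

    public static void main(String[] args) {
        // Users 1 and 5 on items 101 and 102: 1 -> (5.0, 3.0), 5 -> (4.0, 3.0).
        double[] user1 = {5.0, 3.0};
        double[] user5 = {4.0, 3.0};
        System.out.println(distance(user1, user5));   // 1.0
        System.out.println(similarity(user1, user5)); // 0.5
    }
}
```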
28. Similarity between two data objects:
[Plot: items 101–104 placed by user 1’s preference (x-axis) and user 5’s preference (y-axis)]
Mathematically & Plot
Pearson Correlation
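The Pearson correlation coefficient can be computed over the items two users have both rated. A plain-Java sketch (class name made up), using users 1 and 5 on their co-rated items 101, 102 and 103 from the input data:

```java
public class PearsonSketch {

    // Pearson correlation: how two users' preferences vary together
    // around their respective means; the result lies in [-1, 1].
    public static double correlation(double[] x, double[] y) {
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;
        double cov = 0.0, varX = 0.0, varY = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // User 1: 101 -> 5.0, 102 -> 3.0, 103 -> 2.5
        // User 5: 101 -> 4.0, 102 -> 3.0, 103 -> 2.0
        double[] user1 = {5.0, 3.0, 2.5};
        double[] user5 = {4.0, 3.0, 2.0};
        System.out.printf("%.4f%n", correlation(user1, user5)); // 0.9449
    }
}
```

The high value reflects what the trend plots already suggested: users 1 and 5 rate the shared items in very much the same pattern.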
29. Questions?
Jean-Pierre König, jean-pierre.koenig@sentric.ch
Namics Conference 2012
Thank you!
30. • References
The content of this presentation is based on:
• Chapters 1, 2 and 4 of the following book: Owen, Anil, Dunning, Friedman. Mahout in Action. Shelter Island, NY: Manning Publications Co., 2012.
• The chapter “Discussion of Similarity Metrics” of the following publication: Shanley, Philip. Data Mining Portfolio.
• Links
http://bitly.com/bundles/jpkoenig/1
A Simple user-based Example
Literature & Links