PhD defense

Mathematical Methods
of Tensor Factorization
Applied to Recommender Systems
Dott. Giuseppe Ricci
Scuola di Dottorato in Informatica
XXVI Ciclo
PhD Defense – 26 May 2014
Semantic Web Access and Personalization research group
http://www.di.uniba.it/~swap
Dipartimento di Informatica
1

Outline
 Motivations and Contributions
 Information Overload & Recommender
Systems
 Matrix and Tensor Factorization in RS
literature
 Proposed solutions
 Experimental Evaluation
 Summary and Future Work
2

Motivations and Contributions 1/2
 Matrix Factorization (MF) techniques have proved to be
a quite promising solution to the problem of designing
efficient filtering algorithms in the Big Data Era.
 Several challenges in Recommender Systems (RS)
research area:
 missing values: data sparsity
 incorporating contextual information: CARS
 context relevance (weighting) in CARS.
This work focuses on CARS
Objective: to propose new methods to understand which
contextual information is relevant, and use this information to
improve the quality of the recommendations.
3

Motivations and Contributions 2/2
 Matrix and Tensor Factorization literature review.
 CP-WOPT algorithm → solution for sparsity of RS data.
 CARS and context-weighting:
 2 proposed solutions to introduce only relevant contextual information in the recommendation process
 empirical evaluation of the 2 solutions.
4

Information Overload
&
Recommender Systems
5

Information Overload
Source: www.go-globe.com
A surplus of content compared with the user’s ability to find relevant information → the result is that decisions are either made late or made wrongly.
The term “Information Overload” was used by the futurologist Alvin Toffler in 1970, when he predicted that the rapidly increasing amounts of information being produced would eventually cause people problems.
6

Recommender Systems 1/2
 Recommender Systems (RS) represent a response to the
problem of Information Overload and are now a widely recognized
field of research [Ricci].
 RS fall in the area of information filtering. With the growing amount
of information available on the web, a very sensitive issue is to
develop methods that can effectively and efficiently handle large
amounts of data.
 Mathematical methods have recently proved useful in dealing with this problem in the context of RS.
 The search for methods more effective and efficient than those known in the literature is also driven by industrial research interest in this field, as evidenced by the Netflix Prize competition.
[Ricci] Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor,
editors. Recommender Systems Handbook. Springer, 2011.
7

Recommender Systems 2/2
 Ratings are usually stored in a matrix called the user-item matrix or rating matrix.
 RS compute a rating estimate for each item/product not yet purchased/tried → suggestion list of the items with the highest estimated ratings.
8

Examples of RS
Applications:
• e-commerce
• advertising
• e-mail filtering
• social network
……
9

Basics of Recommender Systems
10

Recommender Systems: definitions
 The area of RSs is relatively new → mid-1990s.
 Concept: tools and techniques able to provide personalized access to large collections of structured and unstructured data, and to provide users with advice about items they might be interested in.
Some definitions:
 [Olsson]: “RS is a system that helps a user to select a
suitable item among a set of selectable items using a
knowledge-base that can be hand-coded by experts or
learned from recommendations generated by the users”.
 [Burke]: “RS have the effect of guiding the user in a
personalized way to interesting or useful objects in a large
space of possible options”.
[Olsson] Tomas Olsson. Bootstrapping and Decentralizing Recommender Systems . PhD thesis,
Department of Information Technology, Uppsala University and SICS, 2003.
[Burke] R. Burke. Hybrid Recommender Systems: Survey and Experiments. User
Modeling and User-Adapted Interaction, 12(4):331–370, 2002.
11

RS Classification [Burke]
[Burke] Robin Burke. Hybrid recommender systems: Survey and experiments.
User Modeling and User-Adapted Interaction , 12(4):331–370, 2002.
Context Aware
Recommender Systems (CARS)
12

Content-Based RS (CBRS)
 Assumption: user preferences remain stable over time.
 They suggest items similar to those previously labeled as
relevant by the target user.
 Based on the analysis and exploitation of textual content → each item to be recommended has to be described by means of textual features.
 Needs 2 pieces of information: a textual description of the
item and a user profile describing user interests in terms of
textual features.
13

Collaborative Filtering RS
 Assumption: users who shared similar tastes in the past will have similar tastes in the future as well → nearest neighbors.
 Rely on a matrix where each user is mapped onto a row and each item is represented by a column → user/item or rating matrix.
 A recent trend is to exploit matrix factorization methods → a common technique applied in CFRS is Singular Value Decomposition (SVD).
14

Hybrid Recommender Systems
 Combining 2 or more classes of algorithms in order to
emphasize their strengths and to level out their
corresponding weaknesses.
 For example, a collaborative system and a content-based system might be combined to compensate for the new user problem, providing recommendations to users whose profiles are too poor to trigger the collaborative recommendation process.
 Burke proposed an analytical classification of hybrid
systems, listing a number of hybridization methods to
combine pairs of recommender algorithms. In [Burke] 7
different hybridization techniques are introduced.
[Burke] Robin Burke. Hybrid Web Recommender Systems. In The Adaptive Web, pages 377–408. Springer-Verlag, Berlin, Heidelberg, 2007.
15

Context
What is the context?
 One of the most cited definitions of context is that of Dey [Dey], who defines context as:
“Any information that can be used to characterize the situation of an entity. An entity is a person, place, or object that is considered relevant to the interaction between a user and an application, including the user and the applications themselves”.
Bazire and Brézillon [Bazire] examined and compared some 150 different definitions of context from a number of different fields and concluded that the multifaceted nature of the concept makes it difficult to find a unifying definition.
Li et al. [Li] define 5 context dimensions: who (user), what (object), how (activities), where (location) and when (time).
[Dey] Anind K. Dey. Understanding and using context. Personal Ubiquitous Comput.,5(1):4–7, 2001.
[Bazire] Mary Bazire and Patrick Brézillon. Understanding context before using it. In Proceedings
of the 5th International Conference on Modeling and Using Context ,CONTEXT’05, pages 29–40,
Berlin, Heidelberg, 2005. Springer-Verlag.
[Li] Luyi Li, Yanlin Zheng, Hiroaki Ogata, and Yoneo Yano. A framework of ubiquitous learning
environment. In CIT , pages 345–350. IEEE Computer Society, 2004.
16

Context Aware RS (CARS)
 Context-Aware Recommender Systems (CARS) take into account contextual factors, such as available time, location, people nearby, etc., that identify the context in which the product is tried.
 We suppose these factors may have a structure:
 for example, "location" may be defined in terms of home, public place, theatre, cinema, etc.
17

Context Aware RS
Challenges of a CARS are:
 relevance of contextual factors: it is important to decide which contextual variables are relevant in the recommendation process;
 availability of contextual information: relevant contextual factors can be considered as part of the data collection, but such historical contextual information is often not available when designing the system;
 extraction of contextual information from the user’s activities: these data need to be recorded;
 evaluation and lack of publicly available datasets.
18

Context Aware RS
 CARS incorporate user and item information as well as other types of data such as context, using these to infer unknown ratings:
f: Users x Items x Contexts → Rating
 CARS deal with a quadruple input:
<user, item, context, rating>
where the recommender records the user’s preference for the selected item together with the context information describing the situation in which the product is consumed.
19

Paradigms to incorporate context
Pre-filtering: in a movie RS, if a user wants to see a film on a day during the holidays, only the ratings assigned during holidays are used.
Post-filtering
Contextual Modeling: contextual information is used, in addition to the user and item data, directly in the estimation of the ratings, through a multidimensional function or heuristic calculations.
20

Context Weighting
 It is not always simple to establish which contextual information is important for a specific purpose.
 Many parameters - in different manners.
Not all the acquired contextual information is important for the recommendation process: some contextual variables can introduce noise → degrade the quality of suggestions.
 For each user: which contextual information is helpful for giving more precise and reliable recommendations?
 PROBLEM: users may rate items in different contexts, but it is not guaranteed that we can find dense contextual ratings under the same context, i.e. there may be very few users who have rated the items in the same contexts.
 Solutions: 2 branches: Context Selection (survey) and Context Relaxation (binary selection).
21

Matrix
Factorization in RS literature
22

Background
 With the ever-increasing amount of information available, the challenge of implementing personalized filters has become the challenge of designing algorithms able to manage huge amounts of data for the elicitation of user needs and preferences.
 Matrix Factorization techniques have proved to be a quite promising solution.
 MF techniques fall into the class of CF methods and, in particular, into the class of latent factor models → similarity between users and items is induced by some factors hidden in the data.
 We will focus our attention on Singular Value
Decomposition (SVD).
23

Basics of MF
 U: set of users
 D: set of items
 R: the matrix of ratings.
 MF aims to factorize R into two matrices P and Q such that their product approximates R:
$R \approx P\,Q^{T}$
A factorization used in RS literature is the Singular Value Decomposition (SVD) variant introduced by Simon Funk during the Netflix Prize.
SVD-objective: reducing the dimensionality, i.e. the rank, of the user-item
matrix, in order to capture latent relationships between users and items.
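The factorization idea above can be sketched in a few lines of Python. This is only an illustration in the spirit of Funk-style training, not the code used in this work: the function names `train_mf` and `rmse`, the learning rate, the regularization weight and the toy ratings are all assumptions introduced here.

```python
import random

def train_mf(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=500, seed=0):
    """Fit P (n_users x k) and Q (n_items x k) so that dot(P[u], Q[i]) ~ r.

    Only known ratings are visited, so missing cells never enter the error.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Stochastic gradient step on the squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(ratings, P, Q):
    """Root mean squared error of the factor model on the given ratings."""
    se = sum((r - sum(p * q for p, q in zip(P[u], Q[i]))) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5

# Hypothetical toy data: (user, item, rating) triples; unlisted pairs are missing.
ratings = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
P0, Q0 = train_mf(ratings, 3, 3, epochs=0)   # untrained factors, same seed
P, Q = train_mf(ratings, 3, 3)
print(rmse(ratings, P0, Q0), rmse(ratings, P, Q))
```

The rank k plays the role of the reduced dimensionality: each user and each item is described by k latent factors.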
24

SVD in RS Literature 1/2
 Sarwar:
 SVD based algorithm
 Low-rank approximation: retaining only k << r singular values (the
biggest) by discarding other entries.
 Koren:
 SVD based algorithm (Asymmetric-SVD, SVD++)
 Explicit and implicit feedback
 Baseline estimates.
 Julià:
 Alternation Algorithm
 An alternative to SVD
 The aim is the same as the one of SVD
 Alternation makes it possible to deal with missing values.
25
user-factors vector pu
item-factors vector qi

SVD in RS Literature 2/2
Advantages:
 limited computational cost and good-quality recommendations (Sarwar)
 good algorithms and high accuracy (Koren)
 the Alternation Algorithm deals with missing values and has acceptable computational requirements (Julià).
Problems:
 technique not applicable to frequently updated databases (Sarwar)
 models are not justified by a formal model (previous ratings are not explained) (Koren)
 requires r known values in each row/column (Julià).
[Sarwar] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Incremental Singular Value Decomposition Algorithms for Highly Scalable Recommender Systems. 5th International Conference on Computer and Information Technology (ICCIT), 2002.
[Koren] Yehuda Koren. Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model. ACM Int. Conference on Knowledge Discovery and Data Mining (KDD'08), 2008.
[Julià] Carme Julià, Angel D. Sappa, Felipe Lumbreras, Joan Serrat, and Antonio López. Predicting Missing Ratings in Recommender Systems: Adapted Factorization Approach. International Journal of Electronic Commerce, 2009.
26

Summary
• We analyzed MF techniques
• We focused our attention on SVD techniques
• The main limitations of MF techniques:
• they take into account only the standard profile of the users
• they do not allow the integration of further information such as the context.
27

Matrix 2 Tensor
 Matrices and MF cannot be used in a CARS based on the contextual modeling paradigm:
 context information is used in the recommendation process, and matrices are not adequate for this purpose.
 We need to introduce tensors.
[Figure: 3-order tensor with axes users, items and contexts, storing <user, item, context, rating>]
28
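As a small illustration of the step from a rating matrix to a tensor, the sketch below arranges <user, item, context, rating> quadruples in a users x items x contexts array. The function name `build_tensor`, the integer indices and the toy records are assumptions made for this example; 0 marks a missing rating, following the convention stated for the CP-WOPT input.

```python
def build_tensor(records, n_users, n_items, n_contexts):
    """Arrange <user, item, context, rating> quadruples in a 3-order tensor.

    The tensor is a nested list X[user][item][context]; 0.0 marks a missing rating.
    """
    X = [[[0.0] * n_contexts for _ in range(n_items)] for _ in range(n_users)]
    for user, item, context, rating in records:
        X[user][item][context] = float(rating)
    return X

# Hypothetical records: user 0 rated item 1 with 4 in context 2, and so on.
records = [(0, 1, 2, 4), (1, 0, 0, 5), (1, 1, 2, 2)]
X = build_tensor(records, n_users=2, n_items=2, n_contexts=3)
print(X[0][1][2], X[0][0][0])
```

A plain user-item matrix would have to merge or discard the ratings that the same user gives to the same item in different contexts; the third axis keeps them apart.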

Tensor Factorization:
HOSVD and PARAFAC
in RS literature
29

Tensors
 Tensors → higher-dimensional arrays of numbers, which might be exploited in order to include additional contextual information in the recommendation process.
 In standard multivariate data analysis, data are arranged in a 2D structure, but for a wide variety of domains more appropriate structures are required to take more dimensions into account:
$x_{ijk}$, $i = 1,\dots,I$, $j = 1,\dots,J$, $k = 1,\dots,K$.
2 particular tensor factorizations (TF) can be considered higher-order extensions of the matrix Singular Value Decomposition:
1. Higher-Order Singular Value Decomposition (HOSVD), which is a generalization of SVD for matrices;
2. PARallel FACtor analysis or CANonical DECOMPosition (PARAFAC/CANDECOMP), a higher-order form of Principal Component Analysis.
30

Tensor Factorization
In RS literature, the most frequently used technique for tensor factorization is HOSVD.
HOSVD decomposes the initial tensor into N matrices (where N is the order of the tensor) and a tensor that is smaller than the original one (the core tensor).
31
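For a 3-order tensor, the decomposition described above is commonly written as follows. The notation is an assumption introduced here, not taken from the slides: $\mathcal{X}$ is the data tensor, $\mathcal{S}$ the core tensor, $U^{(n)}$ the factor matrix of mode n, and $\times_n$ the mode-n product.

```latex
\mathcal{X} \;\approx\; \mathcal{S} \times_1 U^{(1)} \times_2 U^{(2)} \times_3 U^{(3)},
\qquad
\mathcal{S} \in \mathbb{R}^{k_1 \times k_2 \times k_3},\quad
U^{(n)} \in \mathbb{R}^{I_n \times k_n},\quad k_n \le I_n
```

Choosing each $k_n$ smaller than $I_n$ is what makes the core tensor smaller than the original one.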

HOSVD in RS Literature 1/2
 Baltrunas:
 Multiverse Recommendation algorithm
 HOSVD TF based algorithm
 data: users, movies, contextual information and user ratings → 3-order tensor.
 Rendle:
 RTF algorithm
 social tagging system
 Reconstructed tensor: measures the strength of association between users, items and tags.
 Chen:
 CubeSVD → personalized web search
 Hidden relationships <user, query, web pages>
 Output: <u, q, p, w>: w measures the popularity of page p as a result of query q made by the user u.
32

33
HOSVD in RS Literature 2/2
Advantages:
 good algorithm with improvement of results (Baltrunas)
 good algorithm with improvement of results (Rendle)
 CubeSVD tested on MSN clickthrough gives good results
(Chen).
Problems:
 high computational cost (all)
 time consuming algorithm (Chen).
[Baltrunas] Alexandros Karatzoglou, Xavier Amatriain, Linas Baltrunas, and Nuria Oliver. Multiverse recommendation: n-dimensional tensor factorization for context-aware collaborative filtering. In Proceedings of the fourth ACM conference on Recommender systems, RecSys ’10, pages 79–86, New York, NY, USA, 2010. ACM.
[Rendle] Steffen Rendle, Leandro Balby Marinho, Alexandros Nanopoulos, and Lars Schmidt-Thieme. Learning optimal ranking with tensor factorization for tag recommendation. In KDD, pages 727–736, 2009.
[Chen] Jian-Tao Sun, Hua-Jun Zeng, Huan Liu, Yuchang Lu, and Zheng Chen. CubeSVD: a novel approach to personalized web search. In Proceedings of the 14th international conference on World Wide Web, WWW’05, pages 382–390, New York, NY, USA, 2005. ACM.

PARAFAC (PARallel FACtor analysis)
PARAFAC is a decomposition method; the model was independently proposed by Harshman and by Carroll and Chang.
A PARAFAC model of a 3D array is given by 3 loading matrices A, B, and C with typical elements $a_{if}$, $b_{jf}$, and $c_{kf}$.
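In terms of these typical elements, the PARAFAC model of a 3D array is usually written as the trilinear sum below. The number of components F and the residual term $e_{ijk}$ are standard notation assumed here rather than taken from the slide text.

```latex
x_{ijk} \;=\; \sum_{f=1}^{F} a_{if}\, b_{jf}\, c_{kf} \;+\; e_{ijk},
\qquad i = 1,\dots,I,\quad j = 1,\dots,J,\quad k = 1,\dots,K
```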
34

HOSVD Vs PARAFAC
HOSVD advantages:
• it is an extension of the SVD to higher-order dimensions;
• it can take more dimensions into account simultaneously;
• better data modeling than standard SVD;
• dimension reduction can be performed not only in one dimension but also separately for each dimension.
HOSVD limitations:
• it is not an optimal tensor decomposition, although it does not require an iterative algorithm and needs only standard SVD computations;
• it lacks the truncation property of the SVD, where keeping the first n singular values yields the best rank-n approximation of a given matrix;
• HOSVD cannot deal with missing values: they are treated as 0;
• to prevent overfitting, HOSVD should use regularization.
35

PARAFAC Vs HOSVD
PARAFAC:
• it is faster than HOSVD: linear computation time in comparison with HOSVD;
• it does not collapse data, but retains its natural three-dimensional structure;
• despite the PARAFAC model’s lack of orthogonality, Kruskal showed that components are unique, up to permutation and scaling, under mild conditions.
36

PARAFAC in [Baltrunas12]
TFMAP → PARAFAC → top-N context-aware recommendations of mobile applications. A tensor of 3 dimensions is factorized:
• users
• items
• context types.
Dimensions → 3 factor matrices → calculate user m’s preference for item i under context type k:
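A PARAFAC-style preference score of this kind takes the trilinear form below. The symbols are assumptions introduced for illustration: U, V and C denote the user, item and context-type factor matrices, and D is the number of latent factors. [Baltrunas12] should be consulted for the authors' exact notation.

```latex
f_{mik} \;=\; \sum_{d=1}^{D} U_{md}\, V_{id}\, C_{kd}
```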
The authors introduced an optimization process based on gradient ascent to avoid overfitting.
[Baltrunas12] Yue Shi, Alexandros Karatzoglou, Linas Baltrunas, Martha Larson, Alan Hanjalic,
and Nuria Oliver. Tfmap: optimizing map for top-n context-aware recommendation. In Proceedings
of the 35th international ACM SIGIR conference on Research and development in information
retrieval , SIGIR ’12, pages 155–164, New York, NY, USA, 2012. ACM
37

PARAFAC in [Baltrunas12]
Advantages:
 TFMAP, tested on the Appazaar project dataset, improves MAP and Precision compared with other algorithms
 good scalability: the training time of TFMAP increases almost linearly.
Problems:
 TFMAP is tested on only 1 dataset
 Significance of results ??
38

PARAFAC in [Acar]
[Acar] Evrim Acar, Daniel M. Dunlavy, Tamara G. Kolda, and Morten Mørup. Scalable tensor
factorizations with missing data. In SDM10: Proceedings of the 2010 SIAM International Conference on
Data Mining , pages 701–712, Philadelphia, April 2010. SIAM.
PARAFAC → goal: to capture the latent structure of the data via a higher-order factorization, even in the presence of missing data.
The authors develop a scalable algorithm called CP-WOPT (CP Weighted OPTimization).
Numerical experiments on simulated data sets → CP-WOPT can successfully factor tensors with noise and up to 70% missing data.
39

CP-WOPT is tested on EEG dataset:
• it is not uncommon in EEG analysis that the signals from some
channels are ignored due to the malfunctioning of the electrodes
• the factors extracted by the CP-WOPT algorithm can capture brain
dynamics in EEG analysis even if signals are missing from some
channels.
PARAFAC in [Acar] 40

PARAFAC in [Acar]
Advantages:
 CP-WOPT deals with missing values
 CP-WOPT uses a weighted factorization based on PARAFAC
 good results on the tested dataset.
 IDEA: CP-WOPT → RS
Problems:
 Computational cost ??
41

Proposed solutions for missing
values and context weighting
42

Scenario
 CARS represent an evolution of the traditional CF paradigm.
 The state of the art is based on TF as a generalization of the classical user-item MF that accommodates the contextual information.
 We are interested in the PARAFAC technique for its ability to deal with missing values.
 We propose the use of the CP-WOPT algorithm: our target is to identify the most promising factorization method (PARAFAC) and the best algorithm implementing this factorization.
 We propose 2 solutions to the problem of context weighting.
43

CP-WOPT Algorithm
[Figure: CP-WOPT pseudocode, annotated with: W tensor, rank of the tensor X, gradient, matrices]
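The ingredients named on this slide (a weight tensor W marking known entries, a chosen rank, the gradient, and the factor matrices) can be illustrated with the Python sketch below. It is not the thesis's Java implementation and not the optimizer of [Acar]: plain fixed-step gradient descent is swapped in for simplicity, and the function name `cp_wopt_sketch`, the step size and the toy tensor are assumptions.

```python
import random

def cp_wopt_sketch(X, W, rank=2, lr=0.002, epochs=500, seed=0):
    """Weighted CP (PARAFAC) fit of a 3-way tensor by plain gradient descent.

    Minimizes 0.5 * sum over cells of W * (X - model)^2, so cells with
    weight 0 (missing ratings) do not influence the factors.
    """
    I, J, K = len(X), len(X[0]), len(X[0][0])
    rng = random.Random(seed)

    def init(n):
        return [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n)]

    A, B, C = init(I), init(J), init(K)
    # Only cells with weight 1 (known ratings) enter the objective.
    observed = [(i, j, k) for i in range(I) for j in range(J)
                for k in range(K) if W[i][j][k]]

    def predict(i, j, k):
        return sum(A[i][f] * B[j][f] * C[k][f] for f in range(rank))

    def loss():
        return 0.5 * sum((X[i][j][k] - predict(i, j, k)) ** 2 for i, j, k in observed)

    history = [loss()]
    for _ in range(epochs):
        GA = [[0.0] * rank for _ in range(I)]
        GB = [[0.0] * rank for _ in range(J)]
        GC = [[0.0] * rank for _ in range(K)]
        for i, j, k in observed:
            e = predict(i, j, k) - X[i][j][k]
            for f in range(rank):
                GA[i][f] += e * B[j][f] * C[k][f]
                GB[j][f] += e * A[i][f] * C[k][f]
                GC[k][f] += e * A[i][f] * B[j][f]
        for M, G in ((A, GA), (B, GB), (C, GC)):
            for row, grad_row in zip(M, G):
                for f in range(rank):
                    row[f] -= lr * grad_row[f]
        history.append(loss())
    # Reconstruct every cell, including missing ones; negative estimates become 0.
    full = [[[max(0.0, predict(i, j, k)) for k in range(K)]
             for j in range(J)] for i in range(I)]
    return full, history

# Hypothetical 3 users x 3 items x 2 contexts tensor; 0 marks a missing rating.
X = [[[5, 4], [3, 0], [0, 1]],
     [[4, 0], [0, 2], [1, 1]],
     [[0, 5], [2, 2], [5, 0]]]
W = [[[1 if v else 0 for v in row] for row in mat] for mat in X]
full, history = cp_wopt_sketch(X, W)
print(round(history[0], 2), round(history[-1], 2))
```

The reconstructed tensor `full` contains estimates for the cells that were missing in X, which is the property that makes this family of methods interesting for sparse rating data.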
44

Implementation Details
 The CP-WOPT algorithm is implemented in Java.
 The input tensor is read from a CSV file.
 Values range from 1 to 5.
 Missing values are conventionally represented by 0.
 The output, an approximation of the input tensor with the missing data reconstructed, is stored in a CSV file.
 Values less than 0 are set to 0.
45

CWBPA (Context Weighting with
Bayesian Probabilistic Approach) 1/4
Idea: Conditional Probability + Bayes’ Theorem.
1) Conditional probability for each user and each context.
2) Compare this distribution with an equiprobable distribution → divergence measure.
• If the 2 distributions are similar → the context does not influence the user’s rating;
• If they are very different → the rating is influenced by the context for which the divergence measure is the highest.
46

CWBPA 2/4
Assumption: liking = the rating is influenced by the context.
E.g.: c_i = "weather", with values c_ij = "clear", "sunny", "cloudy", "rainy".
Contingency table for the context c_i against the liking variable L.
n tables (one per context) x 1 user
47

CWBPA 3/4
P(c_i = c_ij | L = 1), j = 1,..,m_i ? → Bayes’ Theorem
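Written out, Bayes' theorem gives the probability of each value of context c_i given that the item was liked. The summation index j' and the expansion of P(L = 1) in the denominator are notation added here for clarity.

```latex
P(c_i = c_{ij} \mid L = 1)
  = \frac{P(L = 1 \mid c_i = c_{ij})\; P(c_i = c_{ij})}
         {\sum_{j'=1}^{m_i} P(L = 1 \mid c_i = c_{ij'})\; P(c_i = c_{ij'})},
\qquad j = 1, \dots, m_i
```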
48

CWBPA 4/4
• Comparing 2 distributions → are they divergent?
• Degree of divergence: divergence index.
DEF.: given 2 distributions A and B, both referring to the same qualitative character X, and calling $f^A_k$ and $f^B_k$ the relative frequencies of the k-th modality, k = 1,..,K, of the A and B distributions, a possible family of divergence indices is:
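A minimal Python sketch of the CWBPA steps for one user and one context follows. It assumes a specific divergence index, half the sum of the absolute differences between the two distributions, which is an illustrative choice and not necessarily a member of the family used in this work. The function names and the toy contingency table are also hypothetical.

```python
def posterior_given_liked(table):
    """Return P(c_i = c_ij | L = 1) for each context value, via Bayes' theorem.

    `table` maps each context value to (liked_count, not_liked_count);
    every value is assumed to have at least one rating.
    """
    total = sum(liked + other for liked, other in table.values())
    p_liked = sum(liked for liked, _ in table.values()) / total   # P(L = 1)
    posterior = {}
    for value, (liked, other) in table.items():
        p_value = (liked + other) / total                 # P(c_i = c_ij)
        p_liked_given_value = liked / (liked + other)     # P(L = 1 | c_i = c_ij)
        posterior[value] = p_liked_given_value * p_value / p_liked
    return posterior

def divergence_from_uniform(posterior):
    """Assumed index: half the sum of absolute differences from 1/m (range 0..1)."""
    m = len(posterior)
    return 0.5 * sum(abs(p - 1.0 / m) for p in posterior.values())

# Hypothetical contingency table for the context "weather" of one user.
weather = {"sunny": (6, 4), "cloudy": (2, 8), "rainy": (2, 3)}
post = posterior_given_liked(weather)
div = divergence_from_uniform(post)
print(post, div)
```

A divergence near 0 suggests that the context does not influence this user's liking; repeating the computation for every context and keeping those with the highest divergence yields the candidate influential contexts.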
49

CWAIC (Context Weighting Association
Index Calculation) 1/2
• Idea: for each user and each context we want to calculate Cramér’s association index between liking and context.
• Objective: to determine whether context influences the rating.
• We establish a threshold below which there is no rating-context dependency, and above which there is influence or dependency.
• Association measures are based on the value of χ², obtained from an r x c contingency table.
• The χ² test is helpful to verify independence hypotheses (corresponding to zero association) between:
• the modalities of the row variable
• the modalities of the column variable.
50

CWAIC 2/2
Cramér’s Index Φc
Cramér’s index → contingency table of dimensions r x c.
It is based on χ², which is the most widely applied index for association measures.
It is calculated as:
$\Phi_c = \sqrt{\dfrac{\chi^2}{N\,(k-1)}}$, where N is the total number of observations and k = min(r, c).
Φc = 0 → no association
Φc = 1 → perfect association, but only if the table is square
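The computation can be sketched in Python as below, using the usual Pearson χ² statistic. The function name `cramers_index`, the example tables and the threshold value 0.3 are assumptions made for illustration; the threshold actually used in the evaluation is not stated in the slides.

```python
import math

def cramers_index(table):
    """Cramér's index of an r x c contingency table given as a list of rows.

    Assumes every row total and column total is greater than zero.
    """
    r, c = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(c)]
    n = sum(row_tot)                              # total number of observations
    chi2 = 0.0
    for i in range(r):
        for j in range(c):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    k = min(r, c)
    return math.sqrt(chi2 / (n * (k - 1)))

# Hypothetical tables: rows = liked / not liked, columns = two context values.
dependent = [[10, 0], [0, 10]]
independent = [[5, 5], [5, 5]]
THRESHOLD = 0.3   # illustrative value only
for name, tab in (("dependent", dependent), ("independent", independent)):
    phi = cramers_index(tab)
    print(name, phi, "influential" if phi > THRESHOLD else "not influential")
```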
51

52
Using CWBPA and CWAIC
Tensor (all contexts) → CWBPA / CWAIC → Influential Variable vs. NOT Influential Variable → Output: REDUCED TENSOR → Factorization with CP-WOPT

Evaluation of RS 1/3
 Standard metrics have been defined by judging how much the predictions deviate from the actual ratings.
 Predictive accuracy metrics:
 Mean Absolute Error (MAE): this metric measures the deviation between the prediction and the actual rating provided by the user.
 Root Mean Squared Error (RMSE): it follows the same principle as MAE but squares the error before summing. Consequently, it penalizes large errors, since they become much more pronounced than small ones.
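With assumed symbols (p_i for a predicted rating, r_i for the actual rating, n for the number of test ratings; none of these names come from the slides), the two metrics are MAE = (1/n) Σ |p_i − r_i| and RMSE = sqrt((1/n) Σ (p_i − r_i)²). A minimal Python sketch with hypothetical data:

```python
import math

def mae(predicted, actual):
    """Mean Absolute Error between two equally long rating lists."""
    return sum(abs(p - r) for p, r in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root Mean Squared Error: errors are squared before averaging."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, actual)) / len(actual))

predicted = [3, 4, 5]
actual = [2, 4, 3]
print(mae(predicted, actual), rmse(predicted, actual))
```

In this example the single error of 2 weighs four times as much as the error of 1 inside RMSE, which is why RMSE is larger than MAE here.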
54

 Classification metrics: these metrics evaluate how well a RS can split the item space into relevant and non-relevant items.
 Precision: this metric counts how many items among the recommended ones are actually relevant for the target user.
 Recall: this metric counts how many items among those that are relevant for the target user are actually recommended.
                     Recommended Content    NOT Recommended Content
Relevant Content     True Positive (TP)     False Negative (FN)
Irrelevant Content   False Positive (FP)    True Negative (TN)
55

 F-Measure: a metric defined as the harmonic mean of the precision (PR) and recall (RE) metrics. Let β be a parameter that determines the relative influence of precision and recall; the F-Measure is calculated as follows:
$F_\beta = (1+\beta^2)\,\dfrac{PR \cdot RE}{\beta^2\,PR + RE}$
β = 1 → $F = 2\,\dfrac{PR \cdot RE}{PR + RE}$
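The classification metrics can be computed from a recommendation list and a set of relevant items as in the sketch below. The function name `precision_recall_f` and the toy item identifiers are assumptions made for this example.

```python
def precision_recall_f(recommended, relevant, beta=1.0):
    """Precision, recall and F-measure for one user's recommendation list."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)              # true positives
    pr = tp / len(recommended) if recommended else 0.0
    re = tp / len(relevant) if relevant else 0.0
    if pr + re == 0:
        return pr, re, 0.0
    f = (1 + beta ** 2) * pr * re / (beta ** 2 * pr + re)
    return pr, re, f

# Hypothetical example: 4 recommended items, 3 relevant items, 2 in common.
pr, re, f1 = precision_recall_f(["a", "b", "c", "d"], ["a", "c", "e"])
print(pr, re, f1)
```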
56

Introduction 1/2
• 3 preliminary tests of CP-WOPT → to verify the effectiveness of this algorithm and to evaluate standard metrics;
• 1 evaluation without context;
• 2 evaluations to test our solutions CWBPA and CWAIC for context weighting.
57

58
Introduction 2/2
Why 2 Baselines?
• 1 without contextual information on 1 dataset
• 1 with all contextual information available on 1
dataset.
Do the proposed solutions work as a “filter” for contextual information?

CP-WOPT: preliminary evaluations 1/5
Preliminary user study:
• 7 real users
• rated a fixed number of movies (11)
• 3 contextual factors.
3 contextual factors:
i) if they like to watch the movie at home or at the cinema;
ii) with friends or with a partner;
iii) with or without family.
Ratings range: 1-5 with “encoding” of context into rating:
• rating 1 and 2 express a strong and a modest preference,
respectively, for the first context term;
• rating 3 expresses neutrality;
• rating 4 and 5 express a modest and a strong preference,
respectively, for the second context term.
59

Metrics used: accuracy – coverage.
Accuracy: the percentage of known values
correctly reconstructed:
Coverage: the percentage of non-zero values
returned:
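One possible reading of these two definitions is sketched below in Python. It assumes that a known value counts as correctly reconstructed when the reconstructed value rounds to the known rating, and that coverage is measured over all returned cells; both are interpretations made for illustration, as are the function name and the toy data.

```python
def accuracy_and_coverage(known, reconstructed):
    """Return (accuracy %, coverage %) for a reconstructed tensor.

    `known` maps cells to known ratings; `reconstructed` maps every cell of
    the output tensor to its estimated value (it must include the known cells).
    """
    hits = sum(1 for cell, rating in known.items()
               if round(reconstructed[cell]) == rating)
    accuracy = 100.0 * hits / len(known)
    nonzero = sum(1 for value in reconstructed.values() if value != 0)
    coverage = 100.0 * nonzero / len(reconstructed)
    return accuracy, coverage

# Hypothetical cells indexed by (user, item).
known = {(0, 0): 5, (0, 1): 3, (1, 0): 4}
reconstructed = {(0, 0): 4.8, (0, 1): 2.2, (1, 0): 4.4, (1, 1): 0.0}
acc, cov = accuracy_and_coverage(known, reconstructed)
print(acc, cov)
```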
60

The experiment shows that it is possible to express, through the n-dimensional factorization, not only recommendations for the single user, but also more general considerations such as the mode of use of an item, i.e. its trend of use.
61

• Dataset used: subset of MovieLens 100K
• Input: tensor of dimensions 100 users x 150 movies x 21 occupations.
• Contextual information: occupation (the only information in the dataset usable as contextual information)
• Results:
• acc = 92.09%
• cov = 99.96%
• MAE = 0.60
• RMSE = 0.93.
Acceptable accuracy
Coverage is very good
62

Baseline: MyMediaLite* RS
• UserItem-Baseline: CF algorithm
• SVDPlusPlus: MF algorithm based on Singular Value
Decomposition
* http://www.mymedialite.net
63

Evaluation of an explicit context dataset
Dataset: LDOS-CoMoDa**
LDOS-CoMoDa contains:
• ratings for the movies
• the 12 pieces of contextual information describing the
situation in which the movies were watched.
Properties:
• ratings and contextual information are explicitly acquired from the users immediately after they consumed the item;
• the ratings and the contextual information come from real user-item interaction;
• users are able to rate the same item more than once if they consumed the item multiple times.
** www.ldos.si/comoda.html
64

LDOS-CoMoDa
 The LDOS-CoMoDa dataset has been in development since 15 September 2010. It contains 3 main groups of information:
 general user information: provided by the user upon registering in the system → user’s age, sex, country and city;
 item metadata: inserted into the dataset for each movie rated by at least one user → director’s name and surname, country, language, year;
 contextual information.
65

We tested CP-WOPT on the LDOS-CoMoDa dataset with ALL CONTEXT selected (19 contextual features).
Accuracy Metrics
We use 70% of the ratings, replacing 30% of the known ratings with zero values.
The 30% of values is randomly chosen.
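The hold-out protocol described above can be sketched as follows. The function name `hold_out`, the fixed random seed and the toy ratings are assumptions introduced for illustration.

```python
import random

def hold_out(ratings, test_fraction=0.3, seed=0):
    """Hide a random fraction of the known ratings by replacing them with 0.

    `ratings` maps a cell, e.g. (user, item, context), to a known rating > 0.
    Returns (train, hidden): `train` has zeros in the hidden cells, and
    `hidden` keeps their true values for the later error computation.
    """
    rng = random.Random(seed)
    cells = sorted(ratings)
    hidden_cells = set(rng.sample(cells, round(test_fraction * len(cells))))
    train = {cell: (0 if cell in hidden_cells else r) for cell, r in ratings.items()}
    hidden = {cell: ratings[cell] for cell in hidden_cells}
    return train, hidden

# Hypothetical data: 10 known ratings in the range 1..5.
ratings = {(u, i, 0): 1 + (u + i) % 5 for u in range(2) for i in range(5)}
train, hidden = hold_out(ratings)
print(len(hidden), sum(1 for v in train.values() if v > 0))
```

After factorization, the reconstructed values in the hidden cells are compared with the values stored in `hidden` to obtain the error metrics.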
Evaluation on explicit context dataset 1/2
66

Evaluation of explicit context dataset 2/2
[Bar chart: RMSE (y-axis 0 to 1.2) for CAMF (CAMF_C), DCW, Splitting Approaches (UI Splitting), and CP-WOPT; the only data label surviving in the extracted text is 1.017, printed next to DCW]
67

Baseline without context
 This experiment aims at creating a baseline for comparison, using standard recommendation algorithms which do not exploit contextual information, i.e. a 2D recommender.
 For this purpose we run Mahout algorithms on the LDOS-CoMoDa dataset.
 The Mahout recommender requires an input data file. We use a CSV file where the users’ ratings assigned under some contextual situations are stored.
 We neglect contextual information.
 When the same item was rated by a user under different contexts, we remove the repeated ratings.
 We keep the first rating in temporal order, ignoring the others.
 We rearrange the data as triplets: <id user, id item, rating>.
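The preprocessing steps above can be sketched in Python as follows. The record layout (user id, item id, rating, timestamp, then context fields) and the function name `to_triplets` are assumptions; the actual column order of the LDOS-CoMoDa CSV file is not given in the slides.

```python
def to_triplets(records):
    """Drop context and keep the earliest rating for each (user, item) pair.

    Each record is assumed to be (user_id, item_id, rating, timestamp, *context).
    """
    first = {}
    for user, item, rating, _timestamp, *_context in sorted(records, key=lambda rec: rec[3]):
        # setdefault keeps the value seen first, i.e. the earliest rating.
        first.setdefault((user, item), rating)
    return [(user, item, rating) for (user, item), rating in first.items()]

# Hypothetical records: user 1 rated item 10 twice, in two different contexts.
records = [
    (1, 10, 4, 200, "home"),
    (1, 10, 2, 100, "cinema"),
    (2, 10, 5, 150, "home"),
]
triplets = to_triplets(records)
print(triplets)
```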
68

Mahout algorithms compared
 Some standard collaborative filtering algorithms are compared:
 Singular Value Decomposition
 Different algorithms based on several user similarity measures (Spearman Correlation, Pearson Correlation, Euclidean Distance, Tanimoto Coefficient)
 Algorithms based on item similarity (Log Likelihood, Euclidean Distance, Pearson Correlation)
 Slope One Recommender.
 For user similarity we use a neighborhood of 10 users to calculate the similarity between users.
 We use 60% of the data as training set and 40% as test set.
69

Experimental Evaluation 1/6
[Bar chart: MAE and RMSE (y-axis 0.00 to 1.80) for SVD, Pearson User Similarity, Euclidean User Similarity, Tanimoto User Similarity, Spearman User Similarity, Euclidean Item Similarity, Pearson Item Similarity, Tanimoto Item Similarity, LogLikelihood Item Similarity, and SlopeOne]
70

[Bar chart: P@5, R@5 and F-score@5 (y-axis 0.00 to 0.10) for SVD, Pearson User Similarity, Euclidean User Similarity, Tanimoto User Similarity, Spearman User Similarity, Euclidean Item Similarity, Pearson Item Similarity, Tanimoto Item Similarity, LogLikelihood Item Similarity, and SlopeOne]
71

[Bar chart: P@10, R@10 and F-score@10 (y-axis 0.00 to 0.14) for SVD, Pearson User Similarity, Euclidean User Similarity, Tanimoto User Similarity, Spearman User Similarity, Euclidean Item Similarity, Pearson Item Similarity, Tanimoto Item Similarity, LogLikelihood Item Similarity, and SlopeOne]
72

[Bar chart: P@20, R@20 and F-score@20 (y-axis 0.00 to 0.20) for SVD, Pearson User Similarity, Euclidean User Similarity, Tanimoto User Similarity, Spearman User Similarity, Euclidean Item Similarity, Pearson Item Similarity, Tanimoto Item Similarity, LogLikelihood Item Similarity, and SlopeOne]
73

[Bar chart: P@50, R@50 and F-score@50 (y-axis 0.00 to 0.25) for SVD, Euclidean User Similarity, Spearman User Similarity, Pearson Item Similarity, and LogLikelihood Item Similarity]
74

 In general the low values are due to the fact that the methodology used for evaluating the ranked item lists includes unrated items in the test set.
 These items are tagged as not relevant, therefore leading to likely underestimated performance, compared with a situation where all ratings are available.
 This is not a problem in our evaluation, since the goal is just to compare algorithms, and performance is equally underestimated for all of them.
 We select the Spearman User Similarity algorithm, which gave the lowest error, and the Euclidean User Similarity algorithm, which gave the best accuracy, as baselines.
75

CW Evaluation: Preliminary Phase
LDOS-CoMoDa dataset:
d = 19 contextual features
Users’ ratings with context information are stored in a CSV file.
We use 70% of the ratings, replacing 30% of the known ratings with zero values.
The 30% of values is randomly chosen.
76
CW Proposed Solutions
Reduced Tensor

CWBPA Evaluation 1/2
This experiment is performed to test the 2 proposed solutions, CWBPA and CWAIC, for context weighting.
We apply the 2 methods to the LDOS-CoMoDa dataset to evaluate the standard metrics MAE, RMSE, accuracy, coverage, P and R.
Contingency table L=1
We compare the probability
distribution obtained from the previous
calculations with the probability
distribution 1/K,
K = number of context variables.
Divergence measure:
77

Contingency table L=1
for each context and each user.
For each table we calculate the χ² coefficient and Cramér’s index.
Threshold.
CWAIC Evaluation
79

CWBPA Vs CWAIC
7 runs of the 2
algorithms:
4 for CWBPA
3 for CWAIC
we select the most
significant
contextual
configurations.
80

CWBPA Vs CWAIC 1/2
[Bar chart: MAE and RMSE (y-axis 0 to 0.9) for Spearman User Similarity, Euclidean User Similarity, CWAIC, CWBPA, and CP-WOPT]
81

CWBPA Vs CWAIC 2/2
[Bar chart: P and R (y-axis 0 to 1) for Spearman User Similarity, Euclidean User Similarity, CWAIC, CWBPA, and CP-WOPT]
82

CWBPA Vs CWAIC – All users 1/2
[Bar chart: MAE and RMSE (y-axis 0 to 0.9) for Spearman User Similarity, Euclidean User Similarity, CWAIC, CWBPA, and CP-WOPT]
83

CWBPA Vs CWAIC – All users 2/2
[Bar chart: P and R (y-axis 0 to 1) for Spearman User Similarity, Euclidean User Similarity, CWAIC, CWBPA, and CP-WOPT]
84

Result Analysis 1/2
• Evaluated the CP-WOPT algorithm as a possible solution to the missing values problem:
• with a small dataset
• on a MovieLens 100K subset we obtained good results, with a low error and a good coverage value → CP-WOPT is able to reconstruct the tensor leaving only a few values as missing data;
• On MovieLens: the results reached are in line with those known in the literature;
• CP-WOPT on the LDOS-CoMoDa dataset is better than other state-of-the-art recommendation algorithms;
• Neglecting the contextual information by using a regular 2D RS, the CF algorithms Spearman User Similarity and Euclidean User Similarity provided better performance.
85

Result Analysis 2/2
• CWBPA and CWAIC give different responses to the problem of context weighting;
• CWBPA and CWAIC are evaluated on the LDOS-CoMoDa dataset, showing their effectiveness;
• Using only some contextual variables leads to more precise recommendations;
• CWAIC performs better than CWBPA.
86

Recap
CF → MF
Tensors
TF - Context
Proposals:
CP-WOPT
CWBPA
CWAIC
90

Recap – Experimental Evaluation
5 Evaluations to test:
• Effectiveness of CP-WOPT into RS;
• 2 proposed solutions for context weighting:
• both approaches seem effective;
• using only relevant contexts leads to better recommendations compared with a traditional 2D RS or with using all the contextual information available.
91

Future Work 1/3
LDOS-CoMoDa dataset experiment on all context
available.
• 12 contextual variables in the LDOS-CoMoDa dataset;
• We used only 5 of them to reduce the computational
effort;
• New extended evaluation of the Bayesian Probabilistic
Approach and of the Association Index to minimize the
dimensions of the tensor.
92

Future Work 2/3
Test on another contextual dataset.
We want to test CP-WOPT, CWBPA and CWAIC on other
datasets having explicit contextual information such as:
• AIST Food dataset
• TripAdvisor dataset
to improve the significance of the results.
93

Future Work 3/3
A Real Application.
We want to implement a web-based system to acquire
data and test our proposed solutions in a concrete scenario, such
as:
Personalized Context-Aware Electronic Program Guides.
94

95
Publications
Most of the work presented is collected in the publications:
Giuseppe Ricci, Marco de Gemmis, Giovanni Semeraro
Matrix and Tensor Factorization Techniques applied to Recommender
Systems: a Survey.
International Journal of Computer and Information Technology
(2277 – 0764) Volume 01– Issue 01, September 2012.
Giuseppe Ricci, Marco de Gemmis, Giovanni Semeraro
Mathematical Methods of Tensor Factorization Applied to Recommender
Systems
New Trends in Databases and Information Systems
17th East European Conference on Advances in Databases and
Information Systems Volume 241, ISBN 978-3-319-01862-1, 2013, pp
383-388.
The results of the experimental evaluation are being prepared for submission.

Questions? 96
“In things which are absolutely
indifferent there can be no choice
and consequently no option or will.”
Gottfried Wilhelm von Leibniz
