Hindawi Publishing Corporation
Advances in Artificial Intelligence
Volume 2009, Article ID 421425, 19 pages
doi:10.1155/2009/421425




Review Article
A Survey of Collaborative Filtering Techniques

           Xiaoyuan Su and Taghi M. Khoshgoftaar
           Department of Computer Science and Engineering, Florida Atlantic University, 777 Glades Road, Boca Raton, FL 33431, USA

           Correspondence should be addressed to Xiaoyuan Su, suxiaoyuan@gmail.com

           Received 9 February 2009; Accepted 3 August 2009

           Recommended by Jun Hong

           As one of the most successful approaches to building recommender systems, collaborative filtering (CF) uses the known preferences
           of a group of users to make recommendations or predictions of the unknown preferences for other users. In this paper, we first
           introduce CF tasks and their main challenges, such as data sparsity, scalability, synonymy, gray sheep, shilling attacks, privacy
           protection, etc., and their possible solutions. We then present three main categories of CF techniques: memory-based, model-
           based, and hybrid CF algorithms (that combine CF with other recommendation techniques), with examples for representative
           algorithms of each category, and analysis of their predictive performance and their ability to address the challenges. From basic
           techniques to the state-of-the-art, we attempt to present a comprehensive survey of CF techniques, which can serve as a
           roadmap for research and practice in this area.

           Copyright © 2009 X. Su and T. M. Khoshgoftaar. This is an open access article distributed under the Creative Commons
           Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
           properly cited.



1. Introduction

In everyday life, people rely on recommendations from other people by spoken words, reference letters, news reports from news media, general surveys, travel guides, and so forth. Recommender systems assist and augment this natural social process to help people sift through available books, articles, webpages, movies, music, restaurants, jokes, grocery products, and so forth to find the most interesting and valuable information for them. The developers of one of the first recommender systems, Tapestry [1] (other earlier recommendation systems include rule-based recommenders and user-customization), coined the phrase "collaborative filtering (CF)," which has been widely adopted regardless of the facts that recommenders may not explicitly collaborate with recipients and that recommendations may suggest particularly interesting items in addition to indicating those that should be filtered out [2]. The fundamental assumption of CF is that if users X and Y rate n items similarly, or have similar behaviors (e.g., buying, watching, listening), they will rate or act on other items similarly [3].

CF techniques use a database of preferences for items by users to predict additional topics or products a new user might like. In a typical CF scenario, there is a list of m users {u1, u2, ..., um} and a list of n items {i1, i2, ..., in}, and each user, ui, has a list of items, Iui, which the user has rated, or about which their preferences have been inferred through their behaviors. The ratings can either be explicit indications, for example, on a 1–5 scale, or implicit indications, such as purchases or click-throughs [4]. For example, we can convert the list of people and the movies they like or dislike (Table 1(a)) to a user-item ratings matrix (Table 1(b)), in which Tony is the active user that we want to make recommendations for. There are missing values in the matrix where users did not give their preferences for certain items.

There are many challenges for collaborative filtering tasks (Section 2). CF algorithms are required to have the ability to deal with highly sparse data, to scale with the increasing numbers of users and items, to make satisfactory recommendations in a short time period, and to deal with other problems like synonymy (the tendency of the same or similar items to have different names), shilling attacks, data noise, and privacy protection problems.

Early generation collaborative filtering systems, such as GroupLens [5], use the user rating data to calculate the similarity or weight between users or items and make predictions or recommendations according to those calculated similarity values. The so-called memory-based CF methods (Section 3) are notably deployed into commercial systems such as http://www.amazon.com/ (see an example in Figure 1) and Barnes and Noble, because they are easy to implement and highly effective [6, 7].

Table 1: An example of a user-item matrix.

(a)

Alice: (like) Shrek, Snow White; (dislike) Superman
Bob: (like) Snow White, Superman; (dislike) Spider-Man
Chris: (like) Spider-Man; (dislike) Snow White
Tony: (like) Shrek; (dislike) Spider-Man

(b)

        Shrek   Snow White   Spider-Man   Superman
Alice   Like    Like                      Dislike
Bob             Like         Dislike      Like
Chris           Dislike      Like
Tony    Like                 Dislike      ?

Figure 1: Amazon recommends products to customers by customizing CF systems.
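To make the representation concrete, here is a minimal Python sketch that converts preference lists like those in Table 1(a) into a user-item matrix like Table 1(b), with None standing for missing preferences; the data structures and names are illustrative choices, not part of the original systems.

```python
# Preference lists as in Table 1(a); the exact encoding is an illustrative choice.
preferences = {
    "Alice": {"like": ["Shrek", "Snow White"], "dislike": ["Superman"]},
    "Bob":   {"like": ["Snow White", "Superman"], "dislike": ["Spider-Man"]},
    "Chris": {"like": ["Spider-Man"], "dislike": ["Snow White"]},
    "Tony":  {"like": ["Shrek"], "dislike": ["Spider-Man"]},
}

items = ["Shrek", "Snow White", "Spider-Man", "Superman"]

# Build the user-item matrix of Table 1(b): "Like", "Dislike", or None (missing).
matrix = {}
for user, prefs in preferences.items():
    row = {}
    for item in items:
        if item in prefs["like"]:
            row[item] = "Like"
        elif item in prefs["dislike"]:
            row[item] = "Dislike"
        else:
            row[item] = None        # the user gave no preference for this item
    matrix[user] = row

print(matrix["Tony"])
# {'Shrek': 'Like', 'Snow White': None, 'Spider-Man': 'Dislike', 'Superman': None}
```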


Customization of CF systems for each user decreases the search effort for users. It also promises greater customer loyalty, higher sales, more advertising revenues, and the benefit of targeted promotions [8].

However, there are several limitations of the memory-based CF techniques, such as the fact that the similarity values are based on common items and are therefore unreliable when data are sparse and the common items are few. To achieve better prediction performance and overcome the shortcomings of memory-based CF algorithms, model-based CF approaches have been investigated. Model-based CF techniques (Section 4) use the pure rating data to estimate or learn a model to make predictions [9]. The model can be a data mining or machine learning algorithm. Well-known model-based CF techniques include Bayesian belief net (BN) CF models [9–11], clustering CF models [12, 13], and latent semantic CF models [7]. An MDP (Markov decision process)-based CF system [14] produces a much higher profit than a system that has not deployed the recommender.

Besides collaborative filtering, content-based filtering is another important class of recommender systems. Content-based recommender systems make recommendations by analyzing the content of textual information and finding regularities in the content. The major difference between CF and content-based recommender systems is that CF only uses the user-item ratings data to make predictions and recommendations, while content-based recommender systems rely on the features of users and items for predictions [15]. Both content-based recommender systems and CF systems have limitations. While CF systems do not explicitly incorporate feature information, content-based systems do not necessarily incorporate the information in preference similarity across individuals [8].

Hybrid CF techniques, such as the content-boosted CF algorithm [16] and Personality Diagnosis (PD) [17], combine CF and content-based techniques, hoping to avoid the limitations of either approach and thereby improve recommendation performance (Section 5).

A brief overview of CF techniques is depicted in Table 2.

Table 2: Overview of collaborative filtering techniques.

Memory-based CF
  Representative techniques:
    * neighbor-based CF (item-based/user-based CF algorithms with Pearson/vector cosine correlation)
    * item-based/user-based top-N recommendations
  Main advantages:
    * easy implementation
    * new data can be added easily and incrementally
    * need not consider the content of the items being recommended
    * scale well with co-rated items
  Main shortcomings:
    * are dependent on human ratings
    * performance decreases when data are sparse
    * cannot recommend for new users and items
    * have limited scalability for large datasets

Model-based CF
  Representative techniques:
    * Bayesian belief nets CF
    * clustering CF
    * MDP-based CF
    * latent semantic CF
    * sparse factor analysis
    * CF using dimensionality reduction techniques, for example, SVD, PCA
  Main advantages:
    * better address the sparsity, scalability, and other problems
    * improve prediction performance
    * give an intuitive rationale for recommendations
  Main shortcomings:
    * expensive model building
    * trade-off between prediction performance and scalability
    * loss of useful information from dimensionality reduction techniques

Hybrid recommenders
  Representative techniques:
    * content-based CF recommender, for example, Fab
    * content-boosted CF
    * hybrid CF combining memory-based and model-based CF algorithms, for example, Personality Diagnosis
  Main advantages:
    * overcome limitations of CF and content-based or other recommenders
    * improve prediction performance
    * overcome CF problems such as sparsity and gray sheep
  Main shortcomings:
    * increased complexity and expense for implementation
    * need external information that is usually not available


To evaluate CF algorithms (Section 6), we need to use metrics according to the types of CF application. Instead of classification error, the most widely used evaluation metric for the prediction performance of CF is Mean Absolute Error (MAE). Precision and recall are widely used metrics for ranked lists of returned items in information retrieval research. ROC sensitivity is often used as a decision support accuracy metric.

As drawing convincing conclusions from artificial data is risky, data from live experiments are more desirable for CF research. The commonly used CF databases are MovieLens [18], Jester [19], and the Netflix prize data [20]. In Section 7, we give the conclusion and discussion of this work.

2. Characteristics and Challenges of Collaborative Filtering

E-commerce recommendation algorithms often operate in a challenging environment, especially for large online shopping companies like eBay and Amazon. Usually, a recommender system providing fast and accurate recommendations will attract the interest of customers and bring benefits to companies. For CF systems, producing high-quality predictions or recommendations depends on how well they address the challenges, which are characteristics of CF tasks as well.

2.1. Data Sparsity. In practice, many commercial recommender systems are used to evaluate very large product sets. The user-item matrix used for collaborative filtering will thus be extremely sparse, and the performance of the predictions or recommendations of the CF systems is challenged.

The data sparsity challenge appears in several situations. Specifically, the cold start problem occurs when a new user or item has just entered the system: it is difficult to find similar ones because there is not enough information (in some literature, the cold start problem is also called the new user problem or new item problem [21, 22]). New items cannot be recommended until some users rate them, and new users are unlikely to be given good recommendations because of the lack of their rating or purchase history. Coverage can be defined as the percentage of items that the algorithm could provide recommendations for. The reduced coverage problem occurs when the number of users' ratings is very small compared with the large number of items in the system, and the recommender system may be unable to generate recommendations for them. Neighbor transitivity refers to a problem with sparse databases, in which users with similar tastes may not be identified as such if they have not both rated any of the same items. This could reduce the effectiveness of a recommendation system that relies on comparing users in pairs to generate predictions.

To alleviate the data sparsity problem, many approaches have been proposed. Dimensionality reduction techniques, such as Singular Value Decomposition (SVD) [23], remove unrepresentative or insignificant users or items to reduce the dimensionalities of the user-item matrix directly. The patented Latent Semantic Indexing (LSI) used in information retrieval is based on SVD [24, 25], in which similarity between users is determined by the representation of the users in the reduced space. Goldberg et al. [3] developed eigentaste, which applies Principal Component Analysis (PCA), a closely related factor analysis technique first described by Pearson in 1901 [26], to reduce dimensionality. However, when certain users or items are discarded, useful information for recommendations related to them may be lost and recommendation quality may be degraded [6, 27].

Hybrid CF algorithms, such as the content-boosted CF algorithm [16], are found helpful in addressing the sparsity problem, in which external content information can be used to produce predictions for new users or new items. Ziegler et al. [28] proposed a hybrid collaborative filtering approach that exploits bulk taxonomic information designed for exact product classification to address the data sparsity problem of CF recommendations, based on the generation of profiles via inference of super-topic scores and topic diversification [28]. Schein et al. proposed the aspect model latent variable method for cold start recommendation, which combines both collaborative and content information in model fitting [29]. Kim and Li proposed a probabilistic model to address the cold start problem, in which items are classified into groups and predictions are made for users considering the Gaussian distribution of user ratings [30].

Model-based CF algorithms, such as TAN-ELR (tree augmented naïve Bayes optimized by extended logistic regression) [11, 31], address the sparsity problem by providing more accurate predictions for sparse data. Some new model-based CF techniques that tackle the sparsity problem include the association retrieval technique, which applies an associative retrieval framework and related spreading activation algorithms to explore transitive associations among users through their rating and purchase history [32]; maximum margin matrix factorization (MMMF), a convex, infinite-dimensional alternative to low-rank approximations and standard factor models [33, 34]; ensembles of MMMF [35]; multiple imputation-based CF approaches [36]; and imputation-boosted CF algorithms [37].
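As a minimal illustration of the dimensionality reduction idea discussed above (an SVD of the user-item matrix, with users compared in the reduced space), the sketch below factorizes a small ratings matrix with a truncated SVD; the toy matrix values, the fill-in of missing entries with user means, and the choice of k = 2 latent dimensions are illustrative assumptions, not taken from the systems cited above.

```python
import numpy as np

# Toy user-item matrix (0 marks a missing rating); values are illustrative only.
R = np.array([
    [4.0, 0.0, 5.0, 5.0],
    [4.0, 2.0, 1.0, 0.0],
    [3.0, 0.0, 2.0, 4.0],
    [4.0, 4.0, 0.0, 0.0],
    [2.0, 1.0, 3.0, 5.0],
])

# Fill missing entries with each user's mean rating (one common, simple choice).
filled = R.copy()
for u in range(R.shape[0]):
    rated = R[u] > 0
    filled[u, ~rated] = R[u, rated].mean()

# Truncated SVD: keep only the k strongest latent dimensions.
k = 2
U, s, Vt = np.linalg.svd(filled, full_matrices=False)
user_factors = U[:, :k] * s[:k]          # users represented in the reduced space
item_factors = Vt[:k, :].T               # items represented in the reduced space

# Similarity between users is now computed on the k-dimensional representations.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(user_factors[0], user_factors[4]))   # similarity of the first and last user in latent space
```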

2.2. Scalability. When the numbers of existing users and items grow tremendously, traditional CF algorithms will suffer serious scalability problems, with computational resources going beyond practical or acceptable levels. For example, with tens of millions of customers (M) and millions of distinct catalog items (N), a CF algorithm with a complexity of O(n) is already too large. Moreover, many systems need to react immediately to online requirements and make recommendations for all users regardless of their purchase and rating history, which demands high scalability of a CF system [6].

Dimensionality reduction techniques such as SVD can deal with the scalability problem and quickly produce good quality recommendations, but they have to undergo expensive matrix factorization steps. An incremental SVD CF algorithm [38] precomputes the SVD decomposition using existing users. When a new set of ratings is added to the database, the algorithm uses the folding-in projection technique [25, 39] to build an incremental system without recomputing the low-dimensional model from scratch, which makes the recommender system highly scalable.

Memory-based CF algorithms, such as the item-based Pearson correlation CF algorithm, can achieve satisfactory scalability. Instead of calculating similarities between all pairs of items, item-based Pearson CF calculates the similarity only between pairs of items co-rated by a user [6, 40]. A simple Bayesian CF algorithm tackles the scalability problem by making predictions based on observed ratings [41]. Model-based CF algorithms, such as clustering CF algorithms, address the scalability problem by seeking users for recommendation within smaller and highly similar clusters instead of the entire database [13, 42–44], but there are trade-offs between scalability and prediction performance.

2.3. Synonymy. Synonymy refers to the tendency of a number of the same or very similar items to have different names or entries. Most recommender systems are unable to discover this latent association and thus treat these products differently. For example, the seemingly different items "children movie" and "children film" are actually the same item, but memory-based CF systems would find no match between them to compute similarity. Indeed, the degree of variability in descriptive term usage is greater than commonly suspected. The prevalence of synonyms decreases the recommendation performance of CF systems.

Previous attempts to solve the synonymy problem depended on intellectual or automatic term expansion, or the construction of a thesaurus. The drawback of fully automatic methods is that some added terms may have meanings different from those intended, leading to rapid degradation of recommendation performance [45].

The SVD techniques, particularly the Latent Semantic Indexing (LSI) method, are capable of dealing with the synonymy problem. SVD takes a large matrix of term-document association data and constructs a semantic space where terms and documents that are closely associated are placed close to each other. SVD allows the arrangement of the space to reflect the major associative patterns in the data and ignore the smaller, less important ones. The performance of LSI in addressing the synonymy problem is impressive at higher recall levels where precision is ordinarily quite low, thus representing large proportional improvements. However, the performance of the LSI method at the lowest levels of recall is poor [25].

The LSI method gives only a partial solution to the polysemy problem, which refers to the fact that most words have more than one distinct meaning [25].

2.4. Gray Sheep. Gray sheep refers to the users whose opinions do not consistently agree or disagree with any group of people and who thus do not benefit from collaborative filtering [46]. Black sheep are the opposite group, whose idiosyncratic tastes make recommendations nearly impossible. Although this is a failure of the recommender system, non-electronic recommenders also have great problems in these cases, so black sheep is an acceptable failure [47].

Claypool et al. provided a hybrid approach combining content-based and CF recommendations by basing a prediction on a weighted average of the content-based prediction and the CF prediction. In that approach, the weights of the content-based and CF predictions are determined on a per-user basis, allowing the system to determine the optimal mix of content-based and CF recommendation for each user, helping to solve the gray sheep problem [46].

2.5. Shilling Attacks. In cases where anyone can provide recommendations, people may give a large number of positive recommendations for their own materials and negative recommendations for their competitors. It is desirable for CF systems to introduce precautions that discourage this kind of phenomenon [2].

Recently, shilling attack models for collaborative filtering systems have been identified and their effectiveness has been studied. Lam and Riedl found that the item-based CF algorithm was much less affected by the attacks than the user-based CF algorithm, and they suggest that new ways must be used to evaluate and detect shilling attacks on recommender systems [48]. Attack models for shilling item-based CF systems have been examined by Mobasher et al., and alternative CF systems such as hybrid CF systems and model-based CF systems were believed to have the ability to provide partial solutions to the bias injection problem [49]. O'Mahony et al. contributed to solving the shilling attacks problem by analyzing robustness, a recommender system's resilience to potentially malicious perturbations in the customer/product rating matrix [50].

Bell and Koren [51] used a comprehensive approach to the shilling attacks problem by removing global effects in the data normalization stage of neighbor-based CF and working with residuals of global effects to select neighbors. They achieved improved CF performance on the Netflix [20] data.

2.6. Other Challenges. As people may not want their habits or views widely known, CF systems also raise concerns about personal privacy. Miller et al. [4] and Canny [52] find ways to protect users' privacy for CF recommendation tasks.

Increased noise (or sabotage) is another challenge, as the user population becomes more diverse. Ensembles of maximum margin matrix factorizations [35] and instance selection techniques [53] are found useful for addressing the noise problems of CF tasks. As Dempster-Shafer (DS) theory [54, 55] and imputation techniques [56] have been successfully applied to accommodate imperfect and noisy data for knowledge representation and classification tasks, they are also potentially useful for dealing with the noise problem of CF tasks.

Explainability is another important aspect of recommender systems. An intuitive reasoning such as "you will like this book because you liked those books" will be appealing and beneficial to readers, regardless of the accuracy of the explanations [57].

2.7. The Netflix Prize Challenge. Launched in October 2006, the Netflix prize challenge [20] attracted thousands of researchers to compete in the million-dollar-prize race for the most improved performance for movie recommendations. The challenge features a large-scale industrial dataset (with 480,000 users and 17,770 movies) and a rigid performance metric of RMSE (see the detailed description in Section 6).

As of July 2009, the leaderboard of the Netflix prize competition is shown in Table 3, in which the leading team, "BellKor's Pragmatic Chaos" (with a 10.05% improved RMSE over the Netflix movie recommendation system, Cinematch), based their solution on a merged model of latent factor and neighborhood models [58]. Some interesting research papers on the Netflix prize challenge can be found in the 2008 KDD Netflix Workshop (http://netflixkddworkshop2008.info/).

Table 3: The Netflix Prize Leaderboard as of July 2009.

Rank   Team                                  Best RMSE score   Improvement (%)
1      BellKor's Pragmatic Chaos             0.8556            10.07
2      Grand Prize Team                      0.8571            9.91
3      Opera Solutions and Vandelay United   0.8573            9.89
4      Vandelay Industries!                  0.8579            9.83
5      Pragmatic Theory                      0.8582            9.80
6      BellKor in BigChaos                   0.8590            9.71
7      Dace                                  0.8605            9.55
8      Opera Solutions                       0.8611            9.49
9      BellKor                               0.8612            9.48
10     BigChaos                              0.8613            9.47

3. Memory-Based Collaborative Filtering Techniques

Memory-based CF algorithms use the entire user-item database, or a sample of it, to generate a prediction. Every user is part of a group of people with similar interests. By identifying the so-called neighbors of a new user (or active user), a prediction of preferences on new items for him or her can be produced.

The neighborhood-based CF algorithm, a prevalent memory-based CF algorithm, uses the following steps: calculate the similarity or weight, w_{i,j}, which reflects the distance, correlation, or weight between two users or two items, i and j; then produce a prediction for the active user by taking the weighted average of all the ratings of the user or item on a certain item or user, or using a simple weighted average [40]. When the task is to generate a top-N recommendation, we need to find the k most similar users or items (nearest neighbors) after computing the similarities, and then aggregate the neighbors to get the top-N most frequent items as the recommendation.

3.1. Similarity Computation. Similarity computation between items or users is a critical step in memory-based collaborative filtering algorithms. For item-based CF algorithms, the basic idea of the similarity computation between item i and item j is first to work on the users who have rated both of these items and then to apply a similarity computation to determine the similarity, w_{i,j}, between the two co-rated items of the users [40]. For a user-based CF algorithm, we first calculate the similarity, w_{u,v}, between the users u and v who have both rated the same items.

There are many different methods to compute the similarity or weight between users or items.

3.1.1. Correlation-Based Similarity. In this case, the similarity w_{u,v} between two users u and v, or w_{i,j} between two items i and j, is measured by computing the Pearson correlation or other correlation-based similarities.

Pearson correlation measures the extent to which two variables linearly relate to each other [5]. For the user-based algorithm, the Pearson correlation between users u and v is

    w_{u,v} = \frac{\sum_{i \in I} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}
                   {\sqrt{\sum_{i \in I} (r_{u,i} - \bar{r}_u)^2} \sqrt{\sum_{i \in I} (r_{v,i} - \bar{r}_v)^2}},        (1)

where the i ∈ I summations are over the items that both users u and v have rated and \bar{r}_u is the average rating of the co-rated items of the uth user. In the example in Table 4, we have w_{1,5} = 0.756.

Table 4: A simple example of a ratings matrix.

       I1    I2    I3    I4
U1     4     ?     5     5
U2     4     2     1
U3     3           2     4
U4     4     4
U5     2     1     3     5
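To make the similarity computation concrete, here is a minimal Python sketch of the user-based Pearson correlation of (1), restricted to co-rated items; the dictionary representation of Table 4 and the function names are illustrative choices, not from the original paper.

```python
from math import sqrt

# Table 4 as user -> {item: rating}; missing ratings are simply absent.
ratings = {
    "U1": {"I1": 4, "I3": 5, "I4": 5},
    "U2": {"I1": 4, "I2": 2, "I3": 1},
    "U3": {"I1": 3, "I3": 2, "I4": 4},
    "U4": {"I1": 4, "I2": 4},
    "U5": {"I1": 2, "I2": 1, "I3": 3, "I4": 5},
}

def pearson(u, v):
    """User-based Pearson correlation over the items co-rated by u and v, as in (1)."""
    common = set(ratings[u]) & set(ratings[v])
    if len(common) < 2:
        return 0.0  # with fewer than two co-rated items the deviations are all zero
    ru = sum(ratings[u][i] for i in common) / len(common)
    rv = sum(ratings[v][i] for i in common) / len(common)
    num = sum((ratings[u][i] - ru) * (ratings[v][i] - rv) for i in common)
    den = sqrt(sum((ratings[u][i] - ru) ** 2 for i in common)) * \
          sqrt(sum((ratings[v][i] - rv) ** 2 for i in common))
    return num / den if den else 0.0

print(round(pearson("U1", "U5"), 3))   # 0.756, matching the value quoted in the text
print(round(pearson("U1", "U2"), 3))   # -1.0
```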

For the item-based algorithm, denote the set of users u ∈ U who rated both items i and j; then the Pearson correlation will be

    w_{i,j} = \frac{\sum_{u \in U} (r_{u,i} - \bar{r}_i)(r_{u,j} - \bar{r}_j)}
                   {\sqrt{\sum_{u \in U} (r_{u,i} - \bar{r}_i)^2} \sqrt{\sum_{u \in U} (r_{u,j} - \bar{r}_j)^2}},        (2)

where r_{u,i} is the rating of user u on item i and \bar{r}_i is the average rating of the ith item by those users; see Figure 2 [40].

Figure 2: Item-based similarity (w_{i,j}) calculation based on the co-rated items i and j from users 2, l, and n.

Some variations of item-based and user-based Pearson correlations can be found in [59]. The Pearson correlation-based CF algorithm is a representative CF algorithm and is widely used in the CF research community.

Other correlation-based similarities include: constrained Pearson correlation, a variation of Pearson correlation that uses the midpoint instead of the mean rate; Spearman rank correlation, similar to Pearson correlation, except that the ratings are ranks; and Kendall's τ correlation, similar to the Spearman rank correlation, but instead of using the ranks themselves, only the relative ranks are used to calculate the correlation [3, 60].

Usually the number of users in the computation of similarity is regarded as the neighborhood size of the active user, and similarity-based CF is deemed neighborhood-based CF.

3.1.2. Vector Cosine-Based Similarity. The similarity between two documents can be measured by treating each document as a vector of word frequencies and computing the cosine of the angle formed by the frequency vectors [61]. This formalism can be adopted in collaborative filtering, which uses users or items instead of documents and ratings instead of word frequencies.

Formally, if R is the m × n user-item matrix, then the similarity between two items, i and j, is defined as the cosine of the n-dimensional vectors corresponding to the ith and jth columns of matrix R. The vector cosine similarity between items i and j is given by

    w_{i,j} = \cos(\vec{i}, \vec{j}) = \frac{\vec{i} \bullet \vec{j}}{\|\vec{i}\| * \|\vec{j}\|},        (3)

where "•" denotes the dot product of the two vectors. To get the desired similarity computation, for n items, an n × n similarity matrix is computed [27]. For example, if the vector A = {x_1, y_1} and the vector B = {x_2, y_2}, the vector cosine similarity between A and B is

    w_{A,B} = \cos(A, B) = \frac{A \bullet B}{\|A\| * \|B\|} = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2} \sqrt{x_2^2 + y_2^2}}.        (4)

In an actual situation, different users may use different rating scales, which the vector cosine similarity cannot take into account. To address this drawback, adjusted cosine similarity is used, subtracting the corresponding user average from each co-rated pair. The adjusted cosine similarity has the same formula as Pearson correlation (2). In fact, Pearson correlation performs cosine similarity with some sort of normalization of the user's ratings according to his own rating behavior. Hence, we may get negative values with Pearson correlation, but not with cosine similarity, supposing we have an n-point rating scale.

3.1.3. Other Similarities. Another similarity measure is conditional probability-based similarity [62, 63]. As it is not commonly used, we will not discuss it in detail in this paper.

3.2. Prediction and Recommendation Computation. Obtaining predictions or recommendations is the most important step in a collaborative filtering system. In the neighborhood-based CF algorithm, a subset of nearest neighbors of the active user are chosen based on their similarity with him or her, and a weighted aggregate of their ratings is used to generate predictions for the active user [64].

3.2.1. Weighted Sum of Others' Ratings. To make a prediction for the active user, a, on a certain item, i, we can take a weighted average of all the ratings on that item according to the following formula [5]:

    P_{a,i} = \bar{r}_a + \frac{\sum_{u \in U} (r_{u,i} - \bar{r}_u) \cdot w_{a,u}}{\sum_{u \in U} |w_{a,u}|},        (5)

where \bar{r}_a and \bar{r}_u are the average ratings of user a and user u on all other rated items, and w_{a,u} is the weight between user a and user u. The summations are over all the users u ∈ U who have rated the item i. For the simple example in Table 4, using the user-based CF algorithm, to predict the rating for U1 on I2, we have

    P_{1,2} = \bar{r}_1 + \frac{\sum_{u} (r_{u,2} - \bar{r}_u) \cdot w_{1,u}}{\sum_{u} |w_{1,u}|}
            = \bar{r}_1 + \frac{(r_{2,2} - \bar{r}_2) w_{1,2} + (r_{4,2} - \bar{r}_4) w_{1,4} + (r_{5,2} - \bar{r}_5) w_{1,5}}{|w_{1,2}| + |w_{1,4}| + |w_{1,5}|}
            = 4.67 + \frac{(2 - 2.5)(-1) + (4 - 4)(0) + (1 - 3.33)(0.756)}{1 + 0 + 0.756}
            = 3.95.        (6)
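The following Python sketch reproduces the weighted-sum prediction of (5)-(6) for U1 on I2 from Table 4; the hard-coded similarity weights (w_{1,2} = -1, w_{1,4} = 0, w_{1,5} ≈ 0.756) are taken from the worked example above, and the data layout and function name are illustrative assumptions.

```python
# Table 4 again as user -> {item: rating}; missing ratings are absent.
ratings = {
    "U1": {"I1": 4, "I3": 5, "I4": 5},
    "U2": {"I1": 4, "I2": 2, "I3": 1},
    "U3": {"I1": 3, "I3": 2, "I4": 4},
    "U4": {"I1": 4, "I2": 4},
    "U5": {"I1": 2, "I2": 1, "I3": 3, "I4": 5},
}

# User-user Pearson weights for U1, as quoted in the text's worked example.
weights = {"U2": -1.0, "U4": 0.0, "U5": 0.756}

def predict(active, item):
    """Weighted sum of others' ratings, formula (5)."""
    r_a = sum(ratings[active].values()) / len(ratings[active])   # average rating of the active user
    num, den = 0.0, 0.0
    for u, w in weights.items():
        if item not in ratings[u]:
            continue                                             # only users who rated the target item
        others = [r for i, r in ratings[u].items() if i != item]
        r_u = sum(others) / len(others)                          # user u's average on all other rated items
        num += (ratings[u][item] - r_u) * w
        den += abs(w)
    return r_a + num / den if den else r_a

print(round(predict("U1", "I2"), 2))   # about 3.95, matching (6)
```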

Note that the above prediction is based on the neighborhood of the active user.

3.2.2. Simple Weighted Average. For item-based prediction, we can use the simple weighted average to predict the rating, P_{u,i}, for user u on item i [40]:

    P_{u,i} = \frac{\sum_{n \in N} r_{u,n} w_{i,n}}{\sum_{n \in N} |w_{i,n}|},        (7)

where the summations are over all other rated items n ∈ N for user u, w_{i,n} is the weight between items i and n, and r_{u,n} is the rating of user u on item n.

3.3. Top-N Recommendations. Top-N recommendation is to recommend a set of N top-ranked items that will be of interest to a certain user. For example, if you are a returning customer, when you log into your http://amazon.com/ account, you may be recommended a list of books (or other products) that may be of interest to you (see Figure 1). Top-N recommendation techniques analyze the user-item matrix to discover relations between different users or items and use them to compute the recommendations. Some models, such as association rule mining based models, can be used to make top-N recommendations, which we will introduce in Section 4.

3.3.1. User-Based Top-N Recommendation Algorithms. User-based top-N recommendation algorithms first identify the k most similar users (nearest neighbors) to the active user using the Pearson correlation or the vector-space model [9, 27], in which each user is treated as a vector in the m-dimensional item space and the similarities between the active user and other users are computed between the vectors. After the k most similar users have been discovered, their corresponding rows in the user-item matrix R are aggregated to identify a set of items, C, purchased by the group together with their frequency. With the set C, user-based CF techniques then recommend the top-N most frequent items in C that the active user has not purchased. User-based top-N recommendation algorithms have limitations related to scalability and real-time performance [62].

3.3.2. Item-Based Top-N Recommendation Algorithms. Item-based top-N recommendation algorithms have been developed to address the scalability problem of user-based top-N recommendation algorithms. The algorithms first compute the k most similar items for each item according to the similarities; they then identify the candidate set of recommended items, C, by taking the union of the k most similar items and removing each of the items in the set, U, that the user has already purchased; they then calculate the similarities between each item of the set C and the set U. The resulting set of the items in C, sorted in decreasing order of the similarity, will be the recommended item-based top-N list [62]. One problem with this method is that, when the joint distribution of a set of items is different from the distributions of the individual items in the set, the above scheme can potentially produce suboptimal recommendations. To solve this problem, Deshpande and Karypis [63] developed higher-order item-based top-N recommendation algorithms that use all combinations of items up to a particular size when determining the itemsets to be recommended to a user.

3.4. Extensions to Memory-Based Algorithms

3.4.1. Default Voting. In many collaborative filters, pairwise similarity is computed only from the ratings in the intersection of the items both users have rated [5, 27]. This will not be reliable when there are too few votes to generate similarity values. Also, focusing on intersection-set similarity neglects the global rating behavior reflected in a user's entire rating history.

Empirically, assuming some default voting values for the missing ratings can improve the CF prediction performance. Herlocker et al. [64] account for small intersection sets by reducing the weight of users that have fewer than 50 items in common. Chee et al. [13] use the average of the clique (or small group) as a default vote to extend each user's rating history. Breese et al. [9] use a neutral or somewhat negative preference for the unobserved ratings and then compute the similarity between users on the resulting ratings data.

3.4.2. Inverse User Frequency. The idea of inverse user frequency [61] applied in collaborative filtering is that universally liked items are not as useful in capturing similarity as less common items. The inverse frequency can be defined as f_j = log(n/n_j), where n_j is the number of users who have rated item j and n is the total number of users. If everyone has rated item j, then f_j is zero. To apply inverse user frequency while using the vector similarity-based CF algorithm, we need to use a transformed rating, which is simply the original rating multiplied by the f_j factor [9].

3.4.3. Case Amplification. Case amplification refers to a transform applied to the weights used in the basic collaborative filtering prediction. The transform emphasizes high weights and punishes low weights [9]:

    w'_{i,j} = w_{i,j} \cdot |w_{i,j}|^{\rho - 1},        (8)

where ρ is the case amplification power, ρ ≥ 1, and a typical choice of ρ is 2.5 [65]. Case amplification reduces noise in the data. It tends to favor high weights, as small values raised to a power become negligible. If the weight is high, for example, w_{i,j} = 0.9, then it remains high (0.9^{2.5} ≈ 0.8); if it is low, for example, w_{i,j} = 0.1, then it will be negligible (0.1^{2.5} ≈ 0.003).

3.4.4. Imputation-Boosted CF Algorithms. When the rating data for CF tasks are extremely sparse, it will be problematic to produce accurate predictions using the Pearson correlation-based CF. Su et al. [37, 66] proposed a framework of imputation-boosted collaborative filtering (IBCF), which first uses an imputation technique to fill in the missing data, before using a traditional Pearson correlation-based CF algorithm on this completed data to predict a specific user rating for a specified item.
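As a rough sketch of the IBCF idea under simple assumptions (mean imputation as the imputer, followed by an ordinary Pearson correlation on the completed matrix), the code below fills missing entries before computing similarities; it illustrates the two-stage framework only and is not the authors' implementation, and the helper names and toy data are made up.

```python
import numpy as np

def mean_impute(R):
    """Fill missing entries (NaN) of a user-item matrix with each item's observed mean."""
    filled = R.copy()
    for j in range(R.shape[1]):
        col = R[:, j]
        observed = col[~np.isnan(col)]
        fill = observed.mean() if observed.size else 0.0
        filled[np.isnan(col), j] = fill
    return filled

def pearson_rows(M):
    """Pairwise Pearson correlation between the rows (users) of a complete matrix."""
    return np.corrcoef(M)

# Sparse toy matrix: NaN marks a missing rating (values are illustrative).
R = np.array([
    [4, np.nan, 5, 5],
    [4, 2, 1, np.nan],
    [3, np.nan, 2, 4],
    [4, 4, np.nan, np.nan],
    [2, 1, 3, 5],
], dtype=float)

completed = mean_impute(R)          # step 1: imputation of the missing data
W = pearson_rows(completed)         # step 2: ordinary Pearson CF on the completed data
print(np.round(W[0], 2))            # the first user's similarity to every user
```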

After comprehensively investigating the use of various standard imputation techniques (including mean imputation, linear regression imputation, predictive mean matching [67] imputation, and Bayesian multiple imputation [68]) and machine learning classifiers [66] (including naïve Bayes, SVM, neural network, decision tree, and lazy Bayesian rules) as imputers for IBCF, they found that the proposed IBCF algorithms can perform very effectively in general, and that IBCF using Bayesian multiple imputation, IBCF-NBM (a mixture IBCF which uses IBCF with naïve Bayes for denser datasets and IBCF with mean imputation for sparser ones) [37], and IBCF using naïve Bayes perform especially well, outperforming the content-boosted CF algorithm (a representative hybrid CF), and do so without using external content information.

3.4.5. Weighted Majority Prediction. The weighted majority prediction algorithm proposed by Goldman and Warmuth [69] makes its prediction using the rows with observed data in the same column, weighted by the believed similarity between the rows, with binary rating values. The weights (or similarities, initialized to 1) are increased by multiplying them by (2 − γ) when the compared values are the same, and decreased by multiplying them by γ when they are different, with γ ∈ (0, 1). This update is equivalent to w_{ii} = (2 − γ)^{C_{ii}} γ^{W_{ii}}, where C_{ii} is the number of rows that have the same value as in row i and W_{ii} is the number of rows having different values. The prediction for a rating on a certain item by the active user is determined by the rating on the item by a certain user who has the highest accumulated weight value with the active user. This algorithm can be generalized to multiclass data and be extended from user-to-user similarity to item-to-item similarity and to user-item-combined similarity [70]. One shortcoming of this algorithm is its scalability: when the user number or item number grows over a certain large number n, it becomes impractical for the user-to-user or item-to-item similarity computations to update the O(n^2) similarity matrices.

4. Model-Based Collaborative Filtering Techniques

The design and development of models (such as machine learning and data mining algorithms) can allow the system to learn to recognize complex patterns based on the training data, and then make intelligent predictions for the collaborative filtering tasks on test data or real-world data, based on the learned models. Model-based CF algorithms, such as Bayesian models, clustering models, and dependency networks, have been investigated to solve the shortcomings of memory-based CF algorithms [9, 71]. Usually, classification algorithms can be used as CF models if the user ratings are categorical, and regression models and SVD methods can be used for numerical ratings.

4.1. Bayesian Belief Net CF Algorithms. A Bayesian belief net (BN) is a directed, acyclic graph (DAG) with a triplet ⟨N, A, Θ⟩, where each node n ∈ N represents a random variable, each directed arc a ∈ A between nodes is a probabilistic association between variables, and Θ is a conditional probability table quantifying how much a node depends on its parents [72]. Bayesian belief nets (BNs) are often used for classification tasks.

4.1.1. Simple Bayesian CF Algorithm. The simple Bayesian CF algorithm uses a naïve Bayes (NB) strategy to make predictions for CF tasks. Assuming the features are independent given the class, the probability of a certain class given all of the features can be computed, and then the class with the highest probability will be classified as the predicted class [41]. For incomplete data, the probability calculation and classification production are computed over observed data (the subscript o in the following equation indicates observed values):

    class = \arg\max_{class_j \in classSet} p(class_j) \prod_{o} P(X_o = x_o \mid class_j).        (9)

The Laplace estimator is used to smooth the probability calculation and avoid a conditional probability of 0:

    P(X_i = x_i \mid Y = y) = \frac{\#(X_i = x_i, Y = y) + 1}{\#(Y = y) + |X_i|},        (10)

where |X_i| is the size of the class set {X_i}. For a binary-class example, P(X_i = 0 | Y = 1) = 0/2 becomes (0 + 1)/(2 + 2) = 1/4 and P(X_i = 1 | Y = 1) = 2/2 becomes (2 + 1)/(2 + 2) = 3/4 with the Laplace estimator.

Using the same example in Table 4, with the class set {1, 2, ..., 5}, to produce the rating for U1 on I2 using the simple Bayesian CF algorithm and the Laplace estimator, we have

    class = \arg\max_{c_j \in \{1,2,3,4,5\}} p(c_j \mid U_2 = 2, U_4 = 4, U_5 = 1)
          = \arg\max_{c_j \in \{1,2,3,4,5\}} p(c_j) P(U_2 = 2 \mid c_j) P(U_4 = 4 \mid c_j) P(U_5 = 1 \mid c_j)        (11)
          = \arg\max \{0, 0, 0, 0.0031, 0.0019\} = 4,

in which p(5)P(U_2 = 2 | 5)P(U_4 = 4 | 5)P(U_5 = 1 | 5) = (2/3) * (1/7) * (1/7) * (1/7) = 0.0019.

In Miyahara and Pazzani [10], multiclass data are first converted to binary-class data and then converted to a Boolean feature vector rating matrix. These conversions make the use of the NB algorithm for CF tasks easier, but bring the problems of scalability and the loss of multiclass information for multiclass data. In Miyahara and Pazzani [41], they applied the simple Bayesian CF model only to binary data.

Because most real-world CF data are multiclass, Su and Khoshgoftaar [11] apply the simple Bayesian CF algorithm to multiclass data for CF tasks, and found that simple Bayesian CF has worse predictive accuracy but better scalability than the Pearson correlation-based CF, as it makes predictions based on observed ratings and the prediction-making process is less time-consuming.
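Below is a small Python sketch of the simple Bayesian CF idea with Laplace smoothing, reproducing the Table 4 worked example above (it predicts class 4, with scores of roughly 0.0031 and 0.0019 for classes 4 and 5); the data layout, the treatment of the active user's ratings as class labels with the other users' ratings as features, and the function names are illustrative assumptions rather than the authors' implementation.

```python
# Table 4: user -> {item: rating}; the active user's own ratings act as class labels.
ratings = {
    "U1": {"I1": 4, "I3": 5, "I4": 5},
    "U2": {"I1": 4, "I2": 2, "I3": 1},
    "U3": {"I1": 3, "I3": 2, "I4": 4},
    "U4": {"I1": 4, "I2": 4},
    "U5": {"I1": 2, "I2": 1, "I3": 3, "I4": 5},
}
CLASSES = [1, 2, 3, 4, 5]

def naive_bayes_predict(active, target_item):
    train_items = [i for i in ratings[active] if i != target_item]   # items the active user rated
    # Features: the other users who rated the target item, with their rating as the observed value.
    features = {u: r[target_item] for u, r in ratings.items()
                if u != active and target_item in r}
    scores = {}
    for c in CLASSES:
        class_items = [i for i in train_items if ratings[active][i] == c]
        prior = len(class_items) / len(train_items)                  # p(class), unsmoothed as in (11)
        likelihood = 1.0
        for u, value in features.items():
            count = sum(1 for i in class_items if ratings[u].get(i) == value)
            # Laplace estimator (10): (# matches + 1) / (# items of this class + |rating scale|)
            likelihood *= (count + 1) / (len(class_items) + len(CLASSES))
        scores[c] = prior * likelihood
    return max(scores, key=scores.get), scores

pred, scores = naive_bayes_predict("U1", "I2")
print(pred)                                        # 4
print(round(scores[4], 4), round(scores[5], 4))    # about 0.0031 and 0.0019
```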

The simple Bayesian CF algorithm can be regarded as a memory-based CF technique because of its in-memory calculation for CF predictions. We put it in this section for the reason that most other Bayesian CF algorithms are model-based CFs.

4.1.2. NB-ELR and TAN-ELR CF Algorithms. Because of the limitations of the simple Bayesian algorithm for CF tasks, advanced BN CF algorithms, with their ability to deal with incomplete data, can be used instead [11]. Extended logistic regression (ELR) is a gradient-ascent algorithm [31, 73], which is a discriminative parameter-learning algorithm that maximizes log conditional likelihood.

TAN-ELR and NB-ELR (tree augmented naïve Bayes [74] and naïve Bayes optimized by ELR, resp.) have been proven to have high classification accuracy for both complete and incomplete data [31, 73].

Applied to CF tasks, working on real-world multiclass CF datasets and using MAE as the evaluation criterion, the empirical results show that the TAN-ELR CF and NB-ELR CF algorithms perform significantly better than the simple Bayesian CF algorithm, and consistently better than the Pearson correlation memory-based CF algorithm [11]. However, TAN-ELR and NB-ELR need a longer time to train the models. A solution is to run the time-consuming training stage offline, so that the online prediction-producing stage takes a much shorter time.

4.1.3. Other Bayesian CF Algorithms. Bayesian belief nets with decision trees at each node have a decision tree at each node of the BN, where a node corresponds to each item in the domain and the states of each node correspond to the possible ratings for each item [9]. Their results show that this model has similar prediction performance to Pearson correlation-based CF methods, and better performance than Bayesian-clustering and vector cosine memory-based CF algorithms.

The baseline Bayesian model uses a Bayesian belief net with no arcs (baseline model) for collaborative filtering and recommends items based on their overall popularity [75]. However, the performance is suboptimal.

4.2. Clustering CF Algorithms. A cluster is a collection of data objects that are similar to one another within the same cluster and are dissimilar to the objects in other clusters [76]. The measurement of the similarity between objects is determined using metrics such as Minkowski distance and Pearson correlation.

For two data objects, X = (x_1, x_2, ..., x_n) and Y = (y_1, y_2, ..., y_n), the popular Minkowski distance is defined as

    d(X, Y) = ( Σ_{i=1}^{n} |x_i − y_i|^q )^{1/q},        (12)

where n is the dimension number of the objects, x_i and y_i are the values of the ith dimension of X and Y respectively, and q is a positive integer. When q = 1, d is the Manhattan distance; when q = 2, d is the Euclidean distance [76].

Clustering methods can be classified into three categories: partitioning methods, density-based methods, and hierarchical methods [76, 77]. A commonly-used partitioning method is k-means, proposed by MacQueen [78], which has two main advantages: relative efficiency and easy implementation. Density-based clustering methods typically search for dense clusters of objects separated by sparse regions that represent noise. DBSCAN [79] and OPTICS [80] are well-known density-based clustering methods. Hierarchical clustering methods, such as BIRCH [81], create a hierarchical decomposition of the set of data objects using some criterion.

In most situations, clustering is an intermediate step and the resulting clusters are used for further analysis or processing to conduct classification or other tasks. Clustering CF models can be applied in different ways. Sarwar et al. [43] and O'Connor and Herlocker [42] use clustering techniques to partition the data into clusters and use a memory-based CF algorithm, such as a Pearson correlation-based algorithm, to make predictions for CF tasks within each cluster.

Using the k-means method with k = 2, the RecTree method, proposed by Chee et al. [13], recursively splits the originally large rating data into two subclusters as it constructs the RecTree from the root to its leaves. The resulting RecTree resembles an unbalanced binary tree, of which leaf nodes hold a similarity matrix and internal nodes maintain rating centroids of their subtrees. The prediction is made within the leaf node that the active user belongs to. RecTree scales by O(n log2(n)) for offline recommendation and O(b) for online recommendation, where n is the dataset size and b is the partition size (a constant), and it has improved accuracy over the Pearson correlation-based CF when an appropriate size of advisors (cluster of users) is selected.

Ungar and Foster [12] cluster users and items separately using variations of k-means and Gibbs sampling [82], by clustering users based on the items they rated and clustering items based on the users that rated them. Users can be reclustered based on the number of items they rated, and items can be similarly reclustered. Each user is assigned to a class with a degree of membership proportional to the similarity between the user and the mean of the class. Their CF performance on synthetic data is good, but not good on real data.

A flexible mixture model (FMM) extends existing clustering algorithms for CF by clustering both users and items at the same time, allowing each user and item to be in multiple clusters and modeling the clusters of users and items separately [15]. Experimental results show that the FMM algorithm has better accuracy than the Pearson correlation-based CF algorithm and the aspect model [83].

Clustering models have better scalability than typical collaborative filtering methods because they make predictions within much smaller clusters rather than the entire customer base [13, 27, 44, 84]. The complex and expensive clustering computation is run offline. However, the recommendation quality is generally low. It is possible to improve quality by using numerous fine-grained segments, but then online user-segment classification becomes almost as expensive as finding similar customers using memory-based collaborative filtering [6]. As optimal clustering over large data sets is impractical, most applications use various forms of greedy cluster generation techniques. For very large datasets, especially those with high dimensionality, sampling or dimensionality reduction is also necessary.
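As a concrete illustration of the cluster-then-predict idea above (in the spirit of Sarwar et al. [43] and O'Connor and Herlocker [42], though not their code), the sketch below runs a plain k-means over mean-filled user vectors and then makes a Pearson-weighted prediction using only the neighbours inside the active user's cluster. The toy matrix, the value of k, and the helper names are illustrative assumptions.

import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain k-means over the rows of X (users); returns a cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def pearson(u, v, mask):
    """Pearson correlation over the co-rated items indicated by `mask`."""
    if mask.sum() < 2:
        return 0.0
    a, b = u[mask], v[mask]
    if a.std() == 0 or b.std() == 0:
        return 0.0
    return float(np.corrcoef(a, b)[0, 1])

def cluster_cf_predict(R, user, item, k=2):
    """R is a users x items matrix with np.nan for missing ratings."""
    observed = ~np.isnan(R)
    filled = np.where(observed, R, np.nanmean(R))   # crude fill, for clustering only
    labels = kmeans(filled, k)
    members = [v for v in range(len(R))
               if labels[v] == labels[user] and v != user and observed[v, item]]
    mu = np.nanmean(R[user])
    num = den = 0.0
    for v in members:
        co = observed[user] & observed[v]
        w = pearson(R[user], R[v], co)
        num += w * (R[v, item] - np.nanmean(R[v]))
        den += abs(w)
    return mu if den == 0 else mu + num / den

R = np.array([[4, np.nan, 5, 2, np.nan],
              [4, 2, 4, 2, 1],
              [np.nan, 1, np.nan, 5, 4],
              [5, 4, 5, 1, np.nan],
              [2, 1, 2, 4, 5]])
print(round(cluster_cf_predict(R, user=0, item=1), 2))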
4.3. Regression-Based CF Algorithms. For memory-based CF algorithms, in some cases two rating vectors may be distant in terms of Euclidean distance yet have very high similarity under vector cosine or Pearson correlation measures; there memory-based CF algorithms do not fit well and need better solutions. Also, numerical ratings are common in real-life recommender systems. Regression methods, which are good at making predictions for numerical values, are helpful in addressing these problems.

A regression method uses an approximation of the ratings to make predictions based on a regression model. Let X = (X_1, X_2, ..., X_n) be a random variable representing a user's preferences on different items. The linear regression model can be expressed as

    Y = ΛX + N,        (13)

where Λ is an n × k matrix, N = (N_1, ..., N_n) is a random variable representing noise in user choices, Y is an n × m matrix with Y_{ij} being the rating of user i on item j, and X is a k × m matrix with each column as an estimate of the value of the random variable X (the user's ratings in the k-dimensional rating space) for one user. Typically, the matrix Y is very sparse.

To remedy this, Canny [52] proposed a sparse factor analysis, which replaces missing elements with default voting values (the average of some nonmissing elements, either the average by columns, or by rows, or by all), and uses the regression model as the initialization for Expectation Maximization (EM) [85] iterations. According to Canny [52], the sparse factor analysis has better scalability than Pearson correlation-based CF and Personality Diagnosis (PD), a representative hybrid CF algorithm [86], and better accuracy than singular value decomposition (SVD) [23]. Sparse factor analysis also protects user privacy, as it supports computation on encrypted user data [52].

Vucetic and Obradovic [87] proposed a regression-based approach to CF tasks on numerical ratings data that searches for similarities between items, builds a collection of simple linear models, and combines them efficiently to provide rating predictions for an active user. They used ordinary least squares to estimate the parameters of the linear regression function. Their experimental results show the approach has good performance in addressing the sparsity, prediction latency, and numerical prediction problems of CF tasks. Lemire and Maclachlan [88] proposed slope one algorithms to make faster CF predictions than memory-based CF algorithms.
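The slope one idea just mentioned precomputes average rating differences between item pairs and combines them with the active user's own ratings. Below is a minimal sketch of the weighted Slope One scheme in that spirit, not Lemire and Maclachlan's implementation; the rating dictionary is illustrative.

from collections import defaultdict

def slope_one_predict(ratings, user, target_item):
    """Weighted Slope One prediction for `user` on `target_item`.

    `ratings` is {user: {item: rating}}.  For every item i the user has rated,
    the average deviation dev(target_item, i) over users who rated both items
    is added to the user's rating on i, weighted by the number of co-raters.
    """
    diff_sum = defaultdict(float)   # sum of (r_target - r_i) over co-raters
    count = defaultdict(int)        # number of co-raters per item i
    for other in ratings.values():
        if target_item not in other:
            continue
        for i, r in other.items():
            if i != target_item:
                diff_sum[i] += other[target_item] - r
                count[i] += 1

    num = den = 0.0
    for i, r in ratings[user].items():
        if i != target_item and count[i] > 0:
            dev = diff_sum[i] / count[i]
            num += (dev + r) * count[i]
            den += count[i]
    return None if den == 0 else num / den

toy = {
    "u1": {"i1": 5, "i2": 3, "i3": 2},
    "u2": {"i1": 3, "i2": 4},
    "u3": {"i2": 2, "i3": 5},
}
print(slope_one_predict(toy, "u2", "i3"))   # predicted rating of u2 on i3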
                                                                     computational and representational complexity of POMDPs
representative hybrid CF algorithm [86], and better accuracy
                                                                     is high, appropriate approaches to tackling these problems
than singular value decomposition (SVD) [23]. Sparse factor
                                                                     must be developed, which are generally classified into three
analysis also protects user privacy, as it supports computation
                                                                     broad strategies: value function approximation [92], policy
on encrypted user data [52].
                                                                     based optimization [84, 93], and stochastic sampling [94].
    Vucetic and Obradovic [87] proposed a regression-
                                                                     The application of these strategies to CF tasks may be an
based approach to CF tasks on numerical ratings data that
                                                                     interesting direction of future research.
searches for similarities between items, builds a collection
of simple linear models, and combines them efficiently to
provide rating predictions for an active user. They used             4.5. Latent Semantic CF Models. A Latent semantic CF
ordinary least squares to estimate the parameters of the linear      technique relies on a statistical modeling technique that
regression function. Their experimental results show the             introduces latent class variables in a mixture model setting
approach has good performance in addressing the sparsity,            to discover user communities and prototypical interest
prediction latency and numerical prediction problems of              profiles. Conceptionally, it decomposes user preferences
CF tasks. Lemire and Maclachlan [88] proposed slope one              using overlapping user communities. The main advantages
algorithms to make faster CF prediction than memory-based            of this technique over standard memory-based methods are
CF algorithms.                                                       its higher accuracy and scalability [7, 95].
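To make the MDP machinery above concrete, the sketch below runs value iteration (a close cousin of the policy-iteration scheme described in the text) on a tiny hand-made MDP. The states, actions, rewards, transition probabilities, and discount factor are purely hypothetical stand-ins for the item-history states and recommendation actions of an MDP-based recommender.

def value_iteration(states, actions, R, Pr, gamma=0.9, eps=1e-6):
    """Generic value iteration.

    R[(s, a)] is the immediate reward and Pr[(s, a)] is a dict {s2: prob}
    giving the transition distribution for taking action a in state s.
    Returns the optimal value function V and a greedy policy.
    """
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            q = [R[(s, a)] + gamma * sum(p * V[s2] for s2, p in Pr[(s, a)].items())
                 for a in actions]
            best = max(q)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < eps:
            break
    policy = {s: max(actions,
                     key=lambda a: R[(s, a)] + gamma *
                     sum(p * V[s2] for s2, p in Pr[(s, a)].items()))
              for s in states}
    return V, policy

# Hypothetical 2-state, 2-action toy problem: states stand for "last item bought",
# actions for "recommend item A/B", rewards for expected profit of the recommendation.
states, actions = ["sA", "sB"], ["recA", "recB"]
R = {("sA", "recA"): 1.0, ("sA", "recB"): 2.0,
     ("sB", "recA"): 0.5, ("sB", "recB"): 1.0}
Pr = {("sA", "recA"): {"sA": 0.8, "sB": 0.2}, ("sA", "recB"): {"sA": 0.3, "sB": 0.7},
      ("sB", "recA"): {"sA": 0.6, "sB": 0.4}, ("sB", "recB"): {"sA": 0.1, "sB": 0.9}}
V, policy = value_iteration(states, actions, R, Pr)
print(policy)   # the recommendation policy an MDP-based recommender would compute offline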
4.5. Latent Semantic CF Models. A latent semantic CF technique relies on a statistical modeling technique that introduces latent class variables in a mixture model setting to discover user communities and prototypical interest profiles. Conceptually, it decomposes user preferences using overlapping user communities. The main advantages of this technique over standard memory-based methods are its higher accuracy and scalability [7, 95].

The aspect model, proposed by Hofmann and Puzicha [83], is a probabilistic latent-space model which models individual ratings as a convex combination of rating factors. The latent class variable is associated with each observed pair of {user, item}, with the assumption that users and items are independent from each other given the latent class variable. The performance of the aspect model is much better than that of the clustering model on the EachMovie dataset [96].

A multinomial model is a simple probabilistic model for categorical data [9, 97] that assumes there is only one type of user. A multinomial mixture model assumes that there are multiple types of users underlying all profiles, and that the rating variables are independent of each other and of the user's identity given the user's type [98]. A user rating profile (URP) model [97] combines the intuitive appeal of the multinomial mixture model and the aspect model [83] with the high-level generative semantics of Latent Dirichlet Allocation (LDA, a generative probabilistic model in which each item is modeled as a finite mixture over an underlying set of users) [99]. URP performs better than the aspect model and multinomial mixture models for CF tasks.

4.6. Other Model-Based CF Techniques. For applications in which ordering is more desirable than classifying, Cohen et al. [100] investigated a two-stage order learning CF approach to learning to order. In that approach, one first learns a preference function by conventional means, which returns a confidence value reflecting how likely it is that one instance is preferred to another, and then orders a new set of instances by finding the total ordering that best approximates the preference function. As the problem of finding the total ordering is NP-complete, a greedy-order algorithm is used to obtain an approximately optimal ordering function. Working on EachMovie [96], this order learning CF approach performs better than a nearest neighbor CF algorithm and a linear regression algorithm.

Association rule based CF algorithms are more often used for top-N recommendation tasks than for prediction. Sarwar et al. [27] describe their approach of using a traditional association rule mining algorithm to find rules for developing top-N recommender systems. They find the top-N items by simply choosing all the rules that meet the thresholds for support and confidence values, sorting items according to the confidence of the rules so that items predicted by rules with a higher confidence value are ranked higher, and finally selecting the first N highest ranked items as the recommended set [27]. Fu et al. [101] develop a system to recommend web pages by using an Apriori algorithm to mine association rules over users' navigation histories. Leung et al. proposed a collaborative filtering framework using fuzzy association rules and multilevel similarity [102].
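As an illustration of the association-rule approach just described (not Sarwar et al.'s implementation), the sketch below mines simple single-item rules {i} -> {j} from user "baskets" of liked items using support and confidence thresholds, and ranks unseen items for a user by the confidence of the rules his or her items fire. The baskets and thresholds are illustrative assumptions.

from collections import defaultdict
from itertools import permutations

def mine_rules(baskets, min_support=0.3, min_confidence=0.5):
    """Mine simple {i} -> {j} rules from a list of item sets."""
    n = len(baskets)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for basket in baskets:
        for i in basket:
            item_count[i] += 1
        for i, j in permutations(basket, 2):
            pair_count[(i, j)] += 1
    rules = {}
    for (i, j), c in pair_count.items():
        support, confidence = c / n, c / item_count[i]
        if support >= min_support and confidence >= min_confidence:
            rules[(i, j)] = confidence
    return rules

def top_n(rules, user_items, n=3):
    """Rank unseen items by the best confidence of any rule fired by the user's items."""
    scores = defaultdict(float)
    for (i, j), conf in rules.items():
        if i in user_items and j not in user_items:
            scores[j] = max(scores[j], conf)
    return sorted(scores, key=scores.get, reverse=True)[:n]

baskets = [{"a", "b", "c"}, {"a", "b"}, {"b", "c", "d"}, {"a", "c", "d"}, {"b", "d"}]
rules = mine_rules(baskets)
print(top_n(rules, user_items={"a"}))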
Other model-based CF techniques include a maximum entropy approach, which clusters the data first, and then within a given cluster uses maximum entropy as an objective function to form a conditional maximal entropy model to make predictions [17]. A dependency network is a graphical model for probabilistic relationships, whose graph is potentially cyclic; the probability component of a dependency network is a set of conditional distributions, one for each node given its parents. Although less accurate than Bayesian belief nets, dependency networks are faster in generating predictions and require less time and memory to learn [75]. Decision tree CF models treat collaborative filtering as a classification task and use a decision tree as the classifier [103]. Horting is a graph-based technique in which nodes are users and edges between nodes are degrees of similarity between users [104]. Multiple multiplicative factor models (MMFs) are a class of causal, discrete latent variable models combining factor distributions multiplicatively, and are able to readily accommodate missing data [105]. Probabilistic principal components analysis (pPCA) [52, 106] determines the principal axes of a set of observed data vectors through maximum-likelihood estimation of parameters in a latent variable model closely related to factor analysis. Matrix factorization based CF algorithms have been proven to be effective in addressing the scalability and sparsity challenges of CF tasks [33, 34, 107]. Wang et al. showed how the development of collaborative filtering can gain benefits from information retrieval theories and models, and proposed probabilistic relevance CF models [108, 109].
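Since the paragraph above ends with matrix factorization based CF, here is a minimal latent-factor sketch trained by stochastic gradient descent on the observed ratings only. It illustrates the general idea rather than any specific method from [33, 34, 107]; the factor dimension, learning rate, regularization, and toy matrix are arbitrary choices.

import numpy as np

def factorize(R, k=2, lr=0.01, reg=0.05, epochs=500, seed=0):
    """Factor a sparse rating matrix R (np.nan = missing) into P (users x k) and Q (items x k)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = 0.1 * rng.standard_normal((n_users, k))
    Q = 0.1 * rng.standard_normal((n_items, k))
    observed = [(u, i) for u in range(n_users) for i in range(n_items)
                if not np.isnan(R[u, i])]
    for _ in range(epochs):
        for u, i in observed:
            pu = P[u].copy()
            err = R[u, i] - pu @ Q[i]          # prediction error on one observed rating
            P[u] += lr * (err * Q[i] - reg * pu)
            Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q

R = np.array([[5, 3, np.nan, 1],
              [4, np.nan, np.nan, 1],
              [1, 1, np.nan, 5],
              [1, np.nan, np.nan, 4],
              [np.nan, 1, 5, 4]], dtype=float)
P, Q = factorize(R)
print(np.round(P @ Q.T, 1))   # dense matrix of predicted ratings, missing entries filled in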
5. Hybrid Collaborative Filtering Techniques

Hybrid CF systems combine CF with other recommendation techniques (typically with content-based systems) to make predictions or recommendations.

Content-based recommender systems make recommendations by analyzing the content of textual information, such as documents, URLs, news messages, web logs, item descriptions, and profiles about users' tastes, preferences, and needs, and finding regularities in the content [110]. Many elements contribute to the importance of the textual content, such as observed browsing features of the words or pages (e.g., term frequency and inverse document frequency), and similarity between items a user liked in the past [111]. A content-based recommender then uses heuristic methods or classification algorithms to make recommendations [112]. Content-based techniques have the start-up problem, in which they must have enough information to build a reliable classifier. They are also limited by the features explicitly associated with the objects they recommend (sometimes these features are hard to extract), while collaborative filtering can make recommendations without any descriptive data. In addition, content-based techniques have the overspecialization problem, that is, they can only recommend items that score highly against a user's profile or his/her rating history [21, 113].

Other recommender systems include demographic-based recommender systems, which use user profile information such as gender, postcode, occupation, and so forth [114], as well as utility-based recommender systems and knowledge-based recommender systems, both of which require knowledge about how a particular object satisfies the user's needs [115, 116]. We will not discuss these systems in detail in this work.
Hoping to avoid the limitations of either recommender system and to improve recommendation performance, hybrid CF recommenders are built by adding content-based characteristics to CF models, adding CF characteristics to content-based models, combining CF with content-based or other systems, or combining different CF algorithms [21, 117].

5.1. Hybrid Recommenders Incorporating CF and Content-Based Features. The content-boosted CF algorithm uses naïve Bayes as the content classifier; it then fills in the missing values of the rating matrix with the predictions of the content predictor to form a pseudo rating matrix, in which observed ratings are kept untouched and missing ratings are replaced by the predictions of the content predictor. It then makes predictions over the resulting pseudo rating matrix using a weighted Pearson correlation-based CF algorithm, which gives a higher weight to items that more users have rated and a higher weight to the active user [16] (see the illustration in Table 5). The content-boosted CF recommender has improved prediction performance over some pure content-based recommenders and some pure memory-based CF algorithms. It also overcomes the cold start problem and tackles the sparsity problem of CF tasks. Working on reasonably-sized subsets instead of the original rating data, Greiner et al. used TAN-ELR [31] as the content predictor and directly applied the Pearson correlation-based CF, instead of a weighted one, on the pseudo rating matrix to make predictions, and achieved improved CF performance in terms of MAE [118].

Table 5: Content-boosted CF and its variations: (a) content data and originally sparse rating data; (b) pseudo rating data filled in by the content predictor; (c) predictions from the (weighted) Pearson CF on the pseudo rating data.

(a)
         Content information              Rating matrix
      Age   Sex   Career     Zip      I1   I2   I3   I4   I5
U1    32    F     writer     22904              4
U2    27    M     student    10022    2         4    3
U3    24    M     engineer   60402         1
U4    50    F     other      60804         3    3    3    3
U5    28    M     educator   85251    1

(b) Pseudo rating data
      I1   I2   I3   I4   I5
U1    2    3    4    3    2
U2    2    2    4    3    2
U3    3    1    3    4    3
U4    3    3    3    3    3
U5    1    2    4    1    2

(c) Pearson-CF prediction
      I1   I2   I3   I4   I5
U1    2    3    4    2    3
U2    3    4    2    2    3
U3    3    3    2    3    3
U4    3    3    3    3    3
U5    1    3    1    2    2
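The pseudo rating construction of Table 5 can be sketched as follows: missing entries are filled by a content predictor (stubbed here by a hypothetical content_predict callable, since the naïve Bayes content classifier itself is outside the scope of the snippet, and replaced in the example by simple item means), observed ratings are kept untouched, and a Pearson-weighted CF prediction is then made over the dense pseudo matrix. The weighting is a simplified stand-in for the hybrid correlation weighting of [16], so the numbers will not reproduce Table 5(b) and 5(c).

import numpy as np

def content_boosted_predict(R, content_predict, user, item):
    """Content-boosted CF sketch: fill missing ratings with a content predictor,
    keep observed ratings untouched, then run Pearson CF on the dense pseudo matrix.

    `R` is users x items with np.nan for missing entries; `content_predict(u, i)`
    is a caller-supplied content-based predictor (hypothetical stand-in here).
    """
    n_users, n_items = R.shape
    pseudo = R.copy()
    for u in range(n_users):
        for i in range(n_items):
            if np.isnan(pseudo[u, i]):
                pseudo[u, i] = content_predict(u, i)   # pseudo rating for a missing cell

    means = pseudo.mean(axis=1)
    num = den = 0.0
    for v in range(n_users):
        if v == user:
            continue
        w = np.corrcoef(pseudo[user], pseudo[v])[0, 1]   # unweighted Pearson (simplified)
        if np.isnan(w):
            continue
        num += w * (pseudo[v, item] - means[v])
        den += abs(w)
    return means[user] if den == 0 else means[user] + num / den

# Toy data with the same observed pattern as Table 5(a); the "content predictor"
# here just returns each item's observed mean rating.
R = np.array([[np.nan, np.nan, 4, np.nan, np.nan],
              [2, np.nan, 4, 3, np.nan],
              [np.nan, 1, np.nan, np.nan, np.nan],
              [np.nan, 3, 3, 3, 3],
              [1, np.nan, np.nan, np.nan, np.nan]], dtype=float)
item_means = np.nanmean(R, axis=0)
print(round(content_boosted_predict(R, lambda u, i: item_means[i], user=0, item=1), 2))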
Ansari et al. [8] propose a Bayesian preference model that statistically integrates several types of information useful for making recommendations, such as user preferences, user and item features, and expert evaluations. They use Markov chain Monte Carlo (MCMC) methods [119] for sampling-based inference, which involve sampling parameter estimates from the full conditional distribution of the parameters. They achieved better performance than pure collaborative filtering.

The recommender Fab, proposed by Balabanović and Shoham [117], maintains user profiles of interest in web pages using content-based techniques, and uses CF techniques to identify profiles with similar tastes. It can then recommend documents across user profiles. Sarwar et al. [120] implemented a set of knowledge-based "filterbots" as artificial users using certain criteria. A straightforward example of a filterbot is a genrebot, which bases its opinion solely on the genre of the item; for example, a "jazzbot" would give a full mark to a CD simply because it is in the jazz category, while it would give a low score to any other CD in the database. Mooney and Roy [121] use the prediction from the CF system as the input to a content-based recommender. Condliff et al. [113] propose a Bayesian mixed-effects model that integrates user ratings, user features, and item features in a single unified framework. The CF system Ripper, proposed by Basu et al. [71], uses both user ratings and content features to produce recommendations.

5.2. Hybrid Recommenders Combining CF and Other Recommender Systems. A weighted hybrid recommender combines different recommendation techniques by their weights, which are computed from the results of all of the available recommendation techniques present in the system [115]. The combination can be linear, the weights can be adjustable [46], and weighted majority voting [110, 122] or weighted average voting [118] can be used. For example, the P-Tango system [46] initially gives CF and content-based recommenders equal weight, but gradually adjusts the weighting as predictions about user ratings are confirmed or disconfirmed. The strategy of the P-Tango system is similar to boosting [123].

A switching hybrid recommender switches between recommendation techniques using some criteria, such as confidence levels for the recommendation techniques. When the CF system cannot make a recommendation with sufficient confidence, another recommender system, such as a content-based system, is attempted. Switching hybrid recommenders also introduce the complexity of parameterizing the switching criteria [115].

Other hybrid recommenders in this category include mixed hybrid recommenders [124], cascade hybrid recommenders [115], meta-level recommenders [110, 115, 117, 125], and so forth.
Many papers have empirically compared the performance of hybrid recommenders with pure CF and content-based methods and found that hybrid recommenders may make more accurate recommendations, especially for the new user and new item situations where a regular CF algorithm cannot make satisfactory recommendations. However, hybrid recommenders rely on external information that is usually not available, and they generally have increased complexity of implementation [110, 115, 126].

5.3. Hybrid Recommenders Combining CF Algorithms. The two major classes of CF approaches, memory-based and model-based, can be combined to form hybrid CF approaches. The recommendation performances of these algorithms are generally better than those of some pure memory-based and model-based CF algorithms [22, 86].

Probabilistic memory-based collaborative filtering (PMCF) combines memory-based and model-based techniques [22]. It uses a mixture model built on the basis of a set of stored user profiles and uses the posterior distribution of user ratings to make predictions. To address the new user problem, an active learning extension to the PMCF system can be used to actively query a user for additional information when insufficient information is available. To reduce the computation time, PMCF selects a small subset called the profile space from the entire database of user ratings and gives predictions from the small profile space instead of the whole database. PMCF has better accuracy than the Pearson correlation-based CF and the model-based CF using naïve Bayes.

Personality diagnosis (PD) is a representative hybrid CF approach that combines memory-based and model-based CF algorithms and retains some advantages of both [86]. In PD, the active user is assumed to be generated by choosing one of the other users uniformly at random and adding Gaussian noise to his or her ratings. Given the active user's known ratings, we can calculate the probability that he or she has the same "personality type" as each of the other users, and the probability that he or she will like the new items. PD can also be regarded as a clustering method with exactly one user per cluster. Working on EachMovie [96] and CiteSeer [127], PD makes better predictions than Pearson correlation-based and vector similarity-based CF algorithms and the two model-based algorithms, Bayesian clustering and Bayesian network, investigated by Breese et al. [9].
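The PD computation described above can be sketched under its Gaussian-noise assumption: each stored user is a candidate "personality", its (unnormalized) posterior is the product of Gaussian likelihoods over the active user's observed ratings, and the predicted rating is the value with the largest posterior-weighted likelihood. This is a simplified sketch, not the original implementation; sigma, the profiles, and the rating scale are illustrative.

import math

def pd_predict(profiles, active_ratings, target_item, sigma=1.0, values=(1, 2, 3, 4, 5)):
    """Personality diagnosis sketch.

    `profiles` is {user: {item: rating}}; `active_ratings` holds the active
    user's observed {item: rating}.
    """
    def gauss(x, y):
        # Gaussian-noise likelihood (up to a constant factor).
        return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

    # Posterior (up to normalization) that the active user "is" each stored user.
    posterior = {}
    for u, prof in profiles.items():
        p = 1.0
        for item, r in active_ratings.items():
            if item in prof:
                p *= gauss(r, prof[item])
        posterior[u] = p

    # Score each candidate rating value for the target item.
    scores = {}
    for v in values:
        scores[v] = sum(posterior[u] * gauss(v, prof[target_item])
                        for u, prof in profiles.items() if target_item in prof)
    return max(scores, key=scores.get)

profiles = {
    "u1": {"i1": 5, "i2": 4, "i3": 1},
    "u2": {"i1": 2, "i2": 1, "i3": 5},
    "u3": {"i1": 4, "i2": 5, "i3": 2},
}
active = {"i1": 5, "i2": 4}          # the active user's known ratings
print(pd_predict(profiles, active, "i3"))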
As an ensemble classifier is able to give more accurate prediction than a member classifier, a hybrid CF system that combines different CF algorithms using an ensemble scheme will also be helpful to improve the predictive performance of CF tasks [118].
6. Evaluation Metrics

The quality of a recommender system is decided by the results of evaluation, and the type of metrics used depends on the type of CF application. According to Herlocker et al. [60], metrics for evaluating recommendation systems can be broadly classified into the following categories: predictive accuracy metrics, such as Mean Absolute Error (MAE) and its variations; classification accuracy metrics, such as precision, recall, F1-measure, and ROC sensitivity; and rank accuracy metrics, such as Pearson's product-moment correlation, Kendall's Tau, Mean Average Precision (MAP), half-life utility [9], and the normalized distance-based performance metric (NDPM) [128].

We only introduce the commonly-used CF metrics MAE, NMAE, RMSE, and ROC sensitivity here. For other CF performance metrics of recommendation quality, see [60]. There are also other evaluations of recommender systems, including usability evaluation [129], and so forth.

6.1. Mean Absolute Error (MAE) and Normalized Mean Absolute Error (NMAE). Instead of classification accuracy or classification error, the most widely used metric in the CF research literature is Mean Absolute Error (MAE) [3, 60], which computes the average of the absolute differences between the predictions and the true ratings:

    MAE = ( Σ_{i,j} |p_{i,j} − r_{i,j}| ) / n,        (14)

where n is the total number of ratings over all users, p_{i,j} is the predicted rating for user i on item j, and r_{i,j} is the actual rating. The lower the MAE, the better the prediction.

Different recommender systems may use different numerical rating scales. Normalized Mean Absolute Error (NMAE) normalizes MAE to express errors as percentages of the full scale [3]:

    NMAE = MAE / (r_max − r_min),        (15)

where r_max and r_min are the upper and lower bounds of the ratings.

6.2. Root Mean Squared Error (RMSE). Root Mean Squared Error (RMSE) is becoming popular partly because it is the Netflix Prize [20] metric for movie recommendation performance:

    RMSE = sqrt( (1/n) Σ_{i,j} (p_{i,j} − r_{i,j})^2 ),        (16)

where n is the total number of ratings over all users, p_{i,j} is the predicted rating for user i on item j, and r_{i,j} is again the actual rating. RMSE amplifies the contributions of the absolute errors between the predictions and the true values.

Although accuracy metrics have greatly helped the field of recommender systems, the most accurate recommendations are sometimes not the ones that are most useful to users; for example, users might prefer to be recommended items that are unfamiliar to them, rather than old favorites they likely do not want again [130]. We therefore need to explore other evaluation metrics.
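The three metrics in (14)-(16) are straightforward to compute from paired lists of predictions and true ratings; the helper below does exactly that (the example numbers are arbitrary).

import math

def mae(preds, truths):
    """Mean Absolute Error, as in equation (14)."""
    return sum(abs(p - r) for p, r in zip(preds, truths)) / len(truths)

def nmae(preds, truths, r_min, r_max):
    """Normalized MAE, as in equation (15)."""
    return mae(preds, truths) / (r_max - r_min)

def rmse(preds, truths):
    """Root Mean Squared Error, as in equation (16)."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(preds, truths)) / len(truths))

preds, truths = [3.5, 4.0, 2.0, 5.0], [4, 4, 1, 3]
print(mae(preds, truths), nmae(preds, truths, 1, 5), rmse(preds, truths))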
A Survey Of Collaborative Filtering Techniques
A Survey Of Collaborative Filtering Techniques
A Survey Of Collaborative Filtering Techniques
A Survey Of Collaborative Filtering Techniques
A Survey Of Collaborative Filtering Techniques
A Survey Of Collaborative Filtering Techniques

Más contenido relacionado

La actualidad más candente

Sentiment analysis of Twitter data using python
Sentiment analysis of Twitter data using pythonSentiment analysis of Twitter data using python
Sentiment analysis of Twitter data using pythonHetu Bhavsar
 
IRJET - Twitter Sentimental Analysis
IRJET -  	  Twitter Sentimental AnalysisIRJET -  	  Twitter Sentimental Analysis
IRJET - Twitter Sentimental AnalysisIRJET Journal
 
Twitter sentiment analysis
Twitter sentiment analysisTwitter sentiment analysis
Twitter sentiment analysisRahul Jha
 
Sentiment Analysis Using Twitter
Sentiment Analysis Using TwitterSentiment Analysis Using Twitter
Sentiment Analysis Using Twitterpiya chauhan
 
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...
IRJET-  	  Twitter Sentimental Analysis for Predicting Election Result using ...IRJET-  	  Twitter Sentimental Analysis for Predicting Election Result using ...
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...IRJET Journal
 
IRJET- Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
IRJET-  	  Improved Real-Time Twitter Sentiment Analysis using ML & Word2VecIRJET-  	  Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
IRJET- Improved Real-Time Twitter Sentiment Analysis using ML & Word2VecIRJET Journal
 
project sentiment analysis
project sentiment analysisproject sentiment analysis
project sentiment analysissneha penmetsa
 
SENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATASENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATAParvathy Devaraj
 
IRJET- Fake News Detection using Logistic Regression
IRJET- Fake News Detection using Logistic RegressionIRJET- Fake News Detection using Logistic Regression
IRJET- Fake News Detection using Logistic RegressionIRJET Journal
 
Final Poster for Engineering Showcase
Final Poster for Engineering ShowcaseFinal Poster for Engineering Showcase
Final Poster for Engineering ShowcaseTucker Truesdale
 
Project prSentiment Analysis of Twitter Data Using Machine Learning Approach...
Project prSentiment Analysis  of Twitter Data Using Machine Learning Approach...Project prSentiment Analysis  of Twitter Data Using Machine Learning Approach...
Project prSentiment Analysis of Twitter Data Using Machine Learning Approach...Geetika Gautam
 
Fake News Detection using Machine Learning
Fake News Detection using Machine LearningFake News Detection using Machine Learning
Fake News Detection using Machine Learningijtsrd
 
Prediction of Reaction towards Textual Posts in Social Networks
Prediction of Reaction towards Textual Posts in Social NetworksPrediction of Reaction towards Textual Posts in Social Networks
Prediction of Reaction towards Textual Posts in Social NetworksMohamed El-Geish
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment AnalysisRexNige
 
Sentiment analysis of Twitter Data
Sentiment analysis of Twitter DataSentiment analysis of Twitter Data
Sentiment analysis of Twitter DataNurendra Choudhary
 
Amazon Product Sentiment review
Amazon Product Sentiment reviewAmazon Product Sentiment review
Amazon Product Sentiment reviewLalit Jain
 
Design, analysis and implementation of geolocation based emotion detection te...
Design, analysis and implementation of geolocation based emotion detection te...Design, analysis and implementation of geolocation based emotion detection te...
Design, analysis and implementation of geolocation based emotion detection te...eSAT Journals
 

La actualidad más candente (20)

Sentiment analysis of Twitter data using python
Sentiment analysis of Twitter data using pythonSentiment analysis of Twitter data using python
Sentiment analysis of Twitter data using python
 
IRJET - Twitter Sentimental Analysis
IRJET -  	  Twitter Sentimental AnalysisIRJET -  	  Twitter Sentimental Analysis
IRJET - Twitter Sentimental Analysis
 
Twitter sentiment analysis
Twitter sentiment analysisTwitter sentiment analysis
Twitter sentiment analysis
 
Sentiment Analysis Using Twitter
Sentiment Analysis Using TwitterSentiment Analysis Using Twitter
Sentiment Analysis Using Twitter
 
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...
IRJET-  	  Twitter Sentimental Analysis for Predicting Election Result using ...IRJET-  	  Twitter Sentimental Analysis for Predicting Election Result using ...
IRJET- Twitter Sentimental Analysis for Predicting Election Result using ...
 
IRJET- Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
IRJET-  	  Improved Real-Time Twitter Sentiment Analysis using ML & Word2VecIRJET-  	  Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
IRJET- Improved Real-Time Twitter Sentiment Analysis using ML & Word2Vec
 
project sentiment analysis
project sentiment analysisproject sentiment analysis
project sentiment analysis
 
SENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATASENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATA
 
Project report
Project reportProject report
Project report
 
IRJET- Fake News Detection using Logistic Regression
IRJET- Fake News Detection using Logistic RegressionIRJET- Fake News Detection using Logistic Regression
IRJET- Fake News Detection using Logistic Regression
 
Opinion Mining – Twitter
Opinion Mining – TwitterOpinion Mining – Twitter
Opinion Mining – Twitter
 
Final Poster for Engineering Showcase
Final Poster for Engineering ShowcaseFinal Poster for Engineering Showcase
Final Poster for Engineering Showcase
 
Project prSentiment Analysis of Twitter Data Using Machine Learning Approach...
Project prSentiment Analysis  of Twitter Data Using Machine Learning Approach...Project prSentiment Analysis  of Twitter Data Using Machine Learning Approach...
Project prSentiment Analysis of Twitter Data Using Machine Learning Approach...
 
Fake News Detection using Machine Learning
Fake News Detection using Machine LearningFake News Detection using Machine Learning
Fake News Detection using Machine Learning
 
Prediction of Reaction towards Textual Posts in Social Networks
Prediction of Reaction towards Textual Posts in Social NetworksPrediction of Reaction towards Textual Posts in Social Networks
Prediction of Reaction towards Textual Posts in Social Networks
 
Sentiment Analysis
Sentiment AnalysisSentiment Analysis
Sentiment Analysis
 
Sentiment analysis of Twitter Data
Sentiment analysis of Twitter DataSentiment analysis of Twitter Data
Sentiment analysis of Twitter Data
 
Amazon Product Sentiment review
Amazon Product Sentiment reviewAmazon Product Sentiment review
Amazon Product Sentiment review
 
Design, analysis and implementation of geolocation based emotion detection te...
Design, analysis and implementation of geolocation based emotion detection te...Design, analysis and implementation of geolocation based emotion detection te...
Design, analysis and implementation of geolocation based emotion detection te...
 
Social Data Mining
Social Data MiningSocial Data Mining
Social Data Mining
 

Destacado (8)

BYO3D 2011: Emerging Technology
BYO3D 2011: Emerging TechnologyBYO3D 2011: Emerging Technology
BYO3D 2011: Emerging Technology
 
BYO3D 2011: History
BYO3D 2011: HistoryBYO3D 2011: History
BYO3D 2011: History
 
BYO3D 2011: Construction
BYO3D 2011: ConstructionBYO3D 2011: Construction
BYO3D 2011: Construction
 
BYO3D 2011: Interlacing
BYO3D 2011: InterlacingBYO3D 2011: Interlacing
BYO3D 2011: Interlacing
 
BYO3D 2011: Rendering
BYO3D 2011: RenderingBYO3D 2011: Rendering
BYO3D 2011: Rendering
 
Dependency Injection with PHP and PHP 5.3
Dependency Injection with PHP and PHP 5.3Dependency Injection with PHP and PHP 5.3
Dependency Injection with PHP and PHP 5.3
 
How to build a recommender system?
How to build a recommender system?How to build a recommender system?
How to build a recommender system?
 
Building a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineBuilding a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engine
 

Similar a A Survey Of Collaborative Filtering Techniques

A literature survey on recommendation
A literature survey on recommendationA literature survey on recommendation
A literature survey on recommendationaciijournal
 
A Literature Survey on Recommendation System Based on Sentimental Analysis
A Literature Survey on Recommendation System Based on Sentimental AnalysisA Literature Survey on Recommendation System Based on Sentimental Analysis
A Literature Survey on Recommendation System Based on Sentimental Analysisaciijournal
 
Advances In Collaborative Filtering
Advances In Collaborative FilteringAdvances In Collaborative Filtering
Advances In Collaborative FilteringScott Donald
 
Mining Large Streams of User Data for PersonalizedRecommenda.docx
Mining Large Streams of User Data for PersonalizedRecommenda.docxMining Large Streams of User Data for PersonalizedRecommenda.docx
Mining Large Streams of User Data for PersonalizedRecommenda.docxARIV4
 
typicality-based collaborative filtering recommendation
typicality-based collaborative filtering recommendationtypicality-based collaborative filtering recommendation
typicality-based collaborative filtering recommendationswathi78
 
An Adaptive Framework for Enhancing Recommendation Using Hybrid Technique
An Adaptive Framework for Enhancing Recommendation Using Hybrid TechniqueAn Adaptive Framework for Enhancing Recommendation Using Hybrid Technique
An Adaptive Framework for Enhancing Recommendation Using Hybrid Techniqueijcsit
 
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...IRJET Journal
 
A REVIEW PAPER ON BFO AND PSO BASED MOVIE RECOMMENDATION SYSTEM | J4RV4I1015
A REVIEW PAPER ON BFO AND PSO BASED MOVIE RECOMMENDATION SYSTEM | J4RV4I1015A REVIEW PAPER ON BFO AND PSO BASED MOVIE RECOMMENDATION SYSTEM | J4RV4I1015
A REVIEW PAPER ON BFO AND PSO BASED MOVIE RECOMMENDATION SYSTEM | J4RV4I1015Journal For Research
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systemsvivatechijri
 
OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...
OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...
OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...Content Savvy
 
IRJET- Hybrid Book Recommendation System
IRJET- Hybrid Book Recommendation SystemIRJET- Hybrid Book Recommendation System
IRJET- Hybrid Book Recommendation SystemIRJET Journal
 
Typicality-Based Collaborative Filtering Recommendation
Typicality-Based Collaborative Filtering RecommendationTypicality-Based Collaborative Filtering Recommendation
Typicality-Based Collaborative Filtering RecommendationPapitha Velumani
 
Recommendation System Using Social Networking
Recommendation System Using Social Networking Recommendation System Using Social Networking
Recommendation System Using Social Networking ijcseit
 
System For Product Recommendation In E-Commerce Applications
System For Product Recommendation In E-Commerce ApplicationsSystem For Product Recommendation In E-Commerce Applications
System For Product Recommendation In E-Commerce ApplicationsIJERD Editor
 
Recommendation Systems Basics
Recommendation Systems BasicsRecommendation Systems Basics
Recommendation Systems BasicsJarin Tasnim Khan
 
A Literature Survey on Recommendation Systems for Scientific Articles.pdf
A Literature Survey on Recommendation Systems for Scientific Articles.pdfA Literature Survey on Recommendation Systems for Scientific Articles.pdf
A Literature Survey on Recommendation Systems for Scientific Articles.pdfAmber Ford
 
An Experiment In Cross-Representation Mediation Of User Models
An Experiment In Cross-Representation Mediation Of User ModelsAn Experiment In Cross-Representation Mediation Of User Models
An Experiment In Cross-Representation Mediation Of User ModelsDaniel Wachtel
 

Similar a A Survey Of Collaborative Filtering Techniques (20)

A literature survey on recommendation
A literature survey on recommendationA literature survey on recommendation
A literature survey on recommendation
 
A Literature Survey on Recommendation System Based on Sentimental Analysis
A Literature Survey on Recommendation System Based on Sentimental AnalysisA Literature Survey on Recommendation System Based on Sentimental Analysis
A Literature Survey on Recommendation System Based on Sentimental Analysis
 
Advances In Collaborative Filtering
Advances In Collaborative FilteringAdvances In Collaborative Filtering
Advances In Collaborative Filtering
 
Mining Large Streams of User Data for PersonalizedRecommenda.docx
Mining Large Streams of User Data for PersonalizedRecommenda.docxMining Large Streams of User Data for PersonalizedRecommenda.docx
Mining Large Streams of User Data for PersonalizedRecommenda.docx
 
Recommender-technology-ReColl08
Recommender-technology-ReColl08Recommender-technology-ReColl08
Recommender-technology-ReColl08
 
typicality-based collaborative filtering recommendation
typicality-based collaborative filtering recommendationtypicality-based collaborative filtering recommendation
typicality-based collaborative filtering recommendation
 
An Adaptive Framework for Enhancing Recommendation Using Hybrid Technique
An Adaptive Framework for Enhancing Recommendation Using Hybrid TechniqueAn Adaptive Framework for Enhancing Recommendation Using Hybrid Technique
An Adaptive Framework for Enhancing Recommendation Using Hybrid Technique
 
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
 
A REVIEW PAPER ON BFO AND PSO BASED MOVIE RECOMMENDATION SYSTEM | J4RV4I1015
A REVIEW PAPER ON BFO AND PSO BASED MOVIE RECOMMENDATION SYSTEM | J4RV4I1015A REVIEW PAPER ON BFO AND PSO BASED MOVIE RECOMMENDATION SYSTEM | J4RV4I1015
A REVIEW PAPER ON BFO AND PSO BASED MOVIE RECOMMENDATION SYSTEM | J4RV4I1015
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 
OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...
OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...
OpinionMiner: A Novel Machine Learning System for Web Opinion Mining and Extr...
 
IRJET- Hybrid Book Recommendation System
IRJET- Hybrid Book Recommendation SystemIRJET- Hybrid Book Recommendation System
IRJET- Hybrid Book Recommendation System
 
Typicality-Based Collaborative Filtering Recommendation
Typicality-Based Collaborative Filtering RecommendationTypicality-Based Collaborative Filtering Recommendation
Typicality-Based Collaborative Filtering Recommendation
 
243
243243
243
 
Recommendation System Using Social Networking
Recommendation System Using Social Networking Recommendation System Using Social Networking
Recommendation System Using Social Networking
 
System For Product Recommendation In E-Commerce Applications
System For Product Recommendation In E-Commerce ApplicationsSystem For Product Recommendation In E-Commerce Applications
System For Product Recommendation In E-Commerce Applications
 
Recommendation Systems Basics
Recommendation Systems BasicsRecommendation Systems Basics
Recommendation Systems Basics
 
20120140506003
2012014050600320120140506003
20120140506003
 
A Literature Survey on Recommendation Systems for Scientific Articles.pdf
A Literature Survey on Recommendation Systems for Scientific Articles.pdfA Literature Survey on Recommendation Systems for Scientific Articles.pdf
A Literature Survey on Recommendation Systems for Scientific Articles.pdf
 
An Experiment In Cross-Representation Mediation Of User Models
An Experiment In Cross-Representation Mediation Of User ModelsAn Experiment In Cross-Representation Mediation Of User Models
An Experiment In Cross-Representation Mediation Of User Models
 

A Survey Of Collaborative Filtering Techniques

CF algorithms are required to have the and user-customization), coined the phrase “collaborative ability to deal with highly sparse data, to scale with the filtering (CF),” which has been widely adopted regardless of increasing numbers of users and items, to make satisfactory the facts that recommenders may not explicitly collaborate recommendations in a short time period, and to deal with with recipients and recommendations may suggest particu- other problems like synonymy (the tendency of the same or larly interesting items, in addition to indicating those that similar items to have different names), shilling attacks, data should be filtered out [2]. The fundamental assumption noise, and privacy protection problems. of CF is that if users X and Y rate n items similarly, or Early generation collaborative filtering systems, such as have similar behaviors (e.g., buying, watching, listening), and GroupLens [5], use the user rating data to calculate the simi- hence will rate or act on other items similarly [3]. larity or weight between users or items and make predictions CF techniques use a database of preferences for items or recommendations according to those calculated similarity by users to predict additional topics or products a new user values. The so-called memory-based CF methods (Section 3) might like. In a typical CF scenario, there is a list of m users are notably deployed into commercial systems such as {u1 , u2 , . . . , um } and a list of n items {i1 , i2 , . . . , in }, and each http://www.amazon.com/ (see an example in Figure 1) and
  • 2. 2 Advances in Artificial Intelligence Table 1: An example of a user-item matrix. (a) Alice: (like) Shrek, Snow White, (dislike) Superman Bob: (like) Snow White, Superman, (dislike) spiderman Chris: (like) spiderman, (dislike) Snow white Tony: (like) Shrek, (dislike) Spiderman (b) Shrek Snow White Spider-man Super-man Alice Like Like Dislike Bob Like Dislike Like Chris Dislike Like Tony Like Dislike ? Figure 1: Amazon recommends products to customers by cus- tomizing CF systems. Barnes and Noble, because they are easy-to-implement and To evaluate CF algorithms (Section 6), we need to use highly effective [6, 7]. Customization of CF systems for each metrics according to the types of CF application. Instead user decreases the search effort for users. It also promises of classification error, the most widely used evaluation a greater customer loyalty, higher sales, more advertising metric for prediction performance of CF is Mean Absolute revenues, and the benefit of targeted promotions [8]. Error (MAE). Precision and recall are widely used metrics However, there are several limitations for the memory- for ranked lists of returned items in information retrieval based CF techniques, such as the fact that the similarity research. ROC sensitivity is often used as a decision support values are based on common items and therefore are accuracy metric. unreliable when data are sparse and the common items are As drawing convincing conclusions from artificial data is therefore few. To achieve better prediction performance and risky, data from live experiments are more desirable for CF overcome shortcomings of memory-based CF algorithms, research. The commonly used CF databases are MovieLens model-based CF approaches have been investigated. Model- [18], Jester [19], and Netflix prize data [20]. In Section 7, we based CF techniques (Section 4) use the pure rating data give the conclusion and discussion of this work. to estimate or learn a model to make predictions [9]. The model can be a data mining or machine learning algorithm. Well-known model-based CF techniques include Bayesian 2. Characteristics and Challenges of belief nets (BNs) CF models [9–11], clustering CF models Collaborative Filtering [12, 13], and latent semantic CF models [7]. An MDP E-commerce recommendation algorithms often operate (Markov decision process)-based CF system [14] produces in a challenging environment, especially for large online a much higher profit than a system that has not deployed the shopping companies like eBay and Amazon. Usually, a recommender. recommender system providing fast and accurate recom- Besides collaborative filtering, content-based filtering is mendations will attract the interest of customers and bring another important class of recommender systems. Content- benefits to companies. For CF systems, producing high- based recommender systems make recommendations by quality predictions or recommendations depends on how analyzing the content of textual information and finding well they address the challenges, which are characteristics of regularities in the content. The major difference between CF tasks as well. CF and content-based recommender systems is that CF only uses the user-item ratings data to make predictions and recommendations, while content-based recommender 2.1. Data Sparsity. In practice, many commercial recom- systems rely on the features of users and items for predictions mender systems are used to evaluate very large product [15]. Both content-based recommender systems and CF sets. 
The user-item matrix used for collaborative filtering systems have limitations. While CF systems do not explicitly will thus be extremely sparse and the performances of the incorporate feature information, content-based systems do predictions or recommendations of the CF systems are not necessarily incorporate the information in preference challenged. similarity across individuals [8]. The data sparsity challenge appears in several situations, Hybrid CF techniques, such as the content-boosted CF specifically, the cold start problem occurs when a new user algorithm [16] and Personality Diagnosis (PD) [17], com- or item has just entered the system, it is difficult to find bine CF and content-based techniques, hoping to avoid similar ones because there is not enough information (in the limitations of either approach and thereby improve some literature, the cold start problem is also called the recommendation performance (Section 5). new user problem or new item problem [21, 22]). New items A brief overview of CF techniques is depicted in Table 2. cannot be recommended until some users rate it, and new
  • 3. Advances in Artificial Intelligence 3 Table 2: Overview of collaborative filtering techniques. CF categories Representative techniques Main advantages Main shortcomings ∗ ∗ Neighbor-based CF easy implementation ∗ are dependent on human ratings (item-based/user-based CF ∗ new data can be added easily and ∗ performance decrease when data Memory-based CF algorithms with Pearson/vector incrementally are sparse cosine correlation) ∗ ∗ ∗ need not consider the content of cannot recommend for new users Item-based/user-based top-N the items being recommended and items recommendations ∗ ∗ have limited scalability for large scale well with co-rated items datasets ∗ ∗ Bayesian belief nets CF better address the sparsity, ∗ expensive model-building ∗ clustering CF scalability and other problems ∗ ∗ Model-based CF MDP-based CF ∗ have trade-off between prediction ∗ improve prediction performance latent semantic CF performance and scalability ∗ ∗ sparse factor analysis ∗ give an intuitive rationale for lose useful information for ∗ CF using dimensionality recommendations dimensionality reduction reduction techniques, for example, techniques SVD, PCA ∗ ∗ overcome limitations of CF and ∗ content-based CF recommender, have increased complexity and content-based or other for example, Fab expense for implementation Hybrid recommenders recommenders ∗ ∗ ∗ need external information that content-boosted CF improve prediction performance usually not available ∗ hybrid CF combining ∗ memory-based and model-based overcome CF problems such as CF algorithms, for example, sparsity and gray sheep Personality Diagnosis users are unlikely given good recommendations because of Hybrid CF algorithms, such as the content-boosted CF the lack of their rating or purchase history. Coverage can algorithm [16], are found helpful to address the sparsity be defined as the percentage of items that the algorithm problem, in which external content information can be used could provide recommendations for. The reduced coverage to produce predictions for new users or new items. In Ziegler problem occurs when the number of users’ ratings may be et al. [28], a hybrid collaborative filtering approach was very small compared with the large number of items in proposed to exploit bulk taxonomic information designed the system, and the recommender system may be unable to for exact product classification to address the data sparsity generate recommendations for them. Neighbor transitivity problem of CF recommendations, based on the generation refers to a problem with sparse databases, in which users of profiles via inference of super-topic score and topic with similar tastes may not be identified as such if they diversification [28]. Schein et al. proposed the aspect model have not both rated any of the same items. This could latent variable method for cold start recommendation, which reduce the effectiveness of a recommendation system which combines both collaborative and content information in relies on comparing users in pairs and therefore generating model fitting [29]. Kim and Li proposed a probabilistic predictions. model to address the cold start problem, in which items are To alleviate the data sparsity problem, many approaches classified into groups and predictions are made for users have been proposed. Dimensionality reduction techniques, considering the Gaussian distribution of user ratings [30]. 
such as Singular Value Decomposition (SVD) [23], remove Model-based CF algorithms, such as TAN-ELR (tree aug- unrepresentative or insignificant users or items to reduce mented na¨ve Bayes optimized by extended logistic regres- ı the dimensionalities of the user-item matrix directly. The sion) [11, 31], address the sparsity problem by providing patented Latent Semantic Indexing (LSI) used in information more accurate predictions for sparse data. Some new model- retrieval is based on SVD [24, 25], in which similarity based CF techniques that tackle the sparsity problem include between users is determined by the representation of the the association retrieval technique, which applies an asso- users in the reduced space. Goldberg et al. [3] developed ciative retrieval framework and related spreading activation eigentaste, which applies Principle Component Analysis algorithms to explore transitive associations among users (PCA), a closely-related factor analysis technique first through their rating and purchase history [32]; Maximum described by Pearson in 1901 [26], to reduce dimensionality. margin matrix factorizations (MMMF), a convex, infinite However, when certain users or items are discarded, useful dimensional alternative to low-rank approximations and information for recommendations related to them may standard factor models [33, 34]; ensembles of MMMF get lost and recommendation quality may be degraded [6, [35]; multiple imputation-based CF approaches [36]; and 27]. imputation-boosted CF algorithms [37].
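As a concrete illustration of the dimensionality-reduction idea discussed in Section 2.1, the short Python sketch below applies a rank-k truncated SVD to a small user-item matrix (the toy ratings of Table 4 in Section 3). Filling missing entries with per-item means before factorization is only one simple convention assumed here for illustration; it is not the specific eigentaste or LSI procedure of the cited papers.

import numpy as np

# Toy user-item matrix (Table 4); np.nan marks unrated entries.
R = np.array([
    [4, np.nan, 5, 5],
    [4, 2, 1, np.nan],
    [3, np.nan, 2, 4],
    [4, 4, np.nan, np.nan],
    [2, 1, 3, 5],
], dtype=float)

# Fill missing cells with per-item (column) means so that SVD can be applied.
item_means = np.nanmean(R, axis=0)
filled = np.where(np.isnan(R), item_means, R)

# Keep only the k largest singular values (rank-k approximation).
k = 2
U, s, Vt = np.linalg.svd(filled, full_matrices=False)
R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Reconstructed matrix: its originally missing cells serve as rating estimates.
print(np.round(R_hat, 2))

The same low-rank idea underlies the SVD/LSI and MMMF approaches cited above; they differ mainly in how missing entries are handled and how the factorization is regularized.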
  • 4. 4 Advances in Artificial Intelligence 2.2. Scalability. When numbers of existing users and items and ignore the smaller, less important ones. The performance grow tremendously, traditional CF algorithms will suffer of LSI in addressing the synonymy problem is impressive serious scalability problems, with computational resources at higher recall levels where precision is ordinarily quite going beyond practical or acceptable levels. For example, low, thus representing large proportional improvements. with tens of millions of customers (M) and millions of However, the performance of the LSI method at the lowest distinct catalog items (N), a CF algorithm with the com- levels of recall is poor [25]. plexity of O(n) is already too large. As well, many systems The LSI method gives only a partial solution to the need to react immediately to online requirements and make polysemy problem, which refers to the fact that most words recommendations for all users regardless of their purchases have more than one distinct meaning [25]. and ratings history, which demands a high scalability of a CF system [6]. Dimensionality reduction techniques such as SVD can 2.4. Gray Sheep. Gray sheep refers to the users whose deal with the scalability problem and quickly produce opinions do not consistently agree or disagree with any group good quality recommendations, but they have to undergo of people and thus do not benefit from collaborative filtering expensive matrix factorization steps. An incremental SVD [46]. Black sheep are the opposite group whose idiosyncratic CF algorithm [38] precomputes the SVD decomposition tastes make recommendations nearly impossible. Although using existing users. When a new set of ratings are added this is a failure of the recommender system, non-electronic to the database, the algorithm uses the folding-in projection recommenders also have great problems in these cases, so technique [25, 39] to build an incremental system without re- black sheep is an acceptable failure [47]. computing the low-dimensional model from scratch. Thus it Claypool et al. provided a hybrid approach combining makes the recommender system highly scalable. content-based and CF recommendations by basing a predic- Memory-based CF algorithms, such as the item-based tion on a weighted average of the content-based prediction Pearson correlation CF algorithm can achieve satisfactory and the CF prediction. In that approach, the weights of the scalability. Instead of calculating similarities between all pairs content-based and CF predictions are determined on a per- of items, item-based Pearson CF calculates the similarity user basis, allowing the system to determine the optimal only between the pair of co-rated items by a user [6, mix of content-based and CF recommendation for each user, 40]. A simple Bayesian CF algorithm tackles the scalability helping to solve the gray sheep problem [46]. problem by making predictions based on observed ratings [41]. Model-based CF algorithms, such as clustering CF 2.5. Shilling Attacks. In cases where anyone can provide algorithms, address the scalability problem by seeking users recommendations, people may give tons of positive rec- for recommendation within smaller and highly similar ommendations for their own materials and negative rec- clusters instead of the entire database [13, 42–44], but there ommendations for their competitors. It is desirable for CF are tradeoffs between scalability and prediction performance. 
systems to introduce precautions that discourage this kind of phenomenon [2]. Recently, the shilling attacks models for collaborative 2.3. Synonymy. Synonymy refers to the tendency of a filtering system have been identified and their effectiveness number of the same or very similar items to have different has been studied. Lam and Riedl found that item-based names or entries. Most recommender systems are unable CF algorithm was much less affected by the attacks than to discover this latent association and thus treat these the user-based CF algorithm, and they suggest that new products differently. For example, the seemingly different ways must be used to evaluate and detect shilling attacks on items “children movie” and “children film” are actual the recommender systems [48]. Attack models for shilling the same item, but memory-based CF systems would find no item-based CF systems have been examined by Mobasher match between them to compute similarity. Indeed, the et al., and alternative CF systems such as hybrid CF systems degree of variability in descriptive term usage is greater than and model-based CF systems were believed to have the ability commonly suspected. The prevalence of synonyms decreases to provide partial solutions to the bias injection problem the recommendation performance of CF systems. [49]. O’Mahony et al. contributed to solving the shilling Previous attempts to solve the synonymy problem attacks problem by analyzing robustness, a recommender depended on intellectual or automatic term expansion, or system’s resilience to potentially malicious perturbations in the construction of a thesaurus. The drawback for fully the customer/product rating matrix [50]. automatic methods is that some added terms may have Bell and Koren [51] used a comprehensive approach to different meanings from intended, thus leading to rapid the shilling attacks problem by removing global effects in degradation of recommendation performance [45]. the data normalization stage of the neighbor-based CF, and The SVD techniques, particularly the Latent Semantic working with residual of global effects to select neighbors. Indexing (LSI) method, are capable of dealing with the They achieved improved CF performance on the Netflix [20] synonymy problems. SVD takes a large matrix of term- data. document association data and construct a semantic space where terms and documents that are closely associated are placed closely to each other. SVD allows the arrangement of 2.6. Other Challenges. As people may not want their habits the space to reflect the major associative patterns in the data, or views widely known, CF systems also raise concerns about
  • 5. personal privacy. Miller et al. [4] and Canny [52] find ways to protect users' privacy for CF recommendation tasks.

Increased noise (or sabotage) is another challenge, as the user population becomes more diverse. Ensembles of maximum margin matrix factorizations [35] and instance selection techniques [53] are found useful to address the noise problems of CF tasks. As Dempster-Shafer (DS) theory [54, 55] and imputation techniques [56] have been successfully applied to accommodate imperfect and noisy data for knowledge representation and classification tasks, they are also potentially useful to deal with the noise problem of CF tasks.

Explainability is another important aspect of recommender systems. An intuitive reasoning such as "you will like this book because you liked those books" will be appealing and beneficial to readers, regardless of the accuracy of the explanations [57].

2.7. The Netflix Prize Challenge. Launched in October 2006, the Netflix prize challenge [20] attracted thousands of researchers to compete in the million-dollar-prize race for a most improved performance for movie recommendations. The challenge is featured with a large-scale industrial dataset (with 480,000 users and 17,770 movies) and a rigid performance metric of RMSE (see the detailed description in Section 6).

Up to July 2009, the Leaderboard of the Netflix prize competition is as Table 3, in which the leading team "BellKor in Pragmatic Chaos" (with 10.05% improved RMSE over the Netflix movie recommendation system, Cinematch) based their solution on a merged model of latent factor and neighborhood models [58]. Some interesting research papers on the Netflix prize challenge can be found in the 2008 KDD Netflix Workshop (http://netflixkddworkshop2008.info/).

Table 3: The Netflix Prize Leaderboard as of July 2009.

Rank  Team                                 Best RMSE score  Improvement (%)
1     BellKor's Pragmatic Chaos            0.8556           10.07
2     Grand Prize Team                     0.8571           9.91
3     Opera Solutions and Vandelay United  0.8573           9.89
4     Vandelay Industries!                 0.8579           9.83
5     Pragmatic Theory                     0.8582           9.80
6     BellKor in BigChaos                  0.8590           9.71
7     Dace                                 0.8605           9.55
8     Opera Solutions                      0.8611           9.49
9     BellKor                              0.8612           9.48
10    BigChaos                             0.8613           9.47

3. Memory-Based Collaborative Filtering Techniques

Memory-based CF algorithms use the entire or a sample of the user-item database to generate a prediction. Every user is part of a group of people with similar interests. By identifying the so-called neighbors of a new user (or active user), a prediction of preferences on new items for him or her can be produced.

The neighborhood-based CF algorithm, a prevalent memory-based CF algorithm, uses the following steps: calculate the similarity or weight, w_{i,j}, which reflects distance, correlation, or weight, between two users or two items, i and j; produce a prediction for the active user by taking the weighted average of all the ratings of the user or item on a certain item or user, or using a simple weighted average [40]. When the task is to generate a top-N recommendation, we need to find k most similar users or items (nearest neighbors) after computing the similarities, then aggregate the neighbors to get the top-N most frequent items as the recommendation.

3.1. Similarity Computation. Similarity computation between items or users is a critical step in memory-based collaborative filtering algorithms. For item-based CF algorithms, the basic idea of the similarity computation between item i and item j is first to work on the users who have rated both of these items and then to apply a similarity computation to determine the similarity, w_{i,j}, between the two co-rated items of the users [40]. For a user-based CF algorithm, we first calculate the similarity, w_{u,v}, between the users u and v who have both rated the same items.

There are many different methods to compute similarity or weight between users or items.

3.1.1. Correlation-Based Similarity. In this case, similarity w_{u,v} between two users u and v, or w_{i,j} between two items i and j, is measured by computing the Pearson correlation or other correlation-based similarities.

Pearson correlation measures the extent to which two variables linearly relate with each other [5]. For the user-based algorithm, the Pearson correlation between users u and v is

w_{u,v} = \frac{\sum_{i \in I} (r_{u,i} - \bar{r}_u)(r_{v,i} - \bar{r}_v)}{\sqrt{\sum_{i \in I} (r_{u,i} - \bar{r}_u)^2} \sqrt{\sum_{i \in I} (r_{v,i} - \bar{r}_v)^2}},   (1)

where the i ∈ I summations are over the items that both the users u and v have rated and \bar{r}_u is the average rating of the co-rated items of the uth user. In an example in Table 4, we have w_{1,5} = 0.756.

Table 4: A simple example of ratings matrix.

      I1   I2   I3   I4
U1    4    ?    5    5
U2    4    2    1
U3    3         2    4
U4    4    4
U5    2    1    3    5
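To make equation (1) concrete, the following Python sketch (not the authors' code, just a direct reading of the formula) computes the user-user Pearson similarity over co-rated items and reproduces the value w_{1,5} = 0.756 quoted above for the Table 4 data.

import numpy as np

def pearson_user_similarity(ratings, u, v):
    # Equation (1): use only co-rated items; each user's mean is taken
    # over those co-rated items.
    co_rated = [i for i in range(len(ratings[u]))
                if ratings[u][i] is not None and ratings[v][i] is not None]
    ru = np.array([ratings[u][i] for i in co_rated], dtype=float)
    rv = np.array([ratings[v][i] for i in co_rated], dtype=float)
    du, dv = ru - ru.mean(), rv - rv.mean()
    denom = np.sqrt((du ** 2).sum()) * np.sqrt((dv ** 2).sum())
    return float((du * dv).sum() / denom) if denom else 0.0

# Table 4 (rows U1..U5, columns I1..I4); None marks an unrated item.
R = [
    [4, None, 5, 5],
    [4, 2, 1, None],
    [3, None, 2, 4],
    [4, 4, None, None],
    [2, 1, 3, 5],
]

print(round(pearson_user_similarity(R, 0, 4), 3))  # 0.756, matching the text

When two users share only one co-rated item (e.g., U1 and U4), the deviations are zero and the correlation is undefined; the sketch simply returns 0 in that case, which is the weight used for w_{1,4} in the worked prediction example on the next page.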
  • 6. Figure 2: item-based similarity (w_{i,j}) calculation based on the co-rated items i and j from users 2, l, and n.

For the item-based algorithm, denote the set of users u ∈ U who rated both items i and j; then the Pearson correlation will be

w_{i,j} = \frac{\sum_{u \in U} (r_{u,i} - \bar{r}_i)(r_{u,j} - \bar{r}_j)}{\sqrt{\sum_{u \in U} (r_{u,i} - \bar{r}_i)^2} \sqrt{\sum_{u \in U} (r_{u,j} - \bar{r}_j)^2}},   (2)

where r_{u,i} is the rating of user u on item i, and \bar{r}_i is the average rating of the ith item by those users, see Figure 2 [40]. Some variations of item-based and user-based Pearson correlations can be found in [59]. The Pearson correlation-based CF algorithm is a representative CF algorithm, and is widely used in the CF research community.

Other correlation-based similarities include: constrained Pearson correlation, a variation of Pearson correlation that uses midpoint instead of mean rate; Spearman rank correlation, similar to Pearson correlation, except that the ratings are ranks; and Kendall's τ correlation, similar to the Spearman rank correlation, but instead of using ranks themselves, only the relative ranks are used to calculate the correlation [3, 60].

Usually the number of users in the computation of similarity is regarded as the neighborhood size of the active user, and similarity-based CF is deemed as neighborhood-based CF.

3.1.2. Vector Cosine-Based Similarity. The similarity between two documents can be measured by treating each document as a vector of word frequencies and computing the cosine of the angle formed by the frequency vectors [61]. This formalism can be adopted in collaborative filtering, which uses users or items instead of documents and ratings instead of word frequencies. Formally, if R is the m × n user-item matrix, then the similarity between two items, i and j, is defined as the cosine of the n-dimensional vectors corresponding to the ith and jth columns of matrix R. Vector cosine similarity between items i and j is given by

w_{i,j} = \cos(\vec{i}, \vec{j}) = \frac{\vec{i} \bullet \vec{j}}{\|\vec{i}\| * \|\vec{j}\|},   (3)

where "•" denotes the dot-product of the two vectors. To get the desired similarity computation, for n items, an n × n similarity matrix is computed [27]. For example, if the vector A = {x_1, y_1} and the vector B = {x_2, y_2}, the vector cosine similarity between A and B is

w_{A,B} = \cos(A, B) = \frac{A \bullet B}{\|A\| * \|B\|} = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2} \sqrt{x_2^2 + y_2^2}}.   (4)

In an actual situation, different users may use different rating scales, which the vector cosine similarity cannot take into account. To address this drawback, adjusted cosine similarity is used by subtracting the corresponding user average from each co-rated pair. The adjusted cosine similarity has the same formula as Pearson correlation (2). In fact, Pearson correlation performs cosine similarity with some sort of normalization of the user's ratings according to his own rating behavior. Hence, we may get negative values with Pearson correlation, but not with cosine similarity, supposing we have an n-point rating scale.

3.1.3. Other Similarities. Another similarity measure is conditional probability-based similarity [62, 63]. As it is not commonly used, we will not discuss it in detail in this paper.

3.2. Prediction and Recommendation Computation. To obtain predictions or recommendations is the most important step in a collaborative filtering system. In the neighborhood-based CF algorithm, a subset of nearest neighbors of the active user are chosen based on their similarity with him or her, and a weighted aggregate of their ratings is used to generate predictions for the active user [64].

3.2.1. Weighted Sum of Others' Ratings. To make a prediction for the active user, a, on a certain item, i, we can take a weighted average of all the ratings on that item according to the following formula [5]:

P_{a,i} = \bar{r}_a + \frac{\sum_{u \in U} (r_{u,i} - \bar{r}_u) \cdot w_{a,u}}{\sum_{u \in U} |w_{a,u}|},   (5)

where \bar{r}_a and \bar{r}_u are the average ratings for the user a and user u on all other rated items, and w_{a,u} is the weight between the user a and user u. The summations are over all the users u ∈ U who have rated the item i. For the simple example in Table 4, using the user-based CF algorithm, to predict the rating for U1 on I2, we have

P_{1,2} = \bar{r}_1 + \frac{(r_{2,2} - \bar{r}_2) w_{1,2} + (r_{4,2} - \bar{r}_4) w_{1,4} + (r_{5,2} - \bar{r}_5) w_{1,5}}{|w_{1,2}| + |w_{1,4}| + |w_{1,5}|}
       = 4.67 + \frac{(2 - 2.5)(-1) + (4 - 4)(0) + (1 - 3.33)(0.756)}{1 + 0 + 0.756} = 3.95.   (6)
  • 7. Advances in Artificial Intelligence 7 Note the above prediction is based on the neighborhood potentially produce suboptimal recommendations. To solve of the active users. this problem, Deshpande and Karypis [63] developed higher- order item-based top-N recommendation algorithms that 3.2.2. Simple Weighted Average. For item-based prediction, use all combinations of items up to a particular size when we can use the simple weighted average to predict the rating, determining the itemsets to be recommended to a user. Pu,i , for user u on item i [40] 3.4. Extensions to Memory-Based Algorithms n∈N ru,n wi,n Pu,i = , (7) n∈N wi,n 3.4.1. Default Voting. In many collaborative filters, pair- wise similarity is computed only from the ratings in the where the summations are over all other rated items n ∈ N intersection of the items both users have rated [5, 27]. It for user u, wi,n is the weight between items i and n, ru,n is the will not be reliable when there are too few votes to generate rating for user u on item n. similarity values. Also, focusing on intersection set similarity neglects the global rating behavior reflected in a user’s entire 3.3. Top-N Recommendations. Top-N recommendation is to rating history. recommend a set of N top-ranked items that will be of Empirically, assuming some default voting values for the interest to a certain user. For example, if you are a returning missing ratings can improve the CF prediction performance. customer, when you log into your http://amazon.com/ Herlocker et al. [64] accounts for small intersection sets by account, you may be recommended a list of books (or other reducing the weight of users that have fewer than 50 items in products) that may be of your interest (see Figure 1). Top- common. Chee et al. [13] uses the average of the clique (or N recommendation techniques analyze the user-item matrix small group) as default voting to extend each user’s rating to discover relations between different users or items and history. Breese et al. [9] uses a neutral or somewhat negative use them to compute the recommendations. Some models, preference for the unobserved ratings and then computes the such as association rule mining based models, can be used to similarity between users on the resulting ratings data. make top-N recommendations, which we will introduce in Section 4. 3.4.2. Inverse User Frequency. The idea of inverse user fre- quency [61] applied in collaborative filtering is that univer- 3.3.1. User-Based Top-N Recommendation Algorithms. User- sally liked items are not as useful in capturing similarity as based top-N recommendation algorithms firstly identify the less common items. The inverse frequency can be defined k most similar users (nearest neighbors) to the active user as f j = log(n/n j ), where n j is the number of users who using the Pearson correlation or vector-space model [9, 27], in have rated item j and n is the total number of users. If which each user is treated as a vector in the m-dimensional everyone has rated item j, then f j is zero. To apply inverse item space and the similarities between the active user and user frequency while using the vector similarity-based CF other users are computed between the vectors. After the k algorithm, we need to use a transformed rating, which is most similar users have been discovered, their corresponding simply the original rating multiplied by the f j factor [9]. rows in the user-item matrix R are aggregated to identify a set of items, C, purchased by the group together with 3.4.3. 
Case Amplification. Case amplification refers to a their frequency. With the set C, user-based CF techniques transform applied to the weights used in the basic collab- then recommend the top-N most frequent items in C orative filtering prediction. The transform emphasizes high that the active user has not purchased. User-based top- weights and punishes low weights [9]: N recommendation algorithms have limitations related to ρ−1 scalability and real-time performance [62]. wi, j = wi, j · wi, j , (8) 3.3.2. Item-Based Top-N Recommendation Algorithms. Item- where ρ is the case amplification power, ρ ≥ 1, and a typical based top-N recommendation algorithms have been devel- choice of ρ is 2.5 [65]. Case amplification reduces noise in the oped to address the scalability problem of user-based top-N data. It tends to favor high weights as small values raised to a recommendation algorithms. The algorithms firstly compute power become negligible. If the weight is high, for example, the k most similar items for each item according to the wi, j = 0.9, then it remains high (0.92.5 ≈ 0.8); if it is low, for similarities; then identify the set, C, as candidates of recom- example, wi, j = 0.1, then it will be negligible (0.12.5 ≈ 0.003). mended items by taking the union of the k most similar items and removing each of the items in the set, U, that the user 3.4.4. Imputation-Boosted CF Algorithms. When the rating has already purchased; then calculate the similarities between data for CF tasks are extremely sparse, it will be prob- each item of the set C and the set U. The resulting set of lematic to produce accurate predictions using the Pearson the items in C, sorted in decreasing order of the similarity, correlation-based CF. Su et al. [37, 66] proposed a framework will be the recommended item-based Top-N list [62]. One of imputation-boosted collaborative filtering (IBCF), which problem of this method is, when the joint distribution first uses an imputation technique to fill in the missing of a set of items is different from the distributions of data, before using a traditional Pearson correlation-based the individual items in the set, the above schemes can CF algorithm on this completed data to predict a specific
  • 8. 8 Advances in Artificial Intelligence user rating for a specified item. After comprehensively variable, each directed arc a ∈ A between nodes is investigating the use of various standard imputation tech- a probabilistic association between variables, and Θ is a niques (including mean imputation, linear regression impu- conditional probability table quantifying how much a node tation, and predictive mean matching [67] imputation, and depends on its parents [72]. Bayesian belief nets (BNs) are Bayesian multiple imputation [68]), and machine learning often used for classification tasks. classifiers [66] (including na¨ve Bayes, SVM, neural network, ı decision tree, lazy Bayesian rules) as imputers for IBCF, they 4.1.1. Simple Bayesian CF Algorithm. The simple Bayesian found that the proposed IBCF algorithms can perform very CF algorithm uses a na¨ve Bayes (NB) strategy to make pre- ı effectively in general, and that IBCF using Bayesian multiple dictions for CF tasks. Assuming the features are independent imputation, IBCF-NBM (a mixture IBCF which uses IBCF given the class, the probability of a certain class given all of using na¨ve Bayes for denser datasets and IBCF using mean ı the features can be computed, and then the class with the imputation for sparser ones) [37], and IBCF using na¨ve ı highest probability will be classified as the predicted class Bayes perform especially well, outperforming the content- [41]. For incomplete data, the probability calculation and boosted CF algorithm (a representative hybrid CF), and do classification production are computed over observed data so without using external content information. (the subscript o in the following equation indicates observed values): 3.4.5. Weighted Majority Prediction. The weighted majority prediction algorithm proposed by Goldman and Warmuth class = arg max p class j P Xo = xo | class j . (9) [69] makes its prediction using the rows with observed data j ∈ classSet o in the same column, weighted by the believed similarity between the rows, with binary rating values. The weights The Laplace Estimator is used to smooth the probability (or similarities, with initialized values of 1) are increased calculation and avoid a conditional probability of 0: by multiplying it by (2 − γ) when the compared values are same, and decreased by multiplying by γ when different, # Xi = xi , Y = y + 1 P Xi = xi | Y = y = , (10) with γ ∈ (0, 1). This update is equivalent to wii = # Y = y + |X i | (2 − γ)Cii γWii , where Cii is the number of rows that have the same value as in row i and Wii is the number of rows where |Xi | is the size of the class set {Xi }. For an example of having different values. The prediction for a rating on a binary class, P(Xi = 0 | Y = 1) = 0/2 will be (0+1)/(2+2) = certain item by the active user is determined by the rating on 1/4, P(Xi = 1 | Y = 1) = 2/2 will be (2 + 1)/(2 + 2) = 3/4 the item by a certain user, who has the highest accumulated using the Laplace Estimator. weight value with the active user. This algorithm can be Using the same example in Table 4, the class set is generalized to multiclass data, and be extended from user-to- {1, 2, . . . , 5}, to produce the rating for U1 on I2 using the user similarity to item-to-item similarity and to user-item- simple Bayesian CF algorithm and the Laplace Estimator, we combined similarity [70]. 
One shortcoming of this algorithm have is the scalability, when the user number or item number class = arg max p c j | U2 = 2, U4 = 4, U5 = 1 grows over a certain large number n, it will be impractical c j ∈ {1,2,3,4,5} for the user-to-user or item-to-item similarity computations to update the O(n2 ) similarity matrices. = arg max p c j P U2 = 2 | c j P U4 = 4 | c j c j ∈{1,2,3,4,5} (11) 4. Model-Based Collaborative × P U5 = 1 | c j Filtering Techniques = arg max {0, 0, 0, 0.0031, 0.0019} = 4 The design and development of models (such as machine c j ∈{1,2,3,4,5} learning, data mining algorithms) can allow the system to learn to recognize complex patterns based on the training in which p(5)P(U2 = 2 | 5)P(U4 = 4 | 5)P(U5 = 1 | 5) = data, and then make intelligent predictions for the col- (2/3) ∗ (1/7) ∗ (1/7) ∗ (1/7) = 0.0019. laborative filtering tasks for test data or real-world data, In Miyahara and Pazzani [10], multiclass data are firstly based on the learned models. Model-based CF algorithms, converted to binary-class data, and then converted to a such as Bayesian models, clustering models, and dependency Boolean feature vector rating matrix. These conversions make networks, have been investigated to solve the shortcomings of the use of the NB algorithm for CF tasks easier, but bring the memory-based CF algorithms [9, 71]. Usually, classification problems of scalability and the loss of multiclass information algorithms can be used as CF models if the user ratings are for multiclass data. In Miyahara and Pazzani [41], they categorical, and regression models and SVD methods and be applied the simple Bayesian CF model only on binary data. used for numerical ratings. Because most real-world CF data are multiclass ones, Su and Khoshgoftaar [11] apply the simple Bayesian CF 4.1. Bayesian Belief Net CF Algorithms. A Bayesian belief algorithm to multiclass data for CF tasks, and found net (BN) is a directed, acyclic graph (DAG) with a triplet simple Bayesian CF has worse predictive accuracy but better N, A, Θ , where each node n ∈ N represents a random scalability than the Pearson correlation-based CF as it makes
  • 9. Advances in Artificial Intelligence 9 predictions based on observed ratings, and the prediction- and q is a positive integer. When q = 1, d is Manhattan making process is less time-consuming. distance; when q = 2, d is Euclidian distance [76]. The simple Bayesian CF algorithm can be regarded Clustering methods can be classified into three cate- as memory-based CF technique because of its in-memory gories: partitioning methods, density-based methods, and calculation for CF predictions. We put it in this section hierarchical methods [76, 77]. A commonly-used parti- for the reason that most other Bayesian CF algorithms are tioning method is k-means, proposed by MacQueen [78], model-based CFs. which has two main advantages: relative efficiency and easy implementation. Density-based clustering methods typically 4.1.2. NB-ELR and TAN-ELR CF Algorithms. Because of the search for dense clusters of objects separated by sparse limitations of the simple Bayesian algorithm for CF tasks, regions that represent noise. DBSCAN [79] and OPTICS advanced BNs CF algorithms, with their ability to deal with [80] are well-known density-based clustering methods. Hier- incomplete data, can be used instead [11]. Extended logistic archical clustering methods, such as BIRCH [81], create a regression (ELR) is a gradient-ascent algorithm [31, 73], hierarchical decomposition of the set of data objects using which is a discriminative parameter-learning algorithm that some criterion. maximizes log conditional likelihood. In most situations, clustering is an intermediate step TAN-ELR and NB-ELR (tree augmented na¨ve Bayes [74] ı and the resulting clusters are used for further analysis or and na¨ve Bayes optimized by ELR, resp.) have been proven ı processing to conduct classification or other tasks. Clustering to have high classification accuracy for both complete and CF models can be applied in different ways. Sarwar et al. [43] incomplete data [31, 73]. and O’Connor and Herlocker [42] use clustering techniques Applied to CF tasks, working on real-world multiclass to partition the data into clusters and use a memory-based CF datasets and using MAE as evaluation criterion, the CF algorithm such as a Pearson correlation-based algorithm empirical results show that the TAN-ELR CF and NB- to make predictions for CF tasks within each cluster. ELR CF algorithms perform significantly better than the Using the k-means method with k = 2, the RecTree simple Bayesian CF algorithm, and consistently better than method, proposed by Chee et al. [13], recursively splits the Pearson correlation memory-based CF algorithm [11]. the originally large rating data into two sub-clusters as However, TAN-ELR and NB-ELR need a longer time to train it constructs the RecTree from the root to its leaves. The the models. A solution is to run the time-consuming training resulting RecTree resembles an unbalanced binary tree, of stage offline, and the online prediction-producing stage will which leaf nodes have a similarity matrix and internal nodes take a much shorter time. maintain rating centroids of their subtrees. The prediction is made within the leaf node that the active user belongs to. RecTree scales by O(n log2 (n)) for off-line recommendation 4.1.3. Other Bayesian CF Algorithms. Bayesian belief nets with and O(b) for on-line recommendation, where n is the dataset decision trees at each node. 
This model has a decision tree size and b is the partition size, a constant, and it has an at each node of the BNs, where a node corresponds to each improved accuracy over the Pearson correlation-based CF item in the domain and the states of each node correspond to when selecting an appropriate size of advisors (cluster of the possible ratings for each item [9]. Their results show that users). this model has similar prediction performance to Pearson Ungar and Foster [12] clusters users and items separately correlation-based CF methods, and has better performance using variations of k-means and Gibbs sampling [82], by than Bayesian-clustering and vector cosine memory-based clustering users based on the items they rated and clustering CF algorithms. items based on the users that rated them. Users can be Baseline Bayesian model uses a Bayesian belief net with reclustered based on the number of items they rated, and no arcs (baseline model) for collaborative filtering and rec- items can be similarly re-clustered. Each user is assigned to ommends items on their overall popularity [75]. However, a class with a degree of membership proportional to the the performance is suboptimal. similarity between the user and the mean of the class. Their CF performance on synthetic data is good, but not good on 4.2. Clustering CF Algorithms. A cluster is a collection of real data. data objects that are similar to one another within the same A flexible mixture model (FMM) extends existing clus- cluster and are dissimilar to the objects in other clusters tering algorithms for CF by clustering both users and items [76]. The measurement of the similarity between objects is at the same time, allowing each user and item to be in determined using metrics such as Minkowski distance and multiple clusters and modeling the clusters of users and items Pearson correlation. separately [15]. Experimental results show that the FMM For two data objects, X = (x1 , x2 , . . . , xn ) and Y = algorithm has better accuracy than the Pearson correlation- (y1 , y2 , . . . , yn ), the popular Minkowski distance is defined as based CF algorithm and aspect model [83]. Clustering models have better scalability than typical n q collaborative filtering methods because they make predic- d(X, Y ) = q xi − y i , (12) i=1 tions within much smaller clusters rather than the entire customer base [13, 27, 44, 84]. The complex and expen- where n is the dimension number of the object and xi , yi are sive clustering computation is run offline. However, its the values of the ith dimension of object X and Y respectively, recommendation quality is generally low. It is possible to
  • 10. 10 Advances in Artificial Intelligence improve quality by using numerous fine-grained segments, 4.4. MDP-Based CF Algorithms. Instead of viewing the but then online user-segment classification becomes almost recommendation process as a prediction problem, Shani as expensive as finding similar customers using memory- et al. [14] views it as a sequential optimization problem based collaborative filtering [6]. As optimal clustering over and uses a Markov decision processes (MDPs) model [89] for large data sets is impractical, most applications use var- recommender systems. ious forms of greedy cluster generation techniques. For An MDP is a model for sequential stochastic decision very large datasets, especially those with high dimension- problems, which is often used in applications where an agent ality, sampling or dimensionality reduction is also neces- is influencing its surrounding environment through actions. sary. An MDP can be defined as a four-tuple: S, A, R, Pr , where S is a set of states, A is a set of actions, R is a real-valued 4.3. Regression-Based CF Algorithms. For memory-based CF reward function for each state/action pair, and Pr is the algorithms, in some cases, two rating vectors may be distant transition probability between every pair of states given each in terms of Euclidean distances but they have very high action. similarity using vector cosine or Pearson correlation measures, An optimal solution to the MDP is to maximize the where memory-based CF algorithms do not fit well and need function of its reward stream. By starting with an initial better solutions. Also, numerical ratings are common in real- policy π0 (s) = arg maxa∈A R(s, a), computing the reward life recommender systems. Regression methods that are good value function Vi (s) based on the previous policy, and at making predictions for numerical values are helpful to updating the policy with the new value function at each step, address these problems. the iterations will converge to an optimal policy [90, 91]. A regression method uses an approximation of the In Shani et al. [14], the states of the MDP for the ratings to make predictions based on a regression model. Let CF system are k tuples of items, with some null values X = (X1 , X2 , . . . , Xn ) be a random variable representing a corresponding to missing items; the actions of the MDP user’s preferences on different items. The linear regression correspond to a recommendation of an item; and the rewards model can be expressed as in the MDP correspond to the utility of selling an item, for example, the net profit. The state following each recom- Y = ΛX + N, (13) mendation is the user’s response to that recommendation, such as taking the recommended item, taking the non- recommended item, or selecting nothing. To handle the large where Λ is a n × k matrix. N = (N1 , . . . , Nn ) is a random action space, it is assumed that the probability that a user variable representing noise in user choices, Y is an n × m buys an item depends on his current state, item, and whether matrix with Yi j is the rating of user i on item j, and X is a or not the item is recommended, but does not depend on the k × m matrix with each column as an estimate of the value of identity of the other recommended items. the random variable X (user’s ratings in the k-dimensional Working on an Israeli online bookstore, Mitos, the rating space) for one user. Typically, the matrix Y is very deployed MDP-recommender system produced a much sparse. 
4.4. MDP-Based CF Algorithms. Instead of viewing the recommendation process as a prediction problem, Shani et al. [14] view it as a sequential optimization problem and use a Markov decision process (MDP) model [89] for recommender systems.

An MDP is a model for sequential stochastic decision problems, often used in applications where an agent influences its surrounding environment through actions. An MDP can be defined as a four-tuple ⟨S, A, R, Pr⟩, where S is a set of states, A is a set of actions, R is a real-valued reward function for each state/action pair, and Pr is the transition probability between every pair of states given each action.

An optimal solution to the MDP maximizes the function of its reward stream. By starting with an initial policy π_0(s) = arg max_{a∈A} R(s, a), computing the reward value function V_i(s) based on the previous policy, and updating the policy with the new value function at each step, the iterations converge to an optimal policy [90, 91].

In Shani et al. [14], the states of the MDP for the CF system are k-tuples of items, with some null values corresponding to missing items; the actions of the MDP correspond to the recommendation of an item; and the rewards in the MDP correspond to the utility of selling an item, for example, the net profit. The state following each recommendation is the user's response to that recommendation, such as taking the recommended item, taking a non-recommended item, or selecting nothing. To handle the large action space, it is assumed that the probability that a user buys an item depends on his current state, the item, and whether or not the item is recommended, but does not depend on the identity of the other recommended items.

Deployed on an Israeli online bookstore, Mitos, the MDP recommender system produced a much higher profit than the system without the recommender. The MDP CF model also performs much better than the simpler Markov chain (MC) model, which is simply an MDP without actions [14].

The MDP-based CF model in Shani et al. [14] can be viewed as approximating a partially observable MDP (POMDP) by using a finite rather than unbounded window of past history to define the current state. As the computational and representational complexity of POMDPs is high, appropriate approaches to tackling these problems must be developed; they are generally classified into three broad strategies: value function approximation [92], policy-based optimization [84, 93], and stochastic sampling [94]. The application of these strategies to CF tasks may be an interesting direction for future research.
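The policy update loop described above can be illustrated with a small policy-iteration routine on a toy MDP. The two-state transition matrices and rewards below are arbitrary illustrations, not the item-tuple state space of [14].

import numpy as np

def policy_iteration(P, R, gamma=0.9):
    # P[a][s, s'] are transition probabilities, R[s, a] are rewards.
    n_states, n_actions = R.shape
    policy = R.argmax(axis=1)                      # pi_0(s) = argmax_a R(s, a)
    while True:
        # policy evaluation: solve V = R_pi + gamma * P_pi V
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        R_pi = R[np.arange(n_states), policy]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # policy improvement: greedy with respect to the new value function
        Q = np.stack([R[:, a] + gamma * P[a] @ V for a in range(n_actions)], axis=1)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V
        policy = new_policy

# two states, two actions; the numbers are arbitrary illustrations
P = [np.array([[0.8, 0.2], [0.1, 0.9]]),    # transitions under action 0
     np.array([[0.5, 0.5], [0.6, 0.4]])]    # transitions under action 1
R = np.array([[1.0, 0.0], [0.0, 2.0]])
policy, V = policy_iteration(P, R)
print(policy, V)

In the recommendation setting of [14], states encode the last k items a user selected, actions are candidate recommendations, and rewards encode the profit of a sale, so the same iteration operates over a far larger (and sparsely observed) model.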
4.5. Latent Semantic CF Models. A latent semantic CF technique relies on a statistical modeling technique that introduces latent class variables in a mixture model setting to discover user communities and prototypical interest profiles. Conceptually, it decomposes user preferences using overlapping user communities. The main advantages of this technique over standard memory-based methods are its higher accuracy and scalability [7, 95].

The aspect model, proposed by Hofmann and Puzicha [83], is a probabilistic latent-space model that models individual ratings as a convex combination of rating factors. The latent class variable is associated with each observed {user, item} pair, under the assumption that users and items are independent of each other given the latent class variable. On the EachMovie dataset [96], the performance of the aspect model is much better than that of the clustering model.

A multinomial model is a simple probabilistic model for categorical data [9, 97] that assumes there is only one type of user. A multinomial mixture model assumes that there are multiple types of users underlying all profiles, and that the rating variables are independent of each other and of the user's identity given the user's type [98]. A user rating profile (URP) model [97] combines the intuitive appeal of the multinomial mixture model and the aspect model [83] with the high-level generative semantics of Latent Dirichlet Allocation (LDA, a generative probabilistic model in which each item is modeled as a finite mixture over an underlying set of users) [99]. URP performs better than the aspect model and multinomial mixture models for CF tasks.

4.6. Other Model-Based CF Techniques. For applications in which ordering is more desirable than classifying, Cohen et al. [100] investigated a two-stage order learning CF approach to learning to order. In that approach, one first learns a preference function by conventional means, which returns a confidence value reflecting how likely it is that one instance is preferred to another, and then orders a new set of instances by finding the total ordering that best approximates the preference function. As the problem of finding the total ordering is NP-complete, a greedy-order algorithm is used to obtain an approximately optimal ordering function. Working on EachMovie [96], this order learning CF approach performs better than a nearest neighbor CF algorithm and a linear regression algorithm.
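A minimal version of such a greedy ordering step is sketched below in Python. The pairwise preference function here is a hypothetical stand-in for the learned first-stage preference function, and the net-weight selection rule is one simple greedy choice, not necessarily the exact variant used in [100].

def greedy_order(items, pref):
    # Repeatedly emit the item with the largest net outgoing preference weight.
    # pref(a, b) in [0, 1] is the learned confidence that a should rank above b.
    remaining, ordering = set(items), []
    while remaining:
        score = {v: sum(pref(v, u) for u in remaining if u != v)
                    - sum(pref(u, v) for u in remaining if u != v)
                 for v in remaining}
        best = max(score, key=score.get)
        ordering.append(best)
        remaining.remove(best)
    return ordering

# toy preference function: assumed output of a learned pairwise model
scores = {"a": 0.9, "b": 0.4, "c": 0.6}
pref = lambda x, y: 1.0 if scores[x] > scores[y] else 0.0
print(greedy_order(["a", "b", "c"], pref))   # ['a', 'c', 'b']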
Association rule based CF algorithms are more often used for top-N recommendation tasks than for prediction tasks. Sarwar et al. [27] describe an approach that uses a traditional association rule mining algorithm to find rules for developing top-N recommender systems. They find the top-N items by choosing all the rules that meet the thresholds for support and confidence values, sorting items according to the confidence of the rules so that items predicted by rules with higher confidence values are ranked higher, and finally selecting the first N highest-ranked items as the recommended set [27]. Fu et al. [101] developed a system to recommend web pages by using an a priori algorithm to mine association rules over users' navigation histories. Leung et al. proposed a collaborative filtering framework using fuzzy association rules and multilevel similarity [102].

Other model-based CF techniques include a maximum entropy approach, which clusters the data first and then, within a given cluster, uses maximum entropy as an objective function to form a conditional maximal entropy model to make predictions [17]. A dependency network is a graphical model for probabilistic relationships whose graph is potentially cyclic. The probability component of a dependency network is a set of conditional distributions, one for each node given its parents. Although less accurate than Bayesian belief nets, dependency networks are faster in generating predictions and require less time and memory to learn [75]. Decision tree CF models treat collaborative filtering as a classification task and use a decision tree as the classifier [103]. Horting is a graph-based technique in which nodes are users and edges between nodes are degrees of similarity between users [104]. Multiple multiplicative factor models (MMFs) are a class of causal, discrete latent variable models that combine factor distributions multiplicatively and can readily accommodate missing data [105]. Probabilistic principal components analysis (pPCA) [52, 106] determines the principal axes of a set of observed data vectors through maximum-likelihood estimation of the parameters in a latent variable model closely related to factor analysis. Matrix factorization based CF algorithms have proven effective in addressing the scalability and sparsity challenges of CF tasks [33, 34, 107]. Wang et al. showed how the development of collaborative filtering can benefit from information retrieval theories and models, and proposed probabilistic relevance CF models [108, 109].

5. Hybrid Collaborative Filtering Techniques

Hybrid CF systems combine CF with other recommendation techniques (typically with content-based systems) to make predictions or recommendations.

Content-based recommender systems make recommendations by analyzing the content of textual information, such as documents, URLs, news messages, web logs, item descriptions, and profiles about users' tastes, preferences, and needs, and finding regularities in the content [110]. Many elements contribute to the importance of the textual content, such as observed browsing features of the words or pages (e.g., term frequency and inverse document frequency) and the similarity between items a user liked in the past [111]. A content-based recommender then uses heuristic methods or classification algorithms to make recommendations [112].

Content-based techniques have the start-up problem, in which they must have enough information to build a reliable classifier. They are also limited by the features explicitly associated with the objects they recommend (sometimes these features are hard to extract), while collaborative filtering can make recommendations without any descriptive data. In addition, content-based techniques have the overspecialization problem, that is, they can only recommend items that score highly against a user's profile or his/her rating history [21, 113].

Other recommender systems include demographic-based recommender systems, which use user profile information such as gender, postcode, occupation, and so forth [114], as well as utility-based recommender systems and knowledge-based recommender systems, both of which require knowledge about how a particular object satisfies the user's needs [115, 116]. We will not discuss these systems in detail in this work.
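To ground the content-based side that hybrid recommenders build on, the sketch below scores unseen items by the cosine similarity between their term-frequency/inverse-document-frequency vectors and a profile built from the items a user liked. The item descriptions and the liked-item set are invented for illustration; a real content-based recommender would use a proper classifier, as noted above.

import math
from collections import Counter

def tfidf(docs):
    # term frequency-inverse document frequency vectors for a list of texts
    df = Counter(t for d in docs for t in set(d.split()))
    n = len(docs)
    return [{t: c / len(d.split()) * math.log(n / df[t])
             for t, c in Counter(d.split()).items()} for d in docs]

def cosine(a, b):
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

items = {"i1": "space opera adventure", "i2": "romantic comedy", "i3": "space documentary"}
vecs = dict(zip(items, tfidf(list(items.values()))))
liked = ["i1"]
profile = Counter()
for i in liked:
    profile.update(vecs[i])                       # profile = aggregate of liked-item vectors
candidates = {i: cosine(profile, vecs[i]) for i in items if i not in liked}
print(max(candidates, key=candidates.get))        # the unseen item closest to the profile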
Hoping to avoid the limitations of either recommender system and to improve recommendation performance, hybrid CF recommenders are built by adding content-based characteristics to CF models, adding CF characteristics to content-based models, combining CF with content-based or other systems, or combining different CF algorithms [21, 117].

5.1. Hybrid Recommenders Incorporating CF and Content-Based Features. The content-boosted CF algorithm uses naïve Bayes as the content classifier and then fills in the missing values of the rating matrix with the predictions of the content predictor to form a pseudo rating matrix, in which observed ratings are kept untouched and missing ratings are replaced by the predictions of the content predictor. It then makes predictions over the resulting pseudo ratings matrix using a weighted Pearson correlation-based CF algorithm, which gives a higher weight to items that more users have rated and a higher weight to the active user [16] (see the illustration in Table 5). The content-boosted CF recommender has improved prediction performance over some pure content-based recommenders and some pure memory-based CF algorithms. It also overcomes the cold start problem and tackles the sparsity problem of CF tasks. Working on reasonably-sized subsets instead of the original rating data, Greiner et al. used TAN-ELR [31] as the content predictor and directly applied the Pearson correlation-based CF, instead of a weighted one, on the pseudo rating matrix to make predictions, and they achieved improved CF performance in terms of MAE [118].

Table 5: Content-boosted CF and its variations: (a) content data and the originally sparse rating data, (b) pseudo rating data filled in by the content predictor, (c) predictions from (weighted) Pearson CF on the pseudo rating data.

(a) Content information and sparse rating matrix

User  Age  Sex  Career     Zip    Observed ratings (on items I1-I5)
U1    32   F    writer     22904  4
U2    27   M    student    10022  2, 4, 3
U3    24   M    engineer   60402  1
U4    50   F    other      60804  3, 3, 3, 3
U5    28   M    educator   85251  1

(b) Pseudo rating data

User  I1  I2  I3  I4  I5
U1    2   3   4   3   2
U2    2   2   4   3   2
U3    3   1   3   4   3
U4    3   3   3   3   3
U5    1   2   4   1   2

(c) Pearson-CF prediction

User  I1  I2  I3  I4  I5
U1    2   3   4   2   3
U2    3   4   2   2   3
U3    3   3   2   3   3
U4    3   3   3   3   3
U5    1   3   1   2   2
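The pseudo-rating step illustrated in Table 5 can be sketched as follows: a stand-in content predictor fills the missing cells, and a standard (here unweighted) Pearson correlation-based CF then predicts from the dense matrix. The small matrix and the constant content predictor are illustrative placeholders, not the data or the naïve Bayes classifier of [16].

import numpy as np

def pearson(a, b):
    a, b = a - a.mean(), b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom else 0.0

def content_boosted_predict(R, content_pred, user, item):
    # Fill every missing rating with the content predictor's estimate, then
    # apply an (unweighted) Pearson correlation-based CF prediction.
    V = np.array([[R[u, i] if not np.isnan(R[u, i]) else content_pred(u, i)
                   for i in range(R.shape[1])] for u in range(R.shape[0])])
    w = np.array([pearson(V[user], V[u]) if u != user else 0.0 for u in range(len(V))])
    s = np.abs(w).sum()
    if not s:
        return V[user].mean()
    return V[user].mean() + (w * (V[:, item] - V.mean(axis=1))).sum() / s

R = np.array([[5, np.nan, 3, np.nan],
              [4, 2, np.nan, 3],
              [np.nan, 1, 4, 5]])
content_pred = lambda u, i: 3.0   # placeholder for a learned content classifier
print(round(content_boosted_predict(R, content_pred, user=0, item=3), 2))

The algorithm of [16] additionally weights each neighbor by how many of their pseudo ratings are truly observed and boosts the active user's own content predictions; those weights are omitted here for brevity.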
It can then weighting as predictions about user ratings are confirmed or recommend documents across user profiles. Sarwar et al. disconfirmed. The strategy of the P-Tango system is similar [120] implemented a set of knowledge-based “filterbots” to boosting [123]. as artificial users using certain criteria. A straightforward A switching hybrid recommender switches between rec- example of a filterbot is a genrebot, which bases its opinion ommendation techniques using some criteria, such as con- solely on the genre of the item, for example, a “jazzbot” fidence levels for the recommendation techniques. When the would give a full mark to a CD simply because it is in the jazz CF system cannot make a recommendation with sufficient category, while it would give a low score to any other CD in confidence, then another recommender system such as a the database. Mooney and Roy [121] use the prediction from content-based system is attempted. Switching hybrid recom- the CF system as the input to a content-based recommender. menders also introduce the complexity of parameterization Condiff et al. [113] propose a Bayesian mixed-effects model for the switching criteria [115]. that integrates user ratings, user, and item features in a single Other hybrid recommenders in this category include unified framework. The CF system Ripper, proposed by Basu mixed hybrid recommenders [124], cascade hybrid recom- et al. [71], uses both user ratings and contents features to menders [115], meta-level recommenders [110, 115, 117, 125], produce recommendations. and so forth.
Many papers have empirically compared the performance of hybrid recommenders with pure CF and content-based methods and found that hybrid recommenders may make more accurate recommendations, especially in the new-user and new-item situations where a regular CF algorithm cannot make satisfactory recommendations. However, hybrid recommenders rely on external information that is usually not available, and they generally have increased implementation complexity [110, 115, 126].

5.3. Hybrid Recommenders Combining CF Algorithms. The two major classes of CF approaches, memory-based and model-based, can be combined to form hybrid CF approaches. The recommendation performance of these algorithms is generally better than that of some pure memory-based and pure model-based CF algorithms [22, 86].

Probabilistic memory-based collaborative filtering (PMCF) combines memory-based and model-based techniques [22]. It uses a mixture model built on the basis of a set of stored user profiles and uses the posterior distribution of user ratings to make predictions. To address the new user problem, an active learning extension to the PMCF system can be used to actively query a user for additional information when insufficient information is available. To reduce computation time, PMCF selects a small subset, called the profile space, from the entire database of user ratings and makes predictions from this small profile space instead of the whole database. PMCF has better accuracy than the Pearson correlation-based CF and the model-based CF using naïve Bayes.

Personality diagnosis (PD) is a representative hybrid CF approach that combines memory-based and model-based CF algorithms and retains some advantages of both [86]. In PD, the active user is assumed to be generated by choosing one of the other users uniformly at random and adding Gaussian noise to his or her ratings. Given the active user's known ratings, we can calculate the probability that he or she has the same "personality type" as each other user, and the probability that he or she will like the new items. PD can also be regarded as a clustering method with exactly one user per cluster. Working on EachMovie [96] and CiteSeer [127], PD makes better predictions than the Pearson correlation-based and vector similarity-based CF algorithms and the two model-based algorithms, Bayesian clustering and Bayesian network, investigated by Breese et al. [9].

As an ensemble classifier is able to give more accurate predictions than a member classifier, a hybrid CF system that combines different CF algorithms using an ensemble scheme will also be helpful for improving the predictive performance of CF tasks [118].
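A minimal sketch of the personality diagnosis idea is given below in Python: the probability that the active user shares another user's "personality type" is a Gaussian function of their rating differences, and the unknown rating is the value with the highest total probability under those types. The Gaussian width sigma, the 1-5 rating scale, and the toy ratings are illustrative assumptions rather than the experimental setup of [86].

import math

def pd_predict(ratings, active, item, sigma=1.0, scale=(1, 2, 3, 4, 5)):
    def gauss(a, b):
        return math.exp(-((a - b) ** 2) / (2 * sigma ** 2))
    # weight for each other user = likelihood that the active user is of their type
    weights = {}
    for u, r in ratings.items():
        if u == active or item not in r:
            continue
        w = 1.0
        for i in ratings[active]:
            if i in r:
                w *= gauss(ratings[active][i], r[i])
        weights[u] = w
    if not weights:
        return None
    # pick the rating value with the highest total probability
    score = {x: sum(w * gauss(x, ratings[u][item]) for u, w in weights.items())
             for x in scale}
    return max(score, key=score.get)

ratings = {"a": {"i1": 4, "i2": 5},              # active user
           "b": {"i1": 4, "i2": 5, "i3": 2},
           "c": {"i1": 1, "i2": 2, "i3": 5}}
print(pd_predict(ratings, "a", "i3"))            # 2: user b, whose ratings match the active user's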
6. Evaluation Metrics

The quality of a recommender system can be judged by the results of evaluation, and the type of metrics used depends on the type of CF application. According to Herlocker et al. [60], metrics for evaluating recommender systems can be broadly classified into the following categories: predictive accuracy metrics, such as Mean Absolute Error (MAE) and its variations; classification accuracy metrics, such as precision, recall, F1-measure, and ROC sensitivity; and rank accuracy metrics, such as Pearson's product-moment correlation, Kendall's Tau, Mean Average Precision (MAP), half-life utility [9], and the normalized distance-based performance metric (NDPM) [128].

We only introduce the commonly used CF metrics MAE, NMAE, RMSE, and ROC sensitivity here. For other CF performance metrics of recommendation quality, see [60]. There are also other evaluations of recommender systems, including usability evaluation [129] and so forth.

6.1. Mean Absolute Error (MAE) and Normalized Mean Absolute Error (NMAE). Instead of classification accuracy or classification error, the most widely used metric in the CF research literature is the Mean Absolute Error (MAE) [3, 60], which computes the average of the absolute differences between the predictions and the true ratings:

    MAE = ( Σ_{i,j} |p_{i,j} − r_{i,j}| ) / n,    (14)

where n is the total number of ratings over all users, p_{i,j} is the predicted rating for user i on item j, and r_{i,j} is the actual rating. The lower the MAE, the better the prediction.

Different recommender systems may use different numerical rating scales. The Normalized Mean Absolute Error (NMAE) normalizes MAE to express errors as percentages of the full scale [3]:

    NMAE = MAE / (r_max − r_min),    (15)

where r_max and r_min are the upper and lower bounds of the ratings.

6.2. Root Mean Squared Error (RMSE). Root Mean Squared Error (RMSE) is becoming popular partly because it is the Netflix Prize [20] metric for movie recommendation performance:

    RMSE = sqrt( (1/n) Σ_{i,j} (p_{i,j} − r_{i,j})^2 ),    (16)

where n is again the total number of ratings over all users, p_{i,j} is the predicted rating for user i on item j, and r_{i,j} is the actual rating. RMSE amplifies the contributions of the larger absolute errors between the predictions and the true values.

Although accuracy metrics have greatly helped the field of recommender systems, the most accurate recommendations are sometimes not the ones that are most useful to users; for example, users might prefer to be recommended items that are unfamiliar to them, rather than old favorites they likely do not want again [130]. We therefore need to explore other evaluation metrics.
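The three metrics above are straightforward to compute; the short Python functions below mirror equations (14)-(16), with illustrative predictions and true ratings on an assumed 1-5 scale.

import math

def mae(preds, truths):
    # equation (14): mean absolute difference between predictions and true ratings
    return sum(abs(p - r) for p, r in zip(preds, truths)) / len(truths)

def nmae(preds, truths, r_min, r_max):
    # equation (15): MAE normalized by the rating scale
    return mae(preds, truths) / (r_max - r_min)

def rmse(preds, truths):
    # equation (16): square root of the mean squared error
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(preds, truths)) / len(truths))

preds, truths = [3.5, 4.0, 2.0, 5.0], [4, 4, 1, 3]
print(round(mae(preds, truths), 3),
      round(nmae(preds, truths, 1, 5), 3),
      round(rmse(preds, truths), 3))

Note how RMSE exceeds MAE on the same data because the squaring step amplifies the single large error.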