SlideShare una empresa de Scribd logo
1 de 21
Descargar para leer sin conexión
LCBM: Statistics-based Parallel
Collaborative Filtering
Fabio Petroni 1, Leonardo Querzoni 1, Roberto Beraldi 1, Mario Paolucci 2
Cyber Intelligence and information Security
CIS Sapienza
1
Department of Computer Control and
Management Engineering Antonio Ruberti,
Sapienza University of Rome -
petroni|querzoni|beraldi@dis.uniroma1.it
2
Institute of Cognitive Sciences
and Technologies, CNR -
mario.paolucci@istc.cnr.it
Larnaca, Cyprus
22-23 May, 2014
Personalization
I Most of todays internet businesses deeply root their success in the
ability to provide users with strongly personalized experiences.
I Recommender Systems: are a subclass of information filtering
system that seek to predict the rating or preference that user
would give to an item.
2 of 21
Collaborative Filtering
I Collaborative Filtering (CF) represents today’s a widely adopted
strategy to build recommendation engines.
I CF analyzes the known preferences of a group of users to make
predictions of the unknown preferences for other users.
Filtering
filter the user data stream
satisfying user personal interests
Discovery
recommends new content to the
user that matches his interests
3 of 21
Challenges
I Key factors for recommendation e↵ectiveness:
amount of data available
larger data-sets improve quality
more than better algorithms
timeliness
timely deliver output to users
constitutes a business advantage
7 Most CF algorithms typically need to go through a very lengthy
training stage, incurring in prohibitive computational costs and
large time-to-prediction intervals when applied on large data sets.
7 This lack of e ciency is going to quickly limit the applicability of
these solutions at the current rates of data production growth.
4 of 21
Taxonomy Of CF Algorithms
MEMORY-
BASED
MODEL-
BASED
Collaborative
Filtering
neighborhood
models
user-oriented item-oriented
Pearson
Correlation
Similarity
Cosine
Similarity
cluster
models
Bayesian
networks
linear
regression
LATENT
FACTORS
MODELS
pLSA
neural
networks
Latent
Dirichlet
Allocation
MATRIX
FACTORIZATION
ALS SGD SVD
5 of 21
Memory-Based CF Algorithms
I Operate on the entire set of ratings to generate predictions:
B compute a similarity score, which reflects the distance between two
users or two items.
- Pearson Correlation Similarity
- Cosine Similarity
B Predictions are computed by looking at the ratings of the k most
similar users or items (k-nearest neighbor).
3 Simple design and implementation;
3 used in a lot of real-world systems.
7 Impose several scalability limitations;
7 their use is impractical when dealing with large amounts of data.
6 of 21
Model-Based CF Algorithms
I Use the set of available ratings to estimate or learn a model and
then apply this model to make rating predictions.
I The most successful are based on matrix factorization models.
7 of 21
Matrix Factorization 1/2
LOSS(P, Q) =
X
(Rij Pi Qj )2
I The most popular techniques to minimize LOSS are Alternating
Least Squares (ALS) and Stochastic Gradient Descent (SGD).
ALS
ALS alternates between keeping
P and Q fixed, solving from time
to time a least-squares problem.
SGD
SGD works by taking steps
proportional to the negative of the
gradient of the LOSS function.
I Both algorithms need several passes through the set of ratings.
8 of 21
Matrix Factorization 2/2
3 Provide high quality results;
3 SGD was chosen by the top three solutions of KDD-Cup 2011;
3 there exist parallel and distributed implementations.
7 High computational costs;
7 large time-to-prediction intervals;
7 especially when applied on large data sets.
SGD ALS LCBM
training time O(X · K · E) ⌦(K3
(N + M) + K2
X) O(X)
prediction time O(K) O(K) O(1)
memory usage O(K(N + M)) O(M2 + X) O(N + M)
In the table X is the number of collected ratings, K the number of hidden
features, E the iterations, N and M the number of users and items respectively.
9 of 21
Contributions
In this paper we present:
LCBM: Linear Classifier of Beta distributions Means
a novel collaborative filtering algorithm for binary ratings that:
3 is inherently parallelizable;
3 provides results whose quality is on-par with state-of-the-art
solutions;
3 in shorter time;
3 using less computational resources.
10 of 21
Overview
LCBM
Working phase
ratings
Training phase
Item rating PDF
inferenceratings per
item
User profiler
ratings per
user
standard
error
mean
>
predictions
threshold
I We consider:
B items as elements whose tendency to be rated positively/negatively
can be statistically characterized using a probability density function;
B users as entities with di↵erent tastes that rate the same items using
di↵erent criteria and that must thus be profiled;
I The algorithm needs two passes over the set of available ratings
to train the model (build users and items profiles).
11 of 21
Item Rating PDF Inference
I Beta distribution:
B X = the relative frequency of positive
votes that the item will receive.
B ↵ = OK + 1, = KO + 1
Item profile
MEAN = ↵
↵+ = OK+1
OK+KO+2
SE = 1
OK+KO+2
q
(OK+1)(KO+1)
(OK+KO)(OK+KO+3)
I Chebyshev’s inequality:
Pr(MEAN SE  X  MEAN+ SE) 1
1
2
0
0.5
1
1.5
2
2.5
3
0 0.2 0.4 0.6 0.8 1
PDF
X
I Example:
B OK = 8, KO = 3
B MEAN ⇡ 0.7
B SE ⇡ 0.04
– Pr(0.62  X  0.78) 0.75
12 of 21
User Profiler
0.1 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.65 0.7 0.8 0.85
discriminant function
OK errors KO errors
I Every rating is represented by a point with two attributes:
B a boolean value containing the rating (OK or KO).
B a key 2 [0, 1] that gives the point’s rank in the order;
I if value = OK ! key = MEAN + SE
I if value = KO ! key = MEAN SE
User profile
Quality threshold (QT) = the discriminant function of a 1D linear
classifier that separates the points with a minimal number of errors.
13 of 21
Working Phase
LCBM
Working phase
ratings
Training phase
Item rating PDF
inferenceratings per
item
User profiler
ratings per
user
standard
error
mean
>
predictions
threshold
I In order to produce a prediction the algorithm simply compares
QT values from the user profiler with MEAN values from the
item’s rating PDF inference block:
B if MEAN > QT ! prediction = OK
B if MEAN  QT ! prediction = KO
14 of 21
Experiments: Setting and The Data Sets
Dataset MovieLens 100k MovieLens 10M Netflix
users 943 69878 480189
items 1682 10677 17770
ratings 100000 10000054 100480507
I 5-fold cross-validation approach:
B This technique divides the dataset in 5 folds and then uses in turn
one of the folds as test set and the remaining ones as training set.
I We compared LCBM against CF
implementations in Apache Mahout:
B ParallelSGDFactorizer: lock-free and parallel implementation of SGD
B ALSWRFactorizer: ALS with Weighted- -Regularization
15 of 21
Experiments: Performance Metrics
I Confusion matrix: true positives (TP) false negatives (FN)
false positives (FP) true negatives (TN)
I Matthews correlation coe cient (MCC) 2 [ 1, 1]
MCC =
TP ⇥ TN FP ⇥ FN
p
(TP + FP) ⇥ (TP + FN) ⇥ (TN + FP) ⇥ (TN + FN)
B 1 ! perfect prediction;
B 0 ! no better than random prediction;
B 1 ! total disagreement between prediction and observation.
I time needed to run the test;
I the peak memory load during the test.
16 of 21
Results: Accuracy
0
0.1
0.2
0.3
0.4
0.5
0.6
MovieLens (105
) MovieLens (107
) Netflix (108
)
MCC
number of ratings
LCBM
SGD K=256
SGD K=32
SGD K=8
ALS
0
0.1
0.2
0.3
0.4
0.5
0.6
MovieLens (105
) MovieLens (107
) Netflix (108
)
MCC
number of ratings
LCBM
SGD K=256
SGD K=32
SGD K=8
ALS
17 of 21
Results: Computational Resources
0
2
4
6
8
10
12
14
16
18
20
MovieLens (105
) MovieLens (107
) Netflix (108
)
memory(GB)
number of ratings
LCBM
SGD K=256
SGD K=32
SGD K=8
ALS
0
2
4
6
8
10
12
14
16
18
20
MovieLens (105
) MovieLens (107
) Netflix (108
)
memory(GB)
number of ratings
LCBM
SGD K=256
SGD K=32
SGD K=8
ALS
0
0.2
Zoom
0
0.2
Zoom
18 of 21
Results: Timeliness
0
500
1000
1500
2000
2500
3000
3500
4000
4500
MovieLens (105
) MovieLens (107
) Netflix (108
)
time(s)
number of ratings
LCBM
SGD K=256
SGD K=32
SGD K=8
ALS
0
500
1000
1500
2000
2500
3000
3500
4000
4500
MovieLens (105
) MovieLens (107
) Netflix (108
)
time(s)
number of ratings
LCBM
SGD K=256
SGD K=32
SGD K=8
ALS
0
10
Zoom
0
10
Zoom
19 of 21
Conclusions
I LCBM is a novel CF algorithm with binary ratings.
I It works by analyzing collected ratings to:
B infer a probability density function of the relative frequency of
positive votes that the item will receive;
B compute a personalized quality threshold for each user.
I These two information are used to predict missing ratings.
I LCBM is inherently parallelizable.
Future works:
I Handle multidimensional ratings.
I Provide recommendations: a list of items the user will like the
most.
20 of 21
Thank you!
Questions?
Fabio Petroni
Rome, Italy
Current position:
PhD Student in Computer Engineering, Sapienza University of Rome
Research Areas:
Recommendation Systems, Collaborative Filtering, Distributed Systems
petroni@dis.uniroma1.it
21 of 21

Más contenido relacionado

La actualidad más candente

Bivariatealgebraic integerencoded arai algorithm for
Bivariatealgebraic integerencoded arai algorithm forBivariatealgebraic integerencoded arai algorithm for
Bivariatealgebraic integerencoded arai algorithm foreSAT Publishing House
 
IRJET- Comparison for Max-Flow Min-Cut Algorithms for Optimal Assignment Problem
IRJET- Comparison for Max-Flow Min-Cut Algorithms for Optimal Assignment ProblemIRJET- Comparison for Max-Flow Min-Cut Algorithms for Optimal Assignment Problem
IRJET- Comparison for Max-Flow Min-Cut Algorithms for Optimal Assignment ProblemIRJET Journal
 
The Gaussian Process Latent Variable Model (GPLVM)
The Gaussian Process Latent Variable Model (GPLVM)The Gaussian Process Latent Variable Model (GPLVM)
The Gaussian Process Latent Variable Model (GPLVM)James McMurray
 
IRJET- Crowd Density Estimation using Novel Feature Descriptor
IRJET- Crowd Density Estimation using Novel Feature DescriptorIRJET- Crowd Density Estimation using Novel Feature Descriptor
IRJET- Crowd Density Estimation using Novel Feature DescriptorIRJET Journal
 
CFM Challenge - Course Project
CFM Challenge - Course ProjectCFM Challenge - Course Project
CFM Challenge - Course ProjectKhalilBergaoui
 
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLAB
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLABFAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLAB
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLABJournal For Research
 
Size measurement and estimation
Size measurement and estimationSize measurement and estimation
Size measurement and estimationLouis A. Poulin
 
Paper id 37201520
Paper id 37201520Paper id 37201520
Paper id 37201520IJRAT
 
Traffic sign classification
Traffic sign classificationTraffic sign classification
Traffic sign classificationBill Kromydas
 
Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)Sangamesh Ragate
 
Sigmod11 outsource shortest path
Sigmod11 outsource shortest pathSigmod11 outsource shortest path
Sigmod11 outsource shortest pathredhatdb
 

La actualidad más candente (20)

Bivariatealgebraic integerencoded arai algorithm for
Bivariatealgebraic integerencoded arai algorithm forBivariatealgebraic integerencoded arai algorithm for
Bivariatealgebraic integerencoded arai algorithm for
 
IRJET- Comparison for Max-Flow Min-Cut Algorithms for Optimal Assignment Problem
IRJET- Comparison for Max-Flow Min-Cut Algorithms for Optimal Assignment ProblemIRJET- Comparison for Max-Flow Min-Cut Algorithms for Optimal Assignment Problem
IRJET- Comparison for Max-Flow Min-Cut Algorithms for Optimal Assignment Problem
 
The Gaussian Process Latent Variable Model (GPLVM)
The Gaussian Process Latent Variable Model (GPLVM)The Gaussian Process Latent Variable Model (GPLVM)
The Gaussian Process Latent Variable Model (GPLVM)
 
IRJET- Crowd Density Estimation using Novel Feature Descriptor
IRJET- Crowd Density Estimation using Novel Feature DescriptorIRJET- Crowd Density Estimation using Novel Feature Descriptor
IRJET- Crowd Density Estimation using Novel Feature Descriptor
 
Fourier Transform Assignment Help
Fourier Transform Assignment HelpFourier Transform Assignment Help
Fourier Transform Assignment Help
 
Rn d presentation_gurulingannk
Rn d presentation_gurulingannkRn d presentation_gurulingannk
Rn d presentation_gurulingannk
 
03 hdrf presentation
03 hdrf presentation03 hdrf presentation
03 hdrf presentation
 
Neural Style Transfer in practice
Neural Style Transfer in practiceNeural Style Transfer in practice
Neural Style Transfer in practice
 
CFM Challenge - Course Project
CFM Challenge - Course ProjectCFM Challenge - Course Project
CFM Challenge - Course Project
 
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLAB
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLABFAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLAB
FAST AND EFFICIENT IMAGE COMPRESSION BASED ON PARALLEL COMPUTING USING MATLAB
 
Size measurement and estimation
Size measurement and estimationSize measurement and estimation
Size measurement and estimation
 
Paper id 37201520
Paper id 37201520Paper id 37201520
Paper id 37201520
 
Unit 3
Unit 3Unit 3
Unit 3
 
Slide1
Slide1Slide1
Slide1
 
Traffic sign classification
Traffic sign classificationTraffic sign classification
Traffic sign classification
 
Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)Colfax-Winograd-Summary _final (1)
Colfax-Winograd-Summary _final (1)
 
Sigmod11 outsource shortest path
Sigmod11 outsource shortest pathSigmod11 outsource shortest path
Sigmod11 outsource shortest path
 
CLIM Program: Remote Sensing Workshop, Statistical Emulation with Dimension R...
CLIM Program: Remote Sensing Workshop, Statistical Emulation with Dimension R...CLIM Program: Remote Sensing Workshop, Statistical Emulation with Dimension R...
CLIM Program: Remote Sensing Workshop, Statistical Emulation with Dimension R...
 
Computation Assignment Help
Computation Assignment Help Computation Assignment Help
Computation Assignment Help
 
www.ijerd.com
www.ijerd.comwww.ijerd.com
www.ijerd.com
 

Similar a LCBM: Statistics-Based Parallel Collaborative Filtering

Collaborative Filtering Survey
Collaborative Filtering SurveyCollaborative Filtering Survey
Collaborative Filtering Surveymobilizer1000
 
Dimensionality Reduction and feature extraction.pptx
Dimensionality Reduction and feature extraction.pptxDimensionality Reduction and feature extraction.pptx
Dimensionality Reduction and feature extraction.pptxSivam Chinna
 
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...Daniel Valcarce
 
Survey of Recommendation Systems
Survey of Recommendation SystemsSurvey of Recommendation Systems
Survey of Recommendation Systemsyoualab
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringAllenWu
 
[CIKM 2014] Deviation-Based Contextual SLIM Recommenders
[CIKM 2014] Deviation-Based Contextual SLIM Recommenders[CIKM 2014] Deviation-Based Contextual SLIM Recommenders
[CIKM 2014] Deviation-Based Contextual SLIM RecommendersYONG ZHENG
 
Probabilistic Collaborative Filtering with Negative Cross Entropy
Probabilistic Collaborative Filtering with Negative Cross EntropyProbabilistic Collaborative Filtering with Negative Cross Entropy
Probabilistic Collaborative Filtering with Negative Cross EntropyAlejandro Bellogin
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
fmelleHumanActivityRecognitionWithMobileSensors
fmelleHumanActivityRecognitionWithMobileSensorsfmelleHumanActivityRecognitionWithMobileSensors
fmelleHumanActivityRecognitionWithMobileSensorsFridtjof Melle
 
Study on Application of Ensemble learning on Credit Scoring
Study on Application of Ensemble learning on Credit ScoringStudy on Application of Ensemble learning on Credit Scoring
Study on Application of Ensemble learning on Credit Scoringharmonylab
 
A Validation of Object-Oriented Design Metrics as Quality Indicators
A Validation of Object-Oriented Design Metrics as Quality IndicatorsA Validation of Object-Oriented Design Metrics as Quality Indicators
A Validation of Object-Oriented Design Metrics as Quality Indicatorsvie_dels
 
An enhanced kernel weighted collaborative recommended system to alleviate spa...
An enhanced kernel weighted collaborative recommended system to alleviate spa...An enhanced kernel weighted collaborative recommended system to alleviate spa...
An enhanced kernel weighted collaborative recommended system to alleviate spa...IJECEIAES
 
AMAZON STOCK PRICE PREDICTION BY USING SMLT
AMAZON STOCK PRICE PREDICTION BY USING SMLTAMAZON STOCK PRICE PREDICTION BY USING SMLT
AMAZON STOCK PRICE PREDICTION BY USING SMLTIRJET Journal
 
Application of combined support vector machines in process fault diagnosis
Application of combined support vector machines in process fault diagnosisApplication of combined support vector machines in process fault diagnosis
Application of combined support vector machines in process fault diagnosisDr.Pooja Jain
 
2cee Master Cocomo20071
2cee Master Cocomo200712cee Master Cocomo20071
2cee Master Cocomo20071CS, NcState
 
Mining Large Streams of User Data for PersonalizedRecommenda.docx
Mining Large Streams of User Data for PersonalizedRecommenda.docxMining Large Streams of User Data for PersonalizedRecommenda.docx
Mining Large Streams of User Data for PersonalizedRecommenda.docxARIV4
 
Click Model-Based Information Retrieval Metrics
Click Model-Based Information Retrieval MetricsClick Model-Based Information Retrieval Metrics
Click Model-Based Information Retrieval MetricsAleksandr Chuklin
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
Tutorial: Context In Recommender Systems
Tutorial: Context In Recommender SystemsTutorial: Context In Recommender Systems
Tutorial: Context In Recommender SystemsYONG ZHENG
 

Similar a LCBM: Statistics-Based Parallel Collaborative Filtering (20)

Collaborative Filtering Survey
Collaborative Filtering SurveyCollaborative Filtering Survey
Collaborative Filtering Survey
 
Dimensionality Reduction and feature extraction.pptx
Dimensionality Reduction and feature extraction.pptxDimensionality Reduction and feature extraction.pptx
Dimensionality Reduction and feature extraction.pptx
 
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...
Efficient Pseudo-Relevance Feedback Methods for Collaborative Filtering Recom...
 
Survey of Recommendation Systems
Survey of Recommendation SystemsSurvey of Recommendation Systems
Survey of Recommendation Systems
 
A scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clusteringA scalable collaborative filtering framework based on co clustering
A scalable collaborative filtering framework based on co clustering
 
[CIKM 2014] Deviation-Based Contextual SLIM Recommenders
[CIKM 2014] Deviation-Based Contextual SLIM Recommenders[CIKM 2014] Deviation-Based Contextual SLIM Recommenders
[CIKM 2014] Deviation-Based Contextual SLIM Recommenders
 
Probabilistic Collaborative Filtering with Negative Cross Entropy
Probabilistic Collaborative Filtering with Negative Cross EntropyProbabilistic Collaborative Filtering with Negative Cross Entropy
Probabilistic Collaborative Filtering with Negative Cross Entropy
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
fmelleHumanActivityRecognitionWithMobileSensors
fmelleHumanActivityRecognitionWithMobileSensorsfmelleHumanActivityRecognitionWithMobileSensors
fmelleHumanActivityRecognitionWithMobileSensors
 
Study on Application of Ensemble learning on Credit Scoring
Study on Application of Ensemble learning on Credit ScoringStudy on Application of Ensemble learning on Credit Scoring
Study on Application of Ensemble learning on Credit Scoring
 
A Validation of Object-Oriented Design Metrics as Quality Indicators
A Validation of Object-Oriented Design Metrics as Quality IndicatorsA Validation of Object-Oriented Design Metrics as Quality Indicators
A Validation of Object-Oriented Design Metrics as Quality Indicators
 
An enhanced kernel weighted collaborative recommended system to alleviate spa...
An enhanced kernel weighted collaborative recommended system to alleviate spa...An enhanced kernel weighted collaborative recommended system to alleviate spa...
An enhanced kernel weighted collaborative recommended system to alleviate spa...
 
AMAZON STOCK PRICE PREDICTION BY USING SMLT
AMAZON STOCK PRICE PREDICTION BY USING SMLTAMAZON STOCK PRICE PREDICTION BY USING SMLT
AMAZON STOCK PRICE PREDICTION BY USING SMLT
 
Application of combined support vector machines in process fault diagnosis
Application of combined support vector machines in process fault diagnosisApplication of combined support vector machines in process fault diagnosis
Application of combined support vector machines in process fault diagnosis
 
2cee Master Cocomo20071
2cee Master Cocomo200712cee Master Cocomo20071
2cee Master Cocomo20071
 
Mining Large Streams of User Data for PersonalizedRecommenda.docx
Mining Large Streams of User Data for PersonalizedRecommenda.docxMining Large Streams of User Data for PersonalizedRecommenda.docx
Mining Large Streams of User Data for PersonalizedRecommenda.docx
 
Click Model-Based Information Retrieval Metrics
Click Model-Based Information Retrieval MetricsClick Model-Based Information Retrieval Metrics
Click Model-Based Information Retrieval Metrics
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
Dynamic Symbolic Database Application Testing
Dynamic Symbolic Database Application TestingDynamic Symbolic Database Application Testing
Dynamic Symbolic Database Application Testing
 
Tutorial: Context In Recommender Systems
Tutorial: Context In Recommender SystemsTutorial: Context In Recommender Systems
Tutorial: Context In Recommender Systems
 

Último

Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Dr.Costas Sachpazis
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...ranjana rawat
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxAsutosh Ranjan
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 

Último (20)

Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
Sheet Pile Wall Design and Construction: A Practical Guide for Civil Engineer...
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
(SHREYA) Chakan Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Esc...
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 

LCBM: Statistics-Based Parallel Collaborative Filtering

  • 1. LCBM: Statistics-based Parallel Collaborative Filtering Fabio Petroni 1, Leonardo Querzoni 1, Roberto Beraldi 1, Mario Paolucci 2 Cyber Intelligence and information Security CIS Sapienza 1 Department of Computer Control and Management Engineering Antonio Ruberti, Sapienza University of Rome - petroni|querzoni|beraldi@dis.uniroma1.it 2 Institute of Cognitive Sciences and Technologies, CNR - mario.paolucci@istc.cnr.it Larnaca, Cyprus 22-23 May, 2014
  • 2. Personalization I Most of todays internet businesses deeply root their success in the ability to provide users with strongly personalized experiences. I Recommender Systems: are a subclass of information filtering system that seek to predict the rating or preference that user would give to an item. 2 of 21
  • 3. Collaborative Filtering I Collaborative Filtering (CF) represents today’s a widely adopted strategy to build recommendation engines. I CF analyzes the known preferences of a group of users to make predictions of the unknown preferences for other users. Filtering filter the user data stream satisfying user personal interests Discovery recommends new content to the user that matches his interests 3 of 21
  • 4. Challenges I Key factors for recommendation e↵ectiveness: amount of data available larger data-sets improve quality more than better algorithms timeliness timely deliver output to users constitutes a business advantage 7 Most CF algorithms typically need to go through a very lengthy training stage, incurring in prohibitive computational costs and large time-to-prediction intervals when applied on large data sets. 7 This lack of e ciency is going to quickly limit the applicability of these solutions at the current rates of data production growth. 4 of 21
  • 5. Taxonomy Of CF Algorithms MEMORY- BASED MODEL- BASED Collaborative Filtering neighborhood models user-oriented item-oriented Pearson Correlation Similarity Cosine Similarity cluster models Bayesian networks linear regression LATENT FACTORS MODELS pLSA neural networks Latent Dirichlet Allocation MATRIX FACTORIZATION ALS SGD SVD 5 of 21
  • 6. Memory-Based CF Algorithms I Operate on the entire set of ratings to generate predictions: B compute a similarity score, which reflects the distance between two users or two items. - Pearson Correlation Similarity - Cosine Similarity B Predictions are computed by looking at the ratings of the k most similar users or items (k-nearest neighbor). 3 Simple design and implementation; 3 used in a lot of real-world systems. 7 Impose several scalability limitations; 7 their use is impractical when dealing with large amounts of data. 6 of 21
  • 7. Model-Based CF Algorithms I Use the set of available ratings to estimate or learn a model and then apply this model to make rating predictions. I The most successful are based on matrix factorization models. 7 of 21
  • 8. Matrix Factorization 1/2 LOSS(P, Q) = X (Rij Pi Qj )2 I The most popular techniques to minimize LOSS are Alternating Least Squares (ALS) and Stochastic Gradient Descent (SGD). ALS ALS alternates between keeping P and Q fixed, solving from time to time a least-squares problem. SGD SGD works by taking steps proportional to the negative of the gradient of the LOSS function. I Both algorithms need several passes through the set of ratings. 8 of 21
  • 9. Matrix Factorization 2/2 3 Provide high quality results; 3 SGD was chosen by the top three solutions of KDD-Cup 2011; 3 there exist parallel and distributed implementations. 7 High computational costs; 7 large time-to-prediction intervals; 7 especially when applied on large data sets. SGD ALS LCBM training time O(X · K · E) ⌦(K3 (N + M) + K2 X) O(X) prediction time O(K) O(K) O(1) memory usage O(K(N + M)) O(M2 + X) O(N + M) In the table X is the number of collected ratings, K the number of hidden features, E the iterations, N and M the number of users and items respectively. 9 of 21
  • 10. Contributions In this paper we present: LCBM: Linear Classifier of Beta distributions Means a novel collaborative filtering algorithm for binary ratings that: 3 is inherently parallelizable; 3 provides results whose quality is on-par with state-of-the-art solutions; 3 in shorter time; 3 using less computational resources. 10 of 21
  • 11. Overview LCBM Working phase ratings Training phase Item rating PDF inferenceratings per item User profiler ratings per user standard error mean > predictions threshold I We consider: B items as elements whose tendency to be rated positively/negatively can be statistically characterized using a probability density function; B users as entities with di↵erent tastes that rate the same items using di↵erent criteria and that must thus be profiled; I The algorithm needs two passes over the set of available ratings to train the model (build users and items profiles). 11 of 21
  • 12. Item Rating PDF Inference I Beta distribution: B X = the relative frequency of positive votes that the item will receive. B ↵ = OK + 1, = KO + 1 Item profile MEAN = ↵ ↵+ = OK+1 OK+KO+2 SE = 1 OK+KO+2 q (OK+1)(KO+1) (OK+KO)(OK+KO+3) I Chebyshev’s inequality: Pr(MEAN SE  X  MEAN+ SE) 1 1 2 0 0.5 1 1.5 2 2.5 3 0 0.2 0.4 0.6 0.8 1 PDF X I Example: B OK = 8, KO = 3 B MEAN ⇡ 0.7 B SE ⇡ 0.04 – Pr(0.62  X  0.78) 0.75 12 of 21
  • 13. User Profiler 0.1 0.2 0.25 0.3 0.35 0.4 0.45 0.5 0.55 0.65 0.7 0.8 0.85 discriminant function OK errors KO errors I Every rating is represented by a point with two attributes: B a boolean value containing the rating (OK or KO). B a key 2 [0, 1] that gives the point’s rank in the order; I if value = OK ! key = MEAN + SE I if value = KO ! key = MEAN SE User profile Quality threshold (QT) = the discriminant function of a 1D linear classifier that separates the points with a minimal number of errors. 13 of 21
  • 14. Working Phase LCBM Working phase ratings Training phase Item rating PDF inferenceratings per item User profiler ratings per user standard error mean > predictions threshold I In order to produce a prediction the algorithm simply compares QT values from the user profiler with MEAN values from the item’s rating PDF inference block: B if MEAN > QT ! prediction = OK B if MEAN  QT ! prediction = KO 14 of 21
  • 15. Experiments: Setting and The Data Sets Dataset MovieLens 100k MovieLens 10M Netflix users 943 69878 480189 items 1682 10677 17770 ratings 100000 10000054 100480507 I 5-fold cross-validation approach: B This technique divides the dataset in 5 folds and then uses in turn one of the folds as test set and the remaining ones as training set. I We compared LCBM against CF implementations in Apache Mahout: B ParallelSGDFactorizer: lock-free and parallel implementation of SGD B ALSWRFactorizer: ALS with Weighted- -Regularization 15 of 21
  • 16. Experiments: Performance Metrics I Confusion matrix: true positives (TP) false negatives (FN) false positives (FP) true negatives (TN) I Matthews correlation coe cient (MCC) 2 [ 1, 1] MCC = TP ⇥ TN FP ⇥ FN p (TP + FP) ⇥ (TP + FN) ⇥ (TN + FP) ⇥ (TN + FN) B 1 ! perfect prediction; B 0 ! no better than random prediction; B 1 ! total disagreement between prediction and observation. I time needed to run the test; I the peak memory load during the test. 16 of 21
  • 17. Results: Accuracy 0 0.1 0.2 0.3 0.4 0.5 0.6 MovieLens (105 ) MovieLens (107 ) Netflix (108 ) MCC number of ratings LCBM SGD K=256 SGD K=32 SGD K=8 ALS 0 0.1 0.2 0.3 0.4 0.5 0.6 MovieLens (105 ) MovieLens (107 ) Netflix (108 ) MCC number of ratings LCBM SGD K=256 SGD K=32 SGD K=8 ALS 17 of 21
  • 18. Results: Computational Resources 0 2 4 6 8 10 12 14 16 18 20 MovieLens (105 ) MovieLens (107 ) Netflix (108 ) memory(GB) number of ratings LCBM SGD K=256 SGD K=32 SGD K=8 ALS 0 2 4 6 8 10 12 14 16 18 20 MovieLens (105 ) MovieLens (107 ) Netflix (108 ) memory(GB) number of ratings LCBM SGD K=256 SGD K=32 SGD K=8 ALS 0 0.2 Zoom 0 0.2 Zoom 18 of 21
  • 19. Results: Timeliness 0 500 1000 1500 2000 2500 3000 3500 4000 4500 MovieLens (105 ) MovieLens (107 ) Netflix (108 ) time(s) number of ratings LCBM SGD K=256 SGD K=32 SGD K=8 ALS 0 500 1000 1500 2000 2500 3000 3500 4000 4500 MovieLens (105 ) MovieLens (107 ) Netflix (108 ) time(s) number of ratings LCBM SGD K=256 SGD K=32 SGD K=8 ALS 0 10 Zoom 0 10 Zoom 19 of 21
  • 20. Conclusions I LCBM is a novel CF algorithm with binary ratings. I It works by analyzing collected ratings to: B infer a probability density function of the relative frequency of positive votes that the item will receive; B compute a personalized quality threshold for each user. I These two information are used to predict missing ratings. I LCBM is inherently parallelizable. Future works: I Handle multidimensional ratings. I Provide recommendations: a list of items the user will like the most. 20 of 21
  • 21. Thank you! Questions? Fabio Petroni Rome, Italy Current position: PhD Student in Computer Engineering, Sapienza University of Rome Research Areas: Recommendation Systems, Collaborative Filtering, Distributed Systems petroni@dis.uniroma1.it 21 of 21