Rus M. Mesas, Alejandro Bellogín
Universidad Autónoma de Madrid
Spain
RecSys, August 2017
Evaluating Decision-Aware
Recommender Systems
Alejandro Bellogín – RecSys, August 2017
Main idea
▪ How to balance coverage and precision
Method  Precision  Coverage  Best?
R1      0.093      100%      ✓
R2      0.094      97.8%

Method  Precision  Coverage  Best?
R1      0.037      100%
R2      0.133      100%
R3      0.245      99.7%     ✓

Method  Precision  Coverage  Best?
R1      0.093      100%
R2      0.181      95.6%     ?
R3      0.283      59.0%     ?
R4      0.326      28.2%     ?
▪ To force different coverage levels, we allow
recommenders to decide if a recommendation is
worthy of being presented to the user or not
Balancing coverage and precision
▪ [Herlocker et al. 2004]: “there is no general
coverage metric that, at the same time, gives more
weight to relevant items when accounting for
coverage, and combines coverage and accuracy
measures”
▪ [Gunawardana & Shani 2015] leave the problem of
balancing coverage and precision as an open issue
in the area
Combination metrics
Our proposal: Correctness metric
▪ Adapted from Question Answering:
• Several questions to be answered by a system
• Each question has several options
• Only one option is correct
• If no answer is given, it should not be counted as an
incorrect answer
• Hence, if two systems have the same number of correct
answers but one has failed fewer questions (because it
decided not to respond), it should be considered better
than the other one
A. Peñas & Á. Rodrigo. 2011. A simple measure to assess non response. ACL.
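The measure cited above from Peñas & Rodrigo (2011) is usually written as c@1 = (n_R + n_U · n_R / n) / n, where n_R is the number of correct answers, n_U the number of unanswered questions, and n the total number of questions. A minimal sketch of that formula follows; the function and argument names are illustrative assumptions, not taken from the slides.

```python
def c_at_1(n_correct: int, n_unanswered: int, n_total: int) -> float:
    """c@1 for question answering: unanswered questions are credited
    with the accuracy observed over all questions, instead of being
    counted as wrong answers."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_correct < 0 or n_unanswered < 0:
        raise ValueError("counts must be non-negative")
    if n_correct + n_unanswered > n_total:
        raise ValueError("correct + unanswered cannot exceed n_total")
    accuracy = n_correct / n_total
    return (n_correct + n_unanswered * accuracy) / n_total


# Two hypothetical systems with the same number of correct answers.
# System A answers every question and gets 6 of them wrong.
system_a = c_at_1(n_correct=4, n_unanswered=0, n_total=10)
# System B gets 3 wrong and declines to answer the other 3.
system_b = c_at_1(n_correct=4, n_unanswered=3, n_total=10)
```

With no unanswered questions the measure reduces to plain accuracy, so system A scores 4/10, while system B is rewarded for having abstained instead of failing.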
Correctness metric for recommendation
▪ Each recommendation algorithm is a system
▪ Each candidate item to be ranked is a question
▪ If an item is recommended, it could be relevant or
not
▪ The same set of items is presented to each system
[Table: recommended list, Precision@5, Correctness]
Correctness metrics for recommendation
▪ Four instantiations:
• Based on users
• Based on items
What about the decision-aware
recommenders?
Decision-aware recommender systems
▪ Exploiting the confidence a system has in its own
recommendations
▪ Not completely new
• Significance weighting
• Support and confidence in case-based recommenders
▪ Focus on Collaborative Filtering algorithms
• Support of prediction score of nearest-neighbour
methods
• Uncertainty in prediction score of a probabilistic matrix
factorisation algorithm
Estimating confidence in
decision-aware recommendation
▪ For user-based KNN
• Have at least n (out of k) neighbours participated in
the rating estimation?
▪ For probabilistic MF
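The user-based KNN check (at least n of the k neighbours must have contributed to the rating estimate) can be sketched as follows. This is only an illustration under stated assumptions: the similarity-weighted average, the data layout, and every name in the code are hypothetical and not taken from the slides or from RankSys.

```python
from typing import Dict, List, Optional, Tuple


def predict_with_support(
    neighbours: List[Tuple[str, float]],    # (neighbour id, similarity), the k nearest
    ratings: Dict[str, Dict[str, float]],   # user id -> {item id: rating}
    item: str,
    min_support: int,                       # the "n" in "at least n out of k"
) -> Optional[float]:
    """Return a predicted rating, or None when the recommender decides
    not to recommend because too few neighbours rated the item."""
    contributing = [
        (sim, ratings[user][item])
        for user, sim in neighbours
        if item in ratings.get(user, {})
    ]
    if len(contributing) < min_support:
        return None  # abstain: the prediction has insufficient support
    weight = sum(abs(sim) for sim, _ in contributing)
    if weight == 0:
        return None  # abstain: no usable similarity mass
    # Assumed aggregation: similarity-weighted average of neighbour ratings.
    return sum(sim * rating for sim, rating in contributing) / weight


neighbours = [("a", 0.9), ("b", 0.5), ("c", 0.2)]
ratings = {"a": {"i1": 5.0, "i2": 4.0}, "b": {"i1": 3.0}, "c": {}}
supported = predict_with_support(neighbours, ratings, "i1", min_support=2)
abstained = predict_with_support(neighbours, ratings, "i2", min_support=2)
```

Here item i1 is rated by two neighbours and receives a score, whereas i2 is rated by only one and is withheld; raising `min_support` trades coverage for better-supported predictions.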
Experimental setup
▪ Datasets
• MovieLens 100K, MovieLens 1M, Jester
• Random 5-fold training/test split
▪ Evaluation
• Generate a ranking with every item in the test set
• Metrics at cutoff 10: precision (P), user space coverage
(USC), item space coverage (ISC), correctness (UC, RUC,
IC, RIC), novelty (EPC), diversity (AggrDiv)
▪ Frameworks
• RankSys: evaluation metrics, KNN recommenders
• RiVal: data splitting
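To make two of the metrics listed above concrete, here is a minimal sketch of precision at a cutoff and of user space coverage. It assumes that precision divides by the cutoff even when a decision-aware recommender returns fewer items, and that USC is the fraction of users who receive at least one recommendation; both readings are assumptions, and the names are illustrative rather than the RankSys or RiVal API.

```python
from typing import Dict, List, Set


def precision_at(recommended: List[str], relevant: Set[str], cutoff: int = 10) -> float:
    """Fraction of the top-`cutoff` slots filled with relevant items.
    Slots left empty by the recommender count as misses (assumption)."""
    if cutoff <= 0:
        raise ValueError("cutoff must be positive")
    hits = sum(1 for item in recommended[:cutoff] if item in relevant)
    return hits / cutoff


def user_space_coverage(recs_by_user: Dict[str, List[str]], all_users: List[str]) -> float:
    """Fraction of users who receive at least one recommendation (assumption)."""
    if not all_users:
        raise ValueError("all_users must not be empty")
    covered = sum(1 for user in all_users if recs_by_user.get(user))
    return covered / len(all_users)


# Hypothetical output of a decision-aware recommender: u2 gets nothing.
recs = {"u1": ["a", "b", "c"], "u2": [], "u3": ["d"]}
p_u1 = precision_at(recs["u1"], relevant={"a", "c"}, cutoff=5)
usc = user_space_coverage(recs, all_users=["u1", "u2", "u3"])
```

Under these assumptions a recommender that withholds items is penalised in precision for the empty slots and in coverage for the users left without any list, which is the tension the correctness metrics are meant to address.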
Performance: prediction uncertainty
Impact on novelty and diversity
▪ Prediction uncertainty
• Stricter constraints (smaller uncertainty) decrease
novelty and diversity
Conclusions
▪ We have proposed a family of metrics based on
the assumption that it is better to avoid a
recommendation than to provide a bad one
▪ We have shown that a balance between precision,
coverage, diversity, and novelty is critical
▪ We have proposed two strategies to decide if an
item should be presented to the user
Future work
▪ Extend the correctness metrics to combine other
evaluation dimensions
▪ Objective way to discriminate between systems:
which one is really the best one?
▪ Consider the psychological aspect of the
recommendation: the user is expecting to receive
N recommendations (better bad than none?)
Thank you
Performance: prediction support
Impact on novelty and diversity
▪ Prediction support
• Larger n decreases the
diversity and novelty of
the lists
• More popular items are
being recommended
Motivation
▪ Typical evaluation: it is better to fail than to avoid
a recommendation
• Assumption: not returning an item is equivalent to
considering that item as not relevant
▪ In this work: a recommender system may decide
not to recommend a specific item
• We need a metric where “no recommendation” means
neither relevant nor not relevant. If possible, it should
mean “better than not relevant”
Definition of uncertainty for PMF
▪ PMF: probabilistic matrix factorisation using a
Bayesian approximation proposed in [Lim & Teh
2007]
▪ The standard deviation is derived using mean-field
variational inference:

More Related Content

Similar to Evaluating decision-aware recommender systems

Behind The Scenes Data Science Coolblue 2018-03-22
Behind The Scenes Data Science Coolblue 2018-03-22Behind The Scenes Data Science Coolblue 2018-03-22
Behind The Scenes Data Science Coolblue 2018-03-22Matthias Schuurmans
 
JU Analytics Day Presentation by Naveen Agarwal, Creative Analytics Solutions...
JU Analytics Day Presentation by Naveen Agarwal, Creative Analytics Solutions...JU Analytics Day Presentation by Naveen Agarwal, Creative Analytics Solutions...
JU Analytics Day Presentation by Naveen Agarwal, Creative Analytics Solutions...Naveen Agarwal
 
week_10._validity_and_reliability_0.pptx
week_10._validity_and_reliability_0.pptxweek_10._validity_and_reliability_0.pptx
week_10._validity_and_reliability_0.pptxDebdattaMandal3
 
QA_Chapter_01_Dr_B_Dayal_Overview.pptx
QA_Chapter_01_Dr_B_Dayal_Overview.pptxQA_Chapter_01_Dr_B_Dayal_Overview.pptx
QA_Chapter_01_Dr_B_Dayal_Overview.pptxTeshome62
 
925 plenary rexer_using our laptop
925 plenary rexer_using our laptop925 plenary rexer_using our laptop
925 plenary rexer_using our laptopRising Media, Inc.
 
Psychometrics 101: Know What Your Assessment Data is Telling You
Psychometrics 101: Know What Your Assessment Data is Telling YouPsychometrics 101: Know What Your Assessment Data is Telling You
Psychometrics 101: Know What Your Assessment Data is Telling YouExamSoft
 
Project excursion career_orientation
Project excursion career_orientationProject excursion career_orientation
Project excursion career_orientationMallikarjuna G D
 
How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?Ganes Kesari
 
Systems thinking for social innovators
Systems thinking for social innovatorsSystems thinking for social innovators
Systems thinking for social innovatorssl2square
 
Data Quality: The Data Science struggle nobody mentions - Data Science MeetUp...
Data Quality: The Data Science struggle nobody mentions - Data Science MeetUp...Data Quality: The Data Science struggle nobody mentions - Data Science MeetUp...
Data Quality: The Data Science struggle nobody mentions - Data Science MeetUp...University of Twente
 
1555 track 1 huang_using his mac
1555 track 1 huang_using his mac1555 track 1 huang_using his mac
1555 track 1 huang_using his macRising Media, Inc.
 
SharePoint "Moneyball" - The Art and Science of Winning the SharePoint Metric...
SharePoint "Moneyball" - The Art and Science of Winning the SharePoint Metric...SharePoint "Moneyball" - The Art and Science of Winning the SharePoint Metric...
SharePoint "Moneyball" - The Art and Science of Winning the SharePoint Metric...Susan Hanley
 
Practice question how to address substance abuse disorders da
 Practice question how to address substance abuse disorders da Practice question how to address substance abuse disorders da
Practice question how to address substance abuse disorders daUMAR48665
 
Better Living Through Analytics - Strategies for Data Decisions
Better Living Through Analytics - Strategies for Data DecisionsBetter Living Through Analytics - Strategies for Data Decisions
Better Living Through Analytics - Strategies for Data DecisionsProduct School
 
BSAD 310 Spring 2017 - CH 6
BSAD 310 Spring 2017 - CH 6BSAD 310 Spring 2017 - CH 6
BSAD 310 Spring 2017 - CH 6Janice Robinson
 
ARPA-E Evaluation 2017
ARPA-E Evaluation 2017ARPA-E Evaluation 2017
ARPA-E Evaluation 2017Tim Kirby
 
Leveraging Analytics to Deliver Value
Leveraging Analytics to Deliver ValueLeveraging Analytics to Deliver Value
Leveraging Analytics to Deliver ValueMarketo
 

Similar to Evaluating decision-aware recommender systems (20)

Estimating what does good look like
Estimating   what does good look likeEstimating   what does good look like
Estimating what does good look like
 
Behind The Scenes Data Science Coolblue 2018-03-22
Behind The Scenes Data Science Coolblue 2018-03-22Behind The Scenes Data Science Coolblue 2018-03-22
Behind The Scenes Data Science Coolblue 2018-03-22
 
JU Analytics Day Presentation by Naveen Agarwal, Creative Analytics Solutions...
JU Analytics Day Presentation by Naveen Agarwal, Creative Analytics Solutions...JU Analytics Day Presentation by Naveen Agarwal, Creative Analytics Solutions...
JU Analytics Day Presentation by Naveen Agarwal, Creative Analytics Solutions...
 
week_10._validity_and_reliability_0.pptx
week_10._validity_and_reliability_0.pptxweek_10._validity_and_reliability_0.pptx
week_10._validity_and_reliability_0.pptx
 
QA_Chapter_01_Dr_B_Dayal_Overview.pptx
QA_Chapter_01_Dr_B_Dayal_Overview.pptxQA_Chapter_01_Dr_B_Dayal_Overview.pptx
QA_Chapter_01_Dr_B_Dayal_Overview.pptx
 
925 plenary rexer_using our laptop
925 plenary rexer_using our laptop925 plenary rexer_using our laptop
925 plenary rexer_using our laptop
 
HM404 Ab120916 ch12
HM404 Ab120916 ch12HM404 Ab120916 ch12
HM404 Ab120916 ch12
 
Psychometrics 101: Know What Your Assessment Data is Telling You
Psychometrics 101: Know What Your Assessment Data is Telling YouPsychometrics 101: Know What Your Assessment Data is Telling You
Psychometrics 101: Know What Your Assessment Data is Telling You
 
Project excursion career_orientation
Project excursion career_orientationProject excursion career_orientation
Project excursion career_orientation
 
How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?How to Enter the Data Analytics Industry?
How to Enter the Data Analytics Industry?
 
Systems thinking for social innovators
Systems thinking for social innovatorsSystems thinking for social innovators
Systems thinking for social innovators
 
Data Quality: The Data Science struggle nobody mentions - Data Science MeetUp...
Data Quality: The Data Science struggle nobody mentions - Data Science MeetUp...Data Quality: The Data Science struggle nobody mentions - Data Science MeetUp...
Data Quality: The Data Science struggle nobody mentions - Data Science MeetUp...
 
1555 track 1 huang_using his mac
1555 track 1 huang_using his mac1555 track 1 huang_using his mac
1555 track 1 huang_using his mac
 
SharePoint "Moneyball" - The Art and Science of Winning the SharePoint Metric...
SharePoint "Moneyball" - The Art and Science of Winning the SharePoint Metric...SharePoint "Moneyball" - The Art and Science of Winning the SharePoint Metric...
SharePoint "Moneyball" - The Art and Science of Winning the SharePoint Metric...
 
Evia2017wcw
Evia2017wcwEvia2017wcw
Evia2017wcw
 
Practice question how to address substance abuse disorders da
 Practice question how to address substance abuse disorders da Practice question how to address substance abuse disorders da
Practice question how to address substance abuse disorders da
 
Better Living Through Analytics - Strategies for Data Decisions
Better Living Through Analytics - Strategies for Data DecisionsBetter Living Through Analytics - Strategies for Data Decisions
Better Living Through Analytics - Strategies for Data Decisions
 
BSAD 310 Spring 2017 - CH 6
BSAD 310 Spring 2017 - CH 6BSAD 310 Spring 2017 - CH 6
BSAD 310 Spring 2017 - CH 6
 
ARPA-E Evaluation 2017
ARPA-E Evaluation 2017ARPA-E Evaluation 2017
ARPA-E Evaluation 2017
 
Leveraging Analytics to Deliver Value
Leveraging Analytics to Deliver ValueLeveraging Analytics to Deliver Value
Leveraging Analytics to Deliver Value
 

More from Alejandro Bellogin

Recommender Systems and Misinformation: The Problem or the Solution?
Recommender Systems and Misinformation: The Problem or the Solution?Recommender Systems and Misinformation: The Problem or the Solution?
Recommender Systems and Misinformation: The Problem or the Solution?Alejandro Bellogin
 
Replicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsReplicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsAlejandro Bellogin
 
Implicit vs Explicit trust in Social Matrix Factorization
Implicit vs Explicit trust in Social Matrix FactorizationImplicit vs Explicit trust in Social Matrix Factorization
Implicit vs Explicit trust in Social Matrix FactorizationAlejandro Bellogin
 
RiVal - A toolkit to foster reproducibility in Recommender System evaluation
RiVal - A toolkit to foster reproducibility in Recommender System evaluationRiVal - A toolkit to foster reproducibility in Recommender System evaluation
RiVal - A toolkit to foster reproducibility in Recommender System evaluationAlejandro Bellogin
 
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...Alejandro Bellogin
 
CWI @ Contextual Suggestion track - TREC 2013
CWI @ Contextual Suggestion track - TREC 2013CWI @ Contextual Suggestion track - TREC 2013
CWI @ Contextual Suggestion track - TREC 2013Alejandro Bellogin
 
CWI @ Federated Web Track - TREC 2013
CWI @ Federated Web Track - TREC 2013CWI @ Federated Web Track - TREC 2013
CWI @ Federated Web Track - TREC 2013Alejandro Bellogin
 
Probabilistic Collaborative Filtering with Negative Cross Entropy
Probabilistic Collaborative Filtering with Negative Cross EntropyProbabilistic Collaborative Filtering with Negative Cross Entropy
Probabilistic Collaborative Filtering with Negative Cross EntropyAlejandro Bellogin
 
Understanding Similarity Metrics in Neighbour-based Recommender Systems
Understanding Similarity Metrics in Neighbour-based Recommender SystemsUnderstanding Similarity Metrics in Neighbour-based Recommender Systems
Understanding Similarity Metrics in Neighbour-based Recommender SystemsAlejandro Bellogin
 
Artist popularity: do web and social music services agree?
Artist popularity: do web and social music services agree?Artist popularity: do web and social music services agree?
Artist popularity: do web and social music services agree?Alejandro Bellogin
 
Improving Memory-Based Collaborative Filtering by Neighbour Selection based o...
Improving Memory-Based Collaborative Filtering by Neighbour Selection based o...Improving Memory-Based Collaborative Filtering by Neighbour Selection based o...
Improving Memory-Based Collaborative Filtering by Neighbour Selection based o...Alejandro Bellogin
 
Performance prediction and evaluation in Recommender Systems: an Information ...
Performance prediction and evaluation in Recommender Systems: an Information ...Performance prediction and evaluation in Recommender Systems: an Information ...
Performance prediction and evaluation in Recommender Systems: an Information ...Alejandro Bellogin
 
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...Alejandro Bellogin
 
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...Alejandro Bellogin
 
Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...
Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...
Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...Alejandro Bellogin
 
Predicting performance in Recommender Systems - Slides
Predicting performance in Recommender Systems - SlidesPredicting performance in Recommender Systems - Slides
Predicting performance in Recommender Systems - SlidesAlejandro Bellogin
 
Predicting performance in Recommender Systems - Poster slam
Predicting performance in Recommender Systems - Poster slamPredicting performance in Recommender Systems - Poster slam
Predicting performance in Recommender Systems - Poster slamAlejandro Bellogin
 
Predicting performance in Recommender Systems - Poster
Predicting performance in Recommender Systems - PosterPredicting performance in Recommender Systems - Poster
Predicting performance in Recommender Systems - PosterAlejandro Bellogin
 
Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...
Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...
Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...Alejandro Bellogin
 

More from Alejandro Bellogin (19)

Recommender Systems and Misinformation: The Problem or the Solution?
Recommender Systems and Misinformation: The Problem or the Solution?Recommender Systems and Misinformation: The Problem or the Solution?
Recommender Systems and Misinformation: The Problem or the Solution?
 
Replicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender SystemsReplicable Evaluation of Recommender Systems
Replicable Evaluation of Recommender Systems
 
Implicit vs Explicit trust in Social Matrix Factorization
Implicit vs Explicit trust in Social Matrix FactorizationImplicit vs Explicit trust in Social Matrix Factorization
Implicit vs Explicit trust in Social Matrix Factorization
 
RiVal - A toolkit to foster reproducibility in Recommender System evaluation
RiVal - A toolkit to foster reproducibility in Recommender System evaluationRiVal - A toolkit to foster reproducibility in Recommender System evaluation
RiVal - A toolkit to foster reproducibility in Recommender System evaluation
 
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
HT2014 Tutorial: Evaluating Recommender Systems - Ensuring Replicability of E...
 
CWI @ Contextual Suggestion track - TREC 2013
CWI @ Contextual Suggestion track - TREC 2013CWI @ Contextual Suggestion track - TREC 2013
CWI @ Contextual Suggestion track - TREC 2013
 
CWI @ Federated Web Track - TREC 2013
CWI @ Federated Web Track - TREC 2013CWI @ Federated Web Track - TREC 2013
CWI @ Federated Web Track - TREC 2013
 
Probabilistic Collaborative Filtering with Negative Cross Entropy
Probabilistic Collaborative Filtering with Negative Cross EntropyProbabilistic Collaborative Filtering with Negative Cross Entropy
Probabilistic Collaborative Filtering with Negative Cross Entropy
 
Understanding Similarity Metrics in Neighbour-based Recommender Systems
Understanding Similarity Metrics in Neighbour-based Recommender SystemsUnderstanding Similarity Metrics in Neighbour-based Recommender Systems
Understanding Similarity Metrics in Neighbour-based Recommender Systems
 
Artist popularity: do web and social music services agree?
Artist popularity: do web and social music services agree?Artist popularity: do web and social music services agree?
Artist popularity: do web and social music services agree?
 
Improving Memory-Based Collaborative Filtering by Neighbour Selection based o...
Improving Memory-Based Collaborative Filtering by Neighbour Selection based o...Improving Memory-Based Collaborative Filtering by Neighbour Selection based o...
Improving Memory-Based Collaborative Filtering by Neighbour Selection based o...
 
Performance prediction and evaluation in Recommender Systems: an Information ...
Performance prediction and evaluation in Recommender Systems: an Information ...Performance prediction and evaluation in Recommender Systems: an Information ...
Performance prediction and evaluation in Recommender Systems: an Information ...
 
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
 
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
Using Graph Partitioning Techniques for Neighbour Selection in User-Based Col...
 
Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...
Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...
Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...
 
Predicting performance in Recommender Systems - Slides
Predicting performance in Recommender Systems - SlidesPredicting performance in Recommender Systems - Slides
Predicting performance in Recommender Systems - Slides
 
Predicting performance in Recommender Systems - Poster slam
Predicting performance in Recommender Systems - Poster slamPredicting performance in Recommender Systems - Poster slam
Predicting performance in Recommender Systems - Poster slam
 
Predicting performance in Recommender Systems - Poster
Predicting performance in Recommender Systems - PosterPredicting performance in Recommender Systems - Poster
Predicting performance in Recommender Systems - Poster
 
Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...
Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...
Precision-oriented Evaluation of Recommender Systems: An Algorithmic Comparis...
 

Recently uploaded

4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptxmary850239
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1GloryAnnCastre1
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdfMr Bounab Samir
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Association for Project Management
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxkarenfajardo43
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvRicaMaeCastro1
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...DhatriParmar
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Student Profile Sample - We help schools to connect the data they have, with ...
Student Profile Sample - We help schools to connect the data they have, with ...Seán Kennedy
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptxmary850239
 
Multi Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleMulti Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleCeline George
 
Textual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSTextual Evidence in Reading and Writing of SHS
Textual Evidence in Reading and Writing of SHSMae Pangan
 
How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseCeline George
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxDIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptx
DIFFERENT BASKETRY IN THE PHILIPPINES PPT.pptxMichelleTuguinay1
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 

Recently uploaded (20)

4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx4.11.24 Mass Incarceration and the New Jim Crow.pptx
4.11.24 Mass Incarceration and the New Jim Crow.pptx
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1Reading and Writing Skills 11 quarter 4 melc 1
Reading and Writing Skills 11 quarter 4 melc 1
 
MS4 level being good citizen -imperative- (1) (1).pdf
MS4 level   being good citizen -imperative- (1) (1).pdfMS4 level   being good citizen -imperative- (1) (1).pdf
MS4 level being good citizen -imperative- (1) (1).pdf
 
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
Team Lead Succeed – Helping you and your team achieve high-performance teamwo...
 
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptxGrade Three -ELLNA-REVIEWER-ENGLISH.pptx
Grade Three -ELLNA-REVIEWER-ENGLISH.pptx
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
 
Blowin' in the Wind of Caste_ Bob Dylan's Song as a Catalyst for Social Justi...

Evaluating decision-aware recommender systems

  • 1. Rus M. Mesas, Alejandro Bellogín Universidad Autónoma de Madrid Spain RecSys, August 2017 Evaluating Decision-Aware Recommender Systems
  • 2. Main idea
       ▪ How to balance coverage and precision
         Method  Precision  Coverage  Best?
         R1      0.093      100%      ✓
         R2      0.094      97.8%
  • 3. Main idea
       ▪ How to balance coverage and precision
         Method  Precision  Coverage  Best?
         R1      0.037      100%
         R2      0.133      100%
         R3      0.245      99.7%     ✓
  • 4. Main idea
       ▪ How to balance coverage and precision
         Method  Precision  Coverage  Best?
         R1      0.093      100%
         R2      0.181      95.6%     ?
         R3      0.283      59.0%     ?
         R4      0.326      28.2%     ?
  • 5. Main idea
       ▪ How to balance coverage and precision
       ▪ To force different coverage levels, we allow recommenders to decide whether a recommendation is worth presenting to the user
       [Figure: Estimations]
  • 6. Balancing coverage and precision
       ▪ [Herlocker et al 2004]: “there is no general coverage metric that, at the same time, gives more weight to relevant items when accounting for coverage, and combines coverage and accuracy measures”
       ▪ [Gunawardana & Shani 2015] leave the problem of balancing coverage and precision as an open issue in the area
  • 7. Combination metrics
  • 8. Our proposal: Correctness metric
       ▪ Adapted from Question Answering:
         • Several questions to be answered by a system
         • Each question has several options
         • Only one option is correct
         • If an answer is not given, it should not be counted as an incorrect answer
         • Hence, if two systems have the same number of correct answers but one has failed fewer questions (because it decided not to respond), it should be considered better than the other
       A. Peñas & Á. Rodrigo. 2011. A simple measure to assess non response. ACL.
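The idea on slide 8 can be illustrated with the c@1 measure from the cited Peñas & Rodrigo (2011) paper, where each unanswered question earns partial credit equal to the system's overall accuracy. The sketch below is only an illustration of that Question Answering measure; the function and parameter names are hypothetical, and it is not necessarily the exact formula shown on the slides.

```python
def c_at_1(n_correct: int, n_unanswered: int, n_total: int) -> float:
    """c@1 (Peñas & Rodrigo, 2011): unanswered questions receive partial
    credit proportional to the accuracy observed over all questions."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_correct < 0 or n_unanswered < 0 or n_correct + n_unanswered > n_total:
        raise ValueError("counts are inconsistent with n_total")
    accuracy = n_correct / n_total
    return (n_correct + n_unanswered * accuracy) / n_total


# Two systems with the same number of correct answers out of 10 questions.
# System A answers everything (6 wrong); system B abstains on 3 (3 wrong).
score_a = c_at_1(n_correct=4, n_unanswered=0, n_total=10)
score_b = c_at_1(n_correct=4, n_unanswered=3, n_total=10)
print(score_a, score_b)
```

With no abstentions the measure reduces to plain accuracy, so system B scores higher only because it failed fewer questions.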
  • 9. Correctness metric for recommendation
       ▪ Each recommendation algorithm is a system
       ▪ Each candidate item to be ranked is a question
       ▪ If an item is recommended, it could be relevant or not
       ▪ The same set of items is presented to each system
       [Table: Recommended list | Precision@5 | Correctness]
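A minimal sketch of how the mapping on slide 9 might be computed for a single user, assuming a c@1-style formula applied to the top-N list: recommended relevant items count as correct answers, and slots the recommender leaves empty count as unanswered. The use of `None` for an empty slot, the function name, and the exact formula are assumptions made for illustration, not the authors' definitions.

```python
from typing import Optional, Sequence, Set, Tuple


def precision_and_correctness(
    ranked: Sequence[Optional[str]],
    relevant: Set[str],
    cutoff: int = 5,
) -> Tuple[float, float]:
    """Return (precision@cutoff, correctness@cutoff) for one user.

    `None` entries (and missing positions) are slots where the recommender
    decided not to recommend anything (assumed representation).
    """
    top = list(ranked[:cutoff])
    top += [None] * (cutoff - len(top))  # pad short lists with abstentions
    hits = sum(1 for item in top if item is not None and item in relevant)
    abstained = sum(1 for item in top if item is None)
    precision = hits / cutoff
    # c@1-style credit: each empty slot earns the observed hit rate.
    correctness = (hits + abstained * hits / cutoff) / cutoff
    return precision, correctness


relevant_items = {"a", "b"}
full_list = ["a", "b", "c", "d", "e"]          # always fills five slots
cautious_list = ["a", "b", None, None, None]   # abstains on three slots
print(precision_and_correctness(full_list, relevant_items))
print(precision_and_correctness(cautious_list, relevant_items))
```

Both lists have the same Precision@5, but under this reading the cautious list obtains a higher correctness because it avoided three non-relevant recommendations.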
  • 10. Correctness metrics for recommendation
       ▪ Four instantiations:
         • Based on users
         • Based on items
  • 11. What about the decision-aware recommenders?
       [Figure: Estimations]
  • 12. Decision-aware recommender systems
       ▪ Exploiting the confidence a system has in its own recommendations
       ▪ Not completely new
         • Significance weighting
         • Support and confidence in case-based recommenders
       ▪ Focus on Collaborative Filtering algorithms
         • Support of prediction score of nearest-neighbour methods
         • Uncertainty in prediction score of a probabilistic matrix factorisation algorithm
  • 13. Estimating confidence in decision-aware recommendation
       ▪ For user-based KNN: have at least n (out of k) neighbours participated in the rating estimation?
       ▪ For probabilistic MF
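The two decision rules on slides 12 and 13 can be sketched as follows, assuming a similarity-weighted average for user-based KNN and a simple threshold on the predictive standard deviation for probabilistic MF. All names (`predict_with_support`, `predict_with_uncertainty`, `min_support`, `max_std`) and the weighting scheme are hypothetical; returning `None` stands for "no recommendation".

```python
from typing import Dict, List, Optional, Tuple


def predict_with_support(
    neighbours: List[Tuple[str, float]],    # (neighbour id, similarity), the k neighbours
    ratings: Dict[str, Dict[str, float]],   # user id -> {item id: rating}
    item: str,
    min_support: int,                       # the "n" in "at least n out of k"
) -> Optional[float]:
    """Similarity-weighted rating estimate, or None if fewer than
    `min_support` neighbours have rated the item."""
    weighted_sum = 0.0
    weight_total = 0.0
    support = 0
    for neighbour, similarity in neighbours:
        rating = ratings.get(neighbour, {}).get(item)
        if rating is None:
            continue  # this neighbour did not participate in the estimate
        weighted_sum += similarity * rating
        weight_total += abs(similarity)
        support += 1
    if support < min_support or weight_total == 0.0:
        return None  # not confident enough: abstain
    return weighted_sum / weight_total


def predict_with_uncertainty(mean: float, std: float, max_std: float) -> Optional[float]:
    """Keep a probabilistic prediction only if its standard deviation is small enough."""
    return mean if std <= max_std else None


neighbours = [("u1", 0.9), ("u2", 0.5), ("u3", 0.2)]
ratings = {"u1": {"i1": 4.0}, "u2": {"i1": 2.0}, "u3": {}}
print(predict_with_support(neighbours, ratings, "i1", min_support=2))
print(predict_with_support(neighbours, ratings, "i1", min_support=3))
```

Raising `min_support` or lowering `max_std` makes the recommender abstain more often, which is how different coverage levels can be forced.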
  • 14. Experimental setup
       ▪ Datasets
         • MovieLens 100K, MovieLens 1M, Jester
         • Random 5-fold training/test split
       ▪ Evaluation
         • Generate a ranking with every item in the test set
         • Metrics at cutoff 10: precision (P), user space coverage (USC), item space coverage (ISC), correctness (UC, RUC, IC, RIC), novelty (EPC), diversity (AggrDiv)
       ▪ Frameworks
         • RankSys: evaluation metrics, KNN recommenders
         • RiVal: data splitting
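A minimal sketch of three of the metrics listed on slide 14, assuming common definitions: precision at a cutoff, user space coverage as the fraction of users who receive at least one recommendation, and item space coverage as the fraction of catalogue items recommended to at least one user. These definitions and all function names are assumptions for illustration; the experiments themselves relied on RankSys, whose API is not reproduced here.

```python
from typing import Dict, List, Set


def precision_at(ranked: List[str], relevant: Set[str], cutoff: int = 10) -> float:
    """Fraction of the top-`cutoff` positions occupied by relevant items."""
    return sum(1 for item in ranked[:cutoff] if item in relevant) / cutoff


def user_space_coverage(recs: Dict[str, List[str]], all_users: Set[str]) -> float:
    """Fraction of users who receive at least one recommendation."""
    served = {user for user in all_users if recs.get(user)}
    return len(served) / len(all_users)


def item_space_coverage(recs: Dict[str, List[str]], all_items: Set[str], cutoff: int = 10) -> float:
    """Fraction of catalogue items that appear in at least one top-`cutoff` list."""
    shown = {item for ranked in recs.values() for item in ranked[:cutoff]}
    return len(shown & all_items) / len(all_items)


recs = {"u1": ["a", "b"], "u2": [], "u3": ["a"]}  # u2 gets no recommendation
print(user_space_coverage(recs, {"u1", "u2", "u3"}))
print(item_space_coverage(recs, {"a", "b", "c", "d"}))
print(precision_at(recs["u1"], {"a"}))
```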
  • 15. Performance: prediction uncertainty
  • 16. Impact on novelty and diversity
       ▪ Prediction uncertainty
         • Stricter constraints (smaller uncertainty) decrease novelty and diversity
  • 17. Conclusions
       ▪ We have proposed a family of metrics based on the assumption that it is better to avoid a recommendation than to provide a bad one
       ▪ We have shown that a balance between precision, coverage, diversity, and novelty is critical
       ▪ We have proposed two strategies to decide if an item should be presented to the user
  • 18. Future work
       ▪ Extend the correctness metrics to combine other evaluation dimensions
       ▪ Objective way to discriminate between systems: which one is really the best one?
       ▪ Consider the psychological aspect of the recommendation: the user is expecting to receive N recommendations (better bad than none?)
  • 19. Thank you
  • 20. Performance: prediction support
  • 21. Impact on novelty and diversity
       ▪ Prediction support
         • Larger n decreases the diversity and novelty of the lists
         • More popular items are being recommended
  • 22. Motivation
       ▪ Typical evaluation: it is better to fail than to avoid a recommendation
         • Assumption: not returning an item implies that the item is considered not relevant
       ▪ In this work: a recommender system may decide not to recommend a specific item
         • We need a metric where “no recommendation” does not mean relevant or not relevant. If possible, it should mean “better than not relevant”
  • 23. Definition of uncertainty for PMF
       ▪ PMF: probabilistic matrix factorisation using a Bayesian approximation proposed in [Lim & Teh 2007]
       ▪ The standard deviation is derived using mean-field variational inference