It Takes Two to Tango: an
Exploration of Domain Pairs for
Cross-Domain Collaborative Filtering
Shaghayegh Sahebi1 and Peter Brusilovsky1,2
1 Intelligent Systems Program, University of Pittsburgh
2 School of Information Sciences, University of Pittsburgh
@pawslab
Our Goals
• Explore added value of cross-domain
recommendations
– compared to single-domain recommenders
• Characterize useful auxiliary domains for a
target domain
– Or promising domain-pairs
It Takes Two to Tango 2
How We Got There: Ideas
• Using external information for better
recommendation (especially in cold-start)
• Using ratings/data from an external domain (e.g.,
book ratings to recommend movies) – does it
help?
• Some pairs can tango, some can’t. What’s the
secret?
• Canonical correlation could be the key
• Could we also use it as a recommendation
approach?
How We Got There: Papers
• Sahebi, S., Wongchokprasitti, C., and Brusilovsky, P.
(2010) Recommending research colloquia: a study of
several sources for user profiling. In: Proceedings of
the 1st International Workshop on Information
Heterogeneity and Fusion in Recommender Systems
(HetRec 2010) at RecSys 2010
• Sahebi, S. and Brusilovsky, P. (2013) Cross-Domain
Collaborative Recommendation in a Cold-Start Context:
The Impact of User Profile Size on the Quality of
Recommendation. In: Proceedings of UMAP 2013
• This paper
Our Work
Propose to use Canonical Correlation of the domains
as the main factor for domain analysis
Propose a cross-domain recommender system based
on Canonical Correlation Analysis (CCA)
Analyze 158 domain pairs to find out:
Whether the recommendation algorithm also matters in the
cross-domain recommendation results;
the data characteristics that affect the prediction error of
approaches;
the domain-pair characteristics that affect the amount of
recommendation improvements;
and the nature of suitable domain pairs.
Canonical Correlation Analysis
• Multivariate statistical model
– interrelationships among sets of multiple dependent
and multiple independent variables
• Goal: produce the maximum correlation between the
dimensions
– linear combination representing the weighted sum of
two or more variables
– relationship between two linear composites: strength
of the relationship between the sets of variables
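As a rough illustration of the idea above, the first canonical correlation between two small sets of variables can be computed with NumPy. This is a minimal sketch, not the authors' implementation; the function name `first_canonical_correlation`, the small ridge term `reg`, and the toy data are assumptions made for the example.

```python
import numpy as np

def first_canonical_correlation(X, Y, reg=1e-8):
    """Return (rho, wx, wy) for the first pair of canonical variates.

    X: (n_samples, p) array, Y: (n_samples, q) array.
    reg is a small ridge term (an assumption) that keeps the
    covariance matrices invertible.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)

    # Whiten each set with its Cholesky factor, then take the SVD of the
    # whitened cross-covariance; the singular values are the canonical
    # correlations.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    K = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T
    U, s, Vt = np.linalg.svd(K)

    # Map the leading singular vectors back to the original variables.
    wx = np.linalg.solve(Lx.T, U[:, 0])
    wy = np.linalg.solve(Ly.T, Vt[0, :])
    return s[0], wx, wy

# Toy data (hypothetical): the first Y column is a noisy copy of the first X column.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Y = np.column_stack([X[:, 0] + 0.1 * rng.normal(size=200),
                     rng.normal(size=200)])
rho, wx, wy = first_canonical_correlation(X, Y)
```

Here `X @ wx` and `Y @ wy` are the two linear composites, and `rho` is the correlation between them.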
Canonical Correlation Analysis (2)
[Figure: variables b1, b2 and m1, m2 combined into the linear composites 0.2b2+0.8b1 and 0.7m2+0.3m1]
Canonical Correlation Analysis (3)
Application of CCA to Cross-Domain
Recommenders
• Common users in two domains
• Dependent variables: items in target domain
• Independent variables: items in source domain
• Calculates components of each domain
– 2 sets of items
– most similar to each other based on user rating
behavior
• Determines how much the two components are
correlated to one another
CCA-based Cross-Domain
Recommender (CD-CCA)
• Projection vectors wx and wy show:
– how the ratings in source domain (X) affect the
ratings in target domain
– how much this effect is
CD-CCA (2)
• Estimate ratings in target domain (Y) by using:
– projection vectors (wx and wy);
– source domain ratings (X);
– and canonical correlation value (ρ)
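The slide's formula is not reproduced in this transcript, so the following is only one plausible reading of it, for a single canonical component: project a user's source ratings onto wx, scale by ρ to estimate the target-side canonical variate, and map that back to target items through wy. The function name `estimate_target_ratings`, the least-norm back-projection, and the mean offsets are assumptions, not the authors' stated method.

```python
def estimate_target_ratings(x, wx, wy, rho, x_mean, y_mean):
    """Estimate one user's target-domain ratings from source ratings.

    x: the user's source ratings, wx / wy: projection vectors,
    rho: canonical correlation, x_mean / y_mean: per-item mean ratings.
    """
    # Source-side canonical variate of the mean-centered ratings.
    u = sum((xi - mi) * wi for xi, mi, wi in zip(x, x_mean, wx))
    # Assumed link between the two variates: v is approximated by rho * u.
    v = rho * u
    # Least-norm back-projection: the smallest offset d with d . wy == v.
    norm_sq = sum(w * w for w in wy)
    return [m + v * w / norm_sq for m, w in zip(y_mean, wy)]

# Hypothetical numbers, for illustration only.
estimate = estimate_target_ratings(
    x=[5.0, 3.0], wx=[0.8, 0.2], wy=[0.3, 0.7], rho=0.5,
    x_mean=[4.0, 3.0], y_mean=[3.5, 3.5],
)
```

With ρ = 0 the estimate falls back to the item means, which matches the intuition that an uncorrelated source domain carries no usable signal.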
The Design
• Yelp academic dataset
– 21 categories (domains)
– ratings between 1 and 5
• Does it depend on the pair?
– Evaluate cross-domain recommendation on all
meaningful pairs
• Does the algorithm matter?
– Compare 2 cross-domain and one single-domain
approaches
Yelp Dataset
• A rich dataset containing a varied set of
domain characteristics
                   Min      Max      Mean      Median
Number of Users    9        11013    1064.09   424
Number of Items    8        4435     406.89    252.5
Rating Density     0.0017   0.1581   0.017     0.0084
Which Pairs Can Tango?
• Exclude category pairs where
#common_users < #items
– 158 domain pairs
• Run experiments twice per domain pair
– switching the source (independent) and target
(dependent) domains (variable sets)
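The filtering rule above can be sketched as follows. The function name `eligible_pairs` and the toy data are hypothetical, and the slide does not say which item count "#items" refers to; this sketch assumes the larger of the two domains' item counts.

```python
from itertools import permutations

def eligible_pairs(users_by_domain, items_by_domain):
    """Yield ordered (source, target, n_common) domain pairs.

    users_by_domain: dict mapping a domain name to a set of user ids.
    items_by_domain: dict mapping a domain name to its item count.
    """
    for source, target in permutations(users_by_domain, 2):
        n_common = len(users_by_domain[source] & users_by_domain[target])
        # Assumption: "#items" means the larger item count of the pair.
        n_items = max(items_by_domain[source], items_by_domain[target])
        if n_common >= n_items:
            yield source, target, n_common

# Toy data, for illustration only.
users = {"A": {1, 2, 3, 4}, "B": {2, 3, 4, 5}, "C": {9}}
items = {"A": 2, "B": 3, "C": 1}
pairs = list(eligible_pairs(users, items))
```

Using `permutations` rather than combinations yields each surviving pair in both directions, which mirrors running every experiment twice with source and target switched.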
The Role of the Approach
• Single-domain setting (SD-SVD): using only target
domain’s ratings
– Does not consider information from source domain
• Cross-domain setting (CD-SVD): concatenating
source and target rating matrices
– Uses information from the source domain, but maybe
not in the best way
• CD-CCA as the main approach
– Possibly, maximizing the value of source information
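The concatenation step of the CD-SVD setting can be pictured as below. This is a sketch of the input matrix only, not of the factorization itself; the function name `concatenate_domains`, the restriction to users present in both domains, and the column order (source first) are assumptions.

```python
def concatenate_domains(source, target, n_source_items, n_target_items):
    """Build one user-by-item matrix holding both domains side by side.

    source / target: dicts mapping user -> {item_index: rating}.
    Missing ratings are stored as None.
    """
    # Assumption: only users who appear in both domains are kept.
    users = sorted(set(source) & set(target))
    matrix = []
    for user in users:
        row = [None] * (n_source_items + n_target_items)
        for item, rating in source[user].items():
            row[item] = rating
        for item, rating in target[user].items():
            # Target items are shifted past the source columns.
            row[n_source_items + item] = rating
        matrix.append(row)
    return users, matrix

# Toy ratings, for illustration only.
source = {"u1": {0: 4}, "u2": {1: 5}, "u3": {0: 2}}
target = {"u1": {0: 3}, "u2": {0: 1, 1: 2}}
users, matrix = concatenate_domains(source, target, 2, 2)
```

A single-domain model would see only the last two columns of each row; the cross-domain model sees all four.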
Experimental Setup
• Baseline: SVD++
– Single-domain setting (SD-SVD): using only target
domain’s ratings
– cross-domain setting (CD-SVD): concatenating
source and target rating matrices
[Figure: rating matrices for the common users, showing target items, source items, and all items (concatenated); "?" marks unknown ratings]
Experimental Setup (2)
• 5-fold user-stratified cross-validation on target domain
– 80% of the users in training; 20% of the users in testing; 15% of train
as validation set (for finding parameters)
• to obtain a partial profile for each user
– add 20% of each test user's target ratings to training
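A minimal sketch of this splitting scheme, assuming that "user-stratified" means the folds are formed over users rather than over individual ratings. The helper names, the fixed seeds, and the rule of revealing at least one rating are assumptions; the 15% validation split is omitted for brevity.

```python
import random

def user_stratified_folds(user_ids, n_folds=5, seed=0):
    """Shuffle users and deal them into n_folds disjoint test folds."""
    users = sorted(user_ids)
    random.Random(seed).shuffle(users)
    return [users[i::n_folds] for i in range(n_folds)]

def split_test_profile(target_ratings, revealed_fraction=0.2, seed=0):
    """Split one test user's target ratings into (revealed, held_out).

    target_ratings: list of (item, rating) tuples.
    The revealed part joins the training data as a partial profile.
    """
    ratings = list(target_ratings)
    random.Random(seed).shuffle(ratings)
    # Assumption: always reveal at least one rating.
    n_revealed = max(1, round(revealed_fraction * len(ratings)))
    return ratings[:n_revealed], ratings[n_revealed:]

# Toy usage: 10 users, and one test user with 10 target ratings.
folds = user_stratified_folds(range(10))
revealed, held_out = split_test_profile([(item, 3) for item in range(10)])
```

Each fold serves once as the 20% test users while the other four folds provide the 80% training users.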
[Figure: target-domain rating matrix split by train, evaluation, and test users into training target data and testing target data]
Results: Mixed Results for RMSE of
Domain Pairs
RMSEs of the Approaches Are Correlated
• If RMSE is low in single-domain, it is most
likely low for cross-domain, and vice versa
Correlation (R-values)   CD-CCA     CD-SVD     SD-SVD
CD-CCA                   -          0.7896**   0.7779**
CD-SVD                   0.7896**   -          0.9550**
SD-SVD                   0.7779**   0.9550**   -
**: significant with p_value < 0.01
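For readers who want to reproduce this kind of comparison, the two quantities involved can be computed as below. The slides do not say which correlation coefficient was used; "R-values" suggests Pearson's r, which is assumed here. The RMSE lists are made-up numbers, not results from the study.

```python
from math import sqrt

def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical per-domain-pair RMSE values, for illustration only.
rmse_single = [1.10, 0.95, 1.30, 1.05]
rmse_cross = [1.00, 0.90, 1.25, 1.02]
r = pearson_r(rmse_single, rmse_cross)
```

A value of `r` close to 1 means that domain pairs that are hard for one approach tend to be hard for the other as well.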
What is the Approach Effect on
Recommendation Results?
• Cross-domain collaborative filtering either
improves results or does not change them significantly
– CD-CCA >* SD-SVD in 77 domain pairs;
– CD-CCA >* CD-SVD in 74 domain pairs;
– CD-SVD >* SD-SVD in 9 domain pairs;
– In the rest of the domain pairs: the approaches perform similarly
• The algorithm matters: CD-CCA captures more
common information than CD-SVD
What Data Characteristics Affect
Prediction Error?
• Study correlation of domain characteristics with
RMSE
– user space size, item space size, domain densities
Correlation (R-values)   User Size   Target Item Size   Source Item Size   Target Density   Source Density
CD-CCA                   -0.1782*    -0.1250            -0.1239            -0.0502          0.0515
CD-SVD                   -0.1745*    -0.1445            -0.1274            -0.1346          -0.1161
SD-SVD                   -0.1455     -0.1225            -                  -0.1525          -
*: significant with p_value < 0.05
What Data Characteristics Affect
Prediction Error?
• The more common users, the better the cross-
domain recommendations
– Other factors are insignificant
What Data Characteristics Affect Cross-Domain
Recommendation Improvement?
What Data Characteristics Affect Cross-Domain
Recommendation Improvement? (2)
• Additional domain characteristics:
– user size to item size ratio
– source item size to target item size ratio
– source density to target density ratio
– percentage of CCA correlation coefficients > 0.8,
0.9, and 0.95
• Improvement Ratio (IR)
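Two of these features can be sketched in a few lines. The threshold feature follows directly from the list above. The improvement ratio formula is not reproduced in this transcript, so `improvement_ratio` below uses a common relative-error-reduction definition purely as an assumption. All names and numbers are hypothetical.

```python
def share_above(coefficients, threshold):
    """Percentage of canonical correlation coefficients above threshold."""
    if not coefficients:
        return 0.0
    above = sum(1 for c in coefficients if c > threshold)
    return 100.0 * above / len(coefficients)

def improvement_ratio(rmse_baseline, rmse_new):
    """Relative RMSE reduction of the new approach over the baseline.

    Assumed definition; the slide's own formula is not in this transcript.
    """
    return (rmse_baseline - rmse_new) / rmse_baseline

# Hypothetical coefficients for one domain pair, for illustration only.
coefficients = [0.99, 0.96, 0.91, 0.85, 0.60]
features = {t: share_above(coefficients, t) for t in (0.8, 0.9, 0.95)}
```

Under this assumed definition, a positive ratio means the new approach has lower RMSE than the baseline.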
What Data Characteristics Affect Cross-Domain
Recommendation Improvement? (Single-
Domain Features)
Correlations (R-value)   User Size   Source Item Size   Target Item Size   Source Density   Target Density
CD-CCA vs. CD-SVD        0.3924***   0.3292**           0.4332***          -0.4450***       -0.7313***
CD-CCA vs. SD-SVD        0.3287***   0.2825*            0.4206***          -0.4031***       -0.6973***
CD-SVD vs. SD-SVD        0.3072      0.3989             0.916              -0.6881*         -0.2070
***: significant with p_value < 0.001; **: significant with p_value < 0.01; *: significant with p_value < 0.05
What Data Characteristics Affect Cross-Domain
Recommendation Improvement? (Cross-Domain
Features)
Correlations (R-value)   User to Target   User to Source   % of CCA   % of CCA   % of CCA    Source to Target   Source to Target
                         Item Ratio       Item Ratio       > 0.8      > 0.9      > 0.95      Density Ratio      Item Size Ratio
CD-CCA vs. CD-SVD        0.0565           0.2805*          0.2603*    0.3563**   0.4000***   0.2723*            -0.1711
CD-CCA vs. SD-SVD        -0.0659          0.2207           0.2503*    0.3633**   0.4155**    0.2096             -0.2620*
CD-SVD vs. SD-SVD        0.0646           -0.3506          0.5999     0.6579     0.6701*     -0.4295            0.1343
What Data Characteristics Affect Cross-Domain
Recommendation Improvement? (4)
• Correlation with improvement ratio:
– Most positive correlation:
• target item size
• percentage of CCA coefficients > 0.95
– Negative correlation:
• source domain density
• target domain density
• ratio of source item size to target item size
– Only the “user size to target item size ratio” is not
significant
What is the Nature of Good Domain-
Pair Choices?
• Are domain pairs with high correlation
suitable cross-domain pairs?
• Do all domain pairs with a high improvement
ratio have a high CCA correlation factor? (Or is
having high CCA enough?)
Are domain pairs with high correlation
suitable cross-domain pairs?
• Look at category pairs in the top 10 percent by
percentage of CCA correlation coefficients > 0.8
• Large CCA correlation affects improvement of
cross-domain recommenders in:
– “Food → Arts and Entertainment”
– “Arts and Entertainment → Food”
– “Restaurants → Food”
Are domain pairs with high correlation
suitable cross-domain pairs? (2)
• For some domain pairs CD-CCA works better than
CD-SVD.
• Domain pairs that are inherently close to each
other, but CD-SVD does not capture this
– “Restaurants → Nightlife” (and vice versa)
– “Event Planning → Hotels & Travel” (and vice versa)
• Domain pairs with high CCA that don’t look
inherently similar
– “Shopping → Arts & Entertainment”
– “Pets → Nightlife”
Is High CCA Enough?
• High IR and low CCA
– “Education → Local Flavor”
• Source and target domains' item sizes and user sizes are low
– “Event Planning → Active Life”
• high user size and target item size, low source to target item
size ratio and target and source sparsity
• High CCA but no significant improvement ratio
– “Home Services → Professional Services” (and vice
versa)
Conclusions
• Proposed to use Canonical Correlation of the
domains as the main factor for domain
analysis
• Proposed a cross-domain recommender
system based on Canonical Correlation
Analysis (CCA)
• Analyzed the characteristics of 158 domain pairs
against cross-domain and single-domain
recommendation results
Conclusions (2)
• Number of common users is an important factor for
RMSE of cross-domain recommenders
• Canonical Correlations
– An important factor in increasing quality improvement
ratio and determining suitable domain pairs
• Other factors affect improvement ratio
– source and target domain densities, number of common
users, and number of items
• Although some domain pairs do not seem similar, they
might share hidden and useful information that can be
captured by CCA
• However, relying only on CCA might not be enough
Thank You!
peterb@pitt.edu
shs106@pitt.edu

Más contenido relacionado

Similar a It Takes Two to Tango: an Exploration of Domain Pairs for Cross-Domain Collaborative Filtering

PPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptx
PPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptxPPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptx
PPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptxneju3
 
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...Emanuel Lacić
 
PyCon Balkans 2018 // Recommender systems - collaborative filtering and dimen...
PyCon Balkans 2018 // Recommender systems - collaborative filtering and dimen...PyCon Balkans 2018 // Recommender systems - collaborative filtering and dimen...
PyCon Balkans 2018 // Recommender systems - collaborative filtering and dimen...Mladen Jovanovic
 
Summary of a Recommender Systems Survey paper
Summary of a Recommender Systems Survey paperSummary of a Recommender Systems Survey paper
Summary of a Recommender Systems Survey paperChangsung Moon
 
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...GUANGYUAN PIAO
 
Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F...
 Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F... Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F...
Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F...Holistic Benchmarking of Big Linked Data
 
[WI 2014]Context Recommendation Using Multi-label Classification
[WI 2014]Context Recommendation Using Multi-label Classification[WI 2014]Context Recommendation Using Multi-label Classification
[WI 2014]Context Recommendation Using Multi-label ClassificationYONG ZHENG
 
Neo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph Algorithms
Neo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph AlgorithmsNeo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph Algorithms
Neo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph AlgorithmsNeo4j
 
An introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxAn introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxElasticsearch
 
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degreeThe UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degreePradeeban Kathiravelu, Ph.D.
 
Domainspecificsubgraph extraction ieee-bigdata2016
Domainspecificsubgraph extraction ieee-bigdata2016Domainspecificsubgraph extraction ieee-bigdata2016
Domainspecificsubgraph extraction ieee-bigdata2016Sarasi Sarangi
 
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.comEnhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.comSimon Hughes
 
An introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxAn introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxElasticsearch
 
Rated Ranking Evaluator: An Open Source Approach for Search Quality Evaluation
Rated Ranking Evaluator: An Open Source Approach for Search Quality EvaluationRated Ranking Evaluator: An Open Source Approach for Search Quality Evaluation
Rated Ranking Evaluator: An Open Source Approach for Search Quality EvaluationAlessandro Benedetti
 
Haystack 2019 - Rated Ranking Evaluator: an Open Source Approach for Search Q...
Haystack 2019 - Rated Ranking Evaluator: an Open Source Approach for Search Q...Haystack 2019 - Rated Ranking Evaluator: an Open Source Approach for Search Q...
Haystack 2019 - Rated Ranking Evaluator: an Open Source Approach for Search Q...OpenSource Connections
 
Rated Ranking Evaluator: an Open Source Approach for Search Quality Evaluation
Rated Ranking Evaluator: an Open Source Approach for Search Quality EvaluationRated Ranking Evaluator: an Open Source Approach for Search Quality Evaluation
Rated Ranking Evaluator: an Open Source Approach for Search Quality EvaluationSease
 

Similar a It Takes Two to Tango: an Exploration of Domain Pairs for Cross-Domain Collaborative Filtering (20)

PPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptx
PPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptxPPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptx
PPT-UEU-Database-Objek-Terdistribusi-Pertemuan-8.pptx
 
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
 
PyCon Balkans 2018 // Recommender systems - collaborative filtering and dimen...
PyCon Balkans 2018 // Recommender systems - collaborative filtering and dimen...PyCon Balkans 2018 // Recommender systems - collaborative filtering and dimen...
PyCon Balkans 2018 // Recommender systems - collaborative filtering and dimen...
 
Summary of a Recommender Systems Survey paper
Summary of a Recommender Systems Survey paperSummary of a Recommender Systems Survey paper
Summary of a Recommender Systems Survey paper
 
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
JIST2015-Computing the Semantic Similarity of Resources in DBpedia for Recomm...
 
Quality key users
Quality key usersQuality key users
Quality key users
 
Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F...
 Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F... Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F...
Extending LargeRDFBench for Multi-Source Data at Scale for SPARQL Endpoint F...
 
Extended LargeRDFBench
Extended LargeRDFBenchExtended LargeRDFBench
Extended LargeRDFBench
 
[WI 2014]Context Recommendation Using Multi-label Classification
[WI 2014]Context Recommendation Using Multi-label Classification[WI 2014]Context Recommendation Using Multi-label Classification
[WI 2014]Context Recommendation Using Multi-label Classification
 
Neo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph Algorithms
Neo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph AlgorithmsNeo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph Algorithms
Neo4j Graph Data Science Training - June 9 & 10 - Slides #6 Graph Algorithms
 
An introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxAn introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolbox
 
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degreeThe UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
 
Distributed DBMS - Unit 6 - Query Processing
Distributed DBMS - Unit 6 - Query ProcessingDistributed DBMS - Unit 6 - Query Processing
Distributed DBMS - Unit 6 - Query Processing
 
Domainspecificsubgraph extraction ieee-bigdata2016
Domainspecificsubgraph extraction ieee-bigdata2016Domainspecificsubgraph extraction ieee-bigdata2016
Domainspecificsubgraph extraction ieee-bigdata2016
 
Domainspecificsubgraph extraction ieee-bigdata2016
Domainspecificsubgraph extraction ieee-bigdata2016Domainspecificsubgraph extraction ieee-bigdata2016
Domainspecificsubgraph extraction ieee-bigdata2016
 
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.comEnhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
Enhancing Enterprise Search with Machine Learning - Simon Hughes, Dice.com
 
An introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolboxAn introduction to Elasticsearch's advanced relevance ranking toolbox
An introduction to Elasticsearch's advanced relevance ranking toolbox
 
Rated Ranking Evaluator: An Open Source Approach for Search Quality Evaluation
Rated Ranking Evaluator: An Open Source Approach for Search Quality EvaluationRated Ranking Evaluator: An Open Source Approach for Search Quality Evaluation
Rated Ranking Evaluator: An Open Source Approach for Search Quality Evaluation
 
Haystack 2019 - Rated Ranking Evaluator: an Open Source Approach for Search Q...
Haystack 2019 - Rated Ranking Evaluator: an Open Source Approach for Search Q...Haystack 2019 - Rated Ranking Evaluator: an Open Source Approach for Search Q...
Haystack 2019 - Rated Ranking Evaluator: an Open Source Approach for Search Q...
 
Rated Ranking Evaluator: an Open Source Approach for Search Quality Evaluation
Rated Ranking Evaluator: an Open Source Approach for Search Quality EvaluationRated Ranking Evaluator: an Open Source Approach for Search Quality Evaluation
Rated Ranking Evaluator: an Open Source Approach for Search Quality Evaluation
 

Último

Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysismanisha194592
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfRachmat Ramadhan H
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...{Pooja:  9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...
{Pooja: 9892124323 } Call Girl in Mumbai | Jas Kaur Rate 4500 Free Hotel Del...Pooja Nehwal
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...shambhavirathore45
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxolyaivanovalion
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Onlineanilsa9823
 

Último (20)

Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure

It Takes Two to Tango: an Exploration of Domain Pairs for Cross-Domain Collaborative Filtering

  • 1. It Takes Two to Tango: an Exploration of Domain Pairs for Cross-Domain Collaborative Filtering. Shaghayegh Sahebi (1) and Peter Brusilovsky (1, 2). (1) Intelligent Systems Program, University of Pittsburgh; (2) School of Information Sciences, University of Pittsburgh. @pawslab
  • 2. Our Goals • Explore added value of cross-domain recommendations – compared to single-domain recommenders • Characterize useful auxiliary domains for a target domain – Or promising domain-pairs It Takes Two to Tango 2
  • 3. How We Got There: Ideas • Using external information for better recommendation (especially in cold-start) • Using ratings/data from an external domain (e.g., book ratings to recommend movies): does it help? • Some pairs can tango, some can’t. What’s the secret? • Canonical correlation could be the key • Could we also use it as a recommendation approach?
  • 4. How We Got There: Papers • Sahebi, S., Wongchokprasitti, C., and Brusilovsky, P. (2010) Recommending research colloquia: a study of several sources for user profiling. In: Proceedings of the 1st International Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2010) at RecSys 2010 • Sahebi, S. and Brusilovsky, P. (2013) Cross-Domain Collaborative Recommendation in a Cold-Start Context: The Impact of User Profile Size on the Quality of Recommendation. In: Proceedings of UMAP 2013 • This paper
  • 5. Our Work • Propose to use Canonical Correlation of the domains as the main factor for domain analysis • Propose a cross-domain recommender system based on Canonical Correlation Analysis (CCA) • Analyze 158 domain pairs to find out: – whether the recommendation algorithm also matters in the cross-domain recommendation results; – the data characteristics that affect the prediction error of approaches; – the domain-pair characteristics that affect the amount of recommendation improvements; – and the nature of suitable domain pairs.
  • 6. Canonical Correlation Analysis • Multivariate statistical model – interrelationships among sets of multiple dependent and multiple independent variables • Goal: produce the maximum correlation between the dimensions – each dimension is a linear combination, i.e. a weighted sum of two or more variables – the relationship between the two linear composites gives the strength of the relationship between the sets of variables
  • 7. Canonical Correlation Analysis (2) [Figure: variables b1, b2 and m1, m2 with the linear composites 0.2 b2 + 0.8 b1 and 0.7 m2 + 0.3 m1]
  • 8. Canonical Correlation Analysis (3)
  • 9. Application of CCA to Cross-Domain Recommenders • Common users in two domains • Dependent variables: items in target domain • Independent variables: items in source domain • Calculates components of each domain – 2 sets of items – most similar to each other based on user rating behavior • Determines how strongly the two components are correlated with one another
  • 10. CCA-based Cross-Domain Recommender (CD-CCA) • Projection vectors wx and wy show: – how the ratings in the source domain (X) affect the ratings in the target domain (Y) – how strong this effect is
  • 11. CD-CCA (2) • Estimate ratings in target domain (Y) by using: – projection vectors (wx and wy); – source domain ratings (X); – and canonical correlation value (ρ)
  • 13. The Design • Yelp academic dataset – 21 categories (domains) – ratings between 1 and 5 • Does it depend on the pair? – Evaluate cross-domain recommendation on all meaningful pairs • Does the algorithm matter? – Compare two cross-domain approaches and one single-domain approach
  • 14. Yelp Dataset • A rich dataset containing a varied set of domain characteristics
        (per domain)       Min      Max      Mean      Median
        Number of Users    9        11013    1064.09   424
        Number of Items    8        4435     406.89    252.5
        Rating Density     0.0017   0.1581   0.017     0.0084
  • 15. Which Pairs Can Tango? • Exclude category pairs with #common_users < #items – 158 domain pairs remain • Run experiments twice per domain pair – switching the source (independent) and target (dependent) domains (variable sets)
  • 16. The Role of the Approach • Single-domain setting (SD-SVD): using only the target domain’s ratings – Does not consider information from the source domain • Cross-domain setting (CD-SVD): concatenating source and target rating matrices – Uses information from the source domain, but maybe not in the best way • CD-CCA as the main approach – Possibly maximizing the value of source information
  • 17. Experimental Setup • Baseline: SVD++ – Single-domain setting (SD-SVD): using only the target domain’s ratings – Cross-domain setting (CD-SVD): concatenating source and target rating matrices [Figure: rating matrix of common users over target items, source items, and all items, with unknown ratings marked "?"]
  • 18. Experimental Setup (2) • 5-fold user-stratified cross-validation on the target domain – 80% of the users in training; 20% of the users in testing; 15% of the training set held out as a validation set (for finding parameters) • To obtain a partial profile for each test user – add 20% of each test user's target ratings to training [Figure: target-domain rating matrix split by train, evaluation, and test users into training and testing target data, with unknown ratings marked "?"]
  • 19. Results: Mixed Results for RMSE of Domain Pairs
  • 20. RMSEs of Approaches Are Correlated • If RMSE is low in the single-domain setting, it is most likely low in the cross-domain setting, and vice versa
        Correlation (R-values)   CD-CCA     CD-SVD     SD-SVD
        CD-CCA                   -          0.7896**   0.7779**
        CD-SVD                   0.7896**   -          0.9550**
        SD-SVD                   0.7779**   0.9550**   -
        **: significant with p-value < 0.01
  • 22. What is the Approach Effect on Recommendation Results? • Cross-domain collaborative filtering either improves the results or does not change them significantly (>* denotes "significantly better than") – CD-CCA >* SD-SVD in 77 domain pairs; – CD-CCA >* CD-SVD in 74 domain pairs; – CD-SVD >* SD-SVD in 9 domain pairs; – in the rest of the domain pairs, the approaches perform similarly • The algorithm matters: CD-CCA captures more common information than CD-SVD
  • 24. What Data Characteristics Affect Prediction Error? • Study correlation of domain characteristics with RMSE – user space size, item space size, domain densities
        Correlation (R-values)   User Size   Target Item Size   Source Item Size   Target Density   Source Density
        CD-CCA                   -0.1782*    -0.1250            -0.1239            -0.0502          0.0515
        CD-SVD                   -0.1745*    -0.1445            -0.1274            -0.1346          -0.1161
        SD-SVD                   -0.1455     -0.1225            -                  -0.1525          -
        *: significant with p-value < 0.05
  • 25. What Data Characteristics Affect Prediction Error? • The more common users, the better the cross-domain recommendations – Other factors are insignificant
  • 27. What Data Characteristics Affect Cross-Domain Recommendation Improvement?
  • 28. What Data Characteristics Affect Cross-Domain Recommendation Improvement? (2) • Additional domain characteristics: – user size to item size ratio – source item size to target item size ratio – source density to target density ratio – percentage of CCA correlation coefficients > 0.8, 0.9, and 0.95 • Improvement Ratio (IR)
  • 29. What Data Characteristics Affect Cross-Domain Recommendation Improvement? (Single-Domain Features)
        Correlation (R-values)   User Size   Source Item Size   Target Item Size   Source Density   Target Density
        CD-CCA vs. CD-SVD        0.3924***   0.3292**           0.4332***          -0.4450***       -0.7313***
        CD-CCA vs. SD-SVD        0.3287***   0.2825*            0.4206***          -0.4031***       -0.6973***
        CD-SVD vs. SD-SVD        0.3072      0.3989             0.916              -0.6881*         -0.2070
        ***: significant with p-value < 0.001; **: significant with p-value < 0.01; *: significant with p-value < 0.05
  • 30. What Data Characteristics Affect Cross-Domain Recommendation Improvement? (Cross-Domain Features)
        Correlation (R-values)   User to Target Item Ratio   User to Source Item Ratio   % of CCA > 0.8   % of CCA > 0.9   % of CCA > 0.95   Source to Target Density Ratio   Source to Target Item Size Ratio
        CD-CCA vs. CD-SVD        0.0565                      0.2805*                     0.2603*          0.3563**         0.4000***         0.2723*                          -0.1711
        CD-CCA vs. SD-SVD        -0.0659                     0.2207                      0.2503*          0.3633**         0.4155**          0.2096                           -0.2620*
        CD-SVD vs. SD-SVD        0.0646                      -0.3506                     0.5999           0.6579           0.6701*           -0.4295                          0.1343
  • 31. What Data Characteristics Affect Cross-Domain Recommendation Improvement? (4) • Correlation with improvement ratio: – Most positive correlation: • target item size • percentage of CCA coefficients > 0.95 – Negative correlation: • source domain density • target domain density • ratio of source item size to target item size – Only “user size to target item size ratio” is not significant
  • 33. What is the Nature of Good Domain-Pair Choices? • Are domain pairs with high correlation suitable cross-domain pairs? • Do all domain pairs with a high improvement ratio have a high CCA correlation factor? (Or is having high CCA enough?)
  • 34. Are domain pairs with high correlation suitable cross-domain pairs? • Look at category pairs in the top 10th percentile by percentage of CCA correlation coefficients > 0.8 • Large CCA correlation affects improvement of cross-domain recommenders in: – “Food → Arts and Entertainment” – “Arts and Entertainment → Food” – “Restaurants → Food”
  • 35. Are domain pairs with high correlation suitable cross-domain pairs? (2) • For some domain pairs CD-CCA works better than CD-SVD. • Domain pairs that are inherently close to each other, but whose closeness CD-SVD does not capture – “Restaurants → Nightlife” (and vice versa) – “Event Planning → Hotels & Travel” (and vice versa) • Domain pairs with high CCA that don’t look inherently similar – “Shopping → Arts & Entertainment” – “Pets → Nightlife”
  • 36. Is High CCA Enough? • High IR and low CCA – “Education → Local Flavor” • source and target domains' item sizes and user sizes are low – “Event Planning → Active Life” • high user size and target item size; low source to target item size ratio; low target and source sparsity • High CCA and no significant improvement ratio – “Home Services → Professional Services” (and vice versa)
  • 37. Conclusions • Proposed to use Canonical Correlation of the domains as the main factor for domain analysis • Proposed a cross-domain recommender system based on Canonical Correlation Analysis (CCA) • Analyzed the characteristics of 158 domain pairs together with their cross-domain and single-domain recommendation results
  • 38. Conclusions (2) • The number of common users is an important factor for the RMSE of cross-domain recommenders • Canonical correlations – An important factor in increasing the improvement ratio and in determining suitable domain pairs • Other factors affect the improvement ratio – source and target domain densities, number of common users, and number of items • Although some domain pairs do not seem similar, they might share hidden and useful information that can be captured by CCA • However, relying only on CCA might not be enough
  • 39. Thank You! peterb@pitt.edu shs106@pitt.edu
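
Slides 6 to 11 describe CD-CCA only in words: find projection vectors wx and wy that maximize the correlation between source ratings X and target ratings Y over the common users, then estimate Y from X, wx, wy, and ρ. The exact estimation formula is on a slide image that is not part of this transcript, so the following is only a minimal NumPy sketch under stated assumptions: dense rating matrices with no missing values, a small ridge term `reg` to keep the covariance matrices invertible, and a prediction rule that scales the source canonical variates by ρ and maps them back through the pseudo-inverse of wy. The function names `fit_cca` and `predict_target` are hypothetical and are not taken from the paper.

```python
import numpy as np

def inv_sqrt(C):
    """Inverse square root of a symmetric positive-definite matrix."""
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def fit_cca(X, Y, reg=1e-3):
    """Fit CCA on source ratings X (users x source items) and
    target ratings Y (users x target items) of the common users.
    `reg` is an assumed ridge term, not a value from the slides."""
    n = X.shape[0]
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Cxx), inv_sqrt(Cyy)
    # Singular values of the whitened cross-covariance are the
    # canonical correlations, returned in descending order.
    U, rho, Vt = np.linalg.svd(Kx @ Cxy @ Ky, full_matrices=False)
    return {"wx": Kx @ U, "wy": Ky @ Vt.T, "rho": rho, "mx": mx, "my": my}

def predict_target(model, X_new):
    """Assumed prediction rule: project source ratings with wx,
    scale by rho, and map back to target items via pinv(wy)."""
    a = (X_new - model["mx"]) @ model["wx"]   # source canonical variates
    b_hat = a * model["rho"]                  # expected target variates
    return b_hat @ np.linalg.pinv(model["wy"]) + model["my"]
```

When there are at least as many source items as target items and all components are kept, this rule reduces to ridge regression of Y on X. Real rating matrices are sparse, so a practical version would also need a strategy for missing ratings, which this sketch leaves out.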

Editor's notes

  1. The value ρ is the maximum canonical correlation that can be achieved by rotating the X and Y spaces in the directions of wx and wy, respectively.
  2. Density = ratio of the number of ratings to the number of possible ratings, i.e. #Ratings / (#Users * #Items)
  3. Explanation of the figure: it shows the RMSE of the algorithms on each domain pair, with error bars (95% confidence interval). Domain pairs are on the X axis, sorted by the RMSE of CD-CCA (to show the correlated trend between the RMSEs of the algorithms). The results show that in some domain pairs the cross-domain algorithms perform better than the single-domain one, and in some they don’t. The trend also shows that the RMSEs of the different algorithms are correlated across domain pairs.
  4. Explanation of the figure: it shows the RMSE of the algorithms only on domain pairs with a significant difference between the RMSEs of the approaches, with error bars (95% confidence interval). Domain pairs are on the X axis, sorted by the RMSE of CD-CCA (to show the correlated trend between the RMSEs of the algorithms). The results show that whenever there is a significant difference between CD-CCA and SD-SVD (or CD-SVD), CD-CCA always performs better; in other words, CD-CCA is never significantly worse than the other two.
  5. We define the improvement ratio to understand which domain characteristics result in more improvement of cross-domain over single-domain recommendation.
  6. Restaurants Food means source domain is “Restaurants“ and target domain is “Food“