SlideShare una empresa de Scribd logo
1 de 1
Descargar para leer sin conexión
A Comparison of How Demographic Data Affects Recommendation
                                                                         Alan Said, Till Plumbaum, and Ernesto W. De Luca
                                                                                                                           ¨
                                                                                            DAI Lab @ Technische Universitat Berlin
                                                                         User Modeling, Adaptation and Personalization (UMAP 2011)


  Abstract                                                                  Data characteristics

 What?’: An assumption that demographic information holds implicit          There are very clear differences between the different demographic
 information about users’ taste and interests. This information can be      groups, e.g. the difference between the most “postive” city and the
 used to improve recommendation results.                                    most “negative” city is 12.3 on a scale of 0-100, women tend to have
 How?: A comparison of recommendation results when using differ-            100 ratings less than men. Similarly, number of ratings increases by
 ent demographic features (age, location and gender) which are com-         age up to young adulthood and decreases by age afterwards.
 monly available in online recommendation communities. The com-
 parison is performed through a simple method that extends stan-
 dard collaborative filtering algorithms to include the demographic                                 70.3

 features.                                                                                                59.2
                                                                                  58.0
 Results: Results imply that simple demographic data (age, sex, lo-                                              69.9


 cation) can be beneficial for recommendation.                                             60.8




                                                                                            69.6


  Dataset

 The data comes from German movie recommendation community
                                                                             (a) The top three and bottom (b) Average age, rating and
 Moviepilota and contains users, movies, the ratings by users on
                                                                             three German cities according number of friends for women
 movies, user related data such as age, location and gender.
                                                                             to average movie ratings.     and men.

                                                                                         Feature Number of users
                                                                                         City                4, 400
                                                                                         Age                 1, 292
                                                                                         Sex                 6, 583
                                                                                         Total              10, 000
                                                                               Table: Number of users with each specific feature.



                                                                            Approach

                                                                               A k-NN algorithm generates neighborhoods of most similar
                                                                               users.
                                                                               Similarity is based on usage-similarity (ratings) as well as
                                                                               group belonging. The similarity of users within the same group
                                                                               is multiplied by a factor proportional to the number of users.
                                                                               Now, users from the same demographic groups are more im-
                                                                               portant when finding the most popular items in the neighbor-
                                                                               hood (i.e. the items to recommend).
 Demographic relations from Moviepilot In our experiments, we
 chose to evaluate three basic demographic properties:
     Location - users from the same cities                                  Results
     Age - users born in the same decade
     Gender - users of the same gender
                                                                              Type P@1010K                P@10          %   MAP10K   MAP     %
                                                                              City 2.79E − 4 2.66E − 4 4.7% 3.89E − 3 3.81E − 3 2.2%
         Type                 # of ratings                  %                 Age 2.45E − 4 2.45E − 4 0.0% 3.93E − e 3.93E − 3 0.0%
         City                    991, 845                64%                  Sex 2.86E − 4 2.33E − 4 22.9% 4.22E − 3 3.82E − 3 10.4%
         Age                  1, 144, 761                74%                    Table: Results for different demographic features.
         Gender               1, 398, 732                91%
         Total                1, 539, 393               100%                 Our initial tests show that our assumption, “demographic data
 Table: The number and percentage of total ratings avail-                    has an impact on CF”, is true. Gender, especially, seems to have
                                                                             a large impact with a resulting increase of 10% for MAP (and
 able for every demographic property.
                                                                             22% Precision at 10) are very promising. Age seems, however,
    a
        http://www.moviepilot.com                                            not to have a very high impact.




http://www.dai-labor.de/en/irml                                                    <firstname>.<lastname>@dai-lab.de

Más contenido relacionado

Más de Alan Said

State of RecSys: Recap of RecSys 2012
State of RecSys: Recap of RecSys 2012State of RecSys: Recap of RecSys 2012
State of RecSys: Recap of RecSys 2012Alan Said
 
RecSysChallenge Opening
RecSysChallenge OpeningRecSysChallenge Opening
RecSysChallenge OpeningAlan Said
 
Best Practices in Recommender System Challenges
Best Practices in Recommender System ChallengesBest Practices in Recommender System Challenges
Best Practices in Recommender System ChallengesAlan Said
 
Estimating the Magic Barrier of Recommender Systems: A User Study
Estimating the Magic Barrier of Recommender Systems: A User StudyEstimating the Magic Barrier of Recommender Systems: A User Study
Estimating the Magic Barrier of Recommender Systems: A User StudyAlan Said
 
Users and Noise: The Magic Barrier of Recommender Systems
Users and Noise: The Magic Barrier of Recommender SystemsUsers and Noise: The Magic Barrier of Recommender Systems
Users and Noise: The Magic Barrier of Recommender SystemsAlan Said
 
Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...
Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...
Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...Alan Said
 
CaRR 2012 Opening Presentation
CaRR 2012 Opening PresentationCaRR 2012 Opening Presentation
CaRR 2012 Opening PresentationAlan Said
 
Personalizing Tags: A Folksonomy-like Approach for Recommending Movies
Personalizing Tags: A Folksonomy-like Approach for Recommending MoviesPersonalizing Tags: A Folksonomy-like Approach for Recommending Movies
Personalizing Tags: A Folksonomy-like Approach for Recommending MoviesAlan Said
 
Inferring Contextual User Profiles - Improving Recommender Performance
Inferring Contextual User Profiles - Improving Recommender PerformanceInferring Contextual User Profiles - Improving Recommender Performance
Inferring Contextual User Profiles - Improving Recommender PerformanceAlan Said
 
Using Social- and Pseudo-Social Networks to Improve Recommendation Quality
Using Social- and Pseudo-Social Networks to Improve Recommendation QualityUsing Social- and Pseudo-Social Networks to Improve Recommendation Quality
Using Social- and Pseudo-Social Networks to Improve Recommendation QualityAlan Said
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender SystemsAlan Said
 

Más de Alan Said (11)

State of RecSys: Recap of RecSys 2012
State of RecSys: Recap of RecSys 2012State of RecSys: Recap of RecSys 2012
State of RecSys: Recap of RecSys 2012
 
RecSysChallenge Opening
RecSysChallenge OpeningRecSysChallenge Opening
RecSysChallenge Opening
 
Best Practices in Recommender System Challenges
Best Practices in Recommender System ChallengesBest Practices in Recommender System Challenges
Best Practices in Recommender System Challenges
 
Estimating the Magic Barrier of Recommender Systems: A User Study
Estimating the Magic Barrier of Recommender Systems: A User StudyEstimating the Magic Barrier of Recommender Systems: A User Study
Estimating the Magic Barrier of Recommender Systems: A User Study
 
Users and Noise: The Magic Barrier of Recommender Systems
Users and Noise: The Magic Barrier of Recommender SystemsUsers and Noise: The Magic Barrier of Recommender Systems
Users and Noise: The Magic Barrier of Recommender Systems
 
Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...
Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...
Analyzing Weighting Schemes in Collaborative Filtering: Cold Start, Post Cold...
 
CaRR 2012 Opening Presentation
CaRR 2012 Opening PresentationCaRR 2012 Opening Presentation
CaRR 2012 Opening Presentation
 
Personalizing Tags: A Folksonomy-like Approach for Recommending Movies
Personalizing Tags: A Folksonomy-like Approach for Recommending MoviesPersonalizing Tags: A Folksonomy-like Approach for Recommending Movies
Personalizing Tags: A Folksonomy-like Approach for Recommending Movies
 
Inferring Contextual User Profiles - Improving Recommender Performance
Inferring Contextual User Profiles - Improving Recommender PerformanceInferring Contextual User Profiles - Improving Recommender Performance
Inferring Contextual User Profiles - Improving Recommender Performance
 
Using Social- and Pseudo-Social Networks to Improve Recommendation Quality
Using Social- and Pseudo-Social Networks to Improve Recommendation QualityUsing Social- and Pseudo-Social Networks to Improve Recommendation Quality
Using Social- and Pseudo-Social Networks to Improve Recommendation Quality
 
Recommender Systems
Recommender SystemsRecommender Systems
Recommender Systems
 

Último

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...itnewsafrica
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesManik S Magar
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructureitnewsafrica
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 

Último (20)

New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...Abdul Kader Baba- Managing Cybersecurity Risks  and Compliance Requirements i...
Abdul Kader Baba- Managing Cybersecurity Risks and Compliance Requirements i...
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotesMuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
MuleSoft Online Meetup Group - B2B Crash Course: Release SparkNotes
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical InfrastructureVarsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
Varsha Sewlal- Cyber Attacks on Critical Critical Infrastructure
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 

A Comparison of How Demographic Data Affects Recommendation

  • 1. A Comparison of How Demographic Data Affects Recommendation Alan Said, Till Plumbaum, and Ernesto W. De Luca ¨ DAI Lab @ Technische Universitat Berlin User Modeling, Adaptation and Personalization (UMAP 2011) Abstract Data characteristics What?’: An assumption that demographic information holds implicit There are very clear differences between the different demographic information about users’ taste and interests. This information can be groups, e.g. the difference between the most “postive” city and the used to improve recommendation results. most “negative” city is 12.3 on a scale of 0-100, women tend to have How?: A comparison of recommendation results when using differ- 100 ratings less than men. Similarly, number of ratings increases by ent demographic features (age, location and gender) which are com- age up to young adulthood and decreases by age afterwards. monly available in online recommendation communities. The com- parison is performed through a simple method that extends stan- dard collaborative filtering algorithms to include the demographic 70.3 features. 59.2 58.0 Results: Results imply that simple demographic data (age, sex, lo- 69.9 cation) can be beneficial for recommendation. 60.8 69.6 Dataset The data comes from German movie recommendation community (a) The top three and bottom (b) Average age, rating and Moviepilota and contains users, movies, the ratings by users on three German cities according number of friends for women movies, user related data such as age, location and gender. to average movie ratings. and men. Feature Number of users City 4, 400 Age 1, 292 Sex 6, 583 Total 10, 000 Table: Number of users with each specific feature. Approach A k-NN algorithm generates neighborhoods of most similar users. Similarity is based on usage-similarity (ratings) as well as group belonging. The similarity of users within the same group is multiplied by a factor proportional to the number of users. Now, users from the same demographic groups are more im- portant when finding the most popular items in the neighbor- hood (i.e. the items to recommend). Demographic relations from Moviepilot In our experiments, we chose to evaluate three basic demographic properties: Location - users from the same cities Results Age - users born in the same decade Gender - users of the same gender Type P@1010K P@10 % MAP10K MAP % City 2.79E − 4 2.66E − 4 4.7% 3.89E − 3 3.81E − 3 2.2% Type # of ratings % Age 2.45E − 4 2.45E − 4 0.0% 3.93E − e 3.93E − 3 0.0% City 991, 845 64% Sex 2.86E − 4 2.33E − 4 22.9% 4.22E − 3 3.82E − 3 10.4% Age 1, 144, 761 74% Table: Results for different demographic features. Gender 1, 398, 732 91% Total 1, 539, 393 100% Our initial tests show that our assumption, “demographic data Table: The number and percentage of total ratings avail- has an impact on CF”, is true. Gender, especially, seems to have a large impact with a resulting increase of 10% for MAP (and able for every demographic property. 22% Precision at 10) are very promising. Age seems, however, a http://www.moviepilot.com not to have a very high impact. http://www.dai-labor.de/en/irml <firstname>.<lastname>@dai-lab.de