Differences in Distributions and Their Effect on Recommendation System Performance
Why Collaborative Filtering Doesn’t Scale
(portions reference Prismatic’s Silicon Valley talk)
History of Recommendation
Overfitting
[Diagram labels: Distribution of All Items Across Users; Distribution of All Items Across All Users in the Future; Concrete Set of Past Items Across Users; Concrete Set of Future Items Across Users]
Recommender Systems Dilemma
[Diagram labels: Set of All Items Possible; Set of Items Known to Users in the Future; Set of Items Known to Users in the Past; Set of Items Recommended By Recommenders; Items Viewed Or Liked in the Future; Items Users Viewed Or Rated in the Past; Items Seen in Ground Truth Without Changes in Item Access; ??????]
Collaborative Filtering in Music
• Construct correlations between items from the set of items users knew in the past
• Generate an estimated distribution for past users across all items
• Hope the ‘errors’ correspond to items users will like in the future
• The gap between the distributions grows with the scale of the data
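The steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the method used in the talk: the `plays` data, the song names, and the choice of cosine similarity over binary play vectors are all hypothetical. It shows that an item-item recommender can only score items that already appear in someone's past plays.

```python
from math import sqrt

# Hypothetical toy data: user -> set of songs the user has played.
plays = {
    "u1": {"beatles", "norah_jones"},
    "u2": {"beatles", "norah_jones"},
    "u3": {"beatles", "indie_a"},
    "u4": {"beatles"},
}


def item_users(plays):
    """Invert user -> items into item -> set of users who played it."""
    inverted = {}
    for user, items in plays.items():
        for item in items:
            inverted.setdefault(item, set()).add(user)
    return inverted


def cosine(a, b):
    """Cosine similarity between two binary item vectors given as user sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(user, plays, k=2):
    """Rank items the user has not played by summed similarity to played items."""
    by_item = item_users(plays)
    seen = plays[user]
    scores = {}
    for candidate, candidate_users in by_item.items():
        if candidate in seen:
            continue
        scores[candidate] = sum(cosine(candidate_users, by_item[s]) for s in seen)
    # Sort by descending score, breaking ties by name for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


print(recommend("u4", plays))
```

A song that no user in `plays` has heard never enters `by_item`, so it can never be recommended, no matter how large the catalog is.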
Resulting Biases
Huge number of items: 50%+ of users only ever saw 20 songs a month out of 3 million
Massive gap between the distribution of all items and the distribution of known items
Cross-validation ground truth assumes those 50%+ of users only ever saw the new top 20 songs in the new set
Results are supposed to be based on users knowing all of the sets
Continuous user testing assumes ‘all items seen’ distributions, but the only new items users see are the recommended ones
User data itself is a biased subset of the whole
First Generation Problems
• Everyone likes The Beatles or Norah Jones
• Extremely frequent in biased data sets
• Since everyone has listened to them before, everyone gets them recommended
• Recommendations usually repeat the top 40 of the data collection
• Users might like novel recommendations, but those will never appear in the cross-validation evaluation set because users never saw them
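The last point can be made concrete with a small offline evaluation. The sketch below is hypothetical throughout: the `train` and `held_out` logs, the item names, and the hit-rate metric are assumptions chosen for illustration. It compares a popularity recommender with one that suggests an item nobody has seen.

```python
from collections import Counter

# Hypothetical logs: each user's past plays and held-out "future" plays.
train = {
    "u1": ["beatles", "norah_jones"],
    "u2": ["beatles"],
    "u3": ["beatles", "norah_jones"],
}
held_out = {
    "u1": ["top40_hit"],
    "u2": ["norah_jones"],
    "u3": ["top40_hit"],
}


def popularity_recs(train, user, k=1):
    """Recommend the globally most played items the user has not played yet."""
    counts = Counter(item for items in train.values() for item in items)
    seen = set(train[user])
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    return [item for item in ranked if item not in seen][:k]


def hit_rate(recs_by_user, held_out):
    """Fraction of users with at least one recommendation in their held-out plays."""
    hits = sum(
        1 for user, recs in recs_by_user.items() if set(recs) & set(held_out[user])
    )
    return hits / len(recs_by_user)


popular = {user: popularity_recs(train, user) for user in train}
# A recommender that suggests an item no user has ever been exposed to.
novel = {user: ["obscure_song"] for user in train}

print("popular:", hit_rate(popular, held_out))
print("novel:  ", hit_rate(novel, held_out))
```

The novel recommender scores zero here regardless of whether users would have enjoyed `obscure_song`, because the held-out data can only contain items users already had access to.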
Problems Over Time
• The ground truth is heavily biased because recommendations control the set of known items
• Machine learning, including collaborative filtering, learns the algorithm's distribution more than users' preferences
• Performance Bias
• Future ground truth comes from those who stayed in the system
• They liked the system
• It doesn’t represent those who were unhappy and left
• Biases data to keep existing users happy without regard to ex-users
• In extreme cases, even new users are discarded
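The performance bias described above is a form of survivorship bias. A minimal sketch, assuming hypothetical per-user satisfaction scores and a made-up churn threshold, shows how the measured average drifts away from the true one once unhappy users leave:

```python
# Hypothetical satisfaction scores in [0, 1] for every user who ever joined.
satisfaction = {"u1": 0.9, "u2": 0.8, "u3": 0.2, "u4": 0.1}

# Assumption for illustration: users below this score leave the system.
CHURN_THRESHOLD = 0.5


def mean(values):
    """Arithmetic mean of a non-empty collection of numbers."""
    values = list(values)
    return sum(values) / len(values)


# Only users who stayed keep generating ground-truth data.
stayed = {u: s for u, s in satisfaction.items() if s >= CHURN_THRESHOLD}

true_mean = mean(satisfaction.values())
observed_mean = mean(stayed.values())

print(f"all users:    {true_mean:.2f}")
print(f"stayed users: {observed_mean:.2f}")
```

Any model retrained on the `stayed` data optimizes for the users it already pleased, which is the feedback loop the slide warns about.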
Best Solution So Far
[Diagram labels: Past Data; Idealized Future Distribution; Idealized Function; Feature Value => Rating]
Best Solution So Far
• Requires that all items be categorized and quantized
• Requires accuracy and general agreement on these values
• (Socially Defined versus Absolute)
• At least all features are present in all sets
• Transforms recommendation into optimization and personalization
• Set of items with highest score for a user
• Ability to predict poorly performing product or agent solutions
• Better able to incorporate additional data
• Prediction is usually linear time over the number of items
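A feature-based scorer along these lines might look like the sketch below. The slides only say "Feature Value => Rating"; the linear weighting, the feature names (`tempo`, `acoustic`), the item catalog, and the per-user weights are all assumptions made for illustration.

```python
import heapq

# Hypothetical catalog: every item carries the same quantized features,
# including song_c, which no user has played yet.
items = {
    "song_a": {"tempo": 0.8, "acoustic": 0.1},
    "song_b": {"tempo": 0.2, "acoustic": 0.9},
    "song_c": {"tempo": 0.9, "acoustic": 0.0},
}

# Hypothetical learned weights for one user (feature value => rating).
user_weights = {"tempo": 1.0, "acoustic": -0.5}


def predict_rating(features, weights):
    """Linear score: weighted sum of the item's feature values."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def top_items(items, weights, k=2):
    """Score every item once and keep the k highest-scoring ones."""
    return heapq.nlargest(
        k, items, key=lambda name: predict_rating(items[name], weights)
    )


print(top_items(items, user_weights))
```

Each item is scored once, so prediction cost grows linearly with the catalog, and an item needs features but no play history to be ranked.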
Evaluation Adjustments
• No Replacement for Real World A/B testing
• Machine Learning for evaluation, not just the question
• Hidden dependencies and ‘cheating’
[Diagram labels: Learned Algorithm; Model Training; Evaluation Model; Model Training; Business Objective; Ground Truth]
Distribution Problems in Recommender Systems
