SlideShare una empresa de Scribd logo
1 de 35
Descargar para leer sin conexión
Adam Warski

Recommendation
systems

5/11/2013 Warszawa-JUG

Introduction using Mahout

@adamwarski
Agenda

❖

What is a recommender?!

❖

Types of recommenders, input data!

❖

Recommendations with Mahout: single node!

❖

Recommendations with Mahout: multiple nodes

5/11/2013 Warszawa-JUG

@adamwarski
Who Am I?
❖

Day: coding @ SoftwareMill!

❖

Afternoon: playgrounds, Duplos, etc.!

❖

Evenings: blogging, open-source!
❖

Original author of Hibernate Envers!

❖

ElasticMQ, Veripacks, MacWire!

❖

http://www.warski.org

5/11/2013 Warszawa-JUG

@adamwarski
Me + recommenders
❖

Yap.TV!

❖

Recommends shows to users
basing on their favorites!
❖
❖

❖

from Yap!
from Facebook!

Some business rules

5/11/2013 Warszawa-JUG

@adamwarski
Some background

5/11/2013 Warszawa-JUG

@adamwarski
Information …

❖

Retrieval!
❖

❖

given a query, select items!

Filtering!
❖

given a user, filter out irrelevant items

5/11/2013 Warszawa-JUG

@adamwarski
What can be recommended?

❖

Items to users!

❖

Items to items!

❖

Users to users!

❖

Users to items

5/11/2013 Warszawa-JUG

@adamwarski
Input data

❖

(user, item, rating) tuples!
❖

❖

rating: e.g. 1-5 stars!

(user, item) tuples!
❖

unary

5/11/2013 Warszawa-JUG

@adamwarski
Input data
❖

Implicit!
❖
❖

reads!

❖
❖

clicks!

watching a video!

Explicits!
❖

explicit rating!

❖

favorites

5/11/2013 Warszawa-JUG

@adamwarski
Prediction vs recommendation
❖

Recommendations!
❖
❖

❖

suggestions!
top-N!

Predictions!
❖

ratings

5/11/2013 Warszawa-JUG

@adamwarski
Collaborative Filtering

❖

Something more than search keywords!
❖
❖

❖

tastes!
quality!

Collaborative: data from many users

5/11/2013 Warszawa-JUG

@adamwarski
Content-based recommenders

❖

Define features and feature values!

❖

Describe each item as a vector of features

5/11/2013 Warszawa-JUG

@adamwarski
Content-based recommenders
❖

User vector!
❖
❖

❖

e.g. counting user tags!
sum of item vectors!

Prediction: cosine between two vectors!
❖

profile!

❖

item

5/11/2013 Warszawa-JUG

@adamwarski
User-User CF

❖

Measure similarity between users!

❖

Prediction: weighted combination of existing ratings

5/11/2013 Warszawa-JUG

@adamwarski
User-User CF

❖

Domain must be scoped so that there’s agreement!

❖

Individual preferences should be stable

5/11/2013 Warszawa-JUG

@adamwarski
User similarity

❖

Many different metrics!
!

❖

Pearson correlation:

5/11/2013 Warszawa-JUG

@adamwarski
User neighbourhood
❖

Threshold on similarity!

❖

Top-N neighbours!
❖

❖

25-100: good starting point!

Variations:!
❖

trust networks, e.g. friends in a social network

5/11/2013 Warszawa-JUG

@adamwarski
Predictions in UU-CF

5/11/2013 Warszawa-JUG

@adamwarski
Item-Item CF

❖

Similar to UU-CF, only with items!

❖

Item similarity!
❖

Pearson correlation on item ratings!

❖

co-occurences

5/11/2013 Warszawa-JUG

@adamwarski
Evaluating recommenders
❖

How to tell if a recommender is good?!
❖
❖

❖

compare implementations!
are the recommendations ok?!

Business metrics!
❖

what leads to increased sales

5/11/2013 Warszawa-JUG

@adamwarski
Leave one out

❖

Remove one preference!

❖

Rebuild the model!

❖

See if it is recommended!

❖

Or, see what is the predicted rating

5/11/2013 Warszawa-JUG

@adamwarski
Some metrics

❖

Mean absolute error:!

❖

Mean squared error

5/11/2013 Warszawa-JUG

@adamwarski
Precision/recall
❖

Precision: % of recommended items that are relevant!

❖

Recall: % of relevant items that are recommended!

❖

Precision@N, recall@N
Recall

Ideal recs

Precision
Recs
5/11/2013 Warszawa-JUG

All items
@adamwarski
Problems
❖

Novelty: hard to measure!
❖

❖

leave one out: punishes recommenders which
recommend new items!

Test on humans!
❖

A/B testing

5/11/2013 Warszawa-JUG

@adamwarski
Diversity/serendipity
❖

Increase diversity:!
❖
❖

❖

top-n!
as items come in, remove the ones that are too similar
to prior items!

Increase serendipity:!
❖

downgrade popular items

5/11/2013 Warszawa-JUG

@adamwarski
Netflix challenge

❖

$1m for the best recommender!

❖

Used mean error!

❖

Winning algorithm won on predicting low ratings

5/11/2013 Warszawa-JUG

@adamwarski
Mahout single-node

5/11/2013 Warszawa-JUG

@adamwarski
5/11/2013 Warszawa-JUG

@adamwarski
5/11/2013 Warszawa-JUG

@adamwarski
Mahout single-node
❖

User-User CF!

❖

Item-Item CF!

❖

Various metrics!
❖
❖

Pearson!

❖
❖

Euclidean!

Log-likelihood!

Support for evaluation

5/11/2013 Warszawa-JUG

@adamwarski
Let’s code

5/11/2013 Warszawa-JUG

@adamwarski
Mahout multi-node
❖

Based on Hadoop!

❖

Pre-computed recommendations!

❖

Item-based recommenders!
❖

❖

Co-occurrence matrix!

Matrix decomposition!
❖

Alternating Least Squares

5/11/2013 Warszawa-JUG

@adamwarski
Running Mahout w/ Hadoop
hadoop jar mahout-core-0.8-job.jar !
! org.apache.mahout.cf.taste.hadoop.item.RecommenderJob !
! --booleanData !
! --similarityClassname SIMILARITY_LOGLIKELIHOOD !
! --output output !
! --input input/data.dat

5/11/2013 Warszawa-JUG

@adamwarski
Links
❖

http://www.warski.org!

❖

http://mahout.apache.org/!

❖

http://grouplens.org/!

❖

http://graphlab.org/graphchi/!

❖

https://class.coursera.org/recsys-001/class!

❖

https://github.com/adamw/mahout-pres

5/11/2013 Warszawa-JUG

@adamwarski
Thanks!

❖

Questions?!

❖

Stickers ->!

❖

adam@warski.org

5/11/2013 Warszawa-JUG

@adamwarski

Más contenido relacionado

Más de Adam Warski

The ideal module system and the harsh reality
The ideal module system and the harsh realityThe ideal module system and the harsh reality
The ideal module system and the harsh reality
Adam Warski
 

Más de Adam Warski (9)

What have the annotations done to us?
What have the annotations done to us?What have the annotations done to us?
What have the annotations done to us?
 
Hejdyk maleńka część mazur
Hejdyk maleńka część mazurHejdyk maleńka część mazur
Hejdyk maleńka część mazur
 
Slick eventsourcing
Slick eventsourcingSlick eventsourcing
Slick eventsourcing
 
Evaluating persistent, replicated message queues
Evaluating persistent, replicated message queuesEvaluating persistent, replicated message queues
Evaluating persistent, replicated message queues
 
The no-framework Scala Dependency Injection Framework
The no-framework Scala Dependency Injection FrameworkThe no-framework Scala Dependency Injection Framework
The no-framework Scala Dependency Injection Framework
 
ElasticMQ: a fully asynchronous, Akka-based SQS server
ElasticMQ: a fully asynchronous, Akka-based SQS serverElasticMQ: a fully asynchronous, Akka-based SQS server
ElasticMQ: a fully asynchronous, Akka-based SQS server
 
The ideal module system and the harsh reality
The ideal module system and the harsh realityThe ideal module system and the harsh reality
The ideal module system and the harsh reality
 
Scala Macros
Scala MacrosScala Macros
Scala Macros
 
CDI Portable Extensions
CDI Portable ExtensionsCDI Portable Extensions
CDI Portable Extensions
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Último (20)

Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Recommendation systems with Mahout: introduction