Recommendation systems
Introduction using Mahout

Adam Warski (@adamwarski)
5/11/2013 Warszawa-JUG

Agenda
❖ What is a recommender?
❖ Types of recommenders, input data
❖ Recommendations with Mahout: single node
❖ Recommendations with Mahout: multiple nodes

Who Am I?
❖ Day: coding @ SoftwareMill
❖ Afternoon: playgrounds, Duplos, etc.
❖ Evenings: blogging, open-source
  ❖ Original author of Hibernate Envers
  ❖ ElasticMQ, Veripacks, MacWire
❖ http://www.warski.org

Me + recommenders
❖ Yap.TV
❖ Recommends shows to users based on their favorites
  ❖ from Yap
  ❖ from Facebook
❖ Some business rules

Some background

Information …
❖ Retrieval
  ❖ given a query, select items
❖ Filtering
  ❖ given a user, filter out irrelevant items

What can be recommended?
❖ Items to users
❖ Items to items
❖ Users to users
❖ Users to items

Input data
❖ (user, item, rating) tuples
  ❖ rating: e.g. 1-5 stars
❖ (user, item) tuples
  ❖ unary
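
The two input shapes above can be sketched as plain Python structures. This is only an illustration: the sample users, items, and the helper names `to_rating_index` and `to_unary_index` are hypothetical and do not come from the talk or from Mahout.

```python
from collections import defaultdict

# Hypothetical sample data, not from the talk.
ratings = [            # (user, item, rating) tuples, rating on a 1-5 scale
    ("alice", "matrix", 5),
    ("alice", "titanic", 2),
    ("bob", "matrix", 4),
]
favorites = [          # (user, item) tuples: unary, presence is the only signal
    ("alice", "matrix"),
    ("bob", "inception"),
]


def to_rating_index(tuples):
    """Group (user, item, rating) tuples into {user: {item: rating}}."""
    index = defaultdict(dict)
    for user, item, rating in tuples:
        index[user][item] = float(rating)
    return dict(index)


def to_unary_index(tuples):
    """Group (user, item) tuples into {user: {items}}."""
    index = defaultdict(set)
    for user, item in tuples:
        index[user].add(item)
    return dict(index)


rating_index = to_rating_index(ratings)
unary_index = to_unary_index(favorites)
```

With unary data there is no rating to predict, so similarity measures that need rating values (such as Pearson correlation) do not apply directly.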

Input data
❖ Implicit
  ❖ reads
  ❖ clicks
  ❖ watching a video
❖ Explicit
  ❖ explicit rating
  ❖ favorites

Prediction vs recommendation
❖ Recommendations
  ❖ suggestions
  ❖ top-N
❖ Predictions
  ❖ ratings

Collaborative Filtering
❖ Something more than search keywords
  ❖ tastes
  ❖ quality
❖ Collaborative: data from many users

Content-based recommenders
❖ Define features and feature values
❖ Describe each item as a vector of features

Content-based recommenders
❖ User vector
  ❖ e.g. counting user tags
  ❖ sum of item vectors
❖ Prediction: cosine between two vectors
  ❖ profile
  ❖ item
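
A minimal sketch of the idea above, assuming sparse feature vectors stored as dicts: the user profile is the sum of the vectors of liked items, and the score of an item is the cosine between profile and item. The item names, tags, and function names are hypothetical.

```python
import math


def add_vectors(vectors):
    """Sum sparse feature vectors given as {feature: weight} dicts."""
    total = {}
    for vec in vectors:
        for feature, weight in vec.items():
            total[feature] = total.get(feature, 0.0) + weight
    return total


def cosine(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical items described by tag features.
items = {
    "matrix": {"sci-fi": 1.0, "action": 1.0},
    "alien": {"sci-fi": 1.0, "horror": 1.0},
    "notebook": {"romance": 1.0},
}

# Profile of a user who liked "matrix" and "alien".
profile = add_vectors([items["matrix"], items["alien"]])
scores = {name: cosine(profile, vec) for name, vec in items.items()}
```

Here "notebook" shares no feature with the profile, so its score is 0, while the two liked items score highest.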

User-User CF
❖ Measure similarity between users
❖ Prediction: weighted combination of existing ratings

User-User CF
❖ Domain must be scoped so that there’s agreement
❖ Individual preferences should be stable

User similarity
❖ Many different metrics
❖ Pearson correlation
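
One way to compute Pearson correlation between two users is sketched below. It assumes ratings are stored as {item: rating} dicts and takes the means over co-rated items only, which is one common variant and not necessarily the one used in the talk. The function name and sample data are hypothetical.

```python
import math


def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items rated by both users.

    Returns None when the value is undefined: fewer than two
    co-rated items, or zero variance for either user.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    n = len(common)
    if n < 2:
        return None
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ratings on a 1-5 scale.
alice = {"a": 5, "b": 3, "c": 1}
bob = {"a": 5, "b": 4, "c": 3}      # same ordering as alice, milder scale
carol = {"a": 1, "b": 3, "c": 5}    # opposite taste
```

Because the means are subtracted, bob counts as fully similar to alice even though he rates everything higher; the measure captures agreement in ordering, not in absolute values.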

User neighbourhood
❖ Threshold on similarity
❖ Top-N neighbours
  ❖ 25-100: good starting point
❖ Variations:
  ❖ trust networks, e.g. friends in a social network
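
The two neighbourhood strategies above can be sketched as follows, assuming the similarities of all other users to one target user are already computed. The function names and numbers are hypothetical.

```python
def threshold_neighbours(similarities, min_similarity):
    """Keep every user whose similarity is at least min_similarity."""
    return {u: s for u, s in similarities.items() if s >= min_similarity}


def top_n_neighbours(similarities, n):
    """Keep the n most similar users (ties broken by user id for determinism)."""
    ranked = sorted(similarities.items(), key=lambda pair: (-pair[1], pair[0]))
    return dict(ranked[:n])


# Hypothetical similarities of other users to one target user.
sims = {"bob": 0.9, "carol": 0.2, "dave": 0.7, "erin": -0.4}

by_threshold = threshold_neighbours(sims, 0.5)
by_top_n = top_n_neighbours(sims, 3)
```

A threshold gives a neighbourhood of varying size, possibly empty, while top-N always returns N users even if some of them are only weakly similar.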

Predictions in UU-CF
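
A commonly used form of the user-user prediction is given below as an assumption; it may differ from the formula shown on the slide. The symbols are illustrative: $N(u)$ is the neighbourhood of user $u$, $w_{uv}$ the similarity between users $u$ and $v$, $r_{v,i}$ the rating of item $i$ by user $v$, and $\bar{r}_u$ the mean rating of user $u$.

```latex
\[
p_{u,i} = \bar{r}_u +
  \frac{\sum_{v \in N(u)} w_{uv}\,\bigl(r_{v,i} - \bar{r}_v\bigr)}
       {\sum_{v \in N(u)} \lvert w_{uv} \rvert}
\]
```

Subtracting each neighbour's mean compensates for users who rate systematically high or low, and dividing by the sum of absolute weights keeps the prediction on the rating scale.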

Item-Item CF
❖ Similar to UU-CF, only with items
❖ Item similarity
  ❖ Pearson correlation on item ratings
  ❖ co-occurrences
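
Co-occurrence based item similarity can be sketched as counting how many users have both items of a pair. The sketch assumes unary data as {user: set of items}; all names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(user_items):
    """Count, for every unordered item pair, how many users have both items."""
    counts = Counter()
    for items in user_items.values():
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return counts


def most_cooccurring(item, counts, n):
    """Return up to n items that co-occur most often with the given item."""
    scores = Counter()
    for (a, b), c in counts.items():
        if a == item:
            scores[b] += c
        elif b == item:
            scores[a] += c
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:n]]


# Hypothetical unary data.
user_items = {
    "alice": {"matrix", "alien", "inception"},
    "bob": {"matrix", "inception"},
    "carol": {"alien", "notebook"},
}
counts = cooccurrence_counts(user_items)
similar_to_matrix = most_cooccurring("matrix", counts, 2)
```

Raw counts favour popular items, which is one reason measures such as log-likelihood are often preferred over plain co-occurrence.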

Evaluating recommenders
❖ How to tell if a recommender is good?
  ❖ compare implementations
  ❖ are the recommendations ok?
❖ Business metrics
  ❖ what leads to increased sales

Leave one out
❖ Remove one preference
❖ Rebuild the model
❖ See if it is recommended
❖ Or, see what the predicted rating is
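
The steps above can be sketched as a loop that hides each rating in turn and predicts it from the remaining data. The predictor used here (`item_mean_predictor`) is a deliberately simple stand-in chosen for illustration, not a method from the talk; any function with the same signature could be plugged in.

```python
def item_mean_predictor(ratings, user, item):
    """Stand-in model (an assumption): mean rating of the item by other users,
    falling back to the global mean when nobody else rated it."""
    others = [r[item] for u, r in ratings.items() if u != user and item in r]
    if others:
        return sum(others) / len(others)
    everything = [v for r in ratings.values() for v in r.values()]
    return sum(everything) / len(everything)


def leave_one_out_errors(ratings, predict):
    """Hide each known rating in turn, predict it from the rest, collect |error|."""
    errors = []
    for user, user_ratings in ratings.items():
        for item, actual in user_ratings.items():
            # "Rebuild the model": a copy of the data without the held-out preference.
            training = {u: dict(r) for u, r in ratings.items()}
            del training[user][item]
            errors.append(abs(predict(training, user, item) - actual))
    return errors


# Hypothetical ratings.
ratings = {
    "alice": {"matrix": 5.0, "alien": 3.0},
    "bob": {"matrix": 4.0, "alien": 2.0},
}
errors = leave_one_out_errors(ratings, item_mean_predictor)
```

Copying the whole data set for every held-out rating is fine for a sketch but too slow for real data, where a sampled subset of preferences is usually held out instead.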

Some metrics
❖ Mean absolute error
❖ Mean squared error
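
Both metrics compare predicted ratings with the actual held-out ratings. A minimal sketch, with hypothetical function names and sample values:

```python
def mean_absolute_error(predicted, actual):
    """MAE: average of |predicted - actual|."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def mean_squared_error(predicted, actual):
    """MSE: average of (predicted - actual)^2; large errors weigh more than in MAE."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical predictions and the ratings that were held out.
predicted = [4.0, 3.0, 5.0, 1.0]
actual = [5.0, 3.0, 3.0, 1.0]

mae = mean_absolute_error(predicted, actual)   # (1 + 0 + 2 + 0) / 4
mse = mean_squared_error(predicted, actual)    # (1 + 0 + 4 + 0) / 4
```

The single error of 2 contributes twice as much as the error of 1 to MAE, but four times as much to MSE.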

Precision/recall
❖ Precision: % of recommended items that are relevant
❖ Recall: % of relevant items that are recommended
❖ Precision@N, recall@N
[Figure: diagram of All items, Ideal recs, and Recs, with Precision and Recall marked]

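The definitions above translate directly into code. In this sketch `recommended` is a ranked list and `relevant` is the set of items considered good for the user (the "ideal recs"); the names and data are hypothetical.

```python
def precision_recall_at_n(recommended, relevant, n):
    """Precision@N and recall@N for a ranked list and a set of relevant items."""
    top_n = recommended[:n]
    hits = sum(1 for item in top_n if item in relevant)
    precision = hits / len(top_n) if top_n else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


recommended = ["matrix", "alien", "notebook", "inception"]   # ranked, best first
relevant = {"matrix", "inception", "blade runner"}

p_at_2, r_at_2 = precision_recall_at_n(recommended, relevant, 2)
p_at_4, r_at_4 = precision_recall_at_n(recommended, relevant, 4)
```

Increasing N can only keep recall the same or raise it, while precision may move in either direction.
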
Problems
❖ Novelty: hard to measure
  ❖ leave one out: punishes recommenders which recommend new items
❖ Test on humans
  ❖ A/B testing

Diversity/serendipity
❖ Increase diversity:
  ❖ top-n
  ❖ as items come in, remove the ones that are too similar to prior items
❖ Increase serendipity:
  ❖ downgrade popular items
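
The diversity rule above can be sketched as a greedy filter over a ranked list: an item is kept only if it is not too similar to any item already kept. The similarity function (Jaccard index over tags), the threshold, and the data are assumptions made for illustration.

```python
def diversify(ranked_items, similarity, max_similarity, n):
    """Greedy top-n: walk the ranked list and skip any item that is too
    similar to one already selected."""
    selected = []
    for item in ranked_items:
        if len(selected) >= n:
            break
        if all(similarity(item, prior) <= max_similarity for prior in selected):
            selected.append(item)
    return selected


# Hypothetical item tags used to measure similarity.
tags = {
    "matrix": {"sci-fi", "action"},
    "matrix 2": {"sci-fi", "action"},
    "alien": {"sci-fi", "horror"},
    "notebook": {"romance"},
}


def jaccard(a, b):
    """Share of tags two items have in common."""
    return len(tags[a] & tags[b]) / len(tags[a] | tags[b])


picked = diversify(["matrix", "matrix 2", "alien", "notebook"], jaccard, 0.5, 3)
```

"matrix 2" is dropped because it is identical in tags to the higher-ranked "matrix", which frees a slot for a less similar item further down the list.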

Netflix challenge
❖ $1m for the best recommender
❖ Used root mean squared error (RMSE) as the metric
❖ Winning algorithm won on predicting low ratings

Mahout single-node
❖ User-User CF
❖ Item-Item CF
❖ Various metrics
  ❖ Pearson
  ❖ Euclidean
  ❖ Log-likelihood
❖ Support for evaluation

Let’s code

Mahout multi-node
❖ Based on Hadoop
❖ Pre-computed recommendations
❖ Item-based recommenders
  ❖ Co-occurrence matrix
❖ Matrix decomposition
  ❖ Alternating Least Squares

Running Mahout w/ Hadoop
# Item-based recommender job on (user, item) input without ratings
hadoop jar mahout-core-0.8-job.jar \
    org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
    --booleanData \
    --similarityClassname SIMILARITY_LOGLIKELIHOOD \
    --output output \
    --input input/data.dat

Links
❖ http://www.warski.org
❖ http://mahout.apache.org/
❖ http://grouplens.org/
❖ http://graphlab.org/graphchi/
❖ https://class.coursera.org/recsys-001/class
❖ https://github.com/adamw/mahout-pres

Thanks!
❖ Questions?
❖ Stickers ->
❖ adam@warski.org

Recommendation systems with Mahout: introduction