Next Directions in Mahout’s Recommenders
Sebastian Schelter, Apache Software Foundation
Bay Area Mahout Meetup
NextDirectionsinMahout’sRecommenders
2/38
About me
PhD student at the Database Systems and Information
Management Group of Technische Universität Berlin
Member of the Apache Software Foundation, committer on
Mahout and Giraph
currently interning at IBM Research Almaden
Next Directions?
Mahout in Action is the prime source of
information for using Mahout in practice.
As it is more than two years old
(and only covers Mahout 0.5), it is
missing a lot of recent developments.
This talk describes what has been added to the recommenders
of Mahout since then and gives suggestions on directions for
future versions of Mahout.
Collaborative Filtering 101
Collaborative Filtering
Problem: Given a user’s interactions with items, guess which
other items would be highly preferred
Collaborative Filtering: infer recommendations from patterns
found in the historical user-item interactions
data can be explicit feedback (ratings) or implicit feedback
(clicks, pageviews), represented in the interaction matrix A





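$$
A = \begin{array}{c|cccc}
 & \text{item1} & \cdots & \text{item3} & \cdots \\ \hline
\text{user1} & 3 & \cdots & 4 & \cdots \\
\text{user2} & - & \cdots & 4 & \cdots \\
\text{user3} & 5 & \cdots & 1 & \cdots \\
\cdots & \cdots & \cdots & \cdots & \cdots
\end{array}
$$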





Neighborhood Methods
User-based:
for each user, compute a "jury" of users with similar taste
pick the recommendations from the "jury’s" items
Item-based:
for each item, compute a set of items with similar
interaction pattern
pick the recommendations from those similar items
Neighborhood Methods
item-based variant most popular:
simple and intuitively understandable
additionally gives non-personalized, per-item
recommendations (people who like X might also like Y)
recommendations for new users without model retraining
comprehensible explanations (we recommend Y because
you liked X)
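The item-based approach can be sketched in a few lines. This is a minimal illustration in plain Python, not Mahout's implementation: the table `SIMILAR_ITEMS`, the function `recommend`, and the similarity values are hypothetical and chosen only for the example. It shows why no retraining is needed for a new user (only the user's liked items are required at request time) and where the "because you liked X" explanation comes from.

```python
from collections import defaultdict

# Hypothetical precomputed item-item similarities (toy values).
SIMILAR_ITEMS = {
    "X": {"Y": 0.9, "Z": 0.4},
    "W": {"Z": 0.5, "Y": 0.2},
}

def recommend(liked_items, similar_items, top_n=2):
    """Score unseen items by summing similarities to the user's liked items.

    Returns (item, score, explanation) tuples, best first. The explanation
    is the liked item that contributed most to the candidate's score.
    """
    scores = defaultdict(float)
    best_reason = {}
    for liked in liked_items:
        for candidate, sim in similar_items.get(liked, {}).items():
            if candidate in liked_items:
                continue  # never recommend something the user already has
            scores[candidate] += sim
            if candidate not in best_reason or sim > best_reason[candidate][1]:
                best_reason[candidate] = (liked, sim)
    # Sort by descending score, breaking ties by item name.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(item, score, best_reason[item][0]) for item, score in ranked[:top_n]]
```

Calling `recommend({"X", "W"}, SIMILAR_ITEMS)` ranks Y ahead of Z and attributes Y to X, which is the kind of explanation the slide refers to.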
Latent factor models
Idea: interactions are deeply influenced by a set of factors
that are very specific to the domain (e.g. amount of action
or complexity of characters in movies)
these factors are in general not obvious and need to be
inferred from the interaction data
both users and items can be described in terms of these factors
Matrix factorization
Computing a latent factor model: approximately factor A
into the product of two rank k feature matrices U and M such
that A ≈ UM.
U models the latent features of the users, M models the latent
features of the items
dot product $u_i \cdot m_j$ in the latent feature space predicts strength
of interaction between user i and item j
Dimensions: A (u × i) ≈ U (u × k) × M (k × i)
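As a small worked example of the prediction rule, the sketch below multiplies a toy U (users × k) with a toy M (k × items). The factor values and the helper names `predict` and `reconstruct` are made up for illustration; a real model would learn U and M from the interaction data.

```python
def predict(U, M, user, item):
    """Predicted interaction strength: dot product of row `user` of U
    with column `item` of M (U is users x k, M is k x items)."""
    k = len(M)
    return sum(U[user][f] * M[f][item] for f in range(k))

def reconstruct(U, M):
    """Full approximation of A as the matrix product U x M."""
    return [[predict(U, M, u, i) for i in range(len(M[0]))]
            for u in range(len(U))]

# Toy factors with k = 2 (made-up values).
U = [[1.0, 0.5],
     [0.0, 2.0]]
M = [[4.0, 1.0, 0.0],
     [0.0, 2.0, 1.0]]
```

Here `reconstruct(U, M)` yields a 2 × 3 matrix whose entry for user 0 and item 1 is 1.0·1.0 + 0.5·2.0 = 2.0.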
Single machine recommenders
Taste
based on Sean Owen’s Taste framework (started in 2005)
mature and stable codebase
Recommender implementations encapsulate recommender
algorithms
DataModel implementations handle interaction data in
memory, files, databases, key-value stores
but focus was mostly on neighborhood methods
lack of implementations for latent factor models
little support for scientific use cases (e.g. recommender
contests)
Collaboration
MyMediaLite, a scientific library of recommender
system algorithms
http://www.mymedialite.net/
Mahout now features a couple of popular latent factor models,
mostly ported by Zeno Gantner.
Lots of different Factorizers for our SVDRecommender
RatingSGDFactorizer, biased matrix factorization
Koren et al.: Matrix Factorization Techniques for Recommender Systems, IEEE Computer ’09
SVDPlusPlusFactorizer, SVD++
Koren: Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model, KDD ’08
ALSWRFactorizer, matrix factorization using Alternating
Least Squares
Zhou et al.: Large-Scale Parallel Collaborative Filtering for the Netflix Prize, AAIM ’08
Hu et al.: Collaborative Filtering for Implicit Feedback Datasets, ICDM ’08
ParallelSGDFactorizer, parallel version of biased matrix
factorization (contributed by Peng Cheng)
Takács et al.: Scalable Collaborative Filtering Approaches for Large Recommender Systems, JMLR ’09
Niu et al.: Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, NIPS ’11
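To make the biased matrix factorization model concrete, here is a minimal sketch of training it with stochastic gradient descent, in the spirit of Koren et al. It is not the code of `RatingSGDFactorizer`; the function names, the hyperparameters (`lr`, `reg`, `epochs`) and the toy ratings are assumptions chosen for the example. The prediction is the global mean plus a user bias, an item bias, and the dot product of the user and item factors.

```python
import random

# Toy (user, item, rating) triples, made up for illustration.
RATINGS = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]

def train_biased_mf(ratings, num_users, num_items, k=2, epochs=200,
                    lr=0.01, reg=0.02, seed=42):
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean
    bu = [0.0] * num_users                             # user biases
    bi = [0.0] * num_items                             # item biases
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(num_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(num_items)]
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            pred = mu + bu[u] + bi[i] + sum(p * q for p, q in zip(P[u], Q[i]))
            err = r - pred
            # Gradient steps on the regularized squared error.
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return mu, bu, bi, P, Q

def predict_rating(model, u, i):
    mu, bu, bi, P, Q = model
    return mu + bu[u] + bi[i] + sum(p * q for p, q in zip(P[u], Q[i]))

def rmse(model, ratings):
    se = sum((r - predict_rating(model, u, i)) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5
```

Comparing `rmse` for a model trained with `epochs=0` against one trained with the default setting gives a quick sanity check that the updates reduce the training error.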
Next directions
better tooling for cross-validation and hold-out tests (e.g.
time-based splits of interactions)
memory-efficient DataModel implementations tailored to
specific use cases (e.g. matrix factorization with SGD)
better support for computing recommendations for
”anonymous” users
online recommenders
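A time-based split, as mentioned in the first item above, could look like the following sketch. The tuple layout `(user, item, timestamp)` and the function name `time_based_split` are assumptions for illustration, not an existing Mahout API. The point is that the model is evaluated only on interactions that happened after everything it was trained on.

```python
def time_based_split(interactions, cutoff):
    """Split (user, item, timestamp) tuples at `cutoff`.

    Everything strictly before the cutoff is training data, everything
    at or after it is held out for testing.
    """
    train = [x for x in interactions if x[2] < cutoff]
    test = [x for x in interactions if x[2] >= cutoff]
    return train, test

# Toy interactions with integer timestamps (made-up values).
INTERACTIONS = [
    ("u1", "a", 30), ("u2", "b", 10), ("u1", "c", 20),
    ("u3", "a", 50), ("u2", "c", 40),
]
```

Unlike a random hold-out, this split never lets the recommender see a user's future behavior during training.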
Usage
researchers at TU Berlin and CWI Amsterdam
regularly use Mahout for their recommender research
published at international conferences
"Bayerischer Rundfunk", one of Germany’s largest public
TV broadcasters, uses Mahout to help users discover TV
content in its online media library
Berlin-based company plista runs a live contest for the
best news recommender algorithm and provides
Mahout-based "skeleton code" to participants
The Dutch Institute of Sound and Vision runs a
web platform that uses Mahout for recommending content
from its archive of Dutch audio-visual heritage collections
of the 20th century
Parallel processing
Distribution
difficult environment:
data is partitioned and stored in a distributed filesystem
algorithms must be expressed in MapReduce
our distributed implementations focus on two popular methods
item-based collaborative filtering
matrix factorization with Alternating Least Squares
Scalable neighborhood methods
Cooccurrences
start with a simplified view:
imagine interaction matrix A was
binary
→ we look at cooccurrences only
item similarity computation becomes matrix multiplication
$S = A^\top A$
scale-out of the item-based approach reduces to finding an
efficient way to compute this item similarity matrix
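For a binary interaction matrix, entry (f, j) of the product of the transposed matrix with the matrix itself counts the users who interacted with both item f and item j. The sketch below computes it directly on a tiny dense matrix; the function name `cooccurrence_matrix` and the data are illustrative only, and a real implementation would use sparse structures.

```python
def cooccurrence_matrix(A):
    """Compute S = A^T A for a binary user x item matrix given as lists."""
    num_items = len(A[0])
    S = [[0] * num_items for _ in range(num_items)]
    for f in range(num_items):
        for j in range(num_items):
            # Dot product of column f and column j of A.
            S[f][j] = sum(row[f] * row[j] for row in A)
    return S

# Three users (rows) and three items (columns), made-up data.
A = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 1, 1],
]
```

The diagonal of the result holds the number of interactions per item, and the matrix is symmetric.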
Parallelizing $S = A^\top A$
standard approach of computing item cooccurrences requires
random access to both users and items
foreach item f do
  foreach user i who interacted with f do
    foreach item j that i also interacted with do
      S_fj = S_fj + 1
→ not efficiently parallelizable on partitioned data
row outer product formulation of matrix multiplication is
efficiently parallelizable on a row-partitioned A
$S = A^\top A = \sum_{i \in A} a_i a_i^\top$
mappers compute the outer products of rows of A, emit the
results row-wise, reducers sum these up to form S
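The mapper and reducer roles can be simulated in plain Python. This is only a sketch of the row outer product idea on binary data, not Mahout's `RowSimilarityJob`; the names `map_row`, `reduce_rows` and `cooccurrences` are hypothetical, and each user row is assumed to list every item at most once. Each mapper needs only the rows of its own partition, which is why the formulation works on a row-partitioned A.

```python
from collections import defaultdict

def map_row(row):
    """Mapper: emit the outer product of one sparse user row, row-wise.

    `row` is the list of item ids the user interacted with. For every
    item f in the row, one partial row of S keyed by f is emitted.
    """
    for f in row:
        yield f, {j: 1 for j in row}

def reduce_rows(emitted):
    """Reducer: sum the partial rows that share the same item id f."""
    S = defaultdict(lambda: defaultdict(int))
    for f, partial_row in emitted:
        for j, value in partial_row.items():
            S[f][j] += value
    return {f: dict(r) for f, r in S.items()}

def cooccurrences(partitions):
    """Run the mappers over every partition, then reduce."""
    emitted = (pair
               for part in partitions
               for row in part
               for pair in map_row(row))
    return reduce_rows(emitted)

# Two partitions of user rows (made-up data).
PARTITIONS = [
    [[0, 1]],
    [[1, 2], [0, 1, 2]],
]
```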
Parallel similarity computation
many more details in the implementation
support for various similarity measures
various optimizations (e.g. for symmetric similarity
measures)
downsampling of skewed interaction data
in-depth description available in:
Sebastian Schelter, Christoph Boden, Volker Markl:
Scalable Similarity-Based Neighborhood Methods with
MapReduce
ACM RecSys 2012
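One plausible reading of the downsampling step is sketched below: a user with n interactions contributes on the order of n² entries to the outer product of their row, so capping very long histories bounds the work caused by a few heavy users. The function `downsample`, the cap `max_per_user` and the uniform sampling are assumptions for illustration; the scheme actually used in Mahout is described in the cited paper and may differ.

```python
import random

def downsample(user_histories, max_per_user, seed=0):
    """Cap each user's history at `max_per_user` randomly chosen items."""
    rng = random.Random(seed)
    sampled = {}
    for user, items in user_histories.items():
        if len(items) > max_per_user:
            # Keep a uniform random subset of the long history.
            sampled[user] = sorted(rng.sample(items, max_per_user))
        else:
            sampled[user] = list(items)
    return sampled

# One heavy user and one light user (made-up data).
HISTORIES = {"heavy": list(range(1000)), "light": [1, 2, 3]}
```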
Implementation in Mahout
o.a.m.math.hadoop.similarity.cooccurrence.RowSimilarityJob
computes the top-k pairwise similarities for each row of a
matrix using some similarity measure
o.a.m.cf.taste.hadoop.similarity.item.ItemSimilarityJob
computes the top-k similar items per item using
RowSimilarityJob
o.a.m.cf.taste.hadoop.item.RecommenderJob
computes recommendations and similar items using
RowSimilarityJob
Scalable Neighborhood Methods: Experiments
Setup
6 machines running Java 7 and Hadoop 1.0.4
two 4-core Opteron CPUs, 32 GB memory and four 1 TB
disk drives per machine
Results
Yahoo Songs dataset (700M datapoints, 1.8M users, 136K
items), similarity computation takes less than 100 minutes
Scalable matrix factorization
Alternating Least Squares
ALS rotates between fixing U and M. When U is fixed, the
system recomputes M by solving a least-squares problem per
item, and vice versa.
easy to parallelize, as all users (and vice versa, items) can be
recomputed independently
additionally, ALS can be applied to use cases with implicit data
(pageviews, clicks)
Dimensions: A (u × i) ≈ U (u × k) × M (k × i)
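The alternation can be sketched for explicit ratings as follows. This is a simplified illustration, not `ALSWRFactorizer`: it uses a plain ridge term `lam` rather than the weighted regularization of Zhou et al., stores M with one row per item for convenience, and all function names and toy data are assumptions. Each feature vector is the solution of a small k × k linear system, and the vectors within one half-step do not depend on each other.

```python
import random

def solve(A, b):
    """Solve the small dense system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    aug = [row[:] + [b[r]] for r, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]

def recompute(fixed, observed, k, lam):
    """Regularized least-squares update of one feature vector.

    `observed` holds (index into `fixed`, rating) pairs for this user/item.
    """
    A = [[lam if a == b else 0.0 for b in range(k)] for a in range(k)]
    rhs = [0.0] * k
    for idx, r in observed:
        v = fixed[idx]
        for a in range(k):
            rhs[a] += r * v[a]
            for b in range(k):
                A[a][b] += v[a] * v[b]
    return solve(A, rhs)

def als(ratings, num_users, num_items, k=2, lam=0.1, iterations=10, seed=1):
    rng = random.Random(seed)
    U = [[rng.random() for _ in range(k)] for _ in range(num_users)]
    M = [[rng.random() for _ in range(k)] for _ in range(num_items)]
    by_user = [[] for _ in range(num_users)]
    by_item = [[] for _ in range(num_items)]
    for (u, i), r in ratings.items():
        by_user[u].append((i, r))
        by_item[i].append((u, r))
    for _ in range(iterations):
        # M fixed: every user vector is an independent problem.
        U = [recompute(M, by_user[u], k, lam) for u in range(num_users)]
        # U fixed: likewise for every item vector.
        M = [recompute(U, by_item[i], k, lam) for i in range(num_items)]
    return U, M

def rmse(U, M, ratings):
    se = sum((r - sum(a * b for a, b in zip(U[u], M[i]))) ** 2
             for (u, i), r in ratings.items())
    return (se / len(ratings)) ** 0.5

# Toy explicit ratings keyed by (user, item), made up for illustration.
RATINGS = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0,
           (1, 2): 1.0, (2, 1): 2.0, (2, 2): 5.0}
```

Because the list comprehensions over users and over items have no cross-dependencies, each of them could be distributed across threads or machines.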
Scalable Matrix Factorization: Implementation
Recompute user feature matrix U using a broadcast-join:
1. Run a map-only job using multithreaded mappers
2. load item-feature matrix M into memory from HDFS to
share it among the individual mappers
3. mappers read the interaction histories of the users
4. multithreaded: solve a least squares problem per user to
recompute its feature vector
[Figure: broadcast-join. The item features M are broadcast to machine1, machine2 and machine3. On each machine, a map task forwards its local partition of the user histories A, performs a hash-join with M plus the re-computation, and writes the corresponding part of the user features U.]
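The four steps above can be imitated on one machine as a rough sketch. It is not `ParallelALSFactorizationJob`: the partitions are plain dictionaries, the "broadcast" is simply a shared in-memory dictionary, and for brevity the sketch assumes k = 1 so that the per-user regularized least-squares problem has a closed-form scalar solution. All names (`recompute_user`, `map_partition`, `recompute_U`) and the data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def recompute_user(history, item_features, lam):
    """Hash-join one user's history against the broadcast item features
    and solve the rank-1 regularized least-squares problem in closed form."""
    num = sum(rating * item_features[item] for item, rating in history)
    den = lam + sum(item_features[item] ** 2 for item, _ in history)
    return num / den

def map_partition(partition, item_features, lam=0.1, threads=4):
    """One 'mapper': processes its local users with several threads,
    all of which read the same in-memory copy of the item features."""
    users = list(partition)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        feats = pool.map(
            lambda u: recompute_user(partition[u], item_features, lam), users)
        return dict(zip(users, feats))

def recompute_U(partitions, item_features, lam=0.1):
    """Map-only job: no shuffle or reduce, results are simply collected."""
    U = {}
    for partition in partitions:
        U.update(map_partition(partition, item_features, lam))
    return U

# Broadcast side: item features M with k = 1 (made-up values).
M = {"a": 1.0, "b": 2.0}
# Partitioned side: user histories as (item, rating) lists.
PARTITIONS = [
    {"u1": [("a", 2.0), ("b", 4.0)]},
    {"u2": [("a", 1.0)]},
]
```

The design choice this illustrates is that only the small side of the join (M) has to be replicated, while the large side (the user histories) is read where it is stored.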
Implementation in Mahout
o.a.m.cf.taste.hadoop.als.ParallelALSFactorizationJob
different solvers for explicit and implicit data
Zhou et al.: Large-Scale Parallel Collaborative Filtering for the Netflix Prize, AAIM ’08
Hu et al.: Collaborative Filtering for Implicit Feedback Datasets, ICDM ’08
o.a.m.cf.taste.hadoop.als.RecommenderJob computes
recommendations from a factorization
in-depth description available in:
Sebastian Schelter, Christoph Boden, Martin Schenck,
Alexander Alexandrov, Volker Markl:
Distributed Matrix Factorization with MapReduce using a
series of Broadcast-Joins
to appear at ACM RecSys 2013
Scalable Matrix Factorization: Experiments
Cluster: 26 machines, two 4-core Opteron CPUs, 32 GB
memory and four 1 TB disk drives each
Hadoop Configuration: reuse JVMs, use JBlas as solver,
run multithreaded mappers
Datasets: Netflix (0.5M users, 100M datapoints), Yahoo
Songs (1.8M users, 700M datapoints), Bigflix (25M users, 5B
datapoints)
[Figure, left: average duration per job (seconds) versus number of features r, for recomputing U and M with r = 10, 20, 50 and 100 on the Yahoo Songs and Netflix datasets.]
[Figure, right: average duration per job (seconds) versus number of machines (5 to 25), for recomputing M and U on the Bigflix dataset.]
Next directions
better tooling for cross-validation and hold-out tests (e.g.
to find parameters for ALS)
integration of more efficient solver libraries like JBlas
the MapReduce code should be easier to modify and
adjust
A selection of users
Mendeley, a data platform for researchers (2.5M users,
50M research articles): Mendeley Suggest for discovering
relevant research publications
ResearchGate, the world’s largest social network for
researchers (3M users)
a German online retailer with several million customers
across Europe
German online marketplaces for real estate and
pre-owned cars with millions of users
Deployment -
”Small data, low load”
use GenericItemBasedRecommender or
GenericUserBasedRecommender, feed it with interaction
data stored in a file, database or key-value store
have it load the interaction data in memory and compute
recommendations on request
collect new interactions into your files or database and
periodically refresh the recommender
In order to improve performance, try to:
have your recommender look at fewer interactions by
using SamplingCandidateItemsStrategy
cache computed similarities with a CachingItemSimilarity
”Medium data, high load”
Assumption: interaction data still fits into main memory
use a recommender that is able to leverage a
precomputed model, e.g. GenericItemBasedRecommender
or SVDRecommender
load the interaction data and the model in memory and
compute recommendations on request
collect new interactions into your files or database and
periodically recompute the model and refresh the
recommender
use BatchItemSimilarities or ParallelSGDFactorizer for
precomputing the model using multiple threads on a single
machine
”Lots of data, high load”
Assumption: interaction data does not fit into main memory
use a recommender that is able to leverage a
precomputed model, e.g. GenericItemBasedRecommender
or SVDRecommender
keep the interaction data in a (potentially partitioned)
database or in a key-value store
load the model into memory, the recommender will only
use one (cacheable) query per recommendation request to
retrieve the user’s interaction history
collect new interactions into your files or database and
periodically recompute the model offline
use ItemSimilarityJob or ParallelALSFactorizationJob to
precompute the model with Hadoop
”Precompute everything”
use RecommenderJob to precompute recommendations
for all users with Hadoop
directly serve those recommendations
successfully employed by Mendeley for their research paper
recommender ”Suggest”
allowed them to run their recommender infrastructure serving
2 million users for less than $100 per month in AWS
Next directions
”Search engine based recommender infrastructure”
(work in progress driven by Pat Ferrel)
use RowSimilarityJob to find anomalously co-occurring
items using Hadoop
index those item pairs with a distributed search engine
such as Apache Solr
query based on a user’s interaction history and the search
engine will answer with recommendations
gives us an easy-to-use, scalable serving layer for free
(Apache Solr)
allows complex recommendation queries containing filters,
geo-location, etc.
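The idea of answering recommendation requests with a search query can be mimicked with a tiny in-memory inverted index. This sketch does not use Apache Solr and makes several assumptions: the `INDICATORS` table stands in for the item pairs produced offline, ranking is by a plain count of matching indicator items (a real search engine would apply its own relevance scoring), and the `allowed` parameter is a hypothetical stand-in for a filter clause.

```python
from collections import defaultdict

def build_index(indicators):
    """Invert {item: [indicator items]} into {indicator: {items}}, similar
    to how a search engine indexes the terms of each document."""
    index = defaultdict(set)
    for item, related in indicators.items():
        for other in related:
            index[other].add(item)
    return index

def query(index, history, allowed=None, top_n=3):
    """Use the user's interaction history as the query terms and rank
    items by the number of matching indicator items."""
    hits = defaultdict(int)
    for seen in history:
        for item in index.get(seen, ()):
            if item not in history and (allowed is None or item in allowed):
                hits[item] += 1
    ranked = sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]

# Hypothetical co-occurring item pairs, as an offline job might emit them.
INDICATORS = {
    "tent": ["sleeping_bag", "stove"],
    "stove": ["tent", "kettle"],
    "kettle": ["stove"],
    "sleeping_bag": ["tent"],
}
```

With this layout, each item is a document, its indicator items are the indexed terms, and filters such as `allowed` are applied at query time without touching the model.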
The shape of things to come
MapReduce is not well suited for certain ML use cases, e.g.
when the algorithms to apply are iterative and the dataset fits
into the aggregate main memory of the cluster
Mahout always stated that it is not tied to Hadoop, however
there were no production-quality alternatives in the past
With the advent of YARN and the maturing of alternative
systems, this situation is changing and we should embrace this
change
Personally, I would love to see an experimental port of our
distributed recommenders to another Apache-supported
system such as Spark or Giraph
Thanks for listening!
Follow me on twitter at http://twitter.com/sscdotopen
Join Mahout’s mailing lists at http://s.apache.org/mahout-lists
picture on slide 3 by Tim Abott, http://www.flickr.com/photos/theabbott/
picture on slide 21 by Crimson Diabolics, http://crimsondiabolics.deviantart.com/

More Related Content

What's hot

Matrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender SystemsMatrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender SystemsLei Guo
 
Collaborative Filtering and Recommender Systems By Navisro Analytics
Collaborative Filtering and Recommender Systems By Navisro AnalyticsCollaborative Filtering and Recommender Systems By Navisro Analytics
Collaborative Filtering and Recommender Systems By Navisro AnalyticsNavisro Analytics
 
Movie recommendation system using Apache Mahout and Facebook APIs
Movie recommendation system using Apache Mahout and Facebook APIsMovie recommendation system using Apache Mahout and Facebook APIs
Movie recommendation system using Apache Mahout and Facebook APIsSmitha Mysore Lokesh
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation enginesGeorgian Micsa
 
Recommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringRecommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringChangsung Moon
 
Summary of a Recommender Systems Survey paper
Summary of a Recommender Systems Survey paperSummary of a Recommender Systems Survey paper
Summary of a Recommender Systems Survey paperChangsung Moon
 
Speeding up Distributed Big Data Recommendation in Spark
Speeding up Distributed Big Data Recommendation in SparkSpeeding up Distributed Big Data Recommendation in Spark
Speeding up Distributed Big Data Recommendation in SparkHans De Sterck
 
Hybrid recommender systems
Hybrid recommender systemsHybrid recommender systems
Hybrid recommender systemsrenataghisloti
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithmsnextlib
 
Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...Gianmario Spacagna
 
Machine learning module 2
Machine learning module 2Machine learning module 2
Machine learning module 2Gokulks007
 
Recommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right DatasetRecommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right DatasetCrossing Minds
 
Collaborative Filtering
Collaborative FilteringCollaborative Filtering
Collaborative FilteringTayfun Sen
 
Making Machine Learning Work in Practice - StampedeCon 2014
Making Machine Learning Work in Practice - StampedeCon 2014Making Machine Learning Work in Practice - StampedeCon 2014
Making Machine Learning Work in Practice - StampedeCon 2014StampedeCon
 
[Eestec] Machine Learning online seminar 1, 12 2016
[Eestec] Machine Learning online seminar 1, 12 2016[Eestec] Machine Learning online seminar 1, 12 2016
[Eestec] Machine Learning online seminar 1, 12 2016Grigoris C
 

What's hot (20)

Mahout part2
Mahout part2Mahout part2
Mahout part2
 
Matrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender SystemsMatrix Factorization Techniques For Recommender Systems
Matrix Factorization Techniques For Recommender Systems
 
Collaborative Filtering and Recommender Systems By Navisro Analytics
Collaborative Filtering and Recommender Systems By Navisro AnalyticsCollaborative Filtering and Recommender Systems By Navisro Analytics
Collaborative Filtering and Recommender Systems By Navisro Analytics
 
Apache mahout
Apache mahoutApache mahout
Apache mahout
 
Intro to Apache Mahout
Intro to Apache MahoutIntro to Apache Mahout
Intro to Apache Mahout
 
Movie recommendation system using Apache Mahout and Facebook APIs
Movie recommendation system using Apache Mahout and Facebook APIsMovie recommendation system using Apache Mahout and Facebook APIs
Movie recommendation system using Apache Mahout and Facebook APIs
 
Recommendation engines
Recommendation enginesRecommendation engines
Recommendation engines
 
Recommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative FilteringRecommender Systems: Advances in Collaborative Filtering
Recommender Systems: Advances in Collaborative Filtering
 
Summary of a Recommender Systems Survey paper
Summary of a Recommender Systems Survey paperSummary of a Recommender Systems Survey paper
Summary of a Recommender Systems Survey paper
 
Collaborative filtering
Collaborative filteringCollaborative filtering
Collaborative filtering
 
Speeding up Distributed Big Data Recommendation in Spark
Speeding up Distributed Big Data Recommendation in SparkSpeeding up Distributed Big Data Recommendation in Spark
Speeding up Distributed Big Data Recommendation in Spark
 
Hybrid recommender systems
Hybrid recommender systemsHybrid recommender systems
Hybrid recommender systems
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithms
 
Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...Robust and declarative machine learning pipelines for predictive buying at Ba...
Robust and declarative machine learning pipelines for predictive buying at Ba...
 
Machine learning module 2
Machine learning module 2Machine learning module 2
Machine learning module 2
 
Recommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right DatasetRecommender Systems from A to Z – The Right Dataset
Recommender Systems from A to Z – The Right Dataset
 
Collaborative Filtering
Collaborative FilteringCollaborative Filtering
Collaborative Filtering
 
kdd2015
kdd2015kdd2015
kdd2015
 
Making Machine Learning Work in Practice - StampedeCon 2014
Making Machine Learning Work in Practice - StampedeCon 2014Making Machine Learning Work in Practice - StampedeCon 2014
Making Machine Learning Work in Practice - StampedeCon 2014
 
[Eestec] Machine Learning online seminar 1, 12 2016
[Eestec] Machine Learning online seminar 1, 12 2016[Eestec] Machine Learning online seminar 1, 12 2016
[Eestec] Machine Learning online seminar 1, 12 2016
 

Viewers also liked

Apache Mahout Tutorial - Recommendation - 2013/2014
Apache Mahout Tutorial - Recommendation - 2013/2014 Apache Mahout Tutorial - Recommendation - 2013/2014
Apache Mahout Tutorial - Recommendation - 2013/2014 Cataldo Musto
 
Algorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at SpotifyAlgorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at SpotifyChris Johnson
 
발표자료 11장
발표자료 11장발표자료 11장
발표자료 11장Juhui Park
 
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...Spark Summit
 
Tutorial Mahout - Recommendation
Tutorial Mahout - RecommendationTutorial Mahout - Recommendation
Tutorial Mahout - RecommendationCataldo Musto
 
Building a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineBuilding a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineNYC Predictive Analytics
 
Scala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsScala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsChris Johnson
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningVarad Meru
 
Fast ALS-Based Matrix Factorization for Recommender Systems
Fast ALS-Based Matrix Factorization for Recommender SystemsFast ALS-Based Matrix Factorization for Recommender Systems
Fast ALS-Based Matrix Factorization for Recommender SystemsDavid Zibriczky
 
Test Automation
Test AutomationTest Automation
Test AutomationTomas Riha
 
Application of technology acceptance model to wi fi user at economics and bus...
Application of technology acceptance model to wi fi user at economics and bus...Application of technology acceptance model to wi fi user at economics and bus...
Application of technology acceptance model to wi fi user at economics and bus...Alexander Decker
 
API Test Automation Tips and Tricks
API Test Automation Tips and TricksAPI Test Automation Tips and Tricks
API Test Automation Tips and Trickstesthive
 
Parallel and Iterative Processing for Machine Learning Recommendations with S...
Parallel and Iterative Processing for Machine Learning Recommendations with S...Parallel and Iterative Processing for Machine Learning Recommendations with S...
Parallel and Iterative Processing for Machine Learning Recommendations with S...MapR Technologies
 
Scalable Collaborative Filtering for Commerce Recommendation
Scalable Collaborative Filtering for Commerce RecommendationScalable Collaborative Filtering for Commerce Recommendation
Scalable Collaborative Filtering for Commerce RecommendationYiqun Hu
 
Laws of test automation framework
Laws of test automation frameworkLaws of test automation framework
Laws of test automation frameworkvodqancr
 
JUnit 5 - from Lambda to Alpha and beyond
JUnit 5 - from Lambda to Alpha and beyondJUnit 5 - from Lambda to Alpha and beyond
JUnit 5 - from Lambda to Alpha and beyondSam Brannen
 
Web API Test Automation using Frisby & Node.js
Web API Test Automation using Frisby  & Node.jsWeb API Test Automation using Frisby  & Node.js
Web API Test Automation using Frisby & Node.jsChi Lang Le Vu Tran
 
JNR: Java Native Runtime
JNR: Java Native RuntimeJNR: Java Native Runtime
JNR: Java Native RuntimeYuichi Sakuraba
 

Viewers also liked (20)

Apache Mahout Tutorial - Recommendation - 2013/2014
Apache Mahout Tutorial - Recommendation - 2013/2014 Apache Mahout Tutorial - Recommendation - 2013/2014
Apache Mahout Tutorial - Recommendation - 2013/2014
 
Algorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at SpotifyAlgorithmic Music Recommendations at Spotify
Algorithmic Music Recommendations at Spotify
 
발표자료 11장
발표자료 11장발표자료 11장
발표자료 11장
 
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
A More Scaleable Way of Making Recommendations with MLlib-(Xiangrui Meng, Dat...
 
Tutorial Mahout - Recommendation
Tutorial Mahout - RecommendationTutorial Mahout - Recommendation
Tutorial Mahout - Recommendation
 
Building a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engineBuilding a Recommendation Engine - An example of a product recommendation engine
Building a Recommendation Engine - An example of a product recommendation engine
 
Scala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music RecommendationsScala Data Pipelines for Music Recommendations
Scala Data Pipelines for Music Recommendations
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
 
Fast ALS-Based Matrix Factorization for Recommender Systems
Fast ALS-Based Matrix Factorization for Recommender SystemsFast ALS-Based Matrix Factorization for Recommender Systems
Fast ALS-Based Matrix Factorization for Recommender Systems
 
Test Automation
Test AutomationTest Automation
Test Automation
 
Application of technology acceptance model to wi fi user at economics and bus...
Application of technology acceptance model to wi fi user at economics and bus...Application of technology acceptance model to wi fi user at economics and bus...
Application of technology acceptance model to wi fi user at economics and bus...
 
API Test Automation Tips and Tricks
API Test Automation Tips and TricksAPI Test Automation Tips and Tricks
API Test Automation Tips and Tricks
 
Parallel and Iterative Processing for Machine Learning Recommendations with S...
Parallel and Iterative Processing for Machine Learning Recommendations with S...Parallel and Iterative Processing for Machine Learning Recommendations with S...
Parallel and Iterative Processing for Machine Learning Recommendations with S...
 
Selenium Webdriver
Selenium WebdriverSelenium Webdriver
Selenium Webdriver
 
Scalable Collaborative Filtering for Commerce Recommendation
Scalable Collaborative Filtering for Commerce RecommendationScalable Collaborative Filtering for Commerce Recommendation
Scalable Collaborative Filtering for Commerce Recommendation
 
Frisby Api automation
Frisby Api automationFrisby Api automation
Frisby Api automation
 
Laws of test automation framework
Laws of test automation frameworkLaws of test automation framework
Laws of test automation framework
 
JUnit 5 - from Lambda to Alpha and beyond
JUnit 5 - from Lambda to Alpha and beyondJUnit 5 - from Lambda to Alpha and beyond
JUnit 5 - from Lambda to Alpha and beyond
 
Web API Test Automation using Frisby & Node.js
Web API Test Automation using Frisby  & Node.jsWeb API Test Automation using Frisby  & Node.js
Web API Test Automation using Frisby & Node.js
 
JNR: Java Native Runtime
JNR: Java Native RuntimeJNR: Java Native Runtime
JNR: Java Native Runtime
 

Similar to Next directions in Mahout's recommenders

A Fairness-aware Machine Learning Interface for End-to-end Discrimination Dis...
A Fairness-aware Machine Learning Interface for End-to-end Discrimination Dis...A Fairness-aware Machine Learning Interface for End-to-end Discrimination Dis...
A Fairness-aware Machine Learning Interface for End-to-end Discrimination Dis...wajrcs
 
Modeling Object Oriented Applications by Using Dynamic Information for the I...
Modeling Object Oriented Applications by Using Dynamic  Information for the I...Modeling Object Oriented Applications by Using Dynamic  Information for the I...
Modeling Object Oriented Applications by Using Dynamic Information for the I...IOSR Journals
 
Modularizing Arcihtectures Using Dendrograms1
Modularizing Arcihtectures Using Dendrograms1Modularizing Arcihtectures Using Dendrograms1
Modularizing Arcihtectures Using Dendrograms1victor tang
 
Experiments on Design Pattern Discovery
Experiments on Design Pattern DiscoveryExperiments on Design Pattern Discovery
Experiments on Design Pattern DiscoveryTim Menzies
 
Data clustering using map reduce
Next directions in Mahout's recommenders

  • 1. Next Directions in Mahout’s Recommenders. Sebastian Schelter, Apache Software Foundation. Bay Area Mahout Meetup.
  • 2. About me. PhD student at the Database Systems and Information Management Group of Technische Universität Berlin; member of the Apache Software Foundation and committer on Mahout and Giraph; currently interning at IBM Research Almaden.
  • 3. Next Directions? Mahout in Action is the prime source of information for using Mahout in practice. As it is more than two years old (and only covers Mahout 0.5), it is missing a lot of recent developments. This talk describes what has been added to the recommenders of Mahout since then and gives suggestions on directions for future versions of Mahout.
  • 5. Collaborative Filtering. Problem: given a user’s interactions with items, guess which other items would be highly preferred. Collaborative filtering infers recommendations from patterns found in the historical user-item interactions. The data can be explicit feedback (ratings) or implicit feedback (clicks, pageviews) and is represented in the interaction matrix A:

              item1   ···   item3   ···
      user1     3     ···     4     ···
      user2     −     ···     4     ···
      user3     5     ···     1     ···
       ···     ···    ···    ···    ···
  • 6. Neighborhood Methods. User-based: for each user, compute a “jury” of users with similar taste, then pick the recommendations from the jury’s items. Item-based: for each item, compute a set of items with a similar interaction pattern, then pick the recommendations from those similar items.
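The item-based idea on slide 6 can be illustrated with a small, self-contained sketch. It is not Mahout code: the class `ItemBasedSketch`, the dense `similarity` array and the scoring rule (sum of similarities to the items the user already interacted with) are assumptions chosen for brevity, whereas a real recommender would typically use sparse top-k similarities and may weight by ratings.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of item-based recommendation; not part of Mahout.
final class ItemBasedSketch {

    /**
     * Scores every item the user has not interacted with by summing its
     * similarity to the user's items, and returns the best-scoring item indices.
     */
    static List<Integer> recommend(double[][] similarity, int[] userItems, int howMany) {
        Set<Integer> seen = new HashSet<>();
        for (int item : userItems) {
            seen.add(item);
        }
        double[] score = new double[similarity.length];
        List<Integer> candidates = new ArrayList<>();
        for (int candidate = 0; candidate < similarity.length; candidate++) {
            if (seen.contains(candidate)) {
                continue; // never recommend items the user already knows
            }
            for (int item : userItems) {
                score[candidate] += similarity[item][candidate];
            }
            if (score[candidate] > 0) {
                candidates.add(candidate); // keep only items similar to something the user liked
            }
        }
        // Sort by descending score.
        candidates.sort(Comparator.comparingDouble((Integer c) -> -score[c]));
        return new ArrayList<>(candidates.subList(0, Math.min(howMany, candidates.size())));
    }

    public static void main(String[] args) {
        double[][] similarity = {
            {0.0, 0.9, 0.1, 0.0},
            {0.9, 0.0, 0.5, 0.0},
            {0.1, 0.5, 0.0, 0.0},
            {0.0, 0.0, 0.0, 0.0}
        };
        System.out.println(recommend(similarity, new int[] {0}, 2));
    }
}
```

Because the scores depend only on the precomputed item similarities and the user's own history, a new user can be served without recomputing the similarities, which matches the "no model retraining" argument made for the item-based variant.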
  • 7. Neighborhood Methods. The item-based variant is the most popular: it is simple and intuitively understandable; it additionally gives non-personalized, per-item recommendations (people who like X might also like Y); it provides recommendations for new users without model retraining; and it offers comprehensible explanations (we recommend Y because you liked X).
  • 8. Latent factor models. Idea: interactions are deeply influenced by a set of factors that are very specific to the domain (e.g. amount of action or complexity of characters in movies). These factors are in general not obvious and need to be inferred from the interaction data. Both users and items can be described in terms of these factors.
  • 9. Matrix factorization. Computing a latent factor model: approximately factor A into the product of two rank-k feature matrices U and M such that A ≈ UM. U models the latent features of the users, M models the latent features of the items. The dot product u_i · m_j in the latent feature space predicts the strength of interaction between user i and item j. (Figure: A (u × i) ≈ U (u × k) × M (k × i).)
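As a minimal sketch of the prediction step on slide 9, the following plain-Java snippet computes the dot product of one user's and one item's feature vectors. The class name `LatentFactorPrediction` and the example feature values are made up for illustration and are not taken from Mahout.

```java
// Hypothetical helper, not part of Mahout: predicts an interaction strength
// as the dot product of a user's and an item's latent feature vectors.
final class LatentFactorPrediction {

    /** Dot product of two k-dimensional feature vectors. */
    static double predict(double[] userFeatures, double[] itemFeatures) {
        if (userFeatures.length != itemFeatures.length) {
            throw new IllegalArgumentException("feature vectors must have the same rank k");
        }
        double sum = 0.0;
        for (int f = 0; f < userFeatures.length; f++) {
            sum += userFeatures[f] * itemFeatures[f];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] user = {1.0, 0.5}; // made-up features of one user (k = 2)
        double[] item = {3.0, 2.0}; // made-up features of one item (k = 2)
        System.out.println(predict(user, item));
    }
}
```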
  • 11. Taste. Based on Sean Owen’s Taste framework (started in 2005); mature and stable codebase. Recommender implementations encapsulate recommender algorithms; DataModel implementations handle interaction data in memory, files, databases and key-value stores. But: the focus was mostly on neighborhood methods, with a lack of implementations for latent factor models and little support for scientific use cases (e.g. recommender contests).
  • 12. Collaboration. MyMedialite, a scientific library of recommender system algorithms (http://www.mymedialite.net/). Mahout now features a couple of popular latent factor models, mostly ported by Zeno Gantner.
  • 13. Lots of different Factorizers for our SVDRecommender. RatingSGDFactorizer: biased matrix factorization (Koren et al.: Matrix Factorization Techniques for Recommender Systems, IEEE Computer ’09). SVDPlusPlusFactorizer: SVD++ (Koren: Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model, KDD ’08). ALSWRFactorizer: matrix factorization using Alternating Least Squares (Zhou et al.: Large-Scale Parallel Collaborative Filtering for the Netflix Prize, AAIM ’08; Hu et al.: Collaborative Filtering for Implicit Feedback Datasets, ICDM ’08). ParallelSGDFactorizer: parallel version of biased matrix factorization, contributed by Peng Cheng (Takács et al.: Scalable Collaborative Filtering Approaches for Large Recommender Systems, JMLR ’09; Niu et al.: Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, NIPS ’11).
  • 14. Next directions. Better tooling for cross-validation and hold-out tests (e.g. time-based splits of interactions); memory-efficient DataModel implementations tailored to specific use cases (e.g. matrix factorization with SGD); better support for computing recommendations for “anonymous” users; online recommenders.
  • 15. Usage. Researchers at TU Berlin and CWI Amsterdam regularly use Mahout for their recommender research, which is published at international conferences. “Bayerischer Rundfunk”, one of Germany’s largest public TV broadcasters, uses Mahout to help users discover TV content in its online media library. The Berlin-based company plista runs a live contest for the best news recommender algorithm and provides Mahout-based “skeleton code” to participants. The Dutch Institute of Sound and Vision runs a web platform that uses Mahout for recommending content from its archive of Dutch audio-visual heritage collections of the 20th century.
  • 17. Distribution. Difficult environment: data is partitioned and stored in a distributed filesystem, and algorithms must be expressed in MapReduce. Our distributed implementations focus on two popular methods: item-based collaborative filtering and matrix factorization with Alternating Least Squares.
  • 19. Cooccurrences. Start with a simplified view: imagine the interaction matrix A was binary → we look at cooccurrences only. Item similarity computation becomes the matrix multiplication $S = A^\top A$. Scale-out of the item-based approach reduces to finding an efficient way to compute this item similarity matrix.
  • 20. Parallelizing $S = A^\top A$. The standard approach of computing item cooccurrences requires random access to both users and items:

        foreach item f do
          foreach user i who interacted with f do
            foreach item j that i also interacted with do
              S_fj = S_fj + 1

    → not efficiently parallelizable on partitioned data. The row outer product formulation of matrix multiplication is efficiently parallelizable on a row-partitioned A: $S = A^\top A = \sum_{i \in A} a_i a_i^\top$, where $a_i$ is the i-th row of A written as a column vector. Mappers compute the outer products of rows of A and emit the results row-wise; reducers sum these up to form S.
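The row outer product formulation from slide 20 can be sketched on a single machine as follows. This is a hedged illustration, not Mahout's RowSimilarityJob: the class `CooccurrenceSketch` and the input layout (one `int[]` of distinct item indices per user, i.e. one row of a binary A) are assumptions, and the "mapper" and "reducer" roles are only indicated in comments.

```java
// Hypothetical single-machine illustration of S = A^T A as a sum of row outer
// products; not part of Mahout.
final class CooccurrenceSketch {

    /**
     * rows[u] holds the distinct item indices user u interacted with
     * (one row of the binary interaction matrix A).
     */
    static int[][] cooccurrences(int[][] rows, int numItems) {
        int[][] s = new int[numItems][numItems];
        for (int[] row : rows) {
            // "Mapper": the outer product of a binary row with itself has a 1
            // at every pair (f, j) of items that occur together in this row.
            for (int f : row) {
                for (int j : row) {
                    // "Reducer": summing the outer products entry by entry yields S.
                    s[f][j]++;
                }
            }
        }
        return s;
    }

    public static void main(String[] args) {
        int[][] rows = {{0, 1}, {1, 2}, {0, 1, 2}};
        int[][] s = cooccurrences(rows, 3);
        for (int[] line : s) {
            System.out.println(java.util.Arrays.toString(line));
        }
    }
}
```

Each row is processed independently and only needs the data of a single user, which is why this formulation suits a row-partitioned A. The diagonal entry S_ff counts the users who interacted with item f.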
  • 21. Parallel similarity computation. Many more details in the implementation: support for various similarity measures, various optimizations (e.g. for symmetric similarity measures), downsampling of skewed interaction data. In-depth description available in: Sebastian Schelter, Christoph Boden, Volker Markl: Scalable Similarity-Based Neighborhood Methods with MapReduce, ACM RecSys 2012.
  • 22. Implementation in Mahout. o.a.m.math.hadoop.similarity.cooccurrence.RowSimilarityJob computes the top-k pairwise similarities for each row of a matrix using some similarity measure. o.a.m.cf.taste.hadoop.similarity.item.ItemSimilarityJob computes the top-k similar items per item using RowSimilarityJob. o.a.m.cf.taste.hadoop.item.RecommenderJob computes recommendations and similar items using RowSimilarityJob.
  • 23. Scalable Neighborhood Methods: Experiments. Setup: 6 machines running Java 7 and Hadoop 1.0.4, each with two 4-core Opteron CPUs, 32 GB memory and four 1 TB disk drives. Results: on the Yahoo Songs dataset (700M datapoints, 1.8M users, 136K items), similarity computation takes less than 100 minutes.
  • 25. Alternating Least Squares. ALS rotates between fixing U and M. When U is fixed, the system recomputes M by solving a least-squares problem per item, and vice versa. It is easy to parallelize, as all users (and vice versa, items) can be recomputed independently. Additionally, ALS can be applied to use cases with implicit data (pageviews, clicks). (Figure: A (u × i) ≈ U (u × k) × M (k × i).)
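To make the per-user least-squares step of slide 25 concrete, here is a small sketch that recomputes one user's feature vector while the item features are held fixed. It is an illustration under stated assumptions, not Mahout's solver: the class `AlsUserUpdateSketch` is hypothetical, it uses plain ridge regularization with a single weight `lambda` (the weighted-λ scheme of Zhou et al. and the implicit-feedback variant of Hu et al. are not reproduced), it assumes the user has at least one rating, and it solves the normal equations (MᵀM + λI) u = Mᵀr with a simple Gaussian elimination instead of an optimized library.

```java
// Hypothetical illustration of one ALS half-step for a single user; not part of Mahout.
final class AlsUserUpdateSketch {

    /**
     * itemFeatures[j] is the k-dimensional feature vector of the j-th item the
     * user rated, ratings[j] the corresponding rating, lambda a ridge weight.
     */
    static double[] recomputeUser(double[][] itemFeatures, double[] ratings, double lambda) {
        int k = itemFeatures[0].length;
        // Augmented system [M^T M + lambda * I | M^T r].
        double[][] a = new double[k][k + 1];
        for (int j = 0; j < itemFeatures.length; j++) {
            double[] m = itemFeatures[j];
            for (int p = 0; p < k; p++) {
                for (int q = 0; q < k; q++) {
                    a[p][q] += m[p] * m[q];
                }
                a[p][k] += m[p] * ratings[j];
            }
        }
        for (int p = 0; p < k; p++) {
            a[p][p] += lambda;
        }
        return solve(a);
    }

    /** Gaussian elimination with partial pivoting on an augmented k x (k + 1) matrix. */
    static double[] solve(double[][] a) {
        int k = a.length;
        for (int col = 0; col < k; col++) {
            int pivot = col;
            for (int row = col + 1; row < k; row++) {
                if (Math.abs(a[row][col]) > Math.abs(a[pivot][col])) {
                    pivot = row;
                }
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            if (Math.abs(a[col][col]) < 1e-12) {
                throw new ArithmeticException("singular system; try a larger lambda");
            }
            for (int row = col + 1; row < k; row++) {
                double factor = a[row][col] / a[col][col];
                for (int c = col; c <= k; c++) {
                    a[row][c] -= factor * a[col][c];
                }
            }
        }
        // Back substitution.
        double[] x = new double[k];
        for (int row = k - 1; row >= 0; row--) {
            double sum = a[row][k];
            for (int c = row + 1; c < k; c++) {
                sum -= a[row][c] * x[c];
            }
            x[row] = sum / a[row][row];
        }
        return x;
    }

    public static void main(String[] args) {
        double[][] itemFeatures = {{1, 0}, {0, 1}, {1, 1}}; // made-up item features (k = 2)
        double[] ratings = {3, 4, 7};                        // made-up ratings of one user
        System.out.println(java.util.Arrays.toString(recomputeUser(itemFeatures, ratings, 0.0)));
    }
}
```

Since the update for one user reads only that user's ratings and the fixed item features, all users can be recomputed independently, which is the property that makes ALS easy to parallelize. The item half-step is symmetric, with the roles of U and M swapped.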
  • 26. Scalable Matrix Factorization: Implementation. Recompute the user feature matrix U using a broadcast-join:
      1. Run a map-only job using multithreaded mappers.
      2. Load the item-feature matrix M into memory from HDFS to share it among the individual mappers.
      3. Mappers read the interaction histories of the users.
      4. Multithreaded: solve a least squares problem per user to recompute its feature vector.
    (Figure: user histories A, user features U and item features M; on each of machines 1 to 3, a map task performs a hash-join and re-computation with local forwarding, while M is broadcast to all machines.)
  • 27. Implementation in Mahout. o.a.m.cf.taste.hadoop.als.ParallelALSFactorizationJob offers different solvers for explicit and implicit data (Zhou et al.: Large-Scale Parallel Collaborative Filtering for the Netflix Prize, AAIM ’08; Hu et al.: Collaborative Filtering for Implicit Feedback Datasets, ICDM ’08). o.a.m.cf.taste.hadoop.als.RecommenderJob computes recommendations from a factorization. In-depth description available in: Sebastian Schelter, Christoph Boden, Martin Schenck, Alexander Alexandrov, Volker Markl: Distributed Matrix Factorization with MapReduce using a series of Broadcast-Joins, to appear at ACM RecSys 2013.
  • 28. Scalable Matrix Factorization: Experiments. Cluster: 26 machines, each with two 4-core Opteron CPUs, 32 GB memory and four 1 TB disk drives. Hadoop configuration: reuse JVMs, use JBlas as solver, run multithreaded mappers. Datasets: Netflix (0.5M users, 100M datapoints), Yahoo Songs (1.8M users, 700M datapoints), Bigflix (25M users, 5B datapoints). (Figures: average duration per job in seconds versus number of features r, for recomputing U and M with 10, 20, 50 and 100 features on Yahoo Songs and Netflix; average duration per job in seconds versus number of machines, from 5 to 25, for Bigflix (M) and Bigflix (U).)
  • 29. Next directions. Better tooling for cross-validation and hold-out tests (e.g. to find parameters for ALS); integration of more efficient solver libraries like JBlas; it should be easier to modify and adjust the MapReduce code.
  • 30. A selection of users. Mendeley, a data platform for researchers (2.5M users, 50M research articles): Mendeley Suggest for discovering relevant research publications. Researchgate, the world’s largest social network for researchers (3M users). A German online retailer with several million customers across Europe. German online marketplaces for real estate and pre-owned cars with millions of users.
  • 32. “Small data, low load”. Use GenericItemBasedRecommender or GenericUserBasedRecommender and feed it with interaction data stored in a file, database or key-value store. Have it load the interaction data into memory and compute recommendations on request. Collect new interactions into your files or database and periodically refresh the recommender. To improve performance, try to have your recommender look at fewer interactions by using SamplingCandidateItemsStrategy, and cache computed similarities with a CachingItemSimilarity.
  • 33. “Medium data, high load”. Assumption: the interaction data still fits into main memory. Use a recommender that is able to leverage a precomputed model, e.g. GenericItemBasedRecommender or SVDRecommender. Load the interaction data and the model into memory and compute recommendations on request. Collect new interactions into your files or database and periodically recompute the model and refresh the recommender. Use BatchItemSimilarities or ParallelSGDFactorizer for precomputing the model using multiple threads on a single machine.
  • 34. “Lots of data, high load”. Assumption: the interaction data does not fit into main memory. Use a recommender that is able to leverage a precomputed model, e.g. GenericItemBasedRecommender or SVDRecommender. Keep the interaction data in a (potentially partitioned) database or in a key-value store. Load the model into memory; the recommender will only use one (cacheable) query per recommendation request to retrieve the user’s interaction history. Collect new interactions into your files or database and periodically recompute the model offline. Use ItemSimilarityJob or ParallelALSFactorizationJob to precompute the model with Hadoop.
  • 35. “Precompute everything”. Use RecommenderJob to precompute recommendations for all users with Hadoop and directly serve those recommendations. Successfully employed by Mendeley for their research paper recommender “Suggest”; it allowed them to run their recommender infrastructure serving 2 million users for less than $100 per month in AWS.
  • 36. Next directions. “Search engine based recommender infrastructure” (work in progress driven by Pat Ferrel): use RowSimilarityJob to find anomalously co-occurring items using Hadoop; index those item pairs with a distributed search engine such as Apache Solr; query based on a user’s interaction history, and the search engine will answer with recommendations. This gives us an easy-to-use, scalable serving layer for free (Apache Solr) and allows complex recommendation queries containing filters, geo-location, etc.
  • 37. The shape of things to come. MapReduce is not well suited for certain ML use cases, e.g. when the algorithms to apply are iterative and the dataset fits into the aggregate main memory of the cluster. Mahout always stated that it is not tied to Hadoop; however, there were no production-quality alternatives in the past. With the advent of YARN and the maturing of alternative systems, this situation is changing and we should embrace this change. Personally, I would love to see an experimental port of our distributed recommenders to another Apache-supported system such as Spark or Giraph.
  • 38. Thanks for listening! Follow me on Twitter at http://twitter.com/sscdotopen. Join Mahout’s mailing lists at http://s.apache.org/mahout-lists. Picture on slide 3 by Tim Abott, http://www.flickr.com/photos/theabbott/; picture on slide 21 by Crimson Diabolics, http://crimsondiabolics.deviantart.com/.