Apache Mahout – Tutorial (2014)
Cataldo Musto, Ph.D.

Course: Intelligent Information Access and Natural Language Processing

Università degli Studi di Bari – Department of Computer Science – Academic Year 2013/2014

08/01/2014
Outline
• What is Mahout?
– Overview

• How to use Mahout?
– Hands-on session
Part 1

What is Mahout?
Goal
• Mahout is a Java library
– Implementing Machine Learning techniques:
• Clustering
• Classification
• Recommendation
• Frequent ItemSet
What can we do?
• Mahout currently supports four main use cases:
– Recommendation - takes users' behavior and tries to find
items users might like.
– Clustering - takes e.g. text documents and groups them
into groups of topically related documents.
– Classification - learns from existing categorized documents
what documents of a specific category look like and is able
to assign unlabelled documents to the (hopefully) correct
category.
– Frequent itemset mining - takes a set of item groups
(terms in a query session, shopping cart content) and
identifies which individual items usually appear together.
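As a concrete picture of frequent itemset mining, the following plain-Java sketch handles only the simplest case, item pairs: it counts how many groups (e.g. shopping carts) contain each pair and keeps the pairs whose count reaches a minimum support. It is a naive illustration, not Mahout's implementation (Mahout's frequent pattern mining is based on Parallel FP-Growth); the class name, method name and sample carts are hypothetical, and the code uses Java 9+ collection factories.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: naive frequent-pair counting (not Mahout's FP-Growth)
class FrequentPairsSketch {

    // Returns "itemA+itemB" -> support for every pair contained in at least minSupport groups
    static Map<String, Integer> frequentPairs(List<Set<String>> groups, int minSupport) {
        Map<String, Integer> support = new TreeMap<>();
        for (Set<String> group : groups) {
            // Sort the items so that each unordered pair always gets the same key
            List<String> items = new ArrayList<>(new TreeSet<>(group));
            for (int i = 0; i < items.size(); i++) {
                for (int j = i + 1; j < items.size(); j++) {
                    support.merge(items.get(i) + "+" + items.get(j), 1, Integer::sum);
                }
            }
        }
        // Drop the pairs that do not appear together often enough
        support.values().removeIf(count -> count < minSupport);
        return support;
    }

    public static void main(String[] args) {
        List<Set<String>> carts = List.of(
                Set.of("bread", "milk"),
                Set.of("bread", "butter", "milk"),
                Set.of("beer", "bread"),
                Set.of("butter", "milk"));
        System.out.println(frequentPairs(carts, 2)); // {bread+milk=2, butter+milk=2}
    }
}
```

Raising the minimum support keeps fewer, more reliable co-occurrences; real miners avoid enumerating every pair, which is why dedicated algorithms such as FP-Growth exist.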
Why Mahout?
• Mahout is not the only ML framework
– Weka (http://www.cs.waikato.ac.nz/ml/weka/)
– R (http://www.r-project.org/)

• Why do we prefer Mahout?
Why Mahout?
• Why do we prefer Mahout?
– Apache License
– Good Community
– Good Documentation

– Scalable
• Based on Hadoop (not mandatory!)
Why do we need a scalable framework?

Big Data!
Use Cases
• Recommendation Engine on Foursquare
• User Interest Modeling on Twitter
• Pattern Mining on Yahoo! (as anti-spam)
Algorithms
• Recommendation
– User-based Collaborative Filtering
– Item-based Collaborative Filtering
– SlopeOne Recommenders
– Singular Value Decomposition-based CF
Algorithms
• Clustering
– Canopy
– K-Means
– Fuzzy K-Means
– Latent Dirichlet Allocation (LDA)
– MinHash Clustering

– Draft algorithms
• Hierarchical Clustering
• (and much more…)
Algorithms
• Classification
– Logistic Regression
– Bayes
– Random Forests
– Hidden Markov Models

– Draft algorithms
• Support Vector Machines
• Perceptrons
• Neural Networks
• Restricted Boltzmann Machines
• (and much more…)
Algorithms
• Other
– Evolutionary Algorithms
• Genetic Algorithms

– Dimensionality Reduction techniques
• Principal Component Analysis (PCA)
• Singular Value Decomposition (SVD)

– Frequent ItemSet Pattern Mining
– (and much more…)
Mahout in the Apache Software Foundation
• Original Mahout Project
• Taste: collaborative filtering framework
• Lucene: information retrieval software library
• Hadoop: framework for distributed storage and programming based on MapReduce
General Architecture
• Three-tier architecture (Application, Algorithms and Shared Libraries)
– Data Storage and Shared Libraries
– Business Logic
– External Applications invoking Mahout APIs
In this tutorial we will focus on Recommendation
Recommendation
• Mahout implements a Collaborative Filtering
framework
– Popularized by Amazon and others
– Uses historical data (ratings, clicks, and purchases) to
provide recommendations
• User-based: Recommend items by finding similar users. This is
often harder to scale because of the dynamic nature of users.
• Item-based: Calculate similarity between items and make
recommendations. Items usually don't change much, so item similarities can often
be computed offline.
• Slope-One: A very fast and simple item-based recommendation
approach applicable when users have given ratings (and not just
boolean preferences).
Slope-One Recommender
• Brief Recap
• Sample Rating database (from Wikipedia)
• It is a variant of item-based CF
• It computes predictions by taking into account the average differences between items' ratings
Slope-One Recommender
• Step One: simplification
– Just two items
– Goal: to compute Lucy’s rating for item A
– Algorithm: compute the average difference between ratings for Item A and B
– Difference: +0.5 for Item A
– Rating(Lucy, ItemA) = Rating(Lucy, ItemB) + difference = 2.5
Slope-One Recommender
• Step One: simplification
– Just two items
– Goal: to compute Lucy’s rating for item A
– Algorithm: compute the average difference between ratings for Item A and C
– Difference: +3 for Item A
– Rating(Lucy, ItemA) = Rating(Lucy, ItemC) + difference = 8
Slope-One Recommender
• Step Two: combining partial computations
– Lucy’s rating for item A is the weighted average of the estimate based on item B and the estimate based on item C
– Weight: number of users who rated both items
• John and Mark rated both Item A and Item B (weight 2)
• Only John rated both Item A and Item C (weight 1)
– Rating(Lucy, ItemA) = (2 × 2.5 + 1 × 8) / (2 + 1) = 13/3 ≈ 4.33
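The two steps can be sketched in plain Java, without Mahout. The ratings table is an assumption: it uses the values of the Wikipedia Slope One example (John: A=5, B=3, C=2; Mark: A=3, B=4; Lucy: B=2, C=5), which reproduce the differences +0.5 and +3 and the partial estimates 2.5 and 8 from these slides. The class and method names are hypothetical, this is not Mahout's Slope One code, and Java 9+ is needed for Map.of.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: weighted Slope One on a tiny example
class SlopeOneSketch {

    // Assumed sample ratings (Wikipedia Slope One example); Lucy has not rated item A
    static Map<String, Map<String, Double>> sampleRatings() {
        Map<String, Map<String, Double>> ratings = new HashMap<>();
        ratings.put("John", Map.of("A", 5.0, "B", 3.0, "C", 2.0));
        ratings.put("Mark", Map.of("A", 3.0, "B", 4.0));
        ratings.put("Lucy", Map.of("B", 2.0, "C", 5.0));
        return ratings;
    }

    // One partial estimate per item the user rated, weighted by the
    // number of users who rated both that item and the target item
    static double predict(Map<String, Map<String, Double>> ratings, String user, String target) {
        double weightedSum = 0.0;
        int totalWeight = 0;
        for (Map.Entry<String, Double> own : ratings.getOrDefault(user, Map.of()).entrySet()) {
            String other = own.getKey();
            if (other.equals(target)) {
                continue;
            }
            // Average difference rating(target) - rating(other) over users who rated both
            double diffSum = 0.0;
            int count = 0;
            for (Map<String, Double> r : ratings.values()) {
                if (r.containsKey(target) && r.containsKey(other)) {
                    diffSum += r.get(target) - r.get(other);
                    count++;
                }
            }
            if (count == 0) {
                continue;
            }
            // Step One: the user's own rating plus the average difference
            double estimate = own.getValue() + diffSum / count;
            // Step Two: accumulate the weighted average
            weightedSum += count * estimate;
            totalWeight += count;
        }
        return totalWeight == 0 ? Double.NaN : weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        // (2 * 2.5 + 1 * 8) / 3 = 13/3
        System.out.println(predict(sampleRatings(), "Lucy", "A"));
    }
}
```

Only the average differences depend on all users, which is why they can be precomputed; the per-user prediction is then a cheap weighted average.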
(comin’ back to Mahout)
Recommendation - Architecture
Basic Idea:
A Java/J2EE application
invokes a Mahout
Recommender whose
DataModel is built from a set
of user preferences, which are
in turn read from a
physical data store
Recommendation - Architecture
External Application

Recommender

Data Model

Physical Storage (database, files, etc.)
Recommendation in Mahout
• Input: raw data (user preferences)
• Output: preference estimates
• Step 1
– Mapping raw data into a Mahout-compliant DataModel

• Step 2
– Tuning recommender components
• Similarity measure, neighborhood, etc.

• Step 3
– Computing rating estimates

• Step 4
– Evaluating recommendations
Recommendation Components
• Mahout's key abstractions are implemented through
Java interfaces:
– DataModel interface
• Methods for mapping raw data to a Mahout-compliant form

– UserSimilarity interface
• Methods to calculate the degree of correlation between two users

– ItemSimilarity interface
• Methods to calculate the degree of correlation between two items

– UserNeighborhood interface
• Methods to define the concept of ‘neighborhood’

– Recommender interface
• Methods to implement the recommendation step itself
Recommendation Components
• Mahout's key abstractions are implemented
through Java interfaces:
– example: DataModel interface
• Each method of mapping raw data to a Mahout-compliant form is an implementation of the generic
interface
• e.g. MySQLJDBCDataModel feeds a DataModel from a
MySQL database
• (and so on)
Components: DataModel
• A DataModel is the interface to draw information about user
preferences.
• Which sources can it draw from?
– Database
• MySQLJDBCDataModel, PostgreSQLJDBCDataModel
• NoSQL databases supported: MongoDBDataModel, CassandraDataModel

– External Files
• FileDataModel

– Generic (preferences fed directly through Java code)
• GenericDataModel

(They are all implementations of the DataModel interface)
Components: DataModel
• GenericDataModel
– Fed through Java calls

• FileDataModel
– CSV (Comma Separated Values)

• JDBCDataModel
– JDBC Driver
– Standard database structure
FileDataModel – CSV input
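FileDataModel reads one preference per line, in the form userID,itemID,value. The sketch below is a hypothetical, simplified parser that accepts only that basic layout (comma- or tab-separated) and groups the values by user, mirroring the one-PreferenceArray-per-user organization. It is not Mahout's code: the real FileDataModel also handles details this sketch ignores, such as an optional timestamp column.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: parses "userID,itemID,value" lines (comma or tab separated)
class CsvPreferencesSketch {

    static Map<Long, Map<Long, Float>> parse(String text) {
        Map<Long, Map<Long, Float>> prefsByUser = new HashMap<>();
        for (String line : text.split("\r?\n")) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split("[,\t]");
            long userID = Long.parseLong(fields[0].trim());
            long itemID = Long.parseLong(fields[1].trim());
            float value = Float.parseFloat(fields[2].trim());
            // One item -> score map per user, like one PreferenceArray per user
            prefsByUser.computeIfAbsent(userID, id -> new HashMap<>()).put(itemID, value);
        }
        return prefsByUser;
    }

    public static void main(String[] args) {
        String csv = "1,101,5.0\n1,102,3.0\n2,101,2.0\n";
        Map<Long, Map<Long, Float>> prefs = parse(csv);
        System.out.println(prefs.size() + " users; user 1 rated " + prefs.get(1L).size() + " items");
    }
}
```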
Components: DataModel
• Regardless of the source, they all share a
common representation.
• Basic object: Preference
– Preference is a triple (user,item,score)
– Preferences are stored in a PreferenceArray (e.g. all the preferences of one user)

• Two implementations
– GenericUserPreferenceArray
• It also stores the numerical preference values.

– BooleanUserPreferenceArray
• It skips numerical preference values.
Components: UserSimilarity
• UserSimilarity defines a notion of similarity between two
Users.
– Similarly, ItemSimilarity defines a notion of similarity
between two Items.

• Which definitions of similarity are available?
– Pearson Correlation
– Spearman Correlation
– Euclidean Distance
– Tanimoto Coefficient
– LogLikelihood Similarity

– Already implemented!
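Two of these definitions can be sketched directly over the items both users have rated. The Pearson part is the textbook correlation; mapping a Euclidean distance d to a similarity as 1 / (1 + d) is an assumption of this sketch, and Mahout's PearsonCorrelationSimilarity and EuclideanDistanceSimilarity may differ in normalization details. Class and method names are hypothetical; Java 9+ is needed for Map.of.

```java
import java.util.Map;

// Hypothetical helper: user-user similarities over co-rated items
class UserSimilaritySketch {

    // Pearson correlation of the two users' ratings on the items both have rated
    static double pearson(Map<Long, Double> u, Map<Long, Double> v) {
        int n = 0;
        double sumU = 0.0, sumV = 0.0;
        for (Map.Entry<Long, Double> e : u.entrySet()) {
            Double rv = v.get(e.getKey());
            if (rv != null) {
                n++;
                sumU += e.getValue();
                sumV += rv;
            }
        }
        if (n == 0) {
            return Double.NaN; // no co-rated items: undefined
        }
        double meanU = sumU / n, meanV = sumV / n;
        double cov = 0.0, varU = 0.0, varV = 0.0;
        for (Map.Entry<Long, Double> e : u.entrySet()) {
            Double rv = v.get(e.getKey());
            if (rv != null) {
                double du = e.getValue() - meanU, dv = rv - meanV;
                cov += du * dv;
                varU += du * du;
                varV += dv * dv;
            }
        }
        if (varU == 0.0 || varV == 0.0) {
            return Double.NaN; // constant ratings: correlation undefined
        }
        return cov / Math.sqrt(varU * varV);
    }

    // Euclidean distance d over co-rated items, mapped to a similarity 1 / (1 + d)
    static double euclidean(Map<Long, Double> u, Map<Long, Double> v) {
        int n = 0;
        double sumSq = 0.0;
        for (Map.Entry<Long, Double> e : u.entrySet()) {
            Double rv = v.get(e.getKey());
            if (rv != null) {
                double diff = e.getValue() - rv;
                sumSq += diff * diff;
                n++;
            }
        }
        return n == 0 ? Double.NaN : 1.0 / (1.0 + Math.sqrt(sumSq));
    }

    public static void main(String[] args) {
        Map<Long, Double> user1 = Map.of(101L, 1.0, 102L, 2.0, 103L, 3.0);
        Map<Long, Double> user2 = Map.of(101L, 2.0, 102L, 4.0, 103L, 6.0);
        System.out.println("Pearson: " + pearson(user1, user2));     // same trend: 1.0
        System.out.println("Euclidean: " + euclidean(user1, user2)); // 1 / (1 + sqrt(14)), about 0.21
    }
}
```

In the example the two users follow exactly the same trend, so Pearson is 1.0, yet their absolute ratings differ, so the Euclidean-based similarity is low: the two definitions can rank the same neighbors very differently.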
Example: TanimotoDistance
Example: CosineSimilarity
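The Tanimoto coefficient ignores rating values and compares only the sets of items two users have rated, |A ∩ B| / |A ∪ B|; cosine similarity compares the rating vectors instead. In this minimal sketch, treating unrated items as 0 in the cosine is an assumption (Mahout's own variant, UncenteredCosineSimilarity, may compute it differently), and the class and method names are hypothetical. Java 9+ is needed for Set.of and Map.of.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: set-based and vector-based similarities
class SetAndVectorSimilaritySketch {

    // Tanimoto (Jaccard) coefficient: shared items / items rated by either user
    static double tanimoto(Set<Long> itemsA, Set<Long> itemsB) {
        Set<Long> intersection = new HashSet<>(itemsA);
        intersection.retainAll(itemsB);
        Set<Long> union = new HashSet<>(itemsA);
        union.addAll(itemsB);
        return union.isEmpty() ? Double.NaN : (double) intersection.size() / union.size();
    }

    // Cosine of the angle between the rating vectors; unrated items count as 0
    static double cosine(Map<Long, Double> u, Map<Long, Double> v) {
        double dot = 0.0, normU = 0.0, normV = 0.0;
        for (Map.Entry<Long, Double> e : u.entrySet()) {
            normU += e.getValue() * e.getValue();
            Double rv = v.get(e.getKey());
            if (rv != null) {
                dot += e.getValue() * rv;
            }
        }
        for (double rv : v.values()) {
            normV += rv * rv;
        }
        if (normU == 0.0 || normV == 0.0) {
            return Double.NaN;
        }
        return dot / (Math.sqrt(normU) * Math.sqrt(normV));
    }

    public static void main(String[] args) {
        // 3 shared items out of 4 distinct items: 0.75
        System.out.println(tanimoto(Set.of(101L, 102L, 103L), Set.of(101L, 102L, 103L, 104L)));
        System.out.println(cosine(Map.of(101L, 1.0), Map.of(101L, 1.0, 102L, 1.0)));
    }
}
```

Because Tanimoto needs no rating values, it fits boolean preference data (e.g. clicks or purchases) better than the rating-based measures.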
Different Similarity definitions
influence neighborhood formation
Pearson’s vs. Euclidean distance
Components: UserNeighborhood
• Which definitions of neighborhood are available?
– Nearest N users
• The first N users with the highest similarity are labeled as ‘neighbors’

– Thresholds
• Users whose similarity is above a threshold are labeled as ‘neighbors’

– Already implemented!
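Both neighborhood definitions can be sketched on top of a precomputed map from each candidate user to that user's similarity with the target user. In this hypothetical helper, undefined (NaN) similarities are skipped and the threshold test uses >=; Mahout's NearestNUserNeighborhood and ThresholdUserNeighborhood may treat these edge cases differently. Java 9+ is needed for Map.of.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper: the two neighborhood definitions
class NeighborhoodSketch {

    // Nearest N users: sort candidates by similarity (highest first) and keep the first n
    static List<Long> nearestN(Map<Long, Double> similarityToTarget, int n) {
        List<Map.Entry<Long, Double>> candidates = new ArrayList<>();
        for (Map.Entry<Long, Double> e : similarityToTarget.entrySet()) {
            if (!e.getValue().isNaN()) {
                candidates.add(e);
            }
        }
        candidates.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        List<Long> neighbors = new ArrayList<>();
        for (int i = 0; i < Math.min(n, candidates.size()); i++) {
            neighbors.add(candidates.get(i).getKey());
        }
        return neighbors;
    }

    // Threshold: every user whose similarity is at least minSimilarity (sorted by user ID)
    static List<Long> threshold(Map<Long, Double> similarityToTarget, double minSimilarity) {
        List<Long> neighbors = new ArrayList<>();
        for (Map.Entry<Long, Double> e : similarityToTarget.entrySet()) {
            if (e.getValue() >= minSimilarity) { // comparisons with NaN are false, so NaN is skipped
                neighbors.add(e.getKey());
            }
        }
        Collections.sort(neighbors);
        return neighbors;
    }

    public static void main(String[] args) {
        Map<Long, Double> sims = Map.of(2L, 0.9, 3L, -0.2, 4L, 0.5, 5L, 0.7);
        System.out.println(nearestN(sims, 2));    // [2, 5]
        System.out.println(threshold(sims, 0.6)); // [2, 5]
    }
}
```

The design trade-off: nearest-N always returns the same number of neighbors, even weakly similar ones, while a threshold keeps only strong neighbors but may return very few (or none) for unusual users.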
Components: Recommender
• Given a DataModel, a definition of similarity between
users (items) and a definition of neighborhood, a
recommender produces as output an estimation of
relevance for each unseen item
• Which recommendation algorithms are implemented?
– User-based CF
– Item-based CF
– SVD-based CF
– SlopeOne CF
– (and much more…)
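For user-based CF, a common way to estimate a user's rating for an unseen item is a similarity-weighted average of the neighbors' ratings for that item; the recommendations are then the unseen items with the highest estimates. The sketch assumes that the neighbors and their similarities are already known (e.g. from one of the neighborhood definitions). Dividing by the sum of absolute similarities is an assumption of this textbook formulation, not necessarily what Mahout's GenericUserBasedRecommender does, and all names and sample values are hypothetical. Java 9+ is needed for Map.of.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical helper: user-based CF estimate and top-N recommendation
class UserBasedSketch {

    // Similarity-weighted average of the neighbors' ratings for itemID
    static double estimate(Map<Long, Map<Long, Double>> ratings,
                           Map<Long, Double> neighborSimilarity, long itemID) {
        double weighted = 0.0, totalWeight = 0.0;
        for (Map.Entry<Long, Double> n : neighborSimilarity.entrySet()) {
            Double r = ratings.getOrDefault(n.getKey(), Map.of()).get(itemID);
            if (r != null) {
                weighted += n.getValue() * r;
                totalWeight += Math.abs(n.getValue());
            }
        }
        return totalWeight == 0.0 ? Double.NaN : weighted / totalWeight;
    }

    // Items rated by some neighbor but not by the user, ranked by estimated rating
    static List<Long> recommend(Map<Long, Map<Long, Double>> ratings, long userID,
                                Map<Long, Double> neighborSimilarity, int howMany) {
        Map<Long, Double> seen = ratings.getOrDefault(userID, Map.of());
        Map<Long, Double> estimates = new HashMap<>();
        for (Long neighbor : neighborSimilarity.keySet()) {
            for (Long item : ratings.getOrDefault(neighbor, Map.of()).keySet()) {
                if (!seen.containsKey(item) && !estimates.containsKey(item)) {
                    double e = estimate(ratings, neighborSimilarity, item);
                    if (!Double.isNaN(e)) {
                        estimates.put(item, e);
                    }
                }
            }
        }
        // Start from item-ID order so that ties are broken deterministically
        List<Long> ranked = new ArrayList<>(new TreeSet<>(estimates.keySet()));
        ranked.sort((a, b) -> Double.compare(estimates.get(b), estimates.get(a)));
        return ranked.subList(0, Math.min(howMany, ranked.size()));
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> ratings = new HashMap<>();
        ratings.put(1L, Map.of(101L, 5.0));
        ratings.put(2L, Map.of(101L, 4.0, 102L, 4.0, 103L, 1.0));
        ratings.put(3L, Map.of(102L, 2.0, 103L, 5.0));
        Map<Long, Double> neighborsOfUser1 = Map.of(2L, 0.75, 3L, 0.25); // assumed similarities
        System.out.println(estimate(ratings, neighborsOfUser1, 102L));   // (0.75*4 + 0.25*2) / 1.0 = 3.5
        System.out.println(recommend(ratings, 1L, neighborsOfUser1, 1)); // [102]
    }
}
```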
Recommendation Engines: Recap
• Many implementations of a CF-based recommender!
– 6 different recommendation algorithms
– 2 different neighborhood definitions
– 5 different similarity definitions

• Evaluating the different implementations is actually
very time-consuming
– The strength of Mahout is that it saves time when
evaluating the different combinations of
parameters!
– Standard interface for the evaluation of a Recommender
System
Evaluation
• Mahout provides classes for the evaluation of
a recommender system
– Prediction-based measures
• Mean Absolute Error (MAE)
• RMSE (Root Mean Square Error)

– IR-based measures
• Precision, Recall, F1-measure, F1@n
• NDCG (ranking measure)
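The two prediction-based measures compare estimated and actual ratings on held-out data: MAE averages the absolute errors, while RMSE averages the squared errors before taking the square root, so a few large errors weigh more. A minimal sketch on plain arrays (hypothetical names; Mahout's evaluators compute these values internally after splitting the data):

```java
// Hypothetical helper: prediction-based error measures on paired arrays
class ErrorMeasuresSketch {

    // Mean Absolute Error: average of |estimate - actual|
    static double mae(double[] estimated, double[] actual) {
        double sum = 0.0;
        for (int i = 0; i < estimated.length; i++) {
            sum += Math.abs(estimated[i] - actual[i]);
        }
        return sum / estimated.length;
    }

    // Root Mean Square Error: square root of the average squared error
    static double rmse(double[] estimated, double[] actual) {
        double sum = 0.0;
        for (int i = 0; i < estimated.length; i++) {
            double diff = estimated[i] - actual[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum / estimated.length);
    }

    public static void main(String[] args) {
        double[] estimated = {2.0, 4.0, 4.0, 4.0};
        double[] actual = {4.0, 4.0, 4.0, 4.0};
        // One error of 2 among four predictions: MAE = 2/4 = 0.5, RMSE = sqrt(4/4) = 1.0
        System.out.println("MAE: " + mae(estimated, actual));
        System.out.println("RMSE: " + rmse(estimated, actual));
    }
}
```

With a single error of 2 among four predictions, MAE is 0.5 while RMSE is 1.0, which shows how RMSE emphasizes large mistakes.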
Evaluation
• Prediction-based Measures
– Class: AverageAbsoluteDifferenceRecommenderEvaluator
– Method: evaluate()
– Parameters:
• Recommender implementation (built by a RecommenderBuilder)
• DataModel implementation
• Training set size (e.g. 70% of each user's preferences)
• % of users to include in the evaluation (smaller % for fast prototyping)
Evaluation
• IR-based Measures
– Class: GenericRecommenderIRStatsEvaluator
– Method: evaluate()
– Parameters:
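• Recommender implementation (built by a RecommenderBuilder)
• DataModel implementation
• Cutoff n: length of the recommendation list (e.g. 5 for Precision@5)
• Relevance threshold (e.g. the user's mean rating + one standard deviation)
• % of users to include in the evaluation (smaller % for
fast prototyping)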
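The IR-based measures first decide which held-out items are relevant (rating at or above the relevance threshold, here the user's mean rating plus one standard deviation) and then check how many of them appear in the top-n list. A minimal sketch with hypothetical names and sample values; using the population standard deviation is an assumption, and the sketch does not reproduce how GenericRecommenderIRStatsEvaluator splits the data. Java 9+ is needed for List.of and Set.of.

```java
import java.util.List;
import java.util.Set;

// Hypothetical helper: relevance threshold, Precision@n and Recall@n
class IrMeasuresSketch {

    // Mean + (population) standard deviation of a user's ratings
    static double relevanceThreshold(double[] ratings) {
        double mean = 0.0;
        for (double r : ratings) {
            mean += r;
        }
        mean /= ratings.length;
        double variance = 0.0;
        for (double r : ratings) {
            variance += (r - mean) * (r - mean);
        }
        return mean + Math.sqrt(variance / ratings.length);
    }

    // Number of relevant items among the first n recommendations
    static int hits(List<Long> recommended, Set<Long> relevant, int n) {
        int count = 0;
        for (int i = 0; i < Math.min(n, recommended.size()); i++) {
            if (relevant.contains(recommended.get(i))) {
                count++;
            }
        }
        return count;
    }

    // Precision@n: fraction of the top-n recommendations that are relevant
    static double precisionAt(List<Long> recommended, Set<Long> relevant, int n) {
        int size = Math.min(n, recommended.size());
        return size == 0 ? Double.NaN : (double) hits(recommended, relevant, n) / size;
    }

    // Recall@n: fraction of the relevant items that appear in the top-n
    static double recallAt(List<Long> recommended, Set<Long> relevant, int n) {
        return relevant.isEmpty() ? Double.NaN : (double) hits(recommended, relevant, n) / relevant.size();
    }

    public static void main(String[] args) {
        System.out.println(relevanceThreshold(new double[] {2, 4, 4, 4, 5, 5, 7, 9})); // 5 + 2 = 7.0
        List<Long> top5 = List.of(10L, 20L, 30L, 40L, 50L);
        Set<Long> relevant = Set.of(20L, 40L, 60L);
        System.out.println(precisionAt(top5, relevant, 5)); // 2 / 5 = 0.4
        System.out.println(recallAt(top5, relevant, 5));    // 2 / 3
    }
}
```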
Part 2

How to use Mahout?
Hands-on
Download Mahout
• Download
– The latest Mahout release is 0.8
– Available at:
http://apache.fastbull.org/mahout/0.8/mahout-distribution-0.8.zip
– Extract the archive and add all the libraries (JAR files) to a new
NetBeans (or Eclipse) project

• Requirement: Java 1.6.x or greater.
• Hadoop is not mandatory!
Important

JavaDoc
https://builds.apache.org/job/Mahout-Quality/javadoc/
Exercise 1
• Create a Preference object
• Set preferences through some simple Java call
• Print some statistics about preferences (how
many preferences, on which items the user has
expressed ratings, etc.)
• Hints about objects to be used:
– Preference
– GenericUserPreferenceArray
Exercise 1: preferences
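import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
import org.apache.mahout.cf.taste.model.Preference;
import org.apache.mahout.cf.taste.model.PreferenceArray;

class CreatePreferenceArray {
    private CreatePreferenceArray() {
    }

    public static void main(String[] args) {
        // Room for two preferences, all belonging to the same user
        PreferenceArray user1Prefs = new GenericUserPreferenceArray(2);

        user1Prefs.setUserID(0, 1L);   // the user ID is shared by every entry
        user1Prefs.setItemID(0, 101L); // user 1 gives item 101 a score of 2.0
        user1Prefs.setValue(0, 2.0f);
        user1Prefs.setItemID(1, 102L); // ... and item 102 a score of 3.0
        user1Prefs.setValue(1, 3.0f);

        // Print a single Preference (user, item, score) triple
        Preference pref = user1Prefs.get(1);
        System.out.println(pref);

        // Some statistics: how many preferences, and on which items
        System.out.println("Number of preferences: " + user1Prefs.length());
        for (int i = 0; i < user1Prefs.length(); i++) {
            System.out.println("Item " + user1Prefs.getItemID(i) + " -> " + user1Prefs.getValue(i));
        }
    }
}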
Exercise 2
• Create a DataModel
• Feed the DataModel through some simple Java
calls
• Print some statistics about data (how many users,
how many items, maximum ratings, etc.)
• Hints about objects to be used:
– FastByIDMap
– GenericDataModel
Exercise 2: data model
• PreferenceArray stores the preferences of a
single user
• Where are the preferences of all the users
stored?
– A HashMap? No.
– Mahout introduces data structures optimized for
recommendation tasks
• HashMaps are replaced by FastByIDMap
Exercise 2: data model
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.model.GenericDataModel;
import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;

class CreateGenericDataModel {
    private CreateGenericDataModel() {
    }

    public static void main(String[] args) throws Exception {
        // FastByIDMap: user ID -> that user's PreferenceArray
        FastByIDMap<PreferenceArray> preferences = new FastByIDMap<PreferenceArray>();

        // Size the array to the number of preferences actually set:
        // unset slots would be read as extra preferences (item 0, score 0.0)
        PreferenceArray prefsForUser1 = new GenericUserPreferenceArray(2);
        prefsForUser1.setUserID(0, 1L);
        prefsForUser1.setItemID(0, 101L);
        prefsForUser1.setValue(0, 3.0f);
        prefsForUser1.setItemID(1, 102L);
        prefsForUser1.setValue(1, 4.5f);
        preferences.put(1L, prefsForUser1);

        DataModel model = new GenericDataModel(preferences);
        System.out.println(model);

        // Some statistics about the data
        System.out.println("Users: " + model.getNumUsers());
        System.out.println("Items: " + model.getNumItems());
        System.out.println("Max rating: " + model.getMaxPreference());
    }
}
Exercise 3
• Create a DataModel
• Feed the DataModel through a CSV file
• Calculate similarities between users
– CSV file should contain enough data!

• Hints about objects to be used:
– FileDataModel
– PearsonCorrelationSimilarity,
TanimotoCoefficientSimilarity, etc.
Exercise 3: similarity
import java.io.File;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

class Example3_Similarity {
    public static void main(String[] args) throws Exception {
        // FileDataModel: load the DataModel from a CSV file (userID,itemID,score)
        DataModel model = new FileDataModel(new File("intro.csv"));

        // Similarity definitions: two different measures over the same data
        UserSimilarity pearson = new PearsonCorrelationSimilarity(model);
        UserSimilarity euclidean = new EuclideanDistanceSimilarity(model);

        // Output: similarity between user 1 and users 2 and 3
        System.out.println("Pearson:" + pearson.userSimilarity(1, 2));
        System.out.println("Euclidean:" + euclidean.userSimilarity(1, 2));
        System.out.println("Pearson:" + pearson.userSimilarity(1, 3));
        System.out.println("Euclidean:" + euclidean.userSimilarity(1, 3));
    }
}
Exercise 4
• Create a DataModel
• Feed the DataModel through a CSV file
• Calculate similarities between users
– CSV file should contain enough data!

• Generate neighborhood
• Generate recommendations
– Compare different combinations of parameters!

• Hints about objects to be used:
– NearestNUserNeighborhood
– GenericUserBasedRecommender
• Parameters: data model, neighborhood, similarity measure
Exercise 4: First Recommender
import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;

class RecommenderIntro {
    private RecommenderIntro() {
    }

    public static void main(String[] args) throws Exception {
        // FileDataModel: preferences read from the CSV file
        DataModel model = new FileDataModel(new File("intro.csv"));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // Neighborhood: the 2 users most similar to the target user
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(
                model, neighborhood, similarity);
        // Top-1 recommendation for user 1
        List<RecommendedItem> recommendations = recommender.recommend(1, 1);
        for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
        }
    }
}
Exercise 5: MovieLens Recommender
• Download the MovieLens 100k dataset from GroupLens
– Its format is already Mahout-compliant
– http://files.grouplens.org/datasets/movielens/ml-100k.zip

• Preparatory Exercise: repeat Exercise 3 (similarity
calculations) with a bigger dataset
• Next: run the recommendation framework against a
widely used benchmark dataset
Exercise 5: MovieLens Recommender
import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;

class RecommenderIntro {
    private RecommenderIntro() {
    }

    public static void main(String[] args) throws Exception {
        // MovieLens 100k training split: tab-separated userID, itemID, rating, timestamp
        DataModel model = new FileDataModel(new File("ua.base"));
        UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
        // Bigger dataset, bigger neighborhood: the 100 most similar users
        UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
        Recommender recommender = new GenericUserBasedRecommender(
                model, neighborhood, similarity);
        // Top-20 recommendations for user 1
        List<RecommendedItem> recommendations = recommender.recommend(1, 20);
        for (RecommendedItem recommendation : recommendations) {
            System.out.println(recommendation);
        }
    }
}
Exercise 5: MovieLens Recommender
• We can play with parameters!
– e.g. recommender.recommend(10, 50) returns the top 50 recommendations for user 10 (instead of the top 20 for user 1)
Exercise 5: MovieLens Recommender
• Analyze the Recommender behavior with different
combinations of parameters
– Do the recommendations change with a different
similarity measure?
– Do the recommendations change with different
neighborhood sizes?
– Which one is the best?
• … Let's go to the next exercise!
Exercise 6: Recommender Evaluation
• Evaluate different CF recommender
configurations on MovieLens data
• Metrics: RMSE, MAE
• Hints: useful classes
– Implementations of RecommenderEvaluator
interface
• AverageAbsoluteDifferenceRecommenderEvaluator
• RMSRecommenderEvaluator
Exercise 6: Recommender Evaluation
• Further Hints:
– Use RandomUtils.useTestSeed() to ensure
the consistency among different evaluation runs
– Invoke the evaluate() method
• Parameters
– RecommenderBuilder: builds the recommender instance (as in the previous
exercises)
– DataModelBuilder: specific criterion for building the training DataModel (null for the default)
– DataModel: the data to evaluate on
– Training-test split: double value (e.g. 0.7 to train on 70% of each user's preferences)
– Fraction of users to use in the evaluation: double value (e.g. 1.0 for
100%)
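Exercise 6: evaluation
import java.io.File;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.*;
import org.apache.mahout.cf.taste.impl.eval.*;
import org.apache.mahout.cf.taste.impl.model.file.*;
import org.apache.mahout.cf.taste.impl.neighborhood.*;
import org.apache.mahout.cf.taste.impl.recommender.*;
import org.apache.mahout.cf.taste.impl.similarity.*;
import org.apache.mahout.cf.taste.model.*;
import org.apache.mahout.cf.taste.neighborhood.*;
import org.apache.mahout.cf.taste.recommender.*;
import org.apache.mahout.cf.taste.similarity.*;
import org.apache.mahout.common.RandomUtils;

class EvaluatorIntro {
    private EvaluatorIntro() {
    }

    public static void main(String[] args) throws Exception {
        // Ensures the consistency between different evaluation runs
        RandomUtils.useTestSeed();
        DataModel model = new FileDataModel(new File("ua.base"));
        RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
        // Build the same recommender for testing that we did last time:
        // the recommendation engine under evaluation (user-based CF, Pearson, 100 neighbors)
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
        };
        // 70% of each user's preferences for training; evaluation over all users (1.0);
        // null = default DataModelBuilder
        double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
        System.out.println(score);
    }
}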
Exercise 6: evaluation
• We can add more measures: RMSE is computed by RMSRecommenderEvaluator (in org.apache.mahout.cf.taste.impl.eval), called with the same parameters
– Replace the last two lines of main() in the previous program with:

RecommenderEvaluator rmseEvaluator = new RMSRecommenderEvaluator();
double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
double rmse = rmseEvaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
System.out.println("MAE: " + score);
System.out.println("RMSE: " + rmse);
Mahout Strengths
• Fast prototyping and evaluation
– To evaluate a different configuration of the same algorithm, we just need to update a parameter and run again.
– Example
• Different neighborhood sizes: about 5 minutes to find the best configuration
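Changing the neighborhood size means changing the first argument of NearestNUserNeighborhood, which keeps the n users most similar to the target user. The sketch below shows that selection step over a precomputed map of similarities; it is a hypothetical helper, not Mahout's implementation, which may differ in details such as tie-breaking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not Mahout's API): pick the n most similar users. */
final class NearestN {
    private NearestN() {}

    /**
     * Returns up to n user IDs ordered from most to least similar.
     * Undefined (NaN) similarities are ignored.
     */
    static List<Long> topN(Map<Long, Double> similarityToTarget, int n) {
        List<Map.Entry<Long, Double>> candidates = new ArrayList<>();
        for (Map.Entry<Long, Double> e : similarityToTarget.entrySet()) {
            if (!Double.isNaN(e.getValue())) {
                candidates.add(e);
            }
        }
        // Sort by similarity, highest first
        candidates.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        List<Long> neighbors = new ArrayList<>();
        for (int i = 0; i < Math.min(n, candidates.size()); i++) {
            neighbors.add(candidates.get(i).getKey());
        }
        return neighbors;
    }
}
```

A larger n averages over more (and less similar) users, which is exactly the trade-off the neighborhood-size experiments explore.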
Exercise 7: Recommender Evaluation
• Evaluate different CF recommender configurations on MovieLens data
• Metrics: precision, recall
• Hints: useful classes
– GenericRecommenderIRStatsEvaluator
– its evaluate() method
• Same parameters as Exercise 6
Exercise 7: IR-based evaluation
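import java.io.File;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.common.RandomUtils;

class IREvaluatorIntro {
    private IREvaluatorIntro() {
    }
    public static void main(String[] args) throws Exception {
        RandomUtils.useTestSeed();
        DataModel model = new FileDataModel(new File("ua.base"));
        RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
        // Build the same recommender for testing that we did last time:
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
                return new GenericUserBasedRecommender(model, neighborhood, similarity);
            }
        };
        // Top-5 lists (at = 5), per-user relevance threshold, all users evaluated (1.0)
        IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,
                GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
        System.out.println(stats.getPrecision());
        System.out.println(stats.getRecall());
        System.out.println(stats.getF1());
    }
}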

Precision@5, Recall@5, F1@5
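IRStatistics reports precision, recall and F1 over top-n recommendation lists (n = 5 here). With CHOOSE_THRESHOLD, our understanding is that Mahout treats as relevant the items a user rated at or above their mean rating plus one standard deviation, hides them from training, and checks how many come back in the top n. The sketch below spells out these definitions for a single user; the class and method names are hypothetical, and using the sample (n - 1) standard deviation is an assumption of this sketch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helpers (not Mahout's API) for IR-style metrics at cutoff n. */
final class IrAtN {
    private IrAtN() {}

    /** Items rated >= mean + one sample standard deviation (all items if fewer than 2 ratings). */
    static Set<Long> relevantItems(Map<Long, Double> userRatings) {
        int count = userRatings.size();
        double threshold = Double.NEGATIVE_INFINITY;
        if (count >= 2) {
            double mean = 0.0;
            for (double r : userRatings.values()) mean += r;
            mean /= count;
            double squares = 0.0;
            for (double r : userRatings.values()) squares += (r - mean) * (r - mean);
            threshold = mean + Math.sqrt(squares / (count - 1));
        }
        Set<Long> relevant = new HashSet<>();
        for (Map.Entry<Long, Double> e : userRatings.entrySet()) {
            if (e.getValue() >= threshold) relevant.add(e.getKey());
        }
        return relevant;
    }

    /** Number of relevant items among the first n recommendations. */
    static int hits(List<Long> recommended, Set<Long> relevant, int n) {
        int hits = 0;
        for (int i = 0; i < Math.min(n, recommended.size()); i++) {
            if (relevant.contains(recommended.get(i))) hits++;
        }
        return hits;
    }

    /** Fraction of the (at most n) recommended items that are relevant. */
    static double precision(List<Long> recommended, Set<Long> relevant, int n) {
        int shown = Math.min(n, recommended.size());
        return shown == 0 ? 0.0 : (double) hits(recommended, relevant, n) / shown;
    }

    /** Fraction of the relevant items that appear in the top n. */
    static double recall(List<Long> recommended, Set<Long> relevant, int n) {
        return relevant.isEmpty() ? 0.0 : (double) hits(recommended, relevant, n) / relevant.size();
    }

    /** Harmonic mean of precision and recall. */
    static double f1(double precision, double recall) {
        return precision + recall == 0.0 ? 0.0 : 2 * precision * recall / (precision + recall);
    }
}
```

For example, if the top 5 recommendations are items 1 to 5 and the relevant items are {2, 5, 9}, then precision@5 is 2/5 = 0.4, recall@5 is 2/3 and F1@5 is 0.5. Mahout averages such per-user values over all evaluated users and may differ from this sketch in its bookkeeping.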
Exercise 7: IR-based evaluation
class IREvaluatorIntro {
private IREvaluatorIntro() {
}
public static void main(String[] args) throws Exception {
RandomUtils.useTestSeed();
DataModel model = new FileDataModel(new File("ua.base"));
RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
// Build the same recommender for testing that we did last time:
RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
@Override
public Recommender buildRecommender(DataModel model) throws TasteException {
UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
UserNeighborhood neighborhood = new NearestNUserNeighborhood(500, similarity, model);
return new GenericUserBasedRecommender(model, neighborhood, similarity);
}
};
IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,
GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
System.out.println(stats.getPrecision());
System.out.println(stats.getRecall());
System.out.println(stats.getF1());
}
}

Neighborhood size set to 500
Exercise 7: IR-based evaluation
class IREvaluatorIntro {
private IREvaluatorIntro() {
}
public static void main(String[] args) throws Exception {
RandomUtils.useTestSeed();
DataModel model = new FileDataModel(new File("ua.base"));
RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
// Build the same recommender for testing that we did last time:
RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
@Override
public Recommender buildRecommender(DataModel model) throws TasteException {
// needs: import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
UserSimilarity similarity = new EuclideanDistanceSimilarity(model);
UserNeighborhood neighborhood = new NearestNUserNeighborhood(500, similarity, model);
return new GenericUserBasedRecommender(model, neighborhood, similarity);
}
};
IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,
GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
System.out.println(stats.getPrecision());
System.out.println(stats.getRecall());
System.out.println(stats.getF1());
}
}

Similarity switched to Euclidean distance (EuclideanDistanceSimilarity)
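Why can swapping PearsonCorrelationSimilarity for EuclideanDistanceSimilarity change the scores? The sketch below contrasts the two ideas on the items both users rated: Pearson measures whether the two users' ratings move together (a user who always rates one star lower still gets a correlation of 1.0), while a distance-based similarity also penalizes such a constant offset. The 1 / (1 + d) mapping is a common textbook choice and an assumption here; Mahout's class may normalize differently, and the helper names are hypothetical.

```java
import java.util.Map;

/** Hypothetical helpers (not Mahout's API) comparing two user-user similarities. */
final class Similarities {
    private Similarities() {}

    /** Pearson correlation over co-rated items; NaN if fewer than 2 items or zero variance. */
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        int n = 0;
        double sumA = 0.0, sumB = 0.0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double rb = b.get(e.getKey());
            if (rb != null) { n++; sumA += e.getValue(); sumB += rb; }
        }
        if (n < 2) return Double.NaN;
        double meanA = sumA / n, meanB = sumB / n;
        double cov = 0.0, varA = 0.0, varB = 0.0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double rb = b.get(e.getKey());
            if (rb != null) {
                double da = e.getValue() - meanA, db = rb - meanB;
                cov += da * db;
                varA += da * da;
                varB += db * db;
            }
        }
        return (varA == 0.0 || varB == 0.0) ? Double.NaN : cov / Math.sqrt(varA * varB);
    }

    /** Distance-based similarity 1 / (1 + d) over co-rated items; NaN if no overlap. */
    static double euclidean(Map<Long, Double> a, Map<Long, Double> b) {
        int n = 0;
        double sumSq = 0.0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double rb = b.get(e.getKey());
            if (rb != null) { n++; double d = e.getValue() - rb; sumSq += d * d; }
        }
        return n == 0 ? Double.NaN : 1.0 / (1.0 + Math.sqrt(sumSq));
    }
}
```

For a user who rated three items {5, 3, 4} and another who rated the same items {4, 2, 3}, Pearson gives 1.0, while the distance-based score is 1 / (1 + sqrt(3)), about 0.37.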
Exercise 8: item-based recommender
• Mahout provides Java classes for building an
item-based recommender system
– Amazon-like
– Recommendations are based on similarities
among items (generally pre-computed offline)
– Evaluate it with the MovieLens dataset!
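In an item-based recommender, the predicted rating for an item is typically a similarity-weighted average of the ratings the user gave to other items; this is why no user neighborhood is needed. A minimal sketch of that textbook formula, with a hypothetical interface: skipping non-positive similarities is a choice of this sketch, not necessarily what GenericItemBasedRecommender does.

```java
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

/** Hypothetical sketch (not Mahout's API) of item-based rating prediction. */
final class ItemBasedSketch {
    private ItemBasedSketch() {}

    /**
     * Predicts the user's rating for targetItem as the similarity-weighted average
     * of the user's ratings on other items. NaN and non-positive similarities are
     * skipped (assumption); returns NaN when no item contributes.
     */
    static double predict(Map<Long, Double> userRatings, long targetItem,
                          ToDoubleBiFunction<Long, Long> itemSimilarity) {
        double weightedSum = 0.0;
        double totalSimilarity = 0.0;
        for (Map.Entry<Long, Double> e : userRatings.entrySet()) {
            long item = e.getKey();
            if (item == targetItem) {
                continue;
            }
            double s = itemSimilarity.applyAsDouble(item, targetItem);
            if (Double.isNaN(s) || s <= 0.0) {
                continue;
            }
            weightedSum += s * e.getValue();
            totalSimilarity += s;
        }
        return totalSimilarity == 0.0 ? Double.NaN : weightedSum / totalSimilarity;
    }
}
```

Since item-item similarities change slowly, they can be pre-computed offline, which matches the remark above about Amazon-like recommenders.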
Example 8: item-based recommender
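import java.io.File;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.ItemSimilarity;
import org.apache.mahout.common.RandomUtils;

class IREvaluatorIntro {
    private IREvaluatorIntro() {
    }
    public static void main(String[] args) throws Exception {
        RandomUtils.useTestSeed();
        DataModel model = new FileDataModel(new File("ua.base"));
        RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
        // Item-based recommender: an item-item similarity, no user neighborhood
        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
                ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);
                return new GenericItemBasedRecommender(model, similarity);
            }
        };
        IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,
                GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
        System.out.println(stats.getPrecision());
        System.out.println(stats.getRecall());
        System.out.println(stats.getF1());
    }
}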
Note: the similarity is declared as an ItemSimilarity, and item-based recommenders need no neighborhood definition.
Exercise 9: Recommender Evaluation
• Write a class that automatically runs the evaluation with different parameters
– e.g. a fixed set of neighborhood sizes stored in an array
– Print the best scores and the corresponding configuration
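One possible shape for Exercise 9 is to keep the sweep separate from the evaluation: the hypothetical `evaluate` hook below would build a RecommenderBuilder with the given neighborhood size and return the result of evaluator.evaluate(...) as in Example 6 (wrapping Mahout's checked TasteException), while the sweep only prints every score and tracks the best one. The class, record and hook are illustrative assumptions, not part of Mahout.

```java
import java.util.function.IntToDoubleFunction;

/** Hypothetical helper for Exercise 9: try several neighborhood sizes, keep the best. */
final class NeighborhoodSweep {
    private NeighborhoodSweep() {}

    /** Best configuration found by a sweep. */
    record Best(int neighborhoodSize, double score) {}

    /**
     * Calls evaluate for each size, prints every score and returns the lowest one
     * (lower is better, as for MAE and RMSE). NaN scores are skipped.
     */
    static Best sweep(int[] sizes, IntToDoubleFunction evaluate) {
        Best best = null;
        for (int size : sizes) {
            double score = evaluate.applyAsDouble(size);
            System.out.printf("neighborhood=%d score=%.4f%n", size, score);
            if (!Double.isNaN(score) && (best == null || score < best.score())) {
                best = new Best(size, score);
            }
        }
        if (best == null) {
            throw new IllegalStateException("no valid score");
        }
        return best;
    }
}
```

For precision, recall or F1, where higher is better, the hook can simply return the negated value.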
Exercise 10: more datasets!
• Find the best configuration for several datasets
– Download datasets from http://mahout.apache.org/users/basics/collections.html
– Write classes that transform the input data into a Mahout-compliant format
– Extend Exercise 9!
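Mahout's FileDataModel reads plain-text lines of the form userID,itemID[,preference[,timestamp]] (comma- or tab-separated, with numeric IDs), so for many datasets the conversion amounts to rewriting the delimiter and dropping extra columns. A minimal sketch, assuming a "::"-delimited source such as the MovieLens 1M ratings.dat file (UserID::MovieID::Rating::Timestamp); the class is hypothetical, and datasets with non-numeric IDs would additionally need an ID mapping.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.regex.Pattern;

/** Hypothetical converter: delimited rating files to userID,itemID,rating lines. */
final class ToMahoutCsv {
    private ToMahoutCsv() {}

    /**
     * Reads lines such as "1::1193::5::978300760" (fields split on separator),
     * writes "1,1193,5" and returns the number of ratings written.
     * Blank lines are skipped; lines with fewer than three fields are rejected.
     */
    static int convert(Reader in, Writer out, String separator) throws IOException {
        BufferedReader reader = new BufferedReader(in);
        Pattern split = Pattern.compile(Pattern.quote(separator));
        int written = 0;
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isBlank()) {
                continue;
            }
            String[] fields = split.split(line);
            if (fields.length < 3) {
                throw new IOException("malformed line: " + line);
            }
            out.write(fields[0].trim() + "," + fields[1].trim() + "," + fields[2].trim() + "\n");
            written++;
        }
        out.flush();
        return written;
    }
}
```

In practice, `in` and `out` would wrap the source file and the new CSV file, which can then be passed to `new FileDataModel(new File(...))` exactly as ua.base is in the examples above.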
End.
Do you want more?
• Recommendation
– Deployment of a Mahout-based web recommender
– Integration with Hadoop
– Integration of content-based information
– Custom similarities, custom recommenders, rescoring functions

• Classification, Clustering and Pattern Mining

More related content

Trending

• Introduction to Recommendation Systems (Trieu Nguyen)
• SVD and the Netflix Dataset (Ben Mabey)
• Movie lens recommender systems (Kapil Garg)
• Collaborative Filtering 1: User-based CF (Yusuke Yamamoto)
• Recommendation Systems (Robin Reni)
• How Apache Drives Music Recommendations At Spotify (Josh Baer)
• Recommendation system (Akshat Thakar)
• Random forest using apache mahout (Gaurav Kasliwal)
• Matrix Factorization Techniques For Recommender Systems (Lei Guo)
• Recommender Systems (T212)
• Interactive Recommender Systems (Roelof van Zwol)
• Recommender system introduction (Liang Xiang)
• [Final]collaborative filtering and recommender systems (Falitokiniaina Rabearison)
• An introduction to Recommender Systems (David Zibriczky)
• Recommender systems (Tamer Rezk)
• Heap Sort in Design and Analysis of algorithms (samairaakram)
 


Featured

• Mahout Tutorial and Hands-on (version 2015) (Cataldo Musto)
• Introduction to Collaborative Filtering with Apache Mahout (sscdotopen)
• Tutorial Mahout - Recommendation (Cataldo Musto)
• SDEC2011 Mahout - the what, the how and the why (Korea Sdec)
• Whats Right and Wrong with Apache Mahout (Ted Dunning)
• Machine Learning and Apache Mahout : An Introduction (Varad Meru)
• Introduction to Mahout and Machine Learning (Varad Meru)
• 10 R Packages to Win Kaggle Competitions (DataRobot)
 

 

Similar to Apache Mahout Tutorial - Recommendation - 2013/2014

• Buidling large scale recommendation engine (Keeyong Han)
• Tag based recommender system (Karen Li)
• Overview of recommender system (Stanley Wang)
• Sparking Science up with Research Recommendations by Maya Hristakeva (Spark Summit)
• Sparking Science up with Research Recommendations (Maya Hristakeva)
• IOTA 2016 Social Recomender System Presentation. (ASHISH JAGTAP)
• Modern Perspectives on Recommender Systems and their Applications in Mendeley (Kris Jack)
• Recommendation Systems : Selection vs Fulfillment (Akansha Kumar, Ph.D.)
• IntroductionRecommenderSystems_Petroni.pdf (AlphaIssaghaDiallo)
• RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning... (S. Diana Hu)
• RecSys 2015 Tutorial - Scalable Recommender Systems: Where Machine Learning m... (Joaquin Delgado PhD.)
• Demystifying Recommendation Systems (Rumman Chowdhury)
• Building High Available and Scalable Machine Learning Applications (Yalçın Yenigün)
• Item basedcollaborativefilteringrecommendationalgorithms (Aravindharamanan S)
• Lecture Notes on Recommender System Introduction (PerumalPitchandi)
• Collaborative Filtering and Recommender Systems By Navisro Analytics (Navisro Analytics)
• Where Search Meets Machine Learning: Presented by Diana Hu & Joaquin Delgado,... (Lucidworks)
• Lucene/Solr Revolution 2015: Where Search Meets Machine Learning (Joaquin Delgado PhD.)
• Lucene/Solr Revolution 2015: Where Search Meets Machine Learning (S. Diana Hu)
 

 

More from Cataldo Musto

• MyrrorBot: a Digital Assistant Based on Holistic User Models for Personalize...
• Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation
• Intelligenza Artificiale e Social Media - Monitoraggio della Farnesina e La M...
• Exploring the Effects of Natural Language Justifications in Food Recommender ...
• Exploiting Distributional Semantics Models for Natural Language Context-aware...
• Towards a Knowledge-aware Food Recommender System Exploiting Holistic User Mo...
• Towards Queryable User Profiles: Introducing Conversational Agents in a Platf...
• Hybrid Semantics aware Recommendations Exploiting Knowledge Graph Embeddings
• Natural Language Justifications for Recommender Systems Exploiting Text Summa...
• L'IA per l'Empowerment del Cittadino: Hate Map, Myrror, PA Risponde
• Explanation Strategies - Advances in Content-based Recommender System
• Justifying Recommendations through Aspect-based Sentiment Analysis of Users R...
• ExpLOD: un framework per la generazione di spiegazioni per recommender system...
• Myrror: una piattaforma per Holistic User Modeling e Quantified Self
• Semantic Holistic User Modeling for Personalized Access to Digital Content an...
• Holistic User Modeling for Personalized Services in Smart Cities
• A Framework for Holistic User Modeling Merging Heterogeneous Digital Footprints
• eHealth, mHealth in Otorinolaringoiatria: innovazioni dirompenti o disastrose?
• Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...
• Il Linguaggio dell'Odio sui Social Network
 

Más de Cataldo Musto (20)

MyrrorBot: a Digital Assistant Based on Holistic User Models for Personalize...
MyrrorBot: a Digital Assistant Based on Holistic User Models forPersonalize...MyrrorBot: a Digital Assistant Based on Holistic User Models forPersonalize...
MyrrorBot: a Digital Assistant Based on Holistic User Models for Personalize...
 
Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation
Fairness and Popularity Bias in Recommender Systems: an Empirical EvaluationFairness and Popularity Bias in Recommender Systems: an Empirical Evaluation
Fairness and Popularity Bias in Recommender Systems: an Empirical Evaluation
 
Intelligenza Artificiale e Social Media - Monitoraggio della Farnesina e La M...
Intelligenza Artificiale e Social Media - Monitoraggio della Farnesina e La M...Intelligenza Artificiale e Social Media - Monitoraggio della Farnesina e La M...
Intelligenza Artificiale e Social Media - Monitoraggio della Farnesina e La M...
 
Exploring the Effects of Natural Language Justifications in Food Recommender ...
Exploring the Effects of Natural Language Justifications in Food Recommender ...Exploring the Effects of Natural Language Justifications in Food Recommender ...
Exploring the Effects of Natural Language Justifications in Food Recommender ...
 
Exploiting Distributional Semantics Models for Natural Language Context-aware...
Exploiting Distributional Semantics Models for Natural Language Context-aware...Exploiting Distributional Semantics Models for Natural Language Context-aware...
Exploiting Distributional Semantics Models for Natural Language Context-aware...
 
Towards a Knowledge-aware Food Recommender System Exploiting Holistic User Mo...
Towards a Knowledge-aware Food Recommender System Exploiting Holistic User Mo...Towards a Knowledge-aware Food Recommender System Exploiting Holistic User Mo...
Towards a Knowledge-aware Food Recommender System Exploiting Holistic User Mo...
 
Towards Queryable User Profiles: Introducing Conversational Agents in a Platf...
Towards Queryable User Profiles: Introducing Conversational Agents in a Platf...Towards Queryable User Profiles: Introducing Conversational Agents in a Platf...
Towards Queryable User Profiles: Introducing Conversational Agents in a Platf...
 
Hybrid Semantics aware Recommendations Exploiting Knowledge Graph Embeddings
Hybrid Semantics aware Recommendations Exploiting Knowledge Graph EmbeddingsHybrid Semantics aware Recommendations Exploiting Knowledge Graph Embeddings
Hybrid Semantics aware Recommendations Exploiting Knowledge Graph Embeddings
 
Natural Language Justifications for Recommender Systems Exploiting Text Summa...
Natural Language Justifications for Recommender Systems Exploiting Text Summa...Natural Language Justifications for Recommender Systems Exploiting Text Summa...
Natural Language Justifications for Recommender Systems Exploiting Text Summa...
 
L'IA per l'Empowerment del Cittadino: Hate Map, Myrror, PA Risponde
L'IA per l'Empowerment del Cittadino: Hate Map, Myrror, PA RispondeL'IA per l'Empowerment del Cittadino: Hate Map, Myrror, PA Risponde
L'IA per l'Empowerment del Cittadino: Hate Map, Myrror, PA Risponde
 
Explanation Strategies - Advances in Content-based Recommender System
Explanation Strategies - Advances in Content-based Recommender SystemExplanation Strategies - Advances in Content-based Recommender System
Explanation Strategies - Advances in Content-based Recommender System
 
Justifying Recommendations through Aspect-based Sentiment Analysis of Users R...
Justifying Recommendations through Aspect-based Sentiment Analysis of Users R...Justifying Recommendations through Aspect-based Sentiment Analysis of Users R...
Justifying Recommendations through Aspect-based Sentiment Analysis of Users R...
 
ExpLOD: un framework per la generazione di spiegazioni per recommender system...
ExpLOD: un framework per la generazione di spiegazioni per recommender system...ExpLOD: un framework per la generazione di spiegazioni per recommender system...
ExpLOD: un framework per la generazione di spiegazioni per recommender system...
 
Myrror: una piattaforma per Holistic User Modeling e Quantified Self
Myrror: una piattaforma per Holistic User Modeling e Quantified SelfMyrror: una piattaforma per Holistic User Modeling e Quantified Self
Myrror: una piattaforma per Holistic User Modeling e Quantified Self
 
Semantic Holistic User Modeling for Personalized Access to Digital Content an...
Semantic Holistic User Modeling for Personalized Access to Digital Content an...Semantic Holistic User Modeling for Personalized Access to Digital Content an...
Semantic Holistic User Modeling for Personalized Access to Digital Content an...
 
Holistic User Modeling for Personalized Services in Smart Cities
Holistic User Modeling for Personalized Services in Smart CitiesHolistic User Modeling for Personalized Services in Smart Cities
Holistic User Modeling for Personalized Services in Smart Cities
 
A Framework for Holistic User Modeling Merging Heterogeneous Digital Footprints
A Framework for Holistic User Modeling Merging Heterogeneous Digital FootprintsA Framework for Holistic User Modeling Merging Heterogeneous Digital Footprints
A Framework for Holistic User Modeling Merging Heterogeneous Digital Footprints
 
eHealth, mHealth in Otorinolaringoiatria: innovazioni dirompenti o disastrose?
eHealth, mHealth in Otorinolaringoiatria: innovazioni dirompenti o disastrose?eHealth, mHealth in Otorinolaringoiatria: innovazioni dirompenti o disastrose?
eHealth, mHealth in Otorinolaringoiatria: innovazioni dirompenti o disastrose?
 
Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...
Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...
Semantics-aware Recommender Systems Exploiting Linked Open Data and Graph-bas...
 
Il Linguaggio dell'Odio sui Social Network
Il Linguaggio dell'Odio sui Social NetworkIl Linguaggio dell'Odio sui Social Network
Il Linguaggio dell'Odio sui Social Network
 

Último

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 

Último (20)

Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 

Apache Mahout Tutorial - Recommendation - 2013/2014

  • 1. Apache Mahout – Tutorial (2014) Cataldo Musto, Ph.D. Corso di Accesso Intelligente all’Informazione ed Elaborazione del Linguaggio Naturale Università degli Studi di Bari – Dipartimento di Informatica – A.A. 2013/2014 08/01/2014
  • 2. Outline • What is Mahout? – Overview • How to use Mahout? – Hands-on session
  • 3. Part 1 What is Mahout?
  • 4–5. Goal • Mahout is a Java library – Implementing Machine Learning techniques • Clustering • Classification • Recommendation • Frequent ItemSet
  • 6. What can we do? • Currently Mahout mainly supports four use cases: – Recommendation: takes users' behavior and tries to find items users might like. – Clustering: takes, e.g., text documents and groups them into clusters of topically related documents. – Classification: learns from existing categorized documents what documents of a specific category look like, and assigns unlabelled documents to the (hopefully) correct category. – Frequent itemset mining: takes a set of item groups (terms in a query session, shopping cart contents) and identifies which individual items usually appear together.
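To make the frequent itemset use case concrete, here is a minimal brute-force sketch that counts how often each pair of items co-occurs across baskets and keeps the pairs reaching a minimum support. It is only a toy illustration: Mahout's own frequent itemset mining uses a (parallel) FP-Growth algorithm, and the class `FrequentPairs` and its method are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy frequent-pair counter (hypothetical helper, NOT Mahout's FP-Growth implementation).
final class FrequentPairs {

    private FrequentPairs() { }

    // Returns every unordered item pair, keyed as "a|b" with a < b, that appears
    // in at least minSupport baskets, mapped to its support count.
    static Map<String, Integer> frequentPairs(List<List<String>> baskets, int minSupport) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> basket : baskets) {
            // Deduplicate and sort so each unordered pair gets one canonical key.
            List<String> items = new ArrayList<>(new TreeSet<>(basket));
            for (int i = 0; i < items.size(); i++) {
                for (int j = i + 1; j < items.size(); j++) {
                    counts.merge(items.get(i) + "|" + items.get(j), 1, Integer::sum);
                }
            }
        }
        // Drop pairs below the support threshold.
        counts.values().removeIf(c -> c < minSupport);
        return counts;
    }
}
```

Counting all pairs is quadratic in basket size, which is exactly why real miners such as FP-Growth avoid enumerating candidates explicitly.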
  • 7. Why Mahout? • Mahout is not the only ML framework – Weka (http://www.cs.waikato.ac.nz/ml/weka/) – R (http://www.r-project.org/) • Why do we prefer Mahout?
  • 8–10. Why Mahout? • Why do we prefer Mahout? – Apache License – Good Community – Good Documentation – Scalable • Based on Hadoop (not mandatory!)
  • 11. Why do we need a scalable framework? Big Data!
  • 14. Use Cases User Interest Modeling on Twitter
  • 15. Use Cases Pattern Mining on Yahoo! (as anti-spam)
  • 16. Algorithms • Recommendation – User-based Collaborative Filtering – Item-based Collaborative Filtering – SlopeOne Recommenders – Singular Value Decomposition-based CF
  • 17. Algorithms • Clustering – Canopy – K-Means – Fuzzy K-Means – Latent Dirichlet Allocation (LDA) – MinHash Clustering – Draft algorithms • Hierarchical Clustering • (and much more…)
  • 18. Algorithms • Classification – Logistic Regression – Bayes – Random Forests – Hidden Markov Models – Draft algorithms • Support Vector Machines • Perceptrons • Neural Networks • Restricted Boltzmann Machines • (and much more…)
  • 19. Algorithms • Other – Evolutionary Algorithms • Genetic Algorithms – Dimensionality Reduction techniques • Principal Component Analysis (PCA) • Singular Value Decomposition (SVD) – Frequent ItemSet Pattern Mining – (and much more…)
  • 20. Mahout in the Apache Software Foundation
  • 21. Mahout in the Apache Software Foundation Original Mahout Project
  • 22. Mahout in the Apache Software Foundation Taste: collaborative filtering framework
  • 23. Mahout in the Apache Software Foundation Lucene: information retrieval software library
  • 24. Mahout in the Apache Software Foundation Hadoop: framework for distributed storage and programming based on MapReduce
  • 26. General Architecture Data Storage and Shared Libraries
  • 29. In this tutorial we will focus on Recommendation
  • 30. Recommendation • Mahout implements a Collaborative Filtering framework – Popularized by Amazon and others – Uses historical data (ratings, clicks, and purchases) to provide recommendations • User-based: recommend items by finding similar users. This is often harder to scale because users' preferences change frequently. • Item-based: compute the similarity between items and make recommendations from it. Items usually don't change much, so item similarities can often be computed offline. • Slope-One: a very fast and simple item-based recommendation approach, applicable when users have given ratings (and not just boolean preferences).
  • 31–32. Slope-One Recommender • Brief recap • Sample rating database (from Wikipedia) • A variant of item-based CF • Computes predictions from the average differences between items' ratings
  • 33. Slope-One Recommender • Step One: simplification – Just two items – Goal: to compute Lucy's rating for Item A – Algorithm: compute the average difference between the ratings for Item A and Item B – Difference: +0.5 for Item A – Rating(Lucy, ItemA) = Rating(Lucy, ItemB) + difference = 2 + 0.5 = 2.5
  • 34. Slope-One Recommender • Step One: simplification – Just two items – Goal: to compute Lucy's rating for Item A – Algorithm: compute the average difference between the ratings for Item A and Item C – Difference: +3 for Item A – Rating(Lucy, ItemA) = Rating(Lucy, ItemC) + difference = 5 + 3 = 8
  • 35. Slope-One Recommender • Step One: combining partial computations – Lucy's rating for Item A is the weighted average of the estimate based on Item B and the estimate based on Item C
  • 36. Slope-One Recommender • Step One: combining partial computations – Weight: number of users who rated both items • John and Mark rated both Item A and Item B (weight 2) • Only John rated both Item A and Item C (weight 1) – Rating(Lucy, ItemA) = (2 × 2.5 + 1 × 8) / (2 + 1) = 13/3 ≈ 4.33
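The weighted Slope One computation walked through on the slides can be sketched in plain Java as follows. The ratings are the sample from the Wikipedia Slope One article (John: A=5, B=3, C=2; Mark: A=3, B=4; Lucy: B=2, C=5), which reproduce the differences shown above. This is an illustrative sketch, not Mahout's `SlopeOneRecommender`; the class name is hypothetical and `Map.of` assumes Java 9 or later.

```java
import java.util.HashMap;
import java.util.Map;

// Weighted Slope One sketch (hypothetical class, not Mahout's implementation).
// ratings: user -> (item -> rating)
final class WeightedSlopeOne {

    private WeightedSlopeOne() { }

    // Predicts the rating of targetItem for the given user, or NaN if no evidence exists.
    static double predict(Map<String, Map<String, Double>> ratings,
                          String user, String targetItem) {
        Map<String, Double> userRatings = ratings.get(user);
        if (userRatings == null) {
            return Double.NaN;
        }
        double weightedSum = 0.0;
        int totalWeight = 0;
        for (Map.Entry<String, Double> known : userRatings.entrySet()) {
            String other = known.getKey();
            if (other.equals(targetItem)) {
                continue;
            }
            // Average difference (targetItem - other) over users who rated both items.
            double diffSum = 0.0;
            int coRaters = 0;
            for (Map<String, Double> r : ratings.values()) {
                if (r.containsKey(targetItem) && r.containsKey(other)) {
                    diffSum += r.get(targetItem) - r.get(other);
                    coRaters++;
                }
            }
            if (coRaters == 0) {
                continue; // nothing links the two items
            }
            double partialEstimate = known.getValue() + diffSum / coRaters;
            // Each partial estimate is weighted by its number of co-raters.
            weightedSum += coRaters * partialEstimate;
            totalWeight += coRaters;
        }
        return totalWeight == 0 ? Double.NaN : weightedSum / totalWeight;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> ratings = new HashMap<>();
        ratings.put("John", Map.of("A", 5.0, "B", 3.0, "C", 2.0));
        ratings.put("Mark", Map.of("A", 3.0, "B", 4.0));
        ratings.put("Lucy", Map.of("B", 2.0, "C", 5.0));
        System.out.println(predict(ratings, "Lucy", "A")); // (2 * 2.5 + 1 * 8) / 3
    }
}
```

Weighting by the number of co-raters makes the B-based estimate (supported by two users) count twice as much as the C-based one (supported by John alone).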
  • 37. (comin’ back to Mahout)
  • 39. Recommendation - Architecture Basic idea: a Java/J2EE application invokes a Mahout Recommender, whose DataModel is built from a set of user preferences, which in turn are read from a physical data store
  • 40–43. Recommendation - Architecture (layers, from top to bottom) – External Application – Recommender – Data Model – Physical Storage (database, files, etc.)
  • 44. Recommendation in Mahout • Input: raw data (user preferences) • Output: preference estimates • Step 1 – Mapping raw data into a Mahout-compliant DataModel • Step 2 – Tuning the recommender components • Similarity measure, neighborhood, etc. • Step 3 – Computing rating estimates • Step 4 – Evaluating the recommendations
  • 45. Recommendation Components • Mahout's key abstractions are defined as Java interfaces: – DataModel interface • Methods for mapping raw data to a Mahout-compliant form – UserSimilarity interface • Methods to compute the degree of correlation between two users – ItemSimilarity interface • Methods to compute the degree of correlation between two items – UserNeighborhood interface • Methods to define the concept of ‘neighborhood’ – Recommender interface • Methods to implement the recommendation step itself
  • 46. Recommendation Components • Mahout's key abstractions are defined as Java interfaces: – Example: the DataModel interface • Each way of mapping raw data to a Mahout-compliant form is an implementation of this generic interface • e.g. MySQLJDBCDataModel feeds a DataModel from a MySQL database • (and so on)
  • 47. Components: DataModel • A DataModel is the interface through which information about user preferences is read. • Which sources can it draw from? – Databases • MySQLJDBCDataModel, PostgreSQLJDBCDataModel • Supported NoSQL databases: MongoDBDataModel, CassandraDataModel – External files • FileDataModel – Generic (preferences fed directly through Java code) • GenericDataModel (All of these are implementations of the DataModel interface)
  • 48. Components: DataModel • GenericDataModel – Fed through Java calls • FileDataModel – CSV (comma-separated values) • JDBCDataModel – JDBC driver – Standard database structure
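FileDataModel reads one preference per line in the form `userID,itemID,value`. As an illustration of that mapping from raw CSV to per-user preferences, here is a small plain-Java parser into nested maps. It is a hypothetical helper, not FileDataModel itself: it handles only comma-separated lines, and storing value-less lines as `1.0` is an assumption of this sketch standing in for boolean preferences.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Parses "userID,itemID[,value]" lines into user -> (item -> value).
// Hypothetical helper for illustration; Mahout's FileDataModel does this for real.
final class CsvPreferences {

    private CsvPreferences() { }

    static Map<Long, Map<Long, Double>> parse(List<String> lines) {
        Map<Long, Map<Long, Double>> prefs = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = line.split(",");
            if (fields.length < 2) {
                throw new IllegalArgumentException("Malformed line: " + raw);
            }
            long user = Long.parseLong(fields[0].trim());
            long item = Long.parseLong(fields[1].trim());
            // Assumption of this sketch: a missing value marks a boolean preference, stored as 1.0.
            double value = fields.length >= 3 ? Double.parseDouble(fields[2].trim()) : 1.0;
            prefs.computeIfAbsent(user, u -> new HashMap<>()).put(item, value);
        }
        return prefs;
    }
}
```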
  • 50–51. Components: DataModel • Regardless of the source, all DataModels share a common representation. • Basic object: Preference – A Preference is a triple (user, item, score) – A user's preferences are stored in a PreferenceArray • Two implementations – GenericUserPreferenceArray • Also stores the numerical preference values. – BooleanUserPreferenceArray • Skips numerical preference values.
  • 52. Components: UserSimilarity • UserSimilarity defines a notion of similarity between two users. – Similarly, ItemSimilarity defines a notion of similarity between two items. • Which definitions of similarity are available? – Pearson Correlation – Spearman Correlation – Euclidean Distance – Tanimoto Coefficient – LogLikelihood Similarity – Already implemented!
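To show what three of these similarity definitions actually compute, here is a plain-Java sketch over two users' rating maps (itemID to rating). It follows textbook definitions: Pearson over co-rated items, Euclidean distance mapped into (0, 1] as 1 / (1 + d), and Tanimoto as the overlap of rated-item sets. Mahout's own classes may normalize or weight these differently, and `Similarities` is a hypothetical name.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Textbook similarity sketches (hypothetical helper; not Mahout's exact formulas).
final class Similarities {

    private Similarities() { }

    // Pearson correlation over the items both users rated; NaN when undefined.
    static double pearson(Map<Long, Double> a, Map<Long, Double> b) {
        double sumA = 0, sumB = 0, sumAA = 0, sumBB = 0, sumAB = 0;
        int n = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double y = b.get(e.getKey());
            if (y == null) {
                continue; // only co-rated items count
            }
            double x = e.getValue();
            sumA += x; sumB += y; sumAA += x * x; sumBB += y * y; sumAB += x * y;
            n++;
        }
        if (n == 0) {
            return Double.NaN;
        }
        double cov = sumAB - sumA * sumB / n;
        double varA = sumAA - sumA * sumA / n;
        double varB = sumBB - sumB * sumB / n;
        if (varA == 0 || varB == 0) {
            return Double.NaN; // undefined when one user's ratings are constant
        }
        return cov / Math.sqrt(varA * varB);
    }

    // Euclidean distance over co-rated items, turned into a similarity: 1 / (1 + d).
    static double euclidean(Map<Long, Double> a, Map<Long, Double> b) {
        double sumSq = 0;
        int n = 0;
        for (Map.Entry<Long, Double> e : a.entrySet()) {
            Double y = b.get(e.getKey());
            if (y == null) {
                continue;
            }
            double d = e.getValue() - y;
            sumSq += d * d;
            n++;
        }
        return n == 0 ? Double.NaN : 1.0 / (1.0 + Math.sqrt(sumSq));
    }

    // Tanimoto coefficient: |common items| / |all items|, ignoring rating values.
    static double tanimoto(Map<Long, Double> a, Map<Long, Double> b) {
        Set<Long> union = new HashSet<>(a.keySet());
        union.addAll(b.keySet());
        if (union.isEmpty()) {
            return Double.NaN;
        }
        int common = 0;
        for (Long item : a.keySet()) {
            if (b.containsKey(item)) {
                common++;
            }
        }
        return (double) common / union.size();
    }
}
```

Pearson rewards users whose ratings move together even if one rates systematically higher, while Tanimoto only looks at which items were rated, which suits boolean data.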
  • 58. Components: UserNeighborhood • Which definitions of neighborhood are available? – Nearest N users • The N users with the highest similarity are labeled as ‘neighbors’ – Threshold • Users whose similarity is above a threshold are labeled as ‘neighbors’ – Already implemented!
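Both neighborhood definitions can be sketched over a precomputed map from each candidate user's ID to their similarity with the target user. The class and method names are hypothetical; this sketch skips NaN similarities in the nearest-N variant and includes ties (`>=`) in the threshold variant, choices Mahout's own neighborhood classes may make differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Neighborhood sketches over userID -> similarity-to-target (hypothetical helper).
final class Neighborhoods {

    private Neighborhoods() { }

    // Nearest-N: the (at most) N users with the highest similarity, best first.
    static List<Long> nearestN(Map<Long, Double> similarities, int n) {
        List<Long> users = new ArrayList<>(similarities.keySet());
        users.removeIf(u -> Double.isNaN(similarities.get(u))); // undefined similarities are ignored
        users.sort(Comparator.comparingDouble((Long u) -> similarities.get(u)).reversed());
        return new ArrayList<>(users.subList(0, Math.min(n, users.size())));
    }

    // Threshold: every user whose similarity is at or above minSimilarity.
    static List<Long> threshold(Map<Long, Double> similarities, double minSimilarity) {
        List<Long> users = new ArrayList<>();
        for (Map.Entry<Long, Double> e : similarities.entrySet()) {
            if (e.getValue() >= minSimilarity) {
                users.add(e.getKey());
            }
        }
        return users;
    }
}
```

Nearest-N guarantees a fixed neighborhood size even when all users are weakly similar; a threshold keeps only genuinely similar users but may return an empty neighborhood.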
  • 59. Components: Recommender • Given a DataModel, a definition of similarity between users (items) and a definition of neighborhood, a recommender estimates the relevance of each item the user has not yet seen • Which recommendation algorithms are implemented? – User-based CF – Item-based CF – SVD-based CF – SlopeOne CF – (and much more…)
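A user-based recommender ties these components together: given the target user's neighbors and their similarities, each unseen item is estimated as a similarity-weighted average of the neighbors' ratings, and items are ranked by that estimate. The sketch below assumes similarities and the neighborhood were already computed; skipping non-positive similarities is a design choice of this sketch, and `UserBasedCF` is a hypothetical name, not Mahout's `GenericUserBasedRecommender`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// User-based CF sketch (hypothetical helper). ratings: user -> (item -> rating).
final class UserBasedCF {

    private UserBasedCF() { }

    // Similarity-weighted average of the neighbors' ratings for `item`; NaN if no neighbor rated it.
    static double estimate(Map<Long, Map<Long, Double>> ratings,
                           Map<Long, Double> similarityToTarget,
                           List<Long> neighbors, long item) {
        double weighted = 0.0;
        double totalSim = 0.0;
        for (long n : neighbors) {
            Double r = ratings.get(n).get(item);
            double s = similarityToTarget.get(n);
            if (r == null || !(s > 0)) {
                continue; // neighbor did not rate the item, or similarity is not positive
            }
            weighted += s * r;
            totalSim += s;
        }
        return totalSim == 0 ? Double.NaN : weighted / totalSim;
    }

    // Ranks the items rated by neighbors but not by the target user, best estimate first.
    static List<Long> recommend(Map<Long, Map<Long, Double>> ratings, long target,
                                Map<Long, Double> similarityToTarget,
                                List<Long> neighbors, int howMany) {
        Set<Long> candidates = new HashSet<>();
        for (long n : neighbors) {
            candidates.addAll(ratings.get(n).keySet());
        }
        candidates.removeAll(ratings.get(target).keySet()); // only unseen items
        Map<Long, Double> scores = new HashMap<>();
        for (long item : candidates) {
            double e = estimate(ratings, similarityToTarget, neighbors, item);
            if (!Double.isNaN(e)) {
                scores.put(item, e);
            }
        }
        List<Long> ranked = new ArrayList<>(scores.keySet());
        ranked.sort(Comparator.comparingDouble((Long i) -> scores.get(i)).reversed());
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }
}
```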
  • 61. Recap • Many implementations of a CF-based recommender! – 6 different recommendation algorithms – 2 different neighborhood definitions – 5 different similarity definitions • Evaluating all the different implementations is very time-consuming – Mahout's strength is that it saves time when evaluating the different combinations of parameters! – Standard interface for the evaluation of a recommender system
  • 62. Evaluation • Mahout provides classes for the evaluation of a recommender system – Prediction-based measures • Mean Absolute Error (MAE) • RMSE (Root Mean Square Error) – IR-based measures • Precision, Recall, F1-measure, F1@n • NDCG (ranking measure)
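The two prediction-based measures compare held-out ratings with the recommender's estimates: MAE averages the absolute errors, while RMSE averages the squared errors before taking the square root, so it penalizes large mistakes more. A minimal sketch over paired arrays (the class name is hypothetical):

```java
// Prediction-accuracy metrics over paired actual / estimated ratings (hypothetical helper).
final class ErrorMetrics {

    private ErrorMetrics() { }

    // Mean Absolute Error: average of |actual - estimated|.
    static double mae(double[] actual, double[] estimated) {
        check(actual, estimated);
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(actual[i] - estimated[i]);
        }
        return sum / actual.length;
    }

    // Root Mean Square Error: square root of the average squared error.
    static double rmse(double[] actual, double[] estimated) {
        check(actual, estimated);
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double d = actual[i] - estimated[i];
            sum += d * d;
        }
        return Math.sqrt(sum / actual.length);
    }

    private static void check(double[] actual, double[] estimated) {
        if (actual.length == 0 || actual.length != estimated.length) {
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        }
    }
}
```

For actual ratings {4, 3, 5} and estimates {3.5, 3, 4}, MAE is (0.5 + 0 + 1) / 3 = 0.5, while RMSE is sqrt(1.25 / 3) ≈ 0.65.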
  • 63. Evaluation • Prediction-based measures – Class: AverageAbsoluteDifferenceEvaluator – Method: evaluate() – Parameters: • Recommender implementation (supplied through a RecommenderBuilder) • DataModel implementation • Training set size (e.g. 70%) • Percentage of the data to use in the evaluation (a smaller percentage for fast prototyping)
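The two percentage parameters describe a holdout scheme. One plausible reading, sketched below, is: sample each user for evaluation with probability `evaluationPercentage`, then send each of that user's preferences to the training set with probability `trainingPercentage` and hold out the rest for testing. This is a hypothetical illustration of the idea, not Mahout's evaluator code; the fixed `seed` parameter is an addition for reproducibility.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Holdout split sketch (hypothetical helper). ratings: user -> (item -> rating).
final class HoldoutSplit {

    final Map<Long, Map<Long, Double>> training = new HashMap<>();
    final Map<Long, Map<Long, Double>> test = new HashMap<>();

    HoldoutSplit(Map<Long, Map<Long, Double>> ratings, double trainingPercentage,
                 double evaluationPercentage, long seed) {
        Random random = new Random(seed);
        for (Map.Entry<Long, Map<Long, Double>> user : ratings.entrySet()) {
            if (random.nextDouble() >= evaluationPercentage) {
                continue; // user not sampled for this evaluation run
            }
            Map<Long, Double> train = new HashMap<>();
            Map<Long, Double> heldOut = new HashMap<>();
            for (Map.Entry<Long, Double> pref : user.getValue().entrySet()) {
                // Each preference goes to exactly one side of the split.
                if (random.nextDouble() < trainingPercentage) {
                    train.put(pref.getKey(), pref.getValue());
                } else {
                    heldOut.put(pref.getKey(), pref.getValue());
                }
            }
            if (!train.isEmpty()) {
                training.put(user.getKey(), train);
            }
            if (!heldOut.isEmpty()) {
                test.put(user.getKey(), heldOut);
            }
        }
    }
}
```

The recommender is then built on `training` only, and its estimates for the `test` preferences feed a metric such as MAE.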
  • 64. Evaluation • IR-based measures – Class: GenericRecommenderIRStatsEvaluator – Method: evaluate() – Parameters: • Recommender implementation (supplied through a RecommenderBuilder) • DataModel implementation • Size n of the recommendation list • Relevance threshold (e.g. mean + standard deviation of the user's ratings) • Percentage of the data to use in the evaluation (a smaller percentage for fast prototyping)
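The IR-based measures treat an item as relevant when its held-out rating reaches a threshold, here the mean plus the standard deviation of the user's ratings, and then check how many relevant items appear in the top n recommendations. A plain-Java sketch; the class name is hypothetical, and using the population standard deviation and dividing precision by the number of items actually returned are choices of this sketch.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// IR-style evaluation sketch (hypothetical helper). userRatings: item -> rating.
final class IrStats {

    private IrStats() { }

    // Relevance threshold = mean + (population) standard deviation of the user's ratings.
    static double relevanceThreshold(Map<Long, Double> userRatings) {
        double mean = 0.0;
        for (double v : userRatings.values()) {
            mean += v;
        }
        mean /= userRatings.size();
        double variance = 0.0;
        for (double v : userRatings.values()) {
            variance += (v - mean) * (v - mean);
        }
        variance /= userRatings.size();
        return mean + Math.sqrt(variance);
    }

    // Items whose rating reaches the threshold are considered relevant.
    static Set<Long> relevantItems(Map<Long, Double> userRatings, double threshold) {
        Set<Long> relevant = new HashSet<>();
        for (Map.Entry<Long, Double> e : userRatings.entrySet()) {
            if (e.getValue() >= threshold) {
                relevant.add(e.getKey());
            }
        }
        return relevant;
    }

    // Number of relevant items among the first n recommendations.
    private static int hitsAt(List<Long> ranked, Set<Long> relevant, int n) {
        int count = 0;
        for (int i = 0; i < Math.min(n, ranked.size()); i++) {
            if (relevant.contains(ranked.get(i))) {
                count++;
            }
        }
        return count;
    }

    // Precision@n: fraction of the returned items (at most n) that are relevant.
    static double precisionAt(List<Long> ranked, Set<Long> relevant, int n) {
        int returned = Math.min(n, ranked.size());
        return returned == 0 ? 0.0 : (double) hitsAt(ranked, relevant, n) / returned;
    }

    // Recall@n: fraction of all relevant items found in the top n.
    static double recallAt(List<Long> ranked, Set<Long> relevant, int n) {
        return relevant.isEmpty() ? 0.0 : (double) hitsAt(ranked, relevant, n) / relevant.size();
    }

    // F1: harmonic mean of precision and recall.
    static double f1(double precision, double recall) {
        return precision + recall == 0 ? 0.0 : 2 * precision * recall / (precision + recall);
    }
}
```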
  • 65. Part 2 How to use Mahout? Hands-on
  • 66. Download Mahout • Download – The latest Mahout release (at the time of writing) is 0.8 – Available at: http://apache.fastbull.org/mahout/0.8/mahout-distribution-0.8.zip – Extract all the libraries and include them in a new NetBeans (or Eclipse) project • Requirement: Java 1.6.x or greater. • Hadoop is not mandatory!
  • 68–69. Exercise 1 • Create a Preference object • Set preferences through some simple Java calls • Print some statistics about the preferences (how many preferences, on which items the user has expressed ratings, etc.) • Hints about objects to be used: – Preference – GenericUserPreferenceArray
  • 70–71. Exercise 1: preferences
    import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
    import org.apache.mahout.cf.taste.model.Preference;
    import org.apache.mahout.cf.taste.model.PreferenceArray;

    class CreatePreferenceArray {

      private CreatePreferenceArray() { }

      public static void main(String[] args) {
        // An array holding two preferences of the same user
        PreferenceArray user1Prefs = new GenericUserPreferenceArray(2);
        user1Prefs.setUserID(0, 1L);        // in a user array, this ID applies to every entry
        user1Prefs.setItemID(0, 101L);
        user1Prefs.setValue(0, 2.0f);       // score 2 for item 101
        user1Prefs.setItemID(1, 102L);
        user1Prefs.setValue(1, 3.0f);       // score 3 for item 102
        Preference pref = user1Prefs.get(1); // the second preference (item 102)
        System.out.println(pref);
        // Some statistics: number of preferences and the rated items
        System.out.println("Preferences: " + user1Prefs.length());
        for (int i = 0; i < user1Prefs.length(); i++) {
          System.out.println("Item " + user1Prefs.getItemID(i) + ": " + user1Prefs.getValue(i));
        }
      }
    }
  • 72–73. Exercise 2 • Create a DataModel • Feed the DataModel through some simple Java calls • Print some statistics about the data (how many users, how many items, maximum rating, etc.) • Hints about objects to be used: – FastByIDMap – GenericDataModel
  • 74. Exercise 2: data model • A PreferenceArray stores the preferences of a single user • Where are the preferences of all users stored? – In a HashMap? No. – Mahout introduces data structures optimized for recommendation tasks • HashMaps are replaced by FastByIDMap
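The motivation for FastByIDMap is that a `HashMap<Long, V>` boxes every key into a `Long` and allocates an entry object per mapping. The sketch below illustrates the alternative idea: keep keys in a primitive `long[]` and resolve collisions by open addressing. It is a simplified, hypothetical class (linear probing, no removal), not Mahout's actual FastByIDMap implementation.

```java
// Simplified long-keyed map with open addressing (linear probing).
// Hypothetical illustration only: not Mahout's FastByIDMap; no remove() support.
final class LongKeyMap<V> {

    private long[] keys = new long[16];
    private Object[] values = new Object[16];
    private boolean[] used = new boolean[16];
    private int size;

    // Finds the slot holding `key`, or the empty slot where it would be inserted.
    private static int slot(long key, long[] ks, boolean[] us) {
        int mask = ks.length - 1;                 // table length is always a power of two
        long mixed = key * 0x9E3779B97F4A7C15L;   // spread consecutive IDs across the table
        int i = (int) (mixed >>> 32) & mask;
        while (us[i] && ks[i] != key) {
            i = (i + 1) & mask;                   // linear probing
        }
        return i;
    }

    @SuppressWarnings("unchecked")
    V get(long key) {
        int i = slot(key, keys, used);
        return used[i] ? (V) values[i] : null;
    }

    // Inserts or replaces the value for `key`; returns the previous value or null.
    @SuppressWarnings("unchecked")
    V put(long key, V value) {
        if (2 * (size + 1) > keys.length) {
            grow();                               // keep the load factor at or below 0.5
        }
        int i = slot(key, keys, used);
        V previous = used[i] ? (V) values[i] : null;
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        values[i] = value;
        return previous;
    }

    int size() {
        return size;
    }

    // Doubles the table and re-inserts every occupied slot.
    private void grow() {
        long[] oldKeys = keys;
        Object[] oldValues = values;
        boolean[] oldUsed = used;
        keys = new long[oldKeys.length * 2];
        values = new Object[oldKeys.length * 2];
        used = new boolean[oldKeys.length * 2];
        for (int j = 0; j < oldKeys.length; j++) {
            if (oldUsed[j]) {
                int i = slot(oldKeys[j], keys, used);
                used[i] = true;
                keys[i] = oldKeys[j];
                values[i] = oldValues[j];
            }
        }
    }
}
```

Keeping the table at most half full guarantees that probing always reaches an empty slot, so lookups for absent keys terminate.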
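  • 75. Exercise 2: data model
    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
    import org.apache.mahout.cf.taste.impl.model.GenericDataModel;
    import org.apache.mahout.cf.taste.impl.model.GenericUserPreferenceArray;
    import org.apache.mahout.cf.taste.model.DataModel;
    import org.apache.mahout.cf.taste.model.PreferenceArray;

    class CreateGenericDataModel {

      private CreateGenericDataModel() { }

      public static void main(String[] args) throws TasteException {
        // Map from user ID to that user's preferences
        FastByIDMap<PreferenceArray> preferences = new FastByIDMap<PreferenceArray>();
        // The array size must match the number of preferences actually set;
        // unset slots would otherwise be read as extra preferences
        PreferenceArray prefsForUser1 = new GenericUserPreferenceArray(2);
        prefsForUser1.setUserID(0, 1L);
        prefsForUser1.setItemID(0, 101L);
        prefsForUser1.setValue(0, 3.0f);
        prefsForUser1.setItemID(1, 102L);
        prefsForUser1.setValue(1, 4.5f);
        preferences.put(1L, prefsForUser1);
        DataModel model = new GenericDataModel(preferences);
        System.out.println(model);
        // Some statistics about the data
        System.out.println("Users: " + model.getNumUsers());
        System.out.println("Items: " + model.getNumItems());
        System.out.println("Max preference: " + model.getMaxPreference());
      }
    }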
  • 76–77. Exercise 3 • Create a DataModel • Feed the DataModel through a CSV file • Compute similarities between users – The CSV file should contain enough data! • Hints about objects to be used: – FileDataModel – PearsonCorrelationSimilarity, TanimotoCoefficientSimilarity, etc.
  • 78. Exercise 3: similarity import org.apache.mahout.cf.taste.impl.similarity.*; import org.apache.mahout.cf.taste.impl.model.*; import org.apache.mahout.cf.taste.impl.model.file.FileDatModel; class Example3_Similarity { public static void main(String[] args) throws Exception { // Istanzia il DataModel e crea alcune statistiche DataModel model = new FileDataModel(new File("intro.csv")); UserSimilarity pearson = new PearsonCorrelationSimilarity(model); UserSimilarity euclidean = new EuclideanDistanceSimilarity(model); System.out.println("Pearson:"+pearson.userSimilarity(1, 2)); System.out.println("Euclidean:"+euclidean.userSimilarity(1, 2)); System.out.println("Pearson:"+pearson.userSimilarity(1, 3)); System.out.println("Euclidean:"+euclidean.userSimilarity(1, 3)); } }
  • 79. Exercise 3: similarity (same code as slide 78, highlighting FileDataModel)
  • 80. Exercise 3: similarity (same code as slide 78, highlighting the similarity definitions)
  • 81. Exercise 3: similarity (same code as slide 78, highlighting the output)
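What the two similarity objects measure can be illustrated without Mahout. The plain-Java sketch below (Java 9+ for `Map.of`; class and method names are hypothetical) stores each user's ratings as an item-ID-to-rating map and compares two users only on the items both have rated: the Pearson correlation of the co-rated ratings, and a Euclidean distance turned into a similarity with 1 / (1 + d). It approximates the idea rather than reproducing Mahout's code; in particular, `EuclideanDistanceSimilarity` may scale the distance differently depending on the Mahout version, so its numbers can differ.

```java
import java.util.Map;

final class SimilaritySketch {

    // Pearson correlation over the items rated by both users.
    // Returns NaN with fewer than two co-rated items or when either side has zero variance.
    static double pearson(Map<Long, Double> u, Map<Long, Double> v) {
        int n = 0;
        double sumX = 0, sumY = 0;
        for (Map.Entry<Long, Double> e : u.entrySet()) {
            Double y = v.get(e.getKey());
            if (y != null) {
                n++;
                sumX += e.getValue();
                sumY += y;
            }
        }
        if (n < 2) {
            return Double.NaN;
        }
        double meanX = sumX / n, meanY = sumY / n;
        double sxy = 0, sxx = 0, syy = 0;
        for (Map.Entry<Long, Double> e : u.entrySet()) {
            Double y = v.get(e.getKey());
            if (y != null) {
                double dx = e.getValue() - meanX;
                double dy = y - meanY;
                sxy += dx * dy;
                sxx += dx * dx;
                syy += dy * dy;
            }
        }
        if (sxx == 0 || syy == 0) {
            return Double.NaN;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    // Euclidean distance over co-rated items, mapped into (0, 1] via 1 / (1 + d).
    static double euclidean(Map<Long, Double> u, Map<Long, Double> v) {
        int n = 0;
        double sumSq = 0;
        for (Map.Entry<Long, Double> e : u.entrySet()) {
            Double y = v.get(e.getKey());
            if (y != null) {
                n++;
                double d = e.getValue() - y;
                sumSq += d * d;
            }
        }
        return n == 0 ? Double.NaN : 1.0 / (1.0 + Math.sqrt(sumSq));
    }

    public static void main(String[] args) {
        // Hypothetical ratings (item ID -> rating) for two users.
        Map<Long, Double> user1 = Map.of(101L, 5.0, 102L, 3.0, 103L, 2.5);
        Map<Long, Double> user2 = Map.of(101L, 2.0, 102L, 2.5, 103L, 5.0, 104L, 2.0);
        System.out.println("Pearson:   " + pearson(user1, user2));   // ≈ -0.764
        System.out.println("Euclidean: " + euclidean(user1, user2)); // ≈ 0.203
    }
}
```

Item 104 is ignored because only one of the two users rated it, which is why the exercise asks for a CSV file with enough overlapping data.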
  • 84. Exercise 4 • Create a DataModel • Feed the DataModel through a CSV file • Calculate similarities between users – The CSV file should contain enough data! • Generate the neighborhood • Generate recommendations – Compare different combinations of parameters! • Hints about objects to be used: – NearestNUserNeighborhood – GenericUserBasedRecommender • Parameters: data model, neighborhood, similarity measure
  • 85. Exercise 4: First Recommender
    import java.io.File;
    import java.util.List;
    import org.apache.mahout.cf.taste.impl.model.file.*;
    import org.apache.mahout.cf.taste.impl.neighborhood.*;
    import org.apache.mahout.cf.taste.impl.recommender.*;
    import org.apache.mahout.cf.taste.impl.similarity.*;
    import org.apache.mahout.cf.taste.model.*;
    import org.apache.mahout.cf.taste.neighborhood.*;
    import org.apache.mahout.cf.taste.recommender.*;
    import org.apache.mahout.cf.taste.similarity.*;
    class RecommenderIntro {
        private RecommenderIntro() { }
        public static void main(String[] args) throws Exception {
            DataModel model = new FileDataModel(new File("intro.csv"));
            UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
            // Neighborhood of the 2 most similar users
            UserNeighborhood neighborhood = new NearestNUserNeighborhood(2, similarity, model);
            Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
            // Top-1 recommendation for user 1
            List<RecommendedItem> recommendations = recommender.recommend(1, 1);
            for (RecommendedItem recommendation : recommendations) {
                System.out.println(recommendation);
            }
        }
    }
  • 86. Exercise 4: First Recommender (same code as slide 85, highlighting FileDataModel)
  • 87. Exercise 4: First Recommender (same code as slide 85, highlighting NearestNUserNeighborhood(2, ...): 2 neighbours)
  • 88. Exercise 4: First Recommender (same code as slide 85, highlighting recommend(1, 1): top-1 recommendation for user 1)
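The idea behind `NearestNUserNeighborhood` plus `GenericUserBasedRecommender` can be sketched in plain Java: keep the N users most similar to the target, then score every item the target has not rated by the similarity-weighted average of the neighbours' ratings. Everything here is a simplified, hypothetical illustration (the `Similarity` interface stands in for a `UserSimilarity`), and Mahout's estimator may differ in details such as how it treats negative similarities; this sketch simply ignores neighbours with non-positive similarity.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class UserBasedSketch {

    // Hypothetical similarity hook, e.g. a Pearson correlation over co-rated items.
    interface Similarity {
        double between(Map<Long, Double> u, Map<Long, Double> v);
    }

    // IDs of the n users most similar to `target`; NaN similarities are skipped.
    static List<Long> nearestN(long target, Map<Long, Map<Long, Double>> ratings,
                               int n, Similarity sim) {
        Map<Long, Double> scores = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> e : ratings.entrySet()) {
            if (e.getKey() == target) {
                continue;
            }
            double s = sim.between(ratings.get(target), e.getValue());
            if (!Double.isNaN(s)) {
                scores.put(e.getKey(), s);
            }
        }
        List<Long> ids = new ArrayList<>(scores.keySet());
        ids.sort(Comparator.comparingDouble((Long id) -> scores.get(id)).reversed());
        return ids.subList(0, Math.min(n, ids.size()));
    }

    // Top-k items the target has not rated, scored by a similarity-weighted
    // average of the neighbours' ratings.
    static List<Map.Entry<Long, Double>> recommend(long target, Map<Long, Map<Long, Double>> ratings,
                                                  int n, int k, Similarity sim) {
        Map<Long, Double> mine = ratings.get(target);
        Map<Long, Double> weightedSum = new HashMap<>();
        Map<Long, Double> similaritySum = new HashMap<>();
        for (long neighbour : nearestN(target, ratings, n, sim)) {
            double s = sim.between(mine, ratings.get(neighbour));
            if (s <= 0) {
                continue; // this sketch ignores non-positive similarities
            }
            for (Map.Entry<Long, Double> r : ratings.get(neighbour).entrySet()) {
                if (mine.containsKey(r.getKey())) {
                    continue; // only recommend items the target has not rated yet
                }
                weightedSum.merge(r.getKey(), s * r.getValue(), Double::sum);
                similaritySum.merge(r.getKey(), s, Double::sum);
            }
        }
        List<Map.Entry<Long, Double>> estimates = new ArrayList<>();
        for (Map.Entry<Long, Double> w : weightedSum.entrySet()) {
            estimates.add(Map.entry(w.getKey(), w.getValue() / similaritySum.get(w.getKey())));
        }
        estimates.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        return estimates.subList(0, Math.min(k, estimates.size()));
    }
}
```

Raising `n` lets more (and less similar) users contribute, which is exactly the neighborhood-size parameter the exercise asks you to vary.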
  • 89. Exercise 5: MovieLens Recommender • Download the MovieLens 100k dataset from GroupLens – Its format is already Mahout-compliant – http://files.grouplens.org/datasets/movielens/ml-100k.zip • Preparatory exercise: repeat Exercise 3 (similarity calculations) with a bigger dataset • Next: run the recommendation framework against a state-of-the-art dataset (the code reads ua.base, one of the training files in the archive)
  • 90. Exercise 5: MovieLens Recommender
    import java.io.File;
    import java.util.List;
    import org.apache.mahout.cf.taste.impl.model.file.*;
    import org.apache.mahout.cf.taste.impl.neighborhood.*;
    import org.apache.mahout.cf.taste.impl.recommender.*;
    import org.apache.mahout.cf.taste.impl.similarity.*;
    import org.apache.mahout.cf.taste.model.*;
    import org.apache.mahout.cf.taste.neighborhood.*;
    import org.apache.mahout.cf.taste.recommender.*;
    import org.apache.mahout.cf.taste.similarity.*;
    class RecommenderIntro {
        private RecommenderIntro() { }
        public static void main(String[] args) throws Exception {
            DataModel model = new FileDataModel(new File("ua.base"));
            UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
            // Neighborhood of the 100 most similar users
            UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
            Recommender recommender = new GenericUserBasedRecommender(model, neighborhood, similarity);
            // Top-20 recommendations for user 1
            List<RecommendedItem> recommendations = recommender.recommend(1, 20);
            for (RecommendedItem recommendation : recommendations) {
                System.out.println(recommendation);
            }
        }
    }
  • 91. Exercise 5: MovieLens Recommender (same code as slide 90, except recommender.recommend(10, 50): the top 50 recommendations for user 10) • We can play with parameters!
  • 92. Exercise 5: MovieLens Recommender • Analyze the recommender's behavior with different combinations of parameters – Do the recommendations change with a different similarity measure? – Do they change with different neighborhood sizes? – Which configuration works best? • … Let's go to the next exercise!
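To answer "do the recommendations change?" with a number rather than by eye, one option is to compare the item IDs of two top-N lists with the Jaccard coefficient (shared items divided by all distinct items). This is not part of Mahout; the helper is a hypothetical plain-Java sketch, and the item IDs would come from each `RecommendedItem` via `getItemID()`. It ignores ranking order, so two lists with the same items in a different order score 1.0.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class ListOverlap {

    // Jaccard coefficient |A ∩ B| / |A ∪ B| between two recommendation lists (order ignored).
    static double jaccard(List<Long> a, List<Long> b) {
        Set<Long> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 1.0; // two empty lists are treated as identical
        }
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        // Hypothetical top-4 lists from two configurations: 2 shared items out of 6 distinct.
        System.out.println(jaccard(List.of(1L, 2L, 3L, 4L), List.of(3L, 4L, 5L, 6L)));
    }
}
```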
  • 94. Exercise 6: Recommender Evaluation • Evaluate different CF recommender configurations on MovieLens data • Metrics: RMSE, MAE • Hints: useful classes – Implementations of the RecommenderEvaluator interface: • AverageAbsoluteDifferenceRecommenderEvaluator (MAE) • RMSRecommenderEvaluator (RMSE)
  • 95. Exercise 6: Recommender Evaluation • Further hints: – Use RandomUtils.useTestSeed() to ensure consistency among different evaluation runs – Invoke the evaluate() method • Parameters – RecommenderBuilder: builds the recommender instance (as in previous exercises) – DataModelBuilder: how the training DataModel is built (null uses the default) – DataModel: the full dataset – Training/test split: double value (e.g. 0.7 for 70% training) – Amount of data to use in the evaluation: double value (e.g. 1.0 for 100% of the users)
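The training split and the fixed seed can be illustrated with a small plain-Java sketch: each of a user's preferences goes to training with probability 0.7 and is otherwise held out for testing, and a fixed `Random` seed makes every run produce the same split, which is the purpose of `RandomUtils.useTestSeed()`. The names are hypothetical, and this only approximates what Mahout's evaluator does internally; the evaluation percentage (the fraction of users evaluated) is left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

final class SplitSketch {

    // One user's preferences, divided into a training part and a held-out test part.
    static final class Split {
        final Map<Long, Double> training = new HashMap<>();
        final Map<Long, Double> test = new HashMap<>();
    }

    // Each preference goes to training with probability trainingPercentage, otherwise to test.
    // A fixed seed plus a fixed iteration order (TreeMap) make the split reproducible.
    static Split split(Map<Long, Double> prefs, double trainingPercentage, long seed) {
        Random random = new Random(seed);
        Split result = new Split();
        for (Map.Entry<Long, Double> e : new TreeMap<>(prefs).entrySet()) {
            if (random.nextDouble() < trainingPercentage) {
                result.training.put(e.getKey(), e.getValue());
            } else {
                result.test.put(e.getKey(), e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Long, Double> prefs = new HashMap<>();
        for (long item = 1; item <= 10; item++) {
            prefs.put(item, 3.0); // hypothetical ratings
        }
        Split first = split(prefs, 0.7, 42L);
        Split second = split(prefs, 0.7, 42L);
        System.out.println("training items: " + first.training.keySet());
        System.out.println("same split with the same seed: " + first.training.equals(second.training));
    }
}
```

With a different seed (or no seed), each run would evaluate on a different test set, so small score differences between configurations could be noise rather than real improvements.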
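  • 96. Exercise 6: evaluation
    import java.io.File;
    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.eval.*;
    import org.apache.mahout.cf.taste.impl.eval.*;
    import org.apache.mahout.cf.taste.impl.model.file.*;
    import org.apache.mahout.cf.taste.impl.neighborhood.*;
    import org.apache.mahout.cf.taste.impl.recommender.*;
    import org.apache.mahout.cf.taste.impl.similarity.*;
    import org.apache.mahout.cf.taste.model.*;
    import org.apache.mahout.cf.taste.neighborhood.*;
    import org.apache.mahout.cf.taste.recommender.*;
    import org.apache.mahout.cf.taste.similarity.*;
    import org.apache.mahout.common.RandomUtils;
    class EvaluatorIntro {
        private EvaluatorIntro() { }
        public static void main(String[] args) throws Exception {
            // Ensures consistency between different evaluation runs
            RandomUtils.useTestSeed();
            DataModel model = new FileDataModel(new File("ua.base"));
            RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
            // Build the same recommender for testing that we did last time:
            RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
                @Override
                public Recommender buildRecommender(DataModel model) throws TasteException {
                    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                    UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
                    return new GenericUserBasedRecommender(model, neighborhood, similarity);
                }
            };
            double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);
            System.out.println(score);
        }
    }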
  • 98. Exercise 6: evaluation (same code as slide 96, highlighting evaluate(..., 0.7, 1.0): 70% training, evaluation on the whole dataset)
  • 99. Exercise 6: evaluation (same code as slide 96, highlighting the RecommenderBuilder: the recommendation engine)
  • 100. Exercise 6: evaluation (same code as slide 96, plus a second evaluator for RMSE: RecommenderEvaluator rmseEvaluator = new RMSRecommenderEvaluator();) • We can add more measures
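  • 101. Exercise 6: evaluation
    import java.io.File;
    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.eval.*;
    import org.apache.mahout.cf.taste.impl.eval.*;
    import org.apache.mahout.cf.taste.impl.model.file.*;
    import org.apache.mahout.cf.taste.impl.neighborhood.*;
    import org.apache.mahout.cf.taste.impl.recommender.*;
    import org.apache.mahout.cf.taste.impl.similarity.*;
    import org.apache.mahout.cf.taste.model.*;
    import org.apache.mahout.cf.taste.neighborhood.*;
    import org.apache.mahout.cf.taste.recommender.*;
    import org.apache.mahout.cf.taste.similarity.*;
    import org.apache.mahout.common.RandomUtils;
    class EvaluatorIntro {
        private EvaluatorIntro() { }
        public static void main(String[] args) throws Exception {
            RandomUtils.useTestSeed();
            DataModel model = new FileDataModel(new File("ua.base"));
            RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
            RecommenderEvaluator rmseEvaluator = new RMSRecommenderEvaluator();
            // Build the same recommender for testing that we did last time:
            RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
                @Override
                public Recommender buildRecommender(DataModel model) throws TasteException {
                    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                    UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
                    return new GenericUserBasedRecommender(model, neighborhood, similarity);
                }
            };
            double score = evaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0);    // MAE
            double rmse = rmseEvaluator.evaluate(recommenderBuilder, null, model, 0.7, 1.0); // RMSE
            System.out.println(score);
            System.out.println(rmse);
        }
    }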
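Both evaluators compare estimated and actual ratings on the held-out preferences. MAE averages the absolute errors; RMSE is the square root of the mean squared error, so large errors weigh more. For both, lower is better. A plain-Java sketch of the two formulas on a tiny worked example (hypothetical helper, not Mahout's code; it assumes two non-empty arrays of equal length):

```java
final class ErrorMetrics {

    // Mean absolute error: average of |estimate - actual|.
    static double mae(double[] estimated, double[] actual) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            sum += Math.abs(estimated[i] - actual[i]);
        }
        return sum / actual.length;
    }

    // Root mean squared error: square root of the average squared difference.
    static double rmse(double[] estimated, double[] actual) {
        double sum = 0;
        for (int i = 0; i < actual.length; i++) {
            double d = estimated[i] - actual[i];
            sum += d * d;
        }
        return Math.sqrt(sum / actual.length);
    }

    public static void main(String[] args) {
        double[] estimated = {4.0, 3.0, 2.0};
        double[] actual = {5.0, 3.0, 4.0};
        // Errors are 1, 0 and 2: MAE = 3 / 3 = 1.0, RMSE = sqrt(5 / 3) ≈ 1.29
        System.out.println("MAE:  " + mae(estimated, actual));
        System.out.println("RMSE: " + rmse(estimated, actual));
    }
}
```

The single error of 2 pushes RMSE above MAE here; the gap between the two scores grows when a recommender makes a few large mistakes.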
  • 102. Mahout Strengths • Fast prototyping and evaluation – To evaluate a different configuration of the same algorithm, we just need to update a parameter and run again – Example: a different neighborhood size
  • 103. 5 minutes to look for the best configuration 
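  • 105. Exercise 7: Recommender Evaluation • Evaluate different CF recommender configurations on MovieLens data • Metrics: Precision, Recall • Hints: useful classes – GenericRecommenderIRStatsEvaluator – evaluate() method • Parameters similar to Exercise 6, except that the training split is replaced by an optional rescorer (null here), the cutoff "at" (length of the recommendation list) and a relevance threshold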
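  • 106. Exercise 7: IR-based evaluation
    import java.io.File;
    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.eval.*;
    import org.apache.mahout.cf.taste.impl.eval.*;
    import org.apache.mahout.cf.taste.impl.model.file.*;
    import org.apache.mahout.cf.taste.impl.neighborhood.*;
    import org.apache.mahout.cf.taste.impl.recommender.*;
    import org.apache.mahout.cf.taste.impl.similarity.*;
    import org.apache.mahout.cf.taste.model.*;
    import org.apache.mahout.cf.taste.neighborhood.*;
    import org.apache.mahout.cf.taste.recommender.*;
    import org.apache.mahout.cf.taste.similarity.*;
    import org.apache.mahout.common.RandomUtils;
    class IREvaluatorIntro {
        private IREvaluatorIntro() { }
        public static void main(String[] args) throws Exception {
            RandomUtils.useTestSeed();
            DataModel model = new FileDataModel(new File("ua.base"));
            RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
            // Build the same recommender for testing that we did last time:
            RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
                @Override
                public Recommender buildRecommender(DataModel model) throws TasteException {
                    UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                    UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);
                    return new GenericUserBasedRecommender(model, neighborhood, similarity);
                }
            };
            // Precision@5, Recall@5, etc.: no rescorer, top-5 lists, per-user threshold, all users
            IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,
                    GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
            System.out.println(stats.getPrecision());
            System.out.println(stats.getRecall());
            System.out.println(stats.getF1Measure());
        }
    }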
  • 108. Exercise 7: IR-based evaluation (same code as slide 106, with NearestNUserNeighborhood(500, similarity, model)) • Set Neighborhood to 500
  • 109. Exercise 7: IR-based evaluation (same code as slide 108, with EuclideanDistanceSimilarity instead of PearsonCorrelationSimilarity) • Set Euclidean Distance
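Precision@5 and Recall@5 can be checked by hand for a single user: held-out items rated at or above a relevance threshold count as relevant, the first 5 recommendations are compared with that set, and F1 is the harmonic mean of precision and recall. Mahout then averages over the evaluated users. To my understanding, `CHOOSE_THRESHOLD` derives the threshold per user from that user's ratings (mean plus one standard deviation); the hypothetical sketch below simply takes the threshold as a parameter.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class IrMetricsSketch {

    // Returns {precision@k, recall@k, F1} for one user.
    // recommended: ranked item IDs; heldOut: the user's held-out ratings (item -> rating).
    static double[] precisionRecallF1(List<Long> recommended, Map<Long, Double> heldOut,
                                      int k, double relevanceThreshold) {
        Set<Long> relevant = new HashSet<>();
        for (Map.Entry<Long, Double> e : heldOut.entrySet()) {
            if (e.getValue() >= relevanceThreshold) {
                relevant.add(e.getKey());
            }
        }
        List<Long> topK = recommended.subList(0, Math.min(k, recommended.size()));
        int hits = 0;
        for (Long item : topK) {
            if (relevant.contains(item)) {
                hits++;
            }
        }
        double precision = topK.isEmpty() ? 0.0 : (double) hits / topK.size();
        double recall = relevant.isEmpty() ? 0.0 : (double) hits / relevant.size();
        double f1 = precision + recall == 0 ? 0.0 : 2 * precision * recall / (precision + recall);
        return new double[] {precision, recall, f1};
    }
}
```

For example, if items 1, 3 and 7 are relevant and the top 5 are items 1 to 5, two of five recommendations are hits (precision 0.4) and two of three relevant items are found (recall 2/3), giving F1 = 0.5.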
  • 110. Exercise 8: item-based recommender • Mahout provides Java classes for building an item-based recommender system – Amazon-like – Recommendations are based on similarities among items (generally pre-computed offline) – Evaluate it with the MovieLens dataset!
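  • 111. Exercise 8: item-based recommender
    import java.io.File;
    import org.apache.mahout.cf.taste.common.TasteException;
    import org.apache.mahout.cf.taste.eval.*;
    import org.apache.mahout.cf.taste.impl.eval.*;
    import org.apache.mahout.cf.taste.impl.model.file.*;
    import org.apache.mahout.cf.taste.impl.recommender.*;
    import org.apache.mahout.cf.taste.impl.similarity.*;
    import org.apache.mahout.cf.taste.model.*;
    import org.apache.mahout.cf.taste.recommender.*;
    import org.apache.mahout.cf.taste.similarity.*;
    import org.apache.mahout.common.RandomUtils;
    class IREvaluatorIntro {
        private IREvaluatorIntro() { }
        public static void main(String[] args) throws Exception {
            RandomUtils.useTestSeed();
            DataModel model = new FileDataModel(new File("ua.base"));
            RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
            // Build an item-based recommender this time (no neighborhood needed):
            RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
                @Override
                public Recommender buildRecommender(DataModel model) throws TasteException {
                    ItemSimilarity similarity = new PearsonCorrelationSimilarity(model);
                    return new GenericItemBasedRecommender(model, similarity);
                }
            };
            IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,
                    GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
            System.out.println(stats.getPrecision());
            System.out.println(stats.getRecall());
            System.out.println(stats.getF1Measure());
        }
    }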
  • 112. Exercise 8: item-based recommender (same code as slide 111, highlighting ItemSimilarity)
  • 113. Exercise 8: item-based recommender (same code as slide 111) • No neighborhood definition for item-based recommenders
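How an item-based recommender can turn precomputed item-item similarities into a prediction is sketched below in plain Java: the estimated rating for an unseen item is the similarity-weighted average of the user's own ratings on items similar to it, so the user's rated items play the role a neighborhood plays in the user-based case. This is a simplified, hypothetical illustration (the nested-map similarity table stands in for an `ItemSimilarity`), not `GenericItemBasedRecommender`'s exact computation.

```java
import java.util.Map;

final class ItemBasedSketch {

    // Estimated rating for `item`: similarity-weighted average of the ratings the user
    // gave to other items. itemSim.get(a).get(b) is a precomputed similarity of items a and b.
    static double estimate(long item, Map<Long, Double> userRatings,
                           Map<Long, Map<Long, Double>> itemSim) {
        Map<Long, Double> simsToItem = itemSim.getOrDefault(item, Map.of());
        double weighted = 0, totalSim = 0;
        for (Map.Entry<Long, Double> r : userRatings.entrySet()) {
            Double s = simsToItem.get(r.getKey());
            if (s == null || s <= 0) {
                continue; // this sketch ignores unknown or non-positive similarities
            }
            weighted += s * r.getValue();
            totalSim += s;
        }
        return totalSim == 0 ? Double.NaN : weighted / totalSim; // NaN: no basis for an estimate
    }
}
```

Because the similarity table depends only on items, it can be computed offline once and reused for every user, which is what makes the approach attractive for large catalogs.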
  • 114. Exercise 9: Recommender Evaluation • Write a class that automatically runs the evaluation with different parameters – e.g. neighborhood sizes taken from an array of values – Print the best scores and the corresponding configuration
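A possible skeleton for Exercise 9: loop over an array of neighborhood sizes, score each configuration, and keep the best one. The scoring function is a hypothetical stand-in; in the exercise it would build a `RecommenderBuilder` with `NearestNUserNeighborhood(size, similarity, model)` and return the result of `evaluator.evaluate(...)`. Since MAE and RMSE are errors, the sketch keeps the lowest score; for precision or recall you would keep the highest.

```java
import java.util.function.IntToDoubleFunction;

final class ParameterSweep {

    // Evaluates every neighborhood size and returns the one with the lowest error.
    // NaN scores never win because NaN < x is always false.
    static int bestSize(int[] sizes, IntToDoubleFunction evaluate) {
        int best = sizes[0];
        double bestScore = Double.POSITIVE_INFINITY;
        for (int size : sizes) {
            double score = evaluate.applyAsDouble(size);
            System.out.println("size=" + size + " score=" + score);
            if (score < bestScore) {
                bestScore = score;
                best = size;
            }
        }
        System.out.println("best size=" + best + " score=" + bestScore);
        return best;
    }
}
```

Because Mahout's `evaluate` declares the checked `TasteException`, a lambda passed as `IntToDoubleFunction` would have to catch it and rethrow it as an unchecked exception. Calling `RandomUtils.useTestSeed()` before the sweep keeps the comparison between sizes reproducible.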
  • 115. Exercise 10: more datasets! • Find the best configuration for several datasets – Download datasets from http://mahout.apache.org/users/basics/collections.html – Write classes to transform input data into a Mahout-compliant form – Extend Exercise 9!
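`FileDataModel` reads one preference per line as userID,itemID,value (comma- or tab-separated, optionally followed by a timestamp), with numeric IDs. Many datasets use a different separator or string IDs, so a small converter is often enough. The sketch below assumes a hypothetical input where each line is `user<sep>item<sep>rating`; it maps string IDs to consecutive numbers and skips headers and malformed lines. Real datasets from the collections page will need their own parsing rules.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class CsvConverterSketch {

    // Consecutive numeric IDs for arbitrary string IDs (FileDataModel expects numeric IDs).
    private final Map<String, Long> userIds = new HashMap<>();
    private final Map<String, Long> itemIds = new HashMap<>();

    private static long idFor(Map<String, Long> ids, String key) {
        return ids.computeIfAbsent(key, k -> (long) ids.size() + 1);
    }

    // Converts "user<sep>item<sep>rating" lines into "userID,itemID,rating" lines.
    List<String> convert(List<String> lines, String separator) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            String[] f = line.trim().split(Pattern.quote(separator));
            if (f.length < 3) {
                continue; // empty or incomplete line
            }
            try {
                double rating = Double.parseDouble(f[2].trim());
                out.add(idFor(userIds, f[0].trim()) + "," + idFor(itemIds, f[1].trim()) + "," + rating);
            } catch (NumberFormatException ex) {
                // header row or malformed rating: skip the line
            }
        }
        return out;
    }
}
```

Keeping the two ID maps (for example by writing them to a side file) lets you translate Mahout's numeric recommendations back into the dataset's original user and item names.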
  • 116. End. Do you want more?
  • 117. Do you want more? • Recommendation – Deployment of a Mahout-based Web Recommender – Integration with Hadoop – Integration of content-based information – Custom similarities, custom recommenders, rescoring functions • Classification, Clustering and Pattern Mining