Intro to Mahout

Agenda and such…

 What is ML (Machine Learning)
 ML Common Use Cases
 Mahout Overview
 Algorithms in Mahout
 Mahout Commercial Use
 Mahout Summary

What is ML

“Machine Learning is programming
computers to optimize a performance
criterion using example data or past
experience”

 Introduction to Machine Learning by E. Alpaydin

ML Common Use Cases

 Recommendation

ML Common Use Cases

 Classification

ML Common Use Cases

 Clustering

Mahout Overview – What ?

A mahout is a person who keeps and drives
an elephant


 A scalable machine learning library


 Began life in 2008 as a subproject of
Apache’s Lucene project
 In 2010 Mahout became a top-level
Apache project in its own right
 Implemented in Java
 Built upon Apache’s Hadoop (Look! An
elephant!)

Mahout Overview – Why ?

 Many open source ML libraries either:
 Lack community
 Lack documentation and examples
 Lack scalability
 Lack the Apache license
 Are research oriented
 Are not well tested
 Are not built on existing production-quality
libraries


 Scalability
 Scalable to reasonably large datasets (core
algorithms implemented in Map/Reduce,
runnable on Hadoop)
 Scalable to support your business case
(Apache License)
 Scalable community


 Built over existing production quality
libraries

Mahout Overview – Use Cases

 Mahout currently supports four main
use cases:
1. Recommendation
2. Clustering
3. Classification
4. Frequent Itemset Mining

Mahout Overview - Technical

 System Requirements
 Linux (or Cygwin on Windows)
 Java 1.6.x or greater
 Maven 2.0.11 or greater to build the source
code
 Hadoop 0.20 or greater*

* Not all algorithms are implemented to work on Hadoop clusters

Algorithms in Mahout

 We’ll focus on one example:
 Collaborative Filtering (Recommenders)

 Yet there are many (many!) more; you
can find them all at
https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms

Algorithms Examples –
Recommendation

 Help users find items they might like
based on historical preferences

 Based on example by Sebastian Schelter in “Distributed Itembased
Collaborative Filtering with Apache Mahout”

Recommendation

        The Matrix   Alien   Inception
Alice       5          1         4
Bob         ?          2         5
Peter       4          3         2

Recommendation

 Algorithm
 Neighborhood-based approach
 Works by finding similarly rated items in the
user-item-matrix (e.g. cosine, Pearson-
Correlation, Tanimoto Coefficient)
 Estimates a user's preference towards an
item by looking at his/her preferences
towards similar items

Recommendation

 Prediction: Estimate Bob's preference
towards “The Matrix”
1. Look at all items that
 a) are similar to “The Matrix”
 b) have been rated by Bob
=> “Alien”, “Inception”
2. Estimate the unknown preference with a
weighted sum
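
The weighted sum in step 2 can be sketched in a few lines of Python. This is an illustration only, not Mahout code: the function name `predict` is hypothetical, dividing by the sum of absolute similarities is an assumed normalisation, and the similarity values are the ones shown on the "Compute similarities" slide.

```python
def predict(similarities, ratings):
    """Estimate a rating as a similarity-weighted average of known ratings.

    similarities: item -> similarity between that item and the target item
    ratings:      item -> rating the user gave that item
    """
    # Only items the user rated AND that have a similarity to the target count.
    rated = [item for item in ratings if item in similarities]
    numerator = sum(similarities[item] * ratings[item] for item in rated)
    # Assumption: normalise by the sum of absolute similarities.
    denominator = sum(abs(similarities[item]) for item in rated)
    if denominator == 0:
        return None  # nothing to base an estimate on
    return numerator / denominator


# Similarities of "Alien" and "Inception" to "The Matrix", and Bob's ratings.
sim_to_matrix = {"Alien": -0.47, "Inception": 0.47}
bob = {"Alien": 2, "Inception": 5}

print(round(predict(sim_to_matrix, bob), 2))  # 1.5
```

With these inputs the estimate is (-0.47 * 2 + 0.47 * 5) / (0.47 + 0.47) = 1.5.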

Recommendation

 MapReduce phase 1
 Map – Make user the key
(Alice, Matrix, 5) => Alice (Matrix, 5)
(Alice, Alien, 1) => Alice (Alien, 1)
(Alice, Inception, 4) => Alice (Inception, 4)
(Bob, Alien, 2) => Bob (Alien, 2)
(Bob, Inception, 5) => Bob (Inception, 5)
(Peter, Matrix, 4) => Peter (Matrix, 4)
(Peter, Alien, 3) => Peter (Alien, 3)
(Peter, Inception, 2) => Peter (Inception, 2)
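
The map step above can be mimicked in plain Python to see what it does to each record. This is a sketch of the idea, not Hadoop or Mahout code; `map_user_as_key` is a hypothetical name.

```python
def map_user_as_key(record):
    """Map step: turn (user, item, rating) into (user, (item, rating))."""
    user, item, rating = record
    return user, (item, rating)


# The eight ratings from the example.
ratings = [
    ("Alice", "Matrix", 5), ("Alice", "Alien", 1), ("Alice", "Inception", 4),
    ("Bob", "Alien", 2), ("Bob", "Inception", 5),
    ("Peter", "Matrix", 4), ("Peter", "Alien", 3), ("Peter", "Inception", 2),
]

mapped = [map_user_as_key(record) for record in ratings]
print(mapped[0])  # ('Alice', ('Matrix', 5))
```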

Recommendation

 Reduce – Create inverted index
Input:
Alice (Matrix, 5)
Alice (Alien, 1)
Alice (Inception, 4)
Bob (Alien, 2)
Bob (Inception, 5)
Peter (Matrix, 4)
Peter (Alien, 3)
Peter (Inception, 2)

Output:
Alice (Matrix, 5) (Alien, 1) (Inception, 4)
Bob (Alien, 2) (Inception, 5)
Peter (Matrix, 4) (Alien, 3) (Inception, 2)
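
In plain Python the reduce step amounts to grouping the mapped pairs by user. The sketch below is illustrative only; `reduce_by_user` is a hypothetical name, and a real MapReduce framework would do the grouping during its shuffle phase.

```python
from collections import defaultdict


def reduce_by_user(pairs):
    """Reduce step: collect every (item, rating) emitted for the same user."""
    grouped = defaultdict(list)
    for user, item_rating in pairs:
        grouped[user].append(item_rating)
    return dict(grouped)


# Output of the map step: (user, (item, rating)).
mapped = [
    ("Alice", ("Matrix", 5)), ("Alice", ("Alien", 1)), ("Alice", ("Inception", 4)),
    ("Bob", ("Alien", 2)), ("Bob", ("Inception", 5)),
    ("Peter", ("Matrix", 4)), ("Peter", ("Alien", 3)), ("Peter", ("Inception", 2)),
]

by_user = reduce_by_user(mapped)
print(by_user["Bob"])  # [('Alien', 2), ('Inception', 5)]
```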

Recommendation

 Map – Isolate all co-occurred ratings (all
cases where a user rated both items)
Input:
Alice (Matrix, 5) (Alien, 1) (Inception, 4)
Bob (Alien, 2) (Inception, 5)
Peter (Matrix, 4) (Alien, 3) (Inception, 2)

Output:
Matrix, Alien (5,1)
Matrix, Alien (4,3)
Alien, Inception (1,4)
Alien, Inception (2,5)
Alien, Inception (3,2)
Matrix, Inception (5,4)
Matrix, Inception (4,2)
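
Isolating co-occurred ratings means emitting every pair of items that one user rated. A minimal Python sketch, assuming each user's items are listed in the same order so the item-pair keys line up (a real job would put the keys into a canonical order); `map_cooccurrences` is a hypothetical name.

```python
from itertools import combinations


def map_cooccurrences(user_ratings):
    """Yield ((item_a, item_b), (rating_a, rating_b)) for each pair one user rated."""
    for (item_a, rating_a), (item_b, rating_b) in combinations(user_ratings, 2):
        yield (item_a, item_b), (rating_a, rating_b)


# Output of the previous reduce step: one list of (item, rating) per user.
by_user = {
    "Alice": [("Matrix", 5), ("Alien", 1), ("Inception", 4)],
    "Bob": [("Alien", 2), ("Inception", 5)],
    "Peter": [("Matrix", 4), ("Alien", 3), ("Inception", 2)],
}

cooccurrences = [
    pair for user_ratings in by_user.values() for pair in map_cooccurrences(user_ratings)
]
for item_pair, rating_pair in cooccurrences:
    print(item_pair, rating_pair)
```

For the three users above this yields seven pairs, since Alice and Peter each rated all three movies and Bob rated two.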

Recommendation

 Reduce – Compute similarities

Input:
Matrix, Alien (5,1)
Matrix, Alien (4,3)
Alien, Inception (1,4)
Alien, Inception (2,5)
Alien, Inception (3,2)
Matrix, Inception (5,4)
Matrix, Inception (4,2)

Output:
Matrix, Alien (-0.47)
Matrix, Inception (0.47)
Alien, Inception (-0.63)
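
To show the shape of this reduce step, here is a Python sketch that groups the rating pairs by item pair and computes a plain cosine similarity over them. Cosine over co-rated pairs is an assumption chosen for brevity; the figures on the slide come from the cited example and depend on the measure and normalisation used there, so this sketch is not expected to reproduce them. `reduce_similarities` is a hypothetical name.

```python
from collections import defaultdict
from math import sqrt


def reduce_similarities(cooccurrences):
    """Group rating pairs by item pair and compute a cosine similarity for each."""
    grouped = defaultdict(list)
    for item_pair, rating_pair in cooccurrences:
        grouped[item_pair].append(rating_pair)

    similarities = {}
    for item_pair, rating_pairs in grouped.items():
        dot = sum(a * b for a, b in rating_pairs)
        norm_a = sqrt(sum(a * a for a, _ in rating_pairs))
        norm_b = sqrt(sum(b * b for _, b in rating_pairs))
        if norm_a == 0 or norm_b == 0:
            continue  # similarity is undefined for an all-zero vector
        similarities[item_pair] = dot / (norm_a * norm_b)
    return similarities


cooccurrences = [
    (("Matrix", "Alien"), (5, 1)), (("Matrix", "Alien"), (4, 3)),
    (("Alien", "Inception"), (1, 4)), (("Alien", "Inception"), (2, 5)),
    (("Alien", "Inception"), (3, 2)),
    (("Matrix", "Inception"), (5, 4)), (("Matrix", "Inception"), (4, 2)),
]

for item_pair, value in reduce_similarities(cooccurrences).items():
    print(item_pair, round(value, 2))
```

Swapping in Pearson correlation or the Tanimoto coefficient would only change the body of the inner loop; the grouping by item pair stays the same.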

Recommendation

        The Matrix   Alien   Inception
Alice       5          1         4
Bob         1.5        2         5
Peter       4          3         2

Mahout Commercial Use

 Commercial use

Mahout Resources

 Mahout website - http://mahout.apache.org/
 Introducing Apache Mahout –
http://www.ibm.com/developerworks/java/library/j-mahout/
 “Mahout in Action” by Sean Owen and Robin
Anil

Mahout Summary

 ML is all over the web today
 Mahout is about scalable machine
learning
 Mahout has functionality for many of
today’s common machine learning tasks
 MapReduce magic in
action

Mahout Summary

Thank you and good night
