Mahout
Algorithms
Mahmut Karakaya
Agenda
- Introduction
- Collaborative Filtering
- Map/Reduce
- Clustering
- Demo
What mahout means
Elephant rider in Hindi
What Apache Mahout is
- Java, Hadoop
- Collaborative Filtering
- Mahout In Action
- user@mahout.apache.org
- 0.9 (1-Feb-2014)
Who uses Mahout
Mahout in Apache Foundation
overstock.com saves $2m a year
Judd Bagley Saum Noursalehi
Others
- Weka (Machine Learning Library)
- Lenskit (Grouplens)
- EasyRec (RestAPI)
- Write it yourself :)
Need to know ML?
hadoop jar mahout-core-0.8-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=input/input.txt \
  -Dmapred.output.dir=output \
  --usersFile input/users.txt --booleanData
Data Model (u,i,r)
Similarity
Cosine Similarity
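As a minimal sketch of the measure named on this slide: cosine similarity is the dot product of two rating vectors divided by the product of their lengths. The function name and the zero-vector convention below are assumptions made for illustration, not Mahout's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumed convention: an all-zero vector is similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)
```

Vectors that point in the same direction score 1 regardless of magnitude, and vectors with no overlap score 0.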
Collaborative Filtering
- Data format = userId, itemId, rating
- Create Model + Predict
Item Based - Similarity Matrix (Item-Item)
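One way to picture the item-item matrix is to turn the (userId, itemId, rating) triples into one sparse vector per item and compare every pair of items. This is a sketch under assumptions: it uses cosine similarity, the sample ratings are made up, and the function name is hypothetical. It is not how Mahout stores or computes the matrix.

```python
import math
from collections import defaultdict


def item_similarity_matrix(ratings):
    """Build {(item_i, item_j): cosine similarity} from (user, item, rating) triples."""
    # One sparse vector per item: {user: rating}.
    item_vectors = defaultdict(dict)
    for user, item, rating in ratings:
        item_vectors[item][user] = rating

    def cosine(v, w):
        # Only users who rated both items contribute to the dot product.
        dot = sum(v[u] * w[u] for u in v.keys() & w.keys())
        norm_v = math.sqrt(sum(x * x for x in v.values()))
        norm_w = math.sqrt(sum(x * x for x in w.values()))
        return dot / (norm_v * norm_w) if norm_v and norm_w else 0.0

    items = sorted(item_vectors)
    return {
        (i, j): cosine(item_vectors[i], item_vectors[j])
        for i in items
        for j in items
        if i != j
    }


# Made-up sample data: users 1 and 2 rate items "a" and "b" identically.
sample = [(1, "a", 5), (1, "b", 5), (2, "a", 3), (2, "b", 3), (3, "c", 4)]
similarities = item_similarity_matrix(sample)
```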
Item Based - Predict
- Weighted Sum:
r^(3,1) = 2 * 0.91 + ...
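The weighted sum can be sketched as follows. The slide elides everything after the first term, so the second item and its values below are made up for illustration, and dividing by the sum of absolute similarities is a common normalization that is assumed here rather than taken from the slide.

```python
def predict_rating(user_ratings, similarities_to_target):
    """Similarity-weighted average of the ratings a user has already given.

    user_ratings: {item: rating} for items the user rated.
    similarities_to_target: {item: similarity between that item and the target item}.
    """
    numerator = 0.0
    denominator = 0.0
    for item, rating in user_ratings.items():
        similarity = similarities_to_target.get(item, 0.0)
        numerator += similarity * rating
        denominator += abs(similarity)
    # Assumed convention: no similar rated items means no prediction (0.0).
    return numerator / denominator if denominator else 0.0


# First term (rating 2, similarity 0.91) follows the slide; the second is hypothetical.
prediction = predict_rating({"item2": 2, "item3": 4}, {"item2": 0.91, "item3": 0.5})
```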
Item Based
Item Based.. Why in Mahout
- Generic recommender like User Based
- User Based similarity matrix is heavier
Singular Value Decomposition (SVD)
SVDRecommender
Factorization
Factorizer
Singular Value Decomposition (SVD)
Let's say m = 10K, n = 1K, k = 10:
m * n → m * k + n * k
10M → 100K + 10K
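The arithmetic on this slide can be checked directly: a full m x n rating matrix holds m * n cells, while the two factor matrices hold m * k + n * k values. The function name is hypothetical.

```python
def factorized_size(m, n, k):
    """Return (cells in the full m x n matrix, values in the two rank-k factors)."""
    full = m * n
    factors = m * k + n * k
    return full, factors


full, factors = factorized_size(m=10_000, n=1_000, k=10)
```

With the slide's numbers, `full` is 10,000,000 and `factors` is 110,000 (100K + 10K), roughly a 90x reduction.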
Singular Value Decomposition (SVD)
SVD k=3 λ=0.1 a=40 c.a=1
SVD k=3 λ=0.1 a=40 c.a=1
SVD k=3 λ=0.1 a=40 c.a=10
SVD.. Why in Mahout
- Won Netflix Prize
- Parallelizable by row, column
Map / Reduce Mapper
1.txt 2.txt
Hello Hello
Hello
Map / Reduce Mapper
Map / Reduce Mapper
Map1 Map2
Hello,1 Hello,1
Hello,1
Map / Reduce Reducer
Map / Reduce Reducer
Hello,3
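The word-count walkthrough on these slides can be imitated in a single process. This is only a sketch of the map, shuffle, and reduce steps with hypothetical function names; it does not use Hadoop.

```python
from collections import defaultdict


def map_words(text):
    """Mapper: emit (word, 1) for every word in one input file."""
    return [(word, 1) for word in text.split()]


def shuffle(pairs):
    """Group the emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reducer: sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


# The two input files from the slides.
files = {"1.txt": "Hello Hello", "2.txt": "Hello"}
mapped = [pair for text in files.values() for pair in map_words(text)]
counts = reduce_counts(shuffle(mapped))
```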
Map / Reduce ItemBased
hadoop jar mahout-core-0.8-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  -Dmapred.input.dir=input/input.txt \
  -Dmapred.output.dir=output \
  --usersFile input/users.txt --booleanData
Map / Reduce ItemBased
Map / Reduce ItemBased
Map / Reduce ItemBased
Map 1
Map / Reduce ItemBased
Reduce 1
Map / Reduce ItemBased
Reduce 1
Map / Reduce ItemBased
Map 2
Map / Reduce ItemBased
Reduce 2
Map / Reduce ItemBased
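The contents of the Map 1, Reduce 1, Map 2, and Reduce 2 slides are not preserved in this transcript. As a heavily simplified sketch of how item-based recommendation can be split into two map/reduce phases, the code below first groups items by user and then counts how often each pair of items appears together. The phase boundaries, function names, and sample data are all assumptions; RecommenderJob's actual pipeline has more steps.

```python
from collections import defaultdict
from itertools import permutations


def run_phase(records, mapper, reducer):
    """Run one in-memory map/shuffle/reduce pass."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    return [reducer(key, values) for key, values in sorted(grouped.items())]


def map1(record):
    # (user, item, rating) -> key by user
    user, item, _rating = record
    yield user, item


def reduce1(user, items):
    # Collect every item the user interacted with.
    return user, sorted(set(items))


def map2(record):
    # Emit each ordered pair of items that share a user.
    _user, items = record
    for a, b in permutations(items, 2):
        yield (a, b), 1


def reduce2(pair, ones):
    # Co-occurrence count for the item pair.
    return pair, sum(ones)


# Made-up sample ratings.
ratings = [(1, "A", 5), (1, "B", 3), (2, "A", 4), (2, "B", 2), (2, "C", 1)]
user_vectors = run_phase(ratings, map1, reduce1)
cooccurrence = dict(run_phase(user_vectors, map2, reduce2))
```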
Map / Reduce.. Why in Mahout
Clustering
- KMeans Clustering (SM,MR)
- Fuzzy kMeans (SM,MR)
- Canopy Clustering (SM,MR)
- Dirichlet (SM,MR)
Kmeans
Kmeans
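A minimal sketch of the usual k-means iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points). The function name, fixed iteration count, and sample points are assumptions for illustration; this is not Mahout's implementation.

```python
def kmeans(points, centroids, iterations=10):
    """Run a fixed number of assign/update rounds from the given starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [
                sum((a - b) ** 2 for a, b in zip(point, centroid))
                for centroid in centroids
            ]
            clusters[distances.index(min(distances))].append(point)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters


# Made-up sample: two obvious groups in 2D.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centroids, clusters = kmeans(points, centroids=[(0, 0), (10, 10)])
```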
Clustering Evaluation
Clustering Intra Distance
Clustering Inter Distance
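The two evaluation measures can be sketched as below, assuming one common pair of definitions: intra-cluster distance as the mean distance from a cluster's points to its centroid (smaller is tighter), and inter-cluster distance as the mean distance between centroids (larger is better separated). The slides do not state which definitions they use, and the function names and sample clusters are hypothetical.

```python
import math
from itertools import combinations


def centroid(cluster):
    """Mean point of a non-empty cluster."""
    return tuple(sum(dim) / len(cluster) for dim in zip(*cluster))


def intra_distance(cluster):
    """Mean Euclidean distance from each point to the cluster centroid."""
    center = centroid(cluster)
    return sum(math.dist(point, center) for point in cluster) / len(cluster)


def inter_distance(clusters):
    """Mean Euclidean distance between every pair of cluster centroids."""
    centers = [centroid(cluster) for cluster in clusters]
    pairs = list(combinations(centers, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


# Made-up sample clusters.
cluster_a = [(0, 0), (0, 2)]
cluster_b = [(3, 1), (5, 1)]
```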
Clustering.. Why in Mahout
- Sparsity
- ~10M of 11M users registered 1 Sony product
Clustering.. Why in Mahout
- Group Recommendation
- Cluster Based Recommendation
Create WishList Experience
- Mahout (SVD)
- Play
- Heroku
- MongoLab
- Rest
http://recommenderplaybbs.herokuapp.com/
Thank you

Apache Mahout Algorithms