Ofer Vugman
 May 2012
Agenda and such…


   What is ML (Machine Learning)
   ML Common Use Cases
   Mahout Overview
   Algorithms in Mahout
   Mahout Commercial Use
   Mahout Summary
What is ML



 “Machine Learning is programming computers to optimize a performance
  criterion using example data or past experience”

 Intro. to Machine Learning by E. Alpaydin
ML Common Use Cases


 Recommendation
ML Common Use Cases


 Classification
ML Common Use Cases


 Clustering
ML Common Libraries
Mahout Overview – What ?


A mahout is a person who keeps and drives
  an elephant
Mahout Overview – What ?


 A scalable machine learning library
Mahout Overview – What ?


 Began life in 2008 as a subproject of
  Apache’s Lucene project
 In 2010 Mahout became a top-level
  Apache project in its own right
 Implemented in Java
 Built upon Apache’s Hadoop (Look! An
  elephant!)
Mahout Overview – Why ?


 Many open source ML libraries:
   Lack community
   Lack documentation and examples
   Lack scalability
   Lack the Apache license
   Are research oriented
   Are not well tested
   Are not built on existing production-quality
    libraries
Mahout Overview – Why ?


 Scalability
   Scalable to reasonably large datasets (core
    algorithms implemented in MapReduce,
    runnable on Hadoop)
   Scalable to support your business case
    (Apache License)
   Scalable community
Mahout Overview – Why ?


 Built over existing production quality
  libraries
Mahout Overview – Use Cases


 Mahout currently supports four main
  use cases:
  1. Recommendation
  2. Clustering
  3. Classification
  4. Frequent Itemset Mining
Mahout Overview - Technical


 System Requirements
     Linux (or Cygwin on Windows)
     Java 1.6.x or greater
     Maven 2.0.11 or greater to build the source
      code
     Hadoop 0.20 or greater*


* Not all algorithms are implemented to work on Hadoop clusters
Algorithms in Mahout


 We’ll focus on one example:
   Collaborative Filtering (Recommenders)



 There are many more; the full list is at
  https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
Algorithms Examples –
Recommendation

 Help users find items they might like
  based on historical preferences




 Based on example by Sebastian Schelter in “Distributed Itembased
  Collaborative Filtering with Apache Mahout”
Algorithms Examples –
Recommendation




              Matrix   Alien   Inception
      Alice   5        1       4
      Bob     ?        2       5
      Peter   4        3       2
Algorithms Examples –
Recommendation

 Algorithm
   Neighborhood-based approach
   Works by finding similarly rated items in the
    user-item matrix, using a similarity measure such as
    cosine, Pearson correlation, or the Tanimoto coefficient
   Estimates a user's preference towards an
    item by looking at their preferences
    towards similar items
Algorithms Examples –
Recommendation

 Prediction: Estimate Bob's preference
  towards “The Matrix”
  1. Look at all items that
        a) are similar to “The Matrix”
        b) have been rated by Bob
           => “Alien”, “Inception”
  2. Estimate the unknown preference with a
     weighted sum
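
A minimal Python sketch of the weighted sum in step 2. The function name `predict_rating` is hypothetical, and dividing by the sum of absolute similarities is an assumption; it is chosen because, with the similarity values shown later in the deck (-0.47 for Matrix/Alien, 0.47 for Matrix/Inception), it reproduces the 1.5 that the final rating table gives for Bob.

```python
def predict_rating(user_ratings, similarities):
    """Similarity-weighted average of the user's ratings for similar items.

    user_ratings: item -> rating given by the user
    similarities: item -> similarity between that item and the target item
    """
    weighted_total = 0.0
    weight_norm = 0.0
    for item, rating in user_ratings.items():
        sim = similarities.get(item)
        if sim is None:
            continue  # no similarity known for this item, so skip it
        weighted_total += sim * rating
        weight_norm += abs(sim)  # assumption: normalize by absolute similarity
    if weight_norm == 0.0:
        return None  # nothing to base an estimate on
    return weighted_total / weight_norm


# Bob's known ratings and the item similarities to "Matrix" from the deck
bob_ratings = {"Alien": 2, "Inception": 5}
sims_to_matrix = {"Alien": -0.47, "Inception": 0.47}

bob_matrix_estimate = predict_rating(bob_ratings, sims_to_matrix)
print(round(bob_matrix_estimate, 2))
```

The negative similarity pulls the estimate away from Bob's rating of “Alien”, while the positive one pulls it towards his rating of “Inception”.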
Algorithms Examples –
Recommendation

 MapReduce phase 1
   Map – Make user the key
    (Alice, Matrix, 5)        => Alice (Matrix, 5)
    (Alice, Alien, 1)         => Alice (Alien, 1)
    (Alice, Inception, 4)     => Alice (Inception, 4)
    (Bob, Alien, 2)           => Bob (Alien, 2)
    (Bob, Inception, 5)       => Bob (Inception, 5)
    (Peter, Matrix, 4)        => Peter (Matrix, 4)
    (Peter, Alien, 3)         => Peter (Alien, 3)
    (Peter, Inception, 2)     => Peter (Inception, 2)
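
The map step above can be sketched in plain Python, outside Hadoop. This is an illustration only, not Mahout's code; the function name `map_user_as_key` is hypothetical.

```python
# Input records as (user, item, rating) triples, taken from the slide
ratings = [
    ("Alice", "Matrix", 5),
    ("Alice", "Alien", 1),
    ("Alice", "Inception", 4),
    ("Bob", "Alien", 2),
    ("Bob", "Inception", 5),
    ("Peter", "Matrix", 4),
    ("Peter", "Alien", 3),
    ("Peter", "Inception", 2),
]


def map_user_as_key(record):
    """Emit (key, value) with the user as key and (item, rating) as value."""
    user, item, rating = record
    return user, (item, rating)


mapped = [map_user_as_key(record) for record in ratings]
for key, value in mapped:
    print(key, value)
```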
Algorithms Examples –
Recommendation

 MapReduce phase 1
   Reduce – Create inverted index
 Input:
   Alice (Matrix, 5)
   Alice (Alien, 1)
   Alice (Inception, 4)
   Bob (Alien, 2)
   Bob (Inception, 5)
   Peter (Matrix, 4)
   Peter (Alien, 3)
   Peter (Inception, 2)
 Output:
   Alice (Matrix, 5) (Alien, 1) (Inception, 4)
   Bob (Alien, 2) (Inception, 5)
   Peter (Matrix, 4) (Alien, 3) (Inception, 2)
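
A rough Python equivalent of this reduce step: every value that shares a user key is collected into one list. In a real MapReduce job the framework does the grouping by key before the reducer runs; here it is done by hand, and `reduce_by_user` is a hypothetical name.

```python
from collections import defaultdict

# Output of the phase 1 map step: (user, (item, rating)) pairs
mapped = [
    ("Alice", ("Matrix", 5)),
    ("Alice", ("Alien", 1)),
    ("Alice", ("Inception", 4)),
    ("Bob", ("Alien", 2)),
    ("Bob", ("Inception", 5)),
    ("Peter", ("Matrix", 4)),
    ("Peter", ("Alien", 3)),
    ("Peter", ("Inception", 2)),
]


def reduce_by_user(pairs):
    """Group all (item, rating) values under their user key."""
    grouped = defaultdict(list)
    for user, item_rating in pairs:
        grouped[user].append(item_rating)
    return dict(grouped)


by_user = reduce_by_user(mapped)
for user, item_ratings in by_user.items():
    print(user, item_ratings)
```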
Algorithms Examples –
Recommendation

 MapReduce phase 2
    Map – Isolate all co-occurring ratings (all
      cases where a user rated both items)
 Input:
   Alice (Matrix, 5) (Alien, 1) (Inception, 4)
   Bob (Alien, 2) (Inception, 5)
   Peter (Matrix, 4) (Alien, 3) (Inception, 2)
 Output:
   Matrix, Alien (5,1)
   Matrix, Alien (4,3)
   Alien, Inception (1,4)
   Alien, Inception (2,5)
   Alien, Inception (3,2)
   Matrix, Inception (4,2)
   Matrix, Inception (5,4)
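
One way to sketch this map step in Python is to emit every pair of items a single user has rated, keyed by the item pair. `map_cooccurrences` is a hypothetical name. The sketch assumes every user's list keeps the items in the same relative order, so that the same two items always produce the same key; a real job would need to impose a canonical order on each pair.

```python
from itertools import combinations

# Output of phase 1: each user's rated items
by_user = {
    "Alice": [("Matrix", 5), ("Alien", 1), ("Inception", 4)],
    "Bob": [("Alien", 2), ("Inception", 5)],
    "Peter": [("Matrix", 4), ("Alien", 3), ("Inception", 2)],
}


def map_cooccurrences(item_ratings):
    """Yield ((item_a, item_b), (rating_a, rating_b)) for each co-rated pair."""
    for (item_a, rating_a), (item_b, rating_b) in combinations(item_ratings, 2):
        yield (item_a, item_b), (rating_a, rating_b)


cooccurrences = [
    pair
    for item_ratings in by_user.values()
    for pair in map_cooccurrences(item_ratings)
]
for item_pair, rating_pair in cooccurrences:
    print(item_pair, rating_pair)
```

Bob has not rated “Matrix”, so he contributes only one pair, which is why “Alien, Inception” ends up with three rating pairs and the other two keys with two each.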
Algorithms Examples –
Recommendation

 MapReduce phase 2
   Reduce – Compute similarities

 Input:
   Matrix, Alien (5,1)
   Matrix, Alien (4,3)
   Alien, Inception (1,4)
   Alien, Inception (2,5)
   Alien, Inception (3,2)
   Matrix, Inception (4,2)
   Matrix, Inception (5,4)
 Output:
   Matrix, Alien (-0.47)
   Matrix, Inception (0.47)
   Alien, Inception (-0.63)
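
The shape of this reduce step can be sketched as follows: group the rating pairs by item pair, then apply a similarity measure to each group. The slide does not say which measure produced its values, so the sketch plugs in plain cosine similarity purely as an illustration. Cosine over positive ratings is never negative, so it does not reproduce the numbers on the slide. The names `cosine` and `reduce_similarities` are hypothetical.

```python
import math
from collections import defaultdict

# Output of the phase 2 map step
cooccurrences = [
    (("Matrix", "Alien"), (5, 1)),
    (("Matrix", "Alien"), (4, 3)),
    (("Alien", "Inception"), (1, 4)),
    (("Alien", "Inception"), (2, 5)),
    (("Alien", "Inception"), (3, 2)),
    (("Matrix", "Inception"), (4, 2)),
    (("Matrix", "Inception"), (5, 4)),
]


def cosine(rating_pairs):
    """Cosine similarity of the two rating vectors formed by the pairs."""
    dot = sum(a * b for a, b in rating_pairs)
    norm_a = math.sqrt(sum(a * a for a, _ in rating_pairs))
    norm_b = math.sqrt(sum(b * b for _, b in rating_pairs))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def reduce_similarities(pairs, measure=cosine):
    """Group rating pairs by item pair and compute one similarity per group."""
    grouped = defaultdict(list)
    for item_pair, rating_pair in pairs:
        grouped[item_pair].append(rating_pair)
    return {item_pair: measure(group) for item_pair, group in grouped.items()}


similarities = reduce_similarities(cooccurrences)
for item_pair, similarity in similarities.items():
    print(item_pair, round(similarity, 2))
```

Passing a different function as `measure` (for example a Pearson correlation) changes the similarity without touching the grouping logic.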
Algorithms Examples –
Recommendation




              Matrix   Alien   Inception
      Alice   5        1       4
      Bob     1.5      2       5
      Peter   4        3       2
Mahout Commercial Use


 Commercial use
Mahout Resources

 Mahout website - http://mahout.apache.org/
 Introducing Apache Mahout –
  http://www.ibm.com/developerworks/java/library/j-mahout/
 “Mahout in Action” by Sean Owen and Robin Anil
Mahout Summary


 ML is all over the web today
 Mahout is about scalable machine
  learning
 Mahout has functionality for many of
  today’s common machine learning tasks
 MapReduce magic in action
Mahout Summary




     Thank you and good night

How to convert PDF to text with Nanonets
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 

Intro to Mahout

  • 2. Agenda and such… What is ML (Machine Learning); ML Common Use Cases; Mahout Overview; Algorithms in Mahout; Mahout Commercial Use; Mahout Summary
  • 3. What is ML: “Machine Learning is programming computers to optimize a performance criterion using example data or past experience” (Intro. to Machine Learning by E. Alpaydin)
  • 4. ML Common Use Cases: Recommendation
  • 5. ML Common Use Cases: Classification
  • 6. ML Common Use Cases: Clustering
  • 8. Mahout Overview – What? A mahout is a person who keeps and drives an elephant
  • 9. Mahout Overview – What? A scalable machine learning library
  • 10. Mahout Overview – What? Began life in 2008 as a subproject of Apache’s Lucene project; became a top-level Apache project in its own right in 2010; implemented in Java; built upon Apache’s Hadoop (Look! An elephant!)
  • 11. Mahout Overview – Why? Many open source ML libraries either: lack community; lack documentation and examples; lack scalability; lack the Apache license; are research oriented; are not well tested; are not built over existing production-quality libraries
  • 12. Mahout Overview – Why? Scalability: scalable to reasonably large datasets (core algorithms implemented in Map/Reduce, runnable on Hadoop); scalable to support your business case (Apache License); scalable community
  • 13. Mahout Overview – Why? Built over existing production-quality libraries
  • 14. Mahout Overview – Use Cases: Mahout currently supports mainly four use cases: 1. Recommendation 2. Clustering 3. Classification 4. Frequent Itemset Mining
  • 15. Mahout Overview – Technical: System requirements: Linux (or Cygwin on Windows); Java 1.6.x or greater; Maven 2.0.11 or greater to build the source code; Hadoop 0.2 or greater* (* Not all algorithms are implemented to work on Hadoop clusters)
  • 16. Algorithms in Mahout: We’ll focus on one example: Collaborative Filtering (Recommenders). There are many (many!) more; you can find them all at https://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
  • 17. Algorithms Examples – Recommendation: Help users find items they might like based on historical preferences. Based on the example by Sebastian Schelter in “Distributed Itembased Collaborative Filtering with Apache Mahout”
  • 18. Algorithms Examples – Recommendation: Ratings for Matrix, Alien, Inception: Alice 5, 1, 4; Bob ?, 2, 5; Peter 4, 3, 2
  • 19. Algorithms Examples – Recommendation: Algorithm: neighborhood-based approach; works by finding similarly rated items in the user-item matrix (e.g. cosine, Pearson correlation, Tanimoto coefficient); estimates a user's preference towards an item by looking at his/her preferences towards similar items
  • 20. Algorithms Examples – Recommendation: Prediction: estimate Bob's preference towards “The Matrix”. 1. Look at all items that a) are similar to “The Matrix” and b) have been rated by Bob => “Alien”, “Inception”. 2. Estimate the unknown preference with a weighted sum
  • 21. Algorithms Examples – Recommendation: MapReduce phase 1, Map – Make user the key: (Alice, Matrix, 5) → Alice (Matrix, 5); (Alice, Alien, 1) → Alice (Alien, 1); (Alice, Inception, 4) → Alice (Inception, 4); (Bob, Alien, 2) → Bob (Alien, 2); (Bob, Inception, 5) → Bob (Inception, 5); (Peter, Matrix, 4) → Peter (Matrix, 4); (Peter, Alien, 3) → Peter (Alien, 3); (Peter, Inception, 2) → Peter (Inception, 2)
  • 22. Algorithms Examples – Recommendation: MapReduce phase 1, Reduce – Create inverted index. Input: Alice (Matrix, 5); Alice (Alien, 1); Alice (Inception, 4); Bob (Alien, 2); Bob (Inception, 5); Peter (Matrix, 4); Peter (Alien, 3); Peter (Inception, 2). Output: Alice → (Matrix, 5) (Alien, 1) (Inception, 4); Bob → (Alien, 2) (Inception, 5); Peter → (Matrix, 4) (Alien, 3) (Inception, 2)
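Phase 1 can be imitated in a few lines of plain Python to make the data flow concrete. This is only an in-memory sketch, not Mahout or Hadoop code; the names `RATINGS`, `map_phase1` and `reduce_phase1` are made up for illustration, and the grouping that Hadoop's shuffle would perform is simulated with a dictionary.

```python
from collections import defaultdict

# Ratings from the example slides as (user, item, rating) triples.
RATINGS = [
    ("Alice", "Matrix", 5), ("Alice", "Alien", 1), ("Alice", "Inception", 4),
    ("Bob", "Alien", 2), ("Bob", "Inception", 5),
    ("Peter", "Matrix", 4), ("Peter", "Alien", 3), ("Peter", "Inception", 2),
]


def map_phase1(triple):
    """Map: re-key a (user, item, rating) triple by user."""
    user, item, rating = triple
    return user, (item, rating)


def reduce_phase1(pairs):
    """Reduce: collect every (item, rating) pair under its user key.

    A real MapReduce job receives the pairs already grouped by key;
    here the grouping is done by hand with a dictionary.
    """
    grouped = defaultdict(list)
    for user, item_rating in pairs:
        grouped[user].append(item_rating)
    return dict(grouped)


user_vectors = reduce_phase1(map_phase1(t) for t in RATINGS)
```

After this step `user_vectors["Bob"]` holds Bob's two ratings, `("Alien", 2)` and `("Inception", 5)`, which matches the reduce output on the slide.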
  • 23. Algorithms Examples – Recommendation: MapReduce phase 2, Map – Isolate all co-occurred ratings (all cases where a user rated both items). Input: Alice → (Matrix, 5) (Alien, 1) (Inception, 4); Bob → (Alien, 2) (Inception, 5); Peter → (Matrix, 4) (Alien, 3) (Inception, 2). Output: Matrix, Alien (5,1); Matrix, Alien (4,3); Alien, Inception (1,4); Alien, Inception (2,5); Alien, Inception (3,2); Matrix, Inception (4,2); Matrix, Inception (5,4)
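The phase 2 map step amounts to emitting every pair of items that a single user rated. A minimal Python sketch, assuming the per-user lists produced by phase 1 are available in memory (`USER_VECTORS`, `map_phase2` and `cooccurrences` are illustrative names, not Mahout APIs):

```python
from collections import defaultdict
from itertools import combinations

# Output of phase 1: each user with the (item, rating) pairs they supplied.
USER_VECTORS = {
    "Alice": [("Matrix", 5), ("Alien", 1), ("Inception", 4)],
    "Bob": [("Alien", 2), ("Inception", 5)],
    "Peter": [("Matrix", 4), ("Alien", 3), ("Inception", 2)],
}


def map_phase2(item_ratings):
    """Emit ((item_a, item_b), (rating_a, rating_b)) for every pair of
    items rated by the same user.

    Assumption: every user's list uses the same item order, so a given
    pair always comes out under the same key.
    """
    for (item_a, r_a), (item_b, r_b) in combinations(item_ratings, 2):
        yield (item_a, item_b), (r_a, r_b)


def cooccurrences(user_vectors):
    """Group the emitted rating pairs by item pair (simulated shuffle)."""
    grouped = defaultdict(list)
    for item_ratings in user_vectors.values():
        for pair, ratings in map_phase2(item_ratings):
            grouped[pair].append(ratings)
    return dict(grouped)


co_ratings = cooccurrences(USER_VECTORS)
```

Bob contributes only to the (Alien, Inception) pair because he has not rated Matrix, so `co_ratings[("Alien", "Inception")]` has three entries while the two pairs involving Matrix have two each.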
  • 24. Algorithms Examples – Recommendation: MapReduce phase 2, Reduce – Compute similarities. Input: Matrix, Alien (5,1); Matrix, Alien (4,3); Alien, Inception (1,4); Alien, Inception (2,5); Alien, Inception (3,2); Matrix, Inception (4,2); Matrix, Inception (5,4). Output: Matrix, Alien (-0.47); Matrix, Inception (0.47); Alien, Inception (-0.63)
  • 25. Algorithms Examples – Recommendation: Ratings for Matrix, Alien, Inception with the prediction filled in: Alice 5, 1, 4; Bob 1.5, 2, 5; Peter 4, 3, 2
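The predicted value of 1.5 for Bob can be reproduced from the similarities shown on slide 24. The sketch below assumes the weighted sum is normalised by the sum of the absolute similarities; the slides do not spell out the normalisation, but this choice yields the value shown. `predict` and the two dictionaries are hypothetical names used only for illustration.

```python
# Similarities between "Matrix" and the items Bob has rated (slide 24).
SIMILARITY_TO_MATRIX = {"Alien": -0.47, "Inception": 0.47}
# Bob's known ratings.
BOB_RATINGS = {"Alien": 2, "Inception": 5}


def predict(similarities, user_ratings):
    """Similarity-weighted average of the user's ratings.

    Assumption: the weights are normalised by the sum of their absolute
    values. Returns None when no similar item has been rated.
    """
    common = [item for item in similarities if item in user_ratings]
    numerator = sum(similarities[i] * user_ratings[i] for i in common)
    denominator = sum(abs(similarities[i]) for i in common)
    if denominator == 0:
        return None
    return numerator / denominator


bob_matrix = predict(SIMILARITY_TO_MATRIX, BOB_RATINGS)
```

Worked by hand: (-0.47 × 2 + 0.47 × 5) / (0.47 + 0.47) = 1.41 / 0.94 = 1.5. The negative similarity to Alien and the positive similarity to Inception are equal in weight here, so the estimate lands below both of Bob's known ratings.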
  • 26. Mahout Commercial Use: Commercial use
  • 27. Mahout Resources: Mahout website – http://mahout.apache.org/; Introducing Apache Mahout – http://www.ibm.com/developerworks/java/library/j-mahout/; “Mahout In Action” by Sean Owen and Robin Anil
  • 28. Mahout Summary: ML is all over the web today; Mahout is about scalable machine learning; Mahout has functionality for many of today’s common machine learning tasks; MapReduce magic in action
  • 29. Mahout Summary: Thank you and good night

Editor's notes

  1. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers (2008). Apache Lucene(TM) is a high-performance, full-featured text search engine library (2005).