Practical Machine Learning
  A Tutorial on Apache Mahout


               Biju B
         NLP R&D Division
         365Media Pvt. Ltd.
         bijub@365Media.in

             FOSSMEET NITC,
                 Calicut


          4-6 February 2011




   Biju B & Jaganadh G   Practical Machine Learning
nlp r d $ whoweare




     Working in Natural Language Processing (NLP), Machine Learning, and
     Data Mining
     Passionate about Free and Open Source software :-)
     In free time, teach Python, blog at
     http://jaganadhg.freeflux.net/blog, and contribute to
     OpenStreetMap
     Work for 365Media Pvt. Ltd., Coimbatore, India
     Twitter handles: @jaganadhg, @bijub




Machine Learning




  Machine Learning
  Machine learning is a subfield of artificial intelligence (AI) concerned with
  algorithms that allow computers to learn.

      This talk is not intended as an introduction to Machine Learning
      Don't expect mathy equations here




Machine Learning and Our Life



     Do you think Machine Learning has any impact on our lives?
     Yes
     In our day-to-day life we use many Machine Learning powered
     tools
     Recommendation Engines
     Clustering
     Classification, Spam Filtering
     Sentiment Analysis
     Fraud Detection




Mahout



  Mahout
  Open source project of the Apache Software Foundation
  The goal of the project is to build scalable machine learning libraries




Mahout




  Mahout
  Mahout: a person who drives an elephant ;-)
  The name comes from the project’s use of Apache Hadoop, whose logo is
  an elephant.




Why a new library ?



  There are more than 30 Java libraries and tools available for Machine
  Learning:
  Weka, Mallet, Classifier4J, RapidMiner, ...
      Processing large amounts of data is not an easy task
      Machine Learning tools are expected to produce results quickly
      If the amount of data is too large, it is not easy to process on a
      single machine (even a powerful one)
      Mahout is scalable: the core algorithms in Mahout are implemented
      on top of Apache Hadoop using the map/reduce paradigm
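
The map/reduce paradigm named above can be illustrated without Hadoop. What follows is a minimal single-process sketch in plain Python, not Mahout or Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the toy records are assumptions made up for illustration. It counts how many users rated each item, the kind of per-key aggregation that Hadoop would spread across many machines.

```python
from collections import defaultdict

# Toy input records: (user, item) pairs. Hypothetical data for illustration.
RECORDS = [
    ("alice", "book_a"), ("alice", "book_b"),
    ("bob", "book_a"), ("carol", "book_a"), ("carol", "book_c"),
]


def map_phase(record):
    """Map step: emit a (key, value) pair for each input record."""
    _user, item = record
    yield item, 1


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values that share a key."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(RECORDS))
```

Because each map call looks at one record and each reduce call looks at one key, the work can in principle be split across machines, which is the property the slide refers to as scalability.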




Algorithms in Apache Mahout



     Collaborative Filtering
     User- and item-based recommenders
     K-Means, Fuzzy K-Means clustering
     Mean Shift clustering
     Dirichlet process clustering
     Latent Dirichlet Allocation
     Singular value decomposition
     Parallel Frequent Pattern mining
     Complementary Naive Bayes classifier
     Random forest (decision-tree-based) classifier




Recommendation




    Filter information based on user preferences
    Search a large set of people to find a smaller set with tastes
    similar to yours
    e.g. Amazon’s book recommendations, Netflix’s movie
    recommendations
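
The idea of finding a smaller set of people with similar tastes can be sketched as a tiny user-based recommender. This is plain Python written for illustration, not Mahout's API: the ratings, the function names, and the choice of cosine similarity over co-rated items are all assumptions. It first ranks other users by similarity, then scores items the target user has not rated by the similarity-weighted average of the neighbours' ratings.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}.
RATINGS = {
    "alice": {"book_a": 5.0, "book_b": 3.0, "book_c": 4.0},
    "bob": {"book_a": 5.0, "book_b": 3.0, "book_d": 5.0},
    "carol": {"book_a": 1.0, "book_b": 5.0, "book_e": 4.0},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, neighbours=2):
    """Rank unrated items by similarity-weighted neighbour ratings."""
    own = ratings[user]
    # Step 1: keep the most similar users (the "smaller set" of people).
    nearest = sorted(
        ((cosine_similarity(own, other), name)
         for name, other in ratings.items() if name != user),
        reverse=True,
    )[:neighbours]
    # Step 2: accumulate weighted ratings for items the user has not rated.
    totals, weights = {}, {}
    for sim, name in nearest:
        if sim <= 0:
            continue
        for item, rating in ratings[name].items():
            if item in own:
                continue
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    scored = {item: totals[item] / weights[item] for item in totals}
    return sorted(scored, key=scored.get, reverse=True)


if __name__ == "__main__":
    print(recommend("alice", RATINGS))
```

In this toy data "bob" agrees with "alice" on every shared book, so the book only he has rated is ranked first for her.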




Document Classification




     Classify documents based on their content
     e.g. spam filtering, priority inbox
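
As a rough illustration of content-based classification such as spam filtering, here is a small multinomial Naive Bayes classifier in plain Python. It is a sketch under stated assumptions, not Mahout's implementation: the training messages, the labels, and the function names are hypothetical, and it uses standard Naive Bayes with add-one smoothing rather than the Complementary Naive Bayes variant listed earlier.

```python
from collections import Counter, defaultdict
from math import log

# Hypothetical training set: (label, text) pairs.
TRAINING = [
    ("spam", "win money now"),
    ("spam", "cheap money offer"),
    ("ham", "meeting agenda for monday"),
    ("ham", "project meeting notes"),
]


def train(examples):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for label, text in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, doc_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        score = log(doc_counts[label] / total_docs)  # log prior
        label_total = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so an unseen word does not zero the score.
            count = word_counts[label][word] + 1
            score += log(count / (label_total + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train(TRAINING)
    print(classify("cheap money", model))
    print(classify("project meeting", model))
```

Working in log space avoids multiplying many small probabilities together, which matters once documents are longer than a few words.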




Demo


       Building recommendation engines with Mahout
       Document Classification with Mahout




Reference


     Mahout in Action - Sean Owen and Robin Anil, Manning
     Publications.
     Taming Text - Grant Ingersoll and Tom Morton, Manning
     Publications.
     Introducing Apache Mahout - Grant Ingersoll. An introduction to
     Apache Mahout focused on clustering, classification and
     collaborative filtering.
     https://www.ibm.com/developerworks/java/library/j-mahout/index.html
     Programming Collective Intelligence: Building Smart Web 2.0
     Applications
     http://www.amazon.com/Programming-Collective-Intelligence-Building-Applications/dp/0596529325




Useful Resources




     Apache Mahout site: http://mahout.apache.org/
     Apache Mahout mailing list: user@mahout.apache.org
     The code used for the Mahout demo is available at
     http://bitbucket.org/jaganadhg/blog/src/tip/bck9/java/
     20 Newsgroups data set:
     http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz




Questions ??




Acknowledgments



  Thanks to:
      Manning Publications for a review copy of the book "Mahout in
      Action"
      Apache Mahout mailing list members
      Ted Dunning and Robin Anil for suggestions
      @chelakkandupoda for review and criticism
      Mukundhanchari, R&D Director, 365Media Pvt. Ltd., for support and
      encouragement




Finally




Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

Practical Machine Learning with Apache Mahout

  • 1. Practical Machine Learning A Tutorial on Apache Mahout Biju B NLP R&D Division 365Media Pvt. Ltd. bijub@365Media.in FOSSMEET NITC, Calicut 4-6 February 2011 Biju B & Jaganadh G Practical Machine Learning
  • 2. nlp r d $ whoweare: Working in Natural Language Processing (NLP), Machine Learning, and Data Mining. Passionate about Free and Open Source :-) Teaches Python in free time, blogs at http://jaganadhg.freeflux.net/blog, and contributes to OpenStreetMap. Works for 365Media Pvt. Ltd., Coimbatore, India. Twitter handles: @jaganadhg, @bijub
  • 6. Machine Learning: Machine learning is a subfield of artificial intelligence (AI) concerned with algorithms that allow computers to learn. This talk is not intended as an introduction to machine learning. Don't expect mathy equations here.
  • 14. Machine Learning and Our Life: Do you think that machine learning has any impact on our lives? Yes. In our day-to-day life we may use many tools powered by machine learning: recommendation engines; clustering; classification and spam filtering; sentiment analysis; fraud detection.
  • 15. Mahout: An open source project of the Apache Software Foundation. The goal of the project is to build scalable machine learning libraries.
  • 16. Mahout: A mahout is a person who drives an elephant ;-) The name comes from the project's use of Apache Hadoop, whose logo is an elephant.
  • 17. Why a new library? There are more than 30 Java libraries and tools available for machine learning, e.g. Weka, Mallet, Classifier4J, RapidMiner. Processing large amounts of data is not an easy task, yet machine learning tools are expected to produce quick results. If the amount of data is too large, it is not easy to process on a single machine, even a powerful one. Mahout is scalable: the core algorithms in Mahout are implemented on top of Apache Hadoop using the map/reduce paradigm.
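The map/reduce paradigm mentioned on slide 17 can be illustrated with a word count. This is a minimal single-machine sketch in plain Python, not Hadoop or Mahout code; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. On a real cluster the map and reduce calls would run in parallel on different machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["spam ham spam", "ham eggs"]))
```

Because each `map_phase` call looks at one document and each `reduce_phase` call looks at one key, the work splits naturally across machines, which is the property that makes the approach scale.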
  • 28. Algorithms in Apache Mahout: collaborative filtering; user- and item-based recommenders; K-Means and Fuzzy K-Means clustering; Mean Shift clustering; Dirichlet process clustering; Latent Dirichlet Allocation; singular value decomposition; parallel frequent pattern mining; Complementary Naive Bayes classifier; random forest decision tree based classifier.
  • 29. Recommendation: Filter information based on user preferences by searching a large set of people and finding a smaller set with tastes similar to yours. E.g. Amazon's book recommendations, Netflix movie recommendations.
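The idea on slide 29 (find a small set of people with similar tastes, then suggest what they liked) can be sketched as a user-based recommender. This is an illustrative plain-Python sketch, not the Mahout API: the `RATINGS` data, the choice of cosine similarity over shared items, and the names `cosine_similarity` and `recommend` are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}
RATINGS = {
    "alice": {"book_a": 5.0, "book_b": 3.0, "book_c": 4.0},
    "bob": {"book_a": 5.0, "book_b": 3.0, "book_d": 5.0},
    "carol": {"book_a": 1.0, "book_b": 5.0, "book_e": 4.0},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, k=2):
    """Score unseen items by the similarity-weighted ratings of the k nearest users."""
    target = ratings[user]
    neighbours = sorted(
        ((cosine_similarity(target, prefs), other)
         for other, prefs in ratings.items() if other != user),
        reverse=True,
    )[:k]
    totals, weights = {}, {}
    for sim, other in neighbours:
        if sim <= 0:
            continue
        for item, rating in ratings[other].items():
            if item in target:
                continue  # skip items the user has already rated
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    scored = [(item, totals[item] / weights[item]) for item in totals]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for item, score in recommend("alice", RATINGS):
        print(item, round(score, 2))
```

In this toy data "bob" rates the shared books exactly as "alice" does, so his unseen pick ranks first. An item-based recommender would instead compare items by the users who rated them.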
  • 30. Document Classification: Classify documents based on their content, e.g. spam filtering, priority inbox.
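Content-based classification as described on slide 30 can be sketched with a small Naive Bayes spam filter. Note the assumptions: this is a plain multinomial Naive Bayes with add-one smoothing written in Python, not Mahout's Complementary Naive Bayes implementation, and the training sentences and the names `train` and `classify` are made up for illustration.

```python
from collections import Counter
from math import log

# Hypothetical training data: (text, label)
TRAINING = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project status meeting notes", "ham"),
]


def train(labeled_docs):
    """Count words per label, documents per label, and collect the vocabulary."""
    word_counts, doc_counts, vocab = {}, Counter(), set()
    for text, label in labeled_docs:
        tokens = text.lower().split()
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        score = log(doc_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in text.lower().split():
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train(TRAINING)
    print(classify("free money prize", model))
    print(classify("monday meeting agenda", model))
```

A real filter would need far more training data and better tokenization. The 20 Newsgroups data set listed under Useful Resources is a common corpus for this kind of experiment.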
  • 31. Demo: Building recommendation engines with Mahout; document classification with Mahout.
  • 33. Reference: Mahout in Action, by Sean Owen and Robin Anil, Manning Publications. Taming Text, by Grant Ingersoll and Tom Morton, Manning Publications. Introducing Apache Mahout, by Grant Ingersoll, an intro to Apache Mahout focused on clustering, classification and collaborative filtering: https://www.ibm.com/developerworks/java/library/j-mahout/index.html. Programming Collective Intelligence: Building Smart Web 2.0 Applications: http://www.amazon.com/Programming-Collective-Intelligence-Building-Applications/dp/0596529325
  • 34. Useful Resources: Apache Mahout site: http://mahout.apache.org/. Apache Mahout mailing list: user@mahout.apache.org. The code I used for the Mahout demo is available at http://bitbucket.org/jaganadhg/blog/src/tip/bck9/java/. Twenty Newsgroups data set: http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz
  • 35. Questions?
  • 36. Acknowledgments: Thanks to Manning Publications for a review copy of the book "Mahout in Action"; Apache Mahout mailing list members Ted Dunning and Robin Anil for suggestions; @chelakkandupoda for review and criticism; Mukundhanchari, R&D Director, 365Media Pvt. Ltd., for support and encouragement.
  • 37. Finally