MapReduce for Machine Learning
by Pranya Prabhakar
S4 MCA
05
CONTENTS
• Introduction
• Machine Learning
• MapReduce
• ML on MapReduce
• Apache Mahout and its installation steps
• Conclusion
Introduction
• Data volumes are increasing rapidly
• It is necessary to process and analyze the data
• Machines are used to analyze the data the way a human being would
…Different
Machine Learning
• Supervised Learning:
  Generates a function, based on labeled examples, that maps inputs to desired outputs.
• Unsupervised Learning:
  Looks for patterns native to a dataset and models them, for example by clustering (e.g. data mining and knowledge discovery).
• Reinforcement Learning:
  Learns how to act given reward (or punishment) from the world.
Types of problems
• Classification
  Data is labeled, meaning each item has been assigned a class.
  - Learn a model from manually classified data
  - Predict the class of a new object based on its features and the learned model
  e.g. spam/non-spam, fraud/non-fraud
• Clustering
  Data is not labeled, but can be divided into groups based on similarity.
  - Group similar-looking objects
  - Notion of similarity: a distance measure
  e.g. organizing pictures by faces without names
• Regression
  Data is labeled with a real value rather than a class.
  e.g. time series data such as the price of a stock over time
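The "distance measure" mentioned under clustering can be made concrete with a small sketch. Euclidean distance is used here purely as an assumption for illustration; the slides do not say which measure is meant, and the helper names `euclidean_distance` and `nearest` are hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(point, candidates):
    """Return the candidate closest to `point` under the distance above."""
    return min(candidates, key=lambda c: euclidean_distance(point, c))

# Two hypothetical group centers; a new object joins the closer one.
centers = [(1.0, 1.0), (5.0, 5.0)]
print(euclidean_distance((0.0, 0.0), (3.0, 4.0)))
print(nearest((1.5, 0.5), centers))
```

A smaller distance means the two objects are treated as more similar, which is what lets a clustering algorithm group unlabeled data.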
Supervised Learning Algorithms
• Decision Trees
• k-Nearest Neighbours
• Naive Bayes
• Logistic Regression
• Perceptron and Multi-layer Perceptrons
• Neural Networks
• SVM and Kernel estimation
Unsupervised Learning Algorithms
• Clustering
  ◦ k-Means, MinHash, Hierarchical Clustering
• Hidden Markov Models
• Feature Extraction methods
• Self-organizing Maps (Neural Nets)
Uses
• Spam filtering
• Credit card fraud detection
• Face recognition (computer vision)
• Speech understanding
• Medical diagnosis
and so on…
Current state of ML libraries
• Lack scalability
• Lack documentation and examples
• Lack Apache licensing
• Are not well tested
• Are research oriented
• Are not built on existing production-quality libraries
• Lack “deployability”
MapReduce
• A programming framework
• Used for parallel processing over large data sets
• An application is divided into small fragments of work, which are distributed across the cluster
• The computation unit of Hadoop
• Two functions: Map() and Reduce()
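The two functions can be illustrated with a word count, sketched here in plain Python running in a single process. This is not Hadoop code: `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical names, and the grouping step stands in for the shuffle that a real cluster performs between the map and reduce phases.

```python
from collections import defaultdict

def map_fn(line):
    """Map(): emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce(): combine all values emitted for one key."""
    return word, sum(counts)

def run_mapreduce(lines):
    # Shuffle: group mapper output by key, as the framework would do
    # between the map and reduce phases.
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_fn(line):
            groups[key].append(value)
    # Each key is reduced independently, so this loop could run in parallel.
    return dict(reduce_fn(key, values) for key, values in groups.items())

result = run_mapreduce(["big data big cluster", "data cluster data"])
print(result)
```

Because each input line is mapped independently and each key is reduced independently, both phases can be spread across many machines, which is the property the bullets above describe.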
Apache Mahout
• The starting place for MapReduce-based machine learning
• A disparate collection of algorithms for
  ◦ Recommendation
  ◦ Clustering
  ◦ Classification
  ◦ Frequent itemset mining
Mahout installation
• Prerequisites
Java
Hadoop
Maven
• Java installation
1. sudo apt-get install sun java jdk
2. sudo gedit .bashrc
Set JAVA_HOME in the .bashrc file
• Installation of Maven
1. sudo apt-get install maven2
2. Open .bashrc and add the lines
############## Apache-Maven #########
export M2_HOME=/usr/local/apache-maven-3.0.4
export M2=$M2_HOME/bin
export PATH=$M2:$PATH
export JAVA_HOME=$HOME/programs/jdk
Mahout installation (contd.)
• Run mvn --version to verify that Maven is correctly installed.
• Hadoop installation
Set up a single-node Hadoop cluster, following the same approach as the Java installation.
• Installation of Mahout
1. Download Mahout from http://www.apache.org/dyn/closer.cgi/lucene/mahout/
2. Create a folder and move the downloaded file into it, e.g. mkdir /usr/local/mahout
3. Run mvn install (the slide shows the resulting build output).
Example using the 20 Newsgroups dataset
Applications of Mahout
• Collaborative Filtering
  ◦ Matrix factorization based recommenders
  ◦ A user-based recommender
• Clustering
  ◦ Canopy Clustering
  ◦ K-Means Clustering
  ◦ Fuzzy K-Means
  ◦ Affinity Propagation Clustering
• Classification
  ◦ Naive Bayes
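K-Means, listed above, is a convenient example of how a learning algorithm can be phrased as a map step and a reduce step. The sketch below runs one iteration in plain Python under simplifying assumptions: it is not Mahout's implementation, squared Euclidean distance is assumed, and `assign_map`, `mean_reduce`, and `kmeans_step` are hypothetical names.

```python
from collections import defaultdict

def assign_map(point, centroids):
    """Map(): emit (index of the nearest centroid, point)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, centroid))
             for centroid in centroids]
    return dists.index(min(dists)), point

def mean_reduce(points):
    """Reduce(): average all points assigned to one centroid."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans_step(points, centroids):
    """One K-Means iteration: map every point, group by key, reduce per key."""
    groups = defaultdict(list)
    for point in points:
        idx, p = assign_map(point, centroids)
        groups[idx].append(p)
    # Keep the old centroid if no point was assigned to it.
    return [mean_reduce(groups[i]) if groups[i] else centroids[i]
            for i in range(len(centroids))]

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (12.0, 10.0)]
centroids = [(1.0, 1.0), (9.0, 9.0)]
print(kmeans_step(points, centroids))
```

In a distributed setting the iteration would be repeated as successive jobs until the centroids stop moving; only the small list of centroids has to be shared with every mapper.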
Conclusion
• Using the MapReduce framework, a wide range of machine learning algorithms can be parallelized, and Apache Mahout provides a platform for machine learning in the MapReduce paradigm.
