Random forest
• Random forest is an ensemble classifier built from many decision tree models
• Can be used for classification and regression
• Accuracy and variable importance information are provided with the result
• A random forest is a collection of unpruned CART-like trees following specific rules for
  • Tree growing
  • Tree combination
  • Self-testing
  • Post-processing
• Trees are grown using binary partitioning
• Each tree is similar to an ordinary decision tree, with a few differences
  • At each split point, the search is not over all variables but only over a random subset of them
  • No pruning is necessary. Trees can be grown until each node contains only a few observations
• Advantages over a single decision tree
  • Better prediction (in general)
  • Little parameter tuning is needed with RF; default settings usually work well
• Terminology
  • Training set size (N)
  • Total number of attributes (M)
  • Number of attributes considered at each split (m)
  • Total number of trees (n)
• For each tree, a bootstrap sample is drawn at random, with replacement, from the training dataset (the draw can be stratified to maintain the class distribution)
• On this sample, a random subset of the original attributes is chosen, with its size set by the user. Not all input variables are considered, because that would be computationally expensive and would make the trees more alike, raising the chance of overfitting
• Where M is the total number of input attributes in the dataset, only m attributes are chosen at random at each split, with m < M
• The attribute from this subset that gives the best split according to the Gini index is used to split the node. This process repeats for each branch until the termination condition is met, i.e. the leaves are nodes that are too small to split
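The tree-growing steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a full random forest: the function names (`gini`, `bootstrap_sample`, `best_split`), the toy data and the use of simple "less than or equal" thresholds on numeric attributes are all hypothetical choices made for the example.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random, with replacement."""
    return [rng.choice(rows) for _ in rows]

def best_split(rows, m, rng):
    """Search m randomly chosen attributes for the split with the lowest
    weighted Gini impurity. Returns (impurity, attribute_index, threshold),
    or None when no attribute separates the rows."""
    n_attrs = len(rows[0][0])
    candidates = rng.sample(range(n_attrs), m)  # the random subset, m <= M
    best = None
    for j in candidates:
        for threshold in sorted({x[j] for x, _ in rows}):
            left = [y for x, y in rows if x[j] <= threshold]
            right = [y for x, y in rows if x[j] > threshold]
            if not left or not right:
                continue  # not a real split
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, j, threshold)
    return best

# Hypothetical toy data: (attribute tuple, class label), so M = 3.
data = [((1.0, 5.0, 0.2), "a"), ((2.0, 4.0, 0.1), "a"),
        ((8.0, 5.0, 0.3), "b"), ((9.0, 4.0, 0.2), "b")]
rng = random.Random(0)
sample = bootstrap_sample(data, rng)    # one bootstrap sample for one tree
split = best_split(sample, m=2, rng=rng)  # one node split using m = 2 attributes
```

A full tree would call `best_split` recursively on the left and right partitions until the nodes are too small to split, and a forest would repeat the whole procedure n times with fresh bootstrap samples.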
• Information from random forest
  • Classification accuracy
  • Variable importance
  • Outliers (classification)
  • Missing data estimation
  • Error rates for the random forest object
• Advantages
  • No need for pruning trees
  • Accuracy and variable importance are generated automatically
  • Overfitting is rarely a problem: adding more trees does not make the forest overfit
  • Not very sensitive to outliers in the training data
  • Easy to set parameters
• Limitations
  • Regression cannot predict beyond the range of the target values in the training data
  • Extreme values are not predicted accurately
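The range limitation follows from how tree leaves predict: each leaf returns the mean of the training targets that fall into it, and a forest averages such leaves. A minimal sketch with a single one-split regression tree shows the effect; `fit_stump` and the toy data are hypothetical, chosen only for illustration.

```python
def fit_stump(xs, ys):
    """Fit a one-split regression tree. The threshold minimising squared
    error is chosen, and each leaf predicts the mean target of its samples."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest x so the right leaf is never empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        sse = (sum((y - mean_left) ** 2 for y in left)
               + sum((y - mean_right) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, mean_left, mean_right)
    _, t, mean_left, mean_right = best
    return lambda x: mean_left if x <= t else mean_right

xs = list(range(10))          # training inputs 0..9
ys = [2.0 * x for x in xs]    # training targets 0..18, following y = 2x
predict = fit_stump(xs, ys)
far = predict(100)            # the true trend would give 200
```

Here `far` is a leaf mean, so it stays inside the training range of 0 to 18 even though the underlying trend keeps growing. Averaging many such trees cannot escape that range either.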
• Applications
  • Classification
    • Land cover classification
    • Cloud screening
  • Regression
    • Continuous field mapping
    • Biomass mapping
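• Efficient use of multi-core technology
  • Trees are grown independently of each other, so they can be built in parallel
  • Multi-core behaviour is OS dependent, but a framework such as Hadoop can help distribute the work efficiently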
• Winnow is a machine learning technique for learning a linear classifier from labelled examples
• Similar to the perceptron algorithm
• While the perceptron algorithm uses an additive weight-update scheme, Winnow uses a multiplicative weight-update scheme
• Performs well when many of the features given to the learner turn out to be irrelevant
• During training, it is shown a sequence of positive and negative examples. From these it learns a decision hyperplane that can be used to label novel examples as positive or negative
• Uses a linear threshold function (like the perceptron training algorithm) as its hypothesis and performs incremental updates to its current hypothesis
• Initialize the weights w1, …, wn to 1
• Winnow and the perceptron algorithm use the same classification scheme
• Winnow differs from the perceptron algorithm in its update scheme
  • When misclassifying a positive training example x (i.e. the prediction was negative because w·x was too small), the weights of the features with xi = 1 are increased
  • When misclassifying a negative training example x (i.e. the prediction was positive because w·x was too large), the weights of the features with xi = 1 are decreased
• Spam example: each email is a Boolean vector indicating which phrases appear and which do not
• An email is spam if at least one of the phrases in the set S is present
• Initialize the weights w1, …, wn to 1 on the n variables
• Given an example x = (x1, …, xn), output 1 if w1x1 + … + wnxn ≥ θ, where θ is the threshold (commonly θ = n)
• Otherwise output 0
• If the algorithm makes a mistake:
  • On a positive example (it predicts 0 when f(x) = 1): for each xi equal to 1, double the value of wi
  • On a negative example (it predicts 1 when f(x) = 0): for each xi equal to 1, cut the value of wi in half
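The procedure above is short enough to sketch directly. The following is a minimal illustration, assuming a threshold of n (the slides do not preserve the threshold, so this value is an assumption) and repeated passes over a fixed example set. The names `winnow_predict` and `winnow_train`, the `epochs` parameter and the toy target "x0 OR x2" are hypothetical.

```python
from itertools import product

def winnow_predict(w, x, theta):
    """Linear threshold function: output 1 if w . x >= theta, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

def winnow_train(examples, n, theta=None, epochs=50):
    """Train Winnow on (x, label) pairs with Boolean features.
    theta defaults to n, which is an assumption of this sketch."""
    theta = n if theta is None else theta
    w = [1.0] * n  # initialize all weights to 1
    for _ in range(epochs):
        mistakes = 0
        for x, label in examples:
            if winnow_predict(w, x, theta) == label:
                continue
            mistakes += 1
            # Multiplicative update: double on a missed positive,
            # halve on a false positive, only where xi == 1.
            factor = 2.0 if label == 1 else 0.5
            w = [wi * factor if xi == 1 else wi for wi, xi in zip(w, x)]
        if mistakes == 0:
            break  # a full pass without mistakes
    return w

# Hypothetical target: an email is spam if phrase 0 or phrase 2 is present.
n = 4
examples = [(x, 1 if (x[0] or x[2]) else 0) for x in product([0, 1], repeat=n)]
w = winnow_train(examples, n)
predictions = [winnow_predict(w, x, n) for x, _ in examples]
```

After training, the weights of the two relevant phrases end up larger than those of the irrelevant ones, which is the behaviour that makes Winnow attractive when most features are irrelevant.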
• The principle of maximum entropy states that, subject to precisely stated prior data, the probability distribution that best represents the current state of knowledge is the one with the largest entropy
• Commonly used in natural language processing, speech and information retrieval
• What is a maximum entropy classifier?
  • A probabilistic classifier that belongs to the class of exponential models
  • Does not assume that the features are conditionally independent of each other
  • Based on the principle of maximum entropy: from all the models that fit our training data, it selects the one with the largest entropy
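For reference, the exponential form that such a classifier takes is usually written as below. The symbols are not from the slides and are assumptions of this note: f_i(x, y) are feature functions of an input x and a candidate class y, and λ_i are their learned weights.

```latex
p(y \mid x) = \frac{\exp\left( \sum_i \lambda_i f_i(x, y) \right)}
                   {\sum_{y'} \exp\left( \sum_i \lambda_i f_i(x, y') \right)}
```

The denominator sums over all candidate classes y', so the probabilities for a given x add up to one.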
• A piece of information is testable if it can be determined whether a given distribution is consistent with it
  • For example, "the expectation of the variable x is 2.87" and "p2 + p3 > 0.6" are statements of testable information
• The maximum entropy procedure consists of seeking the probability distribution that maximizes information entropy, subject to the constraints of the information
• With no testable information, entropy maximization takes place under a single constraint: the sum of the probabilities must be one
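A small worked example may make the procedure concrete. It is a sketch under stated assumptions: the outcomes are the hypothetical values 1 to 6, the only testable information is a mean (2.87, borrowed from the statement above), and it relies on the standard result that the maximum entropy distribution under a mean constraint has the form p_i proportional to exp(λ·v_i). The function names and the bisection search for λ are illustrative choices.

```python
import math

def maxent_with_mean(values, target_mean, lo=-50.0, hi=50.0, iters=200):
    """Maximum entropy distribution over `values` whose mean is target_mean.
    Uses p_i proportional to exp(lam * v_i) and finds lam by bisection,
    since the mean increases monotonically with lam."""
    def dist(lam):
        exps = [lam * v for v in values]
        top = max(exps)  # subtract the maximum for numerical stability
        weights = [math.exp(e - top) for e in exps]
        z = sum(weights)
        return [wt / z for wt in weights]

    def mean(lam):
        return sum(p * v for p, v in zip(dist(lam), values))

    for _ in range(iters):
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return dist((lo + hi) / 2)

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

values = [1, 2, 3, 4, 5, 6]
uniform = maxent_with_mean(values, 3.5)   # mean of the uniform case
skewed = maxent_with_mean(values, 2.87)   # mean constraint from the text
```

With a target mean of 3.5 the constraint adds nothing beyond normalization, so the result is the uniform distribution. With a mean of 2.87 the probabilities tilt toward the smaller outcomes, and the entropy is lower than in the uniform case.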
• When to use maximum entropy?
  • Since it makes minimal assumptions, we use it when we know nothing about the prior distribution
  • Used when we cannot assume conditional independence of the features
• The principle of maximum entropy is commonly applied to inferential problems in two ways
  • Prior probabilities: it is often used to obtain a prior probability distribution for Bayesian inference
  • Maximum entropy models: it is involved in model specifications that are widely used in natural language processing, e.g. logistic regression
