ACTIVE LEARNING
ASSIGNMENT FOR THE
SUBJECT
“DATA MINING
&
BUSINESS INTELLIGENCE”
CART – Classification & Regression Trees
Guided By : -
Mitali Sonar
Prepared By :-
Hemant H. Chetwani
(130410107010 LY CE-II)
CART ??
Classification
And
Regression Trees
Classification
 Classification is a data mining technique that assigns each
data item to one of a set of groups.
 It maps the data into predefined groups or classes
and searches for new patterns.
 For example, you may wish to use classification to
predict whether the weather on a particular day will
be “sunny”, “rainy”, or “cloudy”.
Regression
 Used to predict a numeric value for an individual on the basis of
information gained from a previous sample of similar individuals.
 For example, a person who wants to save for the future can
estimate future savings from the current value and several past
values, using a linear regression formula to make the
prediction.
 It may also be used to model the effect of doses in
medicine or agriculture, the response of a customer to a mailing,
and the risk that a client will not pay back a loan
taken from the bank.
What is CART?
 Classification And Regression Trees
 Developed by Breiman, Friedman, Olshen and Stone in the early 1980s.
 Introduced tree-based modeling into the statistical mainstream, with a
rigorous approach that uses cross-validation to select the optimal
tree.
 One of many tree-based modeling techniques.
 CART -- the classic
 CHAID
 C5.0
 Software package variants (SAS, S-Plus, R…)
Philosophy
“Data analysis can be done from a number of different
viewpoints. Tree structured regression offers an interesting
alternative for looking at regression type problems. It has
sometimes given clues to data structure not apparent from a
linear regression analysis. Like any tool, its greatest benefit lies
in its intelligent and sensible application.”
--Breiman, Friedman, Olshen,
Stone
Working
When & What ?
 If the dependent variable is categorical, CART produces a
classification tree. If it is continuous, CART
produces a regression tree.
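The difference between the two tree types shows up most clearly at a terminal node. The sketch below is a minimal illustration, assuming the usual convention that a classification leaf predicts its most common class and a regression leaf predicts the mean of its target values; the function name `leaf_prediction` and the `categorical` flag are made up for this example.

```python
from collections import Counter
from statistics import mean


def leaf_prediction(targets, categorical):
    """Return the value a terminal node would predict.

    targets: target values of the training cases that reached the node.
    categorical: True for a classification tree, False for a regression tree.
    """
    if categorical:
        # Classification tree: predict the most frequent class in the node.
        return Counter(targets).most_common(1)[0][0]
    # Regression tree: predict the average target value in the node.
    return mean(targets)


weather_leaf = leaf_prediction(["sunny", "rainy", "sunny"], categorical=True)
savings_leaf = leaf_prediction([10.0, 20.0, 30.0], categorical=False)
```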
THE KEY IDEA
Recursive Partitioning
 Take all of your data.
 Consider all possible values of all variables.
 Select the variable/value (X=t1) that produces the greatest
“separation” in the target.
 (X=t1) is called a “split”.
 If (X < t1) then send the data point to the “left”; otherwise, send it
to the “right”.
 Now repeat the same process on these two “nodes”.
You get a “tree”.
Note: CART only uses binary splits.
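The recursion above can be sketched in a few lines. This is an illustrative toy, not the CART software: it assumes a continuous target, measures “separation” by the drop in the sum of squared errors, tries every observed value as a threshold, and stops when no split improves the fit. The names `sse`, `best_split`, `grow`, `predict` and the `min_size` parameter are hypothetical.

```python
def sse(ys):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(rows):
    """rows: list of (features_tuple, y). Return (sse_after, var, t) or None."""
    best = None
    for j in range(len(rows[0][0])):              # every variable
        for t in sorted({x[j] for x, _ in rows}):  # every observed value
            left = [y for x, y in rows if x[j] < t]
            right = [y for x, y in rows if x[j] >= t]
            if not left or not right:
                continue                           # not a real split
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def grow(rows, min_size=2):
    """Recursively partition rows into a binary tree of nested dicts."""
    ys = [y for _, y in rows]
    split = best_split(rows) if len(rows) >= min_size else None
    if split is None or split[0] >= sse(ys):       # no improvement: leaf
        return {"leaf": sum(ys) / len(ys)}
    _, j, t = split
    return {
        "var": j,
        "t": t,
        "left": grow([r for r in rows if r[0][j] < t], min_size),    # X < t
        "right": grow([r for r in rows if r[0][j] >= t], min_size),  # otherwise
    }


def predict(tree, x):
    """Follow the splits from the root down to a leaf."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["var"]] < tree["t"] else tree["right"]
    return tree["leaf"]


# Made-up data: one variable, two clearly separated groups.
data = [((1.0,), 10.0), ((2.0,), 10.0), ((8.0,), 50.0), ((9.0,), 50.0)]
tree = grow(data)
```

Every internal node has exactly two children, which mirrors the note that CART only uses binary splits.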
CART GENERATION
STEPS
STEP 1
 Starting with the first variable, CART splits the variable at all of
its possible split points. At each possible split point of the
variable, the sample splits into two child nodes.
 Cases with a “yes” response to the question posed are sent
to the left node and the “no” responses are sent to the right
node.
 It is also possible to define these splits based on linear
combinations of variables.
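As a rough sketch of Step 1 for a single numeric variable: the helper below lists candidate split points and sends each case left or right. Placing candidates at the midpoints between consecutive distinct values is an assumption made here (a common convention), and the names `candidate_splits` and `partition` are hypothetical.

```python
def candidate_splits(values):
    """Midpoints between consecutive distinct values of one variable."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]


def partition(rows, var, t):
    """Ask "is rows[var] < t?" of each case: yes goes left, no goes right."""
    yes = [r for r in rows if r[var] < t]
    no = [r for r in rows if r[var] >= t]
    return yes, no


splits = candidate_splits([3, 1, 2, 2])
left, right = partition([{"x": 1}, {"x": 3}], "x", 2.0)
```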
STEP 2
 CART then applies its goodness-of-split criterion to each split
point and evaluates the reduction in impurity, or
heterogeneity, due to the split.
 This is based on the “split criterion”, which works in the
following fashion:
Suppose the dependent variable is categorical, taking on
the values 1 and 2.
The probabilities of these two classes at a given
node t are p(1|t) and p(2|t), respectively.
STEP 2
 A measure of heterogeneity, or impurity, at node t, written i(t), is a
function of these probabilities:
i(t) = N ( p(1|t), p(2|t) )
where N is a generic function.
 In the case of categorical dependent variables, CART allows
for a number of specifications of this function.
 The objective is to maximize the reduction in the degree of
heterogeneity, i(t).
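Two widely used choices for the impurity function are the Gini index and entropy. The slides do not say which one is meant, so the sketch below simply shows both for the two-class case, taking p(1|t) and p(2|t) as inputs; the function names are illustrative.

```python
import math


def gini(p1, p2):
    """Gini index: 0 for a pure node, 0.5 for an even two-class mix."""
    return 1.0 - p1 ** 2 - p2 ** 2


def entropy(p1, p2):
    """Entropy in bits: 0 for a pure node, 1 for an even two-class mix."""
    return -sum(p * math.log2(p) for p in (p1, p2) if p > 0)


mixed_gini = gini(0.5, 0.5)
pure_gini = gini(1.0, 0.0)
mixed_entropy = entropy(0.5, 0.5)
```

Both functions are largest when the two classes are equally likely and fall to zero when the node contains a single class, which is the behaviour an impurity measure needs.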
STEPS 3, 4 & 5
 It selects the best split on the variable as that split for which
reduction in impurity is the highest, as described in step 2.
 Steps 1-3 are repeated for each of the remaining variables at
the root node. CART then ranks all the “best” splits on each
variable according to the reduction in impurity achieved by
each split.
 It selects the variable and split point that most reduce the
impurity of the root or parent node.
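Steps 3 to 5 can be sketched as follows, assuming the Gini index as the impurity measure and observed values as candidate thresholds (both are assumptions for illustration). The reduction for a split is the parent impurity minus the size-weighted impurity of the two children. All function names and the toy data are made up.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def reduction(parent, left, right):
    """Parent impurity minus the weighted impurity of the two children."""
    n = len(parent)
    return (gini(parent)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))


def best_split_for(rows, labels, var):
    """Best (reduction, threshold) for one variable; (0.0, None) if none helps."""
    best = (0.0, None)
    for t in sorted({r[var] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[var] < t]
        right = [y for r, y in zip(rows, labels) if r[var] >= t]
        if left and right:
            cand = (reduction(labels, left, right), t)
            best = max(best, cand, key=lambda b: b[0])
    return best


def rank_variables(rows, labels):
    """List (reduction, threshold, variable), largest reduction first."""
    ranked = [(*best_split_for(rows, labels, v), v) for v in rows[0]]
    return sorted(ranked, key=lambda b: b[0], reverse=True)


rows = [{"a": 1, "b": 5}, {"a": 2, "b": 1}, {"a": 3, "b": 6}, {"a": 4, "b": 2}]
labels = ["n", "n", "y", "y"]
ranking = rank_variables(rows, labels)
```

In this toy data, variable `a` separates the two classes perfectly, so it ranks first and would be chosen for the root split.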
STEPS 6 & 7
 CART then assigns classes to these nodes according to a rule
that minimizes misclassification costs. Although all
classification tree procedures will generate some errors, there
are algorithms within CART designed to minimize these.
 Steps 1-6 are repeatedly applied to each non-terminal child
node at each of the successive stages.
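One way to read the class-assignment rule in Step 6: for each candidate class, add up the cost of every case in the node that would be misclassified, then pick the cheapest class. The sketch below assumes a cost table keyed by (true class, predicted class); the table layout, the function name and the numbers are hypothetical.

```python
def assign_class(class_counts, cost):
    """Pick the class label with the lowest total misclassification cost.

    class_counts: {class: number of cases of that class in the node}
    cost: {(true_class, predicted_class): cost of that error}
    """
    def total_cost(predicted):
        return sum(
            n * cost.get((true, predicted), 0.0)
            for true, n in class_counts.items()
            if true != predicted
        )

    return min(class_counts, key=total_cost)


node = {"owner": 3, "not": 7}
equal_costs = {("owner", "not"): 1.0, ("not", "owner"): 1.0}
# Hypothetical: missing an owner is five times as costly as a false alarm.
unequal_costs = {("owner", "not"): 5.0, ("not", "owner"): 1.0}

label_equal = assign_class(node, equal_costs)
label_unequal = assign_class(node, unequal_costs)
```

With equal costs this reduces to majority vote; with unequal costs a minority class can win the node.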
STEP 8
 CART continues the splitting process and builds a large tree.
The large tree can be achieved if the splitting process
continues until every observation constitutes a terminal node.
 Obviously, such a tree will have a large number of terminal
nodes that are either pure or very small in content.
 Having generated a large tree, CART then prunes the result
using cross-validation and creates a sequence of nested
trees. This also produces a cross-validation error rate for each
tree, from which the optimal tree is selected.
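The final selection can be pictured as a lookup over the pruned sequence. The sketch below assumes the simplest rule, lowest cross-validation error with ties going to the smaller tree; the error rates are invented purely for illustration and the function name is hypothetical.

```python
def select_optimal(cv_results):
    """cv_results: list of (n_terminal_nodes, cv_error_rate), one per nested tree.

    Returns the entry with the lowest error; on a tie, the smaller tree wins.
    """
    return min(cv_results, key=lambda tree: (tree[1], tree[0]))


# Made-up sequence: error falls as the tree grows, then rises from overfitting.
sequence = [(1, 0.40), (3, 0.25), (5, 0.20), (9, 0.20), (15, 0.28)]
chosen = select_optimal(sequence)
```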
Simple Example
 Goal: Classify a record as “is owner” or “not”
 Rule might be: if lot size < 19 and income > 84.75, then class =
“owner”.
 Recursive partitioning
Repeatedly split the records into two parts so as to achieve
maximum homogeneity within the new parts
 Pruning the tree
Simplify the tree by pruning peripheral branches to avoid overfitting.
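Read as code, the example rule is a pair of nested threshold tests. The sketch below encodes only the single rule quoted above; labelling every other record “not” is an assumption, since a full tree would carry several such rules.

```python
def classify(lot_size, income):
    """Apply the one example rule; all other records default to "not"."""
    if lot_size < 19 and income > 84.75:
        return "owner"
    return "not"  # assumption: the slide gives no rule for the other regions


first = classify(lot_size=18.0, income=90.0)
second = classify(lot_size=20.0, income=90.0)
```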
Impurity
 Obtain overall impurity measure (weighted avg. of individual
rectangles).
 At each successive stage, compare this measure across all
possible splits in all variables.
 Choose the split that reduces impurity the most.
 Chosen split points become nodes on the tree.
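A small worked example of the weighted average: each rectangle contributes its own impurity in proportion to the number of records it holds. The sketch assumes the Gini index and represents each rectangle by its per-class record counts; the counts and function names are made up.

```python
def gini_from_counts(counts):
    """Gini impurity of one rectangle, given its per-class record counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def overall_impurity(rectangles):
    """Average of the rectangle impurities, weighted by rectangle size."""
    total = sum(sum(c) for c in rectangles)
    return sum(sum(c) / total * gini_from_counts(c) for c in rectangles)


before = overall_impurity([(8, 4)])          # one rectangle: 8 vs 4 records
after = overall_impurity([(6, 0), (2, 4)])   # same records after one split
```

Here the split lowers the overall impurity from 4/9 to 2/9, so it would be preferred over any candidate split that lowers it less.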
First Split – The Tree
Tree after three splits
Tree after all splits
Summary
 Classification and Regression Trees are an easily
understandable and transparent method for predicting or
classifying new records.
 A tree is a graphical representation of a set of rules.
 Trees must be pruned to avoid over-fitting of the training
data.
 As trees do not make any assumptions about the data
structure, they usually require large samples.