ML Decision Tree_2.pptx

BCSE209L Machine Learning
Module: Decision Trees
Dr. R. Jothi
Decision Tree Induction Algorithms
■ ID3
  ■ Can handle both numerical and categorical features
  ■ Feature selection – Entropy
■ CART (continuous features and continuous label)
  ■ Can handle both numerical and categorical features
  ■ Feature selection – Gini
  ■ Generally used for both regression and classification
Measure of Impurity: GINI
• The Gini index is the probability that a randomly chosen example from a node would be classified incorrectly if it were labelled at random according to the class distribution at that node.
• Gini index for a given node t with classes j:

  GINI(t) = 1 − Σ_j [ p(j | t) ]²

NOTE: p(j | t) is computed as the relative frequency of class j at node t.
GINI Index: Example
• Example: Two classes C1 & C2, and node t has 5 C1 and 5 C2 examples. Compute Gini(t):
• Gini(t) = 1 – [p(C1|t)² + p(C2|t)²] = 1 – [(5/10)² + (5/10)²]
• = 1 – [¼ + ¼] = ½
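A minimal Python sketch of this per-node computation; the function name and the list-of-class-counts interface are illustrative choices, not from the slides.

```python
def gini(counts):
    """Gini index of a node, given the list of class counts at that node."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)

# The slide's example: node t with 5 C1 and 5 C2 examples.
print(gini([5, 5]))  # 0.5
```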
For a two-class problem the Gini index always lies in [0, 0.5]: 0 means the node is pure (all examples belong to a single class), and 0.5 means the node is maximally impure (the two classes are evenly mixed).
More on Gini
• Worst Gini corresponds to probabilities of 1/nc, where nc is the number of
classes.
• For 2-class problems the worst Gini will be ½
• How do we get the best Gini? Come up with an example for node t with 10
examples for classes C1 and C2
• 10 C1 and 0 C2
• Now what is the Gini?
• 1 – [(10/10)² + (0/10)²] = 1 – [1 + 0] = 0
• So 0 is the best Gini
• So for 2-class problems:
• Gini varies from 0 (best) to ½ (worst).
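A one-line check of the worst-case value: if every one of the nc classes has probability 1/nc, then

  GINI_worst = 1 − Σ_{j=1}^{nc} (1/nc)² = 1 − nc · (1/nc)² = 1 − 1/nc,

which equals ½ for nc = 2, matching the value above.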
Some More Examples
• Below are the Gini values for four nodes with different class distributions, ordered from best to worst. See the next slide for details.
• Note that thus far we are only computing GINI for one node. We still need to compute it for a split and then the change in Gini relative to the parent node.

C1: 0, C2: 6 → Gini = 0.000
C1: 1, C2: 5 → Gini = 0.278
C1: 2, C2: 4 → Gini = 0.444
C1: 3, C2: 3 → Gini = 0.500
Examples for computing GINI

  GINI(t) = 1 − Σ_j [ p(j | t) ]²

C1: 0, C2: 6
P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
Gini = 1 – P(C1)² – P(C2)² = 1 – 0 – 1 = 0

C1: 1, C2: 5
P(C1) = 1/6, P(C2) = 5/6
Gini = 1 – (1/6)² – (5/6)² = 0.278

C1: 2, C2: 4
P(C1) = 2/6, P(C2) = 4/6
Gini = 1 – (2/6)² – (4/6)² = 0.444
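The same values can be checked with the gini() helper sketched earlier (the counts are the class frequencies from the slide):

```python
# Reusing the illustrative gini() helper defined above.
for counts in ([0, 6], [1, 5], [2, 4], [3, 3]):
    print(counts, round(gini(counts), 3))
# [0, 6] 0.0
# [1, 5] 0.278
# [2, 4] 0.444
# [3, 3] 0.5
```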
Examples for Computing Error

  Error(t) = 1 − max_i P(i | t)

C1: 0, C2: 6
P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
Error = 1 – max(0, 1) = 1 – 1 = 0

C1: 1, C2: 5
P(C1) = 1/6, P(C2) = 5/6
Error = 1 – max(1/6, 5/6) = 1 – 5/6 = 1/6

C1: 2, C2: 4
P(C1) = 2/6, P(C2) = 4/6
Error = 1 – max(2/6, 4/6) = 1 – 4/6 = 1/3
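A corresponding sketch for the misclassification error (again an illustrative helper, not code from the slides):

```python
def classification_error(counts):
    """Misclassification error of a node: 1 minus the largest class probability."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - max(counts) / total

for counts in ([0, 6], [1, 5], [2, 4]):
    print(counts, classification_error(counts))  # 0.0, 1/6, 1/3
```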
Comparison among Splitting Criteria
For a 2-class problem:
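To see how the three criteria behave in the 2-class case, here is a small sketch that tabulates entropy, Gini, and misclassification error as functions of p = P(C1) (purely illustrative):

```python
import math

def entropy2(p):
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def gini2(p):
    return 1.0 - (p ** 2 + (1 - p) ** 2)

def error2(p):
    return 1.0 - max(p, 1 - p)

for p in (0.0, 0.1, 0.3, 0.5):
    print(f"p={p:.1f}  entropy={entropy2(p):.3f}  gini={gini2(p):.3f}  error={error2(p):.3f}")
```

All three measures are zero for a pure node (p = 0 or 1) and peak at p = 0.5, where entropy reaches 1 while Gini and error reach 0.5.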
Example: Construct Decision tree using Gini index
Example: Construct Decision tree using Gini index (continued)
Therefore, attribute B will be chosen to split the node.
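The attribute choice can be illustrated with a small Gini-gain sketch. Since the slide's data table is not reproduced in this text, the parent and child class counts below are hypothetical:

```python
def _gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_gain(parent_counts, children_counts):
    """Reduction in Gini when a parent node is split into the given child nodes."""
    n = sum(parent_counts)
    weighted = sum(sum(child) / n * _gini(child) for child in children_counts)
    return _gini(parent_counts) - weighted

parent = [5, 5]                       # hypothetical: 5 C1 and 5 C2 examples
split_on_A = [[4, 3], [1, 2]]         # hypothetical child counts for attribute A
split_on_B = [[5, 1], [0, 4]]         # hypothetical child counts for attribute B
print(gini_gain(parent, split_on_A))  # ~0.024
print(gini_gain(parent, split_on_B))  # ~0.333 -> B gives the larger reduction
```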
Gini vs Entropy
• Computationally, entropy is more expensive because it involves logarithms, so the Gini index is faster to compute.
• Accuracy with the entropy criterion is often slightly better, though not always.
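In practice the two criteria are one parameter apart in common libraries; a quick sketch with scikit-learn (the dataset and settings are chosen only for illustration, and scikit-learn is assumed to be installed):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
for criterion in ("gini", "entropy"):
    tree = DecisionTreeClassifier(criterion=criterion, random_state=0)
    scores = cross_val_score(tree, X, y, cv=5)
    print(criterion, round(scores.mean(), 3))
```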
Table 11.6

Algorithm: ID3
Splitting criterion: Information Gain
  α(A, D) = E(D) − E_A(D)
  where
  E(D) = entropy of D (a measure of uncertainty) = − Σ_{i=1}^{k} p_i log₂ p_i,
  where D has the set of k classes c_1, c_2, …, c_k and p_i = |C_{i,D}| / |D|; here, C_{i,D} is the set of tuples with class c_i in D.
  E_A(D) = weighted average entropy when D is partitioned on the values of attribute A = Σ_{j=1}^{m} (|D_j| / |D|) E(D_j),
  where m denotes the number of distinct values of attribute A.
Remarks:
  • The algorithm calculates α(A_i, D) for every attribute A_i in D and chooses the attribute with the maximum α(A_i, D).
  • The algorithm can handle both categorical and numerical attributes.
  • It favors attributes with a large number of distinct values.
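A compact sketch of the ID3 splitting criterion; the (attribute value, class label) pairs below are made-up data, not from the slides:

```python
import math
from collections import Counter

def entropy(labels):
    """E(D): entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows):
    """alpha(A, D) = E(D) - E_A(D) for rows of (attribute value, class label)."""
    labels = [label for _, label in rows]
    groups = {}
    for value, label in rows:
        groups.setdefault(value, []).append(label)
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted

rows = [("sunny", "no"), ("sunny", "no"), ("overcast", "yes"),
        ("rain", "yes"), ("rain", "no"), ("overcast", "yes")]
print(round(information_gain(rows), 3))  # 0.667
```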
Algorithm: CART
Splitting criterion: Gini Index
  γ(A, D) = G(D) − G_A(D)
  where
  G(D) = Gini index of D (a measure of impurity) = 1 − Σ_{i=1}^{k} p_i²,
  where p_i = |C_{i,D}| / |D| and D has k classes, and
  G_A(D) = (|D_1| / |D|) G(D_1) + (|D_2| / |D|) G(D_2),
  when D is partitioned into two data sets D_1 and D_2 based on some values of attribute A.
Remarks:
  • The algorithm calculates all binary partitions over the possible values of attribute A and chooses the binary partition with the maximum γ(A, D).
  • The algorithm is computationally very expensive when attribute A has a large number of values.
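The remark about cost can be made concrete by enumerating the binary partitions CART considers for a categorical attribute; the attribute values here are hypothetical:

```python
from itertools import combinations

def binary_partitions(values):
    """Yield each distinct two-way split of a set of attribute values once."""
    values = sorted(values)
    for r in range(1, len(values) // 2 + 1):
        for left in combinations(values, r):
            right = tuple(v for v in values if v not in left)
            # When both sides have equal size, keep only one of the two mirror splits.
            if r < len(values) - r or left < right:
                yield left, right

for left, right in binary_partitions({"low", "medium", "high"}):
    print(left, "vs", right)
# ('high',) vs ('low', 'medium')
# ('low',) vs ('high', 'medium')
# ('medium',) vs ('high', 'low')
```

An attribute with m distinct values has 2^(m−1) − 1 such partitions, which is why CART becomes expensive for attributes with many values.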
Algorithm: C4.5
Splitting criterion: Gain Ratio
  β(A, D) = α(A, D) / E*_A(D)
  where
  α(A, D) = information gain of D (the same as in ID3), and
  E*_A(D) = splitting information = − Σ_{j=1}^{m} (|D_j| / |D|) log₂(|D_j| / |D|),
  when D is partitioned into D_1, D_2, …, D_m corresponding to the m distinct values of attribute A.
Remarks:
  • The attribute A with the maximum value of β(A, D) is selected for splitting.
  • The splitting information is a kind of normalization: it counteracts the bias of information gain towards attributes with a large number of distinct values.
In addition to this, a few important characteristics of decision tree induction algorithms are highlighted in the following.
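Finally, a sketch of the gain-ratio computation, reusing the entropy() and information_gain() helpers (and the made-up rows) from the ID3 sketch above:

```python
def split_information(rows):
    """E*_A(D): entropy of the partition sizes induced by the attribute values."""
    n = len(rows)
    sizes = Counter(value for value, _ in rows).values()
    return -sum((s / n) * math.log2(s / n) for s in sizes)

def gain_ratio(rows):
    """beta(A, D) = alpha(A, D) / E*_A(D); zero if the attribute has a single value."""
    si = split_information(rows)
    return information_gain(rows) / si if si > 0 else 0.0

print(round(gain_ratio(rows), 3))  # ~0.421 for the hypothetical rows above
```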