10 Decision Tree
1. Prof. Neeraj Bhargava
Vishal Dutt
Department of Computer Science, School of Engineering & System Sciences
MDS University, Ajmer
2. Algorithm for Decision Tree Induction
Basic algorithm (a greedy algorithm):
- The tree is constructed in a top-down recursive manner.
- At the start, all the training examples are at the root.
- Attributes are categorical (we will talk about continuous-valued attributes later).
- Examples are partitioned recursively based on selected attributes.
- Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain).
Conditions for stopping partitioning (a sketch in code follows this list):
- All examples for a given node belong to the same class.
- There are no remaining attributes for further partitioning; majority voting is employed for classifying the leaf.
- There are no examples left.
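Taken together, these bullets describe the classic ID3-style recursion. The following is a minimal sketch, assuming examples are represented as dicts of categorical attribute values with a parallel list of class labels; the helper names (entropy, info_gain, build_tree) and the tiny dataset are my own illustration, not from the slides.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a collection of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(examples, labels, attr):
    """Entropy reduction from partitioning the examples on one categorical attribute."""
    n = len(labels)
    groups = {}
    for ex, y in zip(examples, labels):
        groups.setdefault(ex[attr], []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - remainder

def build_tree(examples, labels, attrs):
    # Stop: all examples at this node belong to the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Stop: no remaining attributes -- classify the leaf by majority vote.
    if not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Greedy step: branch on the attribute with maximum information gain.
    best = max(attrs, key=lambda a: info_gain(examples, labels, a))
    node = {"attr": best, "branches": {}}
    # Branching only on observed values means no branch is ever empty; if we
    # branched on all possible values, an empty branch would take the parent's
    # majority class (the "no examples left" stopping condition).
    for value in sorted({ex[best] for ex in examples}):
        sub = [(ex, y) for ex, y in zip(examples, labels) if ex[best] == value]
        sub_ex, sub_y = [ex for ex, _ in sub], [y for _, y in sub]
        node["branches"][value] = build_tree(sub_ex, sub_y, [a for a in attrs if a != best])
    return node

# Tiny illustrative run (made-up data):
X = [{"outlook": "sunny", "windy": "false"},
     {"outlook": "sunny", "windy": "true"},
     {"outlook": "rain",  "windy": "false"},
     {"outlook": "rain",  "windy": "true"}]
y = ["yes", "yes", "yes", "no"]
print(build_tree(X, y, ["outlook", "windy"]))
```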
3. Building a compact tree
The key to building a decision tree is deciding which attribute to choose in order to branch.
The heuristic is to choose the attribute with the maximum Information Gain, based on information theory.
Put another way, we want to choose the attribute that reduces uncertainty as much as possible.
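Entropy and information gain are defined formally on the next few slides; as a preview, the sketch below (illustrative function names and made-up data, not from the slides) shows that an attribute which separates the classes cleanly has a higher gain than one that leaves the branches just as mixed:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a collection of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(values, labels):
    """Expected entropy reduction from splitting the labels by an attribute's values."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - remainder

labels = ["yes", "yes", "no", "no"]
attr_a = ["t", "t", "f", "f"]     # separates the classes perfectly
attr_b = ["t", "f", "t", "f"]     # leaves both branches as mixed as before
print(info_gain(attr_a, labels))  # 1.0 -> maximum gain, branch on this one
print(info_gain(attr_b, labels))  # 0.0 -> no uncertainty reduction
```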
4. Information theory
Information theory provides a mathematical basis for measuring information content.
To understand the notion of information, think of it as providing the answer to a question, for example, whether a coin will come up heads.
If one already has a good guess about the answer, then the actual answer is less informative.
If one already knows that the coin is rigged so that it will come up heads with probability 0.99, then a message (advance information) about the actual outcome of a flip is worth less than it would be for an honest coin.
5. Information theory (cont …)
For a fair (honest) coin, you have no advance information, and you are willing to pay more (say, in dollars) for advance information: the less you know, the more valuable the information.
Information theory uses this same intuition, but instead of measuring the value of information in dollars, it measures information content in bits.
One bit of information is enough to answer a yes/no question about which one has no idea, such as the flip of a fair coin.
6. Information theory
In general, if the possible answers v_i have probabilities P(v_i), then the information content I (entropy) of the actual answer is given by

I(P(v_1), \ldots, P(v_n)) = \sum_{i=1}^{n} -P(v_i) \log_2 P(v_i)

For example, for the tossing of a fair coin we get

I\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1 \text{ bit}

If the coin is loaded to give 99% heads we get I \approx 0.08 bits, and as the probability of heads goes to 1, the information content of the actual answer goes to 0.
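These two numbers are easy to verify in a few lines of Python; the function name entropy_bits is my own:

```python
import math

def entropy_bits(probs):
    """I(p_1, ..., p_n) = -sum(p_i * log2(p_i)), in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))    # fair coin   -> 1.0
print(entropy_bits([0.99, 0.01]))  # loaded coin -> ~0.0808
```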
7. Back to decision tree learning
For a given example, what is the correct classification?
We may think of a decision tree as conveying information about the classification of the examples in the training table.
The entropy measure characterizes the (im)purity of an arbitrary collection of examples.
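To make the (im)purity reading concrete, the short sketch below (made-up labels, my own function name) shows that a pure collection has entropy 0, a 50/50 binary split has entropy 1 bit, and anything in between falls in between:

```python
import math
from collections import Counter

def collection_entropy(labels):
    """Entropy of a collection of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(collection_entropy(["yes"] * 8))               # pure        -> 0.0
print(collection_entropy(["yes"] * 4 + ["no"] * 4))  # 50/50 split -> 1.0
print(collection_entropy(["yes"] * 9 + ["no"] * 5))  # mixed       -> ~0.940
```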