LESSON 24
Learning Decision Trees
Keywords: Learning, Training Data, Axis-Parallel Decision Tree
Consider the following training data, where the task is to decide whether a student goes to a movie:

Money   Has-exams   Weather   Goes-to-movie
 25     no          fine      no
200     no          hot       yes
100     no          rainy     no
125     yes         rainy     no
 30     yes         rainy     no
300     yes         fine      yes
 55     yes         hot       no
140     no          hot       no
 20     yes         fine      no
175     yes         fine      yes
110     no          fine      yes
 90     yes         fine      no
Out of the 12 patterns, 4 belong to goes-to-movie=yes and 8 belong to goes-to-movie=no. The entropy of the training data is

Im(n) = -(4/12) log2 (4/12) - (8/12) log2 (8/12) = 0.9183
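This entropy value can be verified in a few lines of Python; the helper name entropy below is illustrative, not part of the lesson:

```python
from math import log2

def entropy(counts):
    """Entropy (in bits) of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

# 12 training patterns: 4 with goes-to-movie=yes, 8 with goes-to-movie=no
print(round(entropy([4, 8]), 4))  # 0.9183
```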
We now need to consider all three attributes for the first split and choose the one with the highest information gain.
Money
Let us divide the feature values of money into three ranges: money < 50, money between 50 and 150, and money > 150.
1. Money < 50, has 3 patterns belonging to goes-to-movie=no and 0 patterns belonging to goes-to-movie=yes. The entropy for money < 50 is
Im(Money < 50) = 0
2. Money 50-150 has 5 patterns belonging to goes-to-movie=no and 1 pattern belonging to goes-to-movie=yes . Entropy for money 50-150 is
Im(Money 50-150) = -(1/6) log2 (1/6) - (5/6) log2 (5/6) = 0.65
3. Money > 150 has 3 patterns belonging to goes-to-movie=yes and 0 patterns belonging to goes-to-movie=no. The entropy for money > 150 is
Im(Money>150) = 0
4. The gain for money is

Gain(money) = 0.9183 - (3/12)(0) - (6/12)(0.65) - (3/12)(0) = 0.5933
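The same calculation can be sketched in Python; the helper names entropy and information_gain are illustrative, not from the lesson:

```python
from math import log2

def entropy(counts):
    """Entropy (in bits) of a node whose class counts are given."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropies of the child nodes."""
    n = sum(parent)
    return entropy(parent) - sum(sum(c) / n * entropy(c) for c in children)

# money splits the (4 yes, 8 no) set into <50: (0, 3), 50-150: (1, 5), >150: (3, 0)
print(round(information_gain([4, 8], [[0, 3], [1, 5], [3, 0]]), 4))  # 0.5933
```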
Has-exams
1. (Has-exams=yes)
Has a total of 7 patterns, with 2 patterns belonging to goes-to-movie=yes and 5 patterns belonging to goes-to-movie=no. The entropy for has-exams=yes is

Im(has-exams = yes) = -(2/7) log2 (2/7) - (5/7) log2 (5/7) = 0.8631
2. (Has exams=no)
Has a total of 5 patterns, with 2 patterns belonging to goes-to-movie=yes and 3 patterns belonging to goes-to-movie=no. The entropy for has-exams=no is

Im(has-exams = no) = -(2/5) log2 (2/5) - (3/5) log2 (3/5) = 0.9710
3. The gain for has-exams is

Gain(has-exams) = 0.9183 - (7/12)(0.8631) - (5/12)(0.9710) = 0.0102
Weather
1. (Weather=hot)
Has a total of 3 patterns with 1 pattern belonging to goes-to-movie=yes
and 2 patterns belonging to goes-to-movie=no. The entropy for weather=hot
is
Im(weather = hot) = -(1/3) log2 (1/3) - (2/3) log2 (2/3) = 0.9183
2. (Weather=fine)
Has a total of 6 patterns, with 3 patterns belonging to goes-to-movie=yes and 3 patterns belonging to goes-to-movie=no. The entropy for weather=fine is

Im(weather = fine) = -(3/6) log2 (3/6) - (3/6) log2 (3/6) = 1.0

3. (Weather=rainy)
Has a total of 3 patterns, all belonging to goes-to-movie=no. The entropy for weather=rainy is

Im(weather = rainy) = 0

4. The gain for weather is

Gain(weather) = 0.9183 - (3/12)(0.9183) - (6/12)(1.0) - (3/12)(0) = 0.1887
All three attributes have been investigated, and the gain values are:

Gain(money) = 0.5933
Gain(has-exams) = 0.0102
Gain(weather) = 0.1887

Since Gain(money) has the maximum value, money is taken as the first attribute.
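These gain values can be checked directly from the raw training table. The sketch below recomputes them, bucketing the continuous money values into the same three ranges used above; the names data, bucket, and gain are illustrative:

```python
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

# the 12 training patterns as (money, has-exams, weather, goes-to-movie)
data = [(25, "no", "fine", "no"), (200, "no", "hot", "yes"),
        (100, "no", "rainy", "no"), (125, "yes", "rainy", "no"),
        (30, "yes", "rainy", "no"), (300, "yes", "fine", "yes"),
        (55, "yes", "hot", "no"), (140, "no", "hot", "no"),
        (20, "yes", "fine", "no"), (175, "yes", "fine", "yes"),
        (110, "no", "fine", "yes"), (90, "yes", "fine", "no")]

def bucket(m):
    """The three money ranges used in the text."""
    return "<50" if m < 50 else ("50-150" if m <= 150 else ">150")

rows = [(bucket(m), e, w, c) for m, e, w, c in data]
labels = [r[3] for r in rows]

def gain(i):
    """Information gain of splitting on attribute column i."""
    g = entropy(labels)
    for v in {r[i] for r in rows}:
        sub = [r[3] for r in rows if r[i] == v]
        g -= len(sub) / len(rows) * entropy(sub)
    return g

for name, i in [("money", 0), ("has-exams", 1), ("weather", 2)]:
    print(name, round(gain(i), 4))  # money 0.5933, has-exams 0.0102, weather 0.1887
```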
When we take money as the first decision node, the training data gets split into three portions for money < 50, money = 50-150, and money > 150. There are 3 patterns along the outcome money < 50, 6 patterns along the outcome money = 50-150, and 3 patterns along the outcome money > 150. We will consider each of these three branches in turn and treat the choice of the next decision node as building a new decision tree.
Money < 50
All 3 patterns along this branch belong to goes-to-movie=no, so this is a leaf node and need not be investigated further.
Money > 150
All 3 patterns along this branch belong to goes-to-movie=yes, so this is a leaf node and need not be investigated further.
Money = 50-150
This has a total of 6 patterns with 1 pattern belonging to goes-to-movie=yes
and 5 patterns belonging to goes-to-movie=no. So the information in this
branch is
Im(n) = -(1/6) log2 (1/6) - (5/6) log2 (5/6) = 0.65
Now we need to check the attributes has-exams and weather to see which
is the next attribute to be chosen.
Has-exams
1. (has-exams=yes)
There are a total of 3 patterns with has-exams=yes out of the 6 patterns along this branch. Out of these 3 patterns, 3 patterns belong to
goes-to-movie=no and 0 patterns belong to goes-to-movie=yes. So the
entropy of has-exams=yes is
Im(has-exams = yes) = -(3/3) log2 (3/3) = 0
2. (has-exams=no)
There are a total of 3 patterns with has-exams=no out of the 6 patterns along this branch. Out of these 3 patterns, 1 pattern belongs to goes-to-movie=yes and 2 patterns belong to goes-to-movie=no. The entropy for has-exams=no is

Im(has-exams = no) = -(1/3) log2 (1/3) - (2/3) log2 (2/3) = 0.9183

3. The gain for has-exams is

Gain(has-exams) = 0.65 - (3/6)(0) - (3/6)(0.9183) = 0.1909

Weather

1. (weather=hot)
There are two patterns out of six which belong to weather=hot and both of them belong to goes-to-movie=no. The entropy for weather=hot is

Im(weather = hot) = 0

2. (weather=rainy)
There are two patterns out of six which belong to weather=rainy and both of them belong to goes-to-movie=no. The entropy for weather=rainy is

Im(weather = rainy) = 0

3. (weather=fine)
There are two patterns out of six which belong to weather=fine, with 1 pattern belonging to goes-to-movie=yes and 1 pattern belonging to goes-to-movie=no. The entropy for weather=fine is

Im(weather = fine) = 1.0

4. The gain for weather is

Gain(weather) = 0.65 - (2/6)(1.0) = 0.3167

The values of gain for has-exams and weather are

Gain(has-exams) = 0.1909
Gain(weather) = 0.3167
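The two gain values along this branch can likewise be checked in a short Python sketch; the six branch patterns are transcribed from the table above, and the helper names are illustrative:

```python
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

# the six patterns in the money = 50-150 branch: (has-exams, weather, movie)
branch = [("no", "rainy", "no"), ("yes", "rainy", "no"), ("yes", "hot", "no"),
          ("no", "hot", "no"), ("no", "fine", "yes"), ("yes", "fine", "no")]
labels = [m for _, _, m in branch]

def gain(i):
    """Information gain of splitting the branch on attribute column i."""
    g = entropy(labels)
    for v in {p[i] for p in branch}:
        sub = [p[2] for p in branch if p[i] == v]
        g -= len(sub) / len(branch) * entropy(sub)
    return g

print(round(gain(0), 4), round(gain(1), 4))  # 0.1909 0.3167
```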
Since weather has the higher gain value, it is chosen as the next decision node. For weather=hot and weather=rainy, all the patterns belong to goes-to-movie=no, so these are leaf nodes. Only the node weather=fine needs to be expanded further, using the remaining attribute has-exams: for has-exams=yes the pattern belongs to goes-to-movie=no, and for has-exams=no it belongs to goes-to-movie=yes. The entire decision tree is given in Figure 1.
The following points may be noted after going through the example:

1. The decision tree can be used effectively to choose among several courses of action.
2. The way the decision tree arrives at a decision can be easily explained: each path in the decision tree corresponds to a simple rule.
3. At each node, the attribute chosen to make the split is the one with the highest drop in impurity, that is, the highest information gain.
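The whole procedure illustrated above (choose the attribute with maximum gain, split the data, and recurse until a node is pure) can be sketched as a small recursive routine. This is a minimal illustrative sketch, with money already bucketed into the three ranges used in the text; the names id3, gain, and entropy are assumptions, not notation from the lesson:

```python
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def gain(rows, attr):
    """Information gain of splitting rows on the named attribute."""
    labels = [r["movie"] for r in rows]
    g = entropy(labels)
    for v in {r[attr] for r in rows}:
        sub = [r["movie"] for r in rows if r[attr] == v]
        g -= len(sub) / len(rows) * entropy(sub)
    return g

def id3(rows, attrs):
    """Return a leaf label, or (attribute, {value: subtree}) chosen by max gain."""
    labels = [r["movie"] for r in rows]
    if len(set(labels)) == 1 or not attrs:   # pure node or no attributes left
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: gain(rows, a))
    return (best, {v: id3([r for r in rows if r[best] == v],
                          [a for a in attrs if a != best])
                   for v in {r[best] for r in rows}})

# the 12 training patterns, money already bucketed into the three ranges
table = [("<50", "no", "fine", "no"), (">150", "no", "hot", "yes"),
         ("50-150", "no", "rainy", "no"), ("50-150", "yes", "rainy", "no"),
         ("<50", "yes", "rainy", "no"), (">150", "yes", "fine", "yes"),
         ("50-150", "yes", "hot", "no"), ("50-150", "no", "hot", "no"),
         ("<50", "yes", "fine", "no"), (">150", "yes", "fine", "yes"),
         ("50-150", "no", "fine", "yes"), ("50-150", "yes", "fine", "no")]
rows = [dict(zip(("money", "has-exams", "weather", "movie"), t)) for t in table]

tree = id3(rows, ["money", "has-exams", "weather"])
print(tree)
```

Running this reproduces the tree of Figure 1: money at the root, weather along the money = 50-150 branch, and has-exams below weather = fine.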
[Figure 1: The decision tree for the goes-to-movie data. The root node tests money: the branch money < 50 leads to the leaf goes-to-movie = false, the branch money > 150 leads to the leaf goes-to-movie = true, and money = 50-150 leads to a test on weather. The branches weather = hot and weather = rainy lead to leaves with goes-to-movie = false; weather = fine leads to a test on has-exams, where has-exams = yes gives goes-to-movie = false and has-exams = no gives goes-to-movie = true.]
Assignment
1. There are four coins 1, 2, 3, 4 out of which three coins are of equal
weight and one coin is heavier. Use a decision tree to identify the
heavier coin.
2. Consider the three-class problem characterized by the training data
given in the following table. Obtain the axis-parallel decision tree for
the data.
Professor   Attribute 1   Attribute 2
Sam         Low           Medium
Sam         Low           Medium
Sam         High          High
Pam         High          Low
Pam         Low           Low
Pam         High          Low
Ram         Low           Low
Ram         High          Medium
Ram         Low           High

Class Label: 1, 2, 3
Total No in Class: 40, 30, 30, 0, 20, 20
7. Consider the two-class problem in a two-dimensional space characterized by the following training data. Obtain an axis-parallel decision
tree for the data.
Class 1: (1, 1)^t, (2, 2)^t, (6, 7)^t, (7, 7)^t
Class 2: (6, 1)^t, (6, 2)^t, (7, 1)^t, (7, 2)^t
8. Consider the data given in problem 7. Suggest an oblique decision tree.